#392: Scroll To Text

Opened Jul 2, 2019

Hello, TAG!

I'm requesting a TAG review of:

Further details:

We recommend the explainer to be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

  • Links to major pieces of multi-stakeholder review or discussion of this specification:
  • Links to major unresolved issues or opposition with this specification:

You should also know that...

One of our major discussion points currently is how the targetText= indicator should be delimited in the URL fragment. See https://github.com/bokand/ScrollToTextFragment/issues/15. The latest idea here is that we could use a double-hash syntax (e.g. example.com#fragment##targetText=example) to avoid breaking websites that use the fragment for routing/state. The browser would parse the ##targetText= identifier and then remove it from the fragment.
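As a rough illustration of the double-hash idea described above, the following Python sketch splits a URL at the `##targetText=` delimiter and returns the part the page would still see together with the decoded target text. The function name `split_target_text`, the percent-decoding step, and the choice to split at the first occurrence of the delimiter are assumptions made for this example; none of them is taken from the proposal, and real browser parsing would need to handle many more cases.

```python
from typing import Optional, Tuple
from urllib.parse import unquote

# Delimiter under discussion in the issue; the exact syntax was still open.
DELIMITER = "##targetText="


def split_target_text(url: str) -> Tuple[str, Optional[str]]:
    """Split a URL into (page-visible URL, target text).

    Hypothetical helper: everything from the first "##targetText=" onward is
    treated as the text indicator and removed from the URL the page sees.
    """
    head, sep, tail = url.partition(DELIMITER)
    if not sep:
        # No indicator present: the URL is passed through unchanged.
        return url, None
    # Assumption: the target text is percent-encoded like other URL parts.
    return head, unquote(tail)


if __name__ == "__main__":
    visible, target = split_target_text("example.com#fragment##targetText=example")
    print(visible)  # the URL with the indicator stripped
    print(target)   # the text the browser would scroll to
```

In this sketch a site that routes on `#fragment` still receives only `example.com#fragment`, which is the compatibility property the double-hash syntax is meant to provide.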

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.

Discussions

2019-07-10

Minutes

Peter: haven't there been multiple things like this?

David: an IETF thing

Tess: and an epub thing (epub-cfi). bunch of prior art.

Peter: also annotations...

Hadley: do we have past TAG reviews we could link to?

Tess: at least in the case of the similar epub technology - it has a syntax for doing assertions... and an algorithm for figuring out what the likely correct thing is. some robustness... I guess assign me.

David: I guess I could be. I've wanted this for a long time

2019-07-31

Minutes

Alice: we were waiting for... (goes through comments and summarizes them)

David: the two hash thing may actually be interesting architecturally; part of the URL that the page doesn't have access to

Peter: there's a lot of work that's been done with media fragments; seems like this should be compatible with that

Peter: I remember Doug Schepers talking about an annotation system.

Tess: There was a whole effort -- workshop, community group of some kind, which may still exist. Artefacts are still out there. I want to say the folks working on scroll to text have already looked at that; at least I remember offhand seeing they'd already seen that.

Peter: how to proceed?

Tess: Two things important: (1) adding something to URL not visible to page; media fragments didn't do that. They came up with a different syntax that is page visible. What's important about hiding from the page?

Alice: why hidden from the page? I recall seeing, but forgot.

Peter: I see concerns about pages that already use fragment ids for keeping navigation state, and they're worried about conflicting with that.

Tess: why more of a problem in this case than for media?

David: because it's a user feature? Users invoke new feature that the page doesn't expect, want to share URLs?

Alice: Explanation in https://github.com/WICG/ScrollToTextFragment/issues/15#issuecomment-506347279 . Concern about apps doing their own hash parsing. Also causes the text search part to not appear in the URL, so it would be an odd transient thing. So tradeoff between backwards compatibility for sites that do their own hash parsing, versus not changing the URL and consistency with media fragments etc.

Peter: Bigger than just consistency, adding a new delimiter to URLs. Big new feature. Something we should look at.

Alice: When would there be an actual conflict between -- when would you link to a fragment in a single page app?

David: I've seen examples... like cryptpad.

Sangwhan: YouTube for TVs.

David: Map examples -- do this with hash, not great example for text search.

Peter: How will these URLs be minted? From user doing text search in browser? Or something page will generate?

Hadley: If the page generates it on its own, isn't it then pretty close to how anchors already work? Looking at search use case, thinking of something like schema.org -- pages provide some semantic context, but heavy lifting done by crawler/indexer.

Peter: Search engine use case would definitely be the url being minted by something outside the page.

Peter: sounds like we have some issues to raise -- someone want to write them up?

Hadley: One more issue: internationalization. In the security section, a potential threat is exfiltrating data from the destination site. One proposed mitigation is to say it only matches on word boundaries. Not all languages have visible word boundaries; the proposal cites UAX 29 supplemented by a word dictionary, as done by the ICU project's boundary analysis.

Peter: Thai is an example requiring a dictionary; CSS doesn't have a concept of a word.

Tess: On the one hand, ICU is widely used in browser engines... though should we have specs that require a specific library, or should we have a spec for this so that we're not just depending on ICU. So we could reference a spec rather than an ICU behavior.
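To make the word-boundary concern in this exchange concrete, here is a small Python sketch of a naive "match only on word boundaries" check. It uses the regular-expression `\b` assertion from Python's `re` module, which is an assumption chosen for illustration: it is neither UAX 29 segmentation nor ICU's dictionary-based boundary analysis, and the helper name `matches_on_word_boundary` and the sample strings are hypothetical. The Japanese sentence stands in for any script written without spaces between words.

```python
import re


def matches_on_word_boundary(text: str, query: str) -> bool:
    """Return True if `query` occurs in `text` delimited by regex word boundaries.

    Naive illustration only: `\\b` sits between a word character and a
    non-word character, so it knows nothing about dictionary-based words.
    Queries that start or end with punctuation are not handled specially.
    """
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    english = "the quick brown fox"
    print(matches_on_word_boundary(english, "quick"))  # whole word
    print(matches_on_word_boundary(english, "qui"))    # partial word

    # "東京" (Tokyo) is a word inside this sentence, but no space or
    # punctuation follows it, so the naive check cannot find its end.
    japanese = "東京都に住んでいます"
    print(matches_on_word_boundary(japanese, "東京"))
```

With space-delimited English the check separates whole words from partial ones, but in the unspaced sentence it rejects a legitimate word, which is why the discussion turns to dictionary-supplemented segmentation.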

Peter: Somebody want to write this up?

Alice: I can try.

Peter: Bump issue a few weeks?

2019-08-21

Minutes

David: I had a note to schedule a breakout...

Alice: didn't we discuss this at our last breakout?

David: yes - I think Tess was going to write something...

Alice: let's bump it until she's back

2019-09-04

Minutes

David: we did talk about it last week...

Tess: yes, we did talk about it...

David: we struggled to draw conclusions. I was gonna go ask Anne what the right forum for talking about URL changes is. Anne got back to me. I haven't proxied that response yet.

David: one of the underlying questions is "who are you supposed to talk to about fundamental changes for URLs"... [So difficulties] but at the same time it seems like a decent idea.

Peter: bump to TPAC? There might be more people involved with URLs.

Alice: do we have a good distillation of the things we're struggling with about it?

David: one thing is: if you are going to change URL syntax who do you need to socialize it with?

Alice: so, yes the design seems OK but we're not sure about changing URL syntax.

Peter: for me I have concerns that micro-syntax is overloading fragment identifiers... everything is stepping on everything else...

David: that's why they are coming up with this new mechanism for a piece of the URL that is not exposed to the site so it doesn't conflict with another thing.

Peter: but is this already being used by some sites...

Alice: Yves points out that fragments need to be interpreted

David: it's a little more backwards compatible by using ##

Dan: We are kind of defenders of the URL.

Peter: F2F before TPAC?