#635: Find the best terminology to restrict the usage of data urls


Opened May 13, 2021

Ya ya yawm TAG!

The category ("dispute escalation") is a misnomer; this is more a help/clarification request.

I'm requesting the TAG express an opinion on a problem related to:

  • Name: EPUB 3.3
  • Specification URL: https://www.w3.org/TR/epub-33/
  • Explainer (containing user needs and example code): (in preparation)
  • GitHub issues (if you prefer feedback filed there): https://github.com/w3c/epub-specs/issues/1592
  • Primary contacts (and their relationship to the specification): @mattgarrish @dauwhe @iherman (editors)
  • The group where the work on this specification is: EPUB 3 WG
  • Links to major pieces of multi-stakeholder review or discussion of this specification: n/a
  • Links to major unresolved issues or opposition with this specification: n/a
  • Relevant time constraints or deadlines: July 2021

We recommend the explainer to be in Markdown.

Explanation of the issue that we'd like the TAG's opinion on:

"There is no final agreement in the WG on how to precisely formulate the restrictions on the usage of data-url-s. The current formulation relies on the top-level browsing contexts term but that may not be adequate (e.g., if the top level document is an SVG file)."

Cc @ylafon

Discussions

Comment by @annevk May 14, 2021 (See Github)

(See also https://github.com/whatwg/html/pull/5279.)

Discussed May 24, 2021 (See Github)

Dan: missing some context here

Hadley: "reading systems must prevent data URLs opening as if they are top level content documents..."

Hadley: why don't we invite someone from the wg to come and chat.

Hadley: [posts comment](https://github.com/w3ctag/design-reviews/issues/635#issuecomment-847666940)

Comment by @hadleybeeman May 25, 2021 (See Github)

Hi @iherman. We're looking at this in our W3CTAG breakout, and we'd love a little more context.

What are you trying to accomplish with data URLs? Why is it helpful to restrict them? It seems like you've got a use case in mind, but it's hard to work it out from the spec you've linked to.

We think it may be to do with security, but we don't see it documented. Can you tell us a bit more about your thinking?

We can hopefully help more with that information. Thanks!

Comment by @iherman May 25, 2021 (See Github)

Hey @hadleybeeman!

I'll try to summarize, but I'm also cc'ing @mattgarrish @dauwhe and @bduga, who have a deeper knowledge of what is happening. The relevant part of the specification is https://w3c.github.io/epub-specs/epub33/rs/#confreq-rs-data-urls.

In EPUB jargon, a Reading System is, from the point of view of what we are discussing, like a browser, insofar as one of its main tasks is to render either HTML or (standalone) SVG documents; these documents provide the reader with the pages of the books. These documents, referred to as "Top Level Content Documents", can be thought of as, say, the chapters of a large book (and the metadata provided in the EPUB instance tells the Reading System in which order these files should be displayed). Of course, these pages, which are HTML pages, can link to other resources, some in the EPUB instance and some somewhere on the Web.

The security-related issue is how to handle data URLs. One approach is to disallow them universally; however, this might make some genuine use cases impossible (e.g., SVG content embedded in an HTML or CSS file as a data URL). Hence the approach taken in the spec: disallow them as, say, an href value in an <a> element, but allow them in, e.g., a CSS file. The question was how to turn this into spec text.
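To make the distinction above concrete, here is a minimal sketch of a content check that flags data URLs used as link targets while leaving embedded ones alone. It is an illustration only, not something taken from the EPUB specification or from any existing checker: the class name `DataUrlLinkFinder`, the sample markup, and the choice to look only at `<a href>` and `<img src>` are all assumptions made for this example.

```python
from html.parser import HTMLParser


class DataUrlLinkFinder(HTMLParser):
    """Hypothetical helper: separates data: URLs used as link targets
    from data: URLs used to embed a resource."""

    def __init__(self):
        super().__init__()
        self.flagged = []   # data: URLs found in <a href> (would navigate)
        self.embedded = []  # data: URLs found in <img src> (embedding only)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # URL schemes are case-insensitive, so compare in lower case.
            if value is None or not value.strip().lower().startswith("data:"):
                continue
            if tag == "a" and name == "href":
                self.flagged.append(value)
            elif tag == "img" and name == "src":
                self.embedded.append(value)


# Assumed sample content: one data: link, one embedded SVG, one ordinary link.
SAMPLE = """
<p><a href="data:text/html,%3Ch1%3ELog%20in%3C%2Fh1%3E">Continue</a></p>
<p><img src="data:image/svg+xml;base64,PHN2Zy8+" alt=""></p>
<p><a href="chapter2.xhtml">Next chapter</a></p>
"""

finder = DataUrlLinkFinder()
finder.feed(SAMPLE)
print("link targets to reject:", finder.flagged)
print("embedded resources left alone:", finder.embedded)
```

A real check would need to cover more cases (for instance SVG `<a>` elements and data URLs inside CSS), which this sketch deliberately leaves out.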

We realized that browsers have similar restrictions, and the EPUB spec is keen not to reinvent the wheel, or even the terminology, when possible. However, we did not find any normative statement in other specs that applies to this situation. We did put the text into the current draft, but we are not sure whether that is the proper reference/terminology. Hence the request for TAG help...

Some further references:

I hope this helps...

Comment by @mattgarrish May 25, 2021 (See Github)

In addition to what Ivan has already mentioned, using data URLs to embed resources doesn't appear problematic, but allowing data URLs to be referenced from <a> elements has the same security risks in EPUB that have been raised for browsers (i.e., phishing).

In other words, we want to disallow data URLs from opening a "top-level browsing context", except when explicitly requested by a user (e.g., to open an image in a new window), but aren't completely sure how best to say this since it seems to only be handled in bug trackers right now. For reference, see:

So, in the absence of more formal guidance (which we'd prefer to reference), does the following make sense:

Reading Systems MUST prevent data URLs [RFC2397] from opening in top-level browsing contexts [HTML], except when initiated through a Reading System affordance such as a context menu. If a Reading System does not use a top-level browsing context for Top-level Content Documents, it MUST also prevent data URLs from opening as though they are Top-level Content Documents.

Or do you have any suggestions on how we can improve this wording?

Discussed Jul 26, 2021 (See Github)

Amy: they have a para in their spec and they want our opinion ... They say "top level browsing context" and they may be referring to an SVG which isn't technically a top level browsing context so they want feedback. They have decided on behaviour, just want feedback on how to communicate it.

Yves: what is the origin when you navigate to a data URL? And do you use whether the document was sent using https to infer whether it can access powerful features, etc?

Dan: trying to figure out if the WHATWG PR is consistent with what Ivan is asking about

Hadley: MDN page says this already happens.

Amy: it is consistent with what the epub spec is asking for...

Yves: their spec says must prevent data url from opening in a top level browsing context. They say "browsing context" doesn't apply to SVG - only HTML. Something something Transcluded references - resources that are automatically downloaded like images and style?

Dan: could we just advise them to say "top level context"? Should they specify it in their doc?

Amy: or they could just say "in this context SVG also counts" [when referring].

Dan: I think that would be sufficient. But I'm not implementing epub readers...

Yves: should be secure contexts as well... there was an issue about whether a data URL is a secure context and the resolution was no ... do they require secure contexts or not? If they do then instead of using top level context you can say only for secure context documents - in that case it prevents data URLs from being used.

Dan: let's leave those comments and maybe we can close it at the plenary if we get a response.

Comment by @rhiaro Aug 4, 2021 (See Github)

Hi @iherman and @mattgarrish. Sorry about the delay in responding to this.

A question that came up in our TAG meeting last week was: does the epub spec require secure contexts? I couldn't tell from a quick ctrl+F of the spec. If it does, then it was resolved that data URLs at the top level do not create a secure context, in which case your wording could include something like:

Reading Systems MUST prevent data URLs [RFC2397] from opening in insecure contexts [https://html.spec.whatwg.org/multipage/webappapis.html#secure-context]

Otherwise I think the wording you have is sufficient, and it is consistent with widely implemented browser behaviour. If the main concern is that an SVG at the top level doesn't count as a "browsing context", and SVG is the only exception, you could be explicit about this, eg:

Reading Systems MUST prevent data URLs [RFC2397] from opening in top-level browsing contexts [HTML], except when initiated through a Reading System affordance such as a context menu. If a Reading System does not use a top-level browsing context for Top-level Content Documents, for example if the Top-level Content Document is an SVG, it MUST also prevent data URLs from opening as though they are Top-level Content Documents.

My only other suggestion would be to consider the phrase "via a user-initiated navigation" instead of (or as well as) "through a Reading System affordance such as a context menu" if that adds any clarity to what you mean.
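As a rough illustration of how a Reading System might apply the wording above, the following sketch encodes the rule as a single decision function. It is a sketch under stated assumptions, not an implementation drawn from any Reading System or from the EPUB specification: `Target`, `Request`, `via_affordance`, and `allow` are hypothetical names, and the model treats an HTML or SVG Top-level Content Document the same way by folding both into one `TOP_LEVEL` target.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    # Assumption: covers a top-level browsing context as well as anything
    # opened "as though" it were a Top-level Content Document (HTML or SVG).
    TOP_LEVEL = "top-level"
    # Assumption: covers embedded resources such as images or stylesheets.
    EMBEDDED = "embedded"


@dataclass(frozen=True)
class Request:
    url: str
    target: Target
    # True when the reader asked for it through a Reading System affordance,
    # e.g. an "open image" entry in a context menu.
    via_affordance: bool = False


def is_data_url(url: str) -> bool:
    # URL schemes are case-insensitive.
    return url.lstrip().lower().startswith("data:")


def allow(request: Request) -> bool:
    """Return True if the request may proceed under the sketched rule."""
    if not is_data_url(request.url):
        return True   # the rule only concerns data URLs
    if request.target is Target.EMBEDDED:
        return True   # embedding a data URL is not restricted
    return request.via_affordance  # top level: only via an explicit affordance


link_click = Request("data:text/html,hello", Target.TOP_LEVEL)
css_image = Request("data:image/svg+xml;base64,PHN2Zy8+", Target.EMBEDDED)
context_menu = Request("data:image/png;base64,AAAA", Target.TOP_LEVEL, via_affordance=True)

for req in (link_click, css_image, context_menu):
    print(req.target.value, req.via_affordance, "->", "allow" if allow(req) else "block")
```

Keeping the SVG case inside the same `TOP_LEVEL` target mirrors the suggestion to name SVG explicitly in the prose rather than define a separate rule for it.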

Discussed Aug 9, 2021 (See Github)

[closed on the basis of the PR]

Comment by @rhiaro Aug 10, 2021 (See Github)

I see the PR to update the wording was merged, so closing this. Thanks!

Comment by @iherman Aug 10, 2021 (See Github)

"I see the PR to update the wording was merged, so closing this. Thanks!"

Indeed, @rhiaro. Thank you!