#618: Early design review: opaque-blocklisted-never-sniffed MIME types

Opened Mar 12, 2021

@mikewest suggested that I request an early TAG design review of the "opaque-blocklisted-never-sniffed MIME type" definition. This definition is referenced by the ORB spec and implementation. It is also used in the CORB implementation, although the CORB spec does not currently reference it :-/

Primary contacts:

ORB: annevk@annevk.nl (@annevk)
CORB: lukasza@chromium.org (@anforowicz), creis@chromium.org (@csreis)

External status/issue tracker: https://github.com/annevk/orb

Concerns for TAG discussion

How can we populate and maintain the definition of "opaque-blocklisted-never-sniffed MIME types" (a definition used by ORB and CORB)? In particular, how can we ensure that the list contains the right set of non-webby types, both today and in the future?

A missing entry is a security problem for CORB, which will fail open and allow unrecognized MIME types. A missing entry might be a performance problem for ORB, which for unrecognized types falls back to sniffing for JavaScript or audio/image/video patterns (see also https://github.com/annevk/orb/issues/3 for thoughts on future audio/image/video MIME types).
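The difference can be sketched as follows. This is a hypothetical, heavily simplified model and not either spec's algorithm. The set is a tiny subset of the real list, and `looks_like_js_or_media` is a placeholder that checks only a few image signatures, where ORB would actually sniff for JavaScript and audio/image/video patterns.

```python
# Hypothetical, heavily simplified model of how CORB and ORB treat a
# cross-origin no-cors response. Not the actual spec algorithms.
NEVER_SNIFFED = frozenset({"application/pdf", "application/gzip", "text/csv"})


def looks_like_js_or_media(body: bytes) -> bool:
    """Placeholder for ORB's (costly) sniffing step: checks a few image magic bytes."""
    return body.startswith((b"\x89PNG", b"GIF8", b"\xff\xd8\xff"))


def corb_allows(mime: str) -> bool:
    # In this model CORB blocks only types it recognizes; anything else fails open.
    return mime not in NEVER_SNIFFED


def orb_allows(mime: str, body: bytes) -> bool:
    if mime in NEVER_SNIFFED:
        return False  # cheap decision from headers alone
    # Unrecognized type: fall back to inspecting the body (the performance cost).
    return looks_like_js_or_media(body)
```

With a type missing from the set, such as application/x-rar-compressed here, `corb_allows` returns True (it fails open). `orb_allows` still reaches a decision, but only after inspecting the body.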

Note that "opaque-blocklisted-never-sniffed MIME types" appear on the web rather infrequently and therefore the scope of the performance problem might be relatively small. From that perspective, the concern might go away on its own in the future (if we adopt ORB rather than CORB going forward; it seems that ORB is in general preferable assuming that current experiments show acceptable performance and compatibility in real world scenarios). OTOH, it might still be worthwhile to discuss what to do in the short term.

More context about ORB and CORB

Cross-origin resources in OOPIFs

Out-of-process iframes (OOPIFs), as implemented in Chrome and/or planned for Firefox, continue to host cross-origin, no-cors resources like audio, images, scripts, stylesheets, and video in the same process as the OOPIF itself (i.e. as the HTML document that embeds them). OTOH, OOPIF implementations never require hosting other kinds of cross-origin content, like:

HTML and PDF, which can be isolated into a separate process / OOPIF. (In practice these are handled as navigation requests rather than no-cors requests, although some corner cases might require additional spec work.)
JSON and ZIP, which are not natively supported by the web. (Reading them via fetch or XHR requires CORS. Downloading these resources and/or opening them in a separate application is outside the scope of a browser’s threat model.)

ORB and CORB

ORB and CORB are two algorithms that attempt to minimize which cross-origin resources may enter an OOPIF process. (We hope to converge on one of these algorithms and eventually implement/specify only one of them.) In particular, HTML, PDF, JSON, and ZIP are examples of resources that CORB and ORB attempt to block. Blocking such resources has a security benefit: it prevents a malicious web page from using Spectre to read a cross-origin resource (e.g. JSON) pulled into an OOPIF process via a no-cors request for an <img> element.

Never-sniffed-types heuristic

One heuristic used by CORB and ORB is based on a set of “opaque-blocklisted-never-sniffed MIME types”. All types in this set have the following two properties:

The OOPIF implementations described above never require hosting such types in a cross-origin execution context (e.g. in the same process as the embedding HTML document).
It is perceived as unlikely that allowed no-cors resources (e.g. images or scripts) will be mislabeled by HTTP servers (via the Content-Type response header) as one of the blocklisted types. (This is why text/html is not an "opaque-blocklisted-never-sniffed MIME type": images are frequently mislabeled as text/html.)

Based on that, CORB and ORB block such cross-origin, no-cors responses just based on the HTTP headers, without any sniffing.
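A minimal sketch of this headers-only decision, assuming the Content-Type has already been parsed to its MIME type essence. `NoCorsResponse` and `decide` are hypothetical names, and the set is a small subset of the real list. The point is that the blocking branch never calls `read_body`.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical subset of the opaque-blocklisted-never-sniffed set.
NEVER_SNIFFED = frozenset({"application/pdf", "application/msword", "text/csv"})


@dataclass
class NoCorsResponse:
    content_type: str                # already-parsed MIME type essence
    read_body: Callable[[], bytes]   # reading/sniffing the body is the costly part


def decide(resp: NoCorsResponse) -> str:
    if resp.content_type in NEVER_SNIFFED:
        # Decided from the HTTP headers alone; the body is never read.
        return "block"
    # Anything else is handed to later steps, which may inspect the body.
    return "sniff"
```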

More examples of never-sniffed types

Some examples from the current set of opaque-blocklisted-never-sniffed MIME types:

"application/gzip"
"application/msword"
"application/pdf"
"application/vnd.ces-quickword"
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
"application/x-protobuf"
"multipart/byteranges"
"multipart/signed"
"text/event-stream"
"text/csv"
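For illustration, the examples above could be held in a set keyed by MIME type essence. The `mime_essence` helper below is a deliberately simplified, hypothetical stand-in for the MIME Sniffing standard's "parse a MIME type" algorithm: it only drops parameters and normalizes case and whitespace, and it does not handle quoted strings or reject invalid types.

```python
# The example types listed above (not the complete set).
NEVER_SNIFFED_EXAMPLES = frozenset({
    "application/gzip",
    "application/msword",
    "application/pdf",
    "application/vnd.ces-quickword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/x-protobuf",
    "multipart/byteranges",
    "multipart/signed",
    "text/event-stream",
    "text/csv",
})


def mime_essence(content_type: str) -> str:
    # Simplified: drop parameters such as "; charset=utf-8", then trim and lowercase.
    return content_type.split(";", 1)[0].strip().lower()


def is_never_sniffed(content_type: str) -> bool:
    return mime_essence(content_type) in NEVER_SNIFFED_EXAMPLES
```

For example, `is_never_sniffed("Application/PDF; foo=bar")` is True, while `is_never_sniffed("text/html; charset=utf-8")` is False, consistent with text/html being excluded because images are often mislabeled with it.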

Discussions

Comment by @torgo Mar 24, 2021

Hi @anforowicz, thanks for sending this our way. I get that this is a little outside our normal review process, so maybe some elements of the template do not apply. However, can you clarify where this work is happening and/or which spec this idea will become part of? One of the things we try to emphasise in our explainer guidance is the need to start with user need. Can you elaborate on that a bit?

Comment by @annevk Mar 24, 2021

We're iterating on the design in https://github.com/annevk/orb and it'll be folded into Fetch once prototypes demonstrate feasibility.

Comment by @anforowicz Mar 24, 2021

One of the things we try to emphasise in our explainer guidance is the need to start with user need. Can you elaborate on that a bit?

User impact of ORB and CORB:

behavior/functionality: no impact (assuming HTTP responses use correct Content-Type)
security: big positive impact - Site Isolation / Out of Process IFrames (OOPIFs) do not offer a meaningful security boundary without ORB and/or CORB. OOPIFs combined with ORB/CORB are an important piece of defenses against the threat of Spectre-based attacks.
performance: some risk

Discussed Mar 29, 2021

Yves: it's for things they don't want to sniff for security reasons. Things that are undefined, like gzip or packages, Word files. When you're doing inclusion.

Ken: sounds like a good thing. It's basically a block list?

Yves: Don't try to process that, we know in advance that we won't be able to use it anyway

Ken: the readme is a list

Dan: Anne responded to my question about venue, which is that it's folded into fetch, so a WHATWG thing.

Ken: if we know there are issues with sniffing, but it's coarse to allow it, it's not going to cover all MIME types. This is not file extensions, I guess...

Yves: I have nothing against it. The issue is more about maintenance of this. Lack of maintenance will lead to possible security issues where you are forced to inspect something that may be malicious with an improper media type. It's not a big issue, because you can always send the same kind of content with a media type that is known to be sniffed, i.e. with the wrong media type. I don't think security is a big issue. Maintenance is always an issue; there is a need to have a registry for that.

Ken: if it's just in the spec itself, as long as there are WPT tests? You want to make sure that if they add more MIME types, browsers implementing Fetch will also add them. Doesn't need a wiki or anything; if it's in the Fetch spec, that's fine. Just need to make sure they're covered by tests.

Yves: I'm wondering why they have multipart/byteranges and multipart/signed; why not multipart/*? If you're using multipart, you might not be able to figure out the real media type of the resource anyway, so something can be sneaked in.

Ken: that's good feedback. Would the same apply to the OpenXML-format Office documents?

Yves: and message/something. Font or remove all the fonts from there? Fonts should be in the excluded list.

Ken: what is application/vnd?

Yves: vendor extension

Ken: some are from OpenOffice, right?

Yves: it's a media type that is not HTML, CSS, images, or video. May want to exclude the + suffixes: +json, +zip, +xml.

Dan: I'm going to write back. They elaborated in response to my question about user needs; going to suggest that goes in the explainer.

Yves: will put questions into an issue comment

Dan: let's see what we get as a response and see if we can iterate

Comment by @torgo Mar 30, 2021

@anforowicz thanks for that – can I suggest that you make the user benefits (protection against Spectre-like attacks and hence enhanced security whilst using the web) clear in the explainer?

Comment by @ylafon Mar 30, 2021

Looking at the list, I wonder why substring matching is not used to filter more, e.g. +zip or +rar, or even blocking everything under message/. It would not use regexes, just substring matching, so performance shouldn't be an issue.
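The suggestion could look roughly like the following hypothetical sketch, which combines exact entries with anchored substring checks. It uses a suffix test for +zip and +rar (the examples from the comment) and a prefix test for the message/ and multipart/ top-level types (message/ from this comment, multipart/* from the Mar 29 discussion). None of these rules are in the ORB or CORB specs, and the input is assumed to be an already-normalized MIME type essence.

```python
# Hypothetical rules illustrating the substring-matching suggestion.
EXACT = frozenset({"application/pdf", "application/gzip", "text/csv"})
BLOCKED_SUFFIXES = ("+zip", "+rar")              # structured-syntax-style suffixes
BLOCKED_TOP_LEVEL = ("message/", "multipart/")   # whole top-level types


def blocked(essence: str) -> bool:
    # Plain string comparisons only: no regular expressions involved.
    return (
        essence in EXACT
        or essence.endswith(BLOCKED_SUFFIXES)
        or essence.startswith(BLOCKED_TOP_LEVEL)
    )
```

Prefix and suffix checks are used rather than an arbitrary `in` test so that, for example, the message/ rule only matches the top-level type. The sketch also shows why such rules need care: a blanket +xml rule, as floated in the Mar 29 discussion, would also match image/svg+xml, which is commonly loaded via <img>.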

Discussed Jun 21, 2021

Sangwhan: Awaiting feedback on a question from Yves

Comment by @ylafon Jun 23, 2021

After discussion in today's call, we don't have any other comment on this. Closing issue now, thank you.