design-reviews#995: Document-Isolation-Policy

Opened Sep 18, 2024

Hello TAG!

I'm requesting a TAG review of Document-Isolation-Policy.

Developers want to build fast applications using SharedArrayBuffers (SAB), which can improve computation time by ~40%. But SharedArrayBuffers make it possible to create high-precision timers that can be exploited in a Spectre attack to leak cross-origin user data. To mitigate the risk, SharedArrayBuffers are gated behind crossOriginIsolation (COI). CrossOriginIsolation requires deploying both Cross-Origin-Opener-Policy (COOP) and Cross-Origin-Embedder-Policy (COEP). Both have proven hard to deploy: COOP because it prevents communication with cross-origin popups, and COEP because it imposes restrictions on third-party embeds. Finally, the whole COOP + COEP model is focused on providing access to SharedArrayBuffers to the top-level frame. Cross-origin embeds can only use SABs if their embedder deploys crossOriginIsolation and delegates the permission to use COI-gated APIs, which makes the availability of SABs in third-party iframes very unreliable.
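
The COOP + COEP gate described above can be illustrated with a small check on response headers. This is a minimal sketch, not browser code: the function name `wouldBeCrossOriginIsolated` and the plain header map are hypothetical, and it only models the header values (COOP `same-origin` plus COEP `require-corp` or `credentialless`), not the rest of the navigation checks a browser performs.

```typescript
// Sketch only: decide from response headers whether a top-level document
// would qualify for cross-origin isolation under the COOP + COEP model.
// `HeaderMap` and the function name are hypothetical, for illustration.
type HeaderMap = Record<string, string>;

function wouldBeCrossOriginIsolated(headers: HeaderMap): boolean {
  // Header names are case-insensitive, so normalize them first.
  const normalized: HeaderMap = {};
  Object.keys(headers).forEach((name) => {
    normalized[name.toLowerCase()] = headers[name].trim().toLowerCase();
  });

  const coop = normalized["cross-origin-opener-policy"];
  const coep = normalized["cross-origin-embedder-policy"];

  // COOP must be "same-origin", and COEP must be one of its isolating modes.
  return (
    coop === "same-origin" &&
    (coep === "require-corp" || coep === "credentialless")
  );
}
```

A page that keeps popup communication by using `same-origin-allow-popups` fails this check, which is the deployment problem the request describes.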

Document-Isolation-Policy proposes to solve these deployment concerns by relying on the browser's Out-of-Process-Iframe capability. It will provide a way to securely build fast applications using SharedArrayBuffers while maintaining communication with cross-origin popups (needed for OAuth and payment flows) and not requiring extra work to embed cross-origin iframes. Finally, it will be available for embedded widgets as well as top-level frames, making it possible to build efficient, compute-heavy widgets that are embedded across a variety of websites (e.g. photo library, video conference iframe, etc.).

Explainer: https://github.com/WICG/document-isolation-policy
Specification: https://wicg.github.io/document-isolation-policy/
WPT Tests: https://wpt.fyi/results/html/document-isolation-policy?label=experimental&label=master&aligned
User research:
Security and Privacy self-review: https://github.com/WICG/document-isolation-policy/blob/main/security_privacy_self_review.md
GitHub repo: https://github.com/WICG/document-isolation-policy
Primary contacts:
- Camille Lamy (@camillelamy), Google, spec author
Organization/project driving the specification: Google
Multi-stakeholder support:
- Chromium comments: We are excited to start trialing this API with developers through origin trials.
- Mozilla comments: https://github.com/mozilla/standards-positions/issues/1074
- WebKit comments: https://github.com/WebKit/standards-positions/issues/399
- Positive feedback from developers on the initial WICG proposal.
Status/issue trackers for implementations:
- Chromium: https://chromestatus.com/feature/5141940204208128

Further details:

I have reviewed the TAG's Web Platform Design Principles
The group where the work on this specification is currently being done: WICG
The group where standardization of this work is intended to be done (if different from the current group): WHATWG
Major unresolved issues with or opposition to this specification:
This work is being funded by: Google

You should also know that...

This proposal is solving the same issues as our previous proposal COOP: restrict-properties. This new proposal is meant to replace the old one.

Discussions

Discussed Oct 14, 2024 (See Github)

Martin: the goal is to allow access to shared array buffers -

Dan: the risks are to do with cross-process attacks with high-resolution timers...

Martin: timing attack related stuff ...

Dan: what is the user need?

Martin: being able to deploy your wasm code... you've got a popup with some code, some of the code is wasm, you want to use shared array buffers... but you can't use it because ... The user benefit here would be that the app would use less of the battery, cpu, etc... because it's able to use the more efficient version of the code... There's a lot of work here ... and a lot of use of mechanisms that maybe shouldn't exist...

Dan: example?

Martin: like not partitioning popups...

Martin: key example - cross origin isolation breaks the communication ... you have an iframe, that opens a popup window, then that window is able to communicate back.

Yves: payment is one use case.

Martin: ... use cases that involve cryptography ...

Yves: this could always be misused for [advertising]... Reading the explainer - it's possible to do this but they want to remove it for security reasons...

Martin: but wouldn't be able to access coi restricted APIs...

Yves: allowing this capability... you open up another process to avoid timing attacks...

Martin: i don't like these things without going through the negatives...

Yves: all efforts around restricting sharedarraybuffer... it's done by patching and patching and patching ...

Martin: they want to restore the ability to do cross-origin communication... if you isolate something then you still want to be able to communicate with it... but question on whether that should be allowed - the reasonable one is: if you COEP something it can't embed something... it would seem reasonable to loosen the restrictions on COEP if you isolate...

Dan: we could get them along to a call...

Martin: first thing to do might be ask something... In the explainer (though it's a lot) there is no discussion of the trade-off. They want to change the way cross-origin isolation works so...

Questions:

We're still struggling a little bit with the explainer here, but there are a few things you might add to help us a little.

What is the user need that this addresses? There is some mention of performance benefits in your writeup here (~40%! [citation-needed]), but that isn't really substantiated. Can you give us some examples of real-world scenarios in which this would be used and would provide material end-user benefit? In particular, why do these applications need to jointly use popups and this sort of arrangement? In "use cases" you list "App with cross-origin popup" and state "The user would like some of their compute heavy web apps to be faster" - but there is no discussion of why these applications must exist cross-origin.

What are the trade-offs taken for this? Or, what are the risks that we'd be taking on by allowing this loosening of cross-origin isolation? The explainer focuses exclusively on arguing for the proposal, but if we're going to make a reasoned decision, we need to understand the cost of the change. Are there any safeguards that might be needed to mitigate these risks?

Are there cases where applications might want to use SharedArrayBuffer, but are unable to employ a fallback method, such that enabling easier access to the capability would dramatically simplify the deployment/development of those applications?

We agree on the above comment and Martin posts it.

Comment by @martinthomson Oct 16, 2024 (See Github)

Hi @camillelamy, thanks for bringing this to us. We're still struggling a little bit with the explainer here, but there are a few things you might add to help us a little.

What is the user need that this addresses? There is some mention of performance benefits in your writeup here (~40%! [citation-needed]), but that isn't really substantiated. Can you give us some examples of real-world scenarios in which this would be used and would provide material end-user benefit? In particular, why do these applications need to jointly use popups and this sort of arrangement? In "use cases" you list "App with cross-origin popup" and state "The user would like some of their compute heavy web apps to be faster" - but there is no discussion of why these applications must exist cross-origin.
What are the trade-offs taken for this? Or, what are the risks that we'd be taking on by allowing this loosening of cross-origin isolation? The explainer focuses exclusively on arguing for the proposal, but if we're going to make a reasoned decision, we need to understand the cost of the change. Are there any safeguards that might be needed to mitigate these risks?
Are there cases where applications might want to use SharedArrayBuffer, but are unable to employ a fallback method, such that enabling easier access to the capability would dramatically simplify the deployment/development of those applications?

Comment by @Sora2455 Nov 21, 2024 (See Github)

As a web developer, I'm very interested in the "Prevent XS-leaks and Spectre without having to force all my 3rd-party iframes to use COEP" part of this proposal.

Comment by @camillelamy Apr 3, 2025 (See Github)

Hi @martinthomson,

A real world example I can give you is a compute heavy application like a spreadsheet, a video conference app or a video game that is looking to improve its performance by taking advantage of the concurrency offered by SharedArrayBuffers and WASM threads (concurrency does offer significant speed ups to compute heavy applications). At the same time those applications might use SSO authentication or payment flows that are popup based and they can't easily change that. Or they might embed third-party iframes (for example ads for monetization, or some enterprise setups that allow companies to customize the web app and add their own frames to the spreadsheet app). What we've found is that it's really hard to deploy crossOriginIsolation whenever a page interacts with third-party content, since the interaction might not work anymore or it might require a lot of work from third-party developers. And just dropping requirements on SharedArrayBuffers has security issues. So we want compute heavy applications to be able to use concurrency through SharedArrayBuffers to improve their performance, without requiring any work from the 3rd party content they interact with.
The trade-offs are around the process model allocation the browser vendors will use to back this. Basically, to make it work you need to process isolate the documents that have requested Document-Isolation-Policy, which requires the use of Out-Of-Process Iframes, whereas the original COOP+COEP model was designed to work with only page level isolation. In some cases, it might lead to increased memory consumption. That said, process isolation is not actually mandated by the spec, because we do not wish to specify the implementation of the process model for browser vendors. It is simply the suggestion of the authors that this is a safe way to implement the API. Not implementing it in that way would increase the risk of XS-Leaks attacks.
Yes, as part of the OriginTrial on Chrome, we have been working with several developers who find it much easier to gain access to SharedArrayBuffers using DocumentIsolationPolicy. This has been the case for Zoom, Gmail, Google Meet and Google Earth. Basically, large applications have a lot of cross-origin iframes, and it has been hard for the applications to get the iframes to deploy COEP. The interaction with SSO login flows through popups was also a blocker for a number of those websites.

Discussed Apr 21, 2025 (See Github)

Dan: It's Chromium high priority. 3 weeks ago we had a response from Camille to Martin with a real world example. This is about providing some safeguards to SharedArrayBuffer so that it doesn't expose applications to SPECTRE-type attacks... Use of SharedArrayBuffer was restricted because it was open to these attacks, this is about trying to open that up so we should proceed with caution. This is a real world security issue. I think this is what you were trying to channel, Martin?

Martin: Yeah

Yves: most use cases were SSO based and we suggested alternatives such as FedCM

Dan: Camille has provided an example that isn't SSO

Martin: Her example is SSO, the page has to show an SSO pop-up.

Marcos: so the attack is cross-origin SharedArrayBuffer?

Martin: I think it's about cluster allocation for the processes that are involved. isolate and credential-less says that...

none is the default value. It does not do anything.

isolate-and-credentialless impacts agent cluster allocation for the document. It also forces no-cors cross-origin requests to be sent without credentials for subresources embedded by the document (like COEP credentialless).

isolate-and-require-corp impacts agent cluster allocation for the document. Additionally, the document can only load subresources from the same origin, or subresources explicitly marked as loadable from another origin. For non-cors requests, the response should have an appropriate Cross-Origin-Resource-Policy header (like COEP require-corp).
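
The three values quoted above can be modelled with a small parser. This is a minimal sketch under stated assumptions: the function names are hypothetical, the real header parsing is defined by the specification, and this version simply ignores anything after a `;` and treats unrecognized values as the default `none`.

```typescript
// Sketch only: map a Document-Isolation-Policy header value to one of the
// three values discussed in the minutes. Names here are hypothetical.
type DocumentIsolationPolicy =
  | "none"
  | "isolate-and-credentialless"
  | "isolate-and-require-corp";

function parseDocumentIsolationPolicy(
  headerValue: string | null
): DocumentIsolationPolicy {
  if (headerValue === null) {
    return "none"; // no header: default value, does nothing
  }
  // Assumption: anything after ";" is a parameter and is ignored here.
  const token = headerValue.split(";")[0].trim();
  switch (token) {
    case "isolate-and-credentialless":
      return "isolate-and-credentialless";
    case "isolate-and-require-corp":
      return "isolate-and-require-corp";
    default:
      return "none"; // unknown values fall back to the default
  }
}

// Both non-default values affect agent cluster allocation for the document.
function affectsAgentClusterAllocation(policy: DocumentIsolationPolicy): boolean {
  return policy !== "none";
}
```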

Martin: This is going to require more work. I'm concerned that this is just setting it up so that the embedded content gets to access the content that has SharedArrayBuffer access.

Dan: I would suggest commenting today

Martin: I don't know what comment to put

Dan: [Working on text...]

Hi Camille - thanks for the above comments. We took another look at this in today's TAG breakout. We need to re-review and we want to be sure we can give it the attention it deserves as we're still concerned that this might just re-enable the SPECTRE-type vulnerabilities. If you have further detail, especially an analysis that shows that this doesn't re-open those vulnerabilities, or any information that could help demonstrate the feasibility of this approach, that would be helpful.

Feedback from other implementers would be helpful. We see that Anne has feedback but Mozilla hasn't provided anything substantive yet.

Dan: Anne says this could "harm the device independent nature of the web". What does that mean?

Martin: Process isolation on mobile processors is different from desktop... this forces browsers to hand over process allocation control to web sites. Kind of OK with top-level isolation ... but it's different with nested frames, etc... the web site could spin up an arbitrary number of processes...

Xiaocheng: we should not assume that there is only one page active on mobile;

Martin: yeah, but mobile devices might have fewer active windows

Xiaocheng: a smaller upper limit

Post the comment, keep this open for now.

Comment by @martinthomson Apr 24, 2025 (See Github)

Hi @camillelamy,

Thanks for the above information. We took another look at this in today's TAG breakout. We need to re-review (it has been a while) and we want to be sure we can give it the attention it deserves as we're still concerned that this might just re-enable the SPECTRE-type vulnerabilities. If you have further detail, especially an analysis that shows that this doesn't re-open those vulnerabilities, or any information that could help demonstrate the feasibility of this approach, that would be helpful.

Feedback from other implementers would be helpful. We see that Anne has feedback but Mozilla hasn't provided anything substantive yet.

Comment by @camillelamy Apr 24, 2025 (See Github)

Hi @martinthomson,

I am not sure what you mean by an analysis, but maybe I can try to explain it in a different way and see if it is clearer. Please do keep in mind that Spectre risks are strictly limited to resources inside the process of the attacker, since this is the key point on which the security model hinges.

Basically, Document-Isolation-Policy divides the documents of a particular origin into two groups, the ones that have asked for isolation and the ones that haven't. And then, it prevents documents in two different groups from having synchronous script access to each other. This allows a browser to put them in different processes (provided the browser supports Out-Of-Process-Iframes), because the communication between those documents is entirely asynchronous and so they can live in different processes. With OOPIF backing, this means that we can have a process that contains only documents of a particular origin that requested isolation. So there is no risk of a Spectre attack from those documents on other cross-origin documents.
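
The grouping described above can be sketched as a key made of the origin plus the isolation request. This is an illustration only: `DocumentInfo`, `isolationGroupKey` and `maySynchronouslyScript` are hypothetical names, and the sketch ignores other details of agent cluster allocation that a real browser has to handle.

```typescript
// Sketch only: documents are grouped by origin and by whether they asked for
// isolation. Only documents in the same group may script each other
// synchronously; everything else has to communicate asynchronously.
interface DocumentInfo {
  origin: string;              // e.g. "https://app.example"
  requestedIsolation: boolean; // true if the document sent a non-"none" policy
}

function isolationGroupKey(doc: DocumentInfo): string {
  return doc.origin + "|" + (doc.requestedIsolation ? "isolated" : "default");
}

function maySynchronouslyScript(a: DocumentInfo, b: DocumentInfo): boolean {
  return isolationGroupKey(a) === isolationGroupKey(b);
}
```

Because two same-origin documents with different isolation requests end up with different keys, a browser with Out-Of-Process-Iframes is free to place them in different processes.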

Now we still have the question of cross-origin subresources. To make it safe, DocumentIsolationPolicy also imposes constraints on subresource loads. Since all documents in the process are same-origin, and isolated, subresources fall into one of these buckets:

- same-origin subresources, which are fine to load in the context.
- cross-origin subresources loaded through CORS. It's fine to load them because CORS either allowlists the document origin to read their content (so no extra information is gained through Spectre) or they are requested without credentials (an attacker could already request the uncredentialled resource by itself).
- cross-origin subresources loaded without CORS. These are the subresources at risk. To protect them, we do exactly as with COEP. We have two modes: require-corp and credentialless. In require-corp mode, cross-origin resources are blocked from loading unless they have a CORP header (which is an opt-in to embedding). In credentialless mode, all non-CORS cross-origin requests are made without credentials, which makes them uninteresting to an attacker.
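
The three buckets above can be sketched as a decision function. This is a simplified illustration, not the specified algorithm: the type and function names are hypothetical, and `hasCorpOptIn` stands in for the full Cross-Origin-Resource-Policy check a browser would run on the response.

```typescript
// Sketch only: classify a subresource load for an isolated document.
type IsolationMode = "isolate-and-require-corp" | "isolate-and-credentialless";
type LoadDecision = "load" | "load-without-credentials" | "block";

interface SubresourceRequest {
  sameOrigin: boolean;   // resource origin matches the document origin
  usesCors: boolean;     // request is made in CORS mode
  hasCorpOptIn: boolean; // assumed: response carries a CORP header allowing the embed
}

function decideSubresourceLoad(
  mode: IsolationMode,
  request: SubresourceRequest
): LoadDecision {
  // Bucket 1: same-origin subresources are fine to load.
  if (request.sameOrigin) {
    return "load";
  }
  // Bucket 2: cross-origin via CORS; the CORS protocol already governs access.
  if (request.usesCors) {
    return "load";
  }
  // Bucket 3: cross-origin without CORS; behavior depends on the mode.
  if (mode === "isolate-and-require-corp") {
    return request.hasCorpOptIn ? "load" : "block";
  }
  return "load-without-credentials";
}
```

For example, a no-CORS cross-origin image without a CORP header is blocked under `isolate-and-require-corp` but loads without credentials under `isolate-and-credentialless`.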

Now, there is always an argument that resources could be personalized due to the user's network position, but that is an existing risk of the crossOriginIsolation model. To address it, Chrome has been looking at restrictions on private network access.

So overall, as long as the browser uses Out-Of-Process iframes to isolate the documents with DocumentIsolationPolicy, there should be no Spectre risks.

Comment by @martinthomson Apr 24, 2025 (See Github)

A small bit of feedback if I may. That response was very good. So good, that I think you should delete the entire explainer and replace it. The explainer is very long and convoluted, but this is both clear and concise. I understand that there are some important details in the explainer that you haven't repeated, so you'd need a bit more, but I'd risk losing detail if it provided the clarity you achieved here.

With that understanding (which I'm sure I had an approximation of six months ago, but lost), I think that my concerns might now be reduced to questions about spelling. For instance, why invent new labels for things when require-corp and credentialless are labels in COEP that apply directly in this design? That's relatively minor. (Anne's comments are still relevant, of course.)

Comment by @camillelamy Apr 25, 2025 (See Github)

Thanks for the feedback! I have added the explanation to the beginning of the explainer. Then if folks want more details, they can still read the rest of the explainer.

Regarding the naming, we wanted to convey the fact that the policy does more than COEP because it also puts your documents in a different agent cluster from the documents that don't have it. Hence the different name. We also thought that the mechanism of separating same-origin documents into different agent clusters is interesting (and something that some developers have requested in the past). So we thought that maybe we'll eventually extend Document-Isolation-Policy to do that with no impact on subresources and process allocation. And we would probably name such a mode isolate-something so maybe it would make sense to have all policies start with isolate.

Regarding Anne's comments, we addressed them in the request for position. As explained above, the security model works because of Out-Of-Process Iframes. In Chrome, we have them everywhere except on very low-end Android devices (devices with less than 2GB of RAM). So we have two options. One is to not give access to COI APIs on those devices, which is safe. This was our initial thinking, but Anne was concerned that this would break the device-agnostic nature of the web. So we went back and evaluated the risk of shipping DocumentIsolationPolicy without Out-Of-Process Iframes on low-end Android devices. Based on the fact that most of those devices do not have processors with speculative execution, that they are a small fraction of devices, and that the number of those devices is going down (because even super cheap new phones have more than 2GB of RAM nowadays), we think we should still go ahead with launching DocumentIsolationPolicy on low-end Android, even if we can't have OOPIF.

The way we see it, having an OOPIF-based policy for SharedArrayBuffers is a reasonable long term model for the web. It is safe, and also deployable without too many constraints for websites. So we think this is where we should end up. And in Chrome, we're nearly there except for that small portion of Android devices. So we could either completely block on that, or go ahead with the launch even if we can't make it fully secure for those low-end devices. Considering that the reality of SharedArrayBuffer usage is that most developers use the ungated SharedArrayBuffer reverse Origin Trial on Chrome desktop, and get access to SABs without crossOriginIsolation, we think the tradeoff to get them something they can use and finally end this ungated access is worth it.

Discussed May 12, 2025 (See Github)

(Martin) Propose that we close satisfied.

Martin: Camille put an excellent comment in the issue tracker - and put that in the explainer and it makes the whole thing more comprehensible. So great. I'm happy with where this is right now.

Dan: let me take a look at this after today's call. Martin, can you draft a comment? And then I'll respond to you about it. And then we can agree to close satisfied.

Martin: I'm concerned about the complexity here, it's getting to be a lot to understand.

Dan: [talks about speaking to teenagers at a school, who were concerned about the web becoming too complex]

I feel like we should also be talking about what could be simplified, or refactored. We often talk about not making breaking changes, not breaking existing content, but we have been able to deprecate some features like app cache.

Martin: I'd like to ask Camille to think about how to document this feature. The continual creep of complexity is our problem to worry about, as the TAG.

Dan: DeprecateCon