design-reviews#625: API for display-capturing the current tab

Opened

Apr 23, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of getCurrentBrowsingContextMedia.

Overview

Consider the existing navigator.mediaDevices.getDisplayMedia(). It allows a user unlimited choice of sources - any monitor, window or tabs.

We’re in the process of standardizing a new API - getViewportMedia - that will allow web-applications to present a simple confirmation-only prompt to the user. The security requirements of this API are under active discussion, but consensus is forming that both cross-origin isolation and a new opt-in header will be required.

Not all applications can accept these requirements - at least not in the short-term. However, by forcing such applications to use getDisplayMedia, the user is pushed towards the riskier option of sharing the entire monitor. Why is that the riskier option? Because at the moment capture starts, the entire current monitor includes the current tab. Note that the moment capture starts is sufficient for almost any attack, as all attacks we have thus far considered could be carried out using a single frame.

A hybrid API is deemed necessary in order to offer some of the benefits of getViewportMedia without its elevated security requirements. This hybrid API will allow the application to signal its preference for capturing the current tab by way of a new dictionary member parameter for getDisplayMedia. Namely, we will extend DisplayMediaStreamConstraints by adding another dictionary member called preferCurrentTab with a default value of false. When getDisplayMedia is invoked with preferCurrentTab=true, the browser will offer the current tab as the first option to the user, but will still offer unlimited choice of capture sources (see image below).

The unlimited choice of sources makes this new API compliant with the requirements of getDisplayMedia. Since it complies with the requirements of getDisplayMedia, the security requirements placed on getDisplayMedia are sufficient for this new hybrid API.
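As a rough illustration of the invocation described above, the following TypeScript sketch builds the options an application might pass. The `PreferCurrentTabOptions` type, the `CaptureFn` type and both function names are hypothetical helpers introduced here; only `getDisplayMedia` and the proposed `preferCurrentTab` member (default `false`) come from the proposal. The capture function is injected so the sketch does not depend on DOM typings that may not yet know the new member.

```typescript
// Hypothetical local option type. `preferCurrentTab` is the dictionary member
// proposed in this issue; standard DOM typings may not include it.
interface PreferCurrentTabOptions {
  video: boolean;
  audio?: boolean;
  preferCurrentTab?: boolean; // proposed default: false
}

// Whatever performs the capture. In a browser this would wrap
// navigator.mediaDevices.getDisplayMedia (assumption, not shown here).
type CaptureFn<S> = (options: PreferCurrentTabOptions) => Promise<S>;

// Build the options an application passes to signal its preference.
function buildCaptureOptions(preferCurrentTab: boolean = false): PreferCurrentTabOptions {
  return { video: true, preferCurrentTab };
}

// Request capture with the current tab offered first. The user may still pick
// any other source; if the call rejects (for example because the prompt was
// dismissed), this sketch resolves to null instead of throwing.
async function captureCurrentTabPreferred<S>(capture: CaptureFn<S>): Promise<S | null> {
  try {
    return await capture(buildCaptureOptions(true));
  } catch (_err) {
    return null;
  }
}
```

In a page, `capture` would presumably be something like `(o) => navigator.mediaDevices.getDisplayMedia(o)`, with a cast if the typings reject the extra member. The application cannot assume the user actually chose the current tab, since the full choice of sources remains available.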

[Image: Screen Shot 2021-06-03 at 23 40 58]

Links and Details

Explainer: bit.ly/3dJgLfS
Specification URL: https://eladalon1983.github.io/prefer-current-tab/
Security and Privacy self-review: TODO (I will edit this comment and add the link.)
Primary contacts (and their relationship to the specification):
- Elad Alon (@eladalon1983), Google
Organization(s)/project(s) driving the specification: Google
Key pieces of existing multi-stakeholder review or discussion of this specification: getViewportMedia and its security-requirements
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): Chrome Status entry

Further details:

I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: We aim to ship in Chrome m92 or m93.
The group where the work on this specification is currently being done: WebRTC WG works on getViewportMedia, but is not interested in this hybrid API.
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): WICG (I will link once this is in the WICG.)
Major unresolved issues with or opposition to this specification:
- Mozilla and Apple have voiced the opinion that getViewportMedia should be sufficient, and were not interested in a "weakened" version.
- Our position, on the contrary, is that this hybrid is necessary and does not degrade security when compared to getDisplayMedia.
This work is being funded by: Google

You should also know that...

A word of caution over a source of potential confusion:

The name getViewportMedia is a later conclusion. Initially, that API was offered under the name getCurrentBrowsingContextMedia. Chrome has an active origin-trial for getCurrentBrowsingContextMedia which accomplishes the same thing as preferCurrentTab, but uses a new method instead of a new dictionary member. See the explainer.

We'd prefer the TAG provide feedback as (please delete all but the desired option): 💬 leave review feedback as a comment in this issue and @-notify @eladalon1983

Discussions

Comment by @annevk Apr 26, 2021 (See Github)

Why does this use browsing context in its name? Does this survive navigations somehow?

cc @jan-ivar

Comment by @eladalon1983 Apr 26, 2021 (See Github)

The capture does not survive navigation - the capturing app is unloaded on navigation. I am open to renaming. Any thoughts on what could be a good name for this hybrid?

Comment by @annevk Apr 26, 2021 (See Github)

I'm not sure, but from the proposed UI this seems like an option (which would have a name related to viewport to stay consistent) you would pass to getDisplayMedia().

Comment by @eladalon1983 Apr 26, 2021 (See Github)

I did consider the option of an additional constraint to getDisplayMedia, but that becomes less convenient if getViewportMedia is ever extended to receive additional parameters that don't make sense for getDisplayMedia - something which I do plan. In that case, the hybrid gCBCM can be meaningfully extended to accept that parameter and apply it only if the user chooses the current tab. (I suggest we migrate this discussion to the WICG repo when that one is set up. I can @mention you when it's time, if you'd like - let me know.)

Comment by @jan-ivar May 13, 2021 (See Github)

Note: this is a Google-only request, not a successor to #609 which is the request from the WebRTC WG.

I've closed https://github.com/w3c/mediacapture-screen-share/pull/148#issuecomment-840636775 to avoid confusion, and requested #609 be reopened.

As the OP mentions, we are opposed to this "hybrid" API.

> Mozilla and Apple have voiced the opinion that getViewportMedia should be sufficient, and were not interested in a "weakened" version.

Comment by @jan-ivar May 13, 2021 (See Github)

> A hybrid API - getCurrentBrowsingContextMedia - is deemed necessary in order to offer some of the benefits of getViewportMedia without its elevated security requirements. This hybrid API will allow the application to signal its preference for capturing the current tab. The browser will then offer the current tab as the first option to the user, but will still offer unlimited choice of capture sources (see image below). The unlimited choice of sources makes this new API compliant with the requirements of getDisplayMedia.

An application signal does not alleviate the "elevated security requirements" if the application is malicious, it defeats them.

The getDisplayMedia API deters social engineering: "User Agents are encouraged to warn users against sharing browser display devices as well as monitor display devices where browser windows are visible, or otherwise try to discourage their selection on the basis that these represent a significantly higher risk when shared." ¹

Providing malicious applications with a method that does exactly what they need seems like a bad idea.

I also worry it would undermine adoption of getViewportMedia which requires sites to isolate to have this ability, specifically to mitigate this threat, which Chrome Security agrees is significant.

<sub>1. See the questionaire.md and subsequent links for details of these unobvious threats to the same-origin policy from sharing web surfaces under attacker control.</sub>

Comment by @dontcallmedom May 18, 2021 (See Github)

Since I was confused and created confusion in terms of the relationship with #609, I thought I would summarize what I understand about this particular design review (at the request of @LeaVerou and @kenchris, whom I was chatting with this morning):

- the proposal in this issue hasn't been discussed (let alone endorsed) by the WebRTC Working Group
- the proposal in this issue addresses similar needs as the ones identified for the getViewportMedia API (on which #609 focuses) but proposes a different solution
- the proposal in this issue is essentially the equivalent of getDisplayMedia, the API defined by the WebRTC Working Group, but with a specific hint to suggest that the current browser tab should be captured. As @jan-ivar commented, this probably reduces the effectiveness of the mitigation that getDisplayMedia() puts in place to avoid giving the calling page too much control over what is being captured

The motivation I understand behind the proposal in this issue is that getting the security model being developed for getViewportMedia deployed (it requires any embedded resources to adopt and deploy new HTTP headers) is likely to be very challenging. I'm mentioning this in case the TAG would like to chime in more generally on other approaches that might make it easier to deploy getViewportMedia.

Comment by @alvestrand May 18, 2021 (See Github)

The Chrome decision on the "need for elevated permission" for getDisplayMedia (which presents all the capture surfaces without calling out special considerations about their risks) was based on the understanding that the most common use cases would be displaying a tab or displaying the screen, so it did not make much sense to increase the cognitive overload by calling out cases that had lower risk than the common ones.

It is logical based on this standpoint that presenting the present tab as a capture option doesn't need any more elevated warning; the warning is already elevated.

Discussed May 31, 2021 (See Github)

Ken: Still needs explainer

Dan: need to resolve: is 625 a duplicate of 609?

Lea: understand it's not, but they're working together to consolidate them. Ken and I were in this call with Dom where he explained.

Ken: Dom said he wants to reopen a TAG review with updated materials, updated explainer and S&P. Don't need the TAG to pick. Wanted to determine whether either request was safe. That was two weeks ago.

Lea: still pending external feedback? What about the other one?

Dan: 625, also 14 days ago Dom sent something. Multiple arguments going on with different people in these discussions. TAG feedback should be for them to sort it out.

Lea: not the TAG's job to consolidate

Dan: right, shouldn't be two issues open for us. But got pushback. Posting on both issues.

Yves: another option to close both and open another one if there is a dispute they want feedback on.

Lea: last resort to use us to resolve dispute

Hadley: encourage them to work together

Dan: we need more guidance to figure out what we should be reviewing

Comment by @torgo Jun 1, 2021 (See Github)

So just to clarify - is there now going to be one consolidated proposal merging #609 and #625? If so, can we agree to close one of these issues and update the other with the consolidated and agreed proposal?

Comment by @eladalon1983 Jun 1, 2021 (See Github)

There is not going to be a consolidated proposal. (Btw, the current proposal - #625 - is going to be amended today/tomorrow, so if it's possible to hold off on reviewing it for 2 days, that'd be better.)

Comment by @torgo Jun 2, 2021 (See Github)

Hi @eladalon1983 can you please clarify this. It's highly unlikely that the TAG is going to endorse a single proposal when there are multiple competing proposals from different vendors and lack of consensus. Happy to postpone until our next design review week - which will be the 14th of June. Hoping we can have better news by then.

Comment by @eladalon1983 Jun 2, 2021 (See Github)

Glad to clarify. There are no competing proposals.

- #609 is a proposal for a capture-this-tab API. That API will be gated behind (a) Cross-Origin Isolation and (b) an opt-in header.
- #625 is a proposal for a second API that achieves something similar, but requires neither Cross-Origin Isolation nor an opt-in header.

I think the two proposals can be judged independently.
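To make the difference in gating concrete, here is a minimal TypeScript sketch of how an application might decide which of the two APIs it can use. Everything in it is an assumption for illustration: the `PageCapabilities` shape, the `chooseCaptureApi` name and the idea that a page can know whether the opt-in header is in effect are all hypothetical, and getViewportMedia's requirements were still under discussion at the time.

```typescript
// Labels for the two approaches discussed in #609 and #625.
type CaptureApi = "getViewportMedia" | "getDisplayMedia+preferCurrentTab";

// Hypothetical description of what the current page can offer.
interface PageCapabilities {
  crossOriginIsolated: boolean;       // e.g. read from the global `crossOriginIsolated` flag
  optInHeaderPresent: boolean;        // assumption: known to the app from its own deployment
  getViewportMediaAvailable: boolean; // assumption: result of a feature check
}

// #609's API is usable only when both requirements are met (and the browser
// implements it); otherwise fall back to the approach described in #625.
function chooseCaptureApi(page: PageCapabilities): CaptureApi {
  const meetsRequirements = page.crossOriginIsolated && page.optInHeaderPresent;
  return meetsRequirements && page.getViewportMediaAvailable
    ? "getViewportMedia"
    : "getDisplayMedia+preferCurrentTab";
}
```

A page that is not cross-origin isolated, for example, would always land on the second branch regardless of browser support.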

Comment by @eladalon1983 Jun 2, 2021 (See Github)

I've updated the original comment to reflect our change from a method-based API to a new-dictionary-member-based API.

Comment by @eladalon1983 Jun 4, 2021 (See Github)

Spec added. Could the labels be adjusted, @cynthia and @LeaVerou?

Discussed Sep 1, 2021 (See Github)

Dan: recent comment lays out a position from Google ... we have to consider both of these issues in parallel. Elad is saying that the thing they're proposing in 625 is a temporary measure, maybe we should be saying that's great okay but why won't that then become the de facto thing that people use? The web is full of short term measures that became the long term unfortunate thing.

Sangwhan: a lot of people are using this feature without realising it's Chrome only - it's not standardised

Dan: [shares tab using the API described in 625]

Sangwhan: a lot of people are using this - though it's only implemented in Chrome. I think the trade-off given the situation we're in is something we're going to have to live with.

Dan: Why don't we say we appreciate .. thank you for clarifying that #609 is the long term proposal, we will focus our efforts on reviewing #609. Can you provide a roadmap for how you see transitioning people from use of this API to what is described in 609 once the issues are resolved? E.g. would 609 be layered on top of 625? Based on that we could propose closing.

Dan: why is it called preferCurrentTab? comment left

Comment by @eladalon1983 Sep 1, 2021 (See Github)

The long-term path (getViewportMedia) has a standard consensus track, and that is what is tracked in TAG issue #609. But this solution has multiple complexities and non-trivial security aspects that we still need to iron out. Therefore:
- preferCurrentTab is a short-term measure that solves some use cases to some degree, and doesn't have the security problems associated with getViewportMedia.
- After months of discussion, there is no consensus on getViewportMedia with Mozilla, so Chrome gave up and shipped preferCurrentTab.
- We are still committed to getViewportMedia.

Comment by @jan-ivar Sep 1, 2021 (See Github)

> After months of discussion, there is no consensus on getViewportMedia with Mozilla, so Chrome gave up and shipped preferCurrentTab.

@eladalon1983 What is the disagreement on getViewportMedia?

> We are still committed to getViewportMedia.

I'm glad to hear this. Mozilla is eager to engage on this.

Comment by @torgo Sep 3, 2021 (See Github)

Thanks for the update @eladalon1983. We are going to review this at our "f2f" coming up on the 13th. I hope we can resolve and close the review by then.

Comment by @torgo Sep 14, 2021 (See Github)

Just discussed in our virtual f2f breakout. Thank you for clarifying that getViewportMedia is the long term proposal, we will focus our efforts on reviewing that. Can you provide a roadmap for how you see transitioning people from use of preferCurrentTab to getViewportMedia once the issues are resolved? The concern we have is that the web is full of technologies that were designed as short term stop gaps until a longer term thing could be worked out. We'd rather not see another one added to that list.

Comment by @eladalon1983 Sep 14, 2021 (See Github)

Once the security measures getViewportMedia requires are sufficiently rolled out, applications will naturally migrate from preferCurrentTab to getViewportMedia, because the latter offers a superior UX; namely, the user is presented with a clearer choice, and cannot choose anything but the current tab.

Chrome has UMA tracking calls to getDisplayMedia with/without preferCurrentTab (and the API invocation's result). getViewportMedia will be associated with similar UMA.

When we feel that adoption is sufficient, or that the challenges to it are no longer as significant, we can (a) communicate publicly that preferCurrentTab is about to be deprecated and (b) start printing deprecation warnings to the dev-console whenever it is used.
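The kind of measurement described above (calls with and without preferCurrentTab, plus each invocation's result) could be approximated on the application side as follows. This is a hypothetical TypeScript sketch, not Chrome's UMA: the `CaptureUsageTally` class, the `talliedCapture` wrapper and the key format are invented for illustration, and the capture function is injected rather than taken from the browser.

```typescript
type Outcome = "success" | "failure";

// Hypothetical local option type; `preferCurrentTab` is the proposed member.
interface CaptureOptions {
  video: boolean;
  preferCurrentTab?: boolean;
}

// Counts invocations by (with/without preferCurrentTab, outcome).
class CaptureUsageTally {
  private counts = new Map<string, number>();

  private key(withPreference: boolean, outcome: Outcome): string {
    return `${withPreference ? "with" : "without"}-preferCurrentTab:${outcome}`;
  }

  record(options: CaptureOptions, outcome: Outcome): void {
    const k = this.key(options.preferCurrentTab === true, outcome);
    this.counts.set(k, (this.counts.get(k) || 0) + 1);
  }

  count(withPreference: boolean, outcome: Outcome): number {
    return this.counts.get(this.key(withPreference, outcome)) || 0;
  }
}

// Wrap a capture call so its result is tallied; errors are re-thrown unchanged.
async function talliedCapture<S>(
  tally: CaptureUsageTally,
  capture: (options: CaptureOptions) => Promise<S>,
  options: CaptureOptions
): Promise<S> {
  try {
    const stream = await capture(options);
    tally.record(options, "success");
    return stream;
  } catch (err) {
    tally.record(options, "failure");
    throw err;
  }
}
```

Comparing the "with" counts against whatever a future getViewportMedia wrapper records would give a rough picture of when migration has progressed far enough to begin the deprecation steps mentioned above.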

Comment by @torgo Sep 16, 2021 (See Github)

Ok this sounds good. We still have concerns about interoperability and strongly encourage convergence on one consensus-based solution as you have laid out above.