design-reviews#645: Capture Handle Identity

Opened Jun 8, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of Capture Handle Identity.

The proposed capture handle is a mechanism that allows a web application to ergonomically and confidently identify the web application it is display-capturing, provided that the captured application has opted in. Such identification allows the two applications to collaborate in interesting ways. For example, if a video-conferencing (VC) application is capturing a presentation, the VC application can expose user controls for previous/next slide directly in its own UI. This lets the user navigate the presentation without having to jump between the VC and presentation tabs.
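A minimal TypeScript sketch of the capturing side of this example. It assumes the spec's `getCaptureHandle()` shape (an object with `handle` and an optional `origin`, or `null` when nothing is exposed); the allow-list and the origin `https://slides.example` are hypothetical, and the stubbed tracks only let the sketch run outside a browser.

```typescript
// Shape of the dictionary returned by MediaStreamTrack.getCaptureHandle()
// in the Capture Handle Identity spec. `origin` is only set when the
// captured page opted in with exposeOrigin: true.
interface CaptureHandle {
  origin?: string;
  handle: string;
}

// Minimal stand-in for the part of MediaStreamTrack used here, so that the
// sketch also runs outside a browser.
interface CaptureHandleSource {
  getCaptureHandle(): CaptureHandle | null;
}

// Hypothetical allow-list: presentation apps this VC app knows how to drive.
const KNOWN_PRESENTATION_ORIGINS: ReadonlySet<string> = new Set([
  "https://slides.example",
]);

// Decide whether the VC app should show previous/next-slide controls.
function shouldShowSlideControls(track: CaptureHandleSource): boolean {
  const captureHandle = track.getCaptureHandle();
  if (captureHandle === null) {
    // The captured page has not opted in (or has not permitted our origin).
    return false;
  }
  if (captureHandle.origin === undefined) {
    // Without an exposed origin we cannot confidently identify the app.
    return false;
  }
  return KNOWN_PRESENTATION_ORIGINS.has(captureHandle.origin);
}

// Stubbed tracks standing in for getDisplayMedia() results.
const slidesTrack: CaptureHandleSource = {
  getCaptureHandle: () => ({ origin: "https://slides.example", handle: "deck-42" }),
};
const optedOutTrack: CaptureHandleSource = { getCaptureHandle: () => null };

console.log(shouldShowSlideControls(slidesTrack)); // true
console.log(shouldShowSlideControls(optedOutTrack)); // false
```

In a browser, the track would be the video track returned by `getDisplayMedia()`, and the check would be re-run whenever the track fires the spec's `capturehandlechange` event, since the captured page may change its handle while being captured.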

Explainer: https://github.com/w3c/mediacapture-handle/blob/main/identity/README.md
Specification URL: https://w3c.github.io/mediacapture-handle/identity/
Tests: N/A
Security and Privacy self-review: https://github.com/w3c/mediacapture-handle/blob/main/identity/security-privacy-questionnaire.md
Primary contacts (and their relationship to the specification):
- Elad Alon (@eladalon1983), Google
Organization(s)/project(s) driving the specification: Google
Key pieces of existing multi-stakeholder review or discussion of this specification: w3c/mediacapture-handle#52
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/4854125411958784

Further details:

I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: Origin Trial has started in Chrome m92
The group where the work on this specification is currently being done: WebRTC WG
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): W3C under mediacapture-handle/identity/.
Major unresolved issues with or opposition to this specification: N/A (minor issues remain and are being discussed)
This work is being funded by: Google

We'd prefer the TAG provide feedback as: 💬 leave review feedback as a comment in this issue and @-notify @eladalon1983

Discussions

Discussed Jun 28, 2021

Amy: haven't had a chance to look; defer.

Peter: more people assigned to this?

Amy: ask in plenary?

Discussed Jul 12, 2021

Amy: capture handle lets the captured app communicate with the capturing app (share info with it), on top of the existing screen-share API.

Dan: What's the user need?

Amy: The explainer is a Google Doc with 4 use cases. Sharing in a video conference via another tab: being able to change slides without switching tabs. Another is about analytics; not convinced, possibly a surveillance thing. Detecting unintended captures: "you're about to screen-capture this, maybe not a good idea...". Then avoiding the hall of mirrors.

Dan: Why would you want to avoid the hall of mirrors?

Amy: would like to see abuse cases spelled out. Their S&P questionnaire answers are minimal, on the basis that the captured app is the one deciding what to share.

Dan: the captured app needs to opt in?

Amy: yes.

Dan: we could ask them to put it in Markdown and elaborate on the abuse cases...

Amy: it's been moved to WICG. Controversial issue: they've listed it in the request (problems with Mozilla and Apple), though that might be out of date.

Dan: where would it end up after WICG?

Amy: it started in WebRTC maybe?

Dan: Was there no consensus in the WebRTC WG on it?

Hadley: it would be useful to know more history.

Dan: left a comment

Amy: will follow up about abuse cases and multi-stakeholder support, and check for relations/dependencies with capturetab.

Comment by @torgo Jul 13, 2021

Hi @eladalon1983, thanks for this; we are reviewing this morning. Can I ask that you bring the explainer into the WICG repo at a stable URL? Also, can you let us know more of the background of why this is being done in WICG as opposed to the WebRTC WG? What's the trajectory for this after incubation in WICG? Will it be going to WebRTC?

Comment by @eladalon1983 Jul 13, 2021

Can I ask that you bring the explainer into the WICG repo at a stable URL?

It's now in the root of the WICG repo, here.

Also, can you let us know more of the background of why this is being done in WICG as opposed to the WebRTC WG?

Mozilla has been opposing this and cited two arguments:

That they are concerned that this will make getDisplayMedia "too attractive" and delay adoption of getViewportMedia. Note that:
1. Deprecating getDisplayMedia is not under discussion (and Chrome would oppose). So making getDisplayMedia "too attractive" is not relevant criticism of Capture Handle.
2. getViewportMedia has been under meandering discussion for close to a year now, and it's unclear when/if it will eventually be specified, let alone implemented.
Mozilla has brought in their own proposal to address some of the use-cases handled by Capture Handle, but not all, using "Slide Controls." I applaud their efforts and will be cheering them on, but since Capture Handle addresses additional use-cases, I do not see that as relevant opposition either.

Apple's position has been unclear. They have made proposals for changes to Capture Handle. When those proposals were adopted, Apple suggested reconsidering the entire approach. A definitive answer as to what approach would satisfy Apple has not yet been given.

What's the trajectory for this after incubation in WICG?

I believe it's time to put this in web developers' hands, show them how useful it is for them, and use their positive feedback to motivate progress in the WebRTC WG.

Comment by @torgo Jul 14, 2021

Hi Elad, we much prefer progress to be made in a multi-vendor way that emphasises consensus-building. We haven't been part of the discussion, but it sounds from what you've said that Mozilla is making an effort to achieve consensus. The role of a standards group is for people to work together to achieve consensus. We'd really like to encourage you to engage in the WebRTC group on a path that would be acceptable to all parties. We think doing an experiment with this as an origin trial could be a really good way to gauge developer feedback. However, we caution against moving beyond that stage without bringing the results of that research back into the wider discussion and trying to achieve that consensus approach. In the meantime, we are going to close this review as 'too early', and we hope to review a future API that represents the outcome of multi-stakeholder consensus in the WebRTC working group.

Comment by @eladalon1983 Jul 14, 2021

Hi Daniel. We are in ongoing discussions with other vendors. Origin Trial is indeed planned (Chrome m92-m94). I'll be back after the trial is further along and more developer feedback is available.

Comment by @eladalon1983 Jul 27, 2021

Just to further clarify, we are indeed working with the WebRTC WG to discuss and build consensus on the right API. Mozilla's proposed API fails to cover the majority of use-cases Capture Handle tackles, and has some important deficiencies even when the use-cases do overlap. We are therefore proceeding with the origin trial, which has garnered interest from multiple sources ([1], [2]). The communication lines with Mozilla are open, though, and I'm happy to keep discussing.

Comment by @eladalon1983 Jan 21, 2022

We have continued our engagement with Mozilla and Apple. There's agreement that there are two different scenarios:

1. The captured and capturing applications have a close relationship set up. The captured application wishes to expose its Identity to the capturing application. The capturing application can use this to establish communications with the captured application using a mechanism of their choosing (e.g. shared cloud infrastructure). The original proposal addresses this scenario. We call this Capture Handle Identity.
2. The captured and capturing applications do not know each other. We intend to come up with another mechanism, broadly outlined in this comment, to address that case. To summarize that comment, the captured application will declare capabilities, and the capturing application will send actions. For example, a captured YouTube tab could declare that it supports prev/next, and a capturing VC application that has never heard of YouTube could nevertheless read this declaration and expose user-facing controls for prev/next, sending a prev/next Action to the captured application. (Note that in this scenario, the VC application does not know YouTube, but the user does.) The mechanism here is still being discussed, but there's rough agreement. We'll call this Capture Handle Actions.
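In the Identity scenario (the first one above), the handle is an opaque string, so how two partner applications use it is up to them. The TypeScript sketch below assumes a hypothetical convention in which the handle is JSON carrying a session ID for their shared backend. The config fields (`exposeOrigin`, `handle`, `permittedOrigins`) follow the spec's `CaptureHandleConfig`; `APP_ID`, the JSON shape, and the origins are illustrative only.

```typescript
// Dictionary accepted by navigator.mediaDevices.setCaptureHandleConfig()
// in the Capture Handle Identity spec (only the fields used here).
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// Hypothetical app-level convention agreed by the two applications: the
// handle is JSON naming the app and a session ID on their shared backend.
interface SessionHandle {
  app: string;
  sessionId: string;
}

const APP_ID = "slides-example"; // hypothetical identifier

// Captured side: build the config it would pass to setCaptureHandleConfig().
function buildCaptureHandleConfig(sessionId: string, vcOrigin: string): CaptureHandleConfig {
  const payload: SessionHandle = { app: APP_ID, sessionId };
  return {
    exposeOrigin: true,
    handle: JSON.stringify(payload),
    permittedOrigins: [vcOrigin], // only the partner VC app may read it
  };
}

// Capturing side: recover the session ID from getCaptureHandle().handle,
// or return null if the handle does not follow the agreed convention.
function parseSessionId(handle: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(handle);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const candidate = parsed as Partial<SessionHandle>;
  if (candidate.app !== APP_ID || typeof candidate.sessionId !== "string") {
    return null;
  }
  return candidate.sessionId;
}

const config = buildCaptureHandleConfig("abc123", "https://vc.example");
console.log(parseSessionId(config.handle ?? "")); // abc123
```

Restricting `permittedOrigins` to the partner's origin keeps the session ID from being readable by arbitrary capturers.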

The resolution of the Capture Handle discussions during the last WebRTC WG Interim was:

RESOLUTION: We start a call for adoption of Capture Handle by the WG

I am therefore asking for a TAG review of Capture Handle Identity - see original comment in this thread. :-)

Comment by @cynthia Mar 8, 2022

Reopened to revisit during F2F. (So that it gets captured by the scheduling tool, details will come from @torgo later.)

Comment by @eladalon1983 Mar 8, 2022

Please note that I have edited the issue so that its links point to the new repo location. Also, that repo now hosts two documents for two APIs (Capture Handle Identity and Capture Handle Actions). This request for TAG review is only for the Identity part.

Comment by @cynthia Mar 10, 2022

First pass, not an official review. On an initial skim, nothing serious really sticks out. Some comments:

The capturee-facing API is pretty much a manifest, so it would be natural to have a complementary declarative path. Not only would this let noscript applications work, but the imperative-only path also exposes the capturee to an unwanted-capture timing attack if capture happens between page load start and the first setCaptureHandle() call.

The asymmetric nature seems strange to me: why shouldn't the capturee know that the capturer has started capturing it? This would be a rather wild paradigm shift, but could unlock some interesting use cases. (And obviously, abuse patterns.) Has a tradeoff discussion happened in the WG?

CaptureHandleConfig.permittedOrigins seems rather simplistic. An exact mechanism doesn't immediately come to mind, but wouldn't wildcard patterns such as *.capturer.com be something our users would expect to work?
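To make the wildcard idea concrete, here is a TypeScript sketch of how a hypothetical `permittedOrigins` entry such as `https://*.capturer.com` could be matched against a capturer's origin. Wildcard entries are not part of the spec, which accepts exact origins or a lone `"*"`; the matching rules below (same scheme and port, strict subdomain only) are assumptions chosen for illustration.

```typescript
// Hypothetical extension: a permittedOrigins entry such as
// "https://*.capturer.com" matches any subdomain of capturer.com served over
// https on the same port. The current spec only accepts exact origins or "*".
function originMatches(entry: string, origin: string): boolean {
  if (entry === "*") return true;
  const target = new URL(origin);
  const wildcard = /^([a-z][a-z0-9+.-]*):\/\/\*\.(.+)$/i.exec(entry);
  if (wildcard === null) {
    // Exact match, as today: compare serialized origins.
    return new URL(entry).origin === target.origin;
  }
  // Parse the part after "*." so host case and default ports are normalized.
  const base = new URL(`${wildcard[1]}://${wildcard[2]}`);
  return (
    target.protocol === base.protocol &&
    target.port === base.port &&
    // Require a dot before the suffix so "evilcapturer.com" does not match.
    target.hostname.endsWith(`.${base.hostname}`)
  );
}

console.log(originMatches("https://*.capturer.com", "https://app.capturer.com")); // true
console.log(originMatches("https://*.capturer.com", "https://capturer.com")); // false
console.log(originMatches("https://*.capturer.com", "https://evilcapturer.com")); // false
```

Whether the bare `capturer.com` should also match is a design choice; this sketch excludes it, so a site wanting both would list two entries.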

Comment by @eladalon1983 Mar 10, 2022

First pass, not an official review. On an initial skim, nothing serious really sticks out. Some comments:

Early pass much appreciated - thanks!

The capturee-facing API is pretty much a manifest, so it would be natural to have a complementary declarative path. Not only would this let noscript applications work, but the imperative-only path also exposes the capturee to an unwanted-capture timing attack if capture happens between page load start and the first setCaptureHandle() call.

Sorry, I did not fully follow. Could you please clarify this for me?

The asymmetric nature seems strange to me: why shouldn't the capturee know that the capturer has started capturing it? This would be a rather wild paradigm shift, but could unlock some interesting use cases. (And obviously, abuse patterns.) Has a tradeoff discussion happened in the WG?

It's been discussed in the WG in the sense that some people initially misunderstood my proposal of Capture Handle as proposing the inverse direction, and were alarmed at the abuse potential (which you have acknowledged).

I think it's possible to follow up in the WG with a proposal for a separate mechanism to allow a capturer to opt in to alerting the capturee. Without opt-in, this would not be desirable, as capturees would be able to censor themselves when captured, perhaps even throwing up ads for some video-conferencing application of their choosing. This would push users away from the Web platform and towards native apps.

At any rate, it seems to me like an orthogonal API to the currently proposed mechanisms, even if we end up exposing them adjacently. Or wdyt?

CaptureHandleConfig.permittedOrigins seems rather simplistic. An exact mechanism doesn't immediately come to mind, but wouldn't wildcard patterns such as *.capturer.com be something our users would expect to work?

I am open to extending the spec later if there is consensus. Wdyt?

Comment by @cynthia Mar 10, 2022

Sorry, I did not fully follow. Could you please clarify this for me?

Something like an HTTP header or permission policy to set the preference, not just via an API.

I am open to extending the spec later if there is consensus. Wdyt?

I think it's worth bringing it up with the group. Something like the URL Pattern API might do the trick, but the downside of that is that you wouldn't be able to make it declarative.

The second point ("am I being captured") will need a group discussion; if there are WG meeting minutes on this topic, that would be helpful.

One extra nit: the 1024-character limit, if it is to be normative, should be in the validation logic. https://w3c.github.io/mediacapture-handle/identity/#dom-mediadevices-setcapturehandleconfig
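A TypeScript sketch of where a normative 1024-character check would sit among the input checks for a capture-handle config. It returns a list of problems instead of throwing, to keep it simple; the spec's `setCaptureHandleConfig()` reports failures by throwing, and the exact error types and the `permittedOrigins` checks shown here are assumptions for illustration. Length is measured in UTF-16 code units, which is what JavaScript's `String.prototype.length` counts.

```typescript
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

const MAX_HANDLE_LENGTH = 1024;

// Serialized origin of `entry`, or null if it does not parse as a URL.
function originOf(entry: string): string | null {
  try {
    return new URL(entry).origin;
  } catch {
    return null;
  }
}

// Collects problems with a config rather than throwing on the first one.
function validateCaptureHandleConfig(config: CaptureHandleConfig): string[] {
  const problems: string[] = [];
  const handle = config.handle ?? "";
  // String.prototype.length counts UTF-16 code units.
  if (handle.length > MAX_HANDLE_LENGTH) {
    problems.push(`handle has ${handle.length} code units; the limit is ${MAX_HANDLE_LENGTH}`);
  }
  const origins = config.permittedOrigins ?? [];
  if (origins.includes("*") && origins.length > 1) {
    problems.push('"*" must be the only entry in permittedOrigins');
  }
  for (const entry of origins) {
    if (entry === "*") continue;
    const origin = originOf(entry);
    if (origin === null) {
      problems.push(`not a parsable URL: ${entry}`);
    } else if (origin === "null") {
      // e.g. data: URLs have an opaque origin that cannot be matched.
      problems.push(`opaque origin: ${entry}`);
    }
  }
  return problems;
}

console.log(validateCaptureHandleConfig({ handle: "x".repeat(1025) }).length); // 1
```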

Comment by @eladalon1983 Mar 10, 2022

Sorry, I did not fully follow. Could you please clarify this for me?

Something like an HTTP header or permission policy to set the preference, not just via an API.

The capture handle might be determined during application runtime, for example when the user logs in, loads a deck, or makes some other decision that affects which "session" the application considers this to be. Setting the capture handle late, or even changing it during application runtime, is therefore something we would like to support. So the imperative method is a must, and a declarative method is an optional one that could address a subset of use-cases. Would the declarative and imperative modes "play nicely" together? I'd be concerned that it could be slightly surprising when the imperative method overrides the declarative one, or if the imperative mode fails on account of the declarative one. Would love to hear more of your thoughts here.
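To illustrate why the imperative path is needed, here is a TypeScript sketch of a hypothetical presentation app that sets its handle each time the user opens a deck. The deck IDs and the `deck:` prefix are illustrative; the setter is injected so the code runs outside a browser, whereas in a browser it would wrap `navigator.mediaDevices.setCaptureHandleConfig()`.

```typescript
interface CaptureHandleConfig {
  exposeOrigin?: boolean;
  handle?: string;
  permittedOrigins?: string[];
}

// In a browser this would be
// (config) => navigator.mediaDevices.setCaptureHandleConfig(config);
// it is injected here so the sketch runs anywhere.
type SetCaptureHandleConfig = (config: CaptureHandleConfig) => void;

// Hypothetical presentation app whose handle depends on which deck the user
// opens, so it can only be known (and must be updated) at runtime.
class PresentationApp {
  private readonly setConfig: SetCaptureHandleConfig;

  constructor(setConfig: SetCaptureHandleConfig) {
    this.setConfig = setConfig;
  }

  openDeck(deckId: string): void {
    this.setConfig({
      exposeOrigin: true,
      handle: `deck:${deckId}`,
      permittedOrigins: ["*"], // any capturer may read this handle
    });
  }
}

const calls: CaptureHandleConfig[] = [];
const app = new PresentationApp((config) => calls.push(config));
app.openDeck("q3-review");
app.openDeck("roadmap");
console.log(calls.map((c) => c.handle).join(", ")); // deck:q3-review, deck:roadmap
```

How a declarative default would interact with such calls (for example, whether the first imperative call silently replaces it) is the open question raised in the comment above; the sketch does not model it.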

I am open to extending the spec later if there is consensus. Wdyt?

I think it's worth bringing it up with the group. Something like the URL Pattern API might do the trick, but the downside of that is that you wouldn't be able to make it declarative.

I see it as a natural later enhancement. Or are you concerned that it would not be backwards-compatible if we add it later?

The second point ("am I being captured") will need a group discussion; if there are WG meeting minutes on this topic, that would be helpful.

I think that's an adjacent but distinct API. I am not presently planning on proposing it, as it does not address the use-cases that concern me. But if someone else proposes it, I think both APIs will not interfere with each other. Or wdyt?

One extra nit: the 1024-character limit, if it is to be normative, should be in the validation logic. https://w3c.github.io/mediacapture-handle/identity/#dom-mediadevices-setcapturehandleconfig

Thanks for catching. I'll send a PR for that. (Sent)

Comment by @cynthia Mar 23, 2022

Discussed in the London hybrid F2F. Thanks @dontcallmedom and @eladalon1983 for joining our call and disambiguating the rationale and scope of this proposal.

Aside from the extensibility concerns (which we'd like the group to look into, as wildcards and URL patterns seem like a valid use case) we discussed in the meeting, we are happy to see this move forward. Thanks for bringing this to our attention.

Comment by @eladalon1983 Mar 23, 2022

Thank you for reviewing!

To summarize the discussion about patterns, we've mentioned how this could later be achieved by multiple means, one of which is changing permittedOrigins from sequence<DOMString> to sequence<DOMStringOrPattern> or something similar. (Other options cited included the less preferable option of exposing permittedPatterns alongside permittedOrigins.)
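The TypeScript sketch below illustrates, under loudly hypothetical names, why widening `permittedOrigins` to a union type could stay backwards-compatible: existing arrays of strings keep their meaning, and only new pattern objects opt into subdomain matching. `OriginPattern` and its `scheme`/`hostSuffix` fields are invented for illustration and do not come from the spec or any proposal text.

```typescript
// Hypothetical: one way sequence<DOMStringOrPattern> could surface in
// TypeScript. Neither OriginPattern nor its fields exist in the spec.
interface OriginPattern {
  scheme: string; // e.g. "https"
  hostSuffix: string; // e.g. "capturer.com" matches app.capturer.com
}

type PermittedOriginEntry = string | OriginPattern;

function isPermitted(entries: PermittedOriginEntry[], origin: string): boolean {
  const target = new URL(origin);
  return entries.some((entry) => {
    if (typeof entry === "string") {
      // Existing entries keep today's meaning: "*" or an exact origin.
      return entry === "*" || new URL(entry).origin === target.origin;
    }
    // New pattern objects: same scheme, default port, subdomain of hostSuffix.
    return (
      target.protocol === `${entry.scheme}:` &&
      target.port === "" &&
      target.hostname.endsWith(`.${entry.hostSuffix}`)
    );
  });
}

// Configs written against today's sequence<DOMString> still type-check.
const legacy: PermittedOriginEntry[] = ["https://vc.example"];
const extended: PermittedOriginEntry[] = [
  "https://vc.example",
  { scheme: "https", hostSuffix: "capturer.com" },
];
console.log(isPermitted(legacy, "https://app.capturer.com")); // false
console.log(isPermitted(extended, "https://app.capturer.com")); // true
```

Compared with a separate `permittedPatterns` member, a single union-typed list keeps all access rules in one place, which is one reason that option may be preferable.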