#962: Captured Surface Control


Opened Jun 4, 2024

Uryyb GNT! V nz n ohqqvat pelcgbtencul rkcreg.

I'm requesting a TAG review of Captured Surface Control.

Summary

We introduce a new Web API that allows Web applications to:

  1. Read and write the zoom level of a captured display surface (tab or window).
  2. Produce wheel events in a captured tab or window.
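
As a rough illustration of the two capabilities listed above, the following TypeScript sketch models them against an in-memory stand-in for a captured surface. All names in it (`CapturedSurfaceModel`, `SurfaceControllerSketch`, `getZoomLevel`, `setZoomLevel`, `sendWheel`) and the notion of a fixed list of supported zoom levels are assumptions made for illustration only; the actual interface is whatever the specification defines and may differ.

```typescript
// Hypothetical in-memory model of a captured tab or window.
// This is NOT the specification's interface; it only illustrates the idea.
interface CapturedSurfaceModel {
  zoomLevel: number; // current zoom, in percent
  scrollTop: number; // vertical scroll offset, in CSS pixels
}

// Hypothetical controller held by the capturing application.
class SurfaceControllerSketch {
  private surface: CapturedSurfaceModel;
  private supportedZoomLevels: number[];

  constructor(surface: CapturedSurfaceModel, supportedZoomLevels: number[]) {
    this.surface = surface;
    this.supportedZoomLevels = supportedZoomLevels;
  }

  // Capability 1 (read): report the captured surface's zoom level.
  getZoomLevel(): number {
    return this.surface.zoomLevel;
  }

  // Capability 1 (write): change the zoom level; unsupported values are rejected.
  setZoomLevel(level: number): void {
    if (this.supportedZoomLevels.indexOf(level) === -1) {
      throw new RangeError("Unsupported zoom level: " + level);
    }
    this.surface.zoomLevel = level;
  }

  // Capability 2: produce a wheel event in the captured surface.
  // Modeled here as a vertical scroll that cannot go above the top of the page.
  sendWheel(deltaY: number): void {
    this.surface.scrollTop = Math.max(0, this.surface.scrollTop + deltaY);
  }
}
```

In this model, a capturing application would construct the controller once per capture session and call `setZoomLevel` or `sendWheel` in response to the user's interaction with its own preview of the captured surface.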

Details

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Screen Capture Community Group and WebRTC Working Group
  • The group where standardization of this work is intended to be done: WebRTC Working Group
  • Existing major pieces of multi-implementer review or discussion of this design: https://www.w3.org/2024/05/21-webrtc-minutes.html
  • Major unresolved issues with or opposition to this design: N/A
  • This work is being funded by: Google

Discussions

Discussed Aug 26, 2024 (See Github)

Martin: a bit of dispute... an issue raised... https://github.com/w3c/mediacapture-handle/issues/11

... 2 concerns - one is that this invents a new cross-origin communication mechanism. Maybe we should be using post message. Other concern: when you capture something and screen share it, the messages that come through from the other end.. e.g. requesting that page scroll or zoom - the design seems to make this indistinguishable from a regular input. Makes a machine available for remote control. This is at tension with user agency. It should be possible without the knowledge of the application, but heightened security...

Dan: abuse cases... we need to take seriously.

Martin: ... I'll take it, but we also should say "please talk to the wg"...

Comment by @martinthomson Sep 3, 2024 (See Github)

@jyasskin, @hober, and I discussed this today.

Thank you for bringing this to us. We think this seems like a generally useful feature, but we have some questions and suggestions for the explainer:

The explainer should discuss the alternative design of having the page cooperate, and accept dedicated events from the capturing process. We think there are both upsides and downsides to that option that deserve exploration.

The two interactions that are considered are scrolling and zooming. Is that list exhaustive? Are these uniformly safe to do? Are there not occasions where scrolling results in changes to things like form elements? That could require a change of focus before sending the events in, maybe, though with precise X and Y on events, that might still engage the element that is targeted. We're inferring that this is limited to those two actions because "spoofing" those events is safe, but the explainer doesn't give enough details to show that that's true.

There seems to be some heightened permissions UX being contemplated here. It's not clear to us what would be different from a regular screen capture. It would be helpful if the explainer could show a proof of concept that highlights those differences.

Discussed Sep 9, 2024 (See Github)

[We sent them a question that hasn't been answered yet, so skipping this week.]

Discussed Oct 7, 2024 (See Github)

Waiting on proponents.

Discussed Oct 14, 2024 (See Github)

Still waiting on proponents to respond/etc. Time to ask monseigneur Yasskin to apply the obligatory amount of force to colleagues.

Dan: pings Jeffrey to ping colleagues; let's revisit at the plenary.

Comment by @eladalon1983 Oct 16, 2024 (See Github)

Apologies for taking some time here. I'll respond soon.

Comment by @eladalon1983 Oct 22, 2024 (See Github)

We think this seems like a generally useful feature

That's great to hear!

The explainer should discuss the alternative design of having the page cooperate, and accept dedicated events from the capturing process. We think there are both upsides and downsides to that option that deserve exploration.

I have now added a discussion of select alternatives to the explainer.

The two interactions that are considered are scrolling and zooming. Is that list exhaustive?

For the time being - yes.

Apple's representative, Youenn, suggested adding pinch. No Web developers have requested this feature, so we are leaving this as a potential extension. But note that the current API shape does not prevent such future extensions.

  • We could, in the future, define forwardPinch(element).
  • We could, in the future, transition to forwardGestures(element, gestures), where the second argument is a dictionary of relevant gestures.
  • Other API shapes would be possible.
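
To make the point about extensibility concrete, here is a minimal TypeScript sketch of how the per-gesture shape and the dictionary shape listed above could coexist. It is a model only: `GestureForwarderSketch`, the string element ids, the `isForwarded` helper, and the `forwardWheel` counterpart to `forwardPinch` are assumptions introduced for illustration, and nothing here is taken from the specification.

```typescript
// Hypothetical names throughout; this models the API shapes discussed above.
type Gesture = "wheel" | "pinch";

// Dictionary of relevant gestures, as in forwardGestures(element, gestures).
interface GestureDictionary {
  wheel?: boolean;
  pinch?: boolean;
}

class GestureForwarderSketch {
  // Gestures currently forwarded, keyed by a hypothetical element id.
  private forwarded = new Map<string, Set<Gesture>>();

  // Generalized shape: one call, with a dictionary selecting the gestures.
  // Gestures set to true are added to those already forwarded for the element.
  forwardGestures(elementId: string, gestures: GestureDictionary): void {
    const selected = this.forwarded.get(elementId) ?? new Set<Gesture>();
    if (gestures.wheel) selected.add("wheel");
    if (gestures.pinch) selected.add("pinch");
    this.forwarded.set(elementId, selected);
  }

  // Per-gesture shapes expressed in terms of the generalized one.
  forwardWheel(elementId: string): void {
    this.forwardGestures(elementId, { wheel: true });
  }

  forwardPinch(elementId: string): void {
    this.forwardGestures(elementId, { pinch: true });
  }

  // Helper for inspecting the model's state.
  isForwarded(elementId: string, gesture: Gesture): boolean {
    return this.forwarded.get(elementId)?.has(gesture) ?? false;
  }
}
```

Under these assumptions, a per-gesture method is just a special case of the dictionary form, which is one way to see why adding pinch later would not require changing the existing shape.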

Note: We intentionally exclude any interaction like clicking, delivering keystrokes, etc. We have no plans of ever extending the API to cover such gestures.

Are these uniformly safe to do? Are there not occasions where scrolling results in changes to things like form elements?

Web applications can attach any meaning to any user action, and that property is desirable and necessary to retain - the user expects scrolling to work identically when delivered from the capturing application; always, not just when it's a simple scroll. A concrete example is Google Maps, where scrolling changes the region of the map being displayed, triggering the fetching of new assets, etc. Or think how Apple's main page often uses fancy animations of laptops opening and closing when scrolling.

We believe that this risk is sufficiently mitigated by (1) the pre-existing safeguards associated with screen-sharing to begin with, (2) the additional permission prompt involved, and (3) the steps taken to ensure that only the user's immediate interaction with the capturing application can trigger scroll-forwarding to the captured application.

That could require a change of focus before sending the events in

Mandating change of focus could break the experience for the user and subvert their expectation that a scroll delivered on the capturing application's preview tile would elicit the exact same behavior on the captured surface as though it were delivered directly there.

There seems to be some heightened permissions UX being contemplated here. It's not clear to us what would be different from a regular screen capture. It would be helpful if the explainer could show a proof of concept that highlights those differences.

When users are currently asked to grant permission to capture a tab/window/screen, they are used to a specific interpretation. Before elevating this permission to something new - capture plus scroll plus zoom - an additional prompt is required. User agents are free to infer this heightened permission using any heuristic, and may change that based on how user expectations evolve over time. For the time being, Chrome intends to use a run of the mill permission prompt, and to use some extra UX to clarify to the user that this permission is active. This is neither mandated by the spec, nor do we guarantee that Chrome will retain this particular UX.

Discussed Oct 28, 2024 (See Github)

Dan: We have a response from Elad

Martin: ... it's fine ... the discussion of alternatives I think we covered off well ... risk mitigation is a bit soft ... maybe should push more on that.

Dan: security & privacy section is small and does not talk about abuse cases

Matthew: looks like this would work OK if someone viewing the share is already zoomed in - but this is about controlling the remote viewport. So this could improve a11y in that sense. But... is there any provision for the people on the call being able to control?

Martin: unclear - I think it's only local playback. No protocol that exists ...

Matthew: it's just "you are the person sharing this tab and it allows you to control scrolling or zoom from there"

Martin: yes...

Matthew: from a cursory look - looks fine - but in something like this that has profound UX implications we should be asking for an a11y considerations section in the explainer. But they need to demonstrate that they thought about it.

Dan: Should we recommend that they get an a11y review from APA?

Yves: also an attack surface analysis.

Matthew: maybe something that could be good for a11y.

Dan: a11y checklist

Martin: ... not sure it's enough

Matthew: if you're sharing a web site...

Martin: ways in which different browsers render things differently...

Matthew: in some cases like collaborative editing ...

Martin: a lot you miss with the screen shot... in some cases that's a feature.

Matthew: re: checklist - APA does theirs as a questionnaire that allows you to open an issue somewhere else - whereas ours opens an issue in our repo.

Thank you for your reply and all the info in the Explainer. We discussed this on our breakout today. We still feel the explainer needs more information on possible abuse cases and a bit more discussion of attack surface. The security considerations section talks about potential confusion, but doesn't talk about how the API could be abused by bad actors. So we recommend a security analysis (and there is a W3C process spinning up for this) but in the meantime if you could bolster the current security considerations doc with some discussion of abuse cases and mitigations that would be great. As there's a lot going on UI-wise here, we'd really like to see an 'Accessibility considerations' section in the Explainer (it's totally fine to use this section to show what the positives are) - please could you add one? Please also consider requesting a review from the APA WG: https://github.com/w3c/a11y-request/issues/new/choose

we agree to post this

Comment by @matatk Oct 30, 2024 (See Github)

Thank you for your reply and all the info in the Explainer. We discussed this on our breakout today. We still feel the explainer needs more information on possible abuse cases and a bit more discussion of attack surface. The security considerations section talks about potential confusion, but doesn't talk about how the API could be abused by bad actors. So we recommend a security analysis (and there is a W3C process spinning up for this) but in the meantime if you could bolster the current security considerations doc with some discussion of abuse cases and mitigations that would be great. As there's a lot going on UI-wise here, we'd really like to see an 'Accessibility considerations' section in the Explainer (it's totally fine to use this section to show what the positives are) - please could you add one? Please also consider requesting a review from the APA WG: https://github.com/w3c/a11y-request/issues/new/choose

Comment by @eladalon1983 Oct 30, 2024 (See Github)

We still feel the explainer needs more information on possible abuse cases and a bit more discussion of attack surface.

I have now added a "Security and Privacy Considerations" section in the explainer. It simply links to the corresponding section in the spec, where this information actually lives, so as to avoid duplication.

but in the mean time if you could bolster the current security considerations doc [Emphasis mine - Elad.]

Do I understand correctly, that you are asking for the information already in the spec (this section) to be replicated in questionnaire.md? I think it would be better to go with linking; maybe from section 2.18 to the spec's "Security and Privacy Considerations" section. Wdyt?

As there's a lot going on UI-wise here

Could you please clarify which UI changes you are referring to? As far as I can tell, this spec does not deal with anything UX-related. Although bespoke user agent UX associated with these APIs is possible, this is completely up to the UA's discretion; a spec-compliant implementation is possible even without any additional user agent UX.

To clarify, this mock is of the Web application's possible UX, not the user agent's.

Comment by @jyasskin Oct 30, 2024 (See Github)

FWIW, I don't think you should duplicate any information into https://github.com/screen-share/captured-surface-control/blob/main/questionnaire.md. Instead, questionnaire.md should include links to the places in the specification that answer the questions. We should improve the questionnaire and template to say that. I wasn't in the relevant breakout, so I don't want to comment on the other questions.

Discussed Nov 4, 2024 (See Github)

Jeffrey: I think this is probably ready to close as either satisfied or satisfied-with-concerns, but it's assigned to Martin so should probably wait until he's back.

Peter: Assigning it to the Nov 18 milestone.

Matthew: I'll also reply to the accessibility question.

Discussed Nov 18, 2024 (See Github)

Matthew: if we close, I can leave a comment after on my UI point.

Matthew: when I say there is a lot going on UI wise, Elad was thinking I was talking about the UI of the user agent, not the UI of the app, I think... I get it... this is interesting to me, because the spec is not specifying how you should do UI either within the UA or within the WebApp - it's up to the WebApp. From an A11Y pov both are covered by WCAG... But if the API is able to do things that are user-visible then that still counts as UI...

Matthew: I need to clarify what I meant by "UI-wise". If you give me until the plenary - I'll leave a comment and then we can close at the plenary.

Matthew to leave a comment on UI and A11Y and we can close at the plenary.

Comment by @matatk Nov 20, 2024 (See Github)

@eladalon1983:

Could you please clarify which UI changes you are referring to? As far as I can tell, this spec does not deal with anything UX-related. Although bespoke user agent UX associated with these APIs is possible, this is completely up to the UA's discretion; a spec-compliant implementation is possible even without any additional user agent UX.

To clarify, this mock is of the Web application's possible UX, not the user agent's.

Totally agree that we generally aim to avoid specifying UI/UX, and ACK that the UI in the example is from the app (and, of course, UI is already covered by WCAG - though I'll come back to that). Let me hopefully clarify...

Whilst a spec may be for a low-level API, products built with the API are often user-facing. Developers building things with the API may not imagine some of the ways users could be using them; it can be helpful to raise awareness of the opportunities, and any risks, and makes sense to do that in the spec itself.

A concrete and helpful example of some big accessibility wins, and some patterns to avoid, can be found in the Compute Pressure API's Accessibility Considerations section. This example is great because it shows how the API can affect users (UI decisions being made based on its output), how this can help users, and also the importance of meeting, but also thinking beyond, WCAG in a particular domain.

In the case of Captured Surface Control, there is a new avenue through which to interact with the preview, and a new avenue to scroll and zoom the target tab. As an extensive sample of one vision-impaired person, this seems like a helpful thing to me :-). I am not 100% sure how/if focus considerations would come into play (focusing the PiP window is likely out of scope, but would you expect there to be interactive controls floating within it?). It'd be great to read your thoughts on this in the explainer.

APA WG would be happy to follow the development of this API—please consider requesting a review, or tagging APA WG via the "a11y-tracker" label in any issue where you think some input may be of help.

Discussed Dec 16, 2024 (See Github)

Matthew: I think this is done - no comments on brainstorming thread -

Martin: you capture another tab and you want to be able to scroll it... It's cooperative so it's fine.

matthew to close

Discussed Dec 16, 2024 (See Github)

Max: Matthew suggested to update the explainer...

Dan: let's revisit in the plenary and ask jeffrey to ping elad?

Matthew: we might be able to close it... the main thing about my post: clarifying what I meant by UI changes... don't think we necessarily expected a reply...

Dan: let's close at the plenary.

Comment by @matatk Dec 20, 2024 (See Github)

Thanks again @eladalon1983 for your review request, and the updates you've made to the explainer. We discussed Captured Surface Control again this week, and are happy to close this review as satisfied.