design-reviews#710: Region Capture

Opened Jan 20, 2022

Braw mornin' TAG!

I'm requesting a TAG review of Region Capture.

An API for cropping a video track derived from display-capture of the current browsing context.

Explainer: https://github.com/w3c/mediacapture-region/blob/main/README.md
Specification URL: https://w3c.github.io/mediacapture-region/
Security and Privacy self-review: https://github.com/w3c/mediacapture-region/blob/main/Questionnaire.md
GitHub repo (if you prefer feedback filed there): https://github.com/w3c/mediacapture-region/
Primary contacts (and their relationship to the specification):
- Elad Alon (@eladalon1983), Google
Organization(s)/project(s) driving the specification: Google
Key pieces of existing multi-stakeholder review or discussion of this specification:
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status):
- Chrome Status entry

Further details:

I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: N/A
The group where the work on this specification is currently being done: WebRTC WG
Major unresolved issues with or opposition to this specification: N/A
This work is being funded by: Google

We'd prefer the TAG provide feedback as: 🐛 open issues in our GitHub repo for each point of feedback

Discussions

Discussed Feb 7, 2022 (See Github)

Sangwhan: the gist is - it lets you stream an element over a shared screen - e.g. one <div> - the target is an element, not a region. So the name is confusing. You feed it an HTML element and it starts streaming it.

Dan: Over WebRTC?

Sangwhan: whatever.

Dan: [reviewing s&p questionnaire responses] ... LGTM proviso that users may not know what they are sending when they select a target... Sangwhan to clarify in question.

Sangwhan: produce crop target will always ... what should happen - you can create a new html element that is detached from the DOM - what happens when you feed that ... that's one question we can ask. Also not sure how you can convey to the user what is going to be sent over the wire. Don't understand the user flow - you could hover on that div and share that... you could put an event handler on it...

Sangwhan: I think it would be better if it was not element bound, but box bound

Dan: that would be more intuitive for the user. if they just want to share one section of the document, but if they start adding text to that section it starts to get bigger

Sangwhan: one div and you're writing stuff in that div

Dan: is that the intended use case? That's not in the explainer

Sangwhan: what I think this is going to be used for... to share the part where the document is..

Dan: question you should still ask is how this works from an end user perspective, how can they be informed about what they're sharing and not sharing, and why isn't it just a good idea to send a crop of the area

Sangwhan: the box. For example you'd be letting users select stuff, completely bizarre because you'd be doing the equivalent of [opening the inspector] ..

Dan: use case of selecting a div, dev tools came to mind

Sangwhan: how to find the common denominator [from the source code] of what to project. I'll ask those questions.

Dan: it's officially adopted by the webrtc wg as a work item

Dan: could users unintentionally expose more than what they intend?

Sangwhan: I was going to ask about the preview. User should know what they're sending.

Sangwhan: leaves comment

Comment by @cynthia Feb 8, 2022 (See Github)

First of all, this looks pretty useful.

API surface-wise, what happens when one tries to create a crop target on an element that is not attached to a document, hence not visible? (I assume it can be rendered and therefore can be captured, but seems like a bug or an interesting side-effect feature?)

I see benefits from having the element being the region boundary, but also a bit unconventional. Initially we thought this was replicating what native does and provides a bounding box to be streamed, but that does not seem to be the case here. This feels like a user expectation that should be fulfilled.

We weren't able to figure out the user flow of how the preview is presented to the user in the example code. Is it triggered when the crop target promise is being returned? (Presumably rejecting if the user dismisses the preview as "please don't"?)

Comment by @eladalon1983 Feb 8, 2022 (See Github)

> First of all, this looks pretty useful.

Thanks. :-)

> API surface-wise, what happens when one tries to create a crop target on an element that is not attached to a document, hence not visible? (I assume it can be rendered and therefore can be captured, but seems like a bug or an interesting side-effect feature?)

The same thing that happens if one creates a CropTarget for an element that is attached, but then detaches the element. Namely, the track is muted¹.

> Initially we thought this was replicating what native does and provides a bounding box to be streamed, but that does not seem to be the case here.

I think your initial interpretation was in fact correct. The element defines a bounding box to be streamed.

> This feels like a user expectation that should be fulfilled.

The user is not in control of the feature; the application is. The application could be saving the cropped file to disk, storing it remotely, doing OCR inside of it... Anything, really. Whether it's user-facing or not is up to the application, and not up to the API.

> We weren't able to figure out the user flow of how the preview is presented to the user in the example code.

Could you please clarify the question? What preview are you asking about? (Note clarification above.)

-- [1] The muting part is currently under discussion. It's unclear at the moment if we'll set MediaStreamTrack.muted, but regardless, no frames will be delivered, so "effectively muted."

Discussed Feb 14, 2022 (See Github)

Sangwhan: miscommunication in Elad's response to my comment... I meant a box like a real box.. if the element changes dimensions then you're changing the size of the box...

Dan: something something user need... complexity .. and what (possibly simpler) alternatives were considered?

Sangwhan: let me get in contact with Elad.
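
The element-bound versus box-bound distinction raised above can be illustrated with a small model. This is only a sketch for the discussion, not the algorithm from the specification: the `CropRect` type, the `clipToViewport` helper and all the numbers are hypothetical, and clipping to the viewport is an assumption taken from the later remark that nothing outside the user's viewport is shared.

```typescript
// Illustrative model only: rectangles in CSS pixels, relative to the viewport origin.
interface CropRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Intersect a rectangle with the viewport; returns null when nothing is visible.
function clipToViewport(rect: CropRect, viewportRect: CropRect): CropRect | null {
  const left = Math.max(rect.x, viewportRect.x);
  const top = Math.max(rect.y, viewportRect.y);
  const right = Math.min(rect.x + rect.width, viewportRect.x + viewportRect.width);
  const bottom = Math.min(rect.y + rect.height, viewportRect.y + viewportRect.height);
  if (right <= left || bottom <= top) return null;
  return { x: left, y: top, width: right - left, height: bottom - top };
}

const pageViewport: CropRect = { x: 0, y: 0, width: 1280, height: 720 };

// Box-bound: a rectangle chosen once; it stays 400x200 whatever the page does.
const fixedCrop = clipToViewport({ x: 100, y: 100, width: 400, height: 200 }, pageViewport);

// Element-bound: the crop follows the element. After text is added the element
// is 900px tall, so the visible crop grows to 400x620 (clipped at the viewport edge).
const grownElementCrop = clipToViewport({ x: 100, y: 100, width: 400, height: 900 }, pageViewport);
```

In this model the box-bound crop never changes size, while the element-bound crop changes whenever the element's layout does, which is the behaviour Sangwhan describes.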

Discussed Feb 21, 2022 (See Github)

Sangwhan: a misunderstanding on my end, this makes sense now. I talked to the person proposing and now I understand. I think the proposal is fine, the name is unfortunate. I'll comment. It's not region capture as we expect it like draw a rectangle on your screen. It's a document region capture. Constrained to a specific tab. The application decides what inside that tab should be streamed.

Dan: I'm concerned about user expectation.. we understand region of a document, but end user is going to be looking at a rectangle on a screen

Sangwhan: what the user chooses is the tab. If I share screen, this api call will give me the share tab dialogue. The tab, the application running inside the tab, can decide to stream only a subset of what is available on that screen.

Dan: So it might decide to only share... I understand where that might come in handy

Sangwhan: helps with mirror effect

Dan: if that tab were a presentation tool and you wanted it to only share the presentation not the controls and editing functions

Sangwhan: right. Now that that's clear, the only thing that bothers me is the name. It looks like a generic region capture mechanism when it's not

Dan: both the things are wrong.. it's not about a region and it's not about capture. It's about sharing.

Sangwhan: it's called capture because it's from WebRTC

Dan: should be StuffShare. Maybe we can close at plenary.
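
The flow as understood in this session (the user picks the tab, the application picks the element) might look roughly like the sketch below. The local types `CropTargetLike`, `CroppableTrack` and `CaptureDevicesLike` and the helper `shareOnly` are hypothetical stand-ins so the snippet is self-contained; `produceCropTarget` is the factory named in this review, and `cropTo` is assumed to be the track method from the draft specification. Shapes may differ between draft revisions.

```typescript
// Hypothetical minimal typings standing in for the browser objects.
type CropTargetLike = object;

interface CroppableTrack {
  // Assumed to match the draft's track.cropTo(cropTarget).
  cropTo(target: CropTargetLike): Promise<void>;
}

interface CaptureDevicesLike {
  // In a browser this shows the tab-sharing dialogue; the user chooses the tab.
  getDisplayMedia(): Promise<{ getVideoTracks(): CroppableTrack[] }>;
  // Factory discussed in this review: turns an element into a crop target.
  produceCropTarget(element: object): Promise<CropTargetLike>;
}

// The application, not the user, decides which element of the tab is streamed.
async function shareOnly(devices: CaptureDevicesLike, element: object): Promise<CroppableTrack> {
  const target = await devices.produceCropTarget(element);
  const stream = await devices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (!track) throw new Error("no video track in the captured stream");
  await track.cropTo(target);
  return track;
}
```

In a page, `devices` would presumably be `navigator.mediaDevices` and `element` something like the presentation area from Dan's example, with the controls and editing functions left outside it.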

Discussed Feb 28, 2022 (See Github)

Dan: proposed closed, waiting on a closing comment about changing name

Sangwhan: I'll do that right now. There's a user need about capturing one area of the screen.. why only window or desktop? And this feature doesn't give the user control of the exact bounding box but lets the application choose, another unfulfilled user need here

Dan: and user sharing what they don't expect to be sharing?

Sangwhan: application chooses what is shared, so far nothing outside of user's viewport. However there is no way for the user to decide which part of a document to be shared, nor a way for the user to..

Dan: the application decides "this is the appropriate thing to be shared when you select my tab"

Sangwhan: this is fine, but why are we not giving the user a choice

Dan: the application might say it's the content well that is the appropriate thing to display, but the user might actually want to be displaying the navigation because they're in a customer service scenario to say 'the menu option you told me about doesn't exist' or something like that. Strange to be constraining the user choice that way.

Sangwhan: at least two native application focusing tools let the users do that.

Dan: that's the articulation of the user need that you mentioned. I want to display this section of the screen, it's a different thing.

Sangwhan: different but wanted to keep it on record. [close with comment]

Thank you for the clarification!

> I think your initial interpretation was in fact correct. The element defines a bounding box to be streamed.

I think there is a minor misunderstanding here, we understood this as a bounding box (draw a rectangle) in any part of the desktop - which is not what this proposal is about.

Given that the naming/expectations might introduce confusion - may we suggest renaming this to something that suggests a specific scope? (e.g. "Tab" Region Capture)

Additionally, we'd like the group to consider the unfulfilled user need of "stream arbitrary region from any part of the screen" in WebRTC at some point.

Thank you for bringing this to our attention, and we are happy to see this proposal move forward.

closed

Discussed Feb 28, 2022 (See Github)

Punted to C for Sangwhan

Comment by @cynthia Mar 1, 2022 (See Github)

Thank you for the clarification!

> I think your initial interpretation was in fact correct. The element defines a bounding box to be streamed.

I think there is a minor misunderstanding here, we understood this as a bounding box (draw a rectangle) in any part of the desktop - which is not what this proposal is about.

Given that the naming/expectations might introduce confusion - may we suggest renaming this to something that suggests a specific scope? (e.g. "Tab" Region Capture)

Additionally, we'd like the group to consider the unfulfilled user need of "stream arbitrary region from any part of the screen" in WebRTC at some point.

Thank you for bringing this to our attention, and we are happy to see this proposal move forward.

Discussed Apr 25, 2022 (See Github)

Sangwhan: there are disputes happening within the group again. Elad reached out to discuss. I said I'll write my thoughts and bring it to TAG. We did a review on this and we were fine with this, there were some things I didn't like that much

Sangwhan: we reviewed it and were satisfied... Then there were some disputes in the working group...

What makes CropTarget special to require an asynchronous creation? #17

Background is that the factory function for CropTarget is async - and other implementors disagree. Implementing it as a sync method in Chromium seemingly would require blocking the main thread, which is undesirable. The other implementors have not done the investigation and it is rather unclear what the gains are to make this API synchronous.

Sangwhan: I have looked at this discussion and think the developer ergonomic gains would be minimal, unlike the autoplay API.

Sangwhan: based on example code I've seen I don't see a significant gain in terms of ergonomics for developers with sync. You do this once. You don't have to issue multiple crop targets. In this case I think it's fine to be async. Would like to hear from others.
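
To make the "you do this once" point concrete, here is a minimal sketch of how an application could absorb the asynchronous factory behind a small cache, so the `await` cost is paid once per element. The helper `memoizeCropTargets` and the types `OpaqueCropTarget` and `ProduceCropTargetFn` are hypothetical and not part of the proposal; the factory is passed in as a parameter so the sketch does not assume any particular browser surface.

```typescript
// Hypothetical stand-ins: an opaque crop target and an async factory such as
// the produceCropTarget() discussed above.
type OpaqueCropTarget = object;
type ProduceCropTargetFn = (element: object) => Promise<OpaqueCropTarget>;

// Wrap an async factory so each element is only ever sent to it once.
function memoizeCropTargets(produce: ProduceCropTargetFn): ProduceCropTargetFn {
  // WeakMap, so cached entries do not keep removed elements alive.
  const cache = new WeakMap<object, Promise<OpaqueCropTarget>>();
  return (element) => {
    let pending = cache.get(element);
    if (!pending) {
      pending = produce(element);
      cache.set(element, pending);
      // Forget failed attempts so a later call can retry.
      pending.catch(() => {
        cache.delete(element);
      });
    }
    return pending;
  };
}
```

Callers still `await` the result, but repeated requests for the same element reuse the first promise, which is roughly why a synchronous factory would add little ergonomic value in this usage pattern.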

Is CropTarget name too generic? #18

The CropTarget name is too generic. There are multiple proposals; none of them seems significantly better than another.

Sangwhan: None of them seem to be significantly beneficial over another, but prefixing it so that CropTarget contains more context on what kind of CropTarget this is might be useful. Indifferent on what it should be called, but the “Element” notions are probably only going to add confusion.

Sangwhan: not namespaced to a specific context.

Why expose produceCropTarget at MediaDevices level? #11

produceCropTarget() is only available at MediaDevices, why not layer it as a member function of an element? Counter argument is that there is already too much stuff on element.

Sangwhan: Initially was indifferent, but after thinking about this for a bit, layering it at an element seems like it might have some issues - for some elements this wouldn’t make sense (e.g. you don’t want to cast a <style> element). Agree on the “there is too much on element already”

Sangwhan: a little bizarre if you stick it to an element - a lot of elements not visible... these make no sense to set a crop target on... makes more sense to have it layered elsewhere...

General discussion

These are the sorts of things that the WG should be able to come to consensus on. Not the TAG's job to pick a winner.

Yves: if WG can't come to consensus they can pick one and if it goes to FO there can be a council. If they are both architecturally equal, we don't have to take a position.

Dan: we can feed back that interoperability is an imperative. That should be what should be driving them, not what is most technically pure. Priority of constituents?

Sangwhan: that was referenced

Dan: we know as WebRTC users that the issues of interop are many. This is something that should be guiding the WG's work. That should be the primary goal of anything. Not the most elegant or correct API. Is it going to be implemented in as many places as possible? Do they have consensus? That should be the feedback that we give.

Sangwhan: I've given that feedback...

Hadley: reasonable

Sangwhan: we can say we don't regularly resolve disputes for WGs.

Hadley: I think we would jump in where there's an unresolvable issue that is architectural. That feels appropriate. The issue here is not architectural, we are happy whatever they choose, so we're not the final arbitrator.