#710: Region Capture

Opened Jan 20, 2022

Braw mornin' TAG!

I'm requesting a TAG review of Region Capture.

An API for cropping a video track derived from display-capture of the current browsing context.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: N/A
  • The group where the work on this specification is currently being done: WebRTC WG
  • Major unresolved issues with or opposition to this specification: N/A
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as: 🐛 open issues in our GitHub repo for each point of feedback

Discussions

2022-02-07

Minutes

Sangwhan: the gist is - it lets you stream an element over a shared screen - e.g. one <div> - the target is an element, not a region. So the name is confusing. You feed it an HTML element and it starts streaming it.

Dan: Over WebRTC?

Sangwhan: whatever.

Dan: [reviewing S&P questionnaire responses] ... LGTM, with the proviso that users may not know what they are sending when they select a target... Sangwhan to clarify in question.

Sangwhan: produce crop target will always ... what should happen - you can create a new HTML element that is detached from the DOM - what happens when you feed that ... that's one question we can ask. Also not sure how you can convey to the user what is going to be sent over the wire. Don't understand the user flow - you could hover on that div and share that... you could put an event handler on it...

Sangwhan: I think it would be better if it was not element bound, but box bound

Dan: that would be more intuitive for the user. if they just want to share one section of the document, but if they start adding text to that section it starts to get bigger

Sangwhan: one div and you're writing stuff in that div

Dan: is that the intended use case? That's not in the explainer

Sangwhan: what I think this is going to be used for... to share the part where the document is..

Dan: question you should still ask is how this works from an end user perspective, how can they be informed about what they're sharing and not sharing, and why isn't it just a good idea to send a crop of the area

Sangwhan: the box. For example you'd be letting users select stuff, completely bizarre because you'd be doing the equivalent of [opening the inspector] ..

Dan: use case of selecting a div, dev tools came to mind

Sangwhan: how to find the common denominator [from the source code] of what to project. I'll ask those questions.

Dan: it's officially adopted by the WebRTC WG as a work item

Dan: could users unintentionally expose more than what they intend?

Sangwhan: I was going to ask about the preview. User should know what they're sending.

Sangwhan: leaves comment

2022-02-14

Minutes

Sangwhan: miscommunication in Elad's response to my comment... I meant a box like a real box.. if the element changes dimensions then you're changing size of box...

Dan: something something user need... complexity .. and what (possibly simpler) alternatives were considered?

Sangwhan: let me get in contact with Elad.

2022-02-21

Minutes

Sangwhan: a misunderstanding on my end, this makes sense now. I talked to the person proposing and now I understand. I think the proposal is fine, the name is unfortunate. I'll comment. It's not region capture as we expect it like draw a rectangle on your screen. It's a document region capture. Constrained to a specific tab. The application decides what inside that tab should be streamed.

Dan: I'm concerned about user expectation.. we understand region of a document, but end user is going to be looking at a rectangle on a screen

Sangwhan: what the user chooses is the tab. If I share screen, this api call will give me the share tab dialogue. The tab, the application running inside the tab, can decide to stream only a subset of what is available on that screen.

Dan: So it might decide to only share... I understand where that might come in handy

Sangwhan: helps with mirror effect

Dan: if that tab were a presentation tool and you wanted it to only share the presentation not the controls and editing functions

Sangwhan: right. Now that that's clear, the only thing that bothers me is the name. It looks like a generic region capture mechanism when it's not

Dan: both the things are wrong.. it's not about a region and it's not about capture. It's about sharing.

Sangwhan: it's called capture because it's from WebRTC

Dan: should be StuffShare. Maybe we can close at plenary.
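
The flow described in these minutes (the user picks the tab, the application picks which part of the tab is streamed) can be sketched roughly as below. This is a minimal sketch, not the specification's definition: the `...Like` interfaces and `shareCroppedTab` are hypothetical names introduced for illustration, `produceCropTarget` is the factory name used in the issues discussed later in this log, and the `getDisplayMedia` and `cropTo` names are not in these minutes and are assumed from the Screen Capture API and the draft.

```typescript
// Hypothetical minimal shapes so the sketch runs outside a browser.
// They only mirror the calls this sketch needs; they are not the draft's IDL.
interface CropTargetLike {
  readonly id: string;
}

interface CroppableTrackLike {
  // Assumed: cropping is requested on the captured video track.
  cropTo(target: CropTargetLike | null): Promise<void>;
}

interface CaptureDevicesLike<E> {
  // Assumed: this is where the browser shows the share dialog and the user picks the tab.
  getDisplayMedia(): Promise<{ getVideoTracks(): CroppableTrackLike[] }>;
  // Factory name as it appears in the issue titles in this log.
  produceCropTarget(element: E): Promise<CropTargetLike>;
}

// The application, not the user, decides which element bounds the stream.
async function shareCroppedTab<E>(
  devices: CaptureDevicesLike<E>,
  element: E
): Promise<CroppableTrackLike> {
  const target = await devices.produceCropTarget(element);
  const stream = await devices.getDisplayMedia();
  const track = stream.getVideoTracks()[0];
  if (!track) {
    throw new Error("capture returned no video track");
  }
  await track.cropTo(target);
  return track;
}
```

In a browser, `devices` would presumably be `navigator.mediaDevices` and `element` the presentation area, which matches Dan's example of sharing the slides without the editing controls. Later drafts may name the factory differently.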

2022-02-28

Minutes

Dan: proposed closed, waiting on a closing comment about changing name

Sangwhan: I'll do that right now. There's a user need about capturing one area of the screen.. why only window or desktop? And this feature doesn't give user control of exact bounding box but lets application choose, another unfulfilled user need here

Dan: and user sharing what they don't expect to be sharing?

Sangwhan: application chooses what is shared, so far nothing outside of user's viewport. However there is no way for the user to decide which part of a document to be shared, nor a way for the user to..

Dan: the application decides "this is the appropriate thing to be shared when you select my tab"

Sangwhan: this is fine, but why are we not giving the user a choice

Dan: the application might say it's the content well that is the appropriate thing to display, but the user might actually want to be displaying the navigation because they're in a customer service scenario to say 'the menu option you told me about doesn't exist' or something like that. Strange to be constraining the user choice that way.

Sangwhan: at least two native application focusing tools let the users do that.

Dan: that's the articulation of the user need that you mentioned. I want to display this section of the screen, it's a different thing.

Sangwhan: different but wanted to keep it on record. [close with comment]

Thank you for the clarification!

> I think your initial interpretation was in fact correct. The element defines a bounding box to be streamed.

I think there is a minor misunderstanding here, we understood this as a bounding box (draw a rectangle) in any part of the desktop - which is not what this proposal is about.

Given that the naming/expectations might introduce confusion - may we suggest renaming this to something that suggests a specific scope? (e.g. "Tab" Region Capture)

Additionally, we'd like the group to consider the unfulfilled user need of "stream arbitrary region from any part of the screen" in WebRTC at some point.

Thank you for bringing this to our attention, and we are happy to see this proposal move forward.

closed

2022-02-28

Minutes

Punted to C for Sangwhan

2022-04-25

Minutes

Sangwhan: there are disputes happening within the group again. Elad reached out to discuss. I said I'll write my thoughts and bring it to TAG. We did a review on this and we were fine with this, there were some things I didn't like that much

Sangwhan: we reviewed it and were satisfied... Then there were some disputes in the working group...

What makes CropTarget special to require an asynchronous creation? #17

Background is that the factory function for CropTarget is async - and other implementors disagree. Implementing it as a sync method in Chromium seemingly would require blocking the main thread, which is undesirable. The other implementors have not done the investigation and it is rather unclear what the gains are to make this API synchronous.

Sangwhan: I have looked at this discussion and think the developer ergonomic gains would be minimal, unlike the autoplay API.

Sangwhan: based on example code I've seen I don't see a significant gain in terms of ergonomics for developers with sync. You do this once. You don't have to issue multiple crop targets. In this case I think it's fine to be async. Would like to hear from others.
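
To make the ergonomics point concrete, here is a minimal sketch comparing the two call sites. Everything in it is hypothetical and for illustration only: `TargetLike`, both factory functions, and the `cropTo` stand-in are not the draft's API, and the stand-in assumes the cropping call itself is promise-based in either design.

```typescript
// Hypothetical stand-ins; none of these are the real API surface.
interface TargetLike {
  readonly id: string;
}

const cropLog: string[] = [];

// Stand-in for the cropping call, assumed to be asynchronous in both designs.
function cropTo(target: TargetLike): Promise<void> {
  cropLog.push(target.id);
  return Promise.resolve();
}

// Async factory: the shape questioned in issue #17.
function produceTargetAsync(elementId: string): Promise<TargetLike> {
  return Promise.resolve({ id: "target-for-" + elementId });
}

// Sync factory: the shape the other implementers prefer.
function produceTargetSync(elementId: string): TargetLike {
  return { id: "target-for-" + elementId };
}

// Call site with the async factory: one extra `await`, run once at setup.
async function setupWithAsyncFactory(elementId: string): Promise<string> {
  const target = await produceTargetAsync(elementId);
  await cropTo(target);
  return target.id;
}

// Call site with the sync factory: still an async function, because cropping is awaited.
async function setupWithSyncFactory(elementId: string): Promise<string> {
  const target = produceTargetSync(elementId);
  await cropTo(target);
  return target.id;
}
```

Under these assumptions the two setups differ by a single `await` that runs once, which is consistent with the view that a sync factory buys developers little.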

Is CropTarget name too generic? #18

CropTarget name is too generic. Multiple proposals, none of which seems significantly better than the others.

Sangwhan: None of them seem to be significantly beneficial over another, but prefixing it so that CropTarget contains more context on what kind of CropTarget this is might be useful. Indifferent on what it should be called, but the “Element” notions are probably only going to add confusion.

Sangwhan: not namespaced to a specific context.

Why expose produceCropTarget at MediaDevices level? #11

produceCropTarget() is only available at MediaDevices, why not layer it as a member function of an element? Counter argument is that there is already too much stuff on element.

Sangwhan: Initially was indifferent, but after thinking about this for a bit layering it at an element seems like it might have some issues - for some elements this wouldn’t make sense (e.g. you don’t want to cast a <style> element). Agree on the “there is too much on element already”

Sangwhan: a little bizarre if you stick it to an element - a lot of elements not visible... these make no sense to set a crop target on... makes more sense to have it layered elsewhere...

General discussion

These are the sorts of things that the WG should be able to come to consensus on. Not the TAG's job to pick a winner.

Yves: if WG can't come to consensus they can pick one and if it goes to FO there can be a council. If they are both architecturally equal, we don't have to take a position.

Dan: we can feed back that interoperability is an imperative. That should be what should be driving them, not what is most technically pure. Priority of constituents?

Sangwhan: that was referenced

Dan: we know as WebRTC users that the issues of interop are many. This is something that should be guiding the WG's work. That should be the primary goal of anything. Not the most elegant or correct API. Is it going to be implemented in as many places as possible? Do they have consensus? That should be the feedback that we give.

Sangwhan: I've given that feedback...

Hadley: reasonable

Sangwhan: we can say we don't regularly resolve disputes for WGs.

Hadley: I think we would jump in where there's an unresolvable issue that is architectural. That feels appropriate. The issue here is not architectural, we are happy whatever they choose, so we're not the final arbitrator.