#358: MediaStream Image Capture API PTZ (Pan/Tilt/Zoom feature)

Opened Mar 26, 2019

Góðan dag TAG!

I'm requesting a TAG review of:

Code Snippet:

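// Fragment from inside a function that returns a promise; `track` is the
// camera's video MediaStreamTrack and `inputPanRange` is a range <input>.
const capabilities = track.getCapabilities();
if (!('pan' in capabilities)) {
  return Promise.reject(new Error('pan is not supported by ' + track.label));
}
// Re-apply the pan constraint whenever the slider moves.
inputPanRange.oninput = function(event) {
  track.applyConstraints({
    advanced: [{
      pan: Number(event.target.value)
    }]
  });
};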

Explainer/Motivation: Many cameras can pan and tilt, which is especially useful in video conferencing (WebRTC), for example to steer the camera to face the speaker. FWIW, Chrome has been using a private extension (webcam_private.idl) to satisfy this use case.

Further details (optional):

You should also know that...

Image Capture API is already shipping in Chrome 59 on Android and desktop. Pan/Tilt feature is a small addition to the property set for capabilities, constraints and settings.

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our Github repo for each point of feedback
  • open a single issue in our Github repo for the entire review
  • leave review feedback as a comment in this issue and @-notify [github usernames]

Discussions

Comment by @kenchris Mar 26, 2019 (See Github)

It is pretty hard to dig the details out of the example and a long Github discussion.

A small explainer showing the new API and explaining the use-cases would be much more welcome. We are all busy people and having to manually dig through a discussion thread is a bit too much. An explainer could summarize all this. You could also write a summary in the issue and link to that instead.

Comment by @kenchris Mar 26, 2019 (See Github)

FWIW Chrome has been using a private extension(webcam_private.idl) to satisfy this use case.

  • Does this API differ from the private and presumably battle-tested extension?
  • What range do pan and tilt values cover? 1-100? 0-1?
  • Can all cameras tilt and pan equally far, or is there a way to query their limits?
  • It feels a bit weird that a pan value is considered a constraint and not just a control value.
  • What was the conclusion of the "Pan, Roll, Tilt vs Pan and Tilt" discussion?

Comment by @riju Mar 26, 2019 (See Github)

Thanks @kenchris for the feedback. I have added a code snippet and a brief explainer/motivation to the issue. The long Github discussion was mainly about:

  • Pan, Roll, Tilt vs Pan and Tilt
  • Units to use, specifically - Arc seconds vs degrees.

Comment by @kenchris Mar 26, 2019 (See Github)

Units to use, specifically - Arc seconds vs degrees.

So which one are you using?

Also can you give a short summary on the pan/roll/tilt vs no roll

Comment by @riju Mar 26, 2019 (See Github)

FWIW Chrome has been using a private extension(webcam_private.idl) to satisfy this use case.

  • Does this API differ from the private and presumably battle-tested extension?

The presumably battle-tested API provided Pan, Tilt and Zoom (PTZ). Zoom has been available since the start of the MediaStream Image Capture API. Pan and Tilt are the recent additions, hence this issue.

  • What range do pan and tilt values cover? 1-100? 0-1?

depends on camera

  • Can all cameras tilt and pan equally far, or is there a way to query their limits?

You can query the range. Different cameras have different ranges. For example, cameras with electronic pan/tilt do not pan more than 10 degrees.

  • It feels a bit weird that a pan value is considered a constraint and not just a control value.
  • What was the conclusion of the "Pan, Roll, Tilt vs Pan and Tilt" discussion?

Roll isn't commonly available in consumer webcams. Even though Pan/Tilt may be niche, we felt that Roll is even more niche.

Comment by @kenchris Mar 26, 2019 (See Github)

depends on camera

Actually reading the PR I read this: 1/3600th of a degree. Values are in the range from –180x3600 arc seconds to +180x3600 arc seconds
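To make the unit concrete: one degree is 3600 arc-seconds, so the range quoted from the PR works out to ±648000. A minimal sketch of the conversion follows; the helper and constant names are hypothetical and not part of any API.

```javascript
// Hypothetical helpers, not part of the Image Capture API.
const ARC_SECONDS_PER_DEGREE = 3600;

function degreesToArcSeconds(degrees) {
  return degrees * ARC_SECONDS_PER_DEGREE;
}

function arcSecondsToDegrees(arcSeconds) {
  return arcSeconds / ARC_SECONDS_PER_DEGREE;
}

// Bounds quoted from the PR: -180x3600 to +180x3600 arc-seconds.
const MIN_ARC_SECONDS = degreesToArcSeconds(-180);
const MAX_ARC_SECONDS = degreesToArcSeconds(180);
```

Working in whole arc-seconds keeps typical values as integers, which avoids fractional degrees for small movements.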

Comment by @riju Mar 26, 2019 (See Github)

Units to use, specifically - Arc seconds vs degrees.

So which one are you using?

arc-seconds. I have summarized this in this comment

Also can you give a short summary on the pan/roll/tilt vs no roll

Comment by @yell0wd0g Mar 26, 2019 (See Github)

Image Capture provides a way to query the supported MediaTrackCapabilities. Querying them on a video track gives the allowed range and step for both pan and tilt, e.g. for pan (same for tilt):

const trackCapabilities = imageCapturer.track.getCapabilities();
if (trackCapabilities.pan === undefined) {
  console.error('pan not supported, boo!');
} else {
  const maxPan = trackCapabilities.pan.max;
  const minPan = trackCapabilities.pan.min;
  const stepPan = trackCapabilities.pan.step;
}

All those numerical values will be in arc-seconds.
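One way an application might use these three values is to fit a requested pan into the reported range before applying it. The sketch below assumes a range object shaped like `trackCapabilities.pan` above; the helper name `fitToRange` is hypothetical.

```javascript
// Hypothetical helper: fit a requested value into a {min, max, step} range
// such as the one returned in trackCapabilities.pan.
function fitToRange(value, range) {
  const { min, max, step } = range;
  // Clamp into [min, max] first.
  const clamped = Math.min(max, Math.max(min, value));
  // Without a usable step, the clamped value is the best we can do.
  if (!(step > 0)) {
    return clamped;
  }
  // Round to the nearest step counted from min, then clamp again in case
  // max is not a whole number of steps away from min.
  const snapped = min + Math.round((clamped - min) / step) * step;
  return Math.min(max, snapped);
}
```

For instance, with a made-up range of `{ min: -36000, max: 36000, step: 3600 }` (±10 degrees in 1-degree steps), a request for 5000 arc-seconds would be fitted to 3600.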

Discussed Apr 17, 2019 (See Github)

Torgo: Kenneth, the thing you brought up... The Tilt Feature? Want to explain this to us?

Sangwhan: This isn't high priority for us, except that Blink intent to ship is delayed for our trip. But they do intend to implement and ship at the same time which is unusual.

...This is an extra capability for webcam control -- rotate it, point to an object you want to focus on. Thus far, it's been proprietary APIs in Chrome.

...This is the first time we've introduced web access to something that physically moves.

Torgo: moves and sees

Sangwhan: it does require you to grant access to the camera though. Which I think is good.

...They use slightly strange units, for a reason. Arc-seconds, which is what UVC uses. But on Windows, half the APIs are degrees-based and half are seconds-based.

...Do we have plans to add more things that rotate into the platform? And are they okay with arc-seconds?

David: CSS has multiple angle units (e.g., for transform, radial gradients) and none of them are arc-seconds.

Torgo: Where is this? a PR on media capture API?

Sangwhan: Yeah.

Torgo: but this has been merged in the actual draft version?

Sangwhan: I believe so

Torgo: I just want to make sure this isn't only in a Google doc somewhere that we don't have access to

Sangwhan: No, it's been added to media capture.

Torgo: okay. Did you get a response about units?

Sangwhan: I understand why they're doing this. It's because of the potential for rounding errors if you're using larger units. I think it's fine, but I don't think anything else on the web platform uses it.

...But given that the other things that also rotate are servo motors, I think this is a good compromise.

David: This ought to be designed so the capabilities of the hardware can change. Today, the cameras in the laptop basically point straight ahead. But in the future they may rotate... and people will want permission to rotate it.

Torgo: I just saw a device with a camera that has the same housing, with a camera that flips from looking at you and looking away. Could be a privacy issue, "I granted you permission to see me, but not the room that I'm in". but if the remote user can tell the camera to move around... normally that's under user control. That seems like it would go against user expectation.

Sangwhan: Re David's question re extensibility... the constraint is an object.

Re privacy concerns... yeah

David: More... could permission prompts be added without breaking it? In other words, if it has to be synchronous without a prompt in the middle... I'll comment in the issue

Torgo: I'll add my privacy concern there

Sangwhan: applyConstraints does the rotation, and it is async and uses a promise. Could have a permission, but it isn't in the spec now.

Torgo: feels to me like it should be possible for implementations to add a permission where it fits the use case

Sangwhan: It could look at the constraints being passed over and say "these are dangerous". But that's later; it's not possible now.

...Add me to it.

This is being implemented by Intel.

Torgo: I'll put you and Kenneth on it. Let's come back to it on the 24th

Peter: This is only part of the image capture? So if you have a video stream and you're just trying to get control of tilt for an image capture to take a high-res photo in the middle of a stream... what does it mean?

Sangwhan: You'll have to set it twice.

Peter: So there's no way of moving the camera as part of a regular RTC stream without taking a photo?

Sangwhan: You'd have to set up the RTC stream first.

Peter: Right, but can I move the camera without grabbing the image?

Sangwhan: It's on the video track, actually. That corresponds to the device, and you add the constraint to the track.

(Assigning the rest of the new issues to TAG members)

Comment by @dbaron Apr 17, 2019 (See Github)

So we had a brief discussion of the potential privacy issues in today's TAG meeting -- if hardware capabilities were to change substantially (e.g., laptop cameras that rotate a lot more than today), it feels like an implementation might want to add a separate permission prompt for the user to grant access to pan/tilt. So we'd like to make sure that the specification is designed in a way that would allow that to happen later on. It sounds (from a very brief look/discussion today) like the API is sufficiently asynchronous that that's the case, but it's worth thinking about a little more carefully.

Comment by @riju Apr 17, 2019 (See Github)

@dbaron / TAG : Just for clarification, does this mean a permission prompt first for getUserMedia() to access camera, and then another permission prompt to access the pan/tilt feature? Suppose a developer is making a PWA camera where she wants to use both pan/tilt feature and other MediaStream properties, does she ask user for 2 separate permission prompts?

Comment by @dbaron Apr 17, 2019 (See Github)

I think it's up to the user-agent / implementation to decide how to structure the prompts -- the key part is that the API should be designed in a way that allows for appropriate choices. This generally means it needs to involve promises rather than being synchronous so that a user agent might resolve the promise after prompting the user. It may also mean that the intent to use both the camera and pan/tilt should be present at the same time in case the user-agent wants to combine them into one prompt.

Of course, these demands may have tradeoffs with other desired characteristics of the API, so there might be reasons not to satisfy them. But they should be considered.
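As an illustration of why the promise-based shape matters: a caller that awaits `applyConstraints()` does not care whether the user agent resolved immediately or paused to ask the user first. The sketch below is a hypothetical helper, not part of the spec. It assumes `track` behaves like a MediaStreamTrack, and it makes no assumption about which error a user agent would use if the user declined; it simply reports `err.name`.

```javascript
// Hypothetical helper: try to pan and report what actually happened.
// `track` is assumed to expose applyConstraints() and getSettings()
// like a MediaStreamTrack.
async function tryPan(track, panArcSeconds) {
  try {
    // The user agent is free to resolve this promise only after any
    // prompt it chooses to show.
    await track.applyConstraints({ advanced: [{ pan: panArcSeconds }] });
    // Advanced constraints are best-effort, so read back the real value
    // instead of assuming the request was honored.
    return { ok: true, pan: track.getSettings().pan };
  } catch (err) {
    return { ok: false, reason: err.name };
  }
}
```

Because the result is only available after the `await`, adding a prompt later would not change how this calling code is written.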

Comment by @plinss Apr 22, 2019 (See Github)

I presume that pan/tilt can be applied to any video track regardless of whether or not the imagecapture feature is being used? If so, it seems odd that it's defined here (along with most (all?) of the other capabilities/constraints defined)

Comment by @riju Apr 23, 2019 (See Github)

These new constraints apply to the live video feed, and that's covered in the Spec by making these an extension to the MediaStreamTrack: they are MediaTrackSettings, whereas others are only seen upon takePhoto() and they are PhotoSettings.

Discussed Apr 24, 2019 (See Github)

Kenneth: additional permission prompts?

Sangwhan: don't think that's a good user experience.

Peter: happy-ish with the feedback to my comment.

Sangwhan: from an implementation perspective - if you want to have it separately then - underlying native api had to hold on to same object...

Peter: David did they address the comment about the prompt?

David: I haven't dug into the spec to know what that means.

Dan: fold two prompts into one?

Sangwhan: how?

Peter: you get one prompt when you invoke getUserMedia. Theoretically you could get one prompt that asks for video feed and pan/tilt... then have to re-apply constraints.

Kenneth: you will see what they are looking at.

Dan: yes if there is self-view.

Hadley: it feels weird to me - violating expectations and confusing users.

Sangwhan: I haven't seen other examples. Personally I think it's something that could be patched in later.

Kenneth: I think it's OK to have a notification "remote user is trying to move your camera"

Kenneth: professional google hangouts does that, right. Some will use machine learning to focus on who is speaking, for example...

Sangwhan: there are restrictions from blocking other party from messing with your camera.

Dan: blah blah blah ethical web

Sangwhan: Permissions on the web are pretty bad. If only we had non-broken permissions. I'm OK with this shipping; then we can add a permission later.

Dan: i think we should be challenging them to come back with a solution that addresses these concerns..

Hadley: i second.

Dan: this

Hadley: i'm conscious of how often you joke about when you take video calls what is off camera. I worry about violating that at a platform level - that doesn't fit with the user being in control of their experience of the web.

Peter: I agree with that but there are valid uses for remote control. I can imagine a camera that flips back and forth between front and rear facing - but that'd be a different API.

Kenneth: on new phones you have "selfie lens"... like wider angle view.

Peter: I believe there's a zoom feature here already

Sangwhan: there is.

Dan: isn't it desirable that we should be asking the developer to ask multiple permissions?

David: the UA might be able to bundle or could split them out but only if the API is

Hadley: I'm comfortable leaving that to the user agent.

David: In order to leave the choice of whether to bundle the permissions to the user agent, the user agent would need all the information at the time it's asking the user for the first permission.

Sangwhan: That makes sense. The feedback we would like to bring back is that we'd like it to be a secondary permission, but one that you can ask for in a bundled way.

David: assuming there's a reasonable way to do it.

Peter: when you call getusermedia you pass in a set of constraints that you're asking for - you try to set pan/tilt constraints then you could bundle it in one prompt.

David: that's even the sort of thing that could be patched later but it might be good to get it up front
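A sketch of the bundling idea discussed above: if the page states its pan/tilt intent in the same constraints it passes to getUserMedia, the user agent has everything it needs to decide between one prompt and two. The helper name and the boolean `pan`/`tilt` members are assumptions made for illustration; the minutes do not define a constraint shape.

```javascript
// Hypothetical helper. The boolean `pan`/`tilt` members are an assumed
// constraint shape for illustration; the minutes do not define one.
function buildCameraConstraints(wantPanTilt) {
  const video = wantPanTilt ? { pan: true, tilt: true } : true;
  return { video };
}

// In a browser this would be passed as:
//   navigator.mediaDevices.getUserMedia(buildCameraConstraints(true))
```

A page that never calls this with `true` would keep getting the plain camera prompt, which matches the goal of not forcing the more powerful permission on users.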

Comment by @kenchris Apr 24, 2019 (See Github)

Btw, the permission thing @dbaron mentions also applies to "zoom"

Comment by @cynthia Apr 30, 2019 (See Github)

@riju Sorry this took so long. It's the first time the web has been allowed to mechanically (or logically, depending on the hardware implementation) "move" things in the physical world, and firsts are always a bit scary.

We've discussed this in quite a bit of detail, and the group opinion (after a bit of back and forth) is that capture and control should be modeled in a way that it can be two distinct permissions; ideally requestable in a single call. Cases are where you would be fine showing one fixed (tidy) part of a room that is conference safe, and not allow access to the other side; this in native is covered by the conference software, but in the web we can't assume the software is trustworthy and will respect the user's preference.

The plumbing for implementations to be able to provide a way for the user to opt-out of this feature, while giving permission to the video stream seems like a valid use case, and we'd like to see this covered.

Discussed May 1, 2019 (See Github)

Pending external feedback

Comment by @riju May 6, 2019 (See Github)

Thanks TAG @kenchris, @cynthia, @dbaron for the feedback.

Discussed May 8, 2019 (See Github)

Peter: Marked pending feedback... We got a "thanks" - and Ken and Sangwhan aren't here. I'll bump

Discussed May 15, 2019 (See Github)

Nobody assigned is present.

Comment by @cynthia May 21, 2019 (See Github)

We're changing the process of how we manage the reviews; this one is considered stalled because we have checked the relevant chromium.org ML thread and didn't see a clear resolution. (Last reply seems like from a couple weeks ago)

We'll leave this open so you can let us know what the resolution on your side is.

Comment by @riju May 21, 2019 (See Github)

We have contacted the chromium UX/Privacy team for their feedback on TAG's recommendation. We will update this thread when we get a response.

Discussed Jun 19, 2019 (See Github)

Sangwhan: Waiting on feedback on this one.

Dan: It says review complete...

Sangwhan: we reviewed everything they brought us...

... milestone empty for now, I'll add it to an agenda when they get back to us.

Comment by @cynthia Sep 10, 2019 (See Github)

@riju haven't heard back from you folks in a while, has anything happened in the meanwhile or were our concerns deemed irrelevant? Let us know if you have anything to share.

Comment by @riju Sep 10, 2019 (See Github)

@cynthia: we shipped the Pan/Tilt feature behind a flag. The other stakeholders like Chrome's UX and privacy teams are evaluating. We also need to evaluate how many users do have pan/tilt cameras. I can give TAG folks a demo in Tokyo/Fukuoka :)

Comment by @torgo Sep 11, 2019 (See Github)

Hi @riju we will be discussing this in a breakout at 15:30 tomorrow at our Tokyo f2f (most likely in the cafeteria on 43).

Comment by @kenchris Mar 4, 2020 (See Github)

@riju you promised us feedback from stakeholders but we haven't heard anything back and it has been a while.

Comment by @kenchris Mar 4, 2020 (See Github)

I don't like the name pan and tilt, it is really camera motion controls - maybe we could call it something like that instead

Comment by @cynthia Mar 4, 2020 (See Github)

The terminology might be weird, but it's somewhat well-known. e.g. https://en.wikipedia.org/wiki/Pan%E2%80%93tilt%E2%80%93zoom_camera

Comment by @kenchris Mar 4, 2020 (See Github)

PTZ seems to be the name that people use, so ptz: true might make more sense than Pan-Tilt-Zoom: true as I saw in one of the PRs

Discussed Mar 16, 2020 (See Github)

Alice: No explainer - but came in before that requirement

Ken: Pretty sure nothing has happened since (last comment). This is a powerful feature that can be abused.. It would be nice if they wrote an explainer... Security & privacy questionnaire responses also not correct... it says "no" to everything...

Dan: on s&p, 3.2 (data) and you can use this to gain access to data that the user has not intended you to see... 3.4 (cross-origin-state) they have said no, but can you read the angle?

Ken: looks like you can read the angle..

Comment by @kenchris Mar 17, 2020 (See Github)

@riju would it be possible to write an explainer, which is a requirement today? Also @torgo has some privacy concerns, which conflict with all the NO answers in the questionnaire (high value data indirectly).

Comment by @kenchris Mar 17, 2020 (See Github)

Can you read (or deduce) the angle using this API? Things like that would be very valuable information in an explainer

Comment by @torgo Mar 17, 2020 (See Github)

Looking at this again. The security & privacy questionnaire answers seem questionable. In particular, (3.2) the technology may allow access to data that the user had not intended to show and (3.4) the tech may allow access to persistent state from our read of the spec. Can you please update the questionnaire and let us know what the mitigation strategy is for these aspects? @riju

Comment by @riju Mar 17, 2020 (See Github)

@torgo @kenchris : Thanks for your patience. We worked with UX/Privacy folks (from the Chrome team) and it looks like we need to make some changes to the API in terms of letting users know that the camera supports pan/tilt/zoom. For example, the model we are experimenting with now is -

Camera permission without PTZ (obtained using an older version of the user agent, or with another camera) is not implicitly upgraded to Camera permission with PTZ (even if the hardware supports it). The permission has to be re-requested.

We will come up with a solution in a few weeks and then ping TAG.
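The no-implicit-upgrade rule described above can be modeled in a few lines. This is only an illustrative model of the rule as stated in this comment, not a browser API; the function name and the `{ camera, ptz }` object shape are hypothetical.

```javascript
// Hypothetical model of the rule described above; not a browser API.
// `stored` is what the user has already granted, `request` is what the
// page is asking for now.
function needsPrompt(stored, request) {
  // No camera permission yet: always ask.
  if (!stored.camera) {
    return true;
  }
  // Camera was granted without PTZ: asking for PTZ is not an implicit
  // upgrade, so the permission has to be re-requested.
  if (request.ptz && !stored.ptz) {
    return true;
  }
  return false;
}
```

Under this model, `needsPrompt({ camera: true, ptz: false }, { ptz: true })` is true, while the same stored grant with a plain camera request needs no new prompt.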

Discussed Apr 6, 2020 (See Github)

No explainer still; milestone changed to next week

Comment by @kenchris Apr 7, 2020 (See Github)

This was on our breakout schedule but as there is still no explainer, this is stalled for now

Discussed Apr 13, 2020 (See Github)

Ken: they haven't done the explainer yet. I pinged them again.

Sangwhan: if they don't come back by our next milestone, suggest we close

Comment by @kenchris Apr 14, 2020 (See Github)

Friendly ping @riju

Comment by @kenchris Apr 14, 2020 (See Github)

The TAG has decided that if we don't hear back before our next meeting we will close this with [resolution: unsatisfied] to get it off our radar and concentrate on more active tasks.

Comment by @beaufortfrancois Apr 14, 2020 (See Github)

@kenchris I will be drafting the owed explainer in the following days.

Comment by @beaufortfrancois Apr 14, 2020 (See Github)

May I ask you kindly to keep this issue opened as I'm collecting info? The current COVID-19 situation doesn't help. Thank you for your understanding.

Comment by @kenchris Apr 14, 2020 (See Github)

Sure, sounds good - good to know you are working on this. @torgo let's postpone looking at this for a couple of weeks

Comment by @beaufortfrancois Apr 20, 2020 (See Github)

The PTZ explainer is now available at https://github.com/w3c/mediacapture-image/blob/master/ptz-explainer.md

Discussed Apr 27, 2020 (See Github)

Ken: I have progressed this.. We have an explainer now... it's pretty OK. You will get a dialog... it's up to the implementation whether or not the app will still get regular camera access if you deny permission.

... you will get video permission... then you (app dev) will have to check if you have access to pan/tilt...

Dan: that's to mitigate the fingerprinting...?

Ken: yes but also not to force users to accept a powerful permission. It's a specific other set of permissions that you will be shown in the permission prompt. Needs to work if you do or don't have the camera plugged in... Seems to be fulfilled.

... other thing... when a web site is not in focus (when you are not looking at the screen) then these capabilities will be disabled. Not yet in the explainer but they said they would consider it.

Dan: how much of what you just said is in the spec vs the implementation?

Ken: Not sure. They [chromium] are doing the implementation now.

Dan: [left a comment and bumped to next call]

Comment by @torgo Apr 28, 2020 (See Github)

Hi folks - we had a good discussion on today's TAG breakout with @kenchris and I just had a few followup questions.

  1. it was good to hear that there are mitigations against privacy issues being discussed, in particular a distinct permission prompt for pan/tilt/zoom and the idea that the feature will be disabled when the tab is not in focus.

  2. the explainer includes some of this info but not everything - can you please make it more clear?

  3. the explainer lacks a privacy & security considerations section (and lacks the word "privacy") and really needs to have this info explicitly called out for such a powerful API

  4. what aspects of any of the above are intended for the spec as opposed to the implementation?

Discussed May 11, 2020 (See Github)

Dan: [pings the issue to see if there has been any progress...]

Comment by @torgo May 12, 2020 (See Github)

Hi @riju - we are just following up on this one on our TAG breakout call today. Has there been any update / progress on the questions listed above?

Comment by @riju May 12, 2020 (See Github)

Hi @torgo, not much. We just landed the platform support for Windows and CrOS. Linux was working fine. We are still discussing with the Privacy folks about the details and then we can update this audience.

Comment by @torgo May 27, 2020 (See Github)

Hi @riju – As far as the API design goes, we are OK. However, we're going to mark this as "unsatisfied" because it really doesn't look like the security & privacy issues we've raised are being taken seriously. I would really encourage you to re-read our security & privacy self check and to add some explicit info to the explainer and to the spec covering abuse scenarios and mitigations against these scenarios.

Comment by @riju May 27, 2020 (See Github)

Thanks @torgo for the thumbs up on the API design. We will take another look at the explainer soon.

Comment by @beaufortfrancois Jul 3, 2020 (See Github)

@torgo We've finally added Security and Fingerprinting sections to the explainer.

Comment by @riju Sep 28, 2020 (See Github)

@torgo : We have had some long discussions regarding this API in the WebRTC group and it looks like there's now overall consensus among the stakeholders. Hopefully TAG is now satisfied with the Privacy and Security information we have added in the PTZ explainer.