#652: WebXR Raw Camera Access API

Opened Jun 24, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of WebXR Raw Camera Access API.

This specification introduces a new WebXR Device API capability, namely the Raw Camera Access API. The newly introduced API enables WebXR-powered applications to access camera image pixels, allowing them to leverage this new information to compute custom per-frame visual effects, or take a snapshot of the app-rendered content overlaid with the camera image.
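
As an illustration of the per-frame flow described above, here is a minimal TypeScript sketch. It assumes the draft's shape as discussed later in this thread (a camera object with a width and height hanging off each view, and a `getCameraImage()` call that returns a texture). The interface names `CameraLike`, `ViewLike`, and `BindingLike` are hypothetical local stand-ins, not the spec's types, so that the logic can be exercised outside a browser.

```typescript
// Hypothetical structural stand-ins for the draft API's objects.
interface CameraLike { readonly width: number; readonly height: number; }
interface ViewLike { readonly camera?: CameraLike | null; }
interface BindingLike<T> { getCameraImage(camera: CameraLike): T | null; }

interface CameraFrame<T> { texture: T; width: number; height: number; }

// Collect one camera texture per view that exposes a camera.
// Views without a camera (feature not granted or not supported) are skipped,
// as are cameras for which no texture is currently available.
function collectCameraFrames<T>(
  views: readonly ViewLike[],
  binding: BindingLike<T>,
): CameraFrame<T>[] {
  const frames: CameraFrame<T>[] = [];
  for (const view of views) {
    const camera = view.camera;
    if (!camera) continue;
    const texture = binding.getCameraImage(camera);
    if (texture === null) continue;
    frames.push({ texture, width: camera.width, height: camera.height });
  }
  return frames;
}
```

In a browser, a function like this would presumably be called from the session's animation frame callback with the viewer pose's views and a WebGL binding for the session; that wiring is omitted here.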

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: N/A.
  • The group where the work on this specification is currently being done: Immersive Web CG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Immersive Web WG
  • Major unresolved issues with or opposition to this specification: No major issues yet.
  • This work is being funded by: N/A.

You should also know that...

N/A.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

Discussions

Discussed Jun 28, 2021 (See Github)

[triaged]

Dan: Left comment

Comment by @torgo Jun 29, 2021 (See Github)

Thanks for sending this our way and thanks for documenting the user needs and filling out the security & privacy questionnaire. One question related to privacy - you note in the explainer that the feature does not currently exist due to privacy. However, when it comes to permissions you state:

If UA decides it needs to prompt the user for permission to use the camera, it can do so at this stage.

It feels to me like this needs to be stronger. Can there be a normative requirement to seek user permission or at least stronger language to that effect? It's good that you call this out in the spec https://immersive-web.github.io/raw-camera-access/#privacy-security but likewise it feels like this should be a stronger requirement. Also it feels like there should be more in this section that enumerates abuse scenarios and how the API proposes to mitigate against these abuses.

Comment by @bialpio Jun 29, 2021 (See Github)

Thanks for the quick response! I've landed on this phrasing since I'm viewing the explainers as a teaser to the devs that presents the API and what some of the steps may imply (i.e. it's important for the devs to know that there likely will be a permission prompt when using this feature). For the detailed descriptions, the spec should be the authority, which leads me to...

In the spec draft itself, I have tried to make it explicit that the UA must seek user consent. Unfortunately, I do not see a good way to include it explicitly in some algorithm, since session creation & seeking consent is handled by the WebXR Device API - the module just extends the core spec to add one more feature descriptor.

In the explainer I'll reword this to be a bit stronger while still being developer-focused, maybe:

As per WebXR Device API, the user agent will seek either explicit or implicit consent before creating a session. This may mean that a permissions prompt will be displayed to the user.

As for the spec, I'll add a note that elaborates on this a bit more to refer implementers to the sections in the core spec that they should be aware of.

Discussed Jul 12, 2021 (See Github)

Dan: looks like the requestor has engaged with our feedback and added material to s&p considerations section... I'd be happy to propose close given it's not a full spec but an addition. Should get Ken's feedback. Plenary..

Hadley: can of worms.... thinking about the fact that raw camera access is being able to access the images that the camera can see right? So in COOP and COEP which we were just looking at the webdev blog post was saying one of the main drivers was the spectre vulnerability, which through timing attacks makes it possible for a window to work out what data is in the image coming from another resource. I don't know how much that screws up what we're trying to do with web xr....

Dan: oh no

Hadley: on that note, plenary

Discussed Jul 19, 2021 (See Github)

Dan: I will ping Mike about this one - since we had a related security question on access to camera as an exploit...

Discussed Aug 23, 2021 (See Github)

Dan: there's a PR which addresses some of our feedback

Ken: do we already have camera access for web xr? What's different that makes it raw?

Dan: this provides more direct access to the camera pixels

Ken: texture, camera image.. are there different similar apis? They get an XR camera width and height, and getCameraImage which just gives you texture

Dan: you don't get that..

Ken: not much of a raw thing, more of a texture thing. Wondering whether people who work on media capture should look at this, whether they're doing liaison with these people. An alternative camera api.. there's no configuration of the camera here. Cameras today we can do pan/tilt/zoom. Why is that not useful in this situation? Especially turning my head, maybe I want to do panning. What about configuring the camera? I don't want HD I want SD... doesn't seem like I can do any of these things with this API. Seems like a new API to do a specific use case, and it's an alternative, so all the existing API features are just gone. Half a year from now we'll get a new request to add new features instead of starting out with that. This is a parallel API for a specific reason..

Dan: the current method of getting access to the camera, the website itself does not get access to the image of the user's environment. You get abstract concepts like being able to know where surfaces are, rather than actually seeing what the desk looks like. That works for many AR scenarios. But it doesn't work if you want to layer .. if you want to actually process the image, to provide a filtered version of it

Ken: i get it but it's still a different api that lacks a lot of features that i expect people would want

Dan: at the moment it's only AR concepts

Ken: only want SD stream, how do I select my camera? That's a basic thing. Just require camera access and you get this camera thing, but which camera is it? Is it HD, is it 4k? Seems to be a lot of things missing here. [leaves comment]

Dan: I'm still concerned about privacy. There's no additional scary privacy notice for raw camera access, so why wouldn't every AR request raw camera access, which destroys the mitigations of regular camera access. [leaves comment]

Comment by @kenchris Aug 24, 2021 (See Github)

I am a bit worried about us making parallel camera APIs, especially as this seems to be lacking a lot of features that I believe people would want at some later point.

  • How do I select which camera to use (external vs internal etc)
  • How to select quality, SD, HD, 4K?
  • How to do pan/tilt/zoom etc?

Comment by @torgo Aug 24, 2021 (See Github)

Hi - just coming back to this again. The PR you've referenced above is appreciated. I'm still concerned that with the presence of this API developers will always choose to request raw camera access even if they don't need it, thereby rendering the privacy-preserving aspects of regular camera access meaningless... Can the explainer or the spec go into more detail on why this won't be the case?

Comment by @bialpio Aug 26, 2021 (See Github)

I am a bit worried about us making parallel camera APIs, especially as this seems to be lacking a lot of features that I believe people would want at some later point.

  • How do I select which camera to use (external vs internal etc)
  • How to select quality, SD, HD, 4K?
  • How to do pan/tilt/zoom etc?

The goal here was to provide something that is tied to the camera that is currently being used by the XR system to provide the AR experience (for example, AR implementation in Chrome currently composes the camera feed with site's WebGL-rendered content, w/o offering any of those configuration knobs, and also w/o exposing the pixels to the site). I imagine that some of those settings would have to be exposed at XRSession creation (quality & camera selection, pan/tilt/zoom would not be available since the current spec text requires that camera image is aligned with XRView), but it may be something that remains fixed for the entire duration of the session (depending on whether it is allowed by the underlying AR framework that the implementations use). There was a long discussion, whose outcome was to pursue a simpler, XRView-aligned camera access API in order to cater for smartphone-specific use cases, while leaving the door open for a more general solution which would integrate with getUserMedia() APIs.
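
For contrast, the configuration knobs raised in the question (camera selection, quality, pan/tilt/zoom) map roughly onto constraints passed to getUserMedia(). The TypeScript sketch below only builds such a constraints object and never touches a camera. The helper name, the `CameraRequest` shape, and the resolution numbers attached to "SD", "HD", and "4K" are assumptions made for illustration, and support for the pan/tilt/zoom constraints varies between browsers.

```typescript
type Quality = "SD" | "HD" | "4K";

// Illustrative target resolutions; these numbers are assumptions, not spec values.
const QUALITY_TO_SIZE: Record<Quality, { width: number; height: number }> = {
  SD: { width: 640, height: 480 },
  HD: { width: 1280, height: 720 },
  "4K": { width: 3840, height: 2160 },
};

// Hypothetical description of what an app wants from the camera.
interface CameraRequest {
  deviceId?: string;
  quality?: Quality;
  wantsPanTiltZoom?: boolean;
}

// Local stand-in for the subset of video track constraints used here.
interface VideoConstraintsLike {
  deviceId?: { exact: string };
  width?: { ideal: number };
  height?: { ideal: number };
  pan?: boolean;
  tilt?: boolean;
  zoom?: boolean;
}

// Translate the request into a constraints object of the kind getUserMedia() accepts.
function buildCameraConstraints(req: CameraRequest): { video: VideoConstraintsLike } {
  const video: VideoConstraintsLike = {};
  if (req.deviceId !== undefined) video.deviceId = { exact: req.deviceId };
  if (req.quality !== undefined) {
    const size = QUALITY_TO_SIZE[req.quality];
    video.width = { ideal: size.width };
    video.height = { ideal: size.height };
  }
  if (req.wantsPanTiltZoom) {
    // Asking for these also asks the UA for permission to control them.
    video.pan = true;
    video.tilt = true;
    video.zoom = true;
  }
  return { video };
}
```

None of these options has an equivalent in the XRView-aligned API described above, which is the gap the question points at.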

I'm still concerned that with the presence of this API developers will always choose to request raw camera access even if they don't need it, thereby rendering the privacy-preserving aspects of regular camera access meaningless... Can the explainer or the spec go into more detail on why this won't be the case?

I do not think it is possible for us to guarantee that this won't ever be the case, we can only try to incentivize the developers to ask for what they need. We do not attempt to specify the UX in the normative text, but I'd like the UAs to make the distinction between "lower-privilege session" (w/o granting the app access to camera pixels, but exposing the information that ultimately is derived from camera pixels and IMU sensors) and "higher-privilege session" (the camera pixels are accessible to the app) understandable by the user when displaying permission prompts - if the distinction is clear, the raw-camera-access sessions should hopefully be more frequently rejected by the users. I'll add more text to the explainer.
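
To make the "ask for what you need" point concrete, here is a minimal TypeScript sketch of how an app might decide which features to put in its session request. It assumes the module's feature descriptor is the string `camera-access` and that features are listed as required or optional at session creation, as in the WebXR Device API. `buildSessionInit` and `PixelNeed` are hypothetical names introduced for this example.

```typescript
// Feature descriptor assumed from the Raw Camera Access draft.
const CAMERA_ACCESS_FEATURE = "camera-access";

// How strongly the experience depends on camera pixels.
type PixelNeed = "none" | "optional" | "required";

interface SessionInitLike {
  requiredFeatures: string[];
  optionalFeatures: string[];
}

// Build the session options so that camera pixels are only requested when the
// experience actually uses them; otherwise the lower-privilege session is requested.
function buildSessionInit(
  pixelNeed: PixelNeed,
  otherRequired: readonly string[] = [],
): SessionInitLike {
  const init: SessionInitLike = {
    requiredFeatures: [...otherRequired],
    optionalFeatures: [],
  };
  if (pixelNeed === "required") {
    init.requiredFeatures.push(CAMERA_ACCESS_FEATURE);
  } else if (pixelNeed === "optional") {
    init.optionalFeatures.push(CAMERA_ACCESS_FEATURE);
  }
  return init;
}
```

An app that only needs hit testing would call `buildSessionInit("none", ["hit-test"])` and never trigger the higher-privilege prompt.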

With all that said, the privacy efforts made by WebXR can all be sidestepped if the app asks for camera via gUM() & the user allows it. :frowning_face:

Discussed Sep 1, 2021 (See Github)

Dan: (summarizes his conversation with Ada)

Ken: the webrtc api - getusermedia - has all the configuration stuff that webxr raw camera access is missing.

Rossen: Right, is that a superset of the rawimage?

Dan: they can get access to the raw image but also need to request access to room geometry and other sensor data. So the bottom line is that this is achievable today without this API and the question is, should we enable these on WebXR and somewhat compromise privacy or not?

Left comment

Comment by @Sauski Sep 3, 2021 (See Github)

We do not attempt to specify the UX in the normative text

I think there's a great benefit to including non-normative descriptions / examples of UX that meet the expectations in the privacy considerations section of specs. Especially when the permissions model / user understanding is so integral to the privacy posture of the spec.

The WebNFC spec is a good example of this kind of approach.

Comment by @torgo Sep 8, 2021 (See Github)

Some additional thoughts. It feels like APIs like the upcoming marker detection API will be important for doing things in a privacy preserving fashion. While I understand that raw camera access will allow developers to do lots of things there need to be appropriate drawbacks and warnings because once you spin up this access you're really giving the web site everything.

I'm thinking especially of the cases where an application is designed to perform a certain function (e.g. you're at a restaurant and you scan a QR code to bring you to a web app that allows you to draw funny ears or hats on people's faces) that requires facial recognition and therefore raw camera, but then uses that same facial recognition data for a secondary use contrary to the user's expectations (e.g. correlating that info with other facial recognition info to build up a list of people who were with you at the table for sale to 3rd parties).

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

Comment by @bialpio Sep 8, 2021 (See Github)

I'm thinking especially of the cases where an application is designed to perform a certain function (e.g. you're at a restaurant and you scan a QR code to bring you to a web app that allows you to draw funny ears or hats on people's faces) that requires facial recognition and therefore raw camera, but then uses that same facial recognition data for a secondary use contrary to the user's expectations (e.g. correlating that info with other facial recognition info to build up a list of people who were with you at the table for sale to 3rd parties).

Yes, that is unfortunately correct. One thing to note is that the scenario you are describing is already possible on the web, without WebXR's Raw Camera Access API in the picture at all, so by introducing it, I don't believe we're weakening the platform.

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

I don't believe we had such discussions within the working group. Do you have specific examples of how could that work? I worry that introducing artificial drawbacks could cause the feature to be unusable, so we need to make sure we strike the right balance here.

Currently, the main limitation of the API that I'd argue falls into the "drawbacks" category is a requirement for the camera texture to align with an XRView - as a consequence, the camera texture has a narrower field of view compared to the image that the site could get using getUserMedia() APIs, & users have clear visual feedback on what exactly is shared with the site because the same texture is displayed to them (this could be suppressed by a malicious app rendering an opaque object across the entire viewport though).

Comment by @Sauski Sep 13, 2021 (See Github)

Beyond permissions, it really feels to me like there needs to be some additional drawbacks for use of this API that would encourage developers to use the privacy-preserving WebXR AR APIs instead unless they really need Raw Camera access. Have you considered this approach in the working group?

It's already challenging to correctly communicate the implications of the "privacy-preserving" WebXR AR APIs to users. Trying to communicate an intermediate level of protection, such that a user is meaningfully more likely to accept the WebXR raw camera permission over the regular camera permission, seems almost impossible.

Given that, I'm on the side of positioning this more as equivalent to the regular camera permission, and leveraging the strong existing user mental model around what granting camera access means. This seems better than the alternative of creating a new type of very-almost-camera that doesn't have the existing mental model to lean on.

Perhaps it's worth highlighting the fact that the site could have just asked for regular WebXR access, but is explicitly asking for camera, so users don't build the model that raw camera access is "just part of doing WebXR".

Comment by @torgo Sep 15, 2021 (See Github)

Discussing again in our virtual f2f with @kenchris and @atanassov. @kenchris also pointed out that it's not great having 2 different ways to get access to the camera with totally different areas.

On the privacy topic, some other factors:

People who are bystanders don't have a way to opt out of being part of this scenario. The fact that you can be logged into a service and through the use of this API expose other people to privacy threats is problematic.

If people allow access to camera image currently - it's because they want to do a specific thing - usually video call or take a picture or scan a QR code. In AR you are using the camera more freely around you and for a longer period of time thereby exposing more information. It feels more privacy infringing and therefore worthy of greater protection than purely camera access. Also one benefit of WebAR is that anyone can pick up their phone and start using it without additional software download. That lower barrier to entry also calls for a stronger system of privacy protection. It feels like there needs to be some additional mitigation designed into the API - not part of the permissions request but intrinsic to how the API works - that makes it more privacy-protecting than a similar approach on native platforms would be. Maybe that means fuzzing - maybe it means turning off access to something else. There must be privacy differentiators for WebXR that align with the ethical approach of the web - even if that means it's less powerful than a native equivalent.

Comment by @bialpio Sep 16, 2021 (See Github)

People who are bystanders don't have a way to opt out of being part of this scenario. The fact that you can be logged into a service and through the use of this API expose other people to privacy threats is problematic.

What do you mean here by "logged into a service"?

I think I'm missing something - is there a threat model document that I could read up on? All of the concerns that you are listing above are already possible via getUserMedia() if we assume a malicious website (camera web app / QR code scanner web app don't have to stop accessing the camera after it took a picture / scanned the QR code, and the API is accessible w/o additional software download). After a bad actor convinces the user to grant the permission for camera, the user and the bystanders are already potentially compromised - that's not something specific to WebXR's raw camera access. In a way, WebXR offers slightly more protection that is built into the behavior of the API (on smartphones, the camera is by default displayed to the user as well, and the field of view is more limited compared to what the camera could actually capture).

In the end, any API that allows the sites to access camera pixels can be used maliciously, and I'm worried that crippling this particular API will not increase users' security. On the contrary, if it turns out WebXR doesn't have an answer ready for the use cases that require access to pixels, app developers could fall back to relying on gUM() + SLAM algorithms to enable AR scenarios (we know of one existing product that does it now and is blocked from switching to WebXR due to the lack of a camera access API), in which case the UA is entirely out of the loop (while also potentially sacrificing battery life and the quality of experience, if not done right). The only outcome is that the barrier of entry to offer full AR experience while silently capturing the camera feed may be higher than with WebXR’s API, but there can always be a site that advertises itself as awesome-ar-experience.example.com, asks for camera permissions, fakes an attempt to enter AR, & immediately shows an "oops, something went wrong" message, but keeps recording the camera feed (or even falls back to WebXR for AR experience w/o raw camera access, but will start capturing the camera feed as soon as the user leaves the session).

Discussed Sep 20, 2021 (See Github)

Dan: I left a comment but no response yet.

Discussed Sep 20, 2021 (See Github)

Dan: I'm in the middle of trying to write a response to Piotr's response to my question about making it privacy friendly. Prompting is necessary but not sufficient. Needs to be some intrinsic part of the API that makes it more privacy friendly. Maybe again that's advanced fuzzing, I don't know. So I'll write a response. Discuss at plenary.

Comment by @torgo Sep 21, 2021 (See Github)

Hi @bialpio - very sympathetic to your point that developers will just use getUserMedia if this API doesn't provide the functionality they need. I think we want to make sure that developers make use of this API (and the WebXR stack in general) when they are doing AR or VR on the web. AR on the web was not possible until long after getUserMedia was written, so perhaps there needs to be a bigger discussion about permissions and prompting and risks of camera access in general - but that shouldn't block this work.

A privacy principles document including some threat model info is in development right now. We expect to have a first public working draft available soon. In the meantime, you can take a look at some of the relevant design principles such as it should be safe to visit a web page and ask users for meaningful consent when appropriate. To be clear: I am concerned about the threat model of web apps that collect more data than they should and use it for purposes other than what the person using the app expects. This concern is significantly amplified when the web application has access to the device's live video feed for a significant period of time. The thrust of the Ethical Web Principles (which was written to inform our other documents such as the design principles) is that the web must be a more ethical environment for people than other platforms (such as native apps) and it's at the design stage that we can make decisions that steer the web in that direction.

What do you think of @Sauski's suggestion that the prompting should highlight to the user that the app has requested special permission over and above regular WebXR AR? Again to be clear I don't think we can solve this issue only by adjusting prompts because of the problematic nature of prompts but it can be one mitigation.

Comment by @bialpio Sep 27, 2021 (See Github)

What do you think of @Sauski's suggestion that the prompting should highlight to the user that the app has requested special permission over and above regular WebXR AR?

I think we're on the same page here, I definitely agree we need to ensure the implementations give sufficient information to the users regarding potential implications of the choices they are presented with and I'll look at WebNFC for inspiration on how to best include this in the spec. I'm mostly wary of including normative text that would mandate the UAs to do one thing or the other, independent of the circumstances. One thing that comes into play here is that WebXR allows the implementations to infer user intent / consent.

To shed some light on the current, behind-the-flag implementation in Chrome: we display a different prompt based on the set of features that an app is requesting to be enabled in a given session (if raw camera access is requested, we will use a prompt that is distinct from what is displayed to the user when an app requires access to the less privacy-invasive features). This is still something that we want to iterate on with our privacy and UX teams, which is one more reason that makes me want to avoid mandating concrete UX in the spec.
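
The feature-dependent prompt described above could be sketched along the following lines. This TypeScript is a hypothetical illustration of UA-side logic, not Chrome's actual implementation: the prompt kinds, the wording, and the function names are invented for the example, and the `camera-access` descriptor is assumed from the draft.

```typescript
// Hypothetical prompt categories; a real UA may use different or additional ones.
type PromptKind = "ar-basic" | "ar-with-camera-pixels";

// Pick the prompt from the set of features requested for the session.
function choosePromptKind(requestedFeatures: readonly string[]): PromptKind {
  return requestedFeatures.indexOf("camera-access") !== -1
    ? "ar-with-camera-pixels"
    : "ar-basic";
}

// Invented wording. The higher-privilege text states that the site is asking
// for more than a regular AR session, following the suggestion made earlier
// in this thread.
function promptText(kind: PromptKind, origin: string): string {
  if (kind === "ar-with-camera-pixels") {
    return `${origin} wants to start an AR session and see your camera image. ` +
      "Regular AR sessions do not share the camera image with the site.";
  }
  return `${origin} wants to start an AR session.`;
}
```

Keeping the choice a pure function of the requested features leaves the UX free to change without touching the API shape, which matches the preference stated above for not mandating concrete UX in the spec.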

Again to be clear I don't think we can solve this issue only by adjusting prompts because of the problematic nature of prompts but it can be one mitigation.

I'd like for the mitigations to stay on the UX side of things, but that does not necessarily mean that they will be limited to consent prompts (one other example: displaying some visual indicator for the entire duration of a session in which the camera pixels are accessible to the app). Unfortunately, I do not think it would be possible to introduce the limitations in the API shape itself without causing the API to fail to meet its purpose, but I admit the only thing that currently comes to my mind is to throttle how often the site would be able to get the camera texture.
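
The throttling idea mentioned above could look roughly like this on the UA side. The TypeScript below is a sketch under stated assumptions: nothing like it is in the draft, and the class name, method name, and interval value are hypothetical.

```typescript
// Hypothetical UA-side gate limiting how often a site may obtain the camera texture.
class CameraTextureThrottle {
  private readonly minIntervalMs: number;
  private lastGrantMs = Number.NEGATIVE_INFINITY;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true if a texture may be handed out at time `nowMs`, and records the grant.
  tryAcquire(nowMs: number): boolean {
    if (nowMs - this.lastGrantMs < this.minIntervalMs) {
      return false; // too soon since the last grant
    }
    this.lastGrantMs = nowMs;
    return true;
  }
}
```

With a 100 ms interval, for example, a site rendering at 60 frames per second would receive a fresh texture on only about one frame in six, which shows the kind of trade-off against usability raised above.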

In general, do you think that the problem with the current spec lies in the API shape itself (implying that we should change something in it, including at the Web IDL level), or do you expect that we should be able to address the concerns by ensuring that the users can make an informed decision and keep being informed about the camera being in use (implying that the solution can stay at the UX level)?

Discussed Oct 25, 2021 (See Github)

[recap of joint session]

Amy: should raw camera access make a loud noise to alert everyone around you that you're using it?

Hadley: like camera shutter sound being mandated on phones in certain countries

Dan: what does Dom think of monday's session?

Dom: very complicated.. lots of useful things to explore and lessons to learn. Bringing more friction to devs or users so they go one way or the other.. the amount of fuzzing and preprocessing that a browser should do for privacy protections that are potentially very performance impacting, fuzzing realtime video streams would take a lot of resources. It's a balance between performance and privacy for the end user. Some of the work in terms of annotating video frames might have useful impact on this but it's so fuzzy at this stage it's hard to say it's a workable path. I agree it's an interesting issue and one that is worth further digging into, there are for sure lessons and patterns to be extracted. I didn't feel like we were yet in a position to identify these patterns.

Dan: anything else regarding prompts / permission requests we're missing? Is there a nuance about permission requests we should be thinking about more?

Dom: one aspect, not a clear picture, part of what has made the landscape evolve is that we are now exposing more compute intensive apis to browsers with webassembly, webgpu, soon webml, this changes the risk of what happens with data that can be collected. Handwavey discussions about the fact that these apis themselves should be gated one way or another. Don't know if it's a permission prompt, hard to understand, but making clear that they bring greater power so greater responsibility, so need greater gating. That was compounded by the fact that there was some research a couple of years ago showing that web assembly was mostly used for bitcoin mining. Not that i have a clear proposal in mind but there is a risk that we've been exposing end users to without them necessarily realising or understanding it, and whether there are things we could do to either reduce that risk or augment the exposure of the risk to those who suffer from it.

Dan: the idea of privacy budget has been floated.. part of the privacy sandbox thing.. but a general .. if you allow access to some very privacy infringing thing then you should be shutting down access to other things. That's a thing I've been trying to prompt webxr raw camera access folks in the issue but I don't feel like it's getting through. Should we be canonicalising that notion somehow better? Is that a ua specific thing?

Dom: my sense is that, groups have their own ideas of constraints, the problem is that sometimes when you put all those things together they don't fit the way you want them to fit. In the privacy picture you might be designing great privacy mitigations in the very specific context of your api, but if you bring other apis together, maybe the mitigations don't work, or even if they all work together they still leave huge gaps open because they were not designed as a whole. I feel like the TAG could come up with ways of exposing these holes that emerge from combining things together and proposing mitigations that are not per spec or per domain. The privacy budget is one way. In general looking at how taking several specs together create..

Dan: unintended consequences

Dom: yes. Something like a budget sounds like a good way of managing this diverse set.. like how you manage a budget of several departments. Not sure which proposal is the right one. But the TAG looking at the privacy budget of the platform is interesting

Hadley: been bouncing around in my head too. The idea of a privacy budget .. it's hard to think of it as a limited budget when you think about how much time and different ways we all interact with the web over such a long period of time. If I don't go outside that often I still go out let's say once a day and over my lifetime that's a lot of exposure. It's hard to empower the user.

Dan: not just the session, over all use

Hadley: not just in the silo of a specific feature, but in an artificial assumption of constrained time..

Dom: feels to me like the TAG driving a conversation on this with UX and privacy people of various vendors

Dan: we do have privacy taskforce, we could push this into privacy principles. With webxr raw camera access, the question is if you have ideas on specific.. we haven't got past the handwavey thing of what other things could be turned off if this api is turned on. Is that a real thing? Are there other things that would dissuade without crippling developers from always requesting this api when they want to do AR

Dom: one reason developers wouldn't do it is if they get better perf out of the browser based version. They get a better service to their end users. In general doing live video processing in js or webassembly is always going to be worse in terms of perf. If the browser does the work and just gives you events or frames you need to do your..

Dan: or special marker detection api

Dom: that's already one aspect of the friction. The question becomes what are the use cases where what the browser provides is too limited but still valuable enough that developers and end users are willing to pay the performance and privacy cost. We'd need more information from who is doing game based ar and why they wouldn't transition.

Discussed Oct 25, 2021 (See Github)

[intros]

Dan: the overarching question with the raw camera access api is, looking at the existing AR module of the webxr api set, it feels to me and I have come to understand that it's built in a way intended to preserve user privacy. Specifically so the web app does not get access to the camera image - that it gets access to room geometry. Yet there is still a warning because you're allowing access to a sensor so we want to be able to provide the end user with some kind of permission prompt, however the stumbling block of this review was that we were looking at this and saying now there's another api that is similar to the ar api but also provides direct access to the camera view and that also has a permission prompt associated with it. So one of the things I was trying to bring across in my review comment was why would I as a developer choose to use the less potent version of the ar api if I could get a version that grants access to the camera image, so therefore when ar is deployed in the wild, eg. I'm sitting at a restaurant and I scan a qr code and I'm jumped into a fun web ar experience, why wouldn't the developer just choose to use the raw camera access version of that and potentially open themselves up to additional privacy threats that they would not have had if the web app was using the non raw camera access of the ar spec. That's the discussion we had and we got into a conversation on the issue about additional mitigations that could steer developers towards the use of the standard augmented reality api unless they absolutely needed the raw camera access power.
The thinking is even if you provide a more scary warning if you're a regular web user and you're sitting down and you want to put the funny hats on your friends you're not going to pay attention to the prompt - it's necessary but not sufficient in order to protect the user from the privacy threats and end users might not have the mental model to understand what those privacy threats are, like that this means that everyone around me including strangers could have their faces recognised using some web assembly code and I've given access to that and there's no additional privacy information available, but because I'm already authenticated to facebook (for instance) that information can be correlated with other information and all of this creates additional privacy risks for the end user. Maybe we can come up with some ways to think about this that can move us in the right direction.

Rossen: let's go from here

Piotr: what I asked in the TAG review is do we think the changes we are thinking about to increase privacy of this api, would those mean we need to tweak the api shape itself or do we think we can keep all the mitigations in the less normative but still important privacy sections? What I'm trying to get at is do you think the api shape has good standing, can we defend it the way it is, or do we need to redo some of the work to introduce some privacy preserving mechanisms? I'm worried about how do we introduce them without crippling the api so it becomes unusable. If we are okay with the api shape itself and it's a matter of making sure the browsers make correct ui choices, that becomes trickier because we usually stay away from having normative text around browser UI, but I do agree there has to be something that incentivises the users at least to pay attention to what we say to them. That's the only mechanism we have to tell them about potential issues that they might encounter by accepting the permission prompt. How do we communicate in a way that is clear to the user and says everything we want to say? At the same time I know even webxr itself already has a lot of text around inferring intent, eg. text that says if an experience is a pwa that serves as inferring that the user intent is present because the user installed the app, and we want them to feel the same way as native apps where you would get the prompt when you install the app from the store and maybe when you launch for the first time.

Dan: I think this is exactly the issue. We really want people to use this api rather than the alternative which is simply hacking it on top of getUserMedia. But I would challenge you to think about are there normative things that you could do that would - not cripple the api - that would .. eg. in other cases with powerful apis with privacy infringing information the people working on those apis decided to introduce fuzzing on values. If you're worried about specific threats like facial recognition is there something we could do that would mitigate against that. Facial recognition may be one of the use cases and it's very useful, however are there other kinds of fuzzing that could be enabled?

Dom: in terms of normative reqs on UI the getUserMedia spec has normative requirements on privacy indicators for usage of cameras and ability to launch camera. In a way getUserMedia is a bad guide here, it provides access, once you have it you have access to everything, but it comes with a fairly high degree of normative guidance on what needs to be displayed to the user. You asked is this a conversation in the webrtc wg - it has been quite a bit of a conversation to get to this level of specificity on UI. Another evolution on getUserMedia in the context of realtime comms is we are now building the bricks for e2e encryption which will ultimately mean you can have something like what you are doing today without webar getting access to the camera feed, with the browser remaining in control. Doesn't work if you want to do processing of the media. The conversations exist with regard to what you can achieve when you want to do media processing.

Klaus: one alternative here is the getUserMedia based apps which bypass xr. Users have an all or nothing choice, either they grant camera access or no experience at all. With webxr we want to support this as an optional feature so the site can ask for raw camera access, but without it the experience will still work, with some features unavailable. Applications can see if it's a required feature.. the goal would be to support this middle ground so people have an actual choice. They can say 'do webxr yes/no' and if they say yes they get another choice with raw camera or not. And we want to support apps making the request later when they need the feature instead of upfront but that's still under active discussion. We want to give people the choice which is not a choice they have if the application doesn't use XR.
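
The optional-feature pattern Klaus describes could look roughly like the sketch below. It assumes the `camera-access` feature descriptor from the Raw Camera Access draft; `buildSessionInit` and `hasCameraAccess` are hypothetical helper names introduced here for illustration, not part of any spec.

```typescript
// Feature descriptor assumed from the Raw Camera Access draft.
const CAMERA_ACCESS = "camera-access";

// Shape of the dictionary passed to navigator.xr.requestSession().
interface SessionInit {
  requiredFeatures: string[];
  optionalFeatures: string[];
}

// Hypothetical helper: ask for raw camera access as an optional feature
// unless the experience genuinely cannot work without it.
function buildSessionInit(cameraIsEssential: boolean): SessionInit {
  return cameraIsEssential
    ? { requiredFeatures: [CAMERA_ACCESS], optionalFeatures: [] }
    : { requiredFeatures: [], optionalFeatures: [CAMERA_ACCESS] };
}

// Hypothetical helper: decide which code path to run once the session
// exists, based on the features the user agent actually enabled.
function hasCameraAccess(enabledFeatures: readonly string[] | undefined): boolean {
  return enabledFeatures !== undefined &&
    enabledFeatures.indexOf(CAMERA_ACCESS) !== -1;
}
```

In a browser the result of `buildSessionInit(false)` would be passed as the second argument to `navigator.xr.requestSession("immersive-ar", ...)`, and the session's list of enabled features (where the user agent exposes one) would be handed to `hasCameraAccess` to pick between the full and the reduced experience.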

Piotr: with different permission prompts - if we assume people don't read them there is not much we can do about it. And also to touch upon what Klaus mentioned this api is still not .. we still haven't gone through eg. UX review. I'm not the only stakeholder. It's hard for me to say what Chrome will say I'm allowed to do or not. It's also hard for me to mandate what Chrome UX team tells us to do in the spec. There are conversations that need to happen but I would be really cautious about saying exactly how things need to look or behave in terms of UX as that also differentiates browsers. I looked at the indicators for getUserMedia, that's something I want to explore. How we can bring into Chrome. As I've said I'm not the only one making those calls. Additionally there might be some tweaks in the implementation, but those should be fine. So we'll see how things go, I'd like to be really careful about mandating UX.

Rossen: Generally speaking your question was is the api on a technical level okay or are we worried about privacy and what does that mean for the user? My view is that the shape of the api is more or less fine. I don't see a reason why if I was a user I wouldn't use that api comfortably. The elephant here - privacy is the problem. The conversation can go two ways. The basic feedback is when you work with audio/visual sensors, especially if the bytes can pipe back to who knows what, we're not talking about user privacy alone. We're talking about the privacy of everyone around them. As soon as I turn my camera on and I want to demonstrate on someone how funny a particular object will be on top of a table in a restaurant, I'm infringing on the privacy of everyone around me who is part of those frames. I don't think that we have UX solutions today that are resolving this. Not on native let alone on web. This is the biggest hurdle I have had with this particular review. Are there other APIs that are today just as faulty? Yes. Should we make it that much easier? Let's not. If there's a technical grind that people have to go through in order to get something very similar to what you're proposing to allow, for the time being, if this is what it takes to preserve some public privacy as much as possible then perhaps this is how we go. I'm trying to approach your question in the most straightforward way. In the past we've discussed mapping, object identification, this is already a step to mapping someone's building inside without them knowing, and perhaps this is leaking someone else's information without their consent. There's no way today that we're approaching public consent in the form of this capability. That's where I'm at with this.

Amy: +1 what Rossen said

Chris: The intent is definitely not to make it easier to expose that information. That is a non-goal. Everyone understands that when you turn on your native app camera on your cellphone that is potentially an invasion of privacy of everyone around you. Any AR real world geometry API is already doing some additional disclosure. We want to make sure this is underscored. This is raw camera access, you are giving potential access to everything, this isn't any easier than giving access to all the camera. Shouldn't be made any easier than accessing the camera because it's exactly the same thing. We're not trying to skate past that point, we're trying to make it clear that this is an extremely powerful api and the main reason behind doing this in my opinion is basically to figure out what things we can make easier that are safer, but that's going to be a long journey to figure out.

Piotr: to make sure I understand - let's say that we take the privacy section out of getUserMedia and copy-paste it into this api, would that be a satisfying solution? Mandate the indicators that the camera is in use? I think that's the only gap we have right now between the behaviour of getUserMedia on android and our implementation. The permission prompt is displayed, with always the discussion of whether it's informed consent, which is something I'd like to improve. Let's say we do display the indicator when the camera is being used, would that be something that would be sufficient for this API to meet the privacy bar? Or do we want to tweak it even higher? We already have cases where apps built on top of getUserMedia do the same thing that native apps can do through ar core, because they run their own algorithms on the frames as they come in. This is something where right now we want to reduce the number of apps like this in the world, because we have a company that asks us to expose the pixels so they can build features on top of them, but it will allow them to switch over to webxr and hopefully it'll allow them, for the experiences that don't need camera access, to not ask for it. I'm worried that if we don't have something like this in webxr people that are motivated enough will still have access to the camera and they'll present the same experiences but they'll get no benefits that webxr would give them. That's something we have to take into consideration.

Dan: understood.

Dom: one thing that is very different with webxr in ar mode is it's very much expected it will be looking around you which isn't the case with getUserMedia which is usually looking at your own face. You can reasonably understand what it means to have a camera recording you; understanding the fact that whichever company you've opened your camera to, to have fun putting pokemons around you, can now figure out where you are at that time, that's not something any user would create a good mental model of.. I'd love if we can have this virtual camera that can fuzz faces by default and you have to have another nudge to unfuzz faces, but where do you start

Dan: a lot is based on research that has gone on in various places that hasn't been really stitched together. We do have a lot of received wisdom regarding things like how useful prompts are, depending on who you ask you get a different answer. I think it is clear intuitively that I don't think end users will understand the difference between you're giving me ar access so I'm seeing my camera image so I know it's on, vs raw camera access, which from my perspective will be a very similar user experience, but also the web application has access to the camera image. That requires a sophisticated understanding of how the web works and how computers work, but most people don't have that understanding. To Dom's point about when you're using getUserMedia you expect that you're sending your camera image, but if you're using it for an AR session you may not understand the other people you're putting at risk. Are there other mitigation factors that could be put in place that could steer the developer towards using the regular ar unless they absolutely need raw camera access?

Klaus: hypothetically the browser could detect faces and autoblur - the flip side is people might be looking at browser source code and seeing it's permanently doing facial recognition while ar is active, what is the evil company doing with that. And there is a difference between facial recognition and facial detection. Be nice to fuzz things that are concerning, but how feasible? If you blur pixels enough to make face recognition impossible it would likely make the use cases impossible. It's a hard question. About getUserMedia, as far as the api shape is concerned there wouldn't be anything stopping people from making a getUserMedia request during an AR session, not in Chrome's implementation, but if we really want to not use the camera in an AR session it would not be enough to say no raw camera access, you would also need to say you can't have getUserMedia during an ar session, nothing that forbids that

Peter: I'm hearing this doesn't enable a specific feature for XR but it's a layer that allows the building of other features. In many cases we have that these other features could be built in a privacy preserving way, it's this raw part that's the dangerous bit. We've said the low level raw access does not belong on the web. Eg. custom shaders. Just because we need this to experiment doesn't mean this is something we need to expose to the entire world / entire web. Maybe we want to develop the higher level features and ship those instead. That is an option here. We don't have to do this just to enable the experimentation.

Dan: I think they would say yeah but you can do it already with getUserMedia

Peter: I know it's not that simple. The low level is not a good enough excuse to open the privacy door.

Piotr: what I'm hearing - it seems that we might be able to proceed assuming we make the api more painful to use from the users perspective so developers will be disincentivised to ask for raw camera access. Two prompts in a row, users will not be okay with answering without thinking. I just said yes, why are you asking a second time? Might be something we can lean on to say that this is now becoming a more informed consent. Maybe throwing in the indicators that the camera is in use. What kind of mitigations can we put in place for this api to be okay from the privacy standpoint? Assuming we compare to getUserMedia, the developers can always fall back to that if there is something here they cannot accept. To answer Peter - the use case that I've seen that I was impressed by is if we were thinking about introducing image tracking and I see... some ways of doing object tracking. There is a use case that was shoe detection. The point was we don't know what people will want to do with it so it's hard for us to write a privacy preserving api in such a way that it's extensible to allow for all those use cases. I don't think we want a brand new shoe detection feature built into webxr. But it is possible with raw camera access. The other example I've seen is we are thinking about image detection, but now someone is saying cylindrical image detection, eg. an image printed on a bottle label that allows experiences when you're holding a product from some company and you can interact with it. The point is we might not be able to standardise everything in a privacy preserving way, but it might give us ideas on what is good to standardise to have this feature out there and see what people do with it. Maybe there are things we can make work in a privacy preserving manner.

Dan: in the discussions in getUserMedia context, have there been discussion about additional privacy features which might be added in future?

Dom: lots of discussions about device selection to reduce fingerprinting. And about media streams in the context of webrtc, at least give primitives to allow transmitting streams instead of bytes. Exposing object recognition detection.. in context of webrtc but it's built on top of getUserMedia media streams. Maybe there is another hook for additional privacy protections or scary prompts, I'm not sure what yet. That may be another place for exploration and experimentation.

Peter: maybe possibilities for adding extension points for doing additional types of object detection that don't require exposing the entire full media stream. An API that gives you snippets or a lower framework or something to a specific api that allows you to use a worker to return some object information that's isolated or sandboxed. Has that been considered?

Dom: that's kind of what was discussed at a session, only kind of.

Alex: I would caution against making it that much harder to do raw camera access than doing get user media. It's not speculative - there are concrete examples of developers that do ar with getUserMedia right now. In a typical get user media session you're not doing xyz... developers use it and users are doing the exact thing they're doing in ar. You're incentivising users to move to built in ar apis, better power usage etc, and still improving user experience at the same level as if they're using getUserMedia. On object recognition, I have in the past looked into doing face detection, as I was talking to some devs about that, it wasn't something we pursued, one of the hard issues with standardising something like that is different people expect different things they can do. Expect the face mesh to look a certain way, more points, less points, more generic regions. Once you start looking at object recognition the field of what people want explodes. There are existing wasm modules and js apis that can do this detection but also you can do the shoe detection but because you can't do this other thing they still would just use their full pipeline.

Dan: it sounds to me there's the possibility there might be additional locking down of getUserMedia. It was designed at a time when web assembly didn't exist and it wasn't possible to do a lot of this image detection in the browser, these privacy doors weren't opened up. We can't divorce it entirely but you need to split apart what you can do with getUserMedia from what you want to do here.

Piotr: the sandboxing part, we have been thinking about previously and based on the feedback I got from eg. editor of core webxr spec, he's saying there have been ideas around sandboxing but there's no real good way to avoid the side channel leaking of the information. Even if you have a locked down sandbox with access to camera pixels there are still ways to get information out of that sandbox by inspecting how long things take for example. I don't have a good way to answer is there a way to extend it in a privacy preserving manner

Peter: not to design the sandbox, we do have things coming out to create more locked down environments. My concern was more has this been considered. Maybe should be explored before we just expose the camera. If you're worried about side channel there are ways to mitigate - adding stuttering, delays. Has that been explored? Maybe we should take time to explore that before we just expose everything to everyone. I understand you can already do a lot of this with getUserMedia but it's not permission to just keep doing it - maybe means we need to lock down or get rid of getUserMedia because it's too dangerous. We can look at what is getUserMedia used for and support higher level apis that support that without giving access to media. Higher level principle of just cos we can do something doesn't mean we should continue to do that and use it as an excuse to do it more so somewhere else.

Dan: I have appreciated that you've all joined us today. I hope that what's coming through is that you're understanding we're trying to be constructive and it is an extremely difficult.. this is the most difficult kind of design choice from an architectural standpoint for the web that we have going right now. This is the point between new features and privacy and danger and that's exactly why we're spending time and energy on it. What I'm hoping... maybe you all could brainstorm and come back with additional ideas about normative privacy mitigations that might when taken together. That has to be more than an additional prompt. I agree that it will be noise to people. There has to be something additionally that.. could the webxr folks agree to talk about this more and bring additional thoughts back to the issue?

Piotr: I want to follow up with the UX team to see what they think. Let's go back to the discussion then. I think the way forward might be, I'll try to figure out if we can have the same ways to bring this proposal to parity with getUserMedia, maybe it'll become easier to talk about to see what we can do above what getUserMedia does. Maybe there are ways to lock it down a bit further. I also want to say this api is strictly weaker than getUserMedia. You don't get the choice of the camera. We don't give you the same field of view that the camera actually can do. What the user sees on their screen is exactly what the site gets. Not the case with getUserMedia

Dan: documenting all that in your mitigations is really important

Piotr: if we got rid of getUserMedia we'd still have the tag meeting using webxr... We have to figure out how to live with this feature and how to make it work.

Dan: I hope that we can.

Comment by @torgo Nov 2, 2021 (See Github)

Hi folks - I read this today and felt it was relevant to our discussion https://www.eff.org/issues/xr. EFF are pointing out some privacy risks associated with XR (in general, not WebXR specifically) and call for (among other things) "privacy-by-design engineering". I think that's what we're doing here and what the TAG are calling for when it comes to the design of this API.

Comment by @torgo Nov 3, 2021 (See Github)

Hi @bialpio - has there been any activity on your end on additional privacy mitigations since we spoke last week?

Discussed Nov 8, 2021 (See Github)

Dan: they have responded, sounds like they're still waiting [writes comment]

Amy: response about locking down other APIs looks good, if they're actually going to do it. Even if they just mention it in privacy considerations as potential mitigation that'll be a start

Comment by @rhiaro Nov 8, 2021 (See Github)

Another option (as well as the visual indicator) for alerting the user - and in particular other people in the vicinity whose privacy may unknowingly be compromised - could be to require a sound, in a similar way to how camera apps are required to make the shutter sound in certain jurisdictions.

What other web platform APIs could be completely prohibited from use while the Raw Camera Access API is in use, without completely removing the utility of Raw Camera Access? Something like this might encourage developers to only use it if they really really need to, as well as protect privacy by restricting what other kinds of information can simultaneously be transmitted back to the origin.

Comment by @bialpio Nov 9, 2021 (See Github)

Hi @bialpio - has there been any activity on your end on additional privacy mitigations since we spoke last week?

Not yet, I still need to reach out internally to get some guidance from the UX team and Privacy team on how we could ensure we communicate that the camera is in use to the users. We'll likely postpone the OT until we have something that we can show to the developers (part of the reason for an OT is to also get feedback on the UX of the API, so it makes no sense to show something that we aren't sure is final).

Another option (as well as the visual indicator) for alerting the user - and in particular other people in the vicinity whose privacy may unknowingly be compromised - could be to require a sound, in a similar way to how camera apps are required to make the shutter sound in certain jurisdictions.

This may be something worth exploring, although I imagine emitting a sound w/ some regular cadence for the entire duration of a session could get tiresome to the users. Are we worried that a malicious app would be able to drown this sound out? At least by doing it, it will be obnoxious to the bystanders.

What other web platform APIs could be completely prohibited from use while the Raw Camera Access API is in use, without completely removing the utility of Raw Camera Access?

We'd need to lock down any kind of API that allows communication with the outside world (XHR, fetch(), maybe history?), and any kind of API that allows the app to persist state (local storage, session storage, file / filesystem?), otherwise, we're risking that whatever was extracted from the camera feed during XR session gets leaked to a server after the session has finished. We'd also need to clear the state of the script once the XR session has ended (so probably reload the site on session end).
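
A minimal sketch of how such a lock-down list might be expressed, assuming a hypothetical policy table keyed by API name. Nothing here is a proposed or existing browser mechanism; the table entries only mirror the examples named above, and blocking unknown APIs by default is an added assumption.

```typescript
// Rough capability classes from the paragraph above: APIs that talk to the
// outside world, APIs that persist state, and everything else.
type Capability = "network" | "persistence" | "other";

// Hypothetical policy table. The keys are illustrative labels, not a normative list.
const CAPABILITY_BY_API: { [api: string]: Capability } = {
  XMLHttpRequest: "network",
  fetch: "network",
  history: "network", // listed only as a "maybe" above
  localStorage: "persistence",
  sessionStorage: "persistence",
  fileSystem: "persistence",
  requestAnimationFrame: "other",
};

// Returns true if the API would be unavailable while raw camera access is active.
// Assumption: APIs missing from the table are blocked (fail closed).
function isBlockedDuringRawCamera(api: string): boolean {
  if (!Object.prototype.hasOwnProperty.call(CAPABILITY_BY_API, api)) {
    return true;
  }
  return CAPABILITY_BY_API[api] !== "other";
}
```

The table makes the cost visible: anything that can send or store data ends up blocked, which is why the session would also need to discard script state when it ends.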

Comment by @torgo Nov 9, 2021 (See Github)

Ok thanks for letting us know the status. Regarding locking down other APIs - is this something you're actively looking into? Would you, for example, document that in your privacy considerations? In any case, please ping us when you have an update. Meanwhile we will put this on the agenda for our next f2f in mid December to circle back to.

Discussed Dec 1, 2021 (See Github)

Dan: I understand from talking to the group chairs that there is work ongoing on figuring out how to address the issues we've raised. No response on our issue yet but I think the work is going on.

Discussed Feb 21, 2022 (See Github)

Dan: they are still working on coming up with an alternative design.

labeled - and Amy assigned

Discussed Feb 28, 2022 (See Github)

Dan: Will ping group chairs.

Comment by @tangobravo Mar 30, 2022 (See Github)

Is there any distinction to be made between headsets and handheld devices here?

Currently with handheld "WebAR" experiences based on getUserMedia, the camera frames are composited with content into a WebGL canvas. That canvas can be captured via captureStream() and Media Recorder, and then shared via Web Share. This is all with using client-side APIs, and with user consent for camera access. It seems the right balance to me between privacy and capability. It effectively allows many of the fun "AR filter" effects from social media apps to be available on the web. It's also useful for applications like product visualisation, combined with an easy photo capture + share mechanic to send on to a family member.

It would be a shame if WebXR sessions (which have the potential to offer better quality tracking with lower power usage) were not able to be used in the same way, I don't see any major difference privacy-wise on handheld devices vs the getUserMedia approach.

It seems a non-starter to have an audible announcement every 10 seconds "RECORDING IN PROGRESS" for any use of the camera in the web - that would get annoying very quickly in Google Meet calls...

At least in the EFF article the main concerns seem to be around the always-on nature of headsets. One option would be strongly recommending some form of visual indicator that is visible to bystanders when camera access is used? Not all hardware (Oculus Quest?) would have the capability, but it's likely more of a concern for some future always-on AR eyewear.

Discussed Apr 4, 2022 (See Github)

Dan: recent comment from someone in the community. To my knowledge they're still working on this. This topic came up in the mini workshop we ran yesterday with the ethical ML work stream because the abuse cases we were talking about in webxr raw camera access issue are relevant to the use of ml on the web because one of the abuse cases was piping a raw camera image to some ml module doing facial recognition. How the user might not expect that.

Sangwhan: if it's running locally it's better than being sent over the wire..

Discussed Apr 25, 2022 (See Github)

Dan: There have been some discussions on this at the Immersive Web WG f2f last week, and minutes aren't published yet. But some discussion about lowering the permission barrier on webXR AR in the first place (doesn't provide the application access to the raw camera image) and the Raw Camera Access API, which does.

I worried why would anyone use the lower, more privacy preserving one, if the user has to agree to a permission request either way? Same amount of effort either way.

I don't know the output of that. Discuss at plenary.

Discussed Jun 6, 2022 (See Github)

Amy: a bit of an impasse... Idea of blocking access to other APIs to make it inconvenient..

Dan: https://github.com/immersive-web/webxr/issues/1269 - if you can remove the need for privacy prompt, you would require it for raw camera access, would be a mitigation, and encourage people to use the non-prompting, more privacy preserving version

Amy: yes that does sound interesting. There needs to be a clear advantage. Not having a permission prompt is a step in the right direction. Could we ask for a list of APIs that are essential for these use cases and turn everything else off?

Dan: Makes sense but they're not very receptive to this kind of approach because they think people will just work around it. The reason I like the non prompted version is it gives more convenience to the developers who are using the privacy preserving one

Amy: any other positive reinforcement we could give alongside that?

Dan: I'll liaise with Ada about this

Discussed Jul 11, 2022 (See Github)

Amy: They've changed SHOULD to MUST when it comes to privacy indicator.

Dan: really good.

Sangwhan: this looks good to me.

Dan: the only thing that stops me from saying lets just close it is I know Rossen had some strong feelings about it regarding privacy implications, and he's not here.. wait for feedback

Peter: privacy indicator.. just going to be a camera icon or something?

Dan: they strengthened wording around prompting as well. They have an issue about lowering the prompting requirements for webxr AR which would then provide greater differential for raw camera access. Part of the issue is that we were requiring the same level of prompting for both, and I think that has been resolved in that the prompting for webxr ar without raw camera access is optional and that is based on the fact that it has to be up to the UA because some UAs might be specialised, eg. headset, where you're obviously entering into an AR session because it's an AR headset, so based on context. Leaves it slightly up to the UA but the prompting in raw camera access has strong wording. Their PR also strengthens language around prompting, and talks about a prompt that would ask the user if they want coarse or fine camera access. A differential and explaining to the user a bit more what is going on.

Amy: concerned we've lowered the bar but it does feel like progress. They've not reduced the prompting for AR - it was always optional.

Peter: is it about the resolution? or fingerprinting surface as well?

Dan: about access to the camera image itself via the web api, whereas AR just gives you room geometry.

Peter: wasn't there something more privacy concerning about this than getUserMedia?

Dan: only the context in which people are using it, and you're feeding even more metadata, because in addition you're giving all the room geometry stuff.

Peter: seems like the hard part is educating the user about what the privacy implications are when you're asking for permission. Saying 'coarse' vs 'fine' doesn't sound like it's covering all of the issues here. Sounds like the resolution.

Dan: it's not exactly precise. 'coarse' in this case means ... it's suggested wording, it would be up to the UA

Amy: they have a few examples of how to do the prompt, and language about the UA introducing friction to discourage unnecessary use

Dan: I feel like our work here is done. Just waiting for Rossen's comment.

Discussed Jul 11, 2022 (See Github)

Amy: PR that talks about loads of the stuff we care about

Dan: I chimed in on this issue - I said I wanted to close but we're blocked on how powerful the API is. Saying that with regular AR functionality WebXR is already privacy preserving so maybe it doesn't require the level of prompt, but for raw camera access because it allows the raw camera image it would require a prompt, so that would be a nudge that would encourage developers to use the regular AR functionality when they don't need the full power of the camera image.

Amy: a bit about the permission prompt having 2 options - fine access, coarse access.. Not exactly the same... It's helpful. That example of what it could look like is good. Gives good info to the user on what they're agreeing to.

Dan: yeah I agree it looks really good. Prompt on regular AR access is already optional in the spec, eg. to enable experience with a headset and your expectation is not that you're going to have to answer a prompt to say yes I want AR when I put on AR glasses. UA should make the decision based on context.

looking at privacy indicators section

Amy: it's a should not a must ... "should attempt" is a bit weak. I'd accept "must attempt."

Sangwhan: it should be "should"

Amy: I wonder why it can't be a "must"

Dan: I agree, I think it should be a must to allow access to the raw camera.

Amy: I'd like to hear what their really good reasons are for it to not be a must. Maybe it's "because goggles" - but I'd like it to be spelled out.

Dan: agree, that's why you have access to the less privacy invasive version of the API

Comment by @rhiaro Jul 12, 2022 (See Github)

Thanks for your work on this @bialpio. We have left some feedback on the PR about the strength of the statement about the privacy indicator, but overall we are very happy with the direction this is going.

Discussed Jul 18, 2022 (See Github)

Amy: they've taken into account our feedback, we can close

Comment by @torgo Jul 19, 2022 (See Github)

Thanks for taking our feedback into consideration! We really appreciate the time and energy you've put in on this. We look forward to seeing the work progress.