#403: WebXR Device API
Discussions
Discussed
Aug 21, 2019 (See Github)
Alice: I started reading it - I was wondering if I might do an editing pass? Explainer text - I would make a PR - here are a bunch of suggested edits, none of which you are obligated to accept.
Dan: I think that would be well received. Let me check with Ada.
Alice: Also I feel it's longer than it maybe needs to be. Some stuff I didn't fully understand...
Alice: it's very thorough - all the info that needs to be in there is in there. Didn't have any specific questions about the spec (API) yet.
Dan: I got a ping from Nell asking about them joining us to talk through it - I suggested TPAC.
Hadley: I'm not clear why the WebVR API is deprecated and therefore why replace it? Sure there are good reasons but would be good to understand the journey.
Dan: I'll ask Ada about getting some time on their Agenda at TPAC
Comment by @toji Aug 22, 2019 (See Github)
Hi TAG members! We saw that one of the questions that came up while reviewing this API is what relationship it has with WebVR. That's an excellent question, and one that we felt justified answering in our explainer. We just added a new section towards the end to cover the topic, copied here for convenience. (The short version is "WebXR is a replacement for WebVR, developed by the same group.")
What's the deal with WebVR?
There's understandably some confusion between WebXR and an API that some browsers have implemented at various points in the past called WebVR. Both handle communication with Virtual Reality hardware, and both have very similar names. So what's the difference between these two APIs?
WebVR was an API developed in the earliest days of the current generation of Virtual Reality hardware/software, starting around the time that the Oculus DK2 was announced. Native VR APIs were still in their formative stages, and the capabilities of commercial devices were still being determined. As such, the WebVR API developed around some assumptions that would not hold true long term. For example, the API assumed that applications would always need to render a single left and right eye view of the scene, that the separation between eyes would only ever involve translation and not rotation, and that only one canonical tracking space was necessary to support. In addition, the API design made forward compatibility with newer device types, like mobile AR, difficult, to the point that it may have necessitated a separate API. WebVR also made some questionable decisions regarding integration with the rest of the web platform, specifically in terms of how it interacted with WebGL and the Gamepad API. Despite this, it worked well enough in the short term that some UAs, especially those shipped specifically for VR devices, decided to ship the API to their users.
In the meantime the group that developed WebVR recognized the issues with the initial API, in part through feedback from developers and standards bodies, and worked towards resolving them. Eventually they recognized that in order to create a more scalable and more ergonomic API they would have to break backwards compatibility with WebVR. This new revision of the API was referred to as WebVR 2.0 for a while, but eventually was officially renamed WebXR in recognition of the fact that the new API would support both VR and AR content. Development of WebXR has been able to benefit not only from the group's experience with WebVR but also from a more mature landscape of immersive computing devices that now includes multiple commercial headsets, the emergence of both mobile and headset AR, and multiple mature native APIs.
WebXR is intended to completely replace WebVR in the coming years. All browsers that initially shipped WebVR have committed to shipping WebXR in its place once the API design is finished. In the meantime, developers can code against WebXR, relying on the WebXR Polyfill to ensure their code runs in browsers with only WebVR implementations.
We also wanted to ask for the TAG to weigh in on a technical issue we've encountered with how WebXR interacts with Feature Policy. The full issue is detailed in WebXR Issue 768, but that's a long read and assumes some prior contextual knowledge, so I'll simplify it here:
On some devices (such as phones) WebXR surfaces motion data that is effectively a re-packaging of the data exposed by deviceorientation events or the generic sensors APIs. (In fact, the polyfill relies on deviceorientation to function on mobile devices.) It's not exactly the same, as WebXR applies various motion prediction and skeletal modeling algorithms to the data to better serve the API's purpose, but they're close enough that a motivated developer could use WebXR as a deviceorientation alternative if needed.
(Please note that this does not apply to devices such as tethered headsets connected to a PC, as they would not have their motion data exposed through deviceorientation/generic sensors.)
The question then is: if a developer has specified through Feature Policy that WebXR is allowed but one of the sensor APIs which surface related data is blocked, should WebXR also avoid surfacing that data? This would result in WebXR reporting that it is unable to support VR content on mobile devices, while allowing desktop devices in the same circumstances, which seems difficult for developers to predict and test. On the other hand, if we allow WebXR to surface data similar to that of blocked APIs, it may be possible for developers to use WebXR to polyfill the other sensor APIs, subverting the presumably intentional blocking of those features via Feature Policy.
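For concreteness, a hedged sketch of the scenario (feature names are illustrative of this era: `document.featurePolicy` was Chrome's introspection API at the time, and the WebXR feature was later standardized as `xr-spatial-tracking`):

```js
// Page served with:
//   Feature-Policy: gyroscope 'none'; accelerometer 'none'
// while WebXR itself is not blocked.

// document.featurePolicy was Chrome's introspection API at the time.
const sensorsAllowed = document.featurePolicy.allowsFeature('gyroscope');

navigator.xr.supportsSession('immersive-vr').then(() => {
  // A tethered desktop headset succeeds regardless of sensorsAllowed.
  // On a phone, WebXR poses repackage the blocked sensor data -- should
  // this promise reject instead, making support differ by device class?
}).catch(() => {
  // Unsupported... or blocked by policy? Hard to distinguish and test.
});
```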
Given that this seems to be a novel situation for the web platform, with the potential of setting precedent for how other APIs interact with Feature Policy in the future, we wanted to get the TAG's opinion before finalizing how WebXR will handle this situation. Any insight you may have is appreciated!
Comment by @alice Sep 3, 2019 (See Github)
Thanks for raising this review! I had a read through the spec (and, as discussed offline, sent a PR attempting to make some aspects of the explainer more concise and readable) and I came up with some questions/thoughts:
Accessibility
Obviously I'd like to see the question of accessibility addressed sooner rather than later.
I am looking forward to the session at TPAC dedicated to this question, but I noted that the Goals section lists only "Display imagery on the XR device at the appropriate frame rate" alongside "Poll the XR device and associated input device state".
That seems overly narrow even leaving the question of accessibility aside, given that many existing immersive experiences include sound and haptic feedback. In particular, though, for innovation in XR accessibility to be possible, authors will need the ability to control different modalities for conveying information about the virtual space which is being presented.
Could the Web Audio, Vibration and Gamepad APIs make use of `XRViewerPose` to provide this immersive experience? How does that work with the frame-based mechanism for updating the `XRViewerPose`? Could the explainer (or another document) provide some example code for linking these things together?
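For concreteness, a hypothetical sketch of the kind of linkage being asked about, driving the Web Audio listener from the viewer pose each frame (`audioCtx`, `session`, and `refSpace` are assumed setup, not names from the explainer):

```js
// Hypothetical sketch: update the Web Audio listener from the viewer pose
// on each XR frame, so spatialized sounds follow the user's head.
function onXRFrame(time, frame) {
  session.requestAnimationFrame(onXRFrame);
  const pose = frame.getViewerPose(refSpace);
  if (pose) {
    const { x, y, z } = pose.transform.position;
    const listener = audioCtx.listener;
    // AudioListener position is exposed as AudioParams in modern Web Audio.
    listener.positionX.value = x;
    listener.positionY.value = y;
    listener.positionZ.value = z;
  }
}
session.requestAnimationFrame(onXRFrame);
```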
For users who require magnification, might it make sense to have an option on the viewport to perform appropriate scaling automatically?
There are also some interesting use cases around accessibility mentioned in the research document linked above, which might make good motivating examples:
- virtual assistive devices (e.g. a "virtual cane") for navigating virtual environments
- I could also imagine an assistive technology which used augmented reality techniques to provide assistive feedback for the real world - such as using spatial audio to warn people with visual impairments about hazards at head height, or providing subtitles to real life for Deaf/hard of hearing individuals
- simulation of disability in virtual reality, both as a rehabilitation aid and a training aid
- virtual environments designed to be therapeutic to individuals with cognitive differences
- virtual exploration of architectural designs developed according to Universal Design principles
Explainer/API questions
- Please add a table of contents!
- The explainer lists immersive video as a use case. Why would we not design an extension to `<video>` for this case?
- Why does `navigator.xr.supportsSession()` not return a `Promise<boolean>` rather than rejecting in the case that the session type is not supported? That would seem like a better semantic match to the wording of the method, as well as not requiring the author to program a simple feature detection step in a try/catch style. (A sketch contrasting the two styles follows this list.)
- Could you elaborate on why `inline` is the default?
- Naming: it seems like the `vr` in `immersive-vr` is both redundant and inaccurate (since it doesn't encompass AR). Could it just be `immersive`?
- The explainer doesn't provide example code for avoiding or handling WebGL context loss. Is the author supposed to avoid it altogether by using `makeXRCompatible()`, or are there other factors to consider?
- Similarly, a code example for handling XR device changes would be useful.
- Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle `getViewerPose()` failures?
- Might it be helpful to provide a code example showing how to use the `transform` property of the `XRViewerPose` value?
- Could you expand on the key concept of a Layer?
- What are the key differences between an `XRWebGLLayer` and a `<canvas>`?
- When might a session be blurred?
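As promised above, a sketch contrasting the two feature-detection styles (the boolean shape shown second is hypothetical here; WebXR eventually shipped the idea as `isSessionSupported()`):

```js
// As specified at review time: rejection signals "unsupported", so feature
// detection needs try/catch.
async function vrSupported() {
  try {
    await navigator.xr.supportsSession('immersive-vr');
    return true;
  } catch (e) {
    return false;
  }
}

// The shape being suggested: resolve with a boolean instead of rejecting.
// (WebXR later shipped this idea as navigator.xr.isSessionSupported().)
async function vrSupportedAlternative() {
  return navigator.xr.isSessionSupported('immersive-vr');
}
```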
Discussed
Sep 4, 2019 (See Github)
Alice: I had a look at that. I left a long comment. I haven't had a look at Brandon's comment. How webxr works with feature policy - I don't have context. Can someone else take it? Brandon explained the relationship with webvr and asked us for feedback on use of feature policy....
Dan: I would say yes, otherwise the data leaks through the webxr API then it could be used by e.g. trackers.
Alice: does blocking these features amount to blocking webxr when webxr exposes these features - or is it up to the developer to know "webxr exposes these features therefore I'll block webxr via feature policy"?
David: there are various options - if webxr might depend on a feature then you act as though it's always required...
Alice: device orientation?
David: You could say: if the feature policy says you're not allowed to use device orientation, then webxr doesn't work, even on a desktop device.
Alice: so you don't get the inconsistency.
Alice: the suggestion: if it depends on it in any context then access in all contexts should be restricted.
Dan: I can get on their agenda at TPAC about it?
Alice: 3 options? 2 options? 2.5 options: (1) if an API which webxr depends on on this platform is blocked, then block webxr (1a) if an API which webxr depends on on any platform is blocked, then block webxr on every platform (2) up to the developer to separately block webxr.
Dan: Did you make the big PR and did they comment?
Alice: I did send it in. They said thanks but they won't have time to think about it. I wrote a chunk on accessibility. It raises a question also relevant to Toast thing. In both cases there needs to be work done to come up with an accessibility story for the space. History of HTML - ARIA came much later.... How much do we want to require a coherent a11y strategy before allowing it onto the platform? Comments I had: what scope is there to develop that story - are we limiting the options?
Dan: Of note: https://www.w3.org/2019/08/inclusive-xr-workshop/ - workshop happening in November.
Alice: to what extent are the core xr team doing that work? or at least on top of that work?
Dan: I confirmed one of the chairs will be there but don't know about the rest of the core team.
[bumped to TPAC]
Comment by @toji Sep 5, 2019 (See Github)
Thank you for your feedback! I'll answer what I can below, with some tasks broken out into separate issues/PRs as indicated.
Focusing on the Explainer/API questions first, since those can generally be answered more concisely:
- Please add a table of contents!
Thank you for demonstrating an effective way to do this in your explainer PR. If we don't merge that PR directly we'll be sure to add a TOC ourselves soon.
- The explainer lists immersive video as a use case. Why would we not design an extension to `<video>` for this case?
We would very much like to see immersive playback in the `<video>` tag in the near future, but feel that implementing WebXR is an appropriate first step to getting there, in the spirit of the extensible web manifesto. Specifically, immersive `<video>` support can effectively be polyfilled with WebXR, while the reverse is not true. And, of course, a more general API like WebXR can also support many other non-video use cases, which has already proven to be valuable.
Additionally, there is not yet consensus on the video/audio formats and projection techniques that are optimal for these use cases. (This is a similar problem to map projection, in that there's no "perfect" way to lay out the surface of a sphere on a flat plane.) Similarly, we've seen on the 2D web that various video players are not satisfied with the default video controls and will frequently provide their own. It's reasonable to expect that trend to continue with immersive video, and it is not yet clear what the appropriate mechanism is for providing custom controls in that environment, whereas in WebXR it's implicitly the application's responsibility to render them.
By starting with an imperative API we give developers a lot more flexibility in how they store, transmit, display, and control their content, which ideally will help inform future discussions around what knobs and levers are necessary to add to the `<video>` tag. (And even then WebXR will serve as a fallback if your content doesn't fit into one of the canonical formats.) We do expect, and already see, libraries built around the API to simplify video playback, and would anticipate that those libraries could trivially redirect their functionality to a `<video>` tag should support be added in the future.
- Why does `navigator.xr.supportsSession()` not return a `Promise<boolean>` rather than rejecting in the case that the session type is not supported? That would seem like a better semantic match to the wording of the method, as well as not requiring the author to program a simple feature detection step in a try/catch style.
I've opened an issue for further discussion on this topic, since it's one of the few potentially breaking changes you've brought up. It seems to me, though, like our usage here is in line with other similar methods that return a `Promise<void>` in APIs such as WebUSB and WebAudio. Are there guidelines regarding this type of use that we could refer to?
- Could you elaborate on why `inline` is the default?
This was actually left in the explainer erroneously. There is no default mode, which is reflected in the rest of the explainer and spec IDL. (PR to fix.) Historically it was the default because it was the mode which required the least user consent.
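(For illustration, a minimal sketch of the corrected shape, where the mode is always passed explicitly:)

```js
// No default mode: callers must always name the mode they want.
navigator.xr.requestSession('inline').then(/* ... */);
navigator.xr.requestSession('immersive-vr').then(/* ... */);
```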
- Naming: it seems like the `vr` in `immersive-vr` is both redundant and inaccurate (since it doesn't encompass AR). Could it just be `immersive`?
We intend to introduce an `immersive-ar` mode in a spec module soon after WebXR ships. In a previous iteration of the spec we specified the session mode as a dictionary, which treated "immersive" as a separate boolean and had a separate field for specifying that AR capabilities were desired, like so:
```js
// Not compatible with the current spec!
navigator.xr.requestSession({
  immersive: true,
  ar: true
}).then(/*...*/);
```
The primary issue this introduced was that it implied that a non-immersive AR mode was a possibility, when we had no intent of ever supporting one. Plus, every new mode added would then have had to reason about how it interacted with each of those booleans, even if they weren't necessarily applicable. The use of enums was eventually deemed the cleaner approach.
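For contrast, a sketch of the enum-based shape that replaced it (the `immersive-ar` value being the planned module addition mentioned above):

```js
// Each mode is a single enum value, so new modes don't have to reason
// about combinations of booleans.
navigator.xr.requestSession('immersive-vr').then(/*...*/);
// Planned for the AR module:
navigator.xr.requestSession('immersive-ar').then(/*...*/);
```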
- The explainer doesn't provide example code for avoiding or handling WebGL context loss. Is the author supposed to avoid it altogether by using `makeXRCompatible()`, or are there other factors to consider?
Issue filed to ensure we demonstrate handling context loss.
More generally, there are two routes to ensuring context compatibility. If the context is created with the `xrCompatible: true` context creation argument, then the returned context will be compatible with WebXR uses and no context loss will be incurred for that reason. (The system may still lose the context for other reasons, such as reinstalling the graphics driver.) This is appropriate for pages whose primary purpose is to display WebXR content. For pages where immersive content is a secondary feature, making the context compatible from the start may introduce undesired side effects (such as causing the context to run on a discrete GPU instead of a more battery-friendly integrated GPU), and so the compatibility bit can be set late using the `makeXRCompatible()` method. This may force a context loss on some devices if the context needs to be moved to a new adapter (while on others, such as those with only a single GPU, it can be a no-op).
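A hedged sketch of both routes (the context-loss listeners are generic WebGL boilerplate, not explainer code):

```js
// Route 1: the page exists primarily for XR -- create the context
// compatible from the start.
const gl = canvas.getContext('webgl', { xrCompatible: true });

// Route 2: XR is a secondary feature -- set the compatibility bit late,
// and be prepared for the context loss that may entail.
canvas.addEventListener('webglcontextlost', (event) => {
  event.preventDefault(); // signal that we intend to handle restoration
});
canvas.addEventListener('webglcontextrestored', () => {
  // recreate GPU resources (shaders, buffers, textures) here
});
gl.makeXRCompatible().then(() => {
  // safe to use this context with an immersive session now
});
```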
- Similarly, a code example for handling XR device changes would be useful.
Issue filed to add a `devicechange` event code sample.
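In the meantime, a minimal sketch of what such a sample might look like, using the `devicechange` event on `navigator.xr` (the UI helpers are hypothetical):

```js
navigator.xr.addEventListener('devicechange', () => {
  // XR hardware was connected or disconnected; re-check support and
  // update any "Enter VR" UI. (showButton/hideButton are hypothetical.)
  navigator.xr.supportsSession('immersive-vr')
    .then(() => showButton())
    .catch(() => hideButton());
});
```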
- Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle `getViewerPose()` failures?
I'm not sure exactly what this is asking for? Deep link from where?
- Might it be helpful to provide a code example showing how to use the `transform` property of the `XRViewerPose` value?
Issue filed to add more code samples for `XRRigidTransform` use.
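In the meantime, a hedged sketch of the kind of sample that issue covers (`session` and `refSpace` are assumed setup):

```js
function onXRFrame(time, frame) {
  session.requestAnimationFrame(onXRFrame);
  const pose = frame.getViewerPose(refSpace);
  if (pose) {
    // XRViewerPose.transform is an XRRigidTransform.
    const { position, orientation } = pose.transform;
    for (const view of pose.views) {
      // The inverse of each view's transform yields the view matrix.
      const viewMatrix = view.transform.inverse.matrix; // Float32Array(16)
      // ...feed viewMatrix and view.projectionMatrix to the renderer
    }
  }
}
session.requestAnimationFrame(onXRFrame);
```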
- Could you expand on the key concept of a Layer?
A layer is simply an image that will be displayed on the XR hardware somehow. Right now it's pretty minimal, with only a WebGL layer being exposed initially and only one active layer being allowed at a time. But we have known features that we'd like to implement in the future that would expand the types of layers that could be used and give more flexibility to how they're presented. For example, when WebGPU ships we would introduce a new layer type that allows a WebGPU context to render to the headset, and shorter term we'd like to add a layer type that takes better advantage of WebGL 2 features.
Other examples of how we may use layers in the future:
- Displaying encrypted video streams
- Displaying DOM content
- Higher quality 2D surfaces
- What are the key differences between an `XRWebGLLayer` and a `<canvas>`?
Slightly oversimplifying here, but a `<canvas>` is for compositing on the page and an `XRWebGLLayer` is for compositing on the headset. Both may share a WebGL context, and in the end both are simply hosts for a framebuffer that WebGL binds and renders into. By making the `XRWebGLLayer` a distinct concept we have greater control over the framebuffer that it exposes and can create it in a way that's optimal for XR.
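A minimal sketch of that distinction in practice (assume an async context and an existing `canvas`):

```js
// The same WebGL context can composite to the page via its <canvas>...
const gl = canvas.getContext('webgl', { xrCompatible: true });

// ...and to the headset via an XRWebGLLayer bound to the session.
const session = await navigator.xr.requestSession('immersive-vr');
session.updateRenderState({ baseLayer: new XRWebGLLayer(session, gl) });

// Each XR frame, bind the layer's framebuffer instead of the default one.
gl.bindFramebuffer(gl.FRAMEBUFFER, session.renderState.baseLayer.framebuffer);
```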
It's worth noting that previously in WebVR we effectively used the `<canvas>` as the layer, but this caused several problems that all had their root in the fact that a web page and a headset are very different mediums and benefit from tailor-fit approaches. A couple of simple examples:
- We were requiring developers to resize the canvas element to a resolution that was appropriate for the headset, which is typically quite large. This was easy to get wrong and frequently resulted in either grainy imagery in the headset or significantly oversized canvases on the page.
- Presenting to a headset typically required taking ownership of the framebuffer that was going to be displayed, which often required an expensive copy because we didn't know if the same buffer would be shown on the page as well.
- The canvas may be constructed with options (such as `preserveDrawingBuffer: true`) that weren't appropriate for use with XR hardware and introduced even more unnecessary overhead.
- When might a session be blurred?
Having a concrete example in the explainer of when this state might apply would be a good idea. `visible-blurred` indicates that the user can see the application and it should respond appropriately to head movement to avoid user discomfort, but the user cannot interact with the app because input is captured by the system/UA. The most common scenario for this mode we see today is that many immersive computing devices have a "dashboard" that can be pulled up without quitting the immersive application by pressing a dedicated button on the controller. Similarly, if it doesn't pose a privacy/security risk, the UA may choose to display some dialogs to the user without exiting the immersive app.
A quick visual aid, showing Oculus' dashboard system: (image in the original GitHub comment)
Not all platforms support this type of interaction, especially if power is limited, and in those cases we would expect the session to toggle directly between `visible` and `hidden`. Alternatively, the UA may take steps to reduce the app quality (such as lowering its resolution) to improve performance while a dialog is up, which is allowed by the `visible-blurred` state.
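A hedged sketch of how an application might respond to these states:

```js
session.addEventListener('visibilitychange', () => {
  switch (session.visibilityState) {
    case 'visible':
      // Full rendering and input handling.
      break;
    case 'visible-blurred':
      // Keep rendering head-tracked frames for comfort; input is captured
      // by the system/UA, so expect none.
      break;
    case 'hidden':
      // Output is not visible; pause non-essential work.
      break;
  }
});
```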
Obviously I'd like to see the question of accessibility addressed sooner rather than later.
We definitely understand the importance of accessibility, and also want to ensure that immersive web content does not unnecessarily exclude users due to faulty assumptions on the part of developers about the user's abilities. This is a large topic, however, and one that we've been seeing more discussion on recently, and so I think it would be more productive for us to outline our current thinking about accessibility in a separate doc which we'll link here. Needless to say, it's a complicated problem made more difficult by the imperative nature of the rendering APIs we rely on, the relative newness of the VR ecosystem, and the type of content the device capabilities encourage. It seems likely that our accessibility story will span API enhancements, working with tool and content developers to take advantage of existing accessibility features when appropriate, encouraging best practices around use of audio and haptics, and detailing UA-level accessibility features that can apply to all content.
Comment by @hober Dec 4, 2019 (See Github)
Hi, @alice, @dbaron, @plinss, and I talked about this a bit today at our Cupertino F2F.
@NellWaliczek wrote, in a comment on immersive-web/webxr#818:
What is the appropriate way to handle enum values that are required by a specification other than the one that originally defined the enum? We talked about a few different options but didn't come to a concrete conclusion. After further discussion with the Gamepad API folks, I'm still not entirely sure what the right approach should be. We've gotten several suggestions on how to go about this, but they all have different drawbacks and there doesn't appear to be consensus on the approach. Given that this isn't a problem unique to WebXR, we'd really love to get a more definitive answer from the TAG about which approach is best for web platform consistency.
- (i) When the secondary spec nears CR, move nearly all references to the enum and its purpose into the original spec.
- (ii) When the secondary spec nears CR, move the value to the original spec and point to the secondary spec for the explanation of its purpose and use.
- (iii) Investigate adding partial enums to WebIDL.
- (iv) Change the enum to be a DOM string.
I think a variant of (ii) is best. The variation being that I don’t think “nearing CR” is the trigger, it’s “this is being implemented in a browser engine.” (This is essentially what @dbaron said in two comments on w3ctag/design-principles#99: (1, 2))
Comment by @alice Dec 4, 2019 (See Github)
@toji
[Accessibility] is a large topic, however, and one that we've been seeing more discussion on recently, and so I think it would be more productive for us to outline our current thinking about accessibility in a separate doc which we'll link here.
I believe there was a workshop on XR accessibility recently. Were there any documents produced in that context which might be relevant here?
Could you deep-link to the section in the Spatial Tracking Explainer which explains how to handle getViewerPose() failures?
I'm not sure exactly what this is asking for? Deep link from where?
Apologies: from the first paragraph of the Viewer tracking section which links to the Spatial Tracking Explainer - could this instead be a deep link to the relevant section?
[...] Other examples of how we may use layers in the future: [...]
My proposed edits included an "Important concepts" section encompassing the concepts I had to draw out as I was reading the explainer and my best guesses as to how to explain them. It would be helpful to have an explanation about layers in your explainer, as well as in this issue thread.
When might a session be blurred?
Having a concrete example in the explainer of when this state might apply would be a good idea [...]
The example you gave here would work well! It doesn't seem to have been worked back in to the explainer yet.
One other thing:
Re-reading the explainer, I was confused by this sentence (emphasis added):
Once drawn to, the XR device will continue displaying the contents of the XRWebGLLayer framebuffer, potentially reprojected to match head motion, regardless of whether or not the page continues processing new frames.
What does that last clause mean? i.e. what does it mean for the page to continue processing new frames, if it's not writing to the framebuffer?
Discussed
Mar 16, 2020 (See Github)
Tess: We marked this "pending external feedback" in December, and never got feedback. So we're waiting on them?
David: do they know we are?
Alice: likely slipped off their radar
Tess: I'll leave a comment -- poke the issue, say waiting on answers.
Alice: Maybe bring to Dan's attention in the plenary? I don't think anything a deal-breaker. I still want to see evidence that accessibility is a core part of their design process rather than a to-do list item.
Tess: Agreed.
Tess: Set a new milestone?
Alice: 2 weeks
Comment by @hober Mar 16, 2020 (See Github)
It looks like we marked this as pending external feedback back in December; are you still pursuing this review, @AdaRoseCannon @toji?
Discussed
Apr 6, 2020 (See Github)
Dan: I asked Ada to comment. From my PoV I think we should close but I just wanted to be sure. Let's hopefully have a discussion and be able to close it by the plenary call
Comment by @alice May 27, 2020 (See Github)
It seems like this has largely settled, so I'm going to propose closing. We're generally happy with this direction, particularly since the `supportsSession()` issue was resolved.
Please comment here in the next week or so if you don't want us to close it; otherwise, you can comment or file a new issue after it's closed if you want more feedback.
Thanks!
Comment by @toji May 28, 2020 (See Github)
I agree, and am fine with seeing this closed. As always, we're happy to reach out to the TAG if we have additional questions in the future or for reviews of additional modules that we develop.
Thank you!
Discussed
Jun 22, 2020 (See Github)
Dan: I think they're happy and we're happy so let's close.
Peter: [closes]
Opened Aug 8, 2019
Hello, TAG!
I'm requesting a TAG review of:
- @NellWaliczek, editor
- @toji, editor
- @cwilso, WG co-chair
- @AdaRoseCannon, WG co-chair
- @TrevorFSmith, CG chair
Further details:
The WebXR Device API has recently reached the point where it is considered a feature-complete replacement for the deprecated WebVR API. We have also switched the work mode to be based around modules, where the current "VR-complete" WebXR Device API acts as a core with other modules, such as 'webxr-ar-module' and 'webxr-gamepad-module', building on it. We are not requesting a review of those modules yet.
We are also working on a polyfill for the WebXR device API, https://github.com/immersive-web/webxr-polyfill/
In addition, multiple browser vendors are working on implementations in their browsers.