#470: WebXR DOM Overlay Module

Opened Feb 4, 2020

Hello TAG!

I'm requesting a TAG review of WebXR DOM Overlay Module.

This module supports showing DOM content during immersive AR/VR sessions to cover common use cases. This includes displaying explanatory text or HUD elements alongside an AR scene, and providing interactive elements such as buttons or sliders that affect the scene.

Further details:

  • I have reviewed the TAG's API Design Principles
  • Relevant time constraints or deadlines: Chrome is aiming for an Intent to Ship soon for this API.
  • The group where the work on this specification is being done: Immersive Web CG
  • Major unresolved issues with or opposition to this specification: N/A
  • This work is being funded by: N/A

You should also know that...

N/A

We'd prefer the TAG provide feedback as:

🐛 open issues in our GitHub repo for each point of feedback

Discussions

Discussed Feb 1, 2020 (See Github)

Peter: Concerned about conflating XR events and DOM events...

...

David: Extra events seem ok, as long as normal events also happen

Peter: I'm reading it as the select events being overloaded to mean different things... I could be wrong.

Rossen: Is this running under the assumption that there can only be a single such layer?

Tess: the API assumes that...

Rossen: that seems unnecessarily restrictive...

Peter: Can see you might want to have multiple overlays in different places

Rossen: One of the early discussions I had with HoloLens... we were working on the browser and how we could decompose the browser and use HTML in scenes, especially in the immersive world... that work is geared and focused toward having multiple interactive HTML panels.

... imagine you want to put a <streaming TV service> player on your wall. That's what the holo experience allows you to do... you pin these experiences and it serialises all the geo-locked data for you so you can reconstruct your experience

... want to be able to incorporate your daily activities in the scene... restricting the experience to saying you can only have one HTML-based panel that is always in front of you sounds like you are solving a specific use case... having a hard time imagining why this should be restricted to only one...

... we don't have these types of restrictions elsewhere in the platform

Tess: Well, fullscreen is a counter-example...

Rossen: Watch this space...

... Just not sure this restriction is warranted

Peter: How will this interact with future APIs that let you push DOM content into the scene? Is this just a special case of that? Should they be different APIs at that point?

Peter: They asked for issues in their repo for each point of feedback, so we can each open an issue and link it back to the TAG review

Comment by @torgo May 26, 2020 (See Github)

Hi - we are just discussing this issue in our virtual face-to-face meeting today. One piece of feedback about the explainer: it dives right into talking about this in terms of the rest of WebXR. I think it would be valuable to start with a user need right at the front - what user experience is this spec trying to enable? This is described below in some detail, but it could be explicitly stated right at the front.

Comment by @klausw May 27, 2020 (See Github)

I've added an introductory paragraph to the explainer, and also clarified the accessibility paragraph. Does this help?

Discussed Jun 1, 2020 (See Github)

Alice: Suggest we push this for another week until both myself and Tess catch up on this one

Discussed Jul 1, 2020 (See Github)

Alice: some additions to the explainer

(reading of explainer)

Alice: side note: should we add the Declarative Shadow DOM explainer to the list of good explainers?

Tess: some thoughts:

  • many readers of explainers aren't AR experts
  • we're also not the explainer police. Does us saying we won't review until we can parse/understand the explainer cause improvements, or do they just move on? That is, what do we accomplish by saying it's hard to review given the state of the explainer?

Alice: but how much TAG time should we dedicate to something that they should have been making easier to read?

Tess: TAG became relevant by providing reviews that people find useful. How do we send people away with the feeling that it was worth coming to us? (And is the effort differential of that worth it?)

Tess notes that the "design sketch" link in the explainer errors out.

Alice: Why have root? In what context would you use the other elements on the page?

Tess: Are the other elements for when you're not in the immersive experience?

Tess: The element is forced to fullscreen.

David: I think maybe what it's saying about fullscreen is that the fullscreen mode in the HTML spec applies, but it's not necessarily taking up the full screen?

(Disagreement within the TAG about which parts of the two screenshots are supposed to be the overlay! Is it the green rectangle, the purple swirl, or the "Enter VR" and "Exit AR" buttons?)

Alice and Tess to add specific questions, and David to comment about the explainer

Comment by @dbaron Jul 14, 2020 (See Github)

So the TAG looked at this for about 20 minutes in our breakout just now, and we were struggling with the explainer in various ways:

  • It's quite long, and has quite a few pages of text before giving any concrete examples.
  • At the same time, it starts off assuming that the reader knows a good bit about the space already.
  • It doesn't give the necessary context around the examples that it does give. For example, when we were discussing the two images, participants in the discussion disagreed about which parts of the images were the DOM overlays. In the first one, I thought the overlay was the bit at the top with text from "Barebones WebXR DOM Overlay" down to the "Exit AR" button, but reading the text made others think that the green rectangle was the overlay. Nothing in the explainer confirmed my interpretation short of reading the full source code of the example.
  • The discussion about connections to the fullscreen API was confusing; it wasn't clear to what extent it was talking about a specification notion of fullscreen elements versus actually displaying something to fill the entire screen. (I don't think either of the two screenshots shows an overlay that is fullscreen... but then again, maybe I was wrong about which part of the screenshot is the overlay.)

We suggest revisiting the tips for effective explainers, and perhaps also other parts of that document.

An explainer that is difficult to read makes it hard for interested members of the web community to understand what you're proposing and to provide useful feedback. These include the TAG. Given today's discussion, I think it's likely that the TAG wouldn't review this proposal with the explainer in its current state.

(I think @hober and @alice are going to add some more specific questions as well.)

Comment by @klausw Jul 14, 2020 (See Github)

Apologies for confusion caused by this. I'll work on updating the explainer, but here are some initial answers to your specific points.

First of all, this is not a standalone proposal. The DOM Overlay Module is an add-on feature for WebXR and the WebXR AR Module, and the feature is only usable as part of a WebXR immersive AR session. Unfortunately I now realize that the explainer doesn't actually mention that. The intended target audience for the explainer was people familiar with WebXR, and that's why it assumes that readers are familiar with the problem space. Without that context, the explainer won't be very helpful.
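For concreteness, here is a minimal sketch of the opt-in, where the page requests the feature and designates an overlay root element (the 'overlay' id is illustrative, not part of the API):

    // Request an immersive AR session with the 'dom-overlay' feature and
    // designate the element that will serve as the overlay root.
    // (Must be called from a user gesture, e.g. a button click handler.)
    const session = await navigator.xr.requestSession('immersive-ar', {
      requiredFeatures: ['dom-overlay'],
      domOverlay: { root: document.getElementById('overlay') }
    });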

About the two images, each of the scenes has a transparent DOM overlay element that covers the entire picture and acts as a container element for DOM content. The visible child elements of each overlay element only cover part of the screen, with the transparent parts of the fullscreen DOM overlay remaining invisible so that they don't obstruct the camera view or AR content. The green rectangle (left image) and the pink spiral shape (right image) are WebGL content drawn by the WebXR application, and the background is the live camera image.

About the fullscreen API connection, in both cases the DOM overlay element is truly fullscreen, both in the sense of filling the entire screen, and also as being the active fullscreen element as defined by the Fullscreen API. However, since it's a transparent container element, this isn't clear from the picture.
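To illustrate (a sketch, assuming a handheld AR session where the user agent implements the overlay via the Fullscreen API; overlayRoot stands for the element passed as domOverlay.root):

    // During the session, the overlay root is the document's active
    // fullscreen element, even though its visible children only cover part
    // of the screen; the rest of the element is transparent.
    console.log(document.fullscreenElement === overlayRoot); // true while in AR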

Comment by @alice Jul 14, 2020 (See Github)

I found the introduction particularly confusing because the second paragraph is about not supporting inline AR content, and I couldn't tell how that was related to the first paragraph. It seems like the first paragraph of the introduction is really an "abstract" (a good one!), and then the second paragraph begins a "background" discussion, and then paras 4-6 are more specific use cases. It would be very helpful if it could be more structured to demarcate how the ideas in each paragraph relate to one another.

In general, the level of detail is often too deep, too quickly. For example:

    The DOM overlay consists of a single rectangular DOM element and its children.

Very helpful!

    It is composited on top of the immersive content by the user agent.

Also helpful.

    The application can style the alpha color channel for elements to leave parts of the overlay transparent, but there is no depth-based occlusion.

This is extremely specific - worth mentioning in a details section somewhere, but not necessary in a summary.

    The environment view and 3D elements in the scene are always covered by non-transparent DOM elements, using alpha or additive blending for partially transparent DOM elements as appropriate for the display technology.

Again, this is too much information for a summary (and, I would have thought, mostly implied by the second sentence anyway).

This kind of jumping around in levels of detail makes it very hard to keep track of the high level ideas as we read through. It would be much easier to read if it started with a high level overview and then discussed details as necessary and in the relevant context.

The overview/feature summary could probably mostly be replaced by diagrams or screenshots of examples where the API could be used, with captions explaining the important features (such as DOM content always appearing on top of XR content, and how events are handled).

There are also several cases where points seem to be repeated, e.g. "The DOM overlay is restricted to a single rectangle at a fixed Z depth chosen by the user agent" and "The application does not get low-level control over placement of the DOM overlay; the placement of the overlay is intentionally left up to the user agent."

I suspect that the general concepts involved are relatively straightforward, so a succinct explanation of the general concepts should be possible. More detail can be built on top of that context.

Comment by @hober Sep 23, 2020 (See Github)

@dbaron wrote, in July:

So the TAG looked at this for about 20 minutes in our breakout just now, and we were struggling with the explainer in various ways: […] We suggest revisiting the tips for effective explainers, and perhaps also other parts of that document.

An explainer that is difficult to read makes it hard for interested members of the web community to understand what you're proposing and to provide useful feedback. These include the TAG. Given today's discussion, I think it's likely that the TAG wouldn't review this proposal with the explainer in its current state.

@klausw replied:

Apologies for confusion caused by this. I'll work on updating the explainer

@torgo and I took another look at this during our virtual F2F this week, and it doesn't look like this explainer has been updated yet. Marking as stalled and pending editor update.

Comment by @klausw Dec 11, 2020 (See Github)

Apologies for the long silence, I have updated the explainer to address the outstanding feedback. I simplified the overview and added some diagrams to clarify how the feature works, and moved some of the fiddlier details to separate sections later in the document.

Discussed Jan 1, 2021 (See Github)

looking at the updated explainer

Also taking a look at Ada Cannon's example app which demonstrates the use of DOM Overlay.

Alice: on input event duplication the "bubbling" seems backwards...

Alice: on Compositing - couldn't they specify the alpha blend mode for DOM content [automatically]?

Dan: [the way they have it] seems unfriendly.

[Discussion on what is the default dom overlay type?]

Dan: The code has the comment // Show which type of DOM Overlay got enabled (if any). What does that mean? Can you have a DOM overlay without a type?

Alice: writes up comment

...discussion on whether users might consider overlay content as trusted UI, and whether there is therefore a need for additional mitigation against 3rd-party content... we decided not to leave an additional comment...

Comment by @alice Jan 26, 2021 (See Github)

Thanks for the explainer updates!

We had a look at this in our virtual face-to-face, and we had some questions:

  • Could you explain the reasoning behind the decision to require developers to listen for the beforexrselect event to avoid events targeted at DOM content triggering XR events in the content behind the overlay? Would it be possible to avoid needing this event by having events bubble from the DOM to the XR scene, so that if, say, a click event is handled and propagation stopped in the DOM, no select event gets fired in the XR scene?
  • We were confused about xrsession.domOverlayState.type.
    • There are three different values there - does the author choose which one applies?
    • Can it change during the session?
    • The example code says "Show which type of DOM Overlay got enabled (if any)" - when would none apply?
  • The section on Compositing mentions:

    A see-through AR headset typically uses the "additive" blend mode where black pixels appear transparent and dark colors on a transparent background are likely to be hard to see. Opaque black is only visible if the session uses "alpha-blend" blend mode.

    • This seems like a potential "gotcha" - is there some technical limitation meaning that authors are required to know that black (typically the default colour for text in web content) is likely to be rendered as transparent?
Comment by @klausw Jan 27, 2021 (See Github)

Thanks for the explainer updates!

Thank you for the feedback!

  • Could you explain the reasoning behind the decision to require developers to listen for the beforexrselect event to avoid events targeted at DOM content triggering XR events in the content behind the overlay? Would it be possible to avoid needing this event by having events bubble from the DOM to the XR scene, so that if, say, a click event is handled and propagation stopped in the DOM, no select event gets fired in the XR scene?

This would introduce a significant delay - the goal is to deliver the selectstart event as soon as possible to minimize latency: for phone AR, that means as soon as the screen is first touched. The DOM click event is only generated when the finger is lifted, so if the delivery of XR events depended on propagation being stopped on the click event, the XR selectstart event would have to be delayed until that point. The alternative of tying XR event delivery to pointerdown/mousedown/touchstart could get complicated and messy, especially if existing DOM UI elements have their own handlers and expectations for such events.

The beforexrselect approach is intended to be a fairly lightweight handler that is run at the start of the selectstart/selectend/select sequence, and either allows all of them to proceed or suppresses all of them. This helps ensure that the XR events happen in the expected sequence independently of DOM event handling. Since it's independent of other DOM input events, it doesn't interfere with them.
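For example (a sketch; the 'ui' element id is illustrative):

    // Suppress the XR select sequence for input that starts on the overlay's
    // interactive UI, so a tap on a button doesn't also fire
    // selectstart/select/selectend into the 3D scene. beforexrselect bubbles,
    // so a handler on a container element covers its children.
    document.getElementById('ui').addEventListener('beforexrselect', (ev) => {
      ev.preventDefault();
    });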

Also, XR frame and input timing isn't currently required to be tightly synchronized with DOM frame and input handling. It's entirely possible for the XR scene to be running at a different framerate than DOM animations. In general, the goal for an AR system is to process XR frames and input with minimal latency, while DOM is often tuned for smooth animations even at the cost of slightly increased latency. Requiring a tight coupling between DOM and XR input events could be a significant challenge for implementations and could cause performance degradations.

  • We were confused about xrsession.domOverlayState.type.

    • There are three different values there - does the author choose which one applies?

No, this is chosen by the user agent; this field just informs the page which type is currently active. I'm adding a clarification.

  • Can it change during the session?

No, it's intended to stay the same for the duration of the session. This isn't currently explicit in the spec, I'll add an update there.

  • The example code says "Show which type of DOM Overlay got enabled (if any)" - when would none apply?

Sorry, this was a mistake in the example. It is supposed to check for xrsession.domOverlayState being present before referencing the type field. If the domOverlayState attribute exists, it must have a non-null type. I'm updating the example snippet to match the full code example where I had fixed this already.
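Roughly this shape (a sketch of the corrected check):

    // domOverlayState is only present when the feature is active; when it
    // exists, its type field is guaranteed to be non-null.
    if (session.domOverlayState) {
      console.log('DOM overlay type: ' + session.domOverlayState.type);
    } else {
      console.log('DOM overlay not active');
    }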

  • The section on Compositing mentions:

    A see-through AR headset typically uses the "additive" blend mode where black pixels appear transparent and dark colors on a transparent background are likely to be hard to see. Opaque black is only visible if the session uses "alpha-blend" blend mode.

    • This seems like a potential "gotcha" - is there some technical limitation meaning that authors are required to know that black (typically the default colour for text in web content) is likely to be rendered as transparent?

Yes, this is a technical limitation of additive see-through headsets such as HoloLens. The waveguide optics can only add light to the scene, they don't have any way to darken the perceived image, and therefore cannot display an opaque black. The result is basically equivalent to a lighten screen blend mode due to the way the hardware works.

There's a corresponding @media(environment-blending: additive) mode in mediaqueries-5, though I'm unsure how widely this is currently implemented. I'll add a note to the explainer to point to this.
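A page could also check this at runtime (a sketch, assuming the user agent implements the media feature; overlayRoot is illustrative):

    // Adapt overlay styling on additive displays, where opaque black renders
    // as transparent. Requires UA support for the Media Queries 5
    // 'environment-blending' feature.
    if (window.matchMedia('(environment-blending: additive)').matches) {
      overlayRoot.classList.add('additive-display'); // e.g. avoid dark text
    }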

See https://github.com/immersive-web/dom-overlays/pull/40 for the planned changes.

Comment by @klausw Jan 27, 2021 (See Github)

Yes, this is a technical limitation of additive see-through headsets such as HoloLens. The waveguide optics can only add light to the scene, they don't have any way to darken the perceived image, and therefore cannot display an opaque black. The result is basically equivalent to a lighten blend mode due to the way the hardware works.

I got the blend modes mixed up here, I think the additive mode more closely corresponds to the screen blend mode ("like two images shone onto a projection screen"), or glBlendFunc(GL_ONE, GL_ONE).
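For reference, the standard definitions, with Cs the source color and Cd the destination color, both in [0, 1]:

    additive (glBlendFunc(GL_ONE, GL_ONE)):  C = Cs + Cd
    screen:                                  C = Cs + Cd - Cs*Cd = 1 - (1-Cs)*(1-Cd)

Both can only brighten the destination, which matches the hardware limitation described above; screen additionally stays within [0, 1] without clamping.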

Discussed May 1, 2021 (See Github)

Tess took a look at the explainer updates, wants to re-review; moved to next week

Discussed May 1, 2021 (See Github)

Thanks for bearing with us. We're happy to close this as satisfied. We appreciate the changes you made based on our feedback. Thanks for sailing with TAG.

[posted and closed]

Discussed May 1, 2021 (See Github)

Dan: came back to us with a bunch of stuff they have done, responses to Alice's comments. A number of closed PRs referenced.

Tess: would like to look at the PRs to see if they adequately address things we brought up before

Dan: [proposed close] something something the abyss looking into you

Discussed May 1, 2021 (See Github)

[bumped]

Comment by @torgo Jun 2, 2021 (See Github)

Thanks for bearing with us. We're happy to close this as satisfied. We appreciate the changes you made based on our feedback. Thanks for sailing with TAG sea-ways.