Design Review #288

#288: OffscreenCanvas new commit() and DedicatedWorker.requestAnimationFrame

Visit on Github.

Opened Jun 1, 2018

Hello TAG!

This is a follow up from: https://github.com/w3ctag/design-reviews/issues/141

I'm requesting a TAG review of:

Name: OffscreenCanvas new commit() and DedicatedWorker.requestAnimationFrame
Specification URL: there's an overall description and a WhatWG PR.
Tests: Worker requestAnimationFrame and OffscreenCanvas rendering with new commit
Primary contacts: @fserb

This change addresses concerns pointed out at https://github.com/w3ctag/design-reviews/issues/141, namely:

the previous Promise commit() solution doesn't exist anymore, instead requestAnimationFrame has been added to DedicatedWorkerGlobalScope, and a render model has been described on it. This is semantic similar to what happens on Window (although not attached to a Document animation frame).
There's still a need for a commit() function, to support the new WebAssembly cases of a busy loop execution (that doesn't finish the context task) but that still wants to commit a render frame. This new commit() is a blocking function that returns as soon as the UA is ready to receive a new rendering of the OffscreenCanvas.
Apart from the request above, and since this is relevant to OffscreenCanvas, there's still a minor follow up PR on whatwg that enables (https://github.com/whatwg/html/pull/3708)[text rendering on OffscreenCanvas].

We'd prefer the TAG provide feedback as leave review feedback as a comment in this issue and notifying myself (@fserb) if needed.

Discussions

Comment by @annevk Jun 20, 2018 (See Github)

The one thing of note here is that requestAnimationFrame() and cancelAnimationFrame() will only function in dedicated worker agents that are part of an agent cluster which includes a similar-origin window agent. Otherwise they'll throw. At least, as currently proposed.

An alternative could be that we don't expose them in agent clusters that contain a service worker agent or shared worker agent, but that would require more IDL infrastructure.

Noting this here as it sets some architectural precedent. (At least, I'm not aware of other APIs needing this.)

Comment by @RByers Jun 28, 2018 (See Github)

Note that there's now a blink intent-to-ship thread for this feature

Comment by @greggman Jul 6, 2018 (See Github)

So where is the up to date spec? The spec linked above says

When its canvas context mode is placeholder , a canvas element has no rendering context. It serves as a placeholder for an OffscreenCanvas object, and the content of the canvas element is updated by calling the commit() method of the OffscreenCanvas object's rendering context.

and

canvas . transferControlToOffscreen ()

Returns a newly created OffscreenCanvas object that uses the canvas element as a placeholder. Once the canvas element has become a placeholder for an OffscreenCanvas object, its intrinsic size can no longer be changed, and it cannot have a rendering context. The content of the placeholder canvas is updated by calling the commit() method of the OffscreenCanvas object's rendering context.

But talking to others (and in testing) these are no longer the case. For an OffscreenCanvas the contents of the placeholder is updated automatically after the current event exits just like a normal canvas. In other words, just like a normal canvas if you call gl.clear or gl.drawXXX then a task is queued to copy/swap the drawing buffer into the canvas. commit is only needed for spin loops

Example: This renders to the canvas. no call to commit needed

in worker

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const gl = canvas.getContext("webgl");
  gl.clearColor(0,1,0,1);
  gl.clear(gl.COLOR_BUFFER_BIT);
};

In main thread

  const canvas = document.querySelector("canvas");
  const offscreen = canvas.transferControlToOffscreen();
  worker.postMessage({canvas: offscreen}, [offscreen]);

https://jsfiddle.net/greggman/w3t0gmvz/

The same is true for Canvas2DRenderingContext. Changing the code above to use canvas 2d instead of webgl also renders which the spec linked requires it NOT to render.

in worker

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const ctx = canvas.getContext("2d");
  ctx.fillStyle = 'red';
  ctx.fillRect(50, 100, 70, 50);
};

https://jsfiddle.net/greggman/qmn2e78h/

Also the spec linked says

and

If an OffscreenCanvas object whose dimensions were changed has a placeholder canvas element , then the placeholder canvas element 's intrinsic size will only be updated via the commit() method of the OffscreenCanvas object's rendering context.

But that is also false.

Change the code above to

self.onmessage = function(evt) {
  const canvas = evt.data.canvas;
  const ctx = canvas.getContext("2d");
  ctx.canvas.width = 2;
  ctx.canvas.height = 2;
  ctx.fillStyle = 'red';
  ctx.fillRect(0, 0, 1, 1);
};

and we see the intrinsic size is updated without calling commit

https://jsfiddle.net/greggman/j3pgLs0o/

Where is the correct spec and has it been reviewed?

Discussed Jul 10, 2018 (See Github)

TL: The latest proposal looks like they have integrated a lot of our feedback. The explainer shows many things that were discussed at TPAC 2 years ago. Progress seems to be good. There is this thing called an Offscreen Canvas which is a proxy object that routes to another element in an actual document. It allows offscreen control to a dedicated worker. From there you can issue drawing commands to the context within the worker. You were expected to create an ImageBitmap, which is an opaque object.

KC: Is this like ImageData?

TL: No. This is different. It's sort of a blob. What has changed is that 1) they have figured out how to sync better, and 2) more proactively push frames from a dedicated worker to the actual document without jumping through too many hoops. There is also a PR which seemed a bit incomplete.

KC: Why would this not use SharedArrayBuffer? I assume it is because this is a specific case? Seems costly with copying.

TL: That's the point of ImageBitmap. It's a pointer to video memory.

KC: Is it possible to implement double buffering?

TL: Probably. You can create a bunch of these, although there is the risk of backpressure. There is an explainer, which isn't quite a spec.

KC: This is driven by WebXR?

DKA: Should note that there are no privacy considerations in the explainer.

TL: Does anyone know who raised this?

SM: What would be the actual privacy concern surface?

DKA: Well, there were specific measures from the TOR browser around <canvas>. (https://en.wikipedia.org/wiki/Canvas_fingerprinting)

Hadley: +1. It would be good to make the situation explicit in the explainer.

KC: Since this ties very closely into the GPU it would be possible to do timing attacks.

TL: The new proposal discusses batching the blit operations from the OSC to the placeholder <canvas> with commit. Second use case is for multiple views of the 3D scene, which requires the scene state to be in the same worker. The third is for WebVR. And so forth. The explainer could use a bit more detail, along with the PR. The rAF timing should be synchronized with the original context. (Scribe brainfart. Might have missed some details.) rAF in the worker is global, which is tied to the main window where the actual display canvas is.

KC: There is a possibility of rate changes with GPU changes or VR contexts.

TL: In the future the spec might accomodate this. One might use a WebVR rAF which runs at a higher framerate. The main purpose of this would be to run game engines built with WebASM from a worker.

KC: So commit() is blit.

TL: Yes. It's a method on a offscreen canvas, and it is a blocking operation. (previously it was not)

KC: Why is that not an option? (as in, a parameter)

TL: I believe this is because rAF serves a different purpose.

KC: So commit() is not something you'd always use.

TL: This seems specific to cases where you need to do blocking blit calls. Now that the background has been convered, the feedback is: The spec would currently only allow dedicated worker - since the other types don't have a strong binding with a specific window. The PR seems to suggest dedicated workers, but seems to be implicit about this. Will raise this. Curious about composition - are there cases where you want to draw parts in a worker and the rest in the main window. Not sure if this is a common use case that needs to be covered - should be worth bringing up. It's unclear what would happen when you call commit when you are in a rAF loop. commit seems to overlap the use cases that ImageBitmap covers. Code samples would be useful. Couple issues on the PR - text that has changed from async to sync, mentions about the user agents deciding specifics. Would like to see more work on this before it ships

Comment by @travisleithead Jul 10, 2018 (See Github)

@greggman part of this review process is to help align implementations with the spec and the spec to implementations. As you point out, clearly there are some discrepancies between the two. It also looks like this is not shipping [yet] in any release version of a browser (e.g., not behind a flag), so there's still an opportunity to get alignment. :)

Comment by @travisleithead Jul 10, 2018 (See Github)

@fserb some initial feedback from TAG review today:

This looks like great progress, and is addressing the high-level feedback about having a consistent requestAnimationFrame from our prior feedback. In general, we'd like to see more code examples in the describing document, especially code examples that highlight (at a high-level) the various use-cases outlined at the top (it is an "explainer" document, right?).

With our privacy hat on, we were disappointed to not see a Privacy considerations section :-( It would be great to add one, even if you believe there isn't much to put in it--it helps us know you've given it some thought.

@annevk left some feedback above about availability in agent clusters... the behavior of which is a side-effect of integration into the spec via the PR. My personal view is that the updated AnimationFrameProvider be limited to DedicatedWorker and Window (for now, until details about how to bind its timing for a SharedWorker or ServiceWorker). I'd rather not expose an API that will simply throw in most cases. I thought this can be done with [Exposed=(Window,DedicatedWorker)], but I'm not really up-to-speed on agent clusters or how to differentiate that in spec-ese.

Also as-noted, the PR has nothing for the commit() API. We expect to see that in the future.

A nit: the WebIDL description of Offscreen canvas' commit cannot have multiple inheritance :-)

Use case question: is composition a desired scenario? E.g., Worker+Offscreen canvas used to render the background of a 3D scene, while foreground rendering done on Window's browsing context and the two scenes are composited together? (Or foreground paints a player HP/MP/Score overlay, while offscreen canvas renders the game world?)

In the description of the processing model: "atomic update" text is used, but there's a lack of clarity on what it means. Does it mean a blocking synchronization point for the window's browsing context? There may be a definition of this in the 'Previously Considered Solutions' section, but I don't have confidence that that definition wasn't thrown out with the previous solution. :) Also, processing of the Offscreen Canvases is done after other animation callbacks are executed in window's browsing context. Is there a specific reason for that (other than arbitrary)? Is it to avoid blocking more of the event loop?) Also, I assume this processing is somehow recursive over the tree of potentially nested workers (as workers can spawn workers).

Relationship of commit() to ImageBitmap... are these duplicating effective behavior? ImageBitmap can allow multiple frames to accumulate in memory before being rendered (these pinning GPU memory until collected). Is ImageBitmap still a good pattern to use when commit() is available and can seemingly do the same transfer-and-display logic without using intermediate objects and without allowing for the possibility of unbounded GPU memory backpressure?

commit() + rAF behavior: Is the expectation that commit() must be used at the end of each worker's rAF? (in order to get the timing synchronized and send the frame at the appropriate time to the GPU?). If not, is there some assumption that all offscreen canvases implicitly commit at the end of the worker's rAF call? This was put in question by @greggman above and does need to be clarified.

From the PR:

Note: this PR seems incomplete to me... maybe because the explainer document is incomplete? Based on what I've seen, I certainly don't think this is ready to ship...

line: 89072 - seems to loose to just say "user agent decides" for the spec--doesn't it need to match the framerate of the associated window's browsing context? Must have missed where that was set-up...

Comment by @annevk Jul 25, 2018 (See Github)

@travisleithead I don't think you understood my point. It's not about exposing it outside of dedicated workers, it's about which dedicated workers. E.g., dedicated workers spawned from service workers. Are they exposed there or do they throw?

Comment by @greggman Jul 25, 2018 (See Github)

I don't know who's reading what where but on top of the issues mentioned above with commit are a host of others. Sorry to repeat this stuff but I have no idea if there is one official place to discuss this stuff or if I should just keep repeating myself in every place it's bought up.

The biggest issue is that commit (AFAIK) effectively creates a new kind of environment in the browser. It's use case is supposedly for native ports using spin loops

 while(true) { 
    render();
    gl.commit();
 }

Once you do that all access to all other APIs stops working in that worker because all other APIs are event based (onmessage, fetch, websockets, rAF, setTimeout, setInterval, etc...)

Is that fact well under by the people reviewing this proposal?

The people pushing for commit know that implication and are suggesting that a worker using a commit spin loop will be communicated with via shared arraybuffers since other communication becomes impossible.

It seems like there are a bunch of issues with that direction. For example it seems to make throttling hard. You can block the commit indefinitely when the page using it is not the front tab BUT, if you block then any other workers or the main thread will suddenly stop having their shared arraybuffer exchanges respond. While it might be possible to correctly program for that case I have a feeling it will lead to tons of subtle race conditions for most apps trying to use it. It was also suggested in that case that instead of blocking the browser might just throttle (eg. only process commit once a second) but that also seems like a bad idea as it will likely make using the browser extremely unresponsive until the tab using the spin loop is closed.

Comment by @kenrussell Jul 26, 2018 (See Github)

@greggman, @fserb and I are considering your and others' feedback regarding commit(). We have put it behind a flag in Chromium's implementation and will do more experimentation with multithreaded WebAssembly applications before aiming to both specify and ship it.

Comment by @juj Aug 12, 2018 (See Github)

The last I understood was that .commit() and while(true){} custom main loops are orthogonal concepts. A Worker that is using OffscreenCanvas and .commit() to present frames, may be rendering asynchronously event-based in the Worker, or synchronously via while(true) {} custom main loop. The wording in https://github.com/junov/OffscreenCanvasAnimation/blob/master/OffscreenCanvasAnimation.md#the-commit-processing-model suggests that would still be the case?

Browsers should not block .commit() to wait until page resumes from background, since that would indeed deadlock Workers and possibly cause event queues to build up, and a barrage of flooded computation when the tab finally resumes. If browsers want to throttle background tabs, they should do that at a global Worker/tab level, after all nothing prevents pages from wasting CPU via a for(;;) {} in a Worker today, already without OffscreenCanvas.

The issue with multithreaded applications is that only one Worker in such an application might be doing OffscreenCanvas WebGL rendering, and many other Workers in an app may be doing some other continuously running computation, that can produce and wait for events from the OffscreenCanvas WebGL rendering Worker. If just the OffscreenCanvas Worker paused in .commit() as a response to the tab going to background, but other Workers did not, they might keep piling up events to the OffscreenCanvas Worker queue, or they might also pause waiting for something from the OffscreenCanvas Worker. That is why throttling should be considered a) globally or b) cooperatively instead, and not on .commit(). If a browser globally throttles the whole tab, then the event queue build-up should be more moderate, since essentially the whole app would suspend, and not just parts of it.

If we think that a computation consists of two separate logical parts: game/app logic simulation+animation rendering, and would want to reduce the processing of an application to only doing the simulation portion when on the background since the animation bit is redundant, then we should provide APIs that make it possible for application developers to easily opt in to. Giving requestAnimationFrame() to Workers makes that bit easier. Emscripten provides uniform API for either main thread or pthread to register to get visibilitychange events into the calling thread, which allows applications to cooperatively stop rendering, whether they are in main thread or pthread using OffscreenCanvas, or direct WebGL.

If we wanted something more automatic on the cooperative front, it could be a property bool suspendMeIfMyTabGoesToBackground and/or bool graduallyStartThrottlingMeIfMyTabGoesToBackground or similar on WorkerGlobalScope, which developers would be able to set for Workers they know do not need to do anything if the page is on the background. That would allow developers to eagerly opt in to power saving.

For slightly easier rendering throttling management, it might be nice for .commit() to give back an enum return value 0: visible, 1: hidden (while at it, 2: contextlost?) that caller would be able to investigate right there in order to do cooperative throttling in a non-event-based manner, without having to set up visibilitychange listeners. And/or perhaps we should make Page Visibility API with is document.visibilityState readily available in Web Workers?

But apart from that, I don't think browsers have any other option than to globally throttle all Workers/the whole tab, independent of what they are doing. It is the same problem as how should browser react to for(;;){} background Workers already today, the rest is about providing developers with APIs how to easily manage cooperative throttling to ensure that the computation they are generating will be the most reasonable/optimized one for the particular situation at hand?

Discussed Aug 21, 2018 (See Github)

AR: I'm increasing uncomfortable with the commit method...

TL: They're subbing out the old commit method for a new (blocking) one...

AR: Would having them on a call help?

TL: They've responded to some previous comments of which I haven't read the responses yet. Would like to look those over... Promppose we come back to this next week?

Comment by @kenrussell Sep 4, 2018 (See Github)

As an early heads-up, Chromium is prototyping an alternate solution to commit(). As was pointed out here, commit() really only solves the problem of rendering graphics from a WebAssembly thread, and doesn't handle interoperability with other web APIs.

The basic idea is to allow WebAssembly (or other web APIs) to run a nested message loop which will exit in a bounded period of time. A call to this API would be inserted in the outer loop of the pthread's main function. This makes it practical for WebAssembly threads to use requestAnimationFrame without doing large code refactorings. These threads could also then handle postMessage requests, etc.

We're still working through bugs in Chromium's implementation of this before requesting feedback more broadly, but feel free to follow along and comment on http://crbug.com/869569 and https://chromium-review.googlesource.com/1200409 .

Comment by @greggman Sep 5, 2018 (See Github)

I'm glad alternative solutions are being considered.

I'd like to take a step back and suggest that maybe the goal of allowing unmodified native apps to run um ... unmodified is maybe not such a great goal for the web?

Native apps have all kinds of issues when ported to the web. Some issues off the top of my head

The kinds of apps we're talking about (non-event driven, ie, games) are usually resource intensive.

The web doesn't have a good solution here. In order for users to have a good experience those apps need to be redesigned to stream resources so they are usable quickly. Refactoring so say the first level can be played while the other 29 levels download is likely not a smaller amount of work. They may also need extra work to cache their resources so on second visit they don't take another 5-30 minutes to start as they redownload assets.
Most native apps (mobile - there are 10x more mobile apps than PC) are not mouse friendly as they are designed for touch only. That means in order to be useful on the web they need their interfaces redesigned.
Most native apps except to have different versions generated at compile time per platform whereas most web apps are expected to run everywhere.

This has issues like for example a cut/copy/paste where a web app will leave it to the browser to handle cut/copy/paste. A native app will hard code checking for Ctrl-C on PC and Cmd-C on Mac. This doesn't work on the web. Worse, a native app expects to just be able to read the clipboard on demand but the web requires the user to paste and the app to wait for a paste event. Even games benefit from being able to paste names, paste into chat, paste passwords into login screens, etc...

What these and other issues mean is that bringing native apps to the web and having the experience be good for the user requires lots of work. If that work is not spent then all users get is really poor experiences turning them off to the web. The amount of work to refactor these apps to be event driven is tiny compared to the amount of work required to make them good experiences on the web.

In other words, this goal of making spin loops magically work is basically saving native apps only a few percent of the work they really need while at the same time encouraging bad experiences for users since native app people can then just barf out their apps, no thought given.

oh hey, it runs but only on windows! Sucks to be web
oh hey, it runs but takes several minutes to start every few days. Web sucks doesn't it

That doesn't seem like a win for anyone.

Just for fun I went looked up the code for Lumberyard and Unreal

main calls RunMainLoop

https://github.com/aws/lumberyard/blob/0b34452ef270f6b27896858dc7899c9796efb124/dev/Code/Launcher/WindowsLauncher/Main.cpp

RunMainLoop

https://github.com/aws/lumberyard/blob/e881f3023cc1840650eb7b133e605881d1d4330d/dev/Code/Launcher/LumberyardLauncher.h

Checking Unreal

WinMain calls GuardedMain

https://github.com/EpicGames/UnrealEngine/blob/release/Engine/Source/Runtime/Launch/Private/Windows/LaunchWindows.cpp

GuardedMain

https://github.com/EpicGames/UnrealEngine/blob/f509bb2d6c62806882d9a10476f3654cf1ee0634/Engine/Source/Runtime/Launch/Private/Launch.cpp

In either case I doubt it would take more than 1-4 hours to throw in a few statics and/or singletons to take things off the stack and expose the main loop to be event driven. That matches my own experience of making a few native apps event driven. In my case it took 20-40 minutes.

It seems arguable that if native devs aren't willing to put in that small amount of time to make their apps event driven then they aren't likely to do any of the other work required to bring their app to the web and so this path of supporting spin loops is only really enabling bad experiences from thoughtless quick and dirty recompiles instead of thoughtful ports.

Anyway, just a thought.

Comment by @kenrussell Sep 5, 2018 (See Github)

There are many scenarios where native applications drive their threads using internal synchronization (mutexes / futexes, lockless algorithms, work stealing) where it is either impossible or impractical to change the threading model to be event driven. Being able to bring applications like these to the sandboxed and portable web environment is a benefit to users because it makes them easier to run, and Emscripten. asm.js, and WebAssembly have successfully brought many apps to the web's ecosystem already. We plan to continue prototyping the ability to process pending tasks as a compatibility and interoperability mechanism with native codebases.

Comment by @travisleithead Oct 31, 2018 (See Github)

CC issue #141

Hey all,

Took this up again at today's F2F in Paris.

Thanks for patiently explaining a complex space to us.

While we recognize the scenarios for having an explicit blocking commit() (e.g., porting native code to the web with near-zero change), we don't think that the case is yet sufficiently compelling to offset the architectural concerns raised by blocking commit.

Shared Array Buffers and atomics now allow developers to guard critical sections, and that may seem in conflict with a rAF-based solution, but the extent to which that is true isn't clear from discussion or the examples we've been able to read.

One of the most complex scenarios we've been pondering has been the need to run multiple displays at different rates, driven from the same GL context. This arises in VR and XR scenarios and we're very happy to see the progress made in providing requestAnimationFrame fused to the XRSession object, rather than an implicit higher frame rate (which is what commit appeared to enable in a world where the worker itself might only get a 60fps rAF).

This progress in XR takes a lot of the pressure off of the necessity of commit() (in our view) and we'd like to see the community move forward with this model if possible. If not, we'd like to discuss further, perhaps on a future call, to help clarify the (perhaps glaring!) gaps in our understanding.

Cheers -- Alex & Travis

Comment by @slightlyoff Oct 31, 2018 (See Github)

Per the above, to clarify, we want to close this out, noting that we're happy that DedicatedWorker::requestAnimationFrame has moved forward. Hopefully our feedback about commit() is coherent. If there are questions, we'd love to have @fserb and interested parties join a call to discuss.

Looking forward to hearing back about what you'd like to do next.

Comment by @kenrussell Oct 31, 2018 (See Github)

We're actively working on a port of a large multithreaded game engine to the WebAssembly environment. @kainino0x and @nickshinpho are driving this effort.

Once we have this working as expected, we will compare the approaches of commit() vs. an API which provides broader compatibility of web APIs to WebAssembly's threads (http://crbug.com/869569). One of these two approaches will definitely be needed, and the determining factor will be performance of the overall system. We'll come back to this thread after we've done these measurements and indicate which direction we would like to pursue in collaboration with the TAG.

Discussed Dec 19, 2018 (See Github)

Travis: porting games over to web workers - e.g. through web assembly.

David: worklets are designed for code that's happening on the UI thread or some other existing thread - isolation but not threading...

Travis: holding pattern -

Comment by @torgo Dec 19, 2018 (See Github)

Bumped this into the new year to come back and check it then.

Discussed Jan 22, 2019 (See Github)

Travis: I might ping Ken Russell to see if they have made progress?

Comment by @torgo Feb 5, 2019 (See Github)

We discussed briefly at the f2f and agreed to close this for now. Can be re-opened if required.

Comment by @juj Dec 17, 2019 (See Github)

Searching for word "commit" on this page https://html.spec.whatwg.org/multipage/canvas.html#the-offscreencanvas-interface still finds a ton of hits, so I wonder, should this bug be reopened, or a new one created, to remind to update the OffscreenCanvas spec to remove the commit() text?

The spec being out of date on commit() makes me wonder if there are other parts in the OffscreenCanvas spec that are out of date?

According to https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/OffscreenCanvas Chrome already says they would be shipping OffscreenCanvas, but if the spec is still in flux, what exactly is Chrome shipping?

Comment by @chrishtr Dec 17, 2019 (See Github)

According to https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/OffscreenCanvas Chrome already says they would be shipping OffscreenCanvas, but if the spec is still in flux, what exactly is Chrome shipping?

OffscreenCanvas was shipped in Chromium, without the commit() method, because incompatibilities with the event loop of workers was not resolved.

I agree it should be removed from the spec, will follow up with @fserb .