#612: WebCodecs (again!)


Opened Feb 18, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of WebCodecs.

An early review was conducted in #433. The API has changed a lot since then and we now have a draft specification. Please see the discussion of deadlines (relatively soon) and "you should also know that" below (upcoming work, open questions).

Thank you for reviewing!

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: Chrome would like to ship this API in version 91, which is anticipated to release May 25, 2021. The code freeze for this release will occur around April 14, 2021.
  • The group where the work on this specification is currently being done: WICG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Migrating to Media WG
  • Major unresolved issues with or opposition to this specification: There are no major objections. Our Github tracks a number of open design questions. The most pressing are tagged with milestones.
  • This work is being funded by: Google/Mozilla (as employers of its editors/implementers).

You should also know that...

We'd prefer the TAG provide feedback as: 🐛 open issues in our GitHub repo for each point of feedback

Discussions

Comment by @padenot Feb 19, 2021 (See Github)
The code freeze for this release will occur around April 14, 2021.

This seems way too early (about two months) considering the rather fundamental discussions happening around memory allocations and time units, among others (there is a large number of unresolved issues, and some groups of users haven't given feedback yet). Getting those two wrong, or not perfectly right (and again, there is more to discuss), will mean the API cannot be used for some of its stated use-cases, or will have performance implications that make some use-cases impossible on particular devices (notably and predictably, non-high-end devices).

Considering this is a low-level API that is supposed to be the bottom-most layer of the entire media decoding and encoding stack of the web platform, I would be sad to not get this right, especially considering the expectations of developers in various domains.

Comment by @chcunningham Feb 19, 2021 (See Github)

@padenot, apologies for the surprise - I thought we were more aligned on timing. I agree we have a number of issues to resolve before shipping, but I think 2 months sounds workable. It's not my intention to rush this out the door. Let's discuss more in our upcoming call and we can revise the timeline as needed.

Timing aside, this is still a great time to gather TAG feedback. Reviewers, please proceed :)

Comment by @cynthia Feb 20, 2021 (See Github)

Strange! I swear I wrote early feedback on this yesterday, but apparently I was imagining I did or I got distracted and forgot to press the green button. Abridged summary of what I wanted to write:

  • Explainer and spec disagree on AudioFrame vs AudioPacket?
  • VideoFrame has destroy/clone but AudioFrame|Packet does not - why?
  • In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?) while the image bitmap method is expected to return something interoperable with the rest of the platform?

Still need to spend more time with this.

Comment by @chcunningham Feb 23, 2021 (See Github)

Thanks so much @cynthia

Explainer and spec disagree on AudioFrame vs AudioPacket?

Just updated to use AudioFrame.

VideoFrame has destroy/clone but AudioFrame|Packet does not - why?

Sorry, this part needs work. The plan is for both interfaces to have matching clone() and close() methods, removing destroy(). The semantics of clone() and close() will be akin to incrementing/decrementing a ref count on the underlying media data. I should send a PR by end of week. I'll update this thread when it's ready.
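The proposed ref-count semantics can be sketched in plain JavaScript. This is an illustrative model only, not the spec's algorithm; the names `MediaResource` and `FrameHandle` are hypothetical.

```javascript
// Hypothetical sketch of the proposed clone()/close() semantics:
// clones share the underlying media data via a ref count, and the
// data is released when the count reaches zero.
class MediaResource {
  constructor(data) {
    this.data = data;
    this.refCount = 1;
  }
}

class FrameHandle {
  constructor(resource) {
    this.resource = resource;
  }
  // clone() increments the ref count; both handles see the same data.
  clone() {
    if (!this.resource) throw new Error("cannot clone a closed frame");
    this.resource.refCount += 1;
    return new FrameHandle(this.resource);
  }
  // close() decrements the ref count and detaches this handle;
  // the underlying data is released when the count hits zero.
  close() {
    if (!this.resource) return; // closing twice is a no-op
    this.resource.refCount -= 1;
    if (this.resource.refCount === 0) this.resource.data = null; // released
    this.resource = null;
  }
}
```

Under this model a frame passed to a consumer can be cloned cheaply, and each side closes its own handle without copying the media data.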

In VideoFrame, wild guess is that plane is intended to give the original framebuffer (e.g. in YCbCr, as multiple grayscale images?)

Your idea is roughly correct. VideoFrame gives access to the raw pixel data via its Planes interface. For YCbCr you would get a plane for each of Y, Cb, and Cr. For the I420 pixel format, the Cb and Cr planes are subsampled. The spec will soon mention additional pixel formats. For ex: RGB (1 plane) and NV12 (Y plane + interleaved CbCr plane).

while the image bitmap method is expected to return something interoperable with the rest of the platform?

Yes, though we have some pending updates to mention here as well. We initially envisioned VideoFrame -> ImageBitmap -> Canvas as a primary means of painting VideoFrames. More recently, we intend to support painting of VideoFrames directly via WebGL tex(Sub)Image(2D|3D)() and Canvas drawImage(). The Chrome work for this is tracked here. We'll file PRs/bugs on the relevant specs shortly.

We still plan to support VideoFrame -> ImageBitmap for other interop, but VideoFrame.createImageBitmap() will be removed from the spec in favor of window.createImageBitmap(videoFrame). The former was just easier for us to prototype.

Comment by @chcunningham Feb 23, 2021 (See Github)

Also, I want to highlight the time units discussion @padenot mentioned. https://github.com/WICG/web-codecs/issues/122

This and other top priority issues are all assigned the 2021 Q1 milestone. https://github.com/WICG/web-codecs/milestone/2

Input welcome!

Discussed Mar 8, 2021 (See Github)

[bumped]

Discussed Mar 15, 2021 (See Github)

Ken: Sangwhan gave comments and they responded and he wants to look further. Need Sangwhan for this one.

Yves: we did add something about adding to the web platform in the guidelines.. we can look if the guide is okay with what they proposed? ... Is it okay to add examples on how things can go wrong in design principles if they are not respected?

Dan: I think so... raise it as an issue and we can discuss it in design week.

Yves: in the case of the HTTP header syntax, explaining that it's important to respect it, and a counter-example is the cookie header which has been a nightmare.

Dan: if you have a fully formed idea of what you want to put in, make a PR.

Yves: the style is usually very terse

Dan: depends which part of the document you're in.. We need to keep balance. Having examples is always good.

Yves: first was too terse, I'll work on that.

.. for web codecs we should wait for Sangwhan

Comment by @chcunningham Mar 17, 2021 (See Github)

The previous tag review mentioned ImageDecoding. I will send a PR shortly to add this to the spec: https://github.com/dalecurtis/image-decoder-api

This PR is now out here: https://github.com/WICG/web-codecs/pull/152

A preview of the Image Decoding sections can be viewed here: https://pr-preview.s3.amazonaws.com/WICG/web-codecs/pull/152.html#image-decoding

Comment by @chcunningham Mar 24, 2021 (See Github)

@cynthia @kenchris, just checking in. How is the review going? Have you had a chance to discuss https://github.com/WICG/web-codecs/issues/104 and https://github.com/WICG/web-codecs/issues/122?

I know its a big spec and the ImageDecoder portion is new as of last week. We really appreciate your work!

Comment by @chcunningham Apr 1, 2021 (See Github)

@cynthia @kenchris, friendly ping. I'm hoping to hear from you soon so we can respond to your feedback before we ship. I've delayed the Chrome timeline by one milestone (now with code freeze around May 20), but we'll want to make any substantial changes well in advance.

Discussed May 1, 2021 (See Github)

Ken: discussion around timestamps. Not really timestamps.. offset inside the stream. Also depends on playback speed. Normally playback correlates to timestamp but you might be able to change playback..

Sangwhan: a timer that starts at beginning of stream?

Ken: they want to use unsigned microseconds but depends on playback speed. A lot of issues. Makes sense to me to use unsigned microseconds. Can add timestamp if there's really a need.

Sangwhan: also asking about window environments, detach codec inputs..

Ken: only on worklets or workers initially.. a lot of people who want it on window as well.

Sangwhan: on window not harmful unless there's an implementation detail..

Ken: wanted it not to be a blocking api..

Sangwhan: don't necessarily always want a worker, more involved

Ken: if no-one sets up a worker you end up with crappy web experiences

Sangwhan: going to block the main thread.. very heavy stuff.. people won't do that

Ken: if it's heavy it should always be in a worker

Sangwhan: image decoder part..

Ken: can we expose that to the main thread?

Sangwhan: use cases like short video transfer you can block for a little bit while you're encoding it.. makes sense on window.

Ken: makes sense to start on workers, then if you have those use cases add it to window. You can always add it to window. If you add it now you cannot remove it again. Gradually build API. Makes a lot of sense. [leaves comment]

Sangwhan: by having it worker only, using it in webrtc is going to be a pain.

Ken: those people work on webrtc

Sangwhan: on worker only, someone will polyfill for window. When there's enough usage on window, say it's time to expose it to window.

Sangwhan: PR for image encoder/decoder API.. we haven't had that.. it's a common thing people want to do.

Ken: PR is pretty big. Metadata as well?

Sangwhan: seems like no api for that... [reads stuff] ... [discussion] ... [drafts comment]

Comment by @chcunningham May 6, 2021 (See Github)

Also, I want to highlight the time units discussion @padenot mentioned. w3c/webcodecs#122

@padenot @aboba @sandersdan and I discussed this offline and resolved to use integer microseconds. Please see a summary of how that decision was made here: https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313

Comment by @chcunningham May 6, 2021 (See Github)

@kenchris @cynthia - here's another issue that TAG could help us resolve

Should WebCodecs be exposed in Window environments? https://github.com/w3c/webcodecs/issues/211

Comment by @kenchris May 7, 2021 (See Github)

Generally, I think the points in https://github.com/w3c/webcodecs/issues/122#issuecomment-833788313 make a lot of sense.

As I understand we could always add an alternative attribute with values represented by DOMHighResTimestamp. With that in mind, I support you using a different type (here unsigned microseconds).

@dbaron also made the point (in an email thread) that the timestamps you are defining are within the time of the video or audio stream/file, and are independent of the playback rate, as that doesn't change the timestamp but just causes it to advance at a different rate.

Both David and I agree that if that is understood correctly, it is pretty reasonable that it is a different type, differently-defined.
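The practical upshot of the integer-microsecond decision can be sketched with a small conversion helper. This is illustrative only; the names are hypothetical, not from the spec.

```javascript
// Illustrative only: converting between float seconds and the integer
// microsecond timestamps discussed above. Integer microseconds avoid
// accumulating floating-point drift when summing many frame durations.
const US_PER_SECOND = 1_000_000;

function toMicroseconds(seconds) {
  return Math.round(seconds * US_PER_SECOND);
}

function toSeconds(microseconds) {
  return microseconds / US_PER_SECOND;
}
```

For example, a 30 fps frame duration of 1/30 s becomes the exact integer 33333 µs, which can be added repeatedly without rounding error creeping in.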

Comment by @kenchris May 11, 2021 (See Github)

Regarding exposing to Window environments, we are very sympathetic to @youennf's comment

It seems safer though to gradually build the API. Exposing to Worker at first covers major known usecases and allows early adopter through workers or JS shims to start doing work in Window environment. This might help validating the model is right also for window environments.

Comment by @cynthia May 11, 2021 (See Github)

@chcunningham @padenot, Sorry this took so long!

@kenchris @rhiaro and I discussed this in our F2F. Here is a summary of our discussion.

  1. Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.
  2. Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.
  3. Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

Comment by @dalecurtis May 11, 2021 (See Github)

ImageDecoder was an interesting review. API-design-wise, this felt a lot more like a video decoder API that "supports" images (not surprising given that this is exactly what it is), but we did have some mixed feelings about the ergonomics of the API. What mechanism would allow the user to say, decode an image and put it into an HTMLImageElement? (This seemed like something that people might do)

If you didn't see it, https://github.com/dalecurtis/image-decoder-api/blob/master/explainer.md#providing-image-decoders-through-the-videodecoder-api talks a bit about the reasons for why we didn't just add 'image codecs' to VideoDecoder.

I think it's unlikely someone would want to interoperate with the image element in that way -- we certainly haven't had any requests for it. Canvas is instead the natural rendering point for authored content like this, and folks reimplementing aspects of HTMLImageElement with Canvas+ImageDecoder seems more likely. What we have also had requests for is something like an 'image-decoder-worklet' where folks can bring their own image decoders - but that's a bit orthogonal to what this API is trying to do and again is likely served more simply by canvas.

Comment by @chcunningham May 13, 2021 (See Github)

@kenchris @cynthia thank you so much for the review

  1. Temporal representation, we've discussed this and an integer representation seems to be the most adequate given the tradeoffs and risks that are associated with using a real representation.

SG, I'll update the spec shortly.

  1. Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces. Should take about a week. I'll provide updates on the issue. If the WG cannot come to consensus by Chrome's v1 launch, we can fallback to worker-only exposure while the discussion continues.

  1. Transfer / detach? We read the discussion and making the transfer opt-in seems reasonable - although we did lean more towards implicit transfer and make it so that opt-out is explicit. (Basically, zero garbage unless requested) But there is the risk of this behavior being inconsistent with the rest of the platform.

We considered opt-out for the same reasons and were sensitive to the consistency risk. We also concluded that transfer is often undesirable/infeasible in a few common scenarios, like passing ArrayBufferViews (particularly those that are backed by the WASM heap). We felt this made automatic transfer even riskier, as the behavior would vary based on the type of the provided BufferSource.
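The WASM-heap hazard can be seen with a plain ArrayBuffer. This sketch uses `structuredClone`'s explicit transfer list (available in modern browsers and Node 17+) as a stand-in for what opt-in transfer of codec inputs looks like; the variable names are illustrative.

```javascript
// Illustrative demo of why implicit transfer is risky: transferring an
// ArrayBuffer detaches it on the source side. A view into a WASM heap
// shares the heap's backing buffer, so implicitly transferring such a
// view would detach the module's entire heap.
const heap = new ArrayBuffer(16);          // stand-in for a WASM module's heap
const view = new Uint8Array(heap, 0, 8);   // the chunk an app might hand to a codec

// Explicit (opt-in) transfer: the caller knowingly gives up the buffer.
const moved = structuredClone(view, { transfer: [heap] });

// After the transfer, `heap` is detached (heap.byteLength === 0),
// while `moved` remains a usable 8-byte view over the moved buffer.
```

With opt-in semantics, code that cannot afford detachment simply omits the transfer and keeps its buffer, at the cost of a copy.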

Comment by @chcunningham May 26, 2021 (See Github)

@cynthia @kenchris

Window or Worker? Our take on the discussion is that we should start with Worker and see if there are enough use-cases and demand out there that warrant it to be exposed to Window.

In the WG call Tuesday it was suggested that we could check with participants in the Chrome origin trial to see about use cases for window-exposed interfaces.

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

Comment by @cynthia Jun 7, 2021 (See Github)

@chcunningham

The replies from our user survey are now summarized in this comment. Please let me know if this changes your recommendation.

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group. (Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

Also, for other points mentioned above (ImageDecoder, temporal representation, transfer / detach), please let me know if the replies above have adequately addressed the concern. On temporal representation, the spec is now updated as described.

I personally believe so, we'll discuss this in a plenary call and close this if everyone agrees. Thanks a lot for being patient during the long process!

Comment by @hober Jun 10, 2021 (See Github)

Since the users have spoken, it seems like it would make sense to expose this to Window - unless there is strong opposition from the group.

The WG met the other day, and it was clear (at least to me) that there isn't consensus in the group on whether or not to expose this to Window.

(Even if there is opposition, there seems to be a clear preference here so I think it's worth negotiating towards the direction of Window exposure.)

I like the idea of starting with just exposing it on Worker. It's always easier to add additional exposure in the future than it is to remove it. Best to start small and iterate over time than to start out with too wide exposure & find you regret it in the future.

Comment by @dalecurtis Jun 15, 2021 (See Github)

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

Comment by @hober Jun 15, 2021 (See Github)

During the WG call today, @youennf expressed concern that he's unsure of the official TAG position on window vs worker. Presumably because @hober seems to have contradicted @cynthia's (presumed official TAG) position above. Can y'all affirm the official TAG position?

Hi,

Neither @cynthia nor my comment is "the TAG position", we each just expressed our personal opinions.

Comment by @dalecurtis Jun 15, 2021 (See Github)

Thanks for the clarity, that's a bit confusing from our perspective then. We'll keep that in mind for future comments. Should we expect an official TAG position here (presumably labeled as such)?

Comment by @cynthia Jun 16, 2021 (See Github)

We can do that if that helps - generally we tend to lean towards what users demand/prefer, in this particular case it's also probably worth clarifying if @hober's position is with or without the implementor hat on.

(I'm not an implementor, so I'm fairly neutral on this)

Comment by @jan-ivar Jun 16, 2021 (See Github)

... the users have spoken ...

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

Comment by @chcunningham Jun 16, 2021 (See Github)

@cynthia, having a clear TAG position here would be helpful to us in resolving the discussion. We are quite stuck.

A survey of participants who jumped on and invested resources in an experimental Chrome origin trial, is perhaps a skewed sample of users to rely on for a question of whether to exercise caution when it comes to exposing APIs prematurely.

We polled these users at the suggestion of the WG chair. In our view it seems correct that this savvy group of innovators and early adopters should have a strong voice in shaping the API.

Comment by @jan-ivar Jun 18, 2021 (See Github)

We are not "stuck". We have broad consensus to expose this API in workers immediately. There still exists concerns, trepidation, and questions around the urgency of additionally exposing this API on Window at this time.

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent. I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

Comment by @chcunningham Jun 18, 2021 (See Github)

We generally reserve "users" to describe end-users, not web developers. End-users deserve well-designed applications that work without jank across devices of all abilities, and regardless of user agent.

I respect that. I think we should also consider web developers as advocates for their users.

I think the WG with its current stance (which it is in the process of attempting to assert through CfC) is being cautious with the tools it exposes to web developers in different environments, and when, to help ensure this outcome for end-users.

At this time the WG does not have a stance. The members of the WG are in disagreement.

Discussed Aug 30, 2021 (See Github)

Sangwhan: there's a debate. Window or worker?

Ken: shouldn't it be both places?

Sangwhan: yes. We agreed window or worker global scope. ... but Tess disagrees. 2 implementers disagree it should be exposed to window. One strongly disagrees.

Ken: we benchmarked (at Intel) - a lot of overhead.

Sangwhan: a lot of cases where it makes sense running on the window...

Sangwhan: some argued what about single-cores... I think this should be on window or worker global scope.

Dan: if we can't get consensus we need to move on. I suggest we say these are the opinions, take it to plenary that we want to close it, we can leave a message saying we haven't had consensus

**status: we have a tentative consensus view, to validate at plenary.**

Discussed Sep 1, 2021 (See Github)

closed

Comment by @cynthia Sep 14, 2021 (See Github)

Thank you all for your patience; we hope the feedback we provided on the issue was useful in reaching a resolution. We believe that the feature itself would be a useful addition to the platform and would like to see this move forward. Thanks again for your patience, and apologies for the mixed messages. We'll discuss this in the rollups today during our VF2F and close.