#933: Updated review of WebNN API


Opened Feb 15, 2024

(extracted from https://github.com/w3ctag/design-reviews/issues/771#issuecomment-1911927573)

I'm requesting an updated TAG review of WebNN API - previous TAG review: https://github.com/w3ctag/design-reviews/issues/771

Since the initial Candidate Recommendation Snapshot and the previous TAG review, the Working Group has gathered further implementation experience and added new operations and data types needed for well-known transformers (https://github.com/webmachinelearning/webnn/issues/375). In addition, the group has removed selected features informed by this implementation experience: higher-level operations that can be expressed in terms of lower-level primitives in a performant manner, and support for synchronous execution. The group has also updated the specification to use modern authoring conventions to improve interoperability and the precision of normative definitions, and is developing a new feature (https://github.com/webmachinelearning/webnn/issues/482) to improve performance and interoperability between the WebNN and WebGPU APIs and purpose-built hardware for ML.

The removal of support for synchronous execution is in line with the TAG's guidance (the removal is discussed in https://github.com/w3ctag/design-reviews/issues/531) and with the move toward JSPI, which is finally on its way.
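To illustrate what the async-only surface means in practice, here is a rough sketch of the promise-based flow: context creation, graph building, and compute all return promises, with no synchronous path. The function name `runSmallGraph` is ours, and the method shapes approximate the CR-era draft of the API; exact signatures have continued to evolve, so treat this as a sketch rather than a definitive usage example.

```javascript
// Hedged sketch of the async WebNN flow (CR-era API shape; details may
// differ from the current specification). Defined but not invoked here,
// since it requires a browser with WebNN support.
async function runSmallGraph() {
  // Every entry point is async; there is no synchronous compute path.
  const context = await navigator.ml.createContext({ deviceType: 'gpu' });
  const builder = new MLGraphBuilder(context);

  // Two 2x2 float32 inputs, multiplied as matrices.
  const desc = { dataType: 'float32', dimensions: [2, 2] };
  const a = builder.input('a', desc);
  const b = builder.input('b', desc);
  const c = builder.matmul(a, b);

  // Compilation is also async.
  const graph = await builder.build({ c });

  const inputs = {
    a: new Float32Array([1, 2, 3, 4]),
    b: new Float32Array([5, 6, 7, 8]),
  };
  const outputs = { c: new Float32Array(4) };

  // compute() resolves once execution finishes.
  return context.compute(graph, inputs, outputs);
}
```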

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: expecting to republish as Candidate Recommendation Snapshot in Q1 2024
  • The group where the work on this specification is currently being done: W3C Web Machine Learning Working Group
  • Major unresolved issues with or opposition to this specification:
  • This work is being funded by:


We'd prefer the TAG provide feedback as open issues in our GitHub repo, one for each point of feedback.

Discussions

Comment by @anssiko Feb 29, 2024 (See Github)

We've issued a CfC to advance with the CR Snapshot publication in mid-March, noting in the CfC that the TAG delta review is currently in flight. We expect this issue to be looked at in the context of the transition request.

As outlined in this issue, your earlier feedback (removal of the sync APIs) has been addressed. The rest of the changes since your last review are evolutionary, informed by implementation experience. Specifically, we are not expecting you to do another "full" review.

If the group doesn't hear any concerns from you it plans to proceed with the publication. Thank you for your review comments (https://github.com/w3ctag/design-reviews/issues/771#issuecomment-1332346304) that motivated the removal of the sync APIs.

Discussed Mar 25, 2024 (See Github)

bumped to next week

Discussed Mar 25, 2024 (See Github)

Yves: diff https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FTR%2F2023%2FCR-webnn-20230330%2F&doc2=https%3A%2F%2Fwebmachinelearning.github.io%2Fwebnn%2F

Max: I want to mention... the author mentioned the update for transformers... they talk about some use cases but I don't find those use cases... One suggestion is to put the transformer use cases in...

Yves: the use cases have text to image, speech recognition and text generation... Text to text transformer... It's not entirely clear... you have to infer what transformers are for... Also, a few API signatures have changed - to async using promises... But someone needs to look at the API to see if it makes sense or not. ... looking at the IDL, it seems pretty much OK...

Dan: reached out to Martin

Discussed Mar 25, 2024 (See Github)

bumped

Discussed Apr 1, 2024 (See Github)

Max: I think they accepted previous round of review... I don't see any issue.

Dan: Lea also responded that it looked good to her from an API design pov.

Yves: I also looked at it and the changes looked good to me.

Matthew: I had a question... They've done some stuff that's in line with what we asked... There's another thing it looks like they added. They added "matcher types" for certain problems... E.g. face detection or other common tasks. And I'm not sure whether we've done a review of that. One other thing: there is no security & privacy section in the explainer... but they did fill in the questionnaire... The answers are "no", which is true... but I wonder if it's worth linking to the ethical web machine learning principles from this document...

we look at the spec

Matthew: oh there's "ethical considerations" - the link is there.

Matthew: still wondering about the custom transformers... webnn issue 375.

Matthew: it would be great if they gave a summary of what changed since last time...? One other thing from an architectural pov... I believe they resolved to drop support for WebGL and concentrate on interop with WebGPU... just noting that.

Yves: the diff: https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fwww.w3.org%2FTR%2F2023%2FCR-webnn-20230330%2F&doc2=https%3A%2F%2Fwebmachinelearning.github.io%2Fwebnn%2F - the only mention of transformers in current spec is "adding new operations and data types needed for transformers..."

<blockquote> Thanks @anssiko, @dontcallmedom for the review request. We were wondering, could you clarify the changes around transformers? We note you've added new data types and operations in support of them (is there a list?) - did you also add/remove any transformers? </blockquote>
Comment by @matatk Apr 3, 2024 (See Github)

Thanks @anssiko, @dontcallmedom for the review request. We were wondering, could you clarify the changes around transformers? We note you've added new data types and operations in support of them (is there a list?) - did you also add/remove any transformers?

Comment by @dontcallmedom Apr 3, 2024 (See Github)

the list of operators added (and the removal of a redundant one) is in https://github.com/webmachinelearning/webnn/pull/478#issue-1988769687 based on the detailed analysis made in https://github.com/webmachinelearning/webnn/issues/375#issuecomment-1674224992

Comment by @anssiko Apr 3, 2024 (See Github)

@matatk you may also find the updated use cases for transformers https://github.com/webmachinelearning/webnn/pull/507 helpful -- these use cases motivated the new ops discussed in the above-mentioned issue, also linked from the SOTD.

To provide further context on the removal: one op (squeeze) was removed from the initial list of considered transformer ops because it was found out it can be expressed in terms of an existing lower-level op (reshape) in a performant manner. The emulation path for squeeze is presented informatively in the specification.
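The emulation is mechanical: a squeeze is just a reshape to the same shape with the size-1 dimensions dropped. A small sketch of that shape computation (the `squeezedShape` helper is our illustrative name, not part of the specification):

```javascript
// Hypothetical helper illustrating the emulation path: squeeze(x, axes)
// is equivalent to reshape(x, <shape with size-1 dimensions removed>).
function squeezedShape(shape, axes) {
  // If axes are given, drop only those positions (which must have size 1);
  // otherwise drop every dimension of size 1.
  return shape.filter((dim, i) =>
    axes ? !(axes.includes(i) && dim === 1) : dim !== 1);
}

// With a graph builder in hand, squeeze(x) could then be expressed as
// something like: builder.reshape(x, squeezedShape(xShape)).
```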

Please let us know if you have any further questions.

Discussed Apr 22, 2024 (See Github)

Dan: pings Sangwhan to get his feedback as he has domain knowledge

Comment by @anssiko Apr 25, 2024 (See Github)

For full disclosure and to close the loop on this review:

A new CR Snapshot (history) was published recently. Thank you for your questions and reviews (plural). We've already received two rounds of reviews from the TAG given we've hit the CRS milestone twice for this spec and appreciate your insights and persistence in working with us as we further evolve this specification. We look forward to another delta review with you as appropriate.

If you have further review comments now or at any time do not hesitate to reach out to our group. We will consider all suggestions regardless of the spec milestone we're targeting. We're currently iterating on CRDs and plan to publish a new CRS approximately every 6-12 months.

Comment by @matatk Apr 29, 2024 (See Github)

Hi @anssiko. Thank you for providing the context and info on recent changes, and for the publishing and cadence info. We are still looking into a few things on this review (noting that the 2024-04-29 version is now the current one, as you mentioned). We'll reply on this thread with any additional thoughts.

Discussed May 13, 2024 (See Github)

Matthew: I think we're happy with this - pending some internal thinking we're doing on the fit of the transformers. It's a living document, so we can give them feedback as we have it. The delta from the group was helpful.

Max: I'm happy with this; agree.

Discussed May 20, 2024 (See Github)

Matthew: Sangwhan gave some feedback - fine, but with a concern that it "allows you to starve the main thread"... we've been waiting for a write-up.

we do some research in the explainer

<blockquote> We're generally happy with the way this is going. However, in previous discussions on this in the TAG, @cynthia expressed a concern regarding the threading model - that it's possible to "starve the main thread." Sangwhan might be able to provide some further detail on his concern. Is this something you have considered? </blockquote>

we agree to ping Sangwhan async and confirm our intuition about his feedback before posting this

Comment by @matatk May 23, 2024 (See Github)

We discussed WebNN earlier this week. We're generally happy with the way this is going. However, in previous discussions on this in the TAG, @cynthia expressed a concern regarding the threading approach - that it's possible that an intensive model running on the GPU could disrupt normal site/content rendering, and that would manifest as things like delays in requestAnimationFrame(). Is this something you have considered?

Comment by @RafaelCintron May 26, 2024 (See Github)

@matatk and @cynthia with Chromium on Windows, the WebNN context runs on a separate command queue from the Chromium compositor. Depending on the device, the ML work may run on a separate chip than the one which performs 3D work. Even when it runs on the same chip, the ML work is multi-tasked with other work on the system.

As with other web platform features (WebGL, WebGPU, 2D Canvas, CSS blurs, etc) overtaxing the system will eventually affect requestAnimationFrame. Web developers need to be responsible with the content they build.

Discussed Jun 10, 2024 (See Github)

Dan: we'd been in discussions with Sangwhan to try to channel Sangwhan's feedback. Matthew left an update and they replied.

... I think this is reasonable. There are a list of things that could have an impact on overall web perf.

Matthew: appreciate the detailed reply. It starts by saying "in Chromium on Windows". So anything else wouldn't necessarily have that answer. General concern - yes, there is a general thing of if you overtax the platform it goes slower. We've tried to mitigate against that in some areas in recent years with things like workers and trying to move stuff off the main thread. It seems a shame... shouldn't we try to encourage new things to not have that type of failure mode? Where they can bring down the rest of the user experience? Most people probably will be responsible but... is that a high bar to ask? This is going to run on a lot of different devices with different hardware, model sizes, processing capabilities... different *PUs... do we agree that we should try not to get into a situation where new stuff degrades the user experience? It may be slightly unfair to put all that effort on these people, but this is a new thing and we have an opportunity. But I don't have a suggestion to say how they could do this. Whether there's some OS-level multitasking they could fall back on rather than in-browser?

Yves: question about that... it's easy with a modern system to get over CPU starvation. Computers are known to be able to cope with that, set priorities, and cap CPU usage. I'm not sure it's the case for the GPU. For the GPU case, is there the same way to be able to share or reserve only part of the GPU? Or limit it? Are those capabilities in the systems or not? A few years ago I tried to do something with the GPU and I drove it to a halt because there was no way to recover from that. On the CPU, it's trivial to do so. Maybe things have evolved.

Matthew: I wonder if WebGPU has anything to teach us. We probably have to decide what the TAG thinks about whether we should try and mitigate against this type of failure mode in future APIs - the status quo is: yeah, you can slow things down and break them, and it's just the way it is. Maybe as a group we could form consensus about how important this is for us to address. Plenary?

Amy: "leave the web better than you found it"

Dan: there's stuff in the webgpu spec about denial of service. They're talking about system requirements of implementations

Matthew: did the WebNN people move to focus on just WebGPU for this spec? Previously supporting GL but decided to drop it? If they are GPU focused that's a possible avenue for mitigation. (Update: they resolved to drop WebGL interop)

Dan: do they delegate to the webgpu spec? We should talk about it in the plenary, and write some feedback now.

<blockquote>

Hi @RafaelCintron - thanks for this detailed response. We're just discussing in our TAG breakout today. Can we just clarify 2 points:

  1. You say "Chromium on Windows" but does this equally apply to other platforms - particularly mobile platforms? Is there implementation guidance pertaining to this in the spec?

  2. We agree that the performance issues we've highlighted may equally apply to other web technologies such as WebGL as you've pointed out. Since we're building a new technology into the web, we have an opportunity to do something to improve the status quo. The WebGPU spec contains specific language in reference to denial of service attacks which seems related to our concerns. Would it be appropriate for WebNN spec to contain similar platform requirements, or at least point to this part of the WebGPU spec?

</blockquote>
Comment by @torgo Jun 10, 2024 (See Github)

Hi @RafaelCintron - thanks for this detailed response. We're just discussing in our TAG breakout today. Can we just clarify 2 points:

  1. You say "Chromium on Windows" but does this equally apply to other platforms - particularly mobile platforms? Is there implementation guidance pertaining to this in the spec?

  2. We agree that the performance issues we've highlighted may equally apply to other web technologies such as WebGL as you've pointed out. Since we're building a new technology into the web, we have an opportunity to do something to improve the status quo. The WebGPU spec contains specific language in reference to denial of service attacks which seems related to our concerns. Would it be appropriate for WebNN spec to contain similar platform requirements, or at least point to this part of the WebGPU spec?

Comment by @reillyeon Jun 10, 2024 (See Github)

@torgo, a challenge here is that WebNN supports more than just GPU compute. What @RafaelCintron mentioned makes sense as a concrete mitigation when the MLContext is configured to prefer GPU compute and we need to coordinate with the browser's WebGPU engine anyways for interop purposes (e.g. passing buffers between WebNN graphs and WebGPU shaders). We have the most implementation experience with that scenario when using DirectML on Windows but are actively prototyping with other frameworks such as Core ML on macOS.

When using the CPU or a dedicated ML accelerator the types of potential resource contention and their mitigations are different. I think a general statement similar to WebGPU's reference to denial of service attacks makes sense to add to WebNN as well, with the understanding that exactly how the mitigations work will be implementation- and configuration-dependent. Implementations should use whatever mechanisms are available from the platform (such as the watchdogs mentioned by WebGPU) to prevent sites from using an unfair amount of system resources but in the end these are shared resources and the use of any compute API will affect overall performance on a fully-loaded system.

Comment by @cynthia Jun 12, 2024 (See Github)

Having some guidance in non-normative text, specifically around the different DoS vectors and mitigations would be helpful.

Discussed Jun 24, 2024 (See Github)

Matthew: They've taken up the suggestion of adding some non-normative text about the GPU issue we raised. However, the methods for mitigation vary. In terms of this design review / issue, the TAG needs to decide: is that enough? ... or do we want to push on this more actively? Don't know how far we want to push this.

Dan: another issue ...

Matthew: Tess and Martin both had concerns...

Dan: suggested that the API should be higher level abstraction...

Matthew: it tries to be fairly low level to avoid some problems...

Dan: they have chosen a certain level of abstraction. Tess and Martin were asking if indeed the API should be a higher level of abstraction - at the level of "face detection" for example instead of the middle level of abstraction they've chosen.

Matthew: if you want to use the CPU you can use WASM.. if you want to solve it in a GPU way you can use WebGPU. WebNN is trying to cover all of those bases...

Dan: there is discussion of "NPU"s in the spec...

Matthew: but the industry does not know what a machine learning processor is yet.... when we know, when it's settled... then there might be an obvious abstraction that works across the different types of processor...

Dan: plenary?

Matthew: let's take it to the plenary and then we can [formulate the feedback].

Matthew: in "getting started" section in the explainer ...

Dan: Maybe a "satisfied with concerns" outcome... where the concerns are to do with lack of multi-implementer support (possibly due to non-support for certain types of neural accelerator chips?)

Matthew: interesting thing about JS ML frameworks being a key customer... assumes that web developers won't use it directly... like WebXR... this isn't "you can do face detection", it's "use this to build the thing that does the face detection"... they're chaining the instructions... Question is: what are the low-level instructions for a neural processor?

<blockquote>

Hi - thanks again for bringing this to us. We appreciate that you've been responsive to our feedback. We still have some concerns, but considering the current status of the work, we are planning to close the current review with a 'satisfied with concerns' label.

Our main concern is: has this API considered the full range of hardware that it might need to run on? We see this running on CPUs without neural processing extensions, GPUs without extensions, CPUs with extensions, GPUs with extensions, and dedicated ML hardware. What steps have you taken to ensure that this runs across all of these targets, considering the range of hardware that exists and might exist?

Our second and related concern is about multi-implementer support. If this is going to be ubiquitous as an approach to "do NN on the web" then it really needs to be implemented across different platforms and different hardware.

We encourage you to consider these issues as the spec and technology continues to evolve.

</blockquote>
Comment by @torgo Jun 26, 2024 (See Github)

Hi - thanks again for bringing this to us. We appreciate that you've been responsive to our feedback. We still have some concerns, but considering the current status of the work, we are planning to close the current review with a 'satisfied with concerns' label.

Our main concern is: has this API considered the full range of hardware that it might need to run on? We see this running on CPUs without neural processing extensions, GPUs without extensions, CPUs with extensions, GPUs with extensions, and dedicated ML hardware. What steps have you taken to ensure that this runs across all of these targets, considering the range of hardware that exists and might exist?

Our second and related concern is about multi-implementer support. If this is going to be ubiquitous as an approach to "do NN on the web" then it really needs to be implemented across different platforms and different hardware.

We encourage you to consider these issues as the spec and technology continues to evolve.

Comment by @anssiko Oct 24, 2024 (See Github)

We've made updates to the specification that we believe address the remaining concerns you had, namely:

Please let us know if there are any remaining concerns. Thank you!