#570: Web Neural Network API
Discussions
Comment by @anssiko Jan 7, 2021 (See Github)
The Machine Learning for the Web Community Group congratulates @cynthia for his re-election to the TAG and looks forward to the TAG review comments :-)
Comment by @anssiko Jan 27, 2021 (See Github)
Discussed this briefly with @kenchris who kindly volunteered to share his high-level review comments for the explainer:
- Consider making use cases more prominent in the explainer, perhaps note the use cases (as bullets?) in the beginning (currently use cases are noted in the explainer key scenarios and linked from the spec header)
- Note in the explainer (or in the TAG review request?) there is a spec-compliant polyfill that passes the test suite as well as samples that implement selected use cases using this polyfill (maybe the review request template could include new fields for polyfill and samples?)
- Note this effort has rather diverse participation, including major browser vendors, key ML JS frameworks, interested hardware vendors, web developers
- Note the design process of this API started by identifying key use cases, working down the levels of abstraction decomposing the key use cases into requirements, aligned with the guidance to put user needs first
I probably missed some of @kenchris insights, so please fill me in.
Comment by @kenchris Jan 28, 2021 (See Github)
I am looking at this with @cynthia now, but here are some of my comments from yesterday:
Yes, I definitely think the explainer should better explain the use-cases and quickly introduce the major new terminology such as Neural Network, AI, Model Loader etc.
Then it should clearly explain the pros/cons of each approach (bullet points would be nice), so that it is clear that even if pursuing a model loader right now seems complicated due to the lack of a standardized format, that does not mean a neural network API will be useless once such a format exists.
Also, once you have the use-cases, it would be nice to see which of the available options (model loader, neural network, etc.) solve each use-case and which ones don't; for example, "training" won't be solved by a model loader.
Also, as some of this could be implemented / polyfilled with WASM, WebGL, or WebGPU, that discussion seems important. The explainer gives arguments for why these might not be a good solution, but existing libraries work on top of them, so do those libraries also suffer from all the issues you are listing? Some look at the performance or battery efficiency of this new approach would also be appropriate.
Comment by @cynthia Jan 28, 2021 (See Github)
@kenchris and I looked at this today.
First-pass review - we have a bunch of questions:
- The fact that a GRU is in there really sticks out. I somehow found out why it is there, but it feels extremely inconsistent with the rest of the API which is fairly generic. (e.g. you should have a LSTM and a GRU, but not just a GRU - that's weird.)
- In the spec, some of the activations are out in global scope (e.g. relu), some are in unary operators (sigmoid, tanh) - this doesn't look consistent.
- The spec mentions training in the batch normalization section - but I'm fairly convinced that there is no support for training. Is this an error?
- getNeuralNetworkContext() and createModelBuilder() seem strange (no parameters, for one thing) - is this expected to accept parameters/configs at some point? If so, we'd like to see what is intended here.
- Wouldn't it make sense to have a constructor rather than a builder pattern for createModelBuilder()? (e.g. new ModelBuilder(navigator.ml.getNNContext()))
- I see quite a few view/reshape like functions, which of these are expected to copy and which are not? Probably good to note this in the spec.
- If there are layers that will be taking activations as string enums, there should simply be a string enum for activations rather than have it just in RecurrentNetworkActivation. (One may argue that hyperbolic tangent is RNN specific, but...)
- While the limitations of JavaScript probably contribute a lot to this, the ergonomics of this API, judging by the example code, might have room for improvement.
- It feels like errors/exceptions should probably be fleshed out. (e.g. what happens when you try to reduce on a non-existent axis?)
- I don't quite understand the NamedOutput mechanism. What if the output is just a feature?
- A lot of the names are very generic (Operand, Compilation) - this feels like something we might want to prefix with something or synchronize with TC39 about.
- What's the isomorphic JS story for this? Also, given that this is attached to vanilla navigator, is this not expected to work in a worker scope?
- Given that bootstrapping a network is a lot of work, would it make sense to have some sort of serialization/caching story here?
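To make the constructor suggestion in the bullets above concrete, here is a minimal sketch. It assumes a hypothetical constructible ModelBuilder class; the reviewed draft only exposes the getNeuralNetworkContext()/createModelBuilder() factories, and the stub context below is purely illustrative.

```javascript
// Sketch only: contrasts the reviewed draft's factory style with the
// suggested constructor style. A constructible ModelBuilder is
// hypothetical and not part of the reviewed spec.

// Reviewed draft (factory chain, no parameters):
//   const builder = navigator.ml.getNeuralNetworkContext().createModelBuilder();

// Suggested alternative (explicit constructor):
//   const builder = new ModelBuilder(navigator.ml.getNeuralNetworkContext());

// Minimal stand-in demonstrating the constructor shape:
class ModelBuilder {
  constructor(context) {
    if (context == null) throw new TypeError("a context is required");
    this.context = context;
  }
}

const builder = new ModelBuilder({ name: "stub-context" });
console.log(builder.context.name); // "stub-context"
```

A constructor makes the dependency on the context explicit and allows parameter validation at construction time, which the zero-argument factory chain does not.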
Nits:
- The one case I saw clamp() being used seemed to implement a relu?
- Search for "creatModelBuilder" in the explainer.
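On the clamp()-as-relu nit, the equivalence is easy to check numerically. The functions below are plain-JavaScript stand-ins for the element-wise math, not the WebNN operations:

```javascript
// Local stand-ins for the element-wise math, not the WebNN ops themselves.
const relu = (x) => Math.max(0, x);
const clamp = (x, min, max) => Math.min(Math.max(x, min), max);

// clamp with a lower bound of 0 and no effective upper bound is exactly relu:
for (const x of [-2, -0.5, 0, 0.5, 3]) {
  if (clamp(x, 0, Infinity) !== relu(x)) throw new Error(`mismatch at ${x}`);
}
console.log("clamp(x, 0, Infinity) === relu(x) for all sampled x");
```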
Comment by @cynthia Jan 28, 2021 (See Github)
One more point - feels like having a Sequential() would be nicer syntax wise.
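As an illustration of the Sequential() idea, here is a hedged sketch of how such a helper could thread one operand through a list of layers instead of nesting builder calls. All names are hypothetical; nothing like this exists in the reviewed draft, and plain arithmetic functions stand in for builder-produced layers:

```javascript
// Hypothetical: a sequential() helper that threads one operand through a
// list of layer functions, avoiding deeply nested builder calls.
function sequential(...layers) {
  return (input) => layers.reduce((operand, layer) => layer(operand), input);
}

// Usage, with plain functions standing in for builder-produced layers:
const model = sequential(
  (x) => x * 2, // stand-in for e.g. a conv layer
  (x) => x + 1  // stand-in for e.g. an activation
);
console.log(model(3)); // 7
```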
Comment by @anssiko Jan 28, 2021 (See Github)
Thank you @cynthia and @kenchris for sharing the TAG review feedback with us.
The group will discuss this feedback on its 4 February 2021, 15:00-16:00 UTC+0 teleconference. We have dedicated most of our 1-hour meeting to this topic. You're welcome to attend, subject to your availability. I apologize in advance that the time is suboptimal for APAC participants.
We may create separate GH issues to track this feedback in the https://github.com/webmachinelearning/webnn/ repo and @ you to review related PRs.
Thank you again for sharing your insights, we look forward to improving and clarifying the WebNN API with your help.
Comment by @anssiko Feb 4, 2021 (See Github)
Discussed this TAG feedback on the WebML CG call today (minutes). Below are the responses and issues labeled with "tag" opened in response to the feedback. Thank you!
- The fact that a GRU is in there really sticks out. I somehow found out why it is there, but it feels extremely inconsistent with the rest of the API which is fairly generic. (e.g. you should have a LSTM and a GRU, but not just a GRU - that's weird.)
This is because we're not feature complete yet; this is a "v1" API.
- In the spec, some of the activations are out in global scope (e.g. relu), some are in unary operators (sigmoid, tanh) - this doesn't look consistent.
https://github.com/webmachinelearning/webnn/issues/133
- The spec mentions training in the batch normalization section - but I'm fairly convinced that there is no support for training. Is this an error?
https://github.com/webmachinelearning/webnn/issues/134
- getNeuralNetworkContext() and createModelBuilder() seem strange (no parameters, for one thing) - is this expected to accept parameters/configs at some point? If so, we'd like to see what is intended here.
https://github.com/webmachinelearning/webnn/issues/135
- Wouldn't it make sense to have a constructor rather than a builder pattern for createModelBuilder()? (e.g. new ModelBuilder(navigator.ml.getNNContext()))
https://github.com/webmachinelearning/webnn/issues/136
- I see quite a few view/reshape like functions, which of these are expected to copy and which are not? Probably good to note this in the spec.
https://github.com/webmachinelearning/webnn/issues/137
- If there are layers that will be taking activations as string enums, there should simply be a string enum for activations rather than have it just in RecurrentNetworkActivation. (One may argue that hyperbolic tangent is RNN specific, but...)
https://github.com/webmachinelearning/webnn/issues/138
- While the limitations of JavaScript probably contribute a lot to this, the ergonomics of this API, judging by the example code, might have room for improvement.
https://github.com/webmachinelearning/webnn/issues/139
- It feels like errors/exceptions should probably be fleshed out. (e.g. what happens when you try to reduce on a non-existent axis?)
https://github.com/webmachinelearning/webnn/issues/19
- I don't quite understand the NamedOutput mechanism. What if the output is just a feature?
https://github.com/webmachinelearning/webnn/issues/140
- A lot of the names are very generic (Operand, Compilation) - this feels like something we might want to prefix with something or synchronize with TC39 about.
https://github.com/webmachinelearning/webnn/issues/141
- What's the isomorphic JS story for this? Also, given that this is attached to vanilla navigator, is this not expected to work in a worker scope?
https://github.com/webmachinelearning/webnn/issues/142
- Given that bootstrapping a network is a lot of work, would it make sense to have some sort of serialization/caching story here?
A non-goal per the explainer.
Nits:
- The one case I saw clamp() being used seemed to implement a relu?
- Search for "creatModelBuilder" in the explainer. One more point - feels like having a Sequential() would be nicer syntax wise.
https://github.com/webmachinelearning/webnn/issues/143
We ran out of time here. The rest of the feedback to be discussed on another call.
Comment by @wchao1115 Feb 6, 2021 (See Github)
@cynthia
The spec mentions training in the batch normalization section - but I'm fairly convinced that there is no support for training. Is this an error?
The description of batch-norm in the spec is accurate but poorly worded. I'll fix that. To clarify here: it attempts to explain the origin of the input params mean and variance, namely that they are the products of a training phase of batch-normalization (which is out of scope for WebNN for now). This distinction is needed to draw a contrast between batch-normalization and instance-normalization, as the latter operation computes the mean and variance values per-instance on the fly within an inference call. I'll make the text clearer to avoid confusion.
Comment by @cynthia Feb 8, 2021 (See Github)
@wchao1115 I see your intent now. I figured that mentioning training in general would be confusing for readers. That description makes more sense, and I would like to see the new text when it's there. Thanks!
Discussed
Feb 15, 2021 (See Github)
[bumped to plenary - possibly we can close]
Discussed
Feb 22, 2021 (See Github)
Ken: we gave feedback, they filed issues.
Sangwhan: issues with API design... I provided some feedback. I suggested an alternative design to this. Factory that creates another factory... factory for operation. Model builder give me an add operation.. model builder give me an xyz operation...
Dan: is the way they are building this coherent? The way the community builds these kinds of things?
Sangwhan: no
Dan: why are they building it this way?
Sangwhan: mapping native optimisation APIs straight to the web. Inspiration from Android API. Not justifiable. This JS example. They're saying it's for frameworks so it's fine.
Dan: this is an area we can provide feedback not specific to the neural network use case. Does design principles say don't do that? Low level vs high level. If you're designing this for the web you really need to think about how web developers are going to use it, not simply rely on frameworks. Parts of the web people dislike working with are the parts that have been designed in this way.
Ken: if you can make the api nice even if it's low level... might be use case for this in the future
Sangwhan: a couple of things they ignored in the principles, constructors. Everything is a factory pattern, but you should bind when you compile the graph and transfer it to the coprocessor, which is more common in actual frameworks. The only framework that does what they're doing now is TensorFlow. [types code in jitsi chat]
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1),
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),
    nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5, stride=1),
    nn.Tanh(),
    nn.AvgPool2d(kernel_size=2),
    nn.Conv2d(in_channels=16, out_channels=120, kernel_size=5, stride=1),
    nn.Tanh(),
    nn.Flatten(1),
    nn.Linear(in_features=120, out_features=84),
    nn.Tanh(),
    nn.Linear(in_features=84, out_features=10),
    nn.Softmax()
)
model = model.to('cuda0')

with tf.device('cuda0') as d:
    blah blah blah
Dan: I opened up a design principles issue to capture this.
Ken: sounds like you have good arguments about why they should redesign this
Sangwhan: I do.
Ken: I think they have implemented it and have polyfills.
Dan: what's the advantage of having this api vs someting that sits in webassembly
Ken: they can use hardware
Dan: doesn't webGPU give you..
Ken: some of it, but hardware made for machine learning is always better
Sangwhan: in mobile phones you have a neural coprocessor which is not a GPU. A multiplication machine.
Ken: these are becoming more and more common.
Sangwhan: I don't think GPU-based computation is going to be a very common thing to do for this.
Ken: the approach is the same as WebGL. Mapping the OpenGL API to the web, and it's a terrible API for JS developers, which is why everyone is relying on frameworks [e.g. three.js] because it's so difficult to use. For WebGPU they're actually trying to make an API you can use as a developer directly, even though you might want to use a framework. WebGL feels like a C API, very foreign to web developers.
Sangwhan: I will join one of their calls
Dan: can we summarize into our issue?
Sangwhan: wanted to bring up priority of constituencies, but doesn't capture that APIs should be useable by average devs. When we bring that up there are plenty of counter-examples... extensible web manifesto says apis should be low level
Dan: We have more nuanced language in there now
Ken: something low level doesn't have to be bad
Dan: milestone to 29 March
Discussed
Mar 29, 2021 (See Github)
Ken: they filed issues for all of the issues - they did that on a call in the WG and they haven't finished yet. Someone from MS commented. Seems that they are listening to our feedback. Sangwhan seemed happy with the answer. This is an early review and we've given feedback - to which they have been receptive. They are listening.
Dan: looking at PR that resolves some issues...
Ken: unless we have major concern... I think they're handling that in the WG itself. Of course there's concern about how the field of machine learning is evolving, is it going to change, but Intel is positive, Microsoft is also positive, they believe it's the right path forward. Google had some ideas of doing something smaller, but these are the discussions in the WG. It's not shipping.
Dan: We might want to close this at the plenary. We need Sangwhan.
[sangwhan joins]
Sangwhan: they haven't addressed the ergonomics of the API. It's not developer friendly. Their reasoning is that this makes it easier for them to bind it to hardware. But I'm not sure if that's..
Dan: the point of an API is to make it easier for the developer
Ken: how is that different from webgl and webgpu
Sangwhan: I'm not suggesting those are good, a nn is more of an application API rather than a hardware API. That's the bit that makes me uncomfortable. Their rationale is that this is for framework developers
Ken: that's the same with webgl
Dan: people use things like iframe and libraries
Sangwhan: that's the main concern I have. We might want to take a position on where we draw the line of these are APIs that are for framework developers and these are APIs that are supposed to be used as-is. We shouldn't have that many that are designed only for framework developers. But there's also the extensible web manifesto that suggests everything should be low level
Dan: we have more measured wording in design principles about that now.. don't we? Are we still talking about low level vs high level?
Sangwhan: it's a standing issue. I don't have strong opinions that the web should be this or that. Feels like we should draw a line on where we say this is best as a low level API and this is best as a high level API
Dan: in the issues .. they haven't addressed all of our issues
Sangwhan: other issues I raised are minor compared to this ergonomics issue. Number 8.
Dan: they think we're asking them about the ergonomics of the examples, you responded that it's the API itself.
Sangwhan: the API is incredibly loquacious. You have to do a lot of boilerplate. Rationale is you don't have to invent new types and can work with hardware, but it's really not nice to use. That's my main concern. It feels like C code. They haven't addressed that.
Dan: we need to be stronger then on our response. We need to say in our issue we appreciate they're being responsive to the list of issues. The main thing we're still concerned about is the developer ergonomics and pointing to that example. That might sound like we're asking them to reinvent the whole thing. We could suggest that there need to be better developer ergonomics, ways to use this API that don't require that amount of syntax
Sangwhan: I've suggested that on their issue and the response I got was it's fine because of framework developers
Dan: my suggestion is to say that's not okay. It is worth noting that both WebGL and WebGPU are good examples of notoriously difficult to use APIs that almost nobody uses directly and that developers don't like working with because they're designed in this way. That ties people to dominant frameworks, which creates a different set of problems.
Sangwhan: even the native people gave up on WebGL and WebGPU, that's why Unity and Unreal are a thing.
Dan: it's example of a bad practice.
Sangwhan: I don't want to block their work or have them reinvent the API.
Dan: they should have an optional way to interact with the API that's built in and doesn't require you to use a framework on top of it.
Sangwhan: I'm surprised that the person who wrote the example code didn't feel that the API is user-unfriendly. I will reply on their issue; that is separate. Do we have consensus that we want developer-friendly APIs over hardware-friendly APIs?
Dan: this is design principles issue 117, which is still open. We need to discuss it in that context. Feels like we do have that consensus but we haven't documented it. Lea is working on a PR; there's one open (291) which among other things talks about how you shouldn't exclusively develop low level APIs, design high level in terms of building blocks over low level... ensure continuity and ease of use..
Sangwhan: if we have group consensus I can say yes we don't want lots of boilerplate in the APIs
Dan: I feel like we do have that consensus. We can further discuss in the plenary.
Sangwhan: one part of the API design choice I do understand, because JavaScript doesn't have operator overloading. Makes it hard to make this kind of API.
Ken: wasn't there some work on that?
Sangwhan: still stage 1
Discussed
May 1, 2021 (See Github)
Sangwhan: raised issues, not satisfied, don't know how to deal with this
Ken: make sense to summarise in the issue?
Sangwhan: API is not great
Ken: support from MS and google?
Sangwhan: MS and Intel pushing it. Parts I did not like... see issue 89, 139 is my main gripe
Ken: can live with low level API, but want to make sure it's futureproof
Sangwhan: also annoyed about 138. Need background knowledge. Bits that are extremely tied to specific hardware-accelerated NN features. Gated Recurrent Unit.. used to remove background noise in video conferencing. LSTM is more common but not supported.. hardware acceleration for GRU, found it strange
Ken: some of the use cases driving this.....
Sangwhan: weird restrictions for GRU, weird inconsistency in the API
Ken: sounds like a good comment
Sangwhan: I've been saying this, they say it's because of hardware.
Ken: how does the hardware look in five years?
Sangwhan: Exactly.
Ken: what hardware? Intel, Apple? Might differ.
Sangwhan: I suggested.. if there's hardware that doesn't support it I suggested it should throw, they said no.. suggested canPlay type mechanism to probe if you can do it... they said no. ... Invite these people to a call? Issues are difficult because you need a lot of background to get started. ... Push to plenary.. see if chairs can help.
Discussed
Sep 1, 2021 (See Github)
Ken: sounds like this has support from Google and Microsoft...
Dan: last comment from anssi sums it up well. They have made a number of changes based on our feedback. It's all tracked in their repo. They have transitioned from CG to WG in the meantime.
Ken: leaves comment on readability of code examples
Dan: leaves comment asking about multistakeholder support
Sangwhan: I [still] have some concerns regarding the design of that API (in particular, the low level nature of it) [but] feel free to close
Comment by @anssiko Sep 2, 2021 (See Github)
The Web Machine Learning WG (we transitioned from a CG into a WG during the TAG review!) has now addressed all TAG review feedback. We tracked your feedback in the Web Neural Network API GH repo issues with a "tag-tracker" label: https://github.com/webmachinelearning/webnn/issues?q=label%3Atag-tracker+is%3Aclosed
On behalf of the group, I want to thank @cynthia and the TAG for the careful review. With your feedback, the specification was substantially improved. Please do not hesitate to reach out to us with any further feedback or questions.
Comment by @kenchris Sep 16, 2021 (See Github)
Just a side-note here:
When I see code snippets like
return builder.add(
  builder.max(0, x),
  builder.mul(
    builder.constant(options.alpha),
    builder.sub(
      builder.exp(builder.min(builder.constant(0), x)),
      builder.constant(1))));
I am wondering if that can be made more readable when/if the pipeline operator lands in JavaScript https://github.com/tc39/proposal-pipeline-operator
It might make sense to look through examples like this and see if they fit well with the pipeline operator, or whether any changes should be made.
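For illustration, the nested expression above (which computes ELU: max(0, x) + alpha * (exp(min(0, x)) - 1)) can already be flattened with a small pipe() helper that approximates the proposed |> operator. The builder object below operates on plain numbers and only mimics the shape of the API; it is not the real WebNN builder:

```javascript
// A pipe() helper approximating the proposed |> operator; the "builder"
// below is a plain-number stand-in that only mimics the shape of the API.
const pipe = (value, ...fns) => fns.reduce((v, fn) => fn(v), value);

const builder = {
  constant: (c) => c,
  min: (a, b) => Math.min(a, b),
  max: (a, b) => Math.max(a, b),
  exp: (a) => Math.exp(a),
  sub: (a, b) => a - b,
  mul: (a, b) => a * b,
  add: (a, b) => a + b,
};

// ELU written as a left-to-right pipeline instead of nested calls:
function elu(x, alpha) {
  const negativeBranch = pipe(
    builder.min(builder.constant(0), x),
    (v) => builder.exp(v),
    (v) => builder.sub(v, builder.constant(1)),
    (v) => builder.mul(builder.constant(alpha), v)
  );
  return builder.add(builder.max(0, x), negativeBranch);
}

console.log(elu(2, 1));  // 2 (positive inputs pass through)
console.log(elu(-1, 1)); // exp(-1) - 1, roughly -0.632
```

The pipeline reads top to bottom in evaluation order, which is the readability gain the operator proposal is after.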
Comment by @torgo Sep 16, 2021 (See Github)
Hi @anssiko - thanks for this and for tracking this so excellently. It certainly seems the group has taken a lot of the TAG feedback onboard. Before closing, I still have a concern about multi-implementer support. Currently it doesn't seem like there is a Chrome Status entry for this API. What, if any, signals do you have from other implementers (e.g. is there a Mozilla standards position)? As the group is a WG now (which is great) you'll definitely need to have multiple implementations. What's the plan for that, and what's the plan for trialing this with developers?
Comment by @anssiko Sep 16, 2021 (See Github)
The WG is aware of multiple work-in-progress implementations that use independent backend implementations, building on top of existing major platform APIs, across major OSes.
Some group participants hinted we may hear more at WebML WG's TPAC meeting, including information on developer-facing trial plans.
See also https://github.com/webmachinelearning/webnn/pull/213
Thank you!
Comment by @cynthia Oct 19, 2021 (See Github)
Sorry for the delay, we discussed this at length over multiple calls and while there have been some disagreements on the design principles of the API - we don't think it's critical enough to warrant an unsatisfied resolution. We're happy to see this work proceed. Thank you for bringing this to our attention.
OpenedNov 13, 2020
Hi TAG!
I'm requesting a TAG review of the Web Neural Network API.
The Web Neural Network API (or WebNN API for short) is a specification for constructing and executing computational graphs of neural networks. It provides web applications with the ability to create, compile, and run machine learning networks in web browsers. The WebNN API may be implemented in web browsers using the available native operating system machine learning APIs for the best performance and reliability of results.
Further details:
You should also know that...
[please tell us anything you think is relevant to this review]
We'd prefer the TAG provide feedback as:
🐛 open issues in our GitHub repo for each point of feedback