design-reviews#487: WebAssembly SIMD review

#487: WebAssembly SIMD review

Opened Mar 19, 2020

Hello TAG!

I'm requesting a TAG review of WebAssembly SIMD.

WASM CG Explainer: https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md
Web Platform focused Explainer: https://github.com/WebAssembly/simd/blob/master/proposals/simd/W3CTAG-SIMDExplainer.md
Proposal directory: https://github.com/WebAssembly/simd/tree/master/proposals/simd
Example usage and demos: https://v8.dev/features/simd
Tests: https://github.com/WebAssembly/simd/tree/master/test/core/simd
Security and Privacy self-review: There are no known security or privacy impacts of this feature.
GitHub repo: https://github.com/WebAssembly/simd
Primary contacts:
- Deepti Gandluri, @dtig, [Google]
- Thomas Lively, @tlively, [Google]
- Zhi An Ng, @ngzhian, [Google]
- Arun Purushan, @arunetm, [Intel]
External status/issue trackers for this feature: https://www.chromestatus.com/feature/6533147810332672

Further details:

I have reviewed the TAG's API Design Principles
The group where the incubation/design work on this is being done: WebAssembly Community Group
The group where standardization of this work is intended to be done: WebAssembly Working Group

You should also know that...

This is purely a WebAssembly performance feature that does not affect web API behavior, but is still useful for developers to be aware of as it can change performance characteristics of applications using WebAssembly. It adds a new 128-bit value type that is not exposed to JavaScript and several new opcodes for vector operations that are documented here.

We'd prefer the TAG provide feedback as:

🐛 open issues in our GitHub repo for each point of feedback

Discussions

Comment by @MattMenke2 Mar 26, 2020 (See Github)

If a platform doesn't natively support the corresponding SIMD operations, WebAssembly falls back to emulation, right? So with respect to CPU fingerprinting, you can probably figure out whether a system supports the corresponding SIMD operations by timing attacks, but that's it?

How much do CPUs vary in terms of support for these operations? Just want to know how many bits of CPU information we're exposing here.

Comment by @dtig Mar 26, 2020 (See Github)

If a platform doesn't have native support for SIMD operations, either the engine can choose to handle this with a scalar fallback, or the application will need to deploy a separate WebAssembly binary that can be used as a fallback. Picking between two binaries depending on whether a feature is supported is a common feature detection model for WebAssembly.
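As a minimal sketch of that feature detection model (not part of the proposal itself): a page can validate a tiny probe module that uses a SIMD instruction and pick a binary accordingly. The probe bytes below assume the opcode encoding of the finalized SIMD specification, which postdates this thread, and `SIMD_PROBE`, `pickBinary`, and the file names are hypothetical names used only for illustration.

```typescript
// Hypothetical probe: a module with one function of type () -> v128
// whose body is `i32.const 0; i8x16.splat`.
const SIMD_PROBE = new Uint8Array([
  0, 97, 115, 109, 1, 0, 0, 0,        // "\0asm" magic, version 1
  1, 5, 1, 96, 0, 1, 123,             // type section: () -> v128
  3, 2, 1, 0,                         // function section: one function of type 0
  10, 8, 1, 6, 0, 65, 0, 253, 15, 11, // code section: i32.const 0; i8x16.splat; end
]);

// The validator is injected so the selection logic does not depend on a Wasm engine.
type Validator = (bytes: Uint8Array) => boolean;

// Returns the SIMD build if the probe validates, otherwise the scalar build.
function pickBinary(validate: Validator, simdUrl: string, scalarUrl: string): string {
  return validate(SIMD_PROBE) ? simdUrl : scalarUrl;
}
```

In a browser the validator would typically be `(bytes) => WebAssembly.validate(bytes)`, for example `pickBinary((b) => WebAssembly.validate(b), "app.simd.wasm", "app.wasm")`.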

The baseline assumption for the current SIMD proposal is that the operations have reasonable mappings to hardware instructions on most modern hardware. In this case, that means all Intel hardware with SSE4.1 or later, as well as most modern ARM and MIPS LE hardware. With respect to CPU fingerprinting, it is possible to figure out whether a system supports the corresponding SIMD operations, but given that the current set of operations is a basic set of vector operations available on most modern hardware, this information may not be particularly useful.

One of the goals for the proposal is also that the performance of these operations should be portable across architectures, i.e. a similar class of applications should see a performance boost on all supported architectures. That said, with a select set of operations and careful perf measurements, it may be possible to distinguish between Intel and ARM hardware, but this will also depend to a large extent on engine implementations (level of optimization, code generation support for advanced extensions like AVX on Intel, etc.).

Discussed Apr 6, 2020 (See Github)

Ken: I think it's pretty good.

[discussion of JS SIMD]

Tess: it's surprising to me that there would be a desire to expose SIMD just to WA and not JS.

Tess: worried that the shape of the API of the web is being driven by expediency... that worries me. I don't like the idea of JS and WA having different capabilities.

Ken: [WA] does have threading as well.. They are working on it [JS SIMD].

Tess: I will take a look soon.

Ken: our Intel SIMD experts have looked at this and ensured that it meets requirements... I could have a talk with internal team .... This covers the use cases they have.

Dave: have some people reviewed that it makes sense across other architectures?

Ken: it works on ARM.

Ken: will get additional feedback this week.

Peter: let's try to close it off at the plenary session

Comment by @kenchris Apr 8, 2020 (See Github)

Had a meeting with @arunetm where we discussed the TAG review, and how we review and what we expect.

We really would like a better explainer, geared toward landing the feature as part of the broader Web Platform. The current one is really technical and written for the WASM CG and differs somewhat from what we are expecting.

Please read our explainer about explainers: https://w3ctag.github.io/explainers

For instance we would like a section about Considered Alternatives, which should list the SIMD.js work and why that was abandoned.

We would also like information about whether it would be possible to bring SIMD to JavaScript in the future so that WASM and JS don't diverge too much. Or whether that might not make sense.

Are there any current outstanding issues or disagreement that we should know about?

Is the proposal playing favorites with some architectures, or is the design done in a way that lets different architectures gain similar performance with optimized pipelines?

What is the plan for SIMD wider than 128 bits?

Comment by @dtig Apr 10, 2020 (See Github)

Thanks for the feedback.

Had a meeting with @arunetm where we discussed the TAG review, and how we review and what we expect.

We really would like a better explainer, geared toward landing the feature as part of the broader Web Platform. The current one is really technical and written for the WASM CG and differs somewhat from what we are expecting.

Please read our explainer about explainers: https://w3ctag.github.io/explainers

Evaluating the criteria linked above and your reply here, the areas that need to be explicitly addressed are "the alternatives which have already been considered and why they were not chosen" and code examples. Code examples in C/C++/Rust were not originally included in the overview because the bytecode should be language agnostic, and because there is currently no JS API, code examples where this could be used directly from JS are not available. I will work with @arunetm on an overview that links to the current technical document, but I would also like to briefly address some of the questions below.

For instance we would like a section about Considered Alternatives, which should list the SIMD.js work and why that was abandoned.

We would also like information about whether it would be possible to bring SIMD to JavaScript in the future so that WASM and JS don't diverge too much. Or whether that might not make sense.

The WebAssembly SIMD work is a direct offshoot of the SIMD.js work, which is no longer in active development at TC39. The SIMD.js proposal is inactive for a few different reasons:

- Performance wins from SIMD.js were mostly observed in carefully structured asm.js code, which is not representative of a large portion of JS code in the wild.
- A higher-level abstraction meant that there were several hidden performance cliffs, so performance was inconsistent for the applications that are sensitive to it. A side effect of this was that while there were demonstrated performance gains on microbenchmarks, there weren't many real-world examples that could demonstrate performance wins from experimental SIMD.js support.
- There was a large API surface to implement just for the slow path of SIMD.js. From the perspective of engine implementers, the implementation and optimization cost was significant and unfortunately outweighed the performance benefits, especially because performance cliffs disadvantage the very applications that would use this.

A lot of this is actually offset by introducing this at a lower level in WebAssembly - with the current proposal we've been able to demonstrate consistent performance gains across multiple architectures on real world applications.

There are no plans currently to expose this to JS, as the issues that existed when SIMD.js was marked inactive still exist today. That said, I don't see the current proposal as fully divergent from JS, as existing JS applications can indirectly use SIMD values in ArrayBuffers, and use Wasm function calls to manipulate SIMD values as long as the types themselves are not exposed to JS.
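A small sketch of what "SIMD values in ArrayBuffers" can look like from the JS side: the 128-bit value is just 16 bytes that typed arrays can view in different lane shapes. A plain `ArrayBuffer` stands in for a Wasm memory buffer here, and `memory` and `slot` are hypothetical names; no actual Wasm function is called.

```typescript
// A plain ArrayBuffer stands in for a Wasm linear memory buffer in this sketch.
const memory = new ArrayBuffer(64);
const slot = 16; // hypothetical byte offset of one 128-bit value

// Three views over the same 16 bytes, mirroring the f32x4, i32x4, and i8x16 lane shapes.
const asF32x4 = new Float32Array(memory, slot, 4);
const asI32x4 = new Int32Array(memory, slot, 4);
const asI8x16 = new Uint8Array(memory, slot, 16);

// JS writes four floats; a Wasm function could then load them as a single v128.
asF32x4.set([1.0, 2.0, 3.0, 4.0]);

// The same bits read back as integers: 1.0f is 0x3f800000, 2.0f is 0x40000000.
const firstLaneBits = asI32x4[0];
```

JS only ever sees bytes, floats, or integers through these views; the v128 type itself stays inside the Wasm module.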

Are there any current outstanding issues or disagreement that we should know about?

Not necessarily a disagreement, but there are aspects of this proposal that conflict with the base assumption that WebAssembly is always deterministic. By this I mean that SIMD in hardware can be extremely performant, but can also exhibit non-determinism, which this proposal explicitly tries to avoid, both to be consistent with WebAssembly in general and to avoid platform-specific behavior that could be used for fingerprinting (as brought up in a previous comment on this issue as well). An example of this is the Wasm SIMD floating point min/max instructions: the hardware instructions for these are not IEEE 754 compliant, and the Wasm MVP is specified with strict IEEE 754 compliance. Discussions are still in progress, but this is an example of the tradeoffs that are sometimes required.
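To make the min/max mismatch concrete, here is a hedged sketch comparing two scalar rules applied lane by lane. It assumes x86 `MINPS`-style behavior (effectively `a < b ? a : b`) as the hardware example; the function names are hypothetical, and this models semantics only, not how any engine generates code.

```typescript
// Wasm-style min: NaN if either input is NaN, and -0 is treated as smaller than +0.
// JavaScript's Math.min follows the same rules, so it serves as the model here.
function wasmStyleMin(a: number, b: number): number {
  return Math.min(a, b);
}

// Model of an x86 MINPS-style lane: the second operand is returned whenever
// the comparison is false, which includes NaN inputs and zeros of either sign.
function minpsStyleMin(a: number, b: number): number {
  return a < b ? a : b;
}

// Apply a scalar rule lane by lane to two 4-lane vectors.
function lanewise(
  f: (x: number, y: number) => number,
  a: Float32Array,
  b: Float32Array,
): Float32Array {
  return a.map((x, i) => f(x, b[i]));
}
```

With these definitions, `minpsStyleMin(NaN, 1)` yields `1` while `wasmStyleMin(NaN, 1)` yields `NaN`, and the `MINPS`-style result also depends on operand order, which is the kind of difference an engine has to paper over with extra instructions.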

Is the proposal playing favorites with some architectures, or is the design done in a way that lets different architectures gain similar performance with optimized pipelines?

The primary goals for the WebAssembly SIMD proposal are usability for real world applications and consistent performance on benchmarks representative of real world usage; this is emphasized to avoid some of the pitfalls of SIMD.js. The WebAssembly SIMD proposal draws heavily from feedback provided by application developers that are experimenting with, and using, the current proposal. The proposal does not play favorites: where there are cases of suboptimal codegen for particular operations, these are evaluated based on user feedback, and concrete alternative semantics are actively solicited where available. Unfortunately, due to the nature of the hardware support, alternative semantics are not always available, and when they are not, emulating these operations is usually more expensive in both execution time and code size.
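As an illustration of the emulation cost, the sketch below models a byte shuffle whose portable semantics differ from one hardware instruction. The choice of `i8x16.swizzle` and x86 `PSHUFB` is an illustrative assumption, not an example taken from this thread, and the function names are hypothetical; it models one possible lowering rather than any engine's actual code generation.

```typescript
// Wasm i8x16.swizzle semantics: any index >= 16 selects 0.
function wasmSwizzle(a: Uint8Array, idx: Uint8Array): Uint8Array {
  return idx.map((i) => (i < 16 ? a[i] : 0));
}

// Model of x86 PSHUFB: a lane is zeroed only when bit 7 of the index is set;
// otherwise the low 4 bits of the index select the source byte.
function pshufbModel(a: Uint8Array, idx: Uint8Array): Uint8Array {
  return idx.map((i) => ((i & 0x80) !== 0 ? 0 : a[i & 0x0f]));
}

// Emulation: an unsigned saturating add of 0x70 pushes every index >= 16
// into the "bit 7 set" range before the shuffle, at the cost of one extra operation.
function swizzleViaPshufb(a: Uint8Array, idx: Uint8Array): Uint8Array {
  const adjusted = idx.map((i) => Math.min(i + 0x70, 0xff));
  return pshufbModel(a, adjusted);
}
```

For an index of 16, `pshufbModel` alone would return the first source byte, whereas `wasmSwizzle` and `swizzleViaPshufb` both return 0; the extra saturating add is the price of portable semantics on that architecture.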

What is the plan for SIMD wider than 128 bits?

As the current fixed width proposal is still in the experimental phase, and implementations are gaining traction, there hasn't been a significant amount of work that has gone into designing a future version of this proposal. That said, there is some preliminary work in this area that will hopefully gain more traction as the MVP is stabilized. More information about active work in this area can be found here.

Comment by @kenchris May 28, 2020 (See Github)

I believe that @arunetm has a more Web Platform focused explainer ready to share soon. Arun?

Comment by @arunetm May 28, 2020 (See Github)

Thanks Kenneth. Yes, we have an updated explainer in the works developed with @dtig, and will share it soon.

Comment by @arunetm Jun 3, 2020 (See Github)

Please find an updated explainer with web platform focus here: https://github.com/WebAssembly/simd/blob/master/proposals/simd/W3CTAG-SIMDExplainer.md @kenchris fyi.

Discussed Jun 22, 2020 (See Github)

Peter: updated explainer... Missing Tess. Should we defer this to another breakout?

Ken: I'm fine with it. It's WebAssembly only. Designed in a way that works across architectures... So I think it's pretty well designed. Heavily used in codecs, games, machine learning. It would be sad not having access to this on the web I think. Tess asked whether we should have this in JavaScript - there are some performance issues with that. It is specialized but for people that use WA it makes a lot of sense. I don't see anything wrong with the proposal.

Peter: marks as proposed closed - and let's close it at the plenary

Comment by @hober Jun 24, 2020 (See Github)

We talked about this in our plenary call today, and we think this is ready to close. Thanks for bringing this to us! Please file a followup review request if your design significantly changes. Thanks!