design-reviews#410: CompressStream

Opened Aug 28, 2019

Hello TAG!

I'm requesting a TAG review of:

Name: CompressStream
Specification URL: https://ricea.github.io/compression/ (final location TBD)
Explainer (containing user needs and example code)¹: https://github.com/ricea/compressstream-explainer/blob/master/README.md
GitHub issues (if you prefer feedback filed there): https://github.com/ricea/compressstream-explainer/issues
Tests: https://github.com/web-platform-tests/wpt/tree/master/compression
Primary contacts (and their relationship to the specification): CanonMukai (primary author), ricea (secondary author)

Further details:

Relevant time constraints or deadlines: None at this time
I have read and filled out the Self-Review Questionnaire on Security and Privacy. The assessment is here.
I have reviewed the TAG's API Design Principles
The group where the work on this specification is:

We recommend the explainer to be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

Links to major pieces of multi-stakeholder review or discussion of this specification:
Links to major unresolved issues or opposition with this specification:

You should also know that...

This proposal is still in the early stages.

We'd prefer the TAG provide feedback as (please select one):

open issues in our GitHub repo for each point of feedback
open a single issue in our GitHub repo for the entire review
leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.

Discussions

Comment by @annevk Aug 28, 2019 (See Github)

Is there a single gzip algorithm everyone can and is willing to converge on?

Comment by @CanonMukai Aug 29, 2019 (See Github)

As long as compatibility is maintained (e.g. between Chromium and Firefox, or between old and new versions of Chromium), I think it would be better for CompressStream to permit other gzip algorithms.

Comment by @annevk Aug 29, 2019 (See Github)

I haven't researched gzip much, but would a single implementation ever have more than a single algorithm? If the algorithm can change, how do we ensure other participants in this ecosystem remain compatible? Where are the algorithms defined?

Comment by @CanonMukai Aug 29, 2019 (See Github)

Yes, zlib uses different algorithms depending on the compression level. Gzip is standardized and implementations extensively test against each other, so participants in this ecosystem will remain compatible. The algorithms are defined in RFC 1952 https://tools.ietf.org/html/rfc1952.
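
To illustrate the interoperability point: whatever strategy a compressor uses internally, its output has to be a valid RFC 1952 stream, so any conforming decompressor can read it. The following is a minimal sketch, assuming a runtime that provides `CompressionStream` and `DecompressionStream` constructors taking a format string; the helper names (`collect`, `gzip`, `gunzip`, `looksLikeGzip`) are illustrative and not part of the proposal.

```javascript
// Collect a ReadableStream of byte chunks into a single Uint8Array.
async function collect(stream) {
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Compress bytes with gzip, using whatever strategy the implementation picks.
async function gzip(bytes) {
  return collect(new Blob([bytes]).stream().pipeThrough(new CompressionStream('gzip')));
}

// Decompress gzip bytes produced by any conforming compressor.
async function gunzip(bytes) {
  return collect(new Blob([bytes]).stream().pipeThrough(new DecompressionStream('gzip')));
}

// RFC 1952 fixes the start of the header: ID1 = 0x1f, ID2 = 0x8b, CM = 8 (deflate).
function looksLikeGzip(bytes) {
  return bytes.length >= 3 && bytes[0] === 0x1f && bytes[1] === 0x8b && bytes[2] === 8;
}
```

The header check is the same for every implementation and compression level; only the compressed payload that follows may differ.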

Comment by @plinss Sep 11, 2019 (See Github)

Overall this looks good to us. A few relatively minor points:

Have you considered using different classes for each compression algorithm rather than a constructor argument? e.g. let s = new GzipCompressionStream(); This might make it easier to polyfill new algorithms.
If the compression algorithm remains an argument, please ensure that the value uses (and references in the spec) the HTTP Content Coding tokens: https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding
It might be useful to have a property to access the compression algorithm of the stream (and any future options).
It may be worth considering adding support for Brotli in V1.
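
For comparison, the first and third points can be approximated in user code. This is only a sketch, assuming a runtime with a `CompressionStream` constructor that takes a format string; the class name `GzipCompressionStream` comes from the suggestion above, and the `algorithm` property is a hypothetical name. It relies on `pipeThrough()` accepting any object that has `readable` and `writable` properties.

```javascript
// Hypothetical per-algorithm class wrapping the single-constructor API.
class GzipCompressionStream {
  constructor() {
    const inner = new CompressionStream('gzip');
    // pipeThrough() only needs a { readable, writable } pair.
    this.readable = inner.readable;
    this.writable = inner.writable;
  }

  // Hypothetical read-only property exposing the algorithm in use.
  get algorithm() {
    return 'gzip';
  }
}
```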

Comment by @kenchris Sep 11, 2019 (See Github)

There are cases where you might want to compress data but don't have a stream. In the Web NFC API, for example, people usually deal with small amounts of data (such as ArrayBuffers), and it would be useful to be able to compress those easily instead of first turning them into a stream.

Comment by @plinss Sep 11, 2019 (See Github)

There is an example of some boilerplate code to compress an ArrayBuffer into a Uint8Array, I agree having that (probably as a static method on the class) would be convenient and prevent a lot of repetition.
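
A one-shot helper of the kind discussed here might look like the sketch below. It assumes a runtime with `CompressionStream`, `Blob`, and `Response`; the function name `compressArrayBuffer` is hypothetical, and it is written as a free function rather than the static method suggested above.

```javascript
// Hypothetical one-shot helper: ArrayBuffer (or typed array) in, Uint8Array out.
async function compressArrayBuffer(input, format = 'gzip') {
  // Wrap the bytes in a Blob to get a ReadableStream, then compress it.
  const compressed = new Blob([input]).stream().pipeThrough(new CompressionStream(format));
  // Response is used only as a convenient way to collect the stream.
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}
```

A caller with a small buffer would then write `const packed = await compressArrayBuffer(buffer);` without handling streams directly.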

Comment by @cynthia Sep 11, 2019 (See Github)

If there are more than just compression cases (e.g. I can imagine stuff like streamed crypto) where you want to throw in non-stream data, it feels like that mechanism could/should be provided as an infrastructure API.

(I agree the use case is valid and makes sense.)

Comment by @ricea Oct 15, 2019 (See Github)

Have you considered using different classes for each compression algorithm rather than a constructor argument? e.g. let s = new GzipCompressionStream(); This might make it easier to polyfill new algorithms.

I discussed this with @CanonMukai but we preferred to use a single constructor. At least in Blink, new symbols on the global object are moderately expensive, and I wanted to avoid adding them unnecessarily.

My current thinking for polyfilling / user extensibility is to have a dictionary CompressionStream.algorithms where the keys are algorithm names and the values are factory functions. This also makes inspecting available algorithms easy.
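
One possible shape for such a registry is sketched below. Everything here is an assumption for illustration: the proposal does not define `algorithms` or `createCompressionStream`, and the sketch uses a standalone object rather than a property on `CompressionStream`. It assumes a runtime with `CompressionStream` and `TransformStream`.

```javascript
// Hypothetical registry: algorithm name -> factory returning a { readable, writable } pair.
const algorithms = {
  gzip: () => new CompressionStream('gzip'),
  deflate: () => new CompressionStream('deflate'),
};

// Look up a factory by name and build a stream, rejecting unknown names.
function createCompressionStream(name) {
  // Object.hasOwn avoids matching inherited keys such as "toString".
  if (!Object.hasOwn(algorithms, name)) {
    throw new TypeError(`Unsupported compression algorithm: ${name}`);
  }
  return algorithms[name]();
}

// A page or polyfill could register another algorithm.
// This placeholder simply passes bytes through unchanged.
algorithms.identity = () => new TransformStream();
```

With this shape, `Object.keys(algorithms)` lists the available algorithm names.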

If the compression algorithm remains an argument, please ensure that the value uses (and references in the spec) the HTTP Content Coding tokens: https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding

I hadn't considered this. My one reservation is that using "br" for Brotli seems a little obscure.

Filed as https://github.com/ricea/compression-streams/issues/7.

It might be useful to have a property to access the compression algorithm of the stream (and any future options).

Yes. One thing that needs more examination is that different algorithms will have different options. For example, gzip will eventually have a filename option. It's not immediately clear to me how to handle this.

Filed as https://github.com/ricea/compression-streams/issues/6.

It may be worth considering adding support for Brotli in V1.

The awkward thing about Brotli is that to support compression we need to add a 140KB dictionary to the binary. This is a difficult trade-off and I don't have a good answer for it yet. Supporting decompression is nearly free, but does it make any sense to only support decompression?

Comment by @domenic Oct 15, 2019 (See Github)

For example, gzip will eventually have a filename option. It's not immediately clear to me how to handle this.

There is a small precedent here with canvas.getContext(), whose second argument is an optional options object. This is handled on the spec level by declaring it optional any and then switching on the first argument before manually invoking the Web IDL "convert arg to Y" algorithm, where Y is the specific dictionary type in question (e.g. CanvasRenderingContext2DSettings vs. ImageBitmapRenderingContextSettings). I think that's a reasonable pattern, both for spec-writing and for web developer usage.
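
In plain JavaScript, that pattern could be approximated roughly as follows. This is a sketch only: `optionConverters` and `convertOptions` are hypothetical names, the `filename` option is the not-yet-specified gzip option mentioned above, and the converters merely stand in for Web IDL dictionary conversion.

```javascript
// Hypothetical per-algorithm converters, standing in for Web IDL dictionary conversion.
const optionConverters = {
  // gzip keeps only the (hypothetical) filename member and coerces it to a string.
  gzip(options) {
    const { filename = '' } = options;
    return { filename: String(filename) };
  },
  // deflate has no options in this sketch, so every member is ignored.
  deflate() {
    return {};
  },
};

// Accept the second argument as "optional any", then convert it based on the first argument.
function convertOptions(format, options) {
  if (!Object.hasOwn(optionConverters, format)) {
    throw new TypeError(`Unsupported format: ${format}`);
  }
  const raw = options ?? {}; // missing or null options become an empty dictionary
  if (typeof raw !== 'object') {
    throw new TypeError('options must be an object');
  }
  return optionConverters[format](raw);
}
```

Members that the selected algorithm does not know about are dropped, much as unknown dictionary members are ignored in Web IDL.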

Comment by @annevk Oct 15, 2019 (See Github)

(If we keep using that pattern we should try to abstract it into IDL though.)

Comment by @ricea Oct 15, 2019 (See Github)

There is an example of some boilerplate code to compress an ArrayBuffer into a Uint8Array, I agree having that (probably as a static method on the class) would be convenient and prevent a lot of repetition.

I've filed https://github.com/ricea/compression-streams/issues/8 for this.

If there are more than just compression cases (e.g. I can imagine stuff like streamed crypto) where you want to throw in non-stream data, it feels like that mechanism could/should be provided as an infrastructure API.

I filed issues against the streams standard to discuss these APIs: https://github.com/whatwg/streams/issues/1018 and https://github.com/whatwg/streams/issues/1019. If we implemented these, the boilerplate would reduce to

const output = await ReadableStream.from(input).pipeThrough(new CompressionStream('gzip')).arrayBuffer();

Although there's no reason why we can't do both.

Comment by @ricea Oct 15, 2019 (See Github)

A couple of updates to the metadata:

Specification URL: https://ricea.github.io/compression/ (final location TBD)
Tests: https://github.com/web-platform-tests/wpt/tree/master/compression

Comment by @cynthia Oct 15, 2019 (See Github)

@ricea should I update the initial post's metadata to the above?

Comment by @ricea Oct 15, 2019 (See Github)

@ricea should I update the initial post's metadata to the above?

Yes, thank you.

Comment by @lknik Oct 21, 2019 (See Github)

Hi,

Looked at the security & privacy questionnaire. Thanks for listing CRIME. Otherwise for the time being looks OK to me.

Comment by @ylafon Dec 17, 2019 (See Github)

If the compression algorithm remains an argument, please ensure that the value uses (and references in the spec) the HTTP Content Coding tokens: https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding

I hadn't considered this. My one reservation is that using "br" for Brotli seems a little obscure.

A little bit obscure but still well known, so it is always better to avoid defining another such registry.