Design Review #562

#562: Storage Buckets API

Opened Oct 7, 2020

HIQaH! QaH! TAG!

I'm requesting a TAG review of Storage Buckets API.

The core of the proposal is granting sites the ability to create multiple storage buckets, where the user agent may choose to delete each bucket independently of other buckets. By contrast, today's user agents have a binary choice of either persisting or deleting all the data stored by a site.
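The difference between the two eviction models can be sketched with a small in-memory model. This is only an illustration: `SiteStorage`, `evictAll`, and `evictBucket` are hypothetical names invented here, not part of the proposed API.

```typescript
// Hypothetical in-memory model; none of these names come from the proposal.
type BucketName = string;

class SiteStorage {
  // Each bucket holds its own key/value data.
  private buckets = new Map<BucketName, Map<string, string>>();

  // Return the named bucket, creating it on first use.
  bucket(name: BucketName): Map<string, string> {
    let found = this.buckets.get(name);
    if (found === undefined) {
      found = new Map<string, string>();
      this.buckets.set(name, found);
    }
    return found;
  }

  // Today's model: the user agent clears everything the site stored.
  evictAll(): void {
    this.buckets.clear();
  }

  // Proposed model: the user agent may evict one bucket and keep the rest.
  evictBucket(name: BucketName): boolean {
    return this.buckets.delete(name);
  }

  // Names of the buckets that still exist, sorted for stable output.
  names(): BucketName[] {
    return Array.from(this.buckets.keys()).sort();
  }
}

const site = new SiteStorage();
site.bucket("drafts").set("msg-1", "unsent text");
site.bucket("inbox-cache").set("msg-9", "cached copy");

// Evict only the cache; the drafts bucket is untouched.
site.evictBucket("inbox-cache");
console.log(site.names());
```

In this model the site decides how its data is grouped, and the user agent decides which group to drop.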

Explainer: https://github.com/WICG/storage-buckets/blob/gh-pages/explainer.md
Security and Privacy self-review: https://github.com/WICG/storage-buckets/blob/gh-pages/security-privacy-self-assessment.md
GitHub repo: https://github.com/WICG/storage-buckets/
Primary contacts:
- Ayu Ishii (@ayuishii), Google Inc., specifier / implementer
- Victor Costan (@pwnall), Google Inc., specifier / implementer
Organization/project driving the design: Chromium
External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5739224579964928

Further details:

I have reviewed the TAG's API Design Principles
The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
The group where standardization of this work is intended to be done ("unknown" if not known): WHATWG
Existing major pieces of multi-stakeholder review or discussion of this design: https://github.com/WICG/storage-buckets/issues
Major unresolved issues with or opposition to this design: none
This work is being funded by: Google Inc.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

Discussions

Discussed Nov 9, 2020 (See Github)

Tess: Storage standard... very abstract. Defines a conceptual framework that other storage mechanisms can hook onto. E.g. as other specs are built on top of Fetch. Similar idea here. Also specifying what happens when you press the "clear storage data" button, etc...

... key value store. Key is what origin and what kind and value is what the backing store is. Not really something that's exposed to authors. Definition of something called a storage bucket. All the storage for an origin fits into the same bucket. API allows for programmatic access to storage buckets. We now live in a world of partitioned storage. E.g. facebook.com is one bucket, facebook.com in an iframe on NYTimes is a different bucket. API helps websites understand when the bucket under them is changing. Can allow sites to create an ephemeral storage bucket. Adding an API to it might require more work.
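The partitioning Tess describes can be modelled as a lookup keyed on both the top-level site and the embedded origin. This is a rough sketch; `PartitionKey` and `PartitionedStorage` are hypothetical names, and the Storage Standard's actual storage key may have a different shape.

```typescript
// Hypothetical key shape for illustration; the real storage key may differ.
interface PartitionKey {
  topLevelSite: string; // site of the top-level page
  origin: string;       // origin of the document that reads or writes storage
}

class PartitionedStorage {
  private shelves = new Map<string, Map<string, string>>();

  // Serialize the key so equal (topLevelSite, origin) pairs share one entry.
  private static serialize(key: PartitionKey): string {
    return JSON.stringify([key.topLevelSite, key.origin]);
  }

  // Return the bucket for this key, creating it on first use.
  bucketFor(key: PartitionKey): Map<string, string> {
    const id = PartitionedStorage.serialize(key);
    let bucket = this.shelves.get(id);
    if (bucket === undefined) {
      bucket = new Map<string, string>();
      this.shelves.set(id, bucket);
    }
    return bucket;
  }
}

const storage = new PartitionedStorage();

// facebook.com visited directly.
const firstParty = storage.bucketFor({
  topLevelSite: "https://facebook.com",
  origin: "https://facebook.com",
});

// facebook.com embedded in an iframe on another site.
const embedded = storage.bucketFor({
  topLevelSite: "https://nytimes.com",
  origin: "https://facebook.com",
});

firstParty.set("session", "abc");
console.log(embedded.has("session")); // the embedded context sees a separate bucket
```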

Ken: I went to TPAC breakout. You can delete some of these buckets if there is a right to be forgotten request. For login info you really don't want that to be gone - or for a boarding pass - giving developers more control or due to regulatory requirements.

Tess: Dedicated proposals for login or regulatory consents... Site wouldn't have to use its own storage for that.

Ken: normally you have to write to people "we're going to store this for xyz timeframe" and after that it's gone [under GDPR].

Dan: is there a privacy concern - data leakage between buckets?

Tess: example of 2 tabs open to facebook - it can already know...

Dan: container tabs?

Tess: it would have no way of knowing. This doesn't change that. Useful for things like storage access API - and storage access changes - getting notified that the bucket is invalid. "What happens to temp storage" - how do you swap it out in a way that's safe? Uncommitted IndexedDB transaction... This API helps with defining that precisely. Super unclear how this should work right now.

Rossen: biggest worries?

Tess: I deeply understand why we need to do this but haven't reviewed if this is a good design.

Rossen: Reading the privacy assessment - on the question of distinguishing between 3rd and 1st party context - not proposing any distinction between 1st and 3rd. Down the road it may restrict access to 3rd party. That is a warning flag to me.

Tess: what's unclear from the explainer - defining partitioned storage is an ongoing effort. That's "in flight". I agree with the worry but since it's in flight it's probably appropriate.

Ken: this is early days.

Tess: I think we should take a closer look at the API itself.

Dan: I'll do a stronger review of the privacy & security response.

Ken: I'll look at the API.

Rossen: Storage buckets - the API is hanging off of navigator. That's for the current origin?

[bumped 2 weeks]

Discussed Nov 23, 2020 (See Github)

[]

Discussed Nov 30, 2020 (See Github)

[discussion on the API]

Ken: there is no API spec yet...

Ken: [leaves comment]

Dan: I will look at the privacy & security response - briefly looking it looks reasonable.

[bumped a week]

Comment by @kenchris Nov 30, 2020 (See Github)

Hi there,

Thanks for the great explainer, we are generally supportive of this idea, but would like to review the API a bit more in detail. We usually don't want IDL listings in the explainer, but as you don't have a spec at this point, would it be possible to add an API.md file with all the proposal API? Thanks

Comment by @ayuishii Dec 2, 2020 (See Github)

Hi @kenchris ! We added an API.md file that outlines the IDL to go with the proposal. Thanks!

Discussed Dec 7, 2020 (See Github)

Tess: we wanted them to provide us a summary of the API - and they have done so - in IDL. We could look at it now.

Ken: on API I'm not a big fan of APIs with Open or Create... I don't understand - all storage buckets have a name and also a title - metadata you can set. You can go into settings and see a descriptive name. You run into issues of a11y, i18n..

Tess: we've been reluctant to expose author-provided strings in security UI like that...

Ken: I think it should just be called open. Also - I don't know about the expiration thing - takes a DOM timestamp. Will this work with summer time, etc...

Tess: DOM timestamps are always UTC.

Ken: ok.
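To illustrate the point about UTC: an expiration expressed as milliseconds since the Unix epoch names a single instant, so a local clock change such as summer time does not move it. The sketch below assumes the expiration is such a millisecond value; `expiresInDays` and `isExpired` are hypothetical helpers, not part of the proposal.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: an expiration as milliseconds since the Unix epoch.
function expiresInDays(nowMs: number, days: number): number {
  return nowMs + days * MS_PER_DAY;
}

// Hypothetical helper: true once the current instant reaches the expiration.
function isExpired(expiresMs: number, nowMs: number): boolean {
  return nowMs >= expiresMs;
}

// The same instant written with two different UTC offsets.
const withOffset = Date.parse("2021-03-28T01:30:00+01:00");
const inUtc = Date.parse("2021-03-28T00:30:00Z");
console.log(withOffset === inUtc); // both strings parse to one epoch value

// An expiration 14 days later is the same number whichever form we start from.
const expires = expiresInDays(inUtc, 14);
console.log(isExpired(expires, inUtc));
```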

Tess: the rest of it is "what hangs off of a storage bucket"

Ken: ...name of persist...

Tess: our naming advice would suggest a simpler name - like "save". Depends on what persist does.

... durability concept - durability relaxed - is the UA free to delete things? But even in strict if the UA is under storage pressure it's gonna get rid of your stuff. So - don't like APIs that imply to the author that this is durable.

Ken: maybe "something i care about" / "something I don't care about"

Tess: The service worker integration is complex but it needs to be...

Ken: file API thing...

Tess: these APIs are straight from the file api... so consistency.

Ken: set expires -- doesn't it make sense to set expiration date? set expires sounds like a boolean.

Tess: to summarize: we have some naming concerns and other similar concerns... but overall it seems fine.

Dan: why is this better for user privacy?

Tess: the explainer covers this reasonably well.. Currently browsers have an all-or-nothing choice about storage. This API would allow browsers to be a little more intelligent. For example, an email client could put mailboxes in 1 bucket - and say that the durability is relaxed - and a 2nd bucket for draft messages you haven't sent yet - and put that under "strict" durability. So UA can delete the "relaxed" stuff first. This doesn't change partitioning of storage - it just makes it more convenient about how to selectively purge storage. Theoretically the privacy story is "no change" - there could be some fingerprintability - you could create different buckets but it's really hard to do...

Dan: expiration date could allow the developer to set expiration on data that isn't needed anymore - aids in data minimization. Good for sustainability.

Tess: Yes.

... you could imagine user agents wanting a default expiration. That would be a huge win compared to existing APIs.

... controlling the expiration of the default bucket...

Ken: you can't get access...

Tess: in the S&P they talk about exposing the default bucket... could be good because it encourages sites to set a

[discussed possibly closing in the plenary call this week - otherwise 2nd week of jan]

Comment by @kenchris Dec 7, 2020 (See Github)

We (mostly @hober and me) looked at this in our TAG breakout, we have some naming concerns and other similar concerns... but overall it seems fine.

We are not a fan of APIs with openOrCreate - it could simply be called open or have a "create: true" as part of options, but we think auto creation is fine.

All storage buckets have a name and also a title (metadata) you can set. You can go into settings and see a descriptive name. You run into issues of a11y, i18n and generally user agents have been reluctant to expose author-provided strings in security UI like that.

Not a fan of the name of persist. It is unclear what it does and it is unclear if it modifies the object. Our naming advice would suggest a simpler name - like "save". Depends on what persist does.

Regarding durability concept - durability relaxed/strict - is the UA free to delete things? But even in strict if the UA is under storage pressure it's gonna get rid of your stuff. So - don't like APIs that imply to the author that this is durable.

Maybe something like "ephemeral" and "important" would make more sense, as you indicate if you care about the content or not.

setExpires -- doesn't it make sense to call it setExpirationTime? setExpires sounds like a boolean.

Comment by @torgo Dec 7, 2020 (See Github)

It seems like this could be a win from a data minimization perspective - if it allows the developer to set an appropriate expiration over certain categories of data. This could aid user privacy.

Comment by @ayuishii Dec 14, 2020 (See Github)

Thanks for the feedback @kenchris, @hober & @torgo!

We are not a fan of APIs with openOrCreate - it could simply be called open or have a "create: true" as part of options, but we think auto creation is fine.

Sounds good, we can update this to open and auto create the Storage Bucket. (https://github.com/WICG/storage-buckets/issues/26)

All storage buckets have a name and also a title (metadata) you can set. You can go into settings and see a descriptive name. You run into issues of a11y, i18n and generally user agents have been reluctant to expose author-provided strings in security UI like that.

Sounds good, thanks for pointing this out! (https://github.com/WICG/storage-buckets/issues/27)

Not a fan of the name of persist. It is unclear what it does and it is unclear if it modifies the object. Our naming advice would suggest a simpler name - like "save". Depends on what persist does.

Sorry for the confusion here. Our intent (which I should probably point out more explicitly in the explainer) was to adopt the StorageManager.persisted()/StorageManager.persist() feature and keep the naming consistent for user-created Storage Buckets. StorageManager.persist() is a feature on the existing default bucket and the thought was to keep this feature for user-created Storage Buckets as well.

Regarding durability concept - durability relaxed/strict - is the UA free to delete things? But even in strict if the UA is under storage pressure it's gonna get rid of your stuff. So - don't like APIs that imply to the author that this is durable.

Sorry for the confusion here as well. This is specifically meant to be durability in the face of power failures and was named to be consistent with durability in IDBTransaction. But we realize this naming here isn't the most clear... would you have any suggestions for better naming for this option in terms of power failure?
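The power-failure meaning of durability can be sketched with a toy write buffer. This is a simplified model, not how any engine is implemented: `ModelStore`, `flush`, and `powerLoss` are hypothetical names, and "strict" is modelled as flushing on every write while "relaxed" defers the flush.

```typescript
type Durability = "strict" | "relaxed";

// Hypothetical model of a store with a volatile write buffer.
class ModelStore {
  private disk = new Map<string, string>();    // survives a simulated power loss
  private pending = new Map<string, string>(); // buffered writes, lost on power loss
  private durability: Durability;

  constructor(durability: Durability) {
    this.durability = durability;
  }

  // Strict mode flushes before returning; relaxed mode leaves the write buffered.
  put(key: string, value: string): void {
    this.pending.set(key, value);
    if (this.durability === "strict") {
      this.flush();
    }
  }

  // Move every buffered write to the simulated disk.
  flush(): void {
    this.pending.forEach((value, key) => this.disk.set(key, value));
    this.pending.clear();
  }

  // Simulate a power failure: anything not yet flushed is gone.
  powerLoss(): void {
    this.pending.clear();
  }

  get(key: string): string | undefined {
    return this.pending.has(key) ? this.pending.get(key) : this.disk.get(key);
  }
}

const strictStore = new ModelStore("strict");
strictStore.put("draft", "hello");
strictStore.powerLoss();
console.log(strictStore.get("draft")); // the strict write was already on disk

const relaxedStore = new ModelStore("relaxed");
relaxedStore.put("draft", "hello");
relaxedStore.powerLoss();
console.log(relaxedStore.get("draft")); // the relaxed write was still buffered
```

Neither mode in this model says anything about eviction under storage pressure, which is the separate concern raised in the feedback.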

setExpires -- doesn't it make sense to call it setExpirationTime? setExpires sounds like a boolean.

Would this just be for the setter, or would you expect the getter to be expirationTime() as well? We wanted to keep expires() consistent with Cookie Store API, therefore we thought setExpires would be a good matching setter. But I can see how it may sound like a boolean as well.

Thank you!

Discussed Jan 11, 2021 (See Github)

Ken: I read the feedback, it's really good. Some of the comments about names, it's consistent with existing things so I think that's okay. Consistency is very important even though the names might not be perfect.

Dan: propose close?

Ken: yeah this is early review. We like the feature, they are listening to our feedback, we have given some comments and we look forward to a late review.

Dan: did they say what the venue is?

Sangwhan: feels like webapps

Ken: they haven't said. Probably same as cookie store API. I'll write that the feedback sounds sensible.

Sangwhan: use cases and goals I'm happy with, detailed API I'm not up to date

Dan: and multi stakeholder, I don't see

Ken: done

Sangwhan: the multi stakeholder thing is at the bottom of the explainer

Ken: it's definitely there, I'll remove that

Dan: there are some words here.. when they say Gecko positive, what does that mean? I would like to see a link to the Mozilla standards position. Web developers - positive? I'd like to see more.. be good to see some evidence. Some blog posts?

Ken: I'm a web developer and I like it

Sangwhan: it takes away the problem of namespacing so you can scope storage from the developer to the browser, so that's useful

Dan: I'm in no way making the argument that it's not useful. I'd just like to see more evidence when they make these assertions.

Ken: maybe we should change our issue template to point out please send the standards position

Dan: I might add an additional comment asking for more detail on the developer feedback.

Discussed Jan 11, 2021 (See Github)

Tess: Quick summary: Ken and I looked at it, liked the general idea, wanted some kind of summary IDL to understand what the API is.

... they provided that, Ken and I haven't managed to do a second review on it since then.

... We talked about doing a 1:1 breakout to do that, but we haven't yet scheduled that. Other folks are also assigned to this... but doing a breakout between Tess, Ken and Sangwhan is near-impossible timezone-wise.

Peter: Should we push this to the f2f?

Alice: still overconstrained.

Peter: I would simply have multiple breakouts...

Comment by @kenchris Jan 12, 2021 (See Github)

The feedback sounds sensible, and we are happy to see this move forward

Have you thought about what venue you want this to end up in after WICG?

Comment by @cynthia Jan 12, 2021 (See Github)

Sorry, found it. https://github.com/WICG/storage-buckets/blob/gh-pages/explainer.md#stakeholder-feedback--opposition

Comment by @torgo Jan 12, 2021 (See Github)

Can you provide a little more info on the positive developer feedback on this that you mention in the multi-stakeholder feedback section? Also, referencing Ken's comment, it would be good to see some more evidence of multiple implementations such as a positive Mozilla standards position?

Comment by @ayuishii Jan 14, 2021 (See Github)

The feedback sounds sensible, and we are happy to see this move forward

Awesome, thank you!

Have you thought about what venue you want this to end up in after WICG?

We see this eventually being a part of the Storage Standard.

Can you provide a little more info on the positive developer feedback on this that you mention in the multi-stakeholder feedback section? Also, referencing Ken's comment, it would be good to see some more evidence of multiple implementations such as a positive Mozilla standards position?

We've had developer participation & positive feedback in our TPAC breakout sessions in 2019 [1] & 2020 [2] including participants from the Washington Post who have expressed interest in being early testers. We also are working with an internal Google partner who is interested in this API.

As for implementor interest, we have done a joint TPAC 2019 presentation with Mozilla, with some more context on this thread (https://github.com/whatwg/storage/issues/2). But I've just requested a formal position here which will hopefully reflect this. Although no formal position from Safari at this point, they have actively participated in our discussions in TPAC 2019 [1] & TPAC 2020 [2].

[1] TPAC 2019 discussion: doc [2] TPAC 2020 discussion: minutes

Thank you!

Discussed Feb 15, 2021 (See Github)

Yves: is there an issue with partitioning?

Ken: we discussed this but don't recall the answer.

Yves: [leaves comment asking this]

Comment by @ylafon Feb 16, 2021 (See Github)

Hi, Just a quick question, as partitioning might happen in storage as well, how does it impact Storage Bucket API? Thanks

Comment by @ayuishii Mar 15, 2021 (See Github)

Hi @ylafon, Storage Buckets API won't conflict with storage partitioning efforts. We are in communication with the team and making sure the implementation design allows for both.

As for Storage Buckets API support in storage partitioning, it is still undecided. It will likely follow the pattern of other storage APIs and will be partitioned in 3rd party contexts. However, there is consideration of not allowing for Storage Buckets API in partitioned storage as well.

Discussed Apr 26, 2021 (See Github)

Hadley: I'm not sure they answered Yves' question. diff between storage buckets and partitioning?

Yves: organise buckets by yourself rather than by each file and resource..

Ken: control your own storage inside your own application. Like caching. You can say which is privacy sensitive..

Yves: or a bucket you pin in your cache.

Ken: per user per site. Site is in control. UI that says you can delete cache but have a bucket you don't want to delete. Site has more control over how caches are stored. Good with regulation, some data needs to be stored and treated in a specific way. Eg. right to be forgotten.. want to remove private info but not necessarily cached images.

Dan: positive from firefox, no signal from edge or safari

Yves: fine to close

Comment by @ylafon Apr 28, 2021 (See Github)

Thank you for the clarification. To reiterate what @kenchris said: we are happy to see this move forward.