#536: IndexedDB putAll
Discussions
Comment by @kenchris Jul 15, 2020 (See Github)
Why is there "All" in the name? You are just doing a bulk insert, which might not be the same as "all" from a developer's point of view.
Comment by @dmurph Jul 15, 2020 (See Github)
I believe it was chosen to match getAll, and its functionality seems to match other instances of putAll in other languages (the Java collections implementations of putAll, "INSERT ALL" in SQL, etc.).
What do you think developers will think putAll means, if not to put all of the arguments into the database? I'm not aware of another interpretation, but it could definitely exist :)
Discussed
Jul 20, 2020 (See Github)
[reviewing]
Sangwhan: IndexedDB has this thing called put - people have been asking for "put many things" - so you have to have a for loop, which blocks. If you have a huge amount of stuff you want to put in and then get an event at the end there is no solution for that. This is one of many things like this. Because of legacy reasons they don't want to move away from an event based architecture (vs async). So I brought that up. The original IDB editor commented. There was an attempt to try that but it didn't succeed.
Dan: so more to do ... on our side.
Sangwhan: some archaeology. The naming issue isn't so much of an issue.
Dan: sounds like a reasonable request...
Sangwhan: yes, it doesn't seem nice [the way it is].
[bump]
Comment by @cynthia Jul 20, 2020 (See Github)
I don't have a strong opinion on what is better, but aside from putAll(), putMultiple() and putBatch() also seem to exist as potential options in different contexts. Explicitly indicating that the function is intended for multiple entries does make perfect sense though.
Question 1) The explainer suggests that a later key-value pair clobbering an early key-value pair is fine; but this feels like it might obstruct debugging for cases where this happens due to human/code errors. (Imagine debugging something where the 3rd entry is clobbered by the 499844th entry out of 1M inserts, and you are trying to figure out why) Would this make sense to throw? (or optionally throw?)
Question 2) The examples seem to suggest this is a synchronous API; given that this may be used to batch insert lots of data that could potentially block - would there be a async path for this feature?
Comment by @nums11 Jul 20, 2020 (See Github)
Thanks for the input :). Inserting a record with a duplicate key using put overwrites the previous record, so I assumed that putAll would have the same behavior for consistency.
Sorry for the miscommunication, this API is asynchronous and uses the existing 'onsuccess' and 'onerror' event handling present in IndexedDB. I've updated the explainer to be more clear on this.
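To make the event-based shape concrete, here is a minimal in-memory sketch of how the proposed surface might be used. MockRequest and MockObjectStore are stand-ins invented for this sketch so it can run outside a browser; the real IDBRequest/IDBObjectStore interfaces are richer and the final putAll signature is not assumed.

```javascript
// Hypothetical sketch of an event-based putAll, using mocks in place
// of the real IndexedDB objects.
class MockRequest {
  constructor() {
    this.onsuccess = null;
    this.onerror = null;
  }
  _succeed(result) {
    this.result = result;
    if (this.onsuccess) this.onsuccess({ target: this });
  }
}

class MockObjectStore {
  constructor() {
    this.records = new Map();
  }
  // Bulk insert mirroring put()'s overwrite-on-duplicate-key behavior,
  // delivered asynchronously like the rest of IndexedDB.
  putAll(entries) {
    const request = new MockRequest();
    queueMicrotask(() => {
      for (const [key, value] of entries) this.records.set(key, value);
      request._succeed(entries.map(([key]) => key));
    });
    return request;
  }
}

const store = new MockObjectStore();
const request = store.putAll([["a", 1], ["b", 2], ["a", 3]]);
request.onsuccess = (event) => {
  // A later duplicate key overwrites the earlier one, as with put().
  console.log(event.target.result, store.records.get("a")); // → [ 'a', 'b', 'a' ] 3
};
```

The caller attaches onsuccess after putAll returns, which works because the result is delivered asynchronously, matching the existing IndexedDB pattern.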
Comment by @nums11 Jul 20, 2020 (See Github)
Additionally, IndexedDB already has a separate API endpoint called add() that throws an error on duplicate key insertion so I tried to stay away from that. In the future considerations section I mentioned an addAll() which could possibly have similar behavior to putAll() except it would throw an error on duplicate insertion like add().
Comment by @cynthia Jul 20, 2020 (See Github)
this API is asynchronous and uses the existing 'onsuccess' and 'onerror' event handling present in IndexedDB. I've updated the explainer to be more clear on this.
Thanks for following up on this. There is an inherent risk of inconsistency in the API surface, but could a non-event-based asynchronous design be considered here? I believe the event-based design IDB has predates idiomatic async patterns in JS, so it made sense back then, but it feels a bit strange to continue with that pattern for a new-ish API.
In the future considerations section I mentioned an addAll() which could possibly have similar behavior to putAll() except it would throw an error on duplicate insertion like add().
We'll look into this and provide further feedback. Thank you for the info.
Comment by @dmurph Jul 20, 2020 (See Github)
this API is asynchronous and uses the existing 'onsuccess' and 'onerror' event handling present in IndexedDB. I've updated the explainer to be more clear on this.
Thanks for following up on this. There is an inherent risk of inconsistency in the API surface, but could a non-event-based asynchronous design be considered here? I believe the event-based design IDB has predates idiomatic async patterns in JS, so it made sense back then, but it feels a bit strange to continue with that pattern for a new-ish API.
I think this is definitely an option. The main hurdle, in my opinion, is event ordering. If this was a promise, how would we order the execution of this promise vs all of the other events? We have a lot of this defined by spec, and it could get complicated.
@inexorabletash you probably have the best context here. How difficult would it be to make this API return, say, a Promise spec-wise?
I know implementation-wise this would be a little bit rough. We would probably have to still use IDBRequest internally, and have that hold onto the promise resolve, to keep ordering correct with our auto-wrapper.
In the future considerations section I mentioned an addAll() which could possibly have similar behavior to putAll() except it would throw an error on duplicate insertion like add().
We'll look into this and provide further feedback. Thank you for the info.
Comment by @inexorabletash Jul 20, 2020 (See Github)
There has been some exploration of a new API built on top of Indexed DB that introduces a Promise-based async API rather than an event-based one (kv-storage), but that incubation has not progressed.
Incorporating Promises into Indexed DB is non-trivial; see for example https://github.com/inexorabletash/indexeddb-promises, which describes some of the complexity around event loop / transaction integration. In short, doing it for a single method would introduce confusion for developers (as it would differ from other methods), and the transaction activation mechanism would need to be redesigned to accommodate it (transaction activation is closely tied to the event loop, not microtasks).
Practically speaking, Indexed DB is predominantly used indirectly by users who instead select libraries, many of which wrap the usage in promises. It's important to make sure that additions to the API surface can be integrated into libraries - i.e. by following the same transaction model.
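The wrapping pattern libraries use can be sketched briefly: turn an event-based request into a Promise. The fake request below is a stand-in for IDBRequest so the sketch runs outside a browser; real wrappers must also respect transaction lifetime, since an IndexedDB transaction auto-commits once control returns to the event loop with no pending requests.

```javascript
// Common library pattern: wrap an event-based request in a Promise.
function promisify(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Stand-in request that "succeeds" on the next turn of the event loop.
const fakeRequest = { onsuccess: null, onerror: null };
setTimeout(() => {
  fakeRequest.result = ["key1", "key2"];
  if (fakeRequest.onsuccess) fakeRequest.onsuccess();
}, 0);

promisify(fakeRequest).then((keys) => console.log(keys)); // → [ 'key1', 'key2' ]
```

This is why the comment above stresses that new additions should follow the same transaction model: as long as putAll returns an IDBRequest, wrappers like this keep working unchanged.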
Comment by @annevk Jul 21, 2020 (See Github)
I'm a little worried that the motivation here comes down to a slow binding layer. Is that the case?
Or is it really the primitive of bulk-insert-or-fail? I.e., either all the given records get inserted or none get inserted.
Comment by @nums11 Jul 21, 2020 (See Github)
To answer your question, the primary motivations are partner requests and customer feedback. We hope for some small performance improvements, but those aren't certain; they would most likely come from amortized costs of certain operations, and this will have to be verified through testing.
It would be such that all the records get inserted or none get inserted.
Discussed
Jul 27, 2020 (See Github)
[punt to plenary]
Comment by @kbsspl Jul 28, 2020 (See Github)
Would a putAll accepting an options object with returnDuplicateKeys: true, whose onsuccess returns the set of keys that existed previously and were updated, be useful?
Comment by @dmurph Jul 28, 2020 (See Github)
That's a great question. We are actually interested in two ideas:
1. putAll returns nothing as a result
2. putAll returns all keys as a result
The duplicate keys idea could be an add-on to the second option.
What would you expect of the above return values?
Comment by @kbsspl Aug 6, 2020 (See Github)
I would expect 1. to be a complete success with all data either added or updated. For returning results, both the updated and failed sets would be helpful.
apologies for the delay in responding.
Comment by @dmurph Aug 6, 2020 (See Github)
The API is being designed as all-or-nothing, so if one record fails then the whole set fails. Does this match your expectation? There shouldn't be some-failed-some-updated results, basically, as then the whole API call would fail anyway (and return an error event).
It sounds like you would expect the list of keys, which we can do.
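The all-or-nothing behavior described above can be sketched over a plain Map: either every entry commits or none do. This is an illustrative sketch, not the real implementation; validate() is a hypothetical per-entry check standing in for whatever could make an individual put fail inside a transaction.

```javascript
// All-or-nothing bulk put sketched over a plain Map.
function putAllAtomic(records, entries, validate) {
  const staged = new Map(records); // stage writes on a copy
  for (const [key, value] of entries) {
    if (!validate(key, value)) {
      // Abort without touching `records`, like an aborted transaction.
      throw new Error("putAll failed; no entries were committed");
    }
    staged.set(key, value);
  }
  // Publish the staged state in one step.
  records.clear();
  for (const [k, v] of staged) records.set(k, v);
  return entries.map(([key]) => key); // the committed keys
}

const records = new Map([["a", 0]]);
const keys = putAllAtomic(records, [["a", 1], ["b", 2]], () => true);
console.log(keys, records.get("a")); // → [ 'a', 'b' ] 1

try {
  putAllAtomic(records, [["c", 3], ["d", null]], (k, v) => v !== null);
} catch (e) {
  console.log(records.has("c")); // → false: nothing was committed
}
```

Note that even though "c" was staged before "d" failed, the failure leaves the store untouched, which is the semantics the thread converges on.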
Comment by @kbsspl Aug 7, 2020 (See Github)
I find both, all-or-nothing, or partial updates having genuine use-cases, but personally would expect a set of keys with partial commits happening.
Additional thoughts, at the risk of increasing the complexity of putAll usage early on: I would prefer it to be parametrized.
Parameters: allOrPart = All/Part (the approach to take); returnKeys = Success/Failure/Both (the sets to return).
"All" - In this approach, if any update fails, none of the data is committed. For "All", "Success" returns all keys updated successfully (which is not very useful for an "All" approach and may be ignored). "Failure" returns the keys which failed to update for whatever reason. "Both" returns whatever keys were successful and whatever keys failed, to help mitigate.
"Part" - In this approach, any data which could be successfully updated, gets committed. For "Part" - "Success" returns only all keys updated successfully, "Failure", only the failed keys, "Both" returns both the success and failed key sets.
Another point I can think of is that the definition of "Success" itself may need to be clarified. Should everything that was written, irrespective of whether it was an addition or an update, be labeled a "Success"? Or are only updates considered a success? And what is the definition of an "update"?
Is updating data with no difference between the incoming and existing records an update? For example, is putting ("key_1", {title: "Quarry Memories", author: "Fred"}) into an ObjectStore which previously had ("key_1", {title: "Quarry Memories", author: "Fred"}) an update, versus updating ("key_1", {title: "Quarry Memories", author: "Fred"}) to ("key_1", {title: "Quarry Memories original value", author: "Fred"}) in the ObjectStore?
I am not sure how well thought out this comment is, so please ignore it or provide feedback to help me understand the context if I'm not making sense.
Comment by @dmurph Aug 7, 2020 (See Github)
Thanks for the feedback!
I guess the main difficulty with the functionality you're mentioning is that it conflicts with the existing patterns of the IndexedDB API. Currently all calls to get or modify data in the database follow an asynchronous pattern where the result of the API call is an IDBRequest object, and the developer then listens for a 'success' or 'error' event on that request object.
We don't have any API calls that can 'partially succeed', they all either succeed totally or fail totally. All requests are associated with a transaction, which again either totally atomically commits, or fails & is aborted (and all changes are not in the database).
Regarding updating as changing vs not changing, this also isn't an existing feature of the database. putAll is trying to model a bulk version of put, so at least initially we want to model what put does. The one restriction we DO provide is put vs add, where add will error if a record already exists at the given key. It wouldn't be too hard for us to also create an addAll with our current implementation; not sure about others.
A feature for knowing whether a record has been 'changed' or not might be valuable, but we haven't received any feedback that people want this. We only have feedback that people want a putAll.
To keep things consistent with the existing API patterns, and to keep this change simple, I think we want to have putAll be all-or-nothing, with no extra options. I think we should probably return the array of keys that were committed - this is useful for people who have auto-increment on.
If you have use cases for the behaviors above, definitely let us know. The only one that might be 'faster' if the API supported it would probably be telling the user whether the value was changed or not. Allowing some puts to fail is already supported by just using the put API as it already exists.
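The point about auto-increment is worth a quick sketch: when the store generates the keys, the caller only learns what they are from the result, so returning the committed keys carries real information. The in-memory store below is a stand-in invented for this sketch, not the real API; the only fact borrowed from IndexedDB is that key generators start at 1.

```javascript
// Why returning committed keys matters with auto-increment:
// the caller cannot know the generated keys in advance.
class AutoIncrementStore {
  constructor() {
    this.records = new Map();
    this.nextKey = 1; // IndexedDB key generators also start at 1
  }
  putAll(values) {
    const committedKeys = [];
    for (const value of values) {
      const key = this.nextKey++; // store-generated key
      this.records.set(key, value);
      committedKeys.push(key);
    }
    return committedKeys;
  }
}

const store = new AutoIncrementStore();
console.log(store.putAll(["first", "second"])); // → [ 1, 2 ]
console.log(store.putAll(["third"]));           // → [ 3 ]
```

Without the returned key array, the caller would have to issue follow-up reads just to find out where its own records landed.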
Comment by @kbsspl Aug 10, 2020 (See Github)
thank you for such a detailed response. I understand the considerations better now.
Discussed
Aug 17, 2020 (See Github)
Ken: I know a lot of people really want this for performance.
Sangwhan: I don't think this partial thing is really a big deal..
... if you do a partial you're breaking transactional guarantees. If you can't do further writes you've left your database in a broken state. Have to go through the remaining... seems weird. Should be all or nothing.
Alice: Basically in agreement with dmurph's comment..
Sangwhan: Yes.
Ken: I'm going to ask around about this, I don't know much about databases but I have friends who do, would like their input.
Sangwhan: I have some feedback...
... comment is that .. future considerations include an addAll API... similar to putAll but (??) entries in the same transaction.
... Would give feedback that that should come pretty soon rather than at some unclear future time. putAll proposal.. A=1 B=2 C=3 and A=4 in the same transaction... you're batching up a bunch of data and clobbering data in the same transaction... some cases where you want to be able to put those... some sort of programming error in a lot of cases. Would be useful to have this sooner rather than later.
... Discussion about returning to (?)... should be all or nothing.
... Happy to propose closing and discuss in plenary.
Discussed
Aug 17, 2020 (See Github)
Tess: Good conversation on the issue but we don't really know if Dan or Ken have further issues than what's there. We shouldn't sign off without the assignees being here.
Peter: Has anyone contacted library devs who would build on top of this?
Rossen: Don't know.
Tess: Motivation stated is customer feedback, thus, we could assume they are interested.
Peter: That's reasonable.
Rossen: Push to Breakout C?
Peter/Tess: Yes.
Comment by @plinss Aug 19, 2020 (See Github)
Overall we're satisfied with this and ready to close. We just wanted to check whether other projects that build on top of IndexedDB, such as PouchDB, have been consulted and had an opportunity to provide feedback.
Comment by @torgo Sep 22, 2020 (See Github)
Discussed and agreed to close at our f2f.
Opened Jul 14, 2020
Saluton TAG!
I'm requesting a TAG review of IndexedDB putAll.
Add a new endpoint to the IndexedDB API that will provide developers with a way to bulk-insert many records into an IndexedDB object store.
Further details:
You should also know that...
N/A
We'd prefer the TAG provide feedback as:
☂️ open a single issue in our GitHub repo for the entire review