#566: Review of NativeIO

Opened Oct 28, 2020

HIQaH! QaH! TAG!

I'm requesting a TAG review of NativeIO.

NativeIO is a storage API that resembles a very basic filesystem, with direct access to stored data through buffers and offsets. Our goal is to give developers flexibility by providing generic, simple, and performant primitives upon which they can build higher-level components. It's particularly well suited for Wasm-based libraries and applications that want to use custom storage algorithms to fine-tune execution speed and memory usage.
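
As a rough illustration of what "direct access to stored data through buffers and offsets" means, the sketch below models a single file as a growable byte array with explicit-offset reads and writes. It is an in-memory toy, and every name in it (`ToyFile`, `read`, `write`, `getLength`) is a hypothetical chosen for this example; it is not the NativeIO interface, whose actual shape is defined in the explainer.

```typescript
// In-memory toy model of offset-based file access. All names here are
// hypothetical and are NOT taken from the NativeIO explainer.
class ToyFile {
  private bytes = new Uint8Array(0);

  // Copy `data` into the file starting at `offset`, growing the file if needed.
  // Returns the number of bytes written.
  write(data: Uint8Array, offset: number): number {
    const end = offset + data.length;
    if (end > this.bytes.length) {
      const grown = new Uint8Array(end); // any gap is zero-filled
      grown.set(this.bytes);
      this.bytes = grown;
    }
    this.bytes.set(data, offset);
    return data.length;
  }

  // Fill `buffer` with bytes starting at `offset`.
  // Returns the number of bytes actually read (short at end of file).
  read(buffer: Uint8Array, offset: number): number {
    const available = Math.max(0, this.bytes.length - offset);
    const count = Math.min(buffer.length, available);
    buffer.set(this.bytes.subarray(offset, offset + count));
    return count;
  }

  getLength(): number {
    return this.bytes.length;
  }
}

const file = new ToyFile();
file.write(new Uint8Array([1, 2, 3, 4]), 0);
file.write(new Uint8Array([9, 9]), 6); // leaves a zero-filled gap at offsets 4-5

const out = new Uint8Array(4);
const bytesRead = file.read(out, 4); // reads offsets 4 through 7
```

The caller owns the buffers and chooses the offsets, which is the property a Wasm-based storage engine would rely on when laying out its own pages or records.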

Further details:

  • I have reviewed the TAG's API Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): unknown
  • Existing major pieces of multi-stakeholder review or discussion of this design: https://github.com/fivedots/nativeio-explainer/issues
  • Major unresolved issues with or opposition to this design: none
  • This work is being funded by: Google Inc.

We'd prefer the TAG provide feedback as:

šŸ› open issues in our GitHub repo for each point of feedback

Discussions

Discussed Nov 1, 2020 (See Github)

Ken: if you want to implement a db in WebAssembly, implementing it on top of IndexedDB is not good for performance. An alternative might be to use the filesystem access API. You kind of want a big blob of data to read from and write to. The filesystem API has a lot of permissions issues... This is a different API - similar in shape to filesystem access but no bloat - because security is different. Can it be abused? Very nice place with file access - it's mutable.

Dan: left comment asking for implementer info and more info on use cases...

Ken: need to look closer at this.

[discussion of this vs. filesystem access]

Ken: video editing use case... other cases where performance is required..

Dan: we're back to primitives vs not-primitives

Ken: a native app will use a bit of the hard drive...

Ken: I don't think it will work for a db like MySQL because the binary file for the WebAssembly will just be too big.

Ken: ... other cases like video editing or machine learning ...

Discussed Nov 1, 2020 (See Github)

Ken: seems to be an origin-bound filesystem. Different to filesystem - you can open directory. Different sites can use the same directory.

Dan: So it might better be called "origin bound virtual filesystem"... the current name is quite scary.

Ken: of course with this they don't need the same kinds of dialogs. I'm still worried about quota. Video editing, etc... needs lots of disk space. How is that managed? Handled with usual quota things.

Dan: left a comment

Sangwhan: let's imagine a scenario where you use something like this to implement a PCB CAD webapp - you'd have multiple files... if you bind it to an origin you can't take it to another tool...

Dan: you'd need to export.

Ken: if you want to bring a video to another tool to say add special effects.

Sangwhan: moving between different CAD applications is a pretty common case. The filesystem API does do this.

Dan: trade-off between privacy ....

Sangwhan: what if you want to share between native app and this?

Ken: you'd need to use filesystem API

Sangwhan: these APIs don't acknowledge each other...

Ken: yes could it be an option on the other API? Give me sandboxed filesystem?

Sangwhan: yes it's a sandboxed filesystem where you can't access the files... I don't see why you would use this over IndexedDB... Except for porting legacy applications...

Ken: music editing, video editing....

Sangwhan: yeah but this design doesn't make sense for music editing... You have to bring in small pieces of video / audio, assets...

Ken: You could use this in combination with filesystem access... or you get them from a cloud system...

crafting comment

Yves: could you add a few things to make Cache API use Native API... immutable flag... persistence... store until this freshness expires...

Sangwhan: something you can mark as ephemeral...

Yves: it would be good if instead of creating a new system, create a lower-level system that you could build something on top of...

Dan: ... and what about storage buckets?

Comment by @torgo Nov 24, 2020 (See Github)

Hi @fivedots - thanks for this. One initial question: is there any discussion going on with other implementers? Has there been any feedback or interest?

Also, you state that this helps with "much-requested use-cases for the web, such as implementing performant databases and gracefully managing large temporary files." I'm wondering if there are some bugs you can link to?

Comment by @torgo Dec 1, 2020 (See Github)

Taking a look at this again - one thing that occurs to us is that the name itself is a little confusing (and alarming). If it's origin-bound then maybe something like "origin bound virtual filesystem" would be more accurate? The name should be more descriptive so that web developers understand what this is and can make a reasonable decision about how to use it, compared to other APIs. Also "I/O" could cover a lot of bases - e.g. WebUSB, Web Serial, ... but what we are talking about here is filesystem primitives...

Comment by @kenchris Dec 1, 2020 (See Github)

How much does this API differ from the one provided by File System Access? And where it differs substantially, why?

Comment by @torgo Dec 1, 2020 (See Github)

Also @fivedots @rstz - can you get back to us on the previous question regarding implementer interest?

Comment by @cynthia Dec 1, 2020 (See Github)

We discussed this in today's call. Some questions we had based on our understanding of the explainer:

  1. Given that the characteristic of what this provides is a "file" (quotes due to it being a quasi-file in a virtual filesystem land), how can one share this "file" or a set of "file"s to another origin? This feels like a useful feature that we are unsure how it would be implemented.
  2. How can these files be accessed from the OS? I suppose the answer is "you don't" - but given that everything doesn't happen on the web this seems like a missing feature.
  3. How does this API relate to storage buckets?
  4. Should there be an "ephemeral" mode for these handles? Curious about your thoughts with regard to immutability as well.

These are questions we had from the first skim - we'll look into this in a bit more depth in a separate breakout later.

Comment by @torgo Dec 1, 2020 (See Github)

Additionally, beyond WICG where do you see this API being worked on?

Comment by @fivedots Dec 1, 2020 (See Github)

I'll try to reply to everything that has come up:

@torgo

  • Re: Other implementers, we've received feedback from people working on Firefox and Safari in the issues of the explainer, in particular #4, #9, and #12 and informally at TPAC. That being said, we haven't yet met or discussed next steps.

  • Re: Bugs showing requested use cases, I don't have publicly linked bugs at hand. The project was started before I joined, so I'm asking the people involved, I'll update this issue once I have links to share. We have received direct positive feedback from partners, saying that NativeIO unblocked critical parts of their applications.

  • Re: Naming, we agree. We've been meaning to change the name soon, the current candidates are Direct Access Storage API (to highlight the performance and direct random access allowed by the API), and Storage Foundation API (to highlight the main use cases, where the API acts as a low level backend for more sophisticated libraries e.g. SQLite or a bespoke cache).

  • Re: Beyond WICG, we were thinking that it should be worked on in the WebAppsWG

@kenchris

  • Re: Similarities with File System Access API, we differ on goals and surface, although we've had some conversations on possibly merging some of the functionality offered by the Origin Private File System and NativeIO. From my understanding, File System Access API was added to allow more powerful access to files stored in the client, with the Origin Private File System acting as a helper for testing and allowing the writing of data before prompting for permission. NativeIO acts as a more traditional storage API, with data being bound to the origin and not necessarily accessible to applications in the host. The differences in interface (like File System Access API relying on Streams, while NativeIO uses buffers) and assurances (like File System Access API requiring SafeBrowsing checks and creating file copies before writing to them) are usually in the service of performance and genericity. We are currently benchmarking and prototyping to see if it's possible to merge parts of the APIs, with the main risk being that in trying to cover many diverse needs, we won't meet the requirements that make NativeIO useful.

@cynthia

  1. We hadn't considered sharing files between origins, it hasn't come up in our dev trial yet. I agree it would be interesting, I'll look into what changes would have to be made and update this issue.

  2. Indeed, since we see NativeIO purely as a storage API, we haven't added ways to interact with the OS. I think that may be covered by the File System Access API, so detailing the interaction between the two APIs might be useful. Generally exploring and documenting the interaction between NativeIO and other storage APIs was on our list, so we'll eventually add it to the explainer.

  3. The proposals are not directly related, but we expect that NativeIO would support buckets when they ship. It would probably mean that developers could specify which bucket a file belongs to. We also talked about relying on buckets to mark files as temporary or ephemeral, but we haven't really explored much of that.

  4. Yes, "ephemeral" files have come up on partner feedback. We've internally discussed a proposal to add a flag to the open method that allows marking a file as "deleteOnClose", which would work for new and existing files. I'll update the explainer as soon as the details of the design are hammered out. As for immutability, we haven't discussed it so far. I could see how it might be useful if files can be shared between origins, but right now an origin has to create all the data within NativeIO by themselves. Is there a particular use case that you have in mind?

Thank you all for the feedback and help!
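
For readers following along, the "deleteOnClose" flag mentioned in point 4 of the reply above can be pictured with a small in-memory sketch. Only the flag name comes from the comment; the `ToyStore` and `ToyHandle` classes, the shape of `open`, and the options object are assumptions made for illustration, since the design had not been written up at this point.

```typescript
// Toy model of an origin-private store with a delete-on-close option.
// `ToyStore`, `ToyHandle`, and `OpenOptions` are hypothetical; only the
// flag name "deleteOnClose" comes from the discussion above.
interface OpenOptions {
  deleteOnClose?: boolean;
}

class ToyHandle {
  private readonly store: ToyStore;
  readonly name: string;
  private readonly deleteOnClose: boolean;

  constructor(store: ToyStore, name: string, deleteOnClose: boolean) {
    this.store = store;
    this.name = name;
    this.deleteOnClose = deleteOnClose;
  }

  // Closing an ephemeral handle removes the file from the store.
  close(): void {
    if (this.deleteOnClose) {
      this.store.delete(this.name);
    }
  }
}

class ToyStore {
  private files = new Map<string, Uint8Array>();

  // Opens an existing file or creates an empty one.
  open(name: string, options: OpenOptions = {}): ToyHandle {
    if (!this.files.has(name)) {
      this.files.set(name, new Uint8Array(0));
    }
    return new ToyHandle(this, name, options.deleteOnClose ?? false);
  }

  delete(name: string): void {
    this.files.delete(name);
  }

  has(name: string): boolean {
    return this.files.has(name);
  }
}

const store = new ToyStore();
const scratch = store.open("scratch.tmp", { deleteOnClose: true });
const durable = store.open("data.db");
scratch.close(); // removes "scratch.tmp"
durable.close(); // "data.db" stays in the store
```

Because the flag is passed at open time, it applies equally to new and existing files, which matches the behaviour described in the reply.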

Discussed Jan 1, 2021 (See Github)

Dan: Previously left a bunch of feedback saying given the characteristics of what ??? a file, how can I share this file with my web application... how can these files be accessed from the OS, how does it relate to storage buckets, should there be ephemeral mode... and what's the venue

... They responded saying it should be in WebApps...

Sangwhan: did we ever ask about fsync? Flush to disk. I don't remember seeing that, seeing any guarantees.

Ken: This is a performance thing ...

Sangwhan: There are times when you would definitely want to flush.

... say you put a flush on close... (missed)

Ken: Talking about having a delete on close, could have a flush on close.

Sangwhan: If execution context is destroyed... want something like Beacon?

Ken: There is a flush function actually, looking at the API..

Sangwhan: Doesn't come with guarantees

Ken: (quoting from explainer)

Sangwhan: Reliable as long as flush finishes before execution context is destroyed.

Ken: It's a good question.

Sangwhan: A lot of things don't guarantee anything once you lose the execution context. All bets are off.

Dan: Same problem we've encountered elsewhere, API being introduced into the platform, a capability that is overlapping with similar capabilities with some significant differences, being introduced as a completely different technology that creates bloat.

Ken: I left feedback to that effect... for a start the name is important, they have decided to change the name to highlight the main use cases.

Dan: It's still using its own concept of a filesystem.

Ken: ... bloat in the implementation ... stay secure... this is for e.g. BYO database, like doing MySQL from WebAssembly. They def know about the filesystem access API, but you can't base one on the other for performance reasons...

Dan: That's an implementation detail; from a dev perspective feels like it should be the same thing, I want to open a file, how do I know which one to use? UAs should be abstracting that implementation difference away, making the dev experience more straightforward

Sangwhan: We have the abstractions, but not the unabstracted version, which this provides. This is the lower level.

Dan: So you think they've adequately addressed our questions?

Ken: Haven't finished designing yet so hard to say, but they're aware of the feedback..

Sangwhan: Not seeing solutions yet to the problems we raised

Ken: But they're still working on it.

Dan: If we've left the feedback we want to leave and they've taken it on, we should probably close with some directions... consider the dev experience of filesystem-like APIs overall when coming up with their final design

Ken: Try to keep the APIs as similar as possible... happy with (?), happy with the name - doesn't sound like how I access a file...

... not access to local files, more like a cache kind of thing...

... you don't have those files, you can't access them from the OS in general.

Sangwhan: This is not for "common" developers, this is for not having to reimplement the whole storage part when you're porting an app to webassembly.

Ken: Or ... people could use the FS access API and emulate this, but then an external application could modify those files and potentially even trigger browser bugs that way.

Dan: Still feels like why isn't this an optional feature... or an option on FS access API?

Ken: It's not a filesystem

Sangwhan: If you get a handle from the FS API you might be able to ??? these APIs, but it's not directly related.

Ken: It's like a virtual filesystem.

Sangwhan: This is like libc level.

Dan: Think we should close this and leave the feedback we've left. Propose close, close in plenary. Should be encouraging them to consider the complexity they're adding, thinking about ways to simplify.

Sangwhan: Would like folks to consider the use cases and concerns that we've noted above.

... Oh, and a flush guarantee.

Ken: I think I'm ok with the name. It doesn't say filesystem.

Dan: We've talked about how this is origin-bound and that's the web security model, but also asked about access across origins... kind of conflicting feedback. Intention was just to ask if there's a use case for cross-origin access. If there were that access, it would be problematic... unless we're talking about a first party sets scenario.

Ken: I assume it would have to be... don't want to transfer my database to another site

Sangwhan: You've got CAD, or Eagle... generating PCBs... want to be able to transfer that data from one app to another. Audio editor to video editor.

Ken: Write it to a file?

Sangwhan: It's possible by binding the FS access API... if a FS access API has these handles you can flush a file to disk, then load it up from the other app.

Ken: This is a pretty big use case... a lot of native apps today are looking at doing this kind of thing.

Sangwhan: It's not an imaginary use case. If these files are trapped in virtual FS land, they're inaccessible to the user. You can make SQLite easier to use, but that's just one use case...

Ken: Would you use this API for this instead of the FS Access API? When would I use this API instead?

Dan: leaves comment on developer complexity

Sangwhan: Legacy CAD format that's implemented in C, for example... don't want to port it all over to JS...

Ken: If we can write to a non-virtual file, that would solve this concern...

Sangwhan: If there were a connection, yes. That connection is currently missing.

... Will try and leave a comment around this.

Ken: Writing it to a different file mitigates Dan's concern

Sangwhan: Yes

Ken: It's like saving a file.. ... delegating access to storage from one origin to another origin... that's a can of worms.

Comment by @kenchris Jan 11, 2021 (See Github)

Hi there, we will look at this again tomorrow in a breakout, so wondering if any progress has been made since your last comment?

Comment by @fivedots Jan 11, 2021 (See Github)

Hello, I only have a short update:

  • After socializing our possible new names, we are leaning towards changing the name to Storage Foundation API
  • We discussed sharing files across origins. Our main worry is that it will add a lot of complexity (especially when managing file life cycle) and possibly constraints to the overall performance of the implementation. We haven't received feedback from devs requesting this sort of transferring, so we would need more context to figure out what the right cost/benefit balance is. We are keeping an ear out to understand how this could help!

Comment by @cynthia Jan 12, 2021 (See Github)

One question - will flush() come with guarantees? What happens if your flush() doesn't finish and the tab is closed by the user? Does this warrant a weird special path like we have with Beacons? (Feels like if there is a guarantee then it should be normative in the spec, when that is written.)
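
The concern can be made concrete with a toy model in which writes sit in a volatile buffer until a flush copies them to durable storage. The class and method names below are hypothetical, and the model says nothing about how any browser implements `flush()`. It only shows why data that has not been flushed before the execution context goes away is at risk; what happens to a flush that is still in flight at that moment is exactly the open question.

```typescript
// Toy model: writes are buffered in memory and become durable only when
// flush() has run. Names are hypothetical, not the proposed API.
class ToyDurableFile {
  private pending: number[] = []; // written but not yet flushed
  private durable: number[] = []; // survives loss of the execution context

  write(bytes: number[]): void {
    this.pending.push(...bytes);
  }

  // Moves all buffered bytes to durable storage.
  flush(): void {
    this.durable.push(...this.pending);
    this.pending = [];
  }

  // Simulates the tab being closed: anything still pending is lost.
  // Returns a copy of what would be found on the next visit.
  simulateContextDestroyed(): number[] {
    this.pending = [];
    return [...this.durable];
  }
}

const log = new ToyDurableFile();
log.write([1, 2]);
log.flush(); // bytes 1 and 2 are now durable
log.write([3]); // never flushed
const survived = log.simulateContextDestroyed();
```

In this model only the bytes written before the completed flush survive, which is the guarantee the question asks the spec to state normatively, if it exists.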

Comment by @kenchris Jan 12, 2021 (See Github)

I like the name "Storage Foundation API" as it will make it clearer to developers that this is not a file system API in the normal sense.

WebAppsWG also sounds like the right venue.

Comment by @torgo Jan 12, 2021 (See Github)

One meta-level comment: in your ongoing work on this API and technology can you please consider the complexity of the developer experience. When we provide multiple, very similar and similarly-named APIs (which may have important differences) it makes the overall platform more difficult for developers to grok. Maybe consider an MDN article which lays out the differences for all the filesystem-like APIs?

Comment by @cynthia Jan 13, 2021 (See Github)

We'll close this for now - looking forward to revisiting when there is a spec. (hopefully with the points we brought up above sorted out!) Thanks for bringing this to our attention!