#371: File Handling


Opened May 8, 2019

Good day, TAG!

I'm requesting a TAG review of:

  • Name: File Handling
  • Specification URL: N/A
  • Explainer
  • Tests: N/A
  • Primary contacts: @fallaciousreasoning @raymeskhoury

You should also know that... The spec isn't written yet, so this is more of an opportunity to review the proposed API.

Further details (optional):

We'd prefer the TAG provide feedback as (please select one):

  • open issues in our GitHub repo for each point of feedback
  • open a single issue in our GitHub repo for the entire review
  • leave review feedback as a comment in this issue and @-notify @fallaciousreasoning @raymeskhoury

Discussions

Comment by @hober May 22, 2019 (See Github)

Hi,

Thanks for bringing this review to us. We're trying to require all requests for review to have considered the questions in our security & privacy questionnaire. Please let us know via a comment here when you've run through the security & privacy questionnaire and can share your findings with us.

Currently, your explainer mentions several previous attempts at addressing this use case on the web, but it doesn't consider why the previous attempts failed, nor does it say what makes this new approach more likely to succeed. For instance, registerContentHandler was dropped due to a lack of multi-implementor interest. Why not just resurrect that, since there is now apparently interest from a different vendor from last time?

Comment by @fallaciousreasoning Jun 6, 2019 (See Github)

Updated with a link to the security & privacy questionnaire.

Comment by @fallaciousreasoning Jun 20, 2019 (See Github)

I’ve expanded the section on why we can’t use registerContentHandler.

See here

Comment by @fallaciousreasoning Jun 28, 2019 (See Github)

Hey, just wondering if you folks need anything else from us before the review?

Comment by @kenchris Sep 11, 2019 (See Github)

The launch event seems to happen in the service worker and be called "launch", but in the explainer you are handling this from the main document with the "load" event. So would both options be supported, and how would that work? Maybe worth adding to the explainer.

Also, it is probably useful to be able to deal with this in the service worker. For instance, when clicking on a calendar URL (webcal-like), you might not actually want to open it but just show a notification that the calendar has been added.

Comment by @dbaron Sep 11, 2019 (See Github)

I'm looking at this with @kenchris in a breakout at the TAG meeting right now.

One thought is that there seem to be a bunch of similarities between handling a file with a given MIME Type (or extension) that's already stored locally, and handling such a file that the user has downloaded from the web. That is, if your app can handle application/atom+xml off the local filesystem, it can probably handle that off of the Web too -- and maybe that should just work automatically without anything substantial needing to be done, if at that point it's the local application handling application/atom+xml. On the other hand, one thing in the past that's led to overuse of registerProtocolHandler is that there are cases where file type handlers want the URL of the resource rather than the current representation of it. In other words, many handlers for application/atom+xml probably want to be able to re-fetch the feed at arbitrary later times rather than being concerned with just the current contents of the feed. Maybe we've already settled on using the hack of registerProtocolHandler with a fake protocol for that -- but it's a pretty unpleasant hack. (That is, webcal is really just http, but the fake protocol was invented to allow certain HTTP resources to be dispatched by-URL (rather than by-representation) to an app that handled calendars.)

At the very least, it seems like perhaps the explainer should talk a little bit more about how this gets used both for already-local files and files coming from the web -- but it would also be interesting if there were a better solution for the "want the URL rather than the representation" problem.

Comment by @dbaron Sep 11, 2019 (See Github)

Another question is that of permissions. It seems like at some point the browser probably wants to ask the user for permission as to whether they want this particular installed web application to be able to access a particular file on the native file system. At the very least, it seems like a risk if there's no permission prompt anywhere in the system -- and it at least seems like it might be easier to explain that stuff to users at use time rather than at registration time (since it's not clear that it's easy to explain to a user what an application/foo file is). This leads me to two thoughts:

  • the dependency on the native file system API might be a barrier to adoption by other browsers -- it might be that other browsers might be interested in implementing something with this set of capabilities but nonetheless aren't willing to implement the native file system api. (I'm not at all sure about this -- just speculating without having given it a lot of thought -- although I think there's a bunch of resistance to the native file system API.)
  • it's a little concerning that there doesn't appear to be asynchronous stuff in the explainer that gives a UA that wants to prompt the user for permission an opportunity to do so. Perhaps that's OK because the native file system API is sufficiently asynchronous to give the UA a chance to ask for permission through its API surface -- but it would be good to see examples in the explainer that show at what points a UA might choose to ask the user for permission.

Comment by @dbaron Sep 11, 2019 (See Github)

@kenchris also brought up the issue of what happens when multiple web applications register for the same extension and/or MIME type. It seems like many OSes have some sort of chooser model for that (which often have features such as remembering the choice). Will this be able to integrate well with such OS choosers? Or would it be problematic for the browser to register multiple web apps with different names and icons into such a chooser?

Discussed Oct 1, 2019 (See Github)

Kenneth: Didn't get any feedback on this one either.

David: We might need to open some issues in their github repo, per their request.

All: good point.

David: Some of these are more discussion points than issues, might be hard to frame as issues.

Peter: I'll leave this one as pending feedback for now.

Comment by @kenchris Oct 2, 2019 (See Github)

@fallaciousreasoning @raymeskhoury the above is more like a discussion and thus not filed as issues on your repo. It would be great if you could have a look!

Comment by @raymeskhoury Oct 2, 2019 (See Github)

Sorry for the delay! Thanks for having a look and pinging! @mgiuca is actively working on this - Matt, could you take a look with @fallaciousreasoning ?

Comment by @mgiuca Oct 7, 2019 (See Github)

I haven't had a chance yet. Sorry. I'll try to get to it this week and file bugs on the repo for any appropriate issues.

Comment by @dbaron Dec 3, 2019 (See Github)

@alice and I just looked at this quickly during the Cupertino face-to-face. It looks like there's stuff happening as of the last few days, and we'd probably like to take another look at the explainer once these additions/clarifications are made. So we'll leave this as pending external feedback for now.

Comment by @mgiuca Dec 5, 2019 (See Github)

@dbaron Yep, I'm slowly going through the history of this thread and filing separate issues to update the explainer from. I was planning to ping this when done.

Comment by @mgiuca Dec 5, 2019 (See Github)

I've filed bugs for the issues raised in this thread. We'll now work on updating the explainer to reflect those.

@kenchris :

The launch event seems to happen in the service worker and be called "launch", but in the explainer you are handling this from the main document with the "load" event. So would both options be supported, and how would that work? Maybe worth adding to the explainer.

Captured in WICG/file-handling#34.

@dbaron :

One thought is that there seem to be a bunch of similarities between handling a file with a given MIME Type (or extension) that's already stored locally, and handling such a file that the user has downloaded from the web. That is, if your app can handle application/atom+xml off the local filesystem, it can probably handle that off of the Web too -- and maybe that should just work automatically without anything substantial needing to be done, if at that point it's the local application handling application/atom+xml. On the other hand, one thing in the past that's led to overuse of registerProtocolHandler is that there are cases where file type handlers want the URL of the resource rather than the current representation of it. In other words, many handlers for application/atom+xml probably want to be able to re-fetch the feed at arbitrary later times rather than being concerned with just the current contents of the feed. Maybe we've already settled on using the hack of registerProtocolHandler with a fake protocol for that -- but it's a pretty unpleasant hack. (That is, webcal is really just http, but the fake protocol was invented to allow certain HTTP resources to be dispatched by-URL (rather than by-representation) to an app that handled calendars.)

So, I think this is out of scope of file handlers. What we're trying to build is specifically a way for websites to get the functionality of native apps opening real local files on disk: that is, being able to:

  • Random-access read them (being able to seek around through the file without loading the whole thing), and
  • Write back to them.

That's not to say an implementation can't expose virtual file-like objects, or read-only files, to this API. But they need to have the same interface as a real file.
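As a rough illustration of the two capabilities listed above, here is a minimal TypeScript sketch. It assumes handles shaped like the File System Access API's `FileSystemFileHandle` (`getFile()` and `createWritable()`). The `FileHandleLike` family of interfaces, the `readRange` and `overwrite` helpers, and the in-memory `makeVirtualHandle` are hypothetical names introduced only for this example, not part of the proposal.

```typescript
// Structural subset of the File System Access API that this sketch assumes.
interface BlobLike {
  arrayBuffer(): Promise<ArrayBufferLike>;
}
interface FileLike {
  size: number;
  slice(start: number, end: number): BlobLike;
}
interface WritableLike {
  write(data: Uint8Array): Promise<void>;
  close(): Promise<void>;
}
interface FileHandleLike {
  getFile(): Promise<FileLike>;
  createWritable(): Promise<WritableLike>;
}

// Random-access read: fetch only bytes [start, end) instead of the whole file.
async function readRange(handle: FileHandleLike, start: number, end: number): Promise<Uint8Array> {
  const file = await handle.getFile();
  const clampedEnd = Math.min(end, file.size);
  return new Uint8Array(await file.slice(start, clampedEnd).arrayBuffer());
}

// Write back: replace the file's contents with `data`.
async function overwrite(handle: FileHandleLike, data: Uint8Array): Promise<void> {
  const writable = await handle.createWritable();
  await writable.write(data);
  await writable.close();
}

// A hypothetical "virtual file-like object" exposing the same interface as a
// real handle, backed by memory instead of the disk.
function makeVirtualHandle(initial: Uint8Array): FileHandleLike {
  let contents = initial;
  return {
    async getFile() {
      const snapshot = contents;
      return {
        size: snapshot.length,
        slice: (start: number, end: number) => ({
          arrayBuffer: async () => snapshot.slice(start, end).buffer,
        }),
      };
    },
    async createWritable() {
      let pending = contents;
      return {
        async write(data: Uint8Array) { pending = data.slice(); },
        async close() { contents = pending; }, // changes land on close()
      };
    },
  };
}
```

Because `readRange` and `overwrite` depend only on the interface, the same calling code would work whether the handle is backed by a real file or by a virtual object like the one above.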

The use case you're discussing here, specifically, exposing the URL of a web resource as a read-only "file handle" that can be stored in a database and refetched by the server on an ongoing basis, is quite a different interface to that of a local file.

I think what you're proposing is a legitimate use case, but it's best addressed with registerContentHandler. The fact that that API never took off might suggest that it isn't an important enough use case to try and capture. Either way, I think it's different enough to what we're trying to do with file handling that I don't want to capture it within our scope.

Another question is that of permissions. It seems like at some point the browser probably wants to ask the user for permission as to whether they want this particular installed web application to be able to access a particular file on the native file system. At the very least, it seems like a risk if there's no permission prompt anywhere in the system -- and it at least seems like it might be easier to explain that stuff to users at use time rather than at registration time (since it's not clear that it's easy to explain to a user what an application/foo file is).

Captured in WICG/file-handling#35.

It seems like many OSes have some sort of chooser model for that (which often have features such as remembering the choice). Will this be able to integrate well with such OS choosers? Or would it be problematic for the browser to register multiple web apps with different names and icons into such a chooser?

Captured in WICG/file-handling#37.

the dependency on the native file system API might be a barrier to adoption by other browsers -- it might be that other browsers might be interested in implementing something with this set of capabilities but nonetheless aren't willing to implement the native file system api. (I'm not at all sure about this -- just speculating without having given it a lot of thought -- although I think there's a bunch of resistance to the native file system API.)

I hope this isn't a barrier to adoption, but if it is, I think it's too bad. The native file system API captures exactly the feature set we want here. (In fact, the two APIs were co-designed, initially part of the same proposal but split up because we saw an opportunity to do so and work on them in parallel; I recall those discussions started at TPAC 2018.) If folks are opposed to native file system API then I would rather try and convince them to adopt both of these proposals than water down file-handling (e.g., making it read-only).

Discussed Jan 1, 2020 (See Github)

Alice: David had a good point about things on the filesystem vs things fetched. ... some issues in their repo open... none of which are closed yet.

Dan: what's the implementation status?

Alice: Currently "proposed" in Chromium - under active development.

Dan: Feels like we have raised concerns, those have been listened to and issues raised...

Alice: It would be nice to hear from David and for their issues to be closed...

Dan: Agreed.

Discussed Jan 1, 2020 (See Github)

Tess: Should we move this to breakout C, since David is unavailable this week and the other two assignees overlap in C?

Alice: Sure

Moved to breakout C.

Comment by @alice Jan 28, 2020 (See Github)

Chrome status: https://www.chromestatus.com/features/5721776357113856

Comment by @alice Jan 28, 2020 (See Github)

Mozilla standards position: https://github.com/mozilla/standards-positions/issues/158

Discussed Feb 1, 2020 (See Github)

Alice: Still pending feedback

Comment by @dbaron Mar 3, 2020 (See Github)

So, I think this is out of scope of file handlers. What we're trying to build is specifically a way for websites to get the functionality of native apps opening real local files on disk: that is, being able to:

At the same time, I think this might provide the functionality I was talking about for free.

Say, for example, https://calendar.google.com/ uses the File Handling API to say that it supports handling .ics files (text/calendar) so that a user choosing an ICS file in the OS's file browser will load Google Calendar's ICS importer. However, then, if a user browsing the web clicks on an ICS file... typically what the browser does in that case is show a dialog that offers to use whatever the OS handler for ICS files is... which in this case is the browser running Google Calendar mediated by this File Handling API. So maybe it just works. (Or if it doesn't work, maybe Firefox will load Google Calendar in Chrome, and Chrome will load Google Calendar in Firefox, providing both browsers an incentive to do it right!)

Comment by @dbaron Mar 3, 2020 (See Github)

Also, regarding the connection with native file system API -- I suspect I may not have appreciated the first time I read the explainer that writing to the files is also a major part of the feature set that this spec offers. It uses the word "edit" twice, but never actually uses the term "write", so I think it might not have gotten through to my head. It's possible it's worth making this clearer in the explainer.

Comment by @dbaron Mar 3, 2020 (See Github)

We should probably look a little more at how the permissions flow works here, and whether multiple implementations seem likely to be comfortable with that. (This might also be related to how to structure permission prompts for Native File System, given the dependency.)

It seems like there are two points when File Handling might ask a user for permission:

  1. when registering the handler, or
  2. when invoking the handler for the first time.

I think the first might be problematic because it's a permission request somewhat detached from any action that the user wants to complete.

But the second also seems problematic because it's putting the question in the middle of a flow where the user is trying to open the file, putting the user in a position where they're very likely to just say "Yes" to any prompts in order to get the result they want.

Comment by @mgiuca Mar 4, 2020 (See Github)

At the same time, I think this might provide the functionality I was talking about [websites able to open files from URLs rather than on disk] for free.

Say, for example, https://calendar.google.com/ uses the File Handling API to say that it supports handling .ics files (text/calendar) so that a user choosing an ICS file in the OS's file browser will load Google Calendar's ICS importer. However, then, if a user browsing the web clicks on an ICS file... typically what the browser does in that case is show a dialog that offers to use whatever the OS handler for ICS files is... which in this case is the browser running Google Calendar mediated by this File Handling API. So maybe it just works. (Or if it doesn't work, maybe Firefox will load Google Calendar in Chrome, and Chrome will load Google Calendar in Firefox, providing both browsers an incentive to do it right!)

Yes. This is the intention, and should come for free, as you said. Browsers could implement special handling for their download mechanism that automatically allows downloaded files to be opened in a registered file handler, rather than passing it to the OS (similar to how Chrome opens PDF files inside itself, or for that matter how any browser handles HTML files). Or, they could just pass it to the OS which may end up passing it right back to the browser again. Either way, that's fine and doesn't really require spec support.

However, what you were talking about earlier in the thread was the specific example of an RSS reader registering as a handler for application/atom+xml, and instead of being given the contents of the file, being given the URL to the RSS feed resource. The critical difference with being given the URL is that the site is able to fetch the contents of the document on the server backend, rather than it being done by the browser, and is able to re-fetch the contents at any time in the future. I said that was out of scope because it's a totally different design to what we have now, for a totally different use case. If the interface was "you get a URL", then we would have to create virtual URLs to represent local files on the user's disk, but then those virtual URLs would not be fetchable by the server. There are also auth, intranet and same-origin policy problems with this (auth and intranet are problems for server-side fetching, while same-origin policy affects client-side fetching). It's a reasonable feature to want, but it provides a rather orthogonal set of use cases to the API we have designed.

I'll look at your other comments later.

Comment by @mgiuca Mar 5, 2020 (See Github)

On permissions: my feeling on this is that you don't need a permission prompt in either of those cases, except in one specific edge case.

The idea is that the file handlers are automatically registered (without a separate permission prompt) when you install the app, and the handler registration is tied to the app installation (i.e., if you uninstall the app, the registration is also removed). This is consistent with how file handlers generally work for native applications. While some Windows applications historically ask to register file handlers, it generally is accepted that installing an application will add file associations. Though I think it's pretty "rude" when applications become the default handler without asking, it's perfectly fine for an app to add itself as a non-default handler. Concordantly, I would expect web file_handlers to be added as non-default handlers without asking, but not become the default (the user can manually set a default handler).

At usage time, if a user opens a file using a non-default file handler, it means they have explicitly chosen to open the file in the specified app. That is essentially a permission grant, and I don't think we really need to show an additional prompt that says "Are you sure you want to do the thing you just explicitly said to do."

The one edge case, of course, is if there are no other available handlers for a particular file extension. That means that by registering a file association (depending on the OS), the host OS may automatically make the web app a default handler, which means the user could open data in the web app simply by double-clicking the file. In that case, a launch-time permission prompt seems appropriate.

Of course, any user agent can add extra permission prompts without having to specify it; for instance, a browser may let the user opt in to registering associations at install time, rather than doing it without asking. Or a browser may prompt to open the file at every launch. But I think it's not necessary.

The explainer says a little bit about this, but I agree, we need to be a lot more explicit about the above.
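The policy described in this comment can be summarized in a small TypeScript sketch. It is only an illustration of the reasoning above, not specified behavior: the `AssociationTable` class and its methods are hypothetical stand-ins for an OS file-association table, and a user agent remains free to prompt more often.

```typescript
// Hypothetical in-memory stand-in for the OS file-association table.
class AssociationTable {
  private handlers = new Map<string, Set<string>>();

  // Installing an app registers its handlers without a separate prompt.
  install(appId: string, extensions: string[]): void {
    for (const ext of extensions) {
      let apps = this.handlers.get(ext);
      if (!apps) {
        apps = new Set<string>();
        this.handlers.set(ext, apps);
      }
      apps.add(appId);
    }
  }

  // Uninstalling removes every registration that came with the app.
  uninstall(appId: string): void {
    this.handlers.forEach((apps) => apps.delete(appId));
  }

  // With other handlers present, the web app is a non-default handler, so the
  // user must have picked it explicitly and no extra prompt is needed.
  // Edge case: if it is the only handler, the OS may make it the default
  // automatically, so a launch-time prompt seems appropriate.
  shouldPromptAtLaunch(appId: string, ext: string): boolean {
    const apps = this.handlers.get(ext);
    if (!apps || !apps.has(appId)) {
      throw new Error(appId + " is not registered for " + ext);
    }
    return apps.size === 1;
  }
}
```

For example, a web editor installed alone for `.txt` would prompt at launch, but would stop prompting once another `.txt` handler is registered alongside it.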

Comment by @kenchris May 28, 2020 (See Github)

@mgiuca any chance the above could go into the explainer?

Comment by @dbaron May 28, 2020 (See Github)

@kenchris, @atanassov, and I just discussed this in a breakout at the TAG's virtual face-to-face meeting.

I think we're happy with closing this issue at this point. We've provided a good bit of feedback, which you've been responsive to, although I think some of it probably still needs to be incorporated into the explainer.

As an early-stage review, we're not expecting a full spec at this point. But we would encourage you to continue to solicit feedback from relevant parties, and if the specification continues to make progress, to try to find an appropriate standardization venue for it.

Comment by @evanstade Jan 26, 2022 (See Github)

Hi TAG reviewers,

We're getting close to launching File Handling, and since it's been a while since this review (and our OT has completed), we're curious whether a new review is warranted. The explainer has been brought up to date with this commit [1]. Here is the spec PR (in review) [2].

The only difference I think might warrant consideration is that the file handles passed to the app now have read/write access. The justification is that:

  • the confirmation prompt still exists to prevent accidental usage of the API
  • users have an expectation that the file will be writeable (based on the same flow with a native app)
  • having separate confirmation prompts for read (part of FH API) and write (using File System Access API) creates a very high friction experience for the common "open - edit - save" user journey, and ultimately, the two confirmation prompts look pretty much the same to the user. (This was a particular concern for partners.)
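For illustration, the "open - edit - save" journey might look roughly like the TypeScript sketch below. It assumes a `launchQueue.setConsumer()` style entry point and File System Access-style handles, neither of which is spelled out in this thread. The `Launched...` and `Launch...Like` interfaces, `registerEditor`, and its callbacks are hypothetical names for this example, and any confirmation prompt is left entirely to the user agent.

```typescript
// Structural stand-ins for the browser objects this sketch assumes.
interface LaunchedWritable {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface LaunchedFile {
  text(): Promise<string>;
}
interface LaunchedFileHandle {
  getFile(): Promise<LaunchedFile>;
  createWritable(): Promise<LaunchedWritable>;
}
interface LaunchParamsLike {
  files: ReadonlyArray<LaunchedFileHandle>;
}
interface LaunchQueueLike {
  setConsumer(consumer: (params: LaunchParamsLike) => void): void;
}

// Registers a consumer that opens each launched file, edits it, and saves it
// back through the same handle, with no separate "write" confirmation step.
function registerEditor(
  queue: LaunchQueueLike,
  edit: (text: string) => string,
  onSaved: (count: number) => void,
): void {
  queue.setConsumer(async (params) => {
    for (const handle of params.files) {
      const original = await (await handle.getFile()).text(); // open
      const writable = await handle.createWritable();         // edit and save
      await writable.write(edit(original));
      await writable.close();
    }
    onSaved(params.files.length);
  });
}
```

In a supporting browser the queue would presumably be `window.launchQueue`; here it is passed in as a parameter so the sketch does not depend on any particular global.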

The other recent major change from the user's perspective (although not really part of the API) is to the permission prompt.

  • the dialog appears before the app window is created, which improves the situation where a user blocks the action, as there's no longer an empty app window hanging around.
  • the permission is granted to the app, not the origin, which improves user control in situations where multiple apps share a single origin.

[1] https://github.com/WICG/file-handling/commit/612eb739859687e78cc31e42f55967f76c35f349
[2] https://github.com/w3c/manifest/pull/1005

Comment by @evanstade Apr 15, 2022 (See Github)

Updated spec link: https://wicg.github.io/manifest-incubations/#file_handlers-member

Discussed May 1, 2022 (See Github)

Dan: Last comment from dbaron was happy.. fine to ask for rereview based on there being a new spec. Let's look at the status.

Dan: explainer updated

Hadley: update on what's changed since 26 Jan.

Lea: seems to have mime types and extensions linked to each other.. what happens if you try to open a file that doesn't have the correct mime type? Shouldn't these be separate?

Yves: for local files?

Peter: taking the extension and mapping it to a mime type...

Lea: shouldn't the extension be the key in that case...?

Peter: multiple extensions mapped to one mime type... convenient but error prone...

Lea: seems to make it mandatory to define a mime type - which is not something you need.

Peter: don't you need it when you open the file? Presuming you send it to a blob which is a mime type.

Peter: you're making the browser dispatch it - you go to the OS, double click the file, it sends you to the URL (of the webapp).

Hadley: explainer gives an example on linux - requires all extensions to map to a known mime type... UA enforces app to only open files with specified extensions... introducing new rules...

Dan: I'm wondering about security boundary violations ...

Hadley: they want to follow users' applications... but what if it's doing something the user doesn't expect?

Dan: will review security & privacy...

Peter: the explainer says the UA can issue a permission prompt....
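To make the extension and MIME type coupling discussed above concrete, here is a minimal TypeScript sketch. It assumes a `file_handlers` entry whose `accept` member maps each MIME type to a list of extensions. The sample manifest values and the `findHandler` helper are hypothetical, and real user agents may match files differently.

```typescript
// Shape of one manifest `file_handlers` entry, as assumed for this sketch.
interface FileHandler {
  action: string;                            // URL opened to handle the file
  accept: { [mimeType: string]: string[] };  // MIME type -> extensions
}

// Hypothetical sample: two extensions mapped to a single MIME type.
const fileHandlers: FileHandler[] = [
  { action: "/open-image", accept: { "image/jpeg": [".jpg", ".jpeg"] } },
  { action: "/open-csv", accept: { "text/csv": [".csv"] } },
];

// Finds the handler for a file name by extension only; the MIME type is
// derived from the extension rather than from the file's contents.
function findHandler(
  handlers: FileHandler[],
  fileName: string,
): { action: string; mimeType: string } | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined; // no extension, nothing to match on
  const ext = fileName.slice(dot).toLowerCase();
  for (const handler of handlers) {
    for (const mimeType of Object.keys(handler.accept)) {
      if (handler.accept[mimeType].indexOf(ext) !== -1) {
        return { action: handler.action, mimeType };
      }
    }
  }
  return undefined;
}
```

Note that a mislabeled file (for example, CSV data saved as `.jpg`) would still be routed by its extension in this sketch, which is the error-prone case raised in the discussion.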

Discussed Jul 1, 2022 (See Github)

Sangwhan: should check if this already launched.. it's shipped.

Amy: there's no s&p considerations.. a single privacy consideration.. points to File Access for lots of security stuff, which says "The API provides a lot of scary power to websites that could be abused in many terrible ways". Google doc for file handling security.

We apologize for the delay in getting to this review request. We note that it has [shipped in Chromium](https://groups.google.com/a/chromium.org/g/blink-dev/c/Wxuo4lZi4vM) but that there are so far no other implementations. Please let us know if you expect to progress this along the recommendation track at any point, and if/when other implementer interest emerges.

While the security and privacy questionnaire has responses, there are no security and privacy considerations sections in the spec itself. The responses to the questionnaire indicate how potentially private/sensitive data may be exposed, but there is little discussion of threat models or mitigations. We see this discussed extensively in [this document](https://docs.google.com/document/d/1pTTO5MTSlxuqxpWL3pFblKB8y8SR0jPao8uAjJSUTp4/edit) and it would be reassuring to see actionable considerations / tradeoffs for implementers succinctly documented in the specification itself, or at least references to other documents if that's more appropriate.

We will be happy to review further changes and additions in a new review request in future.

Amy: suggest closing satisfied with concerns

Comment by @rhiaro Jul 26, 2022 (See Github)

We apologize for the delay in getting to this review request. We note that it has [shipped in Chromium](https://groups.google.com/a/chromium.org/g/blink-dev/c/Wxuo4lZi4vM) but that there are so far no other implementations. Please let us know if you expect to progress this along the recommendation track at any point, and if/when other implementer interest emerges.

While the security and privacy questionnaire has responses, there are no security and privacy considerations sections in the spec itself. The responses to the questionnaire indicate how potentially private/sensitive data may be exposed, but there is little discussion of threat models or mitigations. We see this discussed extensively in [this document](https://docs.google.com/document/d/1pTTO5MTSlxuqxpWL3pFblKB8y8SR0jPao8uAjJSUTp4/edit) and it would be reassuring to see actionable considerations / tradeoffs for implementers succinctly documented in the specification itself, or at least references to other documents if that's more appropriate.

We will be happy to review further changes and additions in a new review request in future.