design-reviews#400: Font Table Access API

Opened Aug 7, 2019

Hello TAG!

I'm requesting a TAG review of:

Name: Font Table Access API
Specification URL: https://wicg.github.io/local-font-access/
Explainer (containing user needs and example code)¹: https://github.com/inexorabletash/font-table-access/
GitHub issues (if you prefer feedback filed there): https://github.com/WICG/local-font-access/issues
Tests: TBD
Primary contacts (and their relationship to the specification): @chasephillips, @inexorabletash

Further details:

Relevant time constraints or deadlines: We'd like to discuss this at TPAC (Sep 16, 2019) so getting TAG review before then would be great. Overall, we plan to Dev Trial in M79 (mid Oct, 2019) and Origin Trial in M80 (early Dec, 2019).
I have read and filled out the Self-Review Questionnaire on Security and Privacy. The assessment is here.
I have reviewed the TAG's API Design Principles
The group where the work on this specification is: TBD

We recommend the explainer to be in Markdown. On top of the usual information expected in the explainer, it is strongly recommended to add:

Links to major pieces of multi-stakeholder review or discussion of this specification:
Links to major unresolved issues or opposition with this specification:

We'd prefer the TAG provide feedback as (please select one):

open issues in our GitHub repo for each point of feedback
open a single issue in our GitHub repo for the entire review
leave review feedback as a comment in this issue and @-notify [github usernames]

Please preview the issue and check that the links work before submitting. In particular, if anything links to a URL which requires authentication (e.g. Google document), please make sure anyone with the link can access the document.

¹ For background, see our explanation of how to write a good explainer.

Discussions

Comment by @annevk Aug 20, 2019 (See Github)

The various non-goals seem like they would make it rather hard for alternative approaches to doing fonts to enter the market or for smaller players with alternative approaches to compete. If even on the same platform the data you get back depends on the implementation it's highly likely sites will code toward a single implementation.

Comment by @chasephillips Aug 21, 2019 (See Github)

Hi Anne, thanks for taking a look at the table access API. I filed https://github.com/inexorabletash/font-table-access/issues/6 to discuss it further. Let's follow up there.

Discussed Sep 4, 2019 (See Github)

Tess: I gathered internal feedback and will write it up.

[bump to f2f]

Comment by @dbaron Sep 12, 2019 (See Github)

A very brief comment here after looking at #399 -- it seems like there's less additional fingerprinting surface here -- but it's not zero, and worth mentioning. In particular, it seems like there are some changes in fonts that are not detectable on the web today, but would be detectable with raw font table access. Thus this might be providing the additional fingerprinting entropy to distinguish a user with version 1.0.4 of a font from one with version 1.0.5 of the same font in cases where that wasn't detectable today.

Hoping to look more later...
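
To illustrate the kind of detail that raw table access could expose: in OpenType, the `head` table carries a `fontRevision` field, a signed 16.16 fixed-point number at byte offset 4. The sketch below assumes the caller already holds the bytes of a font's `head` table; the helper name `readFontRevision` is hypothetical and is not part of any proposal discussed here. Whether two releases of a font actually differ in this field is up to the font vendor.

```typescript
// Minimal sketch: read head.fontRevision from the raw bytes of a `head` table.
// Assumption: `headTable` holds exactly the `head` table, starting at its first byte.
function readFontRevision(headTable: Uint8Array): number {
  if (headTable.byteLength < 8) {
    throw new Error("head table too short to contain fontRevision");
  }
  const view = new DataView(
    headTable.buffer,
    headTable.byteOffset,
    headTable.byteLength
  );
  // OpenType data is big-endian, which is DataView's default byte order.
  // Fixed (16.16) values are converted by dividing the raw integer by 2^16.
  return view.getInt32(4) / 65536;
}
```

Two installs whose copies of a font report different values here would be distinguishable by a script, even if the fonts render identically on the page.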

Comment by @plinss Dec 3, 2019 (See Github)

@hober and I reviewed this at our December F2F. This still feels like too low-level an API, and it is geared strongly to OpenType fonts, whereas there are a number of different font technologies in use (and there are likely to be more down the road). We'd rather see higher level APIs exposing font metrics and glyph vectors that are font technology agnostic.

Comment by @atanassov Mar 2, 2020 (See Github)

@plinss and I looked at the proposal as part of the March F2F. There isn't any new information or updates in the explainer or relevant issues. Additional concern here is the dependency on Font Enumeration API and the fact that this proposal isn't progressing anymore.

Given we haven't heard back from the authors, should we consider there is no longer interest in making progress on either of these two proposals?

Comment by @inexorabletash Mar 2, 2020 (See Github)

re: lack of progress - we deferred work for a while, but are planning to revisit in the very near future.

re: two proposals - we are planning to re-merge the proposals; at one point we thought splitting them made sense to unblock some discussion. Closing one of these out is fine.

Comment by @hober May 27, 2020 (See Github)

We (@dbaron @hober @plinss) looked at this again during the TAG F2F this week. We've tried to file issues in the WICG/local-font-access repo which capture each of the concerns that have come up during the several TAG breakouts on these two reviews. Given the lack of engagement in these reviews thus far, we don't see any value in keeping these issues open. Please feel free to ping us to re-open or file a new issue when there are significant changes.

Comment by @inexorabletash Feb 25, 2022 (See Github)

Hey @hober - I think we're ready for another go-round on this. Should we re-use this or #399, or open a new issue?

Relative to the above list:

Non-OpenType fonts #19

Anne's feedback re: Non-goals and alternative approaches #20

⬆️ I believe these are effectively the same issue, although I may be missing nuance here. These are still fundamental issues, but apply to similar APIs for accessing local resources, e.g. file uploads, such as for images or videos, where web apps must be future-proof against new formats. That might suggest an approach, e.g. allowing (requiring?) web apps to specify accept to identify supported content types, e.g. query({accept: ['font/otf']})

Fonts need to be sorted #23

⬆️ Should be resolved.

mention additional fingerprinting surface in details of font versions #25

⬆️ Added.

Please add a Table of Contents to the explainer #26

⬆️ Added.

explainer suggests iterating all the fonts in order to find Consolas #27

⬆️ Resolved through allowing passing name list to query().
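
The two filtering ideas in this comment (an `accept` list of content types, and a name list passed to `query()`) can be sketched as a plain filter over candidate fonts. Everything below is hypothetical and for illustration only: the `FontEntry` and `QueryOptions` shapes, the `postscriptNames` option name, and the `filterFonts` helper are assumptions, not the API defined by the specification.

```typescript
// Hypothetical shapes for illustration; not taken from the specification.
interface FontEntry {
  postscriptName: string;
  contentType: string; // e.g. "font/otf"
}

interface QueryOptions {
  accept?: string[];          // content types the web app says it can parse
  postscriptNames?: string[]; // explicit list of fonts the web app wants
}

// Return only the fonts that satisfy every filter the caller supplied.
// An omitted filter matches everything.
function filterFonts(
  available: FontEntry[],
  options: QueryOptions = {}
): FontEntry[] {
  const accept = options.accept;
  const names = options.postscriptNames;
  return available.filter(function (font) {
    const typeOk =
      accept === undefined || accept.indexOf(font.contentType) !== -1;
    const nameOk =
      names === undefined || names.indexOf(font.postscriptName) !== -1;
    return typeOk && nameOk;
  });
}
```

With a filter like this, an app asking for `{accept: ['font/otf']}` would never be handed a font in a format it did not declare support for, and an app looking for Consolas would not need to iterate the full list.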

Comment by @inexorabletash Mar 21, 2022 (See Github)

@hober @plinss and anyone else - Can we re-open this per the above? We'd love to reboot the discussion here.

Comment by @annevk Apr 19, 2022 (See Github)

I really don't see the comparison to file uploads. With file uploads we don't provide some kind of normalized access to the underlying file that might differ across browsers. Each browser ends up with the same data.

I suppose what applies is that file uploads expose platform differences to websites, but I would take that as something to improve upon, not as something to imitate.

Discussed May 9, 2022 (See Github)

[discussed and re-triaged]

Discussed May 23, 2022 (See Github)

Rossen: if I remember... our main feedback has been you wouldn't want to expose this level of access directly, and instead prefer to see work happening in the form of font ?? being exposed. Is that right?

Peter: we closed this back in May 2020 and they came back in Feb and asked to reopen because they addressed a lot of our issues. We need to go back and take a look at the changes.

Rossen: fair. Will need some work. High level question is did we leave a comment about font features and exposing that functionality, which is already exposed in CSS? Or is that not enough?

Peter: I don't recall specifics about font features

Rossen: want to figure out the granularity of access and why they need font table access in particular, let alone exposing them. Need to look more.

Dan: agenda for +2 weeks.

Discussed Jul 11, 2022 (See Github)

Dan: Re-opened in March due to new information.

Rossen: reviewing new info

Dan: they think they've resolved the issues we raised in 2020.

Peter: a lot closed without explanation of how they were addressed.

Rossen: reviews status of linked issues

Peter: Joshua Bell is now main editor for the spec.

Rossen: I'd ask - what are the actual resolutions to these issues?

Peter: I seem to recall that we didn't want to see 399 happen - TAG consensus was that we'd rather see a chooser than enumeration API. They still have an enumeration.

Rossen: I want to see where these issues are resolved - I looked through some of the commits they have - they added fingerprinting - but no clue what answer to ToC in explainer means...

Peter: the older version of the API was only exposing one font format - now it looks like their API just returns a blob of bytes. Their response is "here's bytes, parse it yourself" - which seems the exact opposite of what we were asking for, which was "parse it, but don't limit it to OpenType".

Peter: regarding fingerprinting - no mitigation suggestions just some info - for example "in incognito mode, don't expose the fonts" or something similar.

Dan: it doesn't really look like they've resolved the issues we've raised.

Peter: they did add a "prompt the user to choose" step.

Dan: not mentioned - that should be listed under mitigations in the privacy & security section - and highlighted in their comment on our issue.

Peter: yeah it's not mentioned anywhere... They did throw in a choose but their mechanism is still "give me all the fonts" and then allow user to choose which fonts to expose - we asked for "let the user choose a font and then you get back a font". Users will just hit "select all" and move on.

Peter: they say users of this API will want to provide data to libraries that expect all the fonts... justification for blob approach. Curious what libraries they are talking about. Seems the goal of this API should be to replace those libraries.

Rossen: agree.

Rossen: will leave feedback accordingly.

Dan: LGTM

Peter: yeah.

Rossen: posts comment

Comment by @atanassov Jul 11, 2022 (See Github)

Hi @inexorabletash, @plinss and I had a chance to go over the updates you linked to - thank you for these and reopening. These are the concerns we have after this review pass:

The expanded fingerprinting issue commit is good; however, there are no mitigation suggestions, just some info. For example, "in incognito mode, don't expose the fonts" or something similar is what we would generally expect.

It seems you added a choose method, but the mechanism is still of the format "give me all the fonts" and allow user to choose which fonts to expose. What we asked for was - "let the user choose a font and then you get back a font". Exposing all fonts runs into the issue of users just hitting "select all" and moving on.

Another concern we have is exposing the blob function and access to the raw font data. You stated that users of the API will want to provide data to libraries that expect all the fonts. That is the justification for the blob approach. Can you provide examples of the libraries you're talking about? Our assumption was that the goal of such an API is to replace those libraries.

Comment by @inexorabletash Jul 15, 2022 (See Github)

For example, "in incognito mode, don't expose the fonts" or something similar is what we would generally expect.

Thanks - I'll add something. In practice, it might be more subtle than that. Browsers allow file upload in incognito, and exposure of specific fonts seems comparable.

It seems you added a choose method, but the mechanism is still of the format "give me all the fonts" and allow user to choose which fonts to expose. What we asked for was - "let the user choose a font and then you get back a font".

Correct, and understood. We consider it critical to support use cases where multiple fonts - including potentially all local fonts - are provided to the site in one user interaction. We tried to craft an API shape that would allow browsers to support this use case, or alternately (perhaps in more privacy-focused modes, or just different UAs) limit the number of fonts exposed, perhaps to just a single font.

Another concern we have is exposing the blob function and access to the raw font data. You stated that users of the API will want to provide data to libraries that expect all the fonts. That is the justification for the blob approach. Can you provide examples of the libraries you're talking about?

To clarify: libraries expect the full data for a given font, rather than a subset of tables. An example is the pairing of HarfBuzz and FreeType, used by many native and (more recently) web applications to implement custom text stacks. These take as input full OpenType font files, and do the table parsing themselves. We did early experiments with APIs that produced only some of the font tables, and interoperating with existing libraries required glue code that recomposed a full container file from the parts.
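
As a rough illustration of what "do the table parsing themselves" involves, the sketch below reads the table directory at the start of a single-font OpenType (sfnt) file: a 12-byte header whose `numTables` field sits at byte offset 4, followed by one 16-byte record per table (tag, checksum, offset, length). It is a minimal sketch, not how HarfBuzz or FreeType are implemented; the `TableRecord` and `readTableDirectory` names are hypothetical, and font collections and WOFF containers are assumed to be out of scope.

```typescript
// One entry of the sfnt table directory.
interface TableRecord {
  tag: string;      // four-character table tag, e.g. "head" or "glyf"
  checksum: number;
  offset: number;   // byte offset of the table from the start of the file
  length: number;   // byte length of the table
}

// Minimal sketch: list the tables in a single-font sfnt file.
// Assumption: `bytes` starts at the first byte of the font file.
function readTableDirectory(bytes: Uint8Array): TableRecord[] {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.byteLength < 12) {
    throw new Error("too short for an sfnt header");
  }
  // All multi-byte fields are big-endian, which is DataView's default.
  const numTables = view.getUint16(4);
  const records: TableRecord[] = [];
  for (let i = 0; i < numTables; i++) {
    const base = 12 + i * 16;
    if (base + 16 > view.byteLength) {
      throw new Error("truncated table directory");
    }
    const tag = String.fromCharCode(
      view.getUint8(base),
      view.getUint8(base + 1),
      view.getUint8(base + 2),
      view.getUint8(base + 3)
    );
    records.push({
      tag: tag,
      checksum: view.getUint32(base + 4),
      offset: view.getUint32(base + 8),
      length: view.getUint32(base + 12)
    });
  }
  return records;
}
```

In a browser, `bytes` could come from something like `new Uint8Array(await blob.arrayBuffer())`, where `blob` is whatever the API under review returns; that wiring is assumed here rather than shown. Because every record's offset is relative to the whole file, handing a library only some tables means rebuilding this directory, which is the glue code the comment describes.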

Our assumption was that the goal of such an API is to replace those libraries.

The goal of the API is to provide the data so that it can be consumed by those libraries.

Discussed Aug 8, 2022 (See Github)

Peter: Google shipped the prompt...

Lea: they shipped enumeration... this is access to font tables.

Dan: good to see they've added an issue on incognito mode behavior in response to our feedback.

Discussed Aug 15, 2022 (See Github)

Peter: I think we did not get to this at the f2f. Chrome is shipping.

Lea: didn't we recommend they have a picker instead? How is this different?

Peter: originally they had an API that enumerates all the fonts - we pushed back and said do a picker. They came back with a picker that is multi-select and makes it easy for the user to pick all the fonts on the system. They wanted to expose OpenType metrics, we said how about being format agnostic. They said here's a blob with all the data, you parse it. They say they have a use case for raw font data for libraries. Acceptable for "give me a blob of font 2", but I suspect they are focussing on one use case and aren't looking at the bigger picture. Suspect authors want to be able to get some metadata without bringing in a big font parsing library.

Rossen: they don't seem very receptive to our feedback. For the private browsing mode point they say ack but they won't do anything about it. Less than great. Given how much fonts are used for tracking, I don't know how making this explicitly available will improve anything.

Tess: over the course of this review - a couple of years - we've filed a bunch of issues on their repo, at least 6, more, and at least 3 are still open. The ones that are still open feel pretty foundational to me. I'm not terribly enthused with the charging full speed ahead while leaving those things unaddressed

Dan: sounds like not satisfied - level of engagement, taken our issues seriously, and moving forward with major points unresolved

Amy: and not satisfied with adding an api that increases fingerprinting?

Dan: +1

Tess: there are some hardliners who don't want to add any more bits at all and that's not as nuanced as it needs to be. There are some APIs that are incredibly beneficial that add some fingerprinting, let's do it. But in this case getting interesting font data for a specific font is potentially many interesting bits of fingerprinting, so the benefit has to be proportionally massive to justify that. We're working with a theoretical privacy budget of 31 bits or whatever before you're done, that's a small number of bits, this could blow right past that.

Rossen: how is allowing the file upload in incognito the same as exposing specific fonts on the system or elsewhere?

Dan: it's not, it's a total red herring

Rossen: they're saying that.. exposure of specific fonts seem comparable. I don't see how that is the case.

Tess: <devil's advocate> let me make the argument for a second.. file upload dialogue, I pick something unique from my filesystem, game over. Also a thing I can do with a filepicker is go to wherever the os keeps the fonts and pick the font file and upload that. If there's a specialised api that only lets you get at the font data, that's a strict subset of what the file dialogue can do.</devil's advocate>

Rossen: one is an explicit user action where I go and choose a bunch of stuff. The other is a function someone wrote running on my system selecting fonts and sending them to somewhere else...

Tess: I think that's why that argument doesn't hold water. From the user perspective they're totally different things.

Dan: writing "unsatisfied" closing post

Tess: Anne wrote issue - the crux of the matter - we need to be deliberate as stewards of the web platform about the question of how do we represent fonts. We need to specify that really well and not just punt to common file formats. That's a lot of work. I understand they're reluctant. But without serious thought about what is an abstraction that fits the use cases that we can code to so we're not dependent on file format details, that we're future-proofed, would go a long way to mitigate the privacy concern as well. The way that a specific opentype font got expressed into that platform abstraction could be done in a way that hid the sensitive stuff and let through the glyph information they need or whatever it is. That's a lot of spec work. The timelines that we do work on the web platform need to be long term.

Peter: I think they are focussed on enabling certain functionality and not thinking about the overall state of the web

Tess: it's our job to comment on that

Amy: could we be more strong about it being harmful?

Peter: I'd be okay with that

Dan: I don't think it would help

Tess: I like your comment as drafted. I don't think we need to say more about that.

Hi folks. The issues as we see them so far:

there are a number of unresolved issues which have been raised during the course of this review (e.g. #19, #20, #62, etc.)
the fingerprinting risk is quite large here. There's always a tradeoff to be made regarding fingerprinting surface area and new functionality - however in our view the high risk is not commensurate with the benefit in this case.
the API seems to be on a trajectory to ship regardless of our feedback and the other negative feedback surfaced on this issue.

In conclusion: we think we should be more deliberate as stewards of the web platform in terms of how we represent fonts. We don't think this is the right approach to solve these use cases in the context of the wider web.

Rossen: posts comment

Comment by @atanassov Aug 15, 2022 (See Github)

The issues as we see them so far:

there are a number of unresolved issues which have been raised during the course of this review (e.g. #19, #20, #62, etc.)
the fingerprinting risk is quite large here. There's always a tradeoff to be made regarding fingerprinting surface area and new functionality - however in our view the high risk is not commensurate with the benefit in this case.
the API seems to be on a trajectory to ship regardless of our feedback and the other negative feedback surfaced on this issue.