#991: Writing Assistance APIs

Opened Sep 10, 2024

Hello, TAG!

I'm requesting an early TAG design review of the writing assistance APIs.

Browsers and operating systems are increasingly expected to gain access to a language model. (Example, example, example.) Web applications can benefit from using language models for a variety of use cases.

We're proposing a group of APIs that use language models to give web developers high-level assistance with writing. Specifically:

  • The summarizer API produces summaries of input text;
  • The writer API writes new material, given a writing task prompt;
  • The rewriter API transforms and rephrases input text in the requested ways.

Because these APIs share underlying infrastructure and API shape, and have many cross-cutting concerns, we include them all in one explainer, to avoid repeating ourselves across three repositories. However, they are separate API proposals, and can be evaluated independently.

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): not completely known, but we are discussing the APIs with the Web Machine Learning Working Group at TPAC, and it is possible a future version of their charter would welcome us.
  • Existing major pieces of multi-implementer review or discussion of this design: see above.
  • Major unresolved issues with or opposition to this design:
    • We are aware of previous TAG feedback (in https://github.com/w3ctag/design-reviews/issues/948) regarding API surface details, and have captured that in the explainer.
    • As with the translator/language detector APIs (#948), there is a tension between interoperability and exposing whether the model is on-device or cloud-based; we discuss this a bit more in the explainer.
    • As with the translator/language detector APIs (#948), there are several privacy concerns, discussed in the explainer. We believe there are reasonable mitigations possible there, but will need to do some experimentation to find the best ones.
  • This work is being funded by: Google

You should also know that...

This is not a generic prompt API.

Discussions

Discussed Sep 1, 2024 (See Github)

Tess: Don't call it "ai". If "ai" is meant to imply something that the developer needs to be aware of, put that "something" into the name or API shape.

Tess: Three interesting cases: the call succeeds right away; it requires an expensive network transfer; or it won't succeed at all.

Jeffrey: ensureModelFetched(), which could take a long time, followed by useModel(), which is always fairly quick.

Jeffrey: Could pass a "not if metered" option into ensureModelFetched().

Tess: Is metered-ness exposed?

Jeffrey: I think so, at least in Chromium.
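
A minimal sketch of the two-phase shape being floated here; the function names come from this discussion, not from the actual proposal:

```js
// Hypothetical two-phase shape (illustrative only, not the proposed API).
// ensureModelFetched() may take a long time if a download is needed;
// useModel() should then always be fast.
const model = await ensureModelFetched({ notIfMetered: true });
const summary = await model.useModel("Summarize this long article…");
```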

Tess: As a page author, I can imagine these features as nice-to-have. And then, maybe I don't want to cause a download.

Tess: We can encourage developers to do things by shaping APIs in particular ways, so providing the "only if downloaded" option could encourage developers to be more respectful.

Peter: There is the "readily" vs "after-download" option.
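
For reference, the explainer's availability-checking shape at the time looked roughly like this; gating on "readily" is the pattern that would let developers avoid triggering downloads (exact names may have changed since):

```js
const articleText = document.querySelector("article").textContent;

// Only use the nice-to-have feature if the model is already on-device
// ("readily"); skip it rather than trigger a large download.
const capabilities = await ai.summarizer.capabilities();
if (capabilities.available === "readily") {
  const summarizer = await ai.summarizer.create();
  document.querySelector("#tldr").textContent =
    await summarizer.summarize(articleText);
}
```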

Tess: If you initiate the download in one tab, and then start in a second tab, is its first progress event 27%? Do you pretend it's a smaller download?

Peter: Or you make it take the time, but don't actually download anything.

Tess: In which case "readily" doesn't mean much.
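
For context, download progress in the explainer at the time was reported roughly like this; the cross-tab question is what the first downloadprogress event should say when another tab has already fetched part of the model:

```js
const summarizer = await ai.summarizer.create({
  monitor(m) {
    m.addEventListener("downloadprogress", (e) => {
      // Does this start at 0, or at the fraction another tab already
      // downloaded (e.g. 27%)? Either answer leaks or lies.
      console.log(`Downloaded ${e.loaded} of ${e.total} bytes.`);
    });
  },
});
```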

Peter: Stepping back, I don't like the whole API. It attributes too much ability to LLMs; they're not good at doing these things. It takes too many resources. I don't think this should be in the browser, at least not yet.

Tess: Partitioned storage means that if 10 websites each use a large model, you download it 10 times. Also, if the browser vendor has its own model, why not provide that as a shared resource?

Peter: Centralization of models + market dominance.

Tess: It's nice UI to make model output visually distinct from human-written text. This generic API won't make sites create a visually distinct appearance for it. Could imagine a declarative approach, with an HTML element that takes text as input and provides default- or mandatory-styled output text. But if you like LLMs, you might object that this makes their output second-class.

Jeffrey: I hear 3 levels of feedback here, and we should provide all of them.

Tess: General skepticism; opportunistically using features without triggering downloads; visual distinctions.

Tess: Imagine extending form features. E.g. input, textarea, and contenteditable could say they're input to one of these things. I think any declarative approach will get rejected, but the rule of least power says that if we can get the 80% case easily, we should do that.

Tess to draft a comment.

<blockquote>
  1. general skepticism (def. want peter's review of this bit)

At a high level, we wonder if these sorts of features belong at the platform level at all, and (assuming they do) we worry it may be premature to bake them in.

This is a very active area of innovation in the industry, and there are many players building <abbr title="large language models">LLMs</abbr> and other such tools. Shouldn't we be sitting back to see if/how web developers incorporate such things into their sites first?

Also, browser vendors are not the only players in this space. Is an architecture that does not allow page authors to select from many possible models the right thing here? For some authors, the built-in/browser-provided models may be good enough. If they find the built-in model(s) limiting, it'd be a shame if there's a huge <abbr title="application programming interface">API</abbr> cliff when they go to switch to a third-party one.

  2. well-lit path for "i want to use these features on my page iff model is already downloaded" / "i do not want to cause download"

Consider an author who wants to integrate one of these features as a "nice to have"—if the browser's already downloaded a model, they'd like to take advantage of it, but if the browser hasn't, they don't want to be the cause of a large download on what may be a metered connection. While that's technically possible with your current <abbr>API</abbr> shape, it's not the easiest, most well-lit path. It feels like the extra effort case should be the one that causes the download, and the easier-to-code case should not.

  3. visual affordance for users to understand "this text was hallucinated by an LLM", declarative v. imperative tradeoff, Baby Steps

On other platforms which integrate these kinds of intelligence features, there's a clear visual affordance that a chunk of content is the product of a model and not something human-authored. Adding these features purely as a JavaScript API means that there's no opportunity for interested User Agents to do the same. A declarative approach, in which the <abbr>UA</abbr> renders the model's output itself, would preserve that opportunity.

</blockquote>
Comment by @jyasskin Sep 28, 2024 (See Github)

A public note (without TAG consensus) so that @domenic can start thinking in this direction too: We should think about how https://www.w3.org/reports/ai-web-impact/ and https://www.w3.org/TR/webmachinelearning-ethics/ should affect our opinions here. For example, https://www.w3.org/reports/ai-web-impact/#transparency-on-ai-mediated-services considers the use of Model Cards to help people evaluate the suitability of particular models for particular purposes. How should that information be exposed to the web developers considering use of this API, and to the end-users who have to evaluate the website's output?

Discussed Oct 1, 2024 (See Github)

Discussed; given the upcoming work on AI, it seems premature to say anything right now. We might add a note about that to the issue.

Discussed Oct 1, 2024 (See Github)

In theory we're starting a finding on that, but not all of us are making progress.

Lots of new developments, perhaps we can continue to wait on that basis.

Discussed Mar 1, 2025 (See Github)

Marcos: Primary issue is the naming.

Martin: Many things are bundled in this design review. First, the capability is already available, because the site can download the model itself. This sets up websites to rely on UA processing instead. Given that these models are expensive and unwieldy, the UA or the user has to find the compute to run them. That's not going to happen on the couple-year-old phone I have. Then the user has to pay for cloud compute.

Jeffrey: Then UA should provide the cloud compute?

Martin: Assumption that user provides it, and UA might do it for them. In a few years, this might be available locally everywhere.

Marcos: Could treat it like camera, where some hardware just doesn't have it.

Martin: You can't provide a camera in the cloud if you're building the website. But in this case, the website could do the compute pretty easily if it thinks the feature is important. There are some advantages to doing it locally. But the website already has the text, so the privacy concern doesn't exist. And the performance benefit might not either; it only helps people with high-end machines. Massively premature.

Marcos: Use case could be secure email/messaging. Do a summary of mail messages.

Martin: Good example of something the OS can provide on new hardware.

Jeffrey: The advice might be to find a way to ensure the site provides a server-side capability even if it defaults to using the client-side feature.

Martin: That assumes it's ready. Not enough people have the client-side capability.
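
A minimal sketch of the fallback pattern Jeffrey describes, assuming a hypothetical site-provided /api/summarize endpoint and the renamed Summarizer global discussed below:

```js
async function summarize(text) {
  // Prefer the built-in model when this device can actually use it…
  if ("Summarizer" in self && (await Summarizer.availability()) === "available") {
    const summarizer = await Summarizer.create();
    return summarizer.summarize(text);
  }
  // …but keep a server-side path so the feature works for everyone.
  const response = await fetch("/api/summarize", { method: "POST", body: text });
  return response.text();
}
```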

Marcos: Think about whether you can do it with WebGPU or with WebML.

Jeffrey: Any idea of what fraction of users need a capability before we provide it?

Martin: It's a judgement call. I don't see people buying new devices at a high enough rate.

Martin: Privacy aspects. Not just "does the text leave the device". That's silly, because the site has the text and could just send it to the server. The server can do the computation faster than most client devices.

Martin: On the Privacy & Security Considerations section: there are three types of models available here, and a site can do some computation to test the availability of models on the end device. Depending on model availability and downloadability, you could get 3-6.5 bits of fingerprinting entropy. That seems like a lot for a feature like this.
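
(A rough sketch of where a range like that can come from, assuming the three model types act as independent signals: two observable states per model, available or not, gives log2(2³) = 3 bits; if availability, downloadability, and download state give each model four or five distinguishable states, that's roughly 3 × log2(4.5) ≈ 6.5 bits.)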

DanC: If I know properties of the models, I could prompt them to distinguish between them.

Martin: Characteristics of model are highly correlated with other things you can already determine.

Jeffrey: Any sense of how many bits this is worth?

Martin: I'd say 0. Since website can already do this, either on the server or by sending a WebGPU or WebNN program.

Jeffrey: Model Cards: website might want to know if its model has been trained on copyrighted material, or if it's been de-biased in certain ways.

Martin: Which is another reason for the site to source the models itself.

Jeffrey: And does this work well enough to ship? "It's called AI while it doesn't work yet, and then it gets a more specific name." The translation API is more promising.

Martin: I think they've gotten rid of the naming problem.

Jeffrey: Domenic was asking about tests: Different models aren't expected to produce identical output.

Martin: Need a model to test the model.

Jeffrey: Ew.

Martin: The naming problem is fixed! They made window.Summarizer and window.Writer, which is another naming problem: "Writer" isn't exactly unique or novel. Can we ask that they rename each with a prefix, like "BullshitSummarizer"? [1]
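
For reference, the renamed entry points look roughly like this (option values here are illustrative, taken from the explainer):

```js
const summarizer = await Summarizer.create({ type: "key-points" });
console.log(await summarizer.summarize(document.body.textContent));

const writer = await Writer.create({ tone: "formal" });
console.log(await writer.write("An intro paragraph for a product page."));
```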

Martin: On testing: At least the test website will be responsible for sourcing the model it uses to evaluate summaries.

Jeffrey: That's enough questions. So we invite Domenic to a meeting the week of April 21 for a conversation. Would like to consider Translation API along with this one.

On translation...

Jeffrey: More established problem set, smaller and more efficient models, which can be downloaded to more devices.

Marcos: Same thing happens there; Apple can source their own.

Jeffrey: Existence proof that models can be made that work well and run on lots of devices.

Martin: Mozilla also has translation models. Lots of languages, tens of MB each.

Jeffrey: LLMs tend to be gigabytes each.

Martin: Mozilla will propose to only provide one direction: text → the user's preferred language. If lots of people are trying to communicate, each one is responsible for translating other people's speech into their own language, but you don't need the full matrix.

Marcos: If I speak multiple languages?

Martin: Then you pick one as the preferred target language. Web page presents some content in its language, and then asks the browser to translate it.

Jeffrey: Seems even declarative.
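
A sketch of the one-direction pattern Martin describes, using the proposed Translator API shape; the source language and incoming message here are illustrative:

```js
// Each participant translates inbound text into their own preferred
// language, so nobody needs the full language-pair matrix.
const incomingMessage = "こんにちは、TAGさん!";
const translator = await Translator.create({
  sourceLanguage: "ja",                             // sender's language
  targetLanguage: navigator.language.split("-")[0], // e.g. "en"
});
console.log(await translator.translate(incomingMessage));
```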

Comment by @domenic Mar 31, 2025 (See Github)

FYI we have an almost-ready-to-land full security and privacy considerations section, which is probably of interest to the TAG: https://github.com/webmachinelearning/writing-assistance-apis/pull/47

Discussed Apr 1, 2025 (See Github)

Christian: It's an early review for all the writing assistance APIs. Pretty old. No comment yet because of capacity?

Jeffrey: There's still time to review; the team is in origin trials, iterating on questions around interoperability, testing, and so on. We wanted to do a generic AI finding.

Christian: A local model? That could be interesting. OSes already ship this; macOS has it in a right-click menu. Raises the question of whether the browser should do it separately. Can it fall back?

Jeffrey: I already mentioned model cards in public.

Christian: And Dom's document on AI: https://www.w3.org/reports/ai-web-impact/.

Dan: Haven't fully read the new privacy section. The general concern is how low-end devices can be served by it. As a web developer, I might like this API because running a model is expensive, and providing it myself would mean paying for an expensive server; it pushes the cost to the user. As the user, if I have a modern chip, maybe I like it, but otherwise maybe not. Good for the developer, but the user tradeoff might not make sense.

Discussed Apr 1, 2025 (See Github)

Jeffrey: There's a draft S&P considerations section, but only in the PR.

Jeffrey to post to tag-all and ask Martin and Marcos to come to the meeting.