#991: Writing Assistance APIs
Discussions
Discussed
Sep 1, 2024 (See Github)
Tess: Don't call it "ai". If "ai" is meant to imply something that the developer needs to be aware of, put that "something" into the name or API shape.
Tess: 3 cases that are interesting: Happens right away; takes an expensive network transfer; won't succeed.
Jeffrey: ensureModelFetched(), which could take a long time, followed by useModel(), which is always fairly quick.
Jeffrey: Could pass "not if metered" into ensureFetched()
Tess: Is metered-ness exposed?
Jeffrey: I think so, at least in Chromium.
Tess: As a page author, I can imagine these features as nice-to-have. And then, maybe I don't want to cause a download.
Tess: We can encourage developers to do things by shaping APIs in particular ways, so providing the "only if downloaded" option could encourage developers to be more respectful.
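Illustrative sketch (not from the meeting): the two-phase shape Jeffrey describes, with the "not if metered" and "only if downloaded" options, might look roughly like the following. Every name here (`MockModelStore`, `EnsureOptions`, `notIfMetered`, `onlyIfDownloaded`, the result strings) is hypothetical and does not come from the explainer or any shipped API. The mock is synchronous only to keep the sketch short; a real fetch phase would be asynchronous.

```typescript
// Hypothetical sketch: every name below is illustrative, not a real browser API.
type FetchResult = "ready" | "skipped-metered" | "skipped-not-downloaded";

interface EnsureOptions {
  notIfMetered?: boolean;     // decline to download on a metered connection
  onlyIfDownloaded?: boolean; // never start a download; succeed only if already local
}

class MockModelStore {
  private downloaded: boolean;
  private metered: boolean;
  downloadCount = 0; // counts simulated downloads, for inspection only

  constructor(alreadyDownloaded: boolean, metered: boolean) {
    this.downloaded = alreadyDownloaded;
    this.metered = metered;
  }

  // Slow phase. A real version would be asynchronous and could take a long
  // time; this mock completes immediately.
  ensureModelFetched(options: EnsureOptions = {}): FetchResult {
    if (this.downloaded) return "ready";
    if (options.onlyIfDownloaded) return "skipped-not-downloaded";
    if (options.notIfMetered && this.metered) return "skipped-metered";
    this.downloadCount += 1; // stands in for the expensive network transfer
    this.downloaded = true;
    return "ready";
  }

  // Fast phase: only valid once the model is local.
  useModel(input: string): string {
    if (!this.downloaded) throw new Error("Model has not been fetched");
    return "[model output for: " + input + "]";
  }
}

// Usage: a page that treats the feature as nice-to-have.
const store = new MockModelStore(false, true); // not downloaded, metered connection
const result = store.ensureModelFetched({ notIfMetered: true });
if (result === "ready") {
  console.log(store.useModel("draft text"));
} else {
  // result is "skipped-metered" here, so the page omits the feature.
  console.log("Feature unavailable: " + result);
}
```

The three result paths line up with the three cases Tess lists: the call succeeds right away, it requires an expensive transfer, or it does not succeed.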
Peter: There is the "readily" vs "after-download" option.
Tess: If you initiate the download in one tab, and then start in a second tab, is its first progress event 27%? Do you pretend it's a smaller download?
Peter: Or you make it take the time, but don't actually download anything.
Tess: In which case "readily" doesn't mean much.
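Illustrative sketch (not from the meeting): a page could act on the "readily" vs "after-download" distinction along these lines. The `"no"` state, the `Need` policy, and the `chooseAction` helper are assumptions made for this example; only the two quoted state names come from the discussion. As Tess notes, if a browser masks the cached state for privacy, the "readily" branch tells the page less than this sketch suggests.

```typescript
// "readily" and "after-download" are quoted in the discussion; "no" is an
// assumed third state for devices that cannot run the model at all.
type Availability = "no" | "readily" | "after-download";

// Hypothetical page-side policy: how much the page wants the feature.
type Need = "nice-to-have" | "essential";

type Action = "use-now" | "download-then-use" | "hide-feature";

// Decide what the page does, without ever starting a download for a
// feature that is only nice-to-have.
function chooseAction(availability: Availability, need: Need): Action {
  if (availability === "readily") return "use-now";
  if (availability === "after-download" && need === "essential") {
    return "download-then-use";
  }
  return "hide-feature";
}

// Usage: an optional feature is offered only when no download is needed.
console.log(chooseAction("after-download", "nice-to-have")); // prints "hide-feature"
```

In this sketch the author has to write the extra check to avoid a download, which is the opposite of the easy-path default argued for in this discussion.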
Peter: Stepping back, I don't like the whole API. Attributes too much ability to LLMs; they're not good at doing these things. Takes too many resources. Don't think this should be in the browser, at least not yet.
Tess: Partitioned storage means that if 10 websites use large models, you download it 10 times. Also if the browser vendor has its own model, why not provide that as a shared resource?
Peter: Centralization of models + market dominance.
Tess: It's nice UI to make model output visually distinct from human-written text. This generic API won't make sites create a visually distinct appearance for it. Could imagine a declarative approach, with an HTML element that takes text as input and provides default- or mandatory-styled output text. But if you like LLMs, you might object that this makes their output second-class.
Jeffrey: I hear 3 levels of feedback here, and we should provide all of them.
Tess: General skepticism; opportunistically using features without triggering downloads; visual distinctions.
Tess: Imagine extending form features. E.g. input, textarea, and contenteditable could say they're input to one of these things. Think any declarative approach will get rejected, but rule of least power says if we can get the 80% case easily, we should do that.
Tess to draft a comment.
<blockquote>- general skepticism (def. want peter's review of this bit)
At a high level, we wonder if these sorts of features belong at the platform level at all, and (assuming they do) we worry it may be premature to bake them in.
This is a very active area of innovation in the industry, and there are many players building <abbr title="large language models">LLMs</abbr> and other such tools. Shouldn't we be sitting back to see if/how web developers incorporate such things into their sites first?
Also, browser vendors are not the only players in this space. Is an architecture that does not allow page authors to select from many possible models the right thing here? For some authors, the built-in/browser-provided models may be good enough. If they find the built-in model(s) limiting, it'd be a shame if there's a huge <abbr title="application programming interface">API</abbr> cliff when they go to switch to a third-party one.
- well-lit path for "i want to use these features on my page iff model is already downloaded" / "i do not want to cause download"
Consider an author who wants to integrate one of these features as a "nice to have"—if the browser's already downloaded a model, they'd like to take advantage of it, but if the browser hasn't, they don't want to be the cause of a large download on what may be a metered connection. While that's technically possible with your current <abbr>API</abbr> shape, it's not the easiest, most well-lit path. It feels like the extra effort case should be the one that causes the download, and the easier-to-code case should not.
- visual affordance for users to understand "this text was hallucinated by an LLM", declarative v. imperative tradeoff, Baby Steps
On other platforms which integrate these kinds of intelligence features, there's a clear visual affordance that a chunk of content is the product of a model and not something human-authored. Adding these features purely as a JavaScript API means that there's no opportunity for interested User Agents to do the same. A declarative approach which the <abbr>UA</abbr>
</blockquote>
Comment by @jyasskin Sep 28, 2024 (See Github)
A public note (without TAG consensus) so that @domenic can start thinking in this direction too: We should think about how https://www.w3.org/reports/ai-web-impact/ and https://www.w3.org/TR/webmachinelearning-ethics/ should affect our opinions here. For example, https://www.w3.org/reports/ai-web-impact/#transparency-on-ai-mediated-services considers the use of Model Cards to help people evaluate the suitability of particular models for particular purposes. How should that information be exposed to the web developers considering use of this API, and to the end-users who have to evaluate the website's output?
Discussed
Oct 1, 2024 (See Github)
Discussed. With the upcoming work on AI, it seems premature to say anything right now. We might add a note about that to the issue.
Discussed
Oct 1, 2024 (See Github)
In theory we're starting a finding on that, but not all of us are making progress.
Lots of new developments, perhaps we can continue to wait on that basis.
Opened Sep 10, 2024
Hello, TAG!
I'm requesting an early TAG design review of the writing assistance APIs.
Browsers and operating systems are increasingly expected to gain access to a language model. (Example, example, example.) Web applications can benefit from using language models for a variety of use cases.
We're proposing a group of APIs that use language models to give web developers high-level assistance with writing. Specifically:
Because these APIs share underlying infrastructure and API shape, and have many cross-cutting concerns, we include them all in one explainer, to avoid repeating ourselves across three repositories. However, they are separate API proposals, and can be evaluated independently.
Further details:
You should also know that...
This is not a generic prompt API.