#1093: Prompt API

Opened May 14, 2025

Hello, TAG!

I'm requesting an early TAG design review of the Prompt API.

This is a JavaScript API for directly prompting an AI language model, including text, image, and audio inputs.

Although the Chrome built-in AI team is exploring purpose-built APIs for some use cases (#948, #991), we are also exploring a general-purpose "prompt API" which allows web developers to prompt a language model directly. This gives web developers access to many more capabilities, at the cost of requiring them to do their own prompt engineering.
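For orientation, the proposal's session-based shape looks roughly like the sketch below. This is not a definitive implementation: the global `LanguageModel` and the `create()` / `prompt()` / `destroy()` names follow the explainer at the time of this review and may change, and the feature-detection guard is our own addition so the sketch degrades gracefully where the API is absent.

```javascript
// Hedged sketch of the proposed Prompt API shape (names may change).
async function askModel(promptText) {
  // The global is only present in browsers shipping the experiment.
  if (typeof LanguageModel === "undefined") {
    return null; // API not available in this environment
  }
  // create() loads the on-device model if needed and returns a
  // session; configuration options are omitted here for brevity.
  const session = await LanguageModel.create();
  // prompt() resolves with the model's full text response; the
  // explainer also proposes a streaming variant.
  const reply = await session.prompt(promptText);
  session.destroy(); // release on-device model resources
  return reply;
}
```

Because the model runs on-device, the response text is not interoperable across browsers or devices, which is the testing concern raised in the discussion below.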

You should also know that...

Although this feature exposes an on-device language model, the language model is never trained on, and does not have access to, any local user-specific data. (Sometimes people seem to be confused by this, so we're trying to include this disclaimer in as many places as possible.)

Discussions

Discussed May 19, 2025 (See Github)

[added reviewers]

Discussed May 26, 2025 (See Github)

Martin: Most implementations involve picking a model for their browser - i.e. their in-house one. This is seeking to drive adoption of models based on the existing market share of browsers.

DanA: What's the user need in this case? I select some text and want AI to comment on it?

Martin: This is unclear to me. The prompt API is trying to make a generic chatbot capability for a site, using a model that's running locally. So you can do anything you may want to do with an LLM, not just summarization or writing assistance.

DanA: But the browser, rather than the site, will be choosing the LLM.

Martin: Ostensibly they are meant to provide the same features - but the user experience will vary a lot. They may also use different models depending on device (mobile vs desktop) within a browser.

Christian: I think it's good and needed - sentiment analysis; data categorization; extracting info from email; multimodal pasting capabilities. I understand the concern that it binds the browser to a model - my understanding is that is not inherent in the API. Could use the model that's already on the device. This may be the current implementation but it's not necessarily the case.

Martin: I understand that in the Chrome case they are not going to use the underlying platform model.

Xiaocheng: Thinking about user need: the prompt API may be general enough to drive other lower-level APIs and build a full AI app. The potential is huge here. I am still wondering about the question I raised before, which is 'what does interoperability mean' here? We should not expect the API to produce the same output across browsers, or across devices within a browser, or even across sessions.

Martin: The question is 'how would you write a test case?'

Xiaocheng: You could use another model to test it.

Martin: reward hacking

DanA: What are we assuming about user expectations? I'm in a webpage; if that page wants to create a chatbot experience, it uses this API. That experience already exists on many sites. I can choose models on some sites. Lots of capabilities, all running on servers. How do we make sure that information doesn't leak between origins?

Martin: this is part of why the Chrome team doesn't want to use the platform level API - to avoid such leakage. The platform API might take inputs into a common context that is not origin-aware.

DanA: Depends on the threat model. We need to find out how to respond appropriately to this.

Matthew: How much can this reasonably do that's useful for the user? If it's only on the client side then the user won't be able to ask questions about past interactions with the particular site, because the only info it has is on the current page. Also concerned about leakage issues as discussed.

DanA: This is part of my concern - what is the actual user need here?

Christian: I would disagree here; the idea is using an on-device model, which is better for privacy and security, and supports more use cases. I would rather trust a local model.

DanA: Good point and there are energy arguments for this as well, as you're making the processing happen locally. Leveraging capabilities of the device. So we can't just say this is a bad idea, but it should be focused on the user need, and should fit the web. You only get that benefit if you're able to use the platform API.