#1093: Prompt API
Discussions
Discussed
May 19, 2025 (See Github)
[added reviewers]
Discussed
May 26, 2025 (See Github)
Martin: Most implementations involve picking a model for their browser - i.e. their in-house one. This is seeking to drive adoption of models based on the existing market share of browsers.
DanA: What's the user need in this case? I select some text and want AI to comment on it?
Martin: This is unclear to me. For the prompting API, it's trying to make a generic chatbot capability for a site, using a model that's running locally. So you can do anything you may want to do with an LLM, not just summarization or writing assistance.
DanA: But the browser, rather than the site, will be choosing the LLM.
Martin: Ostensibly they are meant to provide the same features - but the user experience will vary a lot. They may also use different models depending on device (mobile vs desktop) within a browser.
Christian: I think it's good and needed - sentiment analysis; data categorization; extracting info from email; multimodal pasting capabilities. I understand the concern that it binds the browser to a model - my understanding is that is not inherent in the API. Could use the model that's already on the device. This may be the current implementation but it's not necessarily the case.
Martin: I understand that in the Chrome case they are not going to use the underlying platform model.
Xiaocheng: Thinking about user need: the prompt API may be general enough to drive other lower-level APIs and build a full AI app. The potential is huge here. I am still wondering about the question I raised before which is 'what does interoperability' mean here? We should not expect the API to produce the same output across browsers, or across devices within a browser, or even across sessions.
Martin: The question is 'how would you write a test case?'
Xiaocheng: You could use another model to test it.
Martin: reward hacking
DanA: Assumptions about user expectations? I'm in a webpage... it feels to me like .. If that page wants to create a chatbot experience, so it uses this API. That experience already exists on many sites. I can choose models on some sites. Lots of capabilities, all running on servers. How do we make sure that information doesn't leak between origins?
Martin: this is part of why the Chrome team doesn't want to use the platform level API - to avoid such leakage. The platform API might take inputs into a common context that is not origin-aware.
DanA: Depends on the threat model. We need to find out how to respond appropriately to this.
Matthew: How much can this reasonably do that's useful for the user? If it's only on the client side then the user won't be able to ask questions about past interactions with the particular site, because the only info it has is on the current page. Also concerned about leakage issues as discussed.
DanA: This is part of my concern - what is the actual user need here?
Christian: I would disagree here; the idea is using an on-device model, which is better for privacy and security, and supports more use cases. I would rather trust a local model.
DanA: Good point and there are energy arguments for this as well, as you're making the processing happen locally. Leveraging capabilities of the device. So we can't just say this is a bad idea, but it should be focused on the user need, and should fit the web. You only get that benefit if you're able to use the platform API.
Discussed
Jul 14, 2025 (See Github)
Max: Last week we thought this might cause interoperability problems, because not every model will have the same features. Marcos drafted something.
Jeffrey: Some pieces of Marcos' draft feedback are good, but other parts don't make sense. "The presumption that the model is on device is wrong" can't be right, since the proponents say they're proposing to keep the model on the device. We probably can't agree to post anything here without Marcos.
Discussed
Jul 28, 2025 (See Github)
Jeffrey: It's incorrect to claim that this API
Marcos: Apple's platform doesn't guarantee it stays on device. It's irrelevant if the model is on- or off-device.
Marcos: It's privacy-preserving to send it off device. Apple claims it doesn't matter.
Jeffrey: Do we need to comment on whether it's off-device, if Apple believes ?
Dan: Like with Web Speech, on-device might be weird with unusual browser architectures.
Marcos: Thing that's proposed suggests downloading N GB things, even if they're only downloaded once. In those cases, it might be in the background but processed on server while it's being downloaded.
Marcos: "Tools" and tooling properties got very complicated. Would be great to get even a small part to be uncontentious. And output formats are a bit presumptuous. Can't guarantee that models will produce a particular output format. E.g. on Apple, if you tell Apple Intelligence to rewrite an email in a particular way, it then puts the subject into the body of the email. Doesn't know how to format things. ChatGPT too. All pretty clunky now. Don't know about the Android side.
Jeffrey: Think we should say that the spec can require that it's "as if" it runs on-device, but a browser that can provide the same privacy properties in the cloud should be allowed. Think we can make that a MUST.
Marcos to draft a comment.
Discussed
Aug 4, 2025 (See Github)
Marcos: We have a rough idea. Challenges that arise with this API seem the same as for Writing Assistance APIs. We want to time out the Writing Assistance APIs. Think there's consensus that the issues have been captured. Anyone have any further concerns/opinions?
Ehsan: Might have a question what a prompt is. If there's a malicious web page, can they display an item with a malicious prompt in it? Still articulating the feedback and reading the spec.
Marcos: We've covered that in the feedback. You could abuse the prompt and do mining.
Martin: That's my core problem with this proposal, and I'm not sure if we will get consensus on this.
Ehsan: There are research papers with hidden prompts in them, inducing a potential LLM reviewer to give the paper a positive review.
Marcos: That sounds like a more general societal problem. It's why we have peer review in academic contexts.
… Not sure if this is related to the prompt. Reminds me of the bias of ChatGPT etc. to be on your side. You can prompt it to be less on your side. Part of the prompt engineering part of things, but not really related to the Prompt API. Does the model sway that way? Probably. Good counter-example for this is X's Grok.
Marcos to draft the closing comment.
Discussed
Aug 11, 2025 (See Github)
Marcos has a message to post. It looks like ChatGPT wrote it, which I wouldn't be proud of, but it's there.
Discussed
Aug 18, 2025 (See Github)
Marcos: I put together a position, which might be ready to publish.
Jeffrey: Data governance?
Marcos: LLM brings this into question. Fonts question, model might be tuned based on your preferences. Whether there's a way to stop it from doing that is unclear.
Jeffrey: I think they clearly say it must not save the user's data.
Marcos: Don't know if that's possible.
Martin: This is a UA. So when the input goes in, that's data provided by the website, and it's working for the person using the computer.
Marcos: Example I ran into today, an input field using Apple Intelligence. It shows up in ChatGPT history.
Jeffrey: Seems like an implementation bug.
Martin: Maybe a feature, since ChatGPT is a UA, and users might want that.
Marcos: But does it save it in private browsing mode? ChatGPT doesn't have that. And it saves customizations into private browsing. .... It will learn about you and respond in a way that is tailored to you.
Jeffrey: We need to justify this kind of comment, or they'll think we just didn't read the explainer.
Martin: There is a case for saving this for the user. Maybe as a user I want to save a log. Maybe to know the resources used. "Must not save" isn't necessarily the right thing to say, except in private browsing.
Jeffrey: It is more that saving must not affect the model operation in ways that might affect privacy. We need to respond to the explainer, not assume that it is incorrect. ... Fingerprinting risk is perhaps another example of that.
Martin: Computation risk is one of my major concerns. Why are users expected to provide the compute? The explainer doesn't really explain this. There's a trade-off there, but they don't really do that analysis.
Jeffrey: Think they'll be able to come up with an argument for that, but fair to say the explainer doesn't do it yet. ...Structured output is interesting. You are right in that this is a hole in how these things work, but that doesn't mean that we shouldn't try. People seem to be taking advantage of the mostly-structured output these things can create.
Marcos: There's a person trying to standardize something that defines the structured inputs and outputs that websites want. Proposing something for TPAC. You prompt the model and also define the structure for the application. The Prompt API doesn't handle that gracefully.
Martin: If you ask for a particular structure, especially the smaller models will occasionally diverge from that. E.g. https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec9bc#response-9 (pelican riding a bike)
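The divergence Martin describes is why schema-constrained output still needs validation on the way back. A minimal sketch, assuming the `LanguageModel` global and the JSON-schema `responseConstraint` option from the prompt-api explainer; the schema, prompt, and null fallback here are illustrative, not part of the proposal:

```javascript
// Illustrative JSON Schema: ask the model for a 1-5 rating as structured JSON.
const ratingSchema = {
  type: "object",
  properties: { rating: { type: "number", minimum: 1, maximum: 5 } },
  required: ["rating"],
};

async function rateReview(reviewText) {
  // Feature-detect: the API is absent outside supporting browsers.
  if (typeof LanguageModel === "undefined") return null;
  const session = await LanguageModel.create();
  const raw = await session.prompt(
    `Rate this product review from 1 to 5: ${reviewText}`,
    { responseConstraint: ratingSchema },
  );
  // Even with constraints, smaller models can diverge; validate before use.
  try {
    const parsed = JSON.parse(raw);
    return typeof parsed.rating === "number" ? parsed.rating : null;
  } catch {
    return null;
  }
}
```

Outside a supporting browser the function simply resolves to `null`, which is one way a site could trigger its own fallback path.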
Martin: Do you have more to say on progressive enhancement? That section is terse.
Marcos: I can add something.
Martin: Kagame's translation API alternative. Domenic's proposal is straightforward. Kagame's is "here's some DOM, please translate it." Would be entirely declarative. Does that work for a prompting API?
Marcos: Seems like that should work. Lot of subtleties that are lost in just taking text.
Martin: Declarative API would just do what the in-browser translation feature does, which could keep images in place.
Martin: I might disagree with the "difficult to polyfill" point.
Marcos: It's difficult if you don't have access to the large models ...
Jeffrey: The Prompt API proposal is to use a downloadable model, which could be downloaded onto the device.
Marcos: The objections here are assuming that you need a cloud model. Local models aren't very useful.
Jeffrey: We should make that explicit. Could ask the proponents to show that their proposal to use a local model is "good enough".
Martin: Especially on lower-end devices. Speed and output quality.
Marcos: The prompts themselves. The complexity of the prompts can be unbounded.
Jeffrey: Think there are token limits. Summarizer definitely has one.
Marcos: Not even about limits, but the complexity of the query. If you're asking the model to act adversarially.
Martin: Comes down to the quality of the model.
Marcos: Yes.
Comment by @marcoscaceres Aug 26, 2025 (See Github)
Hi Domenic, The TAG acknowledges the interest from implementers and developers in enabling web applications to integrate with generative models through prompting. However, we have significant concerns about the current level of specification detail and its implications for interoperability, security, and long-term sustainability of the platform. We believe the following concerns apply equally to the Writing Assistance APIs, which share many of the same underlying assumptions/models.
Despite what is mandated in the explainer, we are concerned that local models won't meet user and developer expectations, and could be severely limited on low-end devices. This problem could get worse over time, as users and developers become accustomed to crafting prompts that are extremely complex by today's standards. It may also be too presumptuous to assume local models will stay "local", given the trend towards hybrid models that send complicated prompts off to servers (even in a privacy-preserving way). As noted in the explainer:
“We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers. ... These are left as quality-of-implementation issues.”
— Explainer: Goals
This is a notable departure on the web platform, where developers expect baseline interoperable behavior across user agents. At present, the API lacks a minimal interoperable model contract, has no reliable capability detection or quality signals, and makes assumptions (such as on-device execution or reliable structured output) that cannot be guaranteed across implementations. And given that models do computational work, there's a real threat of distributed computation abuse by exploiting this local computational resource at scale at the cost of users (i.e., answering other people's prompts on a user's device without the user knowing).
We believe further work is needed, particularly in defining testable guarantees, privacy, security, and data governance requirements, and an architectural fit with other emerging platform capabilities, before something like this should become part of the Web platform.
Detailed concerns:
1. Interoperability & Testability
- No shared baseline: As acknowledged in the explainer, the API does not guarantee any consistent model behavior or quality across browsers. This undermines interoperability and prevents developers from writing portable, testable code.
- Opaque model selection: There is no defined way for developers to know what model is in use, making debugging and conformance testing infeasible and errors often impossible to reproduce.
2. Architecture & Web Principles
- Procedural and nondeterministic: The API departs from predictable, repeatable behavior, though we acknowledge that in some contexts this might not matter.
- Difficult to polyfill: The explainer leaves open whether fallback to wasm/cloud models is even viable. In practice, the lack of standard behavior makes polyfilling impractical (again, because there is no real baseline; it's just whatever output the model happens to give developers back).
- Threat model missing: Security concerns such as prompt injection, history leakage, and cross-origin contamination are not clearly addressed.
3. Assumptions That May Not Hold
- On-device vs remote execution: The explainer lists "execution location transparency" as a possible future goal, but not a guarantee:
“It may be desirable to offer more insight into whether a model is executing locally or remotely (e.g. to inform UX or data governance decisions).”
— README.md § Goals: Execution location transparency
- Structured output is unreliable: While structured outputs are discussed as desirable, they are not required (making this a risky feature that could negatively impact users and their ability to even use a web page):
“Language models are not guaranteed to produce structurally valid results; efforts to constrain output structure using techniques like prompt templating may be employed...”
— README.md § Prompt lifecycle
- “Tools” abstraction: No such abstraction is described in the explainer. We have concerns about this proposal being overly tied to vendor-specific use cases.
4. Privacy & Security
- Data governance can be challenging: The explainer notes that training on user data must be prevented, but, even if not used for training, the data could still be stored, transmitted, or retained. Preventing this might not be possible, because the prompt may be handed off to a service that does retain data or train the model by default (or by EULA), or the model may have been deliberately pre-tuned by the user.
“The platform MUST prevent user input from being used to fine-tune models or otherwise persist and train on user prompts.”
— Security & Privacy Questionnaire
However, there is no mention of how this is enforced, nor of retention policies, which remains a concern.
- Potential for computation abuse: There are no guardrails discussed around background or opportunistic use of models, which could lead to battery or CPU exhaustion similar to past abuses (e.g. crypto-mining).
- Fingerprinting risk: Even though the explainer demands a local model (again, not guaranteed), prompt content can encode sensitive user data or preferences, introducing new unintentional vectors for surveillance or tracking.
5. Developer Experience & UX
- Inconsistent outputs: Developers cannot expect uniform results across UAs due to model divergence.
- No quality signals: There is no standard mechanism to detect hallucinations or confidence levels.
- No schema support: Inputs and outputs remain freeform text, which makes robust integration difficult.
6. Ecosystem & Governance
- Lack of accountability: With no shared test suite or conformance requirements, there is no path for developers to hold implementations to a common standard.
- Unclear relationship to adjacent specs: The explainer does not clarify how this proposal relates to WebNN, WebGPU, or potential future APIs for structured model execution.
We encourage continued exploration of this space, but recommend the following before the API progresses further:
- Define a minimal, testable capability baseline that all conforming implementations must meet.
- Provide reliable capability detection and developer-facing signals for output quality or confidence.
- Establish a clear threat model and data governance framework, including retention and training boundaries.
- Clarify the API’s relationship with adjacent emerging capabilities and ensure it fits within the layered architecture of the web platform.
- Propose some means to prevent distributed abuse of this computational resource.
The TAG would welcome re-review as these aspects evolve.
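The "reliable capability detection" recommendation above could build on the availability states the explainer already sketches. A hedged illustration, assuming `LanguageModel.availability()` resolves to one of "unavailable", "downloadable", "downloading", or "available"; the strategy names are hypothetical:

```javascript
// Decide how a page should proceed based on local model availability.
async function chooseStrategy() {
  const availability =
    typeof LanguageModel !== "undefined"
      ? await LanguageModel.availability()
      : "unavailable"; // API absent entirely
  switch (availability) {
    case "available":
      return "use-local-model";
    case "downloadable":
    case "downloading":
      return "offer-download-ui"; // let the user opt in to a large download
    default:
      return "fall-back-to-server";
  }
}
```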
Comment by @domenic Aug 26, 2025 (See Github)
Thank you for your feedback. However, your review has several inaccuracies, and most of the parts that are accurate are repeating the same point about nondeterministic outputs.
Here are some specific points:
- Threat model missing: Security concerns such as prompt injection, history leakage, and cross-origin contamination are not clearly addressed.
I think this is not correct. https://webmachinelearning.github.io/writing-assistance-apis/#privacy is quite detailed, including https://webmachinelearning.github.io/writing-assistance-apis/#privacy-user-input which covers your "history leakage" and "cross-origin contamination". "Prompt injection" is not a concern here since the prompts are supplied directly by the developer; there is no API provided for allowing users or uncontrolled web content to inject prompts into otherwise developer-controlled sessions.
You state
- However, there is no mention of how this is enforced, nor of retention policies, which remains a concern.
But I don't think this makes sense. (Perhaps it was generated by an LLM?) It's quite simple how to enforce not feeding data into a model that the developer didn't request; simply don't do it. I don't know what type of specification you are expecting here, beyond what already exists.
Similarly, the idea of "retention policy" is nonsensical, when there is in fact no data retained (as explained in the S&P questionnaire and mandated in the specification). For the model itself, there is extensive discussion of retention, e.g. in https://webmachinelearning.github.io/writing-assistance-apis/#privacy-availability-eviction and elsewhere.
- Structured output is unreliable: While structured outputs are discussed as desirable, they are not required (making this a risky feature that could negatively impact users and their ability to even use a web page):
“Language models are not guaranteed to produce structurally valid results; efforts to constrain output structure using techniques like prompt templating may be employed...”
— README.md § Prompt lifecycle
This appears to be a hallucination in whatever AI you are using to write this TAG review, as that text does not appear anywhere and neither does the anchor it links to. Structured outputs are in fact required; see their dedicated section: https://github.com/webmachinelearning/prompt-api/blob/main/README.md#structured-output-with-json-schema-or-regexp-constraints .
- On-device vs remote execution: The explainer lists "execution location transparency" as a possible future goal, but not a guarantee:
“It may be desirable to offer more insight into whether a model is executing locally or remotely (e.g. to inform UX or data governance decisions).”
— README.md § Goals: Execution location transparency
Similarly a hallucination. The API is in fact designed to allow "execution location transparency", as you put it. The closest text in the goals section is actually debating whether or not we should pierce transparency, by exposing to developers where the model is running. (With the tentative conclusion that we should instead preserve transparency and not expose this information.)
- Potential for computation abuse: There are no guardrails discussed around background or opportunistic use of models, which could lead to battery or CPU exhaustion similar to past abuses (e.g. crypto-mining).
This is false. See https://webmachinelearning.github.io/writing-assistance-apis/#security-runtime .
- Fingerprinting risk: Even though the explainer demands a local model (again, not guaranteed), prompt content can encode sensitive user data or preferences, introducing new unintentional vectors for surveillance or tracking.
This is false. Prompt content is not retained.
- No schema support: Inputs and outputs remain freeform text, which makes robust integration difficult.
This is false, as explained in https://github.com/webmachinelearning/prompt-api/blob/main/README.md#structured-output-with-json-schema-or-regexp-constraints .
- Lack of accountability: With no shared test suite or conformance requirements, there is no path for developers to hold implementations to a common standard.
We are developing a shared test suite, some of which is already available on wpt.fyi.
- Unclear relationship to adjacent specs: The explainer does not clarify how this proposal relates to WebNN, WebGPU, or potential future APIs for structured model execution.
Although in theory we could expand this more, it's worth noting that it is discussed, e.g. in https://github.com/webmachinelearning/prompt-api/blob/main/README.md#execution-location-transparency:~:text=Currently%2C%20web%20developers%20wishing%20to%20use%20language%20models%20must%20either%20call%20out%20to%20cloud%20APIs%2C%20or%20bring%20their%20own%20and%20run%20them%20using%20technologies%20like%20WebAssembly%20and%20WebGPU .
Comment by @lolaodelola Aug 26, 2025 (See Github)
Hi @domenic, We've seen your response and have reopened the issue. We'll discuss how to address this at our plenary in 12 hours.
Comment by @lolaodelola Aug 27, 2025 (See Github)
Hi @domenic,
We reviewed the situation in our plenary this morning and we accept full responsibility for this oversight. We're sorry for not giving a thorough, accurate and considered review of the Prompt API. Moving forward we will be re-reviewing the proposed design, the issue will remain open while we do so and we rescind our previous resolution. Please bear with us in the meantime.
OpenedMay 14, 2025
Hello, TAG!
I'm requesting an early TAG design review of the Prompt API.
This is a JavaScript API for directly prompting an AI language model, including text, image, and audio inputs.
Although the Chrome built-in AI team is exploring purpose-built APIs for some use cases (#948, #991), we are also exploring a general-purpose "prompt API" which allows web developers to prompt a language model directly. This gives web developers access to many more capabilities, at the cost of requiring them to do their own prompt engineering.
You should also know that...
Although this feature exposes an on-device language model, the language model is never trained on, and does not have access to, any local user-specific data. (Sometimes people seem to be confused by this, so we're trying to include this disclaimer in as many places as possible.)
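For readers unfamiliar with the proposal's shape, a minimal sketch using the names in the explainer at the time of writing (`LanguageModel.create()`, `session.prompt()`, `session.destroy()`); the system prompt and null fallback are illustrative:

```javascript
// Prompt the built-in model, falling back when the API is unavailable.
async function askModel(question) {
  if (typeof LanguageModel === "undefined") {
    return null; // caller should fall back, e.g. to a server-side model
  }
  const session = await LanguageModel.create({
    initialPrompts: [
      { role: "system", content: "You are a concise assistant." },
    ],
  });
  try {
    return await session.prompt(question);
  } finally {
    session.destroy(); // release the model resources held by this session
  }
}
```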