#1093: Prompt API

Opened May 14, 2025

Hello, TAG!

I'm requesting an early TAG design review of the Prompt API.

This is a JavaScript API for directly prompting an AI language model, including text, image, and audio inputs.

Although the Chrome built-in AI team is exploring purpose-built APIs for some use cases (#948, #991), we are also exploring a general-purpose "prompt API" which allows web developers to prompt a language model directly. This gives web developers access to many more capabilities, at the cost of requiring them to do their own prompt engineering.
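
For orientation, a minimal usage sketch, assuming the API shape in the explainer at the time of this review (a global LanguageModel with availability(), create() and prompt(); these names are not in the text above and may have changed since):

```js
// Sketch only: API names (LanguageModel, availability, create, prompt)
// follow the explainer as reviewed here and may differ in shipped versions.
if ("LanguageModel" in self) {
  const status = await LanguageModel.availability();
  if (status !== "unavailable") {
    const session = await LanguageModel.create({
      initialPrompts: [{ role: "system", content: "You are a concise assistant." }],
    });
    const answer = await session.prompt("In one sentence, what is the capital of France?");
    console.log(answer);
  }
}
```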

You should also know that...

Although this feature exposes an on-device language model, the language model is never trained on, and does not have access to, any local user-specific data. (Sometimes people seem to be confused by this, so we're trying to include this disclaimer in as many places as possible.)

Discussions

Discussed May 19, 2025 (See Github)

[added reviewers]

Discussed May 26, 2025 (See Github)

Martin: Most implementations involve picking a model for their browser - i.e. their in-house one. This is seeking to drive adoption of models based on the existing market share of browsers.

DanA: What's the user need in this case? I select some text and want AI to comment on it?

Martin: This is unclear to me. For the prompting API, it's trying to make a generic chatbot capability for a site, using a model that's running locally. So you can do anything you may want to do with an LLM, not just summarization or writing assistance.

DanA: But the browser, rather than the site, will be choosing the LLM.

Martin: Ostensibly they are meant to provide the same features - but the user experience will vary a lot. They may also use different models depending on device (mobile vs desktop) within a browser.

Christian: I think it's good and needed - sentiment analysis; data categorization; extracting info from email; multimodal pasting capabilities. I understand the concern that it binds the browser to a model - my understanding is that that's not inherent in the API. It could use the model that's already on the device. This may be the current implementation but it's not necessarily the case.

Martin: I understand that in the Chrome case they are not going to use the underlying platform model.

Xiaocheng: Thinking about user need: the prompt API may be general enough to drive other lower-level APIs and build a full AI app. The potential is huge here. I am still wondering about the question I raised before, which is: what does 'interoperability' mean here? We should not expect the API to produce the same output across browsers, or across devices within a browser, or even across sessions.

Martin: The question is 'how would you write a test case?'

Xiaocheng: You could use another model to test it.

Martin: reward hacking

DanA: Assumptions about user expectations? I'm in a webpage... it feels to me like, if that page wants to create a chatbot experience, it uses this API. That experience already exists on many sites. I can choose models on some sites. Lots of capabilities, all running on servers. How do we make sure that information doesn't leak between origins?

Martin: this is part of why the Chrome team doesn't want to use the platform level API - to avoid such leakage. The platform API might take inputs into a common context that is not origin-aware.

DanA: Depends on the threat model. We need to find out how to respond appropriately to this.

Matthew: How much can this reasonably do that's useful for the user? If it's only on the client side then the user won't be able to ask questions about past interactions with the particular site, because the only info it has is on the current page. Also concerned about leakage issues as discussed.

DanA: This is part of my concern - what is the actual user need here?

Christian: I would disagree here; the idea is using an on-device model, which is better for privacy and security, and supports more use cases. I would rather trust a local model.

DanA: Good point and there are energy arguments for this as well, as you're making the processing happen locally. Leveraging capabilities of the device. So we can't just say this is a bad idea, but it should be focused on the user need, and should fit the web. You only get that benefit if you're able to use the platform API.

Discussed Jul 14, 2025 (See Github)

Max: Last week we thought this might cause interoperability problems, because not every model will have the same features. Marcos drafted something.

Jeffrey: Some pieces of Marcos' draft feedback are good, but other parts don't make sense. "The presumption that the model is on device is wrong" can't be right, since the proponents say they're proposing to keep the model on the device. We probably can't agree to post anything here without Marcos.

Discussed Jul 28, 2025 (See Github)

Jeffrey: It's incorrect to claim that this API

Marcos: Apple's platform doesn't guarantee it stays on device. It's irrelevant if the model is on- or off-device.

Marcos: It's privacy-preserving to send it off device. Apple claims it doesn't matter.

Jeffrey: Do we need to comment on whether it's off-device, if Apple believes it doesn't matter?

Dan: Like with Web Speech, on-device might be weird with unusual browser architectures.

Marcos: The thing that's proposed suggests downloading N GB models, even if they're only downloaded once. In those cases, the model might be downloading in the background while prompts are processed on a server.

Marcos: "Tools" and tooling properties got very complicated. Would be great to get even a small part to be uncontentious. And output formats are a bit presumptuous. Can't guarantee that models will produce a particular output format. E.g. on Apple, if you tell Apple Intelligence to rewrite an email in a particular way, it then puts the subject into the body of the email. Doesn't know how to format things. ChatGPT too. All pretty clunky now. Don't know about the Android side.

Jeffrey: Think we should say that the spec can require that it's "as if" it runs on-device, but a browser that can provide the same privacy properties in the cloud should be allowed. Think we can make that a MUST.

Marcos to draft a comment.

Discussed Aug 4, 2025 (See Github)

Marcos: We have a rough idea. Challenges that arise with this API seem the same as for Writing Assistance APIs. We want to time out the Writing Assistance APIs. Think there's consensus that the issues have been captured. Anyone have any further concerns/opinions?

Ehsan: Might have a question about what a prompt is. If there's a malicious web page, can it display an item with a malicious prompt in it? Still articulating the feedback and reading the spec.

Marcos: We've covered that in the feedback. You could abuse the prompt and do mining.

Martin: That's my core problem with this proposal, and I'm not sure if we will get consensus on this.

Ehsan: There are research papers with hidden prompts in them, intended to induce a reviewing LLM to give the paper a positive review.

Marcos: That sounds like a more general societal problem. It's why we have peer review in academic contexts.

… Not sure if this is related to the prompt. Reminds me of the bias of ChatGPT etc. to be on your side. You can prompt it to be less on your side. Part of the prompt engineering side of things, but not really related to the Prompt API. Does the model sway that way? Probably. Good counter-example for this is X's Grok.

Marcos to draft the closing comment.

Discussed Aug 11, 2025 (See Github)

Marcos has a message to post. It looks like ChatGPT wrote it, which I wouldn't be proud of, but it's there.

Discussed Aug 18, 2025 (See Github)

Marcos: I put together a position, which might be ready to publish.

Jeffrey: Data governance?

Marcos: LLM brings this into question. Fonts question, model might be tuned based on your preferences. Whether there's a way to stop it from doing that is unclear.

Jeffrey: I think they clearly say it must not save the user's data.

Marcos: Don't know if that's possible.

Martin: This is a UA. So when the input goes in, that's data provided by the website, and it's working for the person using the computer.

Marcos: Example I ran into today: an input field using Apple Intelligence. It shows up in ChatGPT history.

Jeffrey: Seems like an implementation bug.

Martin: Maybe a feature, since ChatGPT is a UA, and users might want that.

Marcos: But does it save it in private browsing mode? ChatGPT doesn't have that. And it saves customizations into private browsing... It will learn about you and respond in a way that is tailored to you.

Jeffrey: We need to justify this kind of comment, or they'll think we just didn't read the explainer.

Martin: There is a case for saving this for the user. Maybe as a user I want to save a log. Maybe to know the resources used. "Must not save" isn't necessarily the right thing to say, except in private browsing.

Jeffrey: It is more that saving must not affect the model operation in ways that might affect privacy. We need to respond to the explainer, not assume that it is incorrect. ... Fingerprinting risk is perhaps another example of that.

Martin: Computation risk is one of my major concerns. Why are users expected to provide the compute? The explainer doesn't really explain this. There's a trade-off there, but they don't really do that analysis.

Jeffrey: Think they'll be able to come up with an argument for that, but fair to say the explainer doesn't do it yet. ...Structured output is interesting. You are right in that this is a hole in how these things work, but that doesn't mean that we shouldn't try. People seem to be taking advantage of the mostly-structured output these things can create.

Marcos: There's a person trying to standardize something that defines the structured inputs and outputs that websites want. Proposing something for TPAC. You prompt the model and also define the structure for the application. The Prompt API doesn't handle that gracefully.

Martin: If you ask for a particular structure, especially the smaller models will occasionally diverge from that. E.g. https://gist.github.com/simonw/25e7b7afd6a63a2f15db48b3a51ec9bc#response-9 (pelican riding a bike)
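
For context, the kind of schema-constrained call being discussed might look like the following. This is a sketch assuming the responseConstraint option described in the explainer (the option name is not quoted above and may differ); as Martin notes, smaller models can still diverge, so the output should be validated:

```js
// Sketch, assuming the explainer's responseConstraint option; names may differ.
const session = await LanguageModel.create();
const schema = {
  type: "object",
  properties: {
    sentiment: { enum: ["positive", "negative", "neutral"] },
    confidence: { type: "number", minimum: 0, maximum: 1 },
  },
  required: ["sentiment"],
};
const raw = await session.prompt(
  "Classify the sentiment of: 'Shipping was slow but the product is great.'",
  { responseConstraint: schema },
);
let parsed = null;
try {
  parsed = JSON.parse(raw); // smaller models can still diverge from the schema
} catch (e) {
  // fall back to an unconstrained or manual path
}
```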

Martin: Do you have more to say on progressive enhancement? That section is terse.

Marcos: I can add something.
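
A progressive-enhancement pattern for this API would presumably look something like the sketch below: feature-detect, check availability, and fall back otherwise. summarizeOnServer() is a hypothetical site-provided fallback, and the availability() values follow the explainer:

```js
// Sketch of progressive enhancement: feature-detect, then fall back.
// summarizeOnServer() is a hypothetical site-provided fallback.
async function summarize(text) {
  if (!("LanguageModel" in self)) return summarizeOnServer(text);
  const availability = await LanguageModel.availability();
  if (availability === "unavailable") return summarizeOnServer(text);
  const session = await LanguageModel.create();
  return session.prompt(`Summarize in one sentence:\n${text}`);
}
```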

Martin: Kagame's translation API alternative. Domenic's proposal is straightforward. Kagame's is "here's some DOM, please translate it." Would be entirely declarative. Does that work for a prompting API?

Marcos: Seems like that should work. Lot of subtleties that are lost in just taking text.

Martin: Declarative API would just do what the in-browser translation feature does, which could keep images in place.

Martin: I might disagree with the "difficult to polyfill" point.

Marcos: It's difficult if you don't have access to the large models ...

Jeffrey: The Prompt API proposal is to use a downloadable model, which could be downloaded onto the device.

Marcos: The objections here are assuming that you need a cloud model. Local models aren't very useful.

Jeffrey: We should make that explicit. Could ask the proponents to show that their proposal to use a local model is "good enough".

Martin: Especially on lower-end devices. Speed and output quality.

Marcos: The prompts themselves. The complexity of the prompts can be unbounded.

Jeffrey: Think there are token limits. Summarizer definitely has one.

Marcos: Not even about limits, but the complexity of the query. If you're asking the model to act adversarially.

Martin: Comes down to the quality of the model.

Marcos: Yes.

Discussed Aug 25, 2025 (See Github)

Marcos: Domenic commented on the Writing comment. But we're not going deeply and instead focusing on the overall issues. Before we'd dive into the smaller issues, we'd have to accept that the overall idea is good.

Martin: In these sorts of scenarios, we're expecting a model, system prompt from the browser, and then an application-level set of instructions. And then the prompt. Is that right?

Marcos: My understanding is that there's a role, inputs, outputs, query coming in.

Martin: Concerned about alignment. We're relying on prompting for alignment, i.e. "it does what 'who?' wants it to do". What the site wants is one thing, and what's good for users is potentially another thing. It's an API for the site, so maybe that's the most important.

Marcos: You can signal who it's for. Can say the user's prompting or the site's prompting. As the site, you could ask for a summary of some data or answer the query.

Jeffrey: Does this concern need to be in the comment?

Marcos: Think not; focus on the broader stuff. The explainer has 'role'. e.g. role=system, role=user.

Martin: Isn't that just to supply context? "We've had the following conversation in the past."

Marcos: Who said what. Model might understand who they are.

Serena: Who's providing the role here? The website?

Martin: Yes, so it can put words in the assistant's mouth that never came out of the assistant. Can do interesting prompt injection this way.

Jeffrey: Any harm from that?

Martin: Don't think it's bad for the user to do that. Site could do it anyway. If site does this to itself, does it expose itself to more prompt injection? Just an application integrity question and not a major problem.
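
For reference, the role mechanism under discussion is roughly the explainer's initialPrompts option. The sketch below (names assumed from the explainer, not quoted above) shows how a site authors all three roles, including an "assistant" turn the model never produced, which is the injection Martin describes:

```js
// Sketch based on the explainer's initialPrompts/role shape; names may differ.
// Every turn here, including the "assistant" one, is authored by the site.
const session = await LanguageModel.create({
  initialPrompts: [
    { role: "system", content: "You answer questions about the order shown on this page." },
    { role: "user", content: "Where is my order?" },
    { role: "assistant", content: "Order #1234 shipped on Monday." }, // site-supplied, not model output
  ],
});
const reply = await session.prompt("When will it arrive?");
```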

Jeffrey: Is this ready to post?

Martin + Jeffrey: Go ahead.

Marcos to post.

Comment by @marcoscaceres Aug 26, 2025 (See Github)

Hi Domenic, The TAG acknowledges the interest from implementers and developers in enabling web applications to integrate with generative models through prompting. However, we have significant concerns about the current level of specification detail and its implications for interoperability, security, and long-term sustainability of the platform. We believe the following concerns apply equally to the Writing Assistance APIs, which share many of the same underlying assumptions/models.

Despite what is mandated in the explainer, we have a concern that local models won't meet user and developer expectations, and could be severely limited on low-end devices. This problem could get worse over time, as users/developers become increasingly accustomed to crafting extremely complex prompts by today's standards. And it may simply be too presumptuous to assume local models will stay "local", given the trend towards hybrid models that send off complicated prompts to servers (even in a privacy-preserving way). As noted in the explainer:

“We do not intend to provide guarantees of language model quality, stability, or interoperability between browsers. ... These are left as quality-of-implementation issues.”
Explainer: Goals

This is a notable departure on the web platform, where developers expect baseline interoperable behavior across user agents. At present, the API lacks a minimal interoperable model contract, has no reliable capability detection or quality signals, and makes assumptions (such as on-device execution or reliable structured output) that cannot be guaranteed across implementations. And given that models do computational work, there's a real threat of distributed computation abuse by exploiting this local computational resource at scale at the cost of users (i.e., answering other people's prompts on a user's device without the user knowing).

We believe further work is needed, particularly in defining testable guarantees, privacy, security, and data governance requirements, and an architectural fit with other emerging platform capabilities, before something like this should become part of the Web platform.

Detailed concerns:

1. Interoperability & Testability

  • No shared baseline: As acknowledged in the explainer, the API does not guarantee any consistent model behavior or quality across browsers. This undermines interoperability and prevents developers from writing portable, testable code.
  • Opaque model selection: There is no defined way for developers to know what model is in use, making debugging and conformance testing infeasible and errors often impossible to reproduce.

2. Architecture & Web Principles

  • Procedural and nondeterministic: The API departs from predictable, repeatable behavior, though we acknowledge that in some contexts this might not matter.
  • Difficult to polyfill: The explainer leaves open whether fallback to wasm/cloud models is even viable. In practice, the lack of standard behavior makes polyfilling impractical (again, because there is no real baseline - it's just whatever media the model gives developers back).
  • Threat model missing: Security concerns such as prompt injection, history leakage, and cross-origin contamination are not clearly addressed.

3. Assumptions That May Not Hold

  • On-device vs remote execution: The explainer lists "execution location transparency" as a possible future goal, but not a guarantee:

    “It may be desirable to offer more insight into whether a model is executing locally or remotely (e.g. to inform UX or data governance decisions).”
    README.md § Goals: Execution location transparency

  • Structured output is unreliable: While structured outputs are discussed as desirable, they are not required (making this a risky feature that could negatively impact users and their ability to even use a web page):

    “Language models are not guaranteed to produce structurally valid results; efforts to constrain output structure using techniques like prompt templating may be employed...”
    README.md § Prompt lifecycle

  • “Tools” abstraction: No such abstraction is described in the explainer. We have concerns about this proposal being overly tied to vendor-specific use cases.

4. Privacy & Security

  • Data governance can be challenging: The explainer notes that training on user data must be prevented, but, even if not used for training, the data could be stored, transmitted, or retained. This might not be possible to prevent because the prompt may be handed off to a service that does retain data or train the model by default (or by EULA) - or the model has been pre-tuned on purpose by the users.

    “The platform MUST prevent user input from being used to fine-tune models or otherwise persist and train on user prompts.”
    Security & Privacy Questionnaire

    However, there is no mention of how this is enforced, nor of retention policies, which remains a concern.

  • Potential for computation abuse: There are no guardrails discussed around background or opportunistic use of models, which could lead to battery or CPU exhaustion similar to past abuses (e.g. crypto-mining).

  • Fingerprinting risk: Even though the explainer demands a local model (again, not guaranteed), prompt content can encode sensitive user data or preferences, introducing new unintentional vectors for surveillance or tracking.

5. Developer Experience & UX

  • Inconsistent outputs: Developers cannot expect uniform results across UAs due to model divergence.
  • No quality signals: There is no standard mechanism to detect hallucinations or confidence levels.
  • No schema support: Inputs and outputs remain freeform text, which makes robust integration difficult.

6. Ecosystem & Governance

  • Lack of accountability: With no shared test suite or conformance requirements, there is no path for developers to hold implementations to a common standard.
  • Unclear relationship to adjacent specs: The explainer does not clarify how this proposal relates to WebNN, WebGPU, or potential future APIs for structured model execution.

We encourage continued exploration of this space, but recommend the following before the API progresses further:

  1. Define a minimal, testable capability baseline that all conforming implementations must meet.
  2. Provide reliable capability detection and developer-facing signals for output quality or confidence.
  3. Establish a clear threat model and data governance framework, including retention and training boundaries.
  4. Clarify the API’s relationship with adjacent emerging capabilities and ensure it fits within the layered architecture of the web platform.
  5. Propose some means to prevent distributed abuse of this computational resource.

The TAG would welcome re-review as these aspects evolve.

Comment by @domenic Aug 26, 2025 (See Github)

Thank you for your feedback. However, your review has several inaccuracies, and most of the parts that are accurate are repeating the same point about nondeterministic outputs.

Here are some specific points:

  • Threat model missing: Security concerns such as prompt injection, history leakage, and cross-origin contamination are not clearly addressed.

I think this is not correct. https://webmachinelearning.github.io/writing-assistance-apis/#privacy is quite detailed, including https://webmachinelearning.github.io/writing-assistance-apis/#privacy-user-input which covers your "history leakage" and "cross-origin contamination". "Prompt injection" is not a concern here since the prompts are supplied directly by the developer; there is no API provided for allowing users or uncontrolled web content to inject prompts into otherwise developer-controlled sessions.

You state

  • However, there is no mention of how this is enforced, nor of retention policies, which remains a concern.

But I don't think this makes sense. (Perhaps it was generated by an LLM?) It's quite simple how to enforce not feeding data into a model that the developer didn't request; simply don't do it. I don't know what type of specification you are expecting here, beyond what already exists.

Similarly, the idea of "retention policy" is nonsensical, when there is in fact no data retained (as explained in the S&P questionnaire and mandated in the specification). For the model itself, there is extensive discussion of retention, e.g. in https://webmachinelearning.github.io/writing-assistance-apis/#privacy-availability-eviction and elsewhere.

  • Structured output is unreliable: While structured outputs are discussed as desirable, they are not required (making this a risky feature that could negatively impact users and their ability to even use a web page): > “Language models are not guaranteed to produce structurally valid results; efforts to constrain output structure using techniques like prompt templating may be employed...” > — README.md § Prompt lifecycle

This appears to be a hallucination in whatever AI you are using to write this TAG review, as that text does not appear anywhere and neither does the anchor it links to. Structured outputs are in fact required; see their dedicated section: https://github.com/webmachinelearning/prompt-api/blob/main/README.md#structured-output-with-json-schema-or-regexp-constraints .

  • On-device vs remote execution: The explainer lists "execution location transparency" as a possible future goal, but not a guarantee: > “It may be desirable to offer more insight into whether a model is executing locally or remotely (e.g. to inform UX or data governance decisions).” > — README.md § Goals: Execution location transparency

Similarly a hallucination. The API is in fact designed to allow "execution location transparency", as you put it. The closest text in the goals section is actually debating whether or not we should pierce transparency, by exposing to developers where the model is running. (With the tentative conclusion that we should instead preserve transparency and not expose this information.)

  • Potential for computation abuse: There are no guardrails discussed around background or opportunistic use of models, which could lead to battery or CPU exhaustion similar to past abuses (e.g. crypto-mining).

This is false. See https://webmachinelearning.github.io/writing-assistance-apis/#security-runtime .

  • Fingerprinting risk: Even though the explainer demands a local model (again, not guaranteed), prompt content can encode sensitive user data or preferences, introducing new unintentional vectors for surveillance or tracking.

This is false. Prompt content is not retained.

  • No schema support: Inputs and outputs remain freeform text, which makes robust integration difficult.

This is false, as explained in https://github.com/webmachinelearning/prompt-api/blob/main/README.md#structured-output-with-json-schema-or-regexp-constraints .

  • Lack of accountability: With no shared test suite or conformance requirements, there is no path for developers to hold implementations to a common standard.

We are developing a shared test suite, some of which is already available on wpt.fyi.

  • Unclear relationship to adjacent specs: The explainer does not clarify how this proposal relates to WebNN, WebGPU, or potential future APIs for structured model execution.

Although in theory we could expand this more, it's worth noting that it is discussed, e.g. in https://github.com/webmachinelearning/prompt-api/blob/main/README.md#execution-location-transparency:~:text=Currently%2C%20web%20developers%20wishing%20to%20use%20language%20models%20must%20either%20call%20out%20to%20cloud%20APIs%2C%20or%20bring%20their%20own%20and%20run%20them%20using%20technologies%20like%20WebAssembly%20and%20WebGPU .

Comment by @lolaodelola Aug 26, 2025 (See Github)

Hi @domenic, We've seen your response and have reopened the issue. We'll discuss how to address this at our plenary in 12 hours.

Comment by @lolaodelola Aug 27, 2025 (See Github)

Hi @domenic,

We reviewed the situation in our plenary this morning and we accept full responsibility for this oversight. We're sorry for not giving a thorough, accurate and considered review of the Prompt API. Moving forward we will be re-reviewing the proposed design, the issue will remain open while we do so and we rescind our previous resolution. Please bear with us in the meantime.

Discussed Sep 1, 2025 (See Github)

Martin: Were some of those quotes in earlier versions of the explainer?

Marcos: Maybe. The schema thing might have been added later. Max and I went through it together and put notes. I got ChatGPT to rewrite some of this and add links. But it didn't invent anything. If anything, the humans hallucinated. The retention policy was predicated on the LLM going to a remote server.

Martin: Was this always local?

Marcos: It was always intended to be local, but I was asking if a local model could meet quality expectations. A lot depends on that question. The presumption of this being on-device only holds until local models decide to hand off to a server, which is what some might/will(?) do. ...The thing about structured output: might need to chase history; it wasn't there when we went through originally. Constraints with regex/schema were not there. ...The whole thing about user transparency was not hallucination. ...Computation abuse was not covered. ...Fingerprinting - can't see how that was false. This might exploit hardware that differs between people. That is a fingerprinting risk potentially.

Martin: That would depend on getting reliably different results for different people based on choice of model, choice of hardware, or other things like that.

Marcos: Not all hallucinations; we missed stuff, but there are a number of concerns that remain. Resolution can be validated and we can move on.

Martin: The packaging of the message obscured the actual message. Need to make sure that the key points you highlight here are re-made, maybe in a more succinct form.

Marcos: Looking for the history. Seems strange that it would have taken text that wasn't there.

Martin: The text reads plausibly, which is a trait of a lot of LLMs.

Marcos: Didn't change the messaging of the original thing, just trying to link up the text. 2 parts: re-respond, and clarifying various things. Happy to remove the linked up text. On the TAG part, not being reactionary. We're going to annoy people, and being reactionary on this wasn't super helpful.

...

Marcos: Will look into the JSON Schema part. Re-review the rest of the review. There's an overall concern about doing queries with processing unbeknownst to the user.

Jeffrey: Using local power to answer queries?

Marcos: Yes.

Martin: Somewhat a security vulnerability. The other party is exposed to things the user's LLM tells their systems to do.

Jeffrey: Bitcoin miners.

Marcos: Don't want to get into a back-and-forth.

Jeffrey: I'll review it and find ways to say things that won't/can't be argued with.

Marcos: I'll draft a response in the brainstorming.

Jeffrey: I plan to write a thing on how to generate review responses using LLMs. Christian had thoughts and experience to share. Idea was to not give the output of the LLM to people to read. Needs more discussion once it has been proposed.

Discussed Sep 29, 2025 (See Github)

Issue Triage

Discussed Oct 6, 2025 (See Github)

Jeffrey: Think the explainer should discuss the choice of local or remote in more detail.

Marcos: Think it shouldn't discuss it at all. So many attacks. Anne was unhappy that it exposed the download progress. It affects the whole browser. And language packs are a fingerprinting vector. A major design flaw with the API is that it goes into memory management. Presumptions about sessions that might be large. Could be true, but it's not for the developer to decide or manage the memory. No indication in the session about the available size. Want it to move away from talking about that at all.

Jeffrey: All models have a token limit?

Marcos: Model quota is fine, but local vs remote shouldn't matter. Results should be transparent. Developer shouldn't need to manage things they don't know. Even if it's downloaded, they shouldn't be able to know that. If I'm roaming, the browser might ask, but the website shouldn't be able to monitor that.
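
For context, the quota surface being discussed looks roughly like the sketch below, assuming the inputQuota / inputUsage / measureInputUsage() members from recent drafts (these names are not quoted in the minutes and may differ); this is exactly the kind of bookkeeping Marcos argues sites shouldn't have to do:

```js
// Sketch of session-quota bookkeeping, assuming the inputQuota / inputUsage /
// measureInputUsage() members from recent drafts; exact names may differ.
const session = await LanguageModel.create();
const nextTurn = "A long, user-supplied block of text...";
const cost = await session.measureInputUsage(nextTurn);
if (session.inputUsage + cost > session.inputQuota) {
  // Trim context, summarize earlier turns, or start a fresh session.
} else {
  const reply = await session.prompt(nextTurn);
}
```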

Jeffrey: We had a discussion around Web Speech, where developers wanted to ensure the data stayed local.

Marcos: That's a different problem. EULA governing the browser would cover that. The website would need to avoid using the API at all, since it can't govern the data once it goes into the platform. Another example: geolocation: Firefox uses Skyhook, and that's part of the bargain when you download Firefox. Your data goes to a company, and you can't do anything about it.

Jeffrey: Your point is that the API should be transparent between cloud and local process. Don't know if the browser is willing to pay for cloud processing. AI models are more expensive than location services.

Marcos: Might not be true forever. Quantum computing might make this very cheap.

Serena: Would be good to plan for what we know is real now.

Marcos: Some browser companies have a relationship with good model companies. What about smaller browsers?

Jeffrey: The proponents argued that there are good-enough free models. Don't think that's in the explainer, and they should substantiate it.

Marcos: Model quality matters.

Jeffrey: Maybe we ask the proponents to substantiate that free models are good enough that sites don't need to UA-sniff.

Marcos: Users are the ones who need to decide if it's good enough.

Jeffrey: Users could plug in their own models.

Marcos: Developer tests with Chrome. Opera has a low-quality model. Developer doesn't test and doesn't want to run on the server. Real chance to exclude people based on their browser choice. Web developers should care about users, but often they don't.

Serena: Summarizing: this proposal means that each browser chooses its own model. Because the cost of creating a good model (versus a mid-range one) differs so much, the cost might be prohibitive, and the user experience could be drastically different between browsers. And that's bad for users and developers because they can't guarantee quality.

Jeffrey: I question the quality of even the best models, but users can pick their browsers.

Marcos: Some governments will only design for Chrome and Firefox. User doesn't have choice.

Jeffrey: If it's the site picking the browser, presumably the site has tested that browser's model. So users at least won't be hurt by having to use a worse model than the site has tested.

Marcos: We should require that all browsers have a certain quality, and no other API has such varying performance.

Serena: I wonder how we even define the quality bar. Output is so non-deterministic.

Marcos: "Reasonable person".

Serena: What if 10% of the time it's not good enough, but 90% of the time it is? Don't know how someone proposing a model would show that; would they just cherry-pick? How does one prove that the open-source model will be good enough?

Marcos: That's the debate.

Dan: Changing quickly. People's expectation of a "good" model is evolving rapidly.

Jeffrey: I'm wondering how we give actionable feedback. If we just say "don't do this", we'll get ignored. If we say "do this in this other way".

Marcos: We can say "consider this".

Jeffrey: I think if our feedback says "please discuss this, this, this in the explainer", that'll have an effect. Especially if that's the opening of the comment.

We'll iterate more on the draft comment.