Mozilla comments: Not yet, (unofficially favorable)
WebKit comments: Not yet
Major unresolved issues with or opposition to this design: N/A
You should also know that...
Summary: Extends the SpeechRecognition interface by adding a 'quality' property to SpeechRecognitionOptions. This allows developers to specify the semantic capability required for on-device recognition (via processLocally: true). The proposed quality enum supports three levels: 'command', 'dictation', and 'conversation'.
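The proposed option can be sketched as follows. This is a hypothetical usage sketch based only on the summary above: the `quality` property, its three enum values, and `processLocally` come from the proposal, but the exact API shape (constructor options vs. attributes) may differ from the final spec, and the `pickQuality` helper and its task names are purely illustrative.

```javascript
// Hypothetical sketch of the proposed 'quality' option; the exact API
// surface may differ from the final Web Speech API spec text.

// The three enum values come from the proposal.
const QUALITY_LEVELS = ["command", "dictation", "conversation"];

// Illustrative helper: pick the lowest quality level that covers a task.
// This mapping is an assumption, not part of the spec.
function pickQuality(task) {
  switch (task) {
    case "voice-command":
      return "command";       // short, constrained utterances
    case "note-taking":
      return "dictation";     // longer free-form speech
    default:
      return "conversation";  // multi-turn, highest capability
  }
}

// Browser-only usage, guarded so the sketch is inert outside a browser:
if (typeof SpeechRecognition !== "undefined") {
  const recognition = new SpeechRecognition();
  recognition.lang = "en-US";
  recognition.processLocally = true;                // existing on-device flag
  recognition.quality = pickQuality("note-taking"); // proposed property
  recognition.onresult = (e) => console.log(e.results[0][0].transcript);
  recognition.start();
}
```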
Christian: Marcos mentioned this has the same issue as the Prompt API: you can query whether there is a model present on the device suitable for a certain level/language, and this could be a fingerprinting vector. I want to recommend they look at the Prompt API review.
Ehsan: Isn't this the same problem with all language-based models? Maybe we can have a consistent answer.
Lola: Do you suggest we should have a document on language-based models?
Ehsan: That would be my suggestion. Come up with a document that describes all of that. Should be done at the WebML groups. Think this is coming up more often.
Marcos: It’s a more general problem of downloading system components which then you can query, because then they become global.
Lola: To make this even more general, should we have a position on downloading system components? Or is this restricted to this use case?
Marcos: No, could be related to everything. Codecs, etc. Should be a design principle.
Lola: Who would be willing to write that? We also have another plenary before the F2F.
Christian: Could offer to do that, would be my first design principle, and a topic where I’m interested in.
Ehsan: Same here. Would be good to have a more experienced TAG member on that as well.
Lola: Design principles is owned by Jeffrey, so we can talk to him about that.
Christian: We asked a question here. There is the fingerprintability concern. We also talked about this when we reviewed the Prompt API: you can basically download AI models to the system as a global component, and in choosing languages and other options, it becomes fingerprintable.
That was our first reaction. The Web Speech API with local processing is already there; it exists. It can already download AI models, and the previous TAG closed that with concerns. So now they are just adding the quality level, which is a minor change but adds to the fingerprintability.
We are now waiting for a response.
Hadley: maybe if we don't hear back next week, let's nudge?
Christian: sure
Comment by @christianliebel Feb 19, 2026 (See Github)
Hi @evanbliu, thank you for your proposal.
We have one question regarding privacy: Could an attacker fingerprint the user’s browsing history by installing certain or rare languages along with model qualities on site A, and checking for the availability of those permutations on site B? And if so, how is that fingerprinting concern mitigated?
It would be great if you could add a security & privacy questionnaire and answer that question.
Thanks for raising this! It's worth noting that this proposal doesn't actually introduce on-device speech recognition, as it's already part of the existing spec. The same fingerprinting concerns you brought up are already present for current on-device speech recognition and are mitigated by the countermeasures detailed in this PR: https://github.com/WebAudio/web-speech-api/pull/165. These mitigations are based on those developed for the Writing Assistance APIs (https://webmachinelearning.github.io/writing-assistance-apis/).
Let me know if you have any concerns or questions!
Christian: this is related to the Global Browser Component topic we discussed in our f2f. The previous TAG said satisfied, or satisfied with concerns, but not for the basic local Web Speech API. Now they want to extend that local API by giving you a quality property, the level of speech recognition quality. Marcos looked at the pull request. Hard to say anything other than satisfied with concerns, because we already said that.
Marcos: perfect summary.
Christian: shall we close it with satisfied with concerns, with the concern being fingerprinting?
Christian: We feel this will be satisfied with concerns. We asked them to add fingerprintability information, but this was ignored. They said it's been covered already, but adding the quality dimension adds a lot to the fingerprinting vector.
Marcos: We can see the problem again where we're closing issues but then have no mechanism to follow up - e.g. Christian said it would be nice to see certain things. Maybe we should file bugs relating to those concerns? Then they can't avoid the feedback we give them.
Lola: We've spoken about this in relation to other issues.
Matt: We have process for what you just described Marcos. We can see in one place when groups have closed issues, if they've addressed resolutions, etc. We have it and we could use it.
Christian: Should we try out that process here?
Matt: I'm working on guidance for how to do this. I'm happy to take that and make it applicable for TAG, but docs are needed. Happy to take it on and work with chairs.
Lola: Let's write the closing comment for this, and then figure out with chairs, Matthew, and anyone else, how to track things.
Christian: Last time we weren't sure what to do about the fingerprinting risks. It tends to make fingerprinting worse. The proponents ignored our request to add a S&P section. The alternative was that we create an issue in their repo to track it there.
Marcos: S&P is a requirement for any W3C spec.
Matthew: Could show you how to track issues, if somebody is interested. Reach out to me if you want to learn how tracking works.
Ehsan: Curious to know if they have communicated the reason for not adding a S&P section? The choice of on-device processing reflects an interest in privacy.
Christian: Their argument is that having a local model means their S&P considerations are already there.
Marcos: They mention mitigations; they've not landed.
There's a larger concern around Web Speech that contributions are Chrome-only, and lack of implementation commitments. Though I see some from Mozilla.
Yves: It would fail the TR '2+ implementations' test if that was a problem at transition time.
Marcos: This is confusing as it seems to be incubation work, but being done within a WG. Checks charter. Will file a bug.
... They're saying they think they solved it through un-merged PRs on the spec. We could push back and say it'd be helpful to give people a summary in the explainer, as it's confusing.
Christian: We could conclude the review here, and ask them to add a link in the explainer at that time.
Matthew: Question is, we would be happy if the PRs are merged. "Assuming the PRs are merged, and the link is added…"
Does anyone have any other comments on this issue? I've included responses to the security & privacy questionnaire below:
2.1. What information does this feature expose, and for what purposes?
Exposure: The feature exposes the availability of specific on-device speech recognition capabilities (categorized as 'command', 'dictation', or 'conversation') for a given language.
Purpose: This exposure allows web developers to specify the semantic capability required for local, on-device speech recognition (when processLocally: true is utilized). This helps optimize the underlying engine's performance, accuracy, and power consumption based on the specific task the user is performing.
2.2. Do features in your specification expose the minimum amount of information necessary to implement the intended functionality?
Yes. The proposal restricts the exposure to a simple, predefined enum of three distinct quality levels. It does not expose granular details about the user's specific hardware, the exact machine learning models installed, or the underlying operating system's native speech APIs.
2.6. Do the features in your specification expose information about the underlying platform to origins?
This API does not introduce on-device speech recognition itself (which is already part of the existing spec). The fingerprinting concerns associated with model availability are addressed via mitigations detailed in WebAudio/web-speech-api#165. These countermeasures are modeled after the Writing Assistance APIs, which typically mitigate this by downloading models on demand (rather than revealing pre-installed state) or by standardizing the availability of core models to reduce entropy.
2.7. Does this specification allow an origin to send data to the underlying platform?
Yes. The proposal allows an origin to pass a specific quality constraint through the browser to the underlying platform's local speech recognition engine to configure how the audio stream is processed.
2.8. Do features in this specification enable access to device sensors?
Yes (Inherited). While this specific proposal only adds an options property, the underlying Web Speech API intrinsically requires access to the device's microphone. This proposal relies entirely on the existing permissions model, user prompts, and security indicators currently established for microphone access in the browser. It does not introduce new sensor access mechanisms.
2.13. How does this specification distinguish between behavior in first-party and third-party contexts?
The proposal does not explicitly introduce new behaviors for third-party contexts. However, like the broader Web Speech API, microphone access (and therefore the ability to use this feature) should be governed by Permissions Policy. Third-party iframes would require explicit delegation (e.g., allow="microphone") from the first-party context to utilize speech recognition at any quality level.
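As an illustration of that delegation, a first-party page would need markup along these lines (the embedded origin is hypothetical; `allow="microphone"` is standard Permissions Policy syntax):

```html
<!-- First-party page delegating microphone access to a third-party frame;
     without the allow attribute, speech recognition in the frame is blocked. -->
<iframe src="https://third-party.example/voice-widget"
        allow="microphone"></iframe>
```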
2.14. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?
The API should function similarly to standard browsing, provided the user grants microphone permissions. However, to prevent cross-session tracking, browsers may need to apply stricter model-download heuristics in Private Browsing. For example, if a specific 'dictation' model for a rare language is downloaded during an Incognito session, the browser must ensure that the availability of this newly cached model is not exposed to subsequent standard browsing sessions, and vice versa, to prevent linking the two profiles.
Opened Feb 2, 2026
Explainer
https://github.com/WebAudio/web-speech-api/blob/main/explainers/quality-levels.md
Where and by whom is the work being done?
Feedback so far
Specification URL: https://webaudio.github.io/web-speech-api
<!-- Content below this is maintained by @w3c-tag-bot -->Track conversations at https://tag-github-bot.w3.org/gh/w3ctag/design-reviews/1189