I'm requesting a TAG review of on-device support for the Web Speech API.
This feature adds on-device speech recognition support to the Web Speech API, allowing websites to ensure that neither audio nor transcribed speech is sent to a third-party service for processing. Websites can query the availability of on-device speech recognition for specific languages, prompt users to install the necessary resources for on-device speech recognition, and choose between on-device and cloud-based speech recognition as needed.
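The query, install, and choose flow described above can be sketched as follows. This is only an illustration under stated assumptions: `queryAvailability` and `requestInstall` are hypothetical callbacks standing in for whatever the specification ends up naming these operations, and the returned strings are illustrative labels, not values defined by the spec.

```javascript
// Hypothetical decision helper for the flow described above.
// `queryAvailability(lang)` and `requestInstall(lang)` are placeholder
// callbacks (assumptions, not spec names); each resolves to a boolean.
async function pickRecognitionPath({ lang, requireOnDevice, queryAvailability, requestInstall }) {
  // 1. Use on-device recognition if it is already available for this language.
  if (await queryAvailability(lang)) {
    return "on-device";
  }
  // 2. Otherwise ask for the language resources to be installed.
  //    The user agent may prompt the user, and the request may be declined.
  if (await requestInstall(lang)) {
    return "on-device";
  }
  // 3. Fall back to cloud-based recognition only if the site permits it.
  return requireOnDevice ? "unavailable" : "cloud";
}
```

A site with strict privacy requirements would pass `requireOnDevice: true` and treat `"unavailable"` as a reason to disable speech input rather than send audio off the device.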
Security and Privacy self-review:
Relevant survey questions:
2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?
This feature would expose whether on-device speech recognition is available for a specific language. Websites need this information to decide whether they can rely on on-device speech recognition.
2.2. Do features in your specification expose the minimum amount of information necessary to enable their intended uses?
Yes. Some websites have strict privacy requirements that call for on-device speech recognition, so they must know whether it is available in order to ensure that neither audio nor captions are sent to a third-party service for processing.
2.6. Do the features in your specification expose information about the underlying platform to origins?
While this feature does not directly expose information about the underlying platform, websites could use performance metrics for on-device speech recognition to gauge general hardware capability.
2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?
Yes, the spec contains a section on how to reduce the risk of fingerprinting. Websites need explicit user permission to install on-device speech recognition language packs that do not match the user's preferred language, or when the user is not on Ethernet or Wi-Fi.
The group where the work on this specification is currently being done: Audio Community Group
The group where standardization of this work is intended to be done (if different from the current group): Audio Working Group
This work is being funded by: Google
You should also know that...
The primary risk of this new functionality is the potential for fingerprinting. To mitigate this risk, the Chrome Trust & Safety team proposes requiring explicit user consent to install language packs that do not match one of the user's preferred languages, or when the user is not on an Ethernet/Wi-Fi network.
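As a rough illustration of the consent rule proposed above, the check a user agent might apply could look like the sketch below. The function name, the parameter names, the connection labels `"ethernet"` and `"wifi"`, and the case-insensitive exact match on language tags are all assumptions made for this example. They are not taken from the specification.

```javascript
// Hypothetical user-agent-side check for the consent rule described above.
// All names and the matching strategy are assumptions for illustration.
function needsExplicitConsent({ packLanguage, preferredLanguages, connectionType }) {
  const wanted = packLanguage.toLowerCase();
  // Assumption: a pack "matches" when its tag equals a preferred language tag,
  // ignoring case (for example "en-US" and "en-us").
  const matchesPreferred = preferredLanguages.some(
    (tag) => tag.toLowerCase() === wanted
  );
  // Assumption: these two labels identify Ethernet and Wi-Fi connections.
  const onAllowedNetwork = connectionType === "ethernet" || connectionType === "wifi";
  // Consent is needed if either condition from the proposal fails.
  return !matchesPreferred || !onAllowedNetwork;
}
```

Under these assumptions, installing an `"en-US"` pack for a user who prefers `"en-US"` over Wi-Fi needs no extra prompt, while the same install over a cellular connection, or an install of a pack outside the preferred languages, does.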
The existing Web Speech API has an outdated callback design that must be maintained for backwards compatibility and interoperability. Although Firefox doesn't officially support the speech recognition section of the Web Speech API, it has an unprefixed implementation behind a flag, and most guides on using the Web Speech API do something like window.SpeechRecognition || window.webkitSpeechRecognition; (examples from developer.mozilla.org, codeburst.io, dev.to). There are 17.8K instances of this kind of usage on GitHub alone. The Audio Working Group is looking into potentially replacing this API with a new, modernized version under a different name. A separate TAG design review will be sent for that if the group decides to proceed with the new API.
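For reference, the prefixed/unprefixed fallback that these guides rely on looks roughly like this. The helper name `resolveSpeechRecognition` and the `globalObject` parameter are illustrative choices for this sketch. Passing the global in, rather than hard-coding `window`, only serves to let the snippet run outside a browser.

```javascript
// Resolve the speech recognition constructor, preferring the unprefixed name
// and falling back to the webkit-prefixed one. Returns null if neither exists.
function resolveSpeechRecognition(globalObject) {
  return (
    globalObject.SpeechRecognition ||
    globalObject.webkitSpeechRecognition ||
    null
  );
}

// In a page this would typically be used as:
//   const Recognition = resolveSpeechRecognition(window);
//   if (Recognition) {
//     const recognizer = new Recognition();
//   }
```

Because so much deployed code depends on this pattern, changing the shape of either constructor risks breaking existing sites, which is the compatibility constraint described above.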
Opened Jan 9, 2025
Hello, TAG!