#214: Dusting off Web Speech API?
Discussions
2018-03-27
Travis: I finally watched one of Brian's presentations linked from the issue. I'm both happy we can do speech synthesis on the web (it's cool) and saddened by the shape of the API we have. It seems simple and straightforward at first, but when you get into it, there's a lot of magic happening; it's very high-level ... wild-west in terms of consistency between browsers. There are basic controls for pitch, volume, and speed, but no way to connect to the rest of the platform.
Alex: Can you take the output and pipe it into Web Audio?
Travis: There's zero connection between this API and the rest of the platform; it pipes right to the speakers. It's a rich area for refinement that way. Honestly, if a revised version were implemented as an experiment, or a new shape we could expose, it would be fantastic, and I'd love to see something like that move forward. In terms of what to do next... we have some folks in a group at MS looking at the speech APIs. It might be that they're interested in more expressive power or in pushing a few more requirements into the existing APIs; if so, that might be enough to gather some interest among the rest of the browser vendors and start design work in WICG, which is where I'd like to see anything get started if we were to pick this up again. I don't know that there's a future for a review of the existing APIs... It's worthwhile to document what we'd want to see in a future API, but assuming nobody's jumping the gun to implement a new thing, that would be where this ends.
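For readers unfamiliar with the text-to-speech half of the API, here is a minimal TypeScript sketch of the "basic controls" Travis mentions, assuming a browser that exposes `speechSynthesis` and `SpeechSynthesisUtterance`. The `pitch`, `rate`, and `volume` attributes and `speechSynthesis.speak()` come from the draft spec; the `clampProsody` and `speak` helpers are hypothetical, and the clamping is our own choice (browsers may handle out-of-range values differently). Note that `speak()` hands back nothing that could be routed into a Web Audio graph, which is the gap discussed above.

```typescript
// Ranges described for SpeechSynthesisUtterance in the Web Speech API draft:
// pitch 0..2, rate 0.1..10, volume 0..1, each defaulting to 1.
interface Prosody {
  pitch: number;
  rate: number;
  volume: number;
}

// Hypothetical helper: fill in defaults and clamp into the documented ranges.
function clampProsody(p: Partial<Prosody>): Prosody {
  const clamp = (v: number, lo: number, hi: number): number =>
    Math.min(hi, Math.max(lo, v));
  return {
    pitch: clamp(p.pitch ?? 1, 0, 2),
    rate: clamp(p.rate ?? 1, 0.1, 10),
    volume: clamp(p.volume ?? 1, 0, 1),
  };
}

// Hypothetical helper: queue an utterance if the environment supports it.
// speechSynthesis.speak() returns nothing: there is no node or stream to
// connect elsewhere; audio goes straight to the default output device.
function speak(text: string, prosody: Partial<Prosody> = {}): boolean {
  const g = globalThis as any;
  if (
    typeof g.speechSynthesis === "undefined" ||
    typeof g.SpeechSynthesisUtterance === "undefined"
  ) {
    return false; // e.g. running outside a browser
  }
  const utterance = new g.SpeechSynthesisUtterance(text);
  const p = clampProsody(prosody);
  utterance.pitch = p.pitch;
  utterance.rate = p.rate;
  utterance.volume = p.volume;
  g.speechSynthesis.speak(utterance);
  return true;
}
```

In a supporting browser, `speak("Hello, TAG", { rate: 1.2 })` queues the utterance and returns `true`; outside a browser it returns `false` without doing anything.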
Peter: Speech is one of those things where every so often a proposal pops out, but then dies. Is lack of implementation due to lack of a good design yet, or lack of interest in implementation?
Travis: I'd love to chat with folks I've heard are interested in this ... the API seems to be meeting their needs, which aren't complicated.
Peter: shipped?
Travis: The text-to-speech part is nearly universally implemented and interoperable-ish. Speech-to-text (the SpeechRecognition object) is not as well deployed.
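The uneven deployment described here can be probed with feature detection. A minimal sketch, assuming a browser-like global scope: some engines expose speech recognition only under the `webkitSpeechRecognition` prefix, so both names are checked. The `detectSpeechSupport` helper and `SpeechSupport` type are hypothetical names for illustration.

```typescript
// Which parts of the Web Speech API does a given global scope expose?
interface SpeechSupport {
  synthesis: boolean; // speechSynthesis (text-to-speech)
  recognition: boolean; // unprefixed SpeechRecognition (speech-to-text)
  prefixedRecognition: boolean; // webkitSpeechRecognition
}

// Hypothetical helper; defaults to the real global scope but accepts any
// object so the check can be exercised outside a browser.
function detectSpeechSupport(
  scope: Record<string, unknown> = globalThis as any
): SpeechSupport {
  return {
    synthesis: typeof scope["speechSynthesis"] !== "undefined",
    recognition: typeof scope["SpeechRecognition"] === "function",
    prefixedRecognition: typeof scope["webkitSpeechRecognition"] === "function",
  };
}
```

Passing the scope explicitly (for example `detectSpeechSupport({})`) makes the check easy to run outside a browser, where every field is `false`.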
Peter: data on usage?
Alex: I can try to find.
Travis: I'd love to have folks from web audio group involved in design discussions. High correlation between their expertise and what a community group would need to think about. Thinking about a voice node you could plug in to the audio graph.
Alex: Or a source, like media streams or media tracks, as with the various audio and video elements.
Peter: No current working group looking at this?
Sangwhan: right.
Alex: tens of millions of users per day of text-to-speech feature.
...
Alex: Seems low urgency, but seems like we should do the review so somebody later has something to build off of.
Travis: I'm happy to summarize and recap the high- and low-level issues I've seen so far and write them down in the issue. I'll also reach out to some folks to find out whether this is meeting their needs, or whether there's interest in design work on a v2.
Peter: Milestone farther out?
Opened Nov 7, 2017
Hello TAG!
In a side-conversation with Brian Kardell, he suggested we (the TAG) take another look at the [old] Speech API. It turns out to be partially implemented in all browsers, but this spec seems to have never graduated to a REC.
I'm raising this issue for the TAG to think about whether we want to get involved in doing anything about this feature or not. This feature predates the extensible web manifesto, and is thus pretty high-level, but has a variety of interop problems.
Further details (optional):
You should also know that...
Brian has done a series on this API on his site. Here are the relevant links:
[ Edited by @dbaron to change http: links to https: per @bkardell's request ]