#939: Early TAG review request for Playout Statistics API for WebAudio


Opened Mar 7, 2024

Hello TAG!

I'm requesting a TAG review of Playout Statistics API for WebAudio.

There is currently no way to detect whether WebAudio playout has glitches (gaps in the played audio, which typically happen due to underperformance in the audio pipeline). There is an existing way to measure the instantaneous playout latency using AudioContext.outputLatency, but no simple way to measure average/minimum/maximum latency over time. With this API, we propose a way to measure both the delay and the glitchiness of the played audio.
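For illustration only, the latency half of that gap can be approximated today by polling `AudioContext.outputLatency` and aggregating the readings by hand. The sketch below assumes a hypothetical helper named `summarizeLatency` and a hypothetical one-second polling interval; neither is part of the proposal, and this workaround cannot detect glitches at all.

```typescript
// Hypothetical helper, not part of the proposed API: summarizes latency
// samples (in seconds) that a page has collected itself.
interface LatencySummary {
  average: number;
  minimum: number;
  maximum: number;
}

function summarizeLatency(samples: number[]): LatencySummary | null {
  // No samples means there is nothing meaningful to report.
  if (samples.length === 0) return null;

  let sum = 0;
  let minimum = Infinity;
  let maximum = -Infinity;
  for (const sample of samples) {
    sum += sample;
    if (sample < minimum) minimum = sample;
    if (sample > maximum) maximum = sample;
  }
  return { average: sum / samples.length, minimum, maximum };
}

// In a browser, samples could be gathered like this (the interval is an
// assumption chosen for the example):
//   const ctx = new AudioContext();
//   const samples: number[] = [];
//   setInterval(() => samples.push(ctx.outputLatency), 1000);
//   ...later: summarizeLatency(samples)
```

Polling like this only sees the latency at the moments it samples, which is part of why the request describes it as "no simple way" to get figures over time.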

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): W3C Audio Working Group
  • Existing major pieces of multi-stakeholder review or discussion of this design:
  • Major unresolved issues with or opposition to this design:
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as (please delete all but the desired option):

💬 leave review feedback as a comment in this issue and @-notify Hernqvist, palak8669

Discussions

2024-03-25

Minutes

Matthew: Seems similar to the above - though more focused on playback, and using an interface established in RTC.

Lea: Let's ask on both reviews for a comparison between the two proposals?

Matthew: +1

Also general agreement

Peter: [ scribe missed this; sorry again! This was I think about contrasting polling vs event approaches ]

Peter: Who should post what?

NOTE: Editing comments in above issue's entry.

Tess: Martin for privacy part. Should be clearer about saying that 10 is too many, and we can increase it later if need be.

Martin: I think 5 would be fine. Though it won't be enough for some people, it's a good place to start.

Peter: If non-linear would it be easy to deal with fewer buckets?

Martin: I think once you're down as low as 5 you're probably fine. You want to know if you can adapt up, or down. Could argue for higher res at both ends.

Peter: Could get a value that tells you how much you need to move to get to the next bucket.

Martin: drops non-linear comment

Martin: Not keen on changing number of buckets if mic is open.

Tess: +1

Lea: I posted on #843 and asked about the differences between that and this one. Foreshadowed more comments to come on #843.

Peter: Anyone want to post similar on #939?

Martin: I would go further and ask for stronger justification for #939.

Martin: about to post on #939

2024-04-29

Minutes

General discussion involving wondering about why this needs to be a distinct API from RenderCapacity (maybe to allow site owners to make changes to content, rather than on-the-fly adjustments to graphs, but still seems similar).

Regarding whether this provides more/less info: discrepancy seems to be whether one's talking from perspective of CPU or Audio load specifically.

Side channel: ACK - what can it be used for? (Could it be used for general communication, like the pre-mitigation Compute Pressure API one?) Could 'blank' audio be used as part of an attack?

Yves: No signals from other implementers.

Group notes that this work originates in a CG; the RenderCapacity work is being done in WG

Peter: Seems clear this is about stats vs the other work is about adaptivity. We asked in the other review about why it's sending events. Seems worth seeing if the two can be combined.

<blockquote> We're trying to understand the difference between this and #843 - this seems to be geared towards statistics, whereas RenderCapacity seems more about adaptivity, but they both seem to be giving similar information - have you explored whether they could be combined?

We'd like to see some further documentation around the abuse cases relating to the side channel that you describe.

Do you have any information on the position of other implementers on this proposal?

</blockquote>
2024-06-24

Minutes

Matthew: one of the two where we had 2 similar APIs to solve similar problems... We did hear back from them...

Dan: what does cooperating mean?

Matthew: means "bad guys"...

Dan: right...

Matthew: There was a similar kind of attack for the compute pressure API - and they had a review from PiNG - they did some mitigations in the implementations... Anssi or Kenneth could point them to the details. https://github.com/w3cping/privacy-request/issues/113 I'd like this team to look at that. In particular https://github.com/w3c/compute-pressure/issues/197

Matthew: I think they have made a good argument that this is a different problem space & therefore different API. I think Lea, Tess & Peter have commented before... Let's chat more about it at the plenary...

Matthew: it's in WICG currently..

Dan: they think it will go to the Audio WG...

Matthew: they are clear on what the differences are - we need to establish TAG consensus that we agree...

bump to plenary

Matthew posted a comment about the potentially similar side channel mitigations in Compute Pressure API in the meantime

2024-07-01

Minutes

@matatk probably needs to close this one, in case we missed something.

2024-07-01

Minutes

Matthew: we got info on how it's different from RenderCapacity... I left a note about the side channel thing. With respect to how related the APIs are we didn't comment. Web Audio working group said.

Yves: [from chat] I was told it was not controversial in the Audio WG, it is linked to several (closed) issues in the WG, and they are just incubating it in WICG, see https://github.com/WebAudio/web-audio-api/issues/2563 , https://github.com/WebAudio/web-audio-api/issues/2444 and https://github.com/WebAudio/web-audio-api/issues/373 They are probably OK with it. The draft is being worked on in WICG prior to bringing to the working group.

Matthew: So it comes back to does TAG think this is different?

Dan: worth noting that this is a good use of WICG considering that the working group is aware of it and it's being incubated in WICG in advance of being added to the charter...

we review discussion in last week's plenary

Matthew: we could ask them about the idea that one could be built on the other... for anyone reading the explainer it would be helpful to contrast ... I can put something along those lines...

2024-07-seattle

Minutes

Issues raised:

  • Lea: Impossible to tell if this is a good design without use cases. Use cases listed are very low level "we need to get statistic X". But what will you do with it?? E.g. sending the stats to a server for future analysis vs adapting the playback somehow are markedly different use cases.
  • Lea: Web Audio is not the only API that playback glitchiness is relevant to. Would e.g. HTMLMediaElement benefit from this too? How would catering to video glitchiness change the shape of the API?
    • Peter: apparently an AudioContext can be created from Video and Audio elements, so if the stats are desired they can be gotten with some degree of plumbing.
    • Lea: Yes but glitchiness is not just about audio. Not ideal if we have a whole different API about video glitchiness.
<blockquote>

NOTE FROM MATTHEW BEFORE SENDING: did we agree to also ask the wider question? If not, the 2nd para can go. Will check up on this after lunch :-).

Hi @Hernqvist, we noticed that the RenderCapacity API isn't going ahead at the moment - do you know about the implementation barriers encountered there, and do you anticipate similar implementation challenges here too?

Further, we note that other types of media can be affected by glitchy playback, too (e.g. video, animation) - have you considered whether there are any parallels between this work and those other domains?

</blockquote>
2024-08-12

Minutes

Matthew: they responded... with information. We asked for the explainer to be updated to contrast this spec with the other one - render capacity. They did update the explainer. They added info on the side channel stuff. Looked at rate-limiting mitigations... We've raised it with them, they are looking into it. With respect to the difference between the 2 APIs, they've added something that addresses that. So they are different APIs - although could have the same impact (reduce complexity or ask user to close other apps to free up resources). So they've done what we've asked and documented it fairly clearly. I think it'd be good to get others' views. But render capacity is on pause. So it's not like we have these 2 things active at the same time.

Dan: What's the multi-implementer story?

Matthew: according to chrome status no info from anyone... but we established they are talking with the audio working group.

Dan: if we return a positive review we should couch it in terms of "yes, please take this to the audio working group."

Matthew: I think if Tess also thinks they are different enough then we should give it a positive review.

Dan: sets to proposed close; let's talk about it at the plenary.

2024-08-26

Minutes

Tess: I will look at the revised explainer

Dan: and we can wait until Matthew is back...

2024-09-09

Minutes

Tess: both Martin and Matthew had thoughts and I think they were answered satisfactorily... I think we'd like a better explainer... I'd be uncomfortable with closing without Matthew and Martin...

Dan: let's try to close at plenary.

2024-09-16

Minutes

Matthew: they replied to our questions.. added 2 questions to the explainer... I fully read them. The main thing we were concerned about was the relationship to the RenderCapacity API... It's slightly moot because that one has been shelved... They did say how it compares. Both of these proposals will tell you about audio glitches but only this one will tell you about load issues with the audio pipeline as a whole... This makes it more secure (they say). They are aware of the side-channel attack and have proposed some mitigations... Proposed to use some kind of rate limiting thing. It seems reasonable to me - but Tess might have a more expert opinion.

Peter: ... side channel being audible to the user ... not necessarily true... I don't think it changes much. Mitigation is probably still fine.

Dan: Multi-implementer story?

Jeffrey: they haven't requested reviews from other engines... still early...

Dan: think this should be part of our feedback...

Matthew: do we think it's solid enough to take to the audio WG? Seems like our answer is yes. I just am not sure on whether the mitigations are sufficient... They've described them in the explainer... they want to use something similar to compute pressure...

Jeffrey: I think we should send them to the PING.

Dan: +1

Matthew: agree. they did fill in the privacy & security questionnaire.. They did it as a google doc... https://docs.google.com/document/d/1wGv_mr7Lgg2w-6PuKDrcScoa8IvYAOW3PMTFW85O3Gw/edit#heading=h.d6giaiodx3q3

Peter: I think it's fine to close... as satisfied and take to WA wg and loop in PING.

Tess: +1

<blockquote> Thanks for the extra info you added to the explainer. We don't see any architectural problems with this. This should go to the web audio working group for further work. Please also talk to the PING group regarding the risk mitigation. </blockquote>

Matthew leaves the closing comment