#939: Early TAG review request for Playout Statistics API for WebAudio


Opened Mar 7, 2024

Hello, TAG!

I'm requesting a TAG review of Playout Statistics API for WebAudio.

There is currently no way to detect whether WebAudio playout has glitches (gaps in the played audio, which typically happen due to underperformance in the audio pipeline). There is an existing way to measure the instantaneous playout latency using AudioContext.outputLatency, but no simple way to measure average/minimum/maximum latency over time. With this API, we propose a way to measure both the delay of the played audio and its glitchiness.
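As an illustration of how such cumulative statistics could be consumed, here is a small sketch. The counter names (`totalFramesDuration`, `fallbackFramesDuration`) are modeled on the proposal's explainer and are assumptions here; the final API's attribute names and units may differ:

```javascript
// Hypothetical counters (ms of audio), modeled on the proposal's explainer.
// Given two snapshots of cumulative playout counters, compute the fraction
// of played-out audio that was silence inserted to cover glitches.
function glitchFraction(earlier, later) {
  const total = later.totalFramesDuration - earlier.totalFramesDuration;
  const fallback = later.fallbackFramesDuration - earlier.fallbackFramesDuration;
  return total > 0 ? fallback / total : 0;
}

// Example: over the interval, 10 s of audio played, 0.25 s was fallback silence.
const t0 = { totalFramesDuration: 5000, fallbackFramesDuration: 100 };
const t1 = { totalFramesDuration: 15000, fallbackFramesDuration: 350 };
console.log(glitchFraction(t0, t1)); // 0.025
```

Because the counters are cumulative, an application can difference snapshots over any window it likes, which is what makes average/min/max-over-time measurements possible without polling at high frequency.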

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WICG
  • The group where standardization of this work is intended to be done ("unknown" if not known): W3C Audio Working Group
  • Existing major pieces of multi-stakeholder review or discussion of this design:
  • Major unresolved issues with or opposition to this design:
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as:

💬 leave review feedback as a comment in this issue and @-notify Hernqvist, palak8669

Discussions

Discussed Mar 25, 2024 (See Github)

Matthew: Seems similar to the above - though more focused on playback, and using an interface established in RTC.

Lea: Let's ask on both reviews for a comparison between the two proposals?

Matthew: +1

Also general agreement

Peter: [ scribe missed this; sorry again! This was I think about contrasting polling vs event approaches ]

Peter: Who should post what?

NOTE: Editing comments in above issue's entry.

Tess: Martin for privacy part. Should be clearer about saying that 10 is too many, and we can increase it later if need be.

Martin: I think 5 would be fine. Though it won't be enough for some people, it's a good place to start.

Peter: If non-linear would it be easy to deal with fewer buckets?

Martin: I think once you're down as low as 5 you're probably fine. You want to know if you can adapt up, or down. Could argue for higher res at both ends.

Peter: Could get a value that tells you how much you need to move to get to the next bucket.
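Purely for illustration of the bucketing being discussed, non-linear (here logarithmic) quantization of a latency reading into a handful of buckets could look like this sketch; the bucket count, range, and spacing below are arbitrary choices, not anything from the proposal:

```javascript
// Illustrative only: quantize a latency reading (in seconds) into one of
// `n` logarithmically spaced buckets between `lo` and `hi`, clamping at
// the ends. The spec may choose different counts, ranges, or spacing.
function latencyBucket(latency, n = 5, lo = 0.005, hi = 0.5) {
  if (latency <= lo) return 0;
  if (latency >= hi) return n - 1;
  const t = Math.log(latency / lo) / Math.log(hi / lo); // position in [0, 1)
  return Math.min(n - 1, Math.floor(t * n));
}
```

Logarithmic spacing gives finer resolution at the low-latency end, where small changes matter most, while a small `n` keeps the fidelity exposed to a side-channel observer low.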

Martin: drops non-linear comment

Martin: Not keen on changing number of buckets if mic is open.

Tess: +1

Lea: I posted on #843 and asked about the differences between that and this one. Foreshadowed more comments to come on #843.

Peter: Anyone want to post similar on #939?

Martin: I would go further and ask for stronger justification for #939.

Martin: about to post on #939

Comment by @martinthomson Mar 25, 2024 (See Github)

Hi @Hernqvist, we're looking at #843 (Audio Render Capacity) and are trying to understand how this approach fits with that. In particular, I'm interested in how this approach deals with the compute side channel leaks. On face value, the information that this API proposes is higher fidelity, if post-hoc, which carries some amount of risk.

Comment by @Hernqvist Mar 27, 2024 (See Github)

Hi @martinthomson! We believe that this proposed API gives less detailed information about CPU load than #843 (Audio Render Capacity). Since audio should always run on realtime threads, general high CPU load should not impact audio. When CPU load is high enough to impact realtime threads to the point of causing glitches, the Render Capacity API should already show underruns and very high load.

As for the risk of side channeling, we have investigated this risk more closely since filing the issue. We have found that the API we are proposing is not necessary to set up an audio glitch-based side channel. One way to set up this side channel using only currently implemented APIs is the following:

  • Website 1 creates multiple computationally expensive AudioContexts. I tested this with 3 AudioContexts that each contained one OscillatorNode and 4000 BiquadFilterNodes.
  • Website 2 creates a single AudioContext with a ScriptProcessorNode that measures the time between consecutive calls to process.
  • Website 2 will observe longer maximum intervals between AudioProcessingEvents while Website 1 is active. Website 1 can now send messages to Website 2 by creating and stopping AudioContexts.

I have tested this scheme (on Mac) on Chrome, Safari and Firefox and it works on all of them (though you might have to change the number of AudioContexts or Nodes a bit depending on browser). So this side channel already exists, and we do not believe it is made worse by the proposed API.
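The receiving side of the scheme described above ultimately reduces to thresholding the measured inter-callback intervals. A minimal sketch of that decoding step follows; in the browser the intervals would come from timing consecutive ScriptProcessorNode `audioprocess` callbacks (omitted here), and the threshold is an arbitrary illustrative value:

```javascript
// Sketch of the receiver's decoding step. `intervals` stands in for the
// measured gaps (ms) between consecutive audio callbacks; long gaps (the
// sender's expensive AudioContexts active) read as 1, short gaps as 0.
// The 15 ms threshold is illustrative, not from the proposal.
function decodeBits(intervals, thresholdMs = 15) {
  return intervals.map(ms => (ms > thresholdMs ? 1 : 0));
}

// Example: nominal ~10 ms callbacks, with stalls while the sender is active.
console.log(decodeBits([10, 11, 42, 38, 10, 40])); // [0, 0, 1, 1, 0, 1]
```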

Discussed Apr 29, 2024 (See Github)

General discussion about why this needs to be a distinct API from RenderCapacity (maybe to allow site owners to make changes to content, rather than on-the-fly adjustments to graphs, but it still seems similar).

Regarding whether this provides more/less info: discrepancy seems to be whether one's talking from perspective of CPU or Audio load specifically.

Side channel: ACK - what can it be used for? (Could it be used for general communication, like the pre-mitigation Compute Pressure API one?) Could 'blank' audio be used as part of an attack?

Yves: No signals from other implementers.

Group notes that this work originates in a CG; the RenderCapacity work is being done in WG

Peter: Seems clear this is about stats vs the other work is about adaptivity. We asked in the other review about why it's sending events. Seems worth seeing if the two can be combined.

<blockquote> We're trying to understand the difference between this and #843 - this seems to be geared towards statistics, whereas RenderCapacity seems more about adaptivity, but they both seem to be giving similar information - have you explored whether they could be combined?

We'd like to see some further documentation around the abuse cases relating to the side channel that you describe.

Do you have any information on the position of other implementers on this proposal?

</blockquote>
Comment by @matatk Apr 29, 2024 (See Github)

We're trying to understand the difference between this and #843 - this seems to be geared towards statistics, whereas RenderCapacity seems more about adaptivity, but they both seem to be giving similar information - have you explored whether they could be combined?

We'd like to see some further documentation around the abuse cases relating to the side channel that you describe.

Do you have any information on the position of other implementers on this proposal?

Comment by @Hernqvist May 15, 2024 (See Github)

Hi @matatk! The RenderCapacity API (#843) and the Playout Statistics API provide different information, for different purposes. As you say, the RenderCapacity API is about seeing the trend of CPU usage and adapting the playout graph as needed. It also exposes one kind of audio glitch through underrunRatio, which happens when the WebAudio graph is too complex for the CPU to deliver audio on time.

The Playout Statistics API serves information about the end-user audio experience, which includes glitches from the whole audio pipeline and end-to-end latency. These general audio glitches can happen for many different reasons including:

  • High CPU usage to the point where it affects realtime threads
  • Platform audio layer/driver glitches
  • Web applications failing to deliver audio on time (due to slow WebAudio graphs, for example)

This means that the Playout Statistics API gives much less information about CPU usage than the RenderCapacity API, because RenderCapacity lets us know about gradual levels of CPU usage, while the Playout Statistics API only lets us know if there are actual audio glitches, one possible reason for which is critical CPU usage. This critical CPU usage is already detectable in other ways, such as delayed audio callbacks.

If we combined the APIs, then the Playout Statistics API would also have to give RenderCapacity information. If it turns out that the RenderCapacity API is too invasive, or that we want to guard the RenderCapacity API behind some additional permission, then it's better that the APIs are separate.

To clarify about the side channel: The preexisting side channel we mentioned only allows cooperating sites to communicate with each other with low precision and high latency, and only if they are open at the same time on the same device. The idea is that if one site creates critical amounts of cpu pressure on realtime threads, this can be observed by other sites in a variety of ways, including delayed audio callbacks and audio glitches. This method does not, however, allow anyone to gain information about other processes that are not actively cooperating. CPU overload can be detected using glitches, but the glitches could also mean other things like platform layer glitches or that the WebAudio graph is slow. We therefore believe that this doesn't create any real user risk, and we believe that we would not add any new risk with the introduction of this API.

We have not talked to other implementers about this proposal; we would like to pass early TAG review first.

Discussed Jun 24, 2024 (See Github)

Matthew: one of the two where we had 2 similar APIs to solve similar problems... We did hear back from them...

Dan: what does cooperating mean?

Matthew: means "bad guys"...

Dan: right...

Matthew: There was a similar kind of attack for the compute pressure API - and they had a review from PiNG - they did some mitigations in the implementations... Anssi or Kenneth could point them to the details. https://github.com/w3cping/privacy-request/issues/113 I'd like this team to look at that. In particular https://github.com/w3c/compute-pressure/issues/197

Matthew: I think they have made a good argument that this is a different problem space & therefore different API. I think Lea, Tess & Peter have commented before... Let's chat more about it at the plenary...

Matthew: it's in WICG currently..

Dan: they think it will go to the Audio WG...

Matthew: they are clear on what the differences are - we need to establish TAG consensus that we agree...

bump to plenary

Matthew posted a comment about the potentially similar side channel mitigations in Compute Pressure API in the meantime

Comment by @matatk Jun 24, 2024 (See Github)

Thank you for all the info in your last comment, @Hernqvist, and sorry for my slow reply. We are discussing the spec as a whole still, but for now I have a pointer regarding the side channel that may be relevant.

The Compute Pressure API had a similar potential side channel, and some mitigations were advised, following a PING review; here are the details: w3c/compute-pressure#197

Discussed Jul 1, 2024 (See Github)

@matatk probably needs to close this one, in case we missed something.

Discussed Jul 1, 2024 (See Github)

Matthew: we got info from how it's different to RenderCapacity... I left a note about the side channel thing. With respect to how related the APIs are we didn't comment. Web Audio working group said.

Yves: [from chat] I was told it was not controversial in the Audio WG, it is linked to several (closed) issues in the WG, and they are just incubating it in WICG, see https://github.com/WebAudio/web-audio-api/issues/2563 , https://github.com/WebAudio/web-audio-api/issues/2444 and https://github.com/WebAudio/web-audio-api/issues/373 They are probably OK with it. The draft is being worked on in WICG prior to bringing it to the working group.

Matthew: So it comes back to does TAG think this is different?

Dan: worth noting that this is a good use of WICG considering that the working group is aware of it and it's being incubated in WICG in advance of being added to the charter...

we review discussion in last week's plenary

Matthew: we could ask them about the idea that one could be built on the other... for anyone reading the explainer it would be helpful to contrast ... I can put something along those lines...

Discussed Jul 1, 2024 (See Github)

Issues raised:

  • Lea: Impossible to tell if this is a good design without use cases. Use cases listed are very low level "we need to get statistic X". But what will you do with it?? E.g. sending the stats to a server for future analysis vs adapting the playback somehow are markedly different use cases.
  • Lea: Web Audio is not the only API that playback glitchiness is relevant to. Would e.g. HTMLMediaElement benefit from this too? How would catering to video glitchiness change the shape of the API?
    • Peter: apparently an AudioContext can be created from Video and Audio elements, so if the stats are desired they can be gotten with some degree of plumbing.
    • Lea: Yes but glitchiness is not just about audio. Not ideal if we have a whole different API about video glitchiness.
<blockquote>

NOTE FROM MATTHEW BEFORE SENDING: did we agree to also ask the wider question? If not, the 2nd para can go. Will check up on this after lunch :-).

Hi @Hernqvist, we noticed that the RenderCapacity API isn't going ahead at the moment - do you know about the implementation barriers encountered there, and do you anticipate similar implementation challenges here too?

Further, we note that other types of media can be affected by glitchy playback, too (e.g. video, animation) - have you considered whether there are any parallels between this work and those other domains?

</blockquote>
Comment by @martinthomson Jul 1, 2024 (See Github)

@Hernqvist, that piece you have here about side channels probably belongs in the explainer. Having it here is helpful, but this issue will be closed and therefore hard to find. (The same is possibly true for some of the other notes here.)

Comment by @matatk Jul 5, 2024 (See Github)

Another request regarding the explainer: we were discussing your work again this week, and were wondering if you could add a section to the explainer to contrast this spec (e.g. in terms of the scope, when would one use it) with the RenderCapacity API? Thanks!

Comment by @Hernqvist Aug 2, 2024 (See Github)

Hi @martinthomson and @matatk ! Sorry for the slow reply. We have updated the explainer with the main points about the side channel and a comparison to the RenderCapacity API, as requested.

@matatk thank you for the pointer to the Compute Pressure API discussion. We looked at the Compute Pressure mitigations, and we think that the Rate Limitation mitigation could be applied here, which would mitigate the risk that the (preexisting) glitch based side channel is made worse.
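The Rate Limitation idea mentioned above can be sketched as exposing only a periodically refreshed snapshot of the underlying counters, capping the temporal resolution available to a side-channel observer. This is a minimal illustration under assumed names (`RateLimitedStats`, an injectable clock), not the spec's or Compute Pressure's actual mechanism:

```javascript
// Illustrative sketch of a rate-limited stats view: readers only ever see a
// snapshot that refreshes at most once per `intervalMs`. The clock (`now`)
// is injectable so the behavior can be exercised deterministically.
class RateLimitedStats {
  constructor(source, intervalMs, now = () => Date.now()) {
    this.source = source;       // () => current raw counter value
    this.intervalMs = intervalMs;
    this.now = now;
    this.lastUpdate = -Infinity;
    this.snapshot = null;
  }
  read() {
    const t = this.now();
    if (t - this.lastUpdate >= this.intervalMs) {
      this.snapshot = this.source();
      this.lastUpdate = t;
    }
    return this.snapshot;       // possibly stale, by design
  }
}
```

A reader polling faster than the refresh interval just sees the same stale snapshot, so the side-channel bit rate is bounded by the interval rather than by the polling rate.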

Discussed Aug 12, 2024 (See Github)

Matthew: they responded... with information. We asked for the explainer to be updated to contrast this spec with the other one - render capacity. They did update the explainer. They added info on the side channel stuff. Looked at rate limitation mitigations... We've raised it with them, they are looking into it. With respect to the difference between the 2 APIs, they've added something that addresses that. So they are different APIs - although could have the same impact (reduce complexity or ask user to close other apps to free up resources). So they've done what we've asked and documented it fairly clearly. I think it'd be good to get others' views. But render capacity is on pause. So it's not like we have these 2 things active at the same time.

Dan: What's the multi-implementer story?

Matthew: according to chrome status no info from anyone... but we established they are talking with the audio working group.

Dan: if we return a positive review we should couch it in terms of "yes, please take this to the audio working group."

Matthew: I think if Tess also thinks they are different enough then we should give it a positive review.

Dan: sets to proposed close let's talk about it at the plenary.

Discussed Aug 26, 2024 (See Github)

Tess: I will look at the revised explainer

Dan: and we can wait until Matthew is back...

Discussed Sep 9, 2024 (See Github)

Tess: both martin and matthew had thoughts and I think they were answered satisfactorily... I think we'd like a better explainer... I'd be uncomfortable with closing without matthew and martin...

Dan: let's try to close at plenary.

Discussed Sep 16, 2024 (See Github)

Matthew: they replied to our questions.. added 2 questions to the explainer... I fully read them. The main thing we were concerned about was the relationship to the RenderCapacity API... It's slightly moot because that one has been shelved... They did say how it compares. Both of these proposals will tell you about audio glitches but only this one will tell you about load issues with the audio pipeline as a whole... This makes it more secure (they say). They are aware of the side-channel attack and have proposed some mitigations... Proposed to use some kind of rate limiting thing. It seems reasonable to me - but Tess might have a more expert opinion.

Peter: ... side channel being audible to the user ... not necessarily true... I don't think it changes much. Mitigation is probably still fine.

Dan: Multi-implementer story?

Jeffrey: they haven't requested reviews from other engines... still early...

Dan: think this should be part of our feedback...

Matthew: do we think it's solid enough to take to the audio wg? Seems like our answer is yes. I just am not sure on whether the mitigations are sufficient... They've described them in the explainer... they want to use something similar to compute pressure...

Jeffrey: I think we should send them to the PING.

Dan: +1

Matthew: agree. they did fill in the privacy & security questionnaire.. They did it as a google doc... https://docs.google.com/document/d/1wGv_mr7Lgg2w-6PuKDrcScoa8IvYAOW3PMTFW85O3Gw/edit#heading=h.d6giaiodx3q3

Peter: I think it's fine to close... as satisfied and take to the Web Audio WG and loop in PING.

Tess: +1

<blockquote> Thanks for the extra info you added to the explainer. We don't see any architectural problems with this. This should go to the web audio working group for further work. Please also talk to the PING group regarding the risk mitigation. </blockquote>

Matthew leaves the closing comment

Comment by @matatk Sep 16, 2024 (See Github)

Thanks for the extra info you added to the explainer. We don't see any architectural problems with this. This should go to the web audio working group for further work. Please also talk to the PING group regarding the risk mitigation.