#843: Web Audio API: RenderCapacity API

Opened May 10, 2023

I'm requesting a TAG review of RenderCapacity API.

Generally, the Web Audio renderer's performance is affected by the machine's speed and the computational load of the audio graph. However, the Web Audio API does not expose a way to monitor that computational load, which leaves developers with no way to detect glitches, an essential part of the UX in audio applications. Providing developers with a "glitch indicator" is becoming more important as audio applications grow larger and more complex. (Developers have been asking for this feature since 2018.)
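
For context, a minimal sketch of the intended usage as a glitch indicator, assuming the AudioRenderCapacity shape from the explainer under review (a renderCapacity object on AudioContext with start()/stop() and periodic update events carrying averageLoad, peakLoad and underrunRatio); exact names may differ, and the `as any` casts are only because the interface is not in standard TypeScript DOM typings.

```ts
const ctx = new AudioContext();

// The proposal exposes render capacity on the context; cast because the
// interface is not in standard DOM typings.
const capacity = (ctx as any).renderCapacity;

capacity.addEventListener("update", (e: any) => {
  // underrunRatio > 0 means the renderer missed at least one deadline
  // during the last interval, i.e. an audible glitch was possible.
  if (e.underrunRatio > 0 || e.peakLoad > 0.9) {
    console.warn("Audio renderer under pressure", {
      averageLoad: e.averageLoad,
      peakLoad: e.peakLoad,
      underrunRatio: e.underrunRatio,
    });
    // An app would typically reduce graph complexity here
    // (fewer voices, cheaper effects, larger buffering).
  }
});

// updateInterval (seconds) sets how often "update" events are delivered.
capacity.start({ updateInterval: 1 });
// ...later, when monitoring is no longer needed:
// capacity.stop();
```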

Further details:

  • [v] I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: 2023 Q2~Q3
  • The group where the work on this specification is currently being done: W3C Audio WG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): N/A
  • Major unresolved issues with or opposition to this specification: N/A
  • This work is being funded by: N/A

We'd prefer the TAG provide feedback as: 💬 leave review feedback as a comment in this issue and @-notify @hoch @padenot

Discussions

2023-07-mos-eisley

Minutes

Overall looks good. Unsure what "load" means in this case; do they mean Unix loadavg? If so, that doesn't work on Windows. Tess commented to this effect.

2023-08-21

Minutes

https://github.com/w3ctag/design-reviews/issues/846

2023-09-04

Minutes

Sangwhan: multiple subsystems in the web platform... webrtc and webaudio... webaudio is carved out as its own island. Mainly because of the latency requirements... This proposal is trying to address this - that's great - but only for webaudio. WebGPU and WebRTC... Maybe we could discuss at TPAC because we have a necessity to solve this problem.

Dan: Noting multiple stakeholders - Google and Mozilla people involved.

Sangwhan: Converging worker and web audio...

2024-01-london

Minutes

Sangwhan: similar to compute pressure, but bound to the audio worker. Exposes less information. All you know is, given the peak compute capacity of the audio worker, where you are sitting in terms of the amount of pressure, and you get that through an event. A level of anonymity where... you could do a side channel from a different tab, but it's less rewarding than doing it through compute pressure. The use case proposed is to reduce the amount of compute time allocated in the event you are experiencing pressure on the audio worker. If you have high pressure on the audio worker and you're not adequately responding to that, audio glitches. I think it's unfortunate we're trying to introduce two different patterns for this; I've commented. There's a related proposal on web workers... so three conflicting-ish things related to compute capacity and quality-of-service guarantees. If we let all three ship we're going to have inconsistencies in the platform. Commented, but don't think there has been any progress made. I don't think the Audio WG will revisit this with a new design.

Martin: seems like it provides a lot of granular information about the load on the machine that does escape the sandbox - dependent on what others are doing as much as on what this particular app is doing.

Sangwhan: correct, you can use it as a side channel, from a different origin.. most people have only one audio device so only one audio worker. Compute pressure has a much bigger problem in that sense. Doesn't mean this proposal has zero issues.

Martin: one of the nice things about an audio worker is it runs in a very high priority context, so it has access to very good timing information typically. You are less likely to be preempted by workers operating in other threads or processes, so it provides you a very clean source of timing signal. Not necessarily as much information about what the load is ordinarily.

Yves: the higher priority means you don't really know the real load of all the other things the browser needs. That's why the Compute Pressure API can give completely different results from the Web Audio render capacity one.

Sangwhan: definite characteristics specific to audio workers. Strong preference to have it not migrated across CPUs. Cost to that. Quite likely pinned to the CPU.

Yves: depends on architecture of the machine

Sangwhan: Valid use case. Don't like the fact we're reinventing things

Yves: it's measuring different things in different contexts. The thing is, having the same kind of API would be good. Or a general API that adapts to the context. Is the context a high-priority web audio thing, or a web rendering thing, or something else?

Martin: an alternative API, off the top of my head: if you have the ability to scale, you can provide multiple implementations of your audio context and the browser can pick which one it executes based on load. If one starts lagging it executes the one that's supposed to be faster.
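
A purely hypothetical illustration of the alternative Martin sketches above; nothing like this exists in any spec, and every name below (registerRenderers, the quality labels, the processor names) is invented for illustration.

```ts
// Purely hypothetical: the browser, not the page, picks among
// author-provided implementations of differing cost.
const ctx = new AudioContext();

(ctx as any).registerRenderers?.([
  { quality: "high",   processor: "synth-high" },   // full polyphony, HQ reverb
  { quality: "medium", processor: "synth-medium" }, // reduced polyphony
  { quality: "low",    processor: "synth-low" },    // mono voice, no reverb
]);
// The browser would switch between these when the current one starts
// lagging, so no load information needs to flow back to the page.
```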

Yves: can this be abused to get some information?

Martin: absolutely

Yves: using the Web Audio API with no signal, just to check what the current load is

Sangwhan: interesting approach... would have the same problems... when you switch from one version to another and it happens as a side effect from a different origin, you're going to expose the same information...

Martin: not necessarily at the same granularity

Sangwhan: wouldn't alleviate the problem of timer granularity guarantees, unless you start adding noise, which I think is a bad idea for an audio worklet. Taking a scalar and quantizing it to different buckets - so in that sense you're reducing the amount of entropy exposed. I have bigger concerns about the inconsistencies across APIs for mechanisms providing such functionality than about the privacy concerns.
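
A small sketch of the quantization Sangwhan describes: mapping a raw load scalar in [0, 1] onto a fixed number of linear buckets so only a coarsened value is exposed. The function name and bucket counts are illustrative, not from the proposal.

```ts
// Illustrative only: quantize a raw load value in [0, 1] into one of
// `buckets` equal-width bins and report the bin's upper edge instead of
// the raw scalar, reducing the entropy exposed.
function quantizeLoad(rawLoad: number, buckets: number): number {
  const clamped = Math.min(Math.max(rawLoad, 0), 1);
  // Bucket index in [0, buckets - 1]; a value of exactly 1 falls in the top bucket.
  const index = Math.min(Math.floor(clamped * buckets), buckets - 1);
  return (index + 1) / buckets;
}

quantizeLoad(0.37, 20); // -> 0.4  (20 buckets: steps of 0.05)
quantizeLoad(0.37, 5);  // -> 0.4  (5 buckets: steps of 0.2)
```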

Martin: explain further

Sangwhan: the capacity event and compute pressure should be more consistent so there's interop, and the QoS API takes a different approach to solve a different problem. It's a web worker proposal so it doesn't fit into this context - audio workers are less webby, this is more attached to the DOM. Maybe we can let that one slide.

Amy: they responded to this with some reasons why.. Are we at a place we can close with concerns, or is there more work to be done?

Sangwhan: 20 buckets would be fine, maybe..

Martin: substantial number who say 'might'.. out of 31 people 12 said maybe.. not resounding support

Sangwhan: 100 buckets is as good as having no buckets

Yves: inconsistent responses... with 20 buckets nobody says it's insufficient for their needs, but not so with 100 buckets

Martin: this is one angle on the problem. Not sure this analysis is the right one to be applying in this context.

Sangwhan: maybe a way forward is to gate more granular information behind a permission and by default do it through buckets. Then we could potentially align the apis.

Amy: should we push on this? Or just close with concerns/unsatisfied?

Sangwhan: I'd want us to push - it's still malleable and they might be receptive

drafts comment

Martin: I'd like to see some sort of analysis of what the leakage risk is. An arbitrary criterion for the number of buckets doesn't capture the privacy situation very well at all. What Yves was talking about before - a high-priority process with limited processing capacity, so less subject to compute pressure from things at a lower priority

Yves: whatever the number of buckets of the signal, there are studies that show that 2 buckets is enough to reconstruct the signal; you just need more time and more samples. Not really a big issue

Sangwhan: is it already not possible as of today?

Martin: that's the wrong line of argumentation to take. If it's already possible that's a vulnerability in the platform, not an excuse to not bother

Sangwhan: there has been work done by the Compute Pressure folks; they've experimented with different buckets and tried to do cross-origin side-channel communication. They have a proof of concept. How much time it takes is proportional to the granularity of the buckets.

Martin: that would be useful

Sangwhan: I can share that. How that works in a real-time-priority, single-threaded setup like this is unclear. I'd imagine similarly.

Martin: keep in mind that the individuals producing such an analysis are motivated for it to produce a certain result. I'd be more confident if it were independent. If someone contracted an academic to try to break their stuff.

2024-01-london

Minutes

Dan: we need to re-assign ...

reassigned to Tess & Matthew

2024-03-18

Minutes

Matthew: I've done some work here... but there's a new review request that came in 2 weeks ago - play-out statistics - that looks similar...

Yves: would say

2024-03-25

Minutes

Matthew: I think they've answered most of the questions we asked. They seem to have moved closer to the Compute Pressure API approach (with buckets) since our review started. Would it be useful to share the info from the DAS WG about side-channel attacks with the Web Audio group?

Martin: Seems reasonable. Doesn't say how many buckets.

Matthew: They're still deciding that.

Martin: Risk is that due to things like CPU throttling, attacker can learn about the nature of your machine. Though these things tend to be high latency. This is a low bandwidth channel between origins.

Martin: Important to remember that attacks only get better over time.

Martin: Mitigations seem OK. I think they only need a handful of buckets.

Tess: They did a developer survey that came up with the conclusion that 4 buckets isn't enough. They seem to want 10.

Martin: I think 10 may be OK based on those stats - though web sites will always want to know as much as possible.

Martin: Dropping some buckets (reduced resolution) below 50% capacity seems reasonable, as you aren't as interested in that zone.

Tess: Do you think 10 would be fine without a permission prompt?

Lea: I would worry this may contribute to permission prompt fatigue.

Martin: Could be OK without permission. If allowing more buckets with an open mic (for example), the number shouldn't be a huge amount more.

Matthew: that case (open mic permissions) is mentioned - but for quite a high extra number of buckets.

Lea: They do state they chose linear bucketed approach.

Martin: They wanted events for when it changes, which I think suggests pure buckets.

Lea: That's a little strange, though, as it depends on the number of buckets. I thought the resolution of the buckets could change in the future if they're not serving needs. Do events make this a problem?

Martin: No. If we ship with 5 buckets, and then go with 10, they'll get events with whatever the value is more often.

Lea: I'm not following.

Martin: If the value that's exposed is rounded, then if the code is "if load is below x, change to a lower/higher resolution thing" then it's fine.

Lea: Developers may make decisions about code they can run on an event depending on how frequently it fires.

Martin: Sure but that'd only change dramatically if you added a lot. I think this is probably acceptable in the sense that if you expected a low rate of events, but got a higher one, some adaptation may be needed, but I don't expect it will change that much (in which case some signal processing would be needed).
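
A sketch of the threshold pattern Martin describes, assuming the explainer's averageLoad field and update event (casts only because the types are not in standard DOM typings); the logic does not depend on how many buckets the value is rounded to, and the gap between thresholds gives natural hysteresis against more frequent events.

```ts
const ctx = new AudioContext();
const capacity = (ctx as any).renderCapacity; // not in standard DOM typings

let quality: "low" | "high" = "high";

capacity.addEventListener("update", (e: any) => {
  if (e.averageLoad > 0.8 && quality === "high") {
    quality = "low";   // switch to a cheaper processing path
  } else if (e.averageLoad < 0.5 && quality === "low") {
    quality = "high";  // headroom recovered; restore full quality
  }
  // The gap between 0.8 and 0.5 acts as hysteresis, so finer buckets
  // (and therefore more frequent events) don't cause rapid flip-flopping.
});
capacity.start({ updateInterval: 1 });
```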

Lea: Does "update" make sense as an event name? I guess it's on render capacity.

Martin: It's OK.

Lea: I think "change" might be more compatible with other event names.

Matthew: How similar do we think this should be to Compute Pressure API? Same interface, different buckets OK?

Martin: Compute Pressure is about general compute resources, this one is more about the ability of an Audio thread to maintain performance.

Matthew: This seems consistent with the past discussion on this topic - sounds like we're OK with what they have.

Martin: Yes, this seems fine - though I would push for a lower number of buckets than what they have, as it can always be changed.

Lea: In that case we could suggest non-linear buckets.

Matthew: That's different to Compute Pressure (but OK if justified, as it seems, and documented).

Martin: Compute Pressure is labelled (nominal, fair, etc...), so different already.

Peter: Don't see a problem with non-linear, which might reduce entropy for some people. That might be better for privacy.

Lea: Beyond bucket sizes/boundaries, what's the DX? Starting/stopping listening. If you have code that draws the graph... where does this fit in? Continuous monitoring, or only some of the time?

Martin: I imagine with an event-based model, you listen and receive the event. Switch level based on threshold.

Lea: When do you decide to listen?

Martin: Always.

Lea: If you listen to it always, why do you have to opt in?

Martin: So the browser knows where the event is to be delivered.

Lea: In that case, could the event be on the AudioContext? It seems that starting and stopping is adding friction. If it was performance intensive to listen to it, that's one thing, but if you listen consistently, this is boilerplate.

Peter: [ scribe missed this; sorry! ]

Lea: I thought this is basically throttling.

Martin: I understand this is a window over which it's averaged ("over a period of half a second, tell me what the load was") - in which case having start and stop makes sense.

Lea: There could be a property that sets the granularity.

Lea: Meta point: this is why we have labels in the new process for these things - we need to ensure that API design issues (for example) are covered, as well as security and privacy (etc.)

Matthew: How to handle >100% with buckets?

Martin: There was a counter proposed (which presents a possible side channel) but if you just report "over 100%" then you're in a pretty good position.

Matthew: There was discussion that the capacity can be computed now (with timers) - should we be concerned about that?

Peter: It's not clear why they went with the event-based approach over polling, from the explainer doc.

Peter: event name "change" works for events fired when it changes but if it's on an interval, then "update" is more apt.

Martin: +1 (as above)

Lea: We should have a principle on this (interval vs changes)

Peter: +1

Peter: Also on this one I think it should be a change event (only fired when it changes). When you start observing, you could say how much of a change you want to trigger a firing of the event.

<blockquote> Hello there! We looked at this today during a breakout.

Other TAG members will comment on other parts of the proposal, but we had some questions wrt API design. We need to better understand how this API fits into the general use cases where it will be used. Currently, the explainer includes a snippet of code showing this in isolation, where it is modifying parameters in the abstract. What is the scope of starting and stopping this kind of monitoring for the use cases listed? Are authors expected to listen continuously or to sample short periods of time (because monitoring is expensive)?

If they are expected to listen continuously, then are start() and stop() methods required? If the only purpose of these is to set the update interval, that could be a property directly on AudioContext (in which case the event would be on AudioContext as well and would be named in a more specific way, e.g. rendercapacitychange).

We were also unsure what the update interval does exactly. Does it essentially throttle the event so you can never get more than one event per that period? Does it set the period over which the load is aggregated? Both? Can you get multiple update events without the load actually changing?

Lastly, as a very minor point, change is a far more widespread naming convention for events compared to update (see https://cdpn.io/pen/debug/dyvGYoV). update makes more sense if the event fires every updateInterval regardless of whether there was a change, but it produces better DX to only fire the event when the value has actually changed, so that every invocation is meaningful.

</blockquote> <blockquote> We think that the general approach to managing side-channel risk is acceptable.

Overall, fewer buckets would be preferable; at most 10, though preferably 5. Though surveys indicate that some number of sites would be unhappy with fewer than 10 buckets, there is an opportunity to revise the number of buckets over time based on feedback on use. Increasing the number of buckets should be feasible without affecting site compatibility. Starting with a more private default is the conservative option. Increasing resolution carries a small risk in that change events are more likely to occur more often (see API design feedback).

More detail is ultimately necessary to understand the design:

  1. Is hysteresis involved?
  2. Is the reported value a maximum/average/percentile?
  3. What happens when the load exceeds 100%?
</blockquote>
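
A hypothetical rendering of the alternative shape floated in the first comment above; the property and event names are the ones suggested there and this is not a real API.

```ts
// Hypothetical alternative shape from the comment above; not a real API.
const ctx = new AudioContext();

// The update interval as a plain property instead of start({ updateInterval }).
(ctx as any).renderCapacityUpdateInterval = 0.5; // seconds

// The event delivered on the AudioContext itself, with a more specific name.
ctx.addEventListener("rendercapacitychange", (e: any) => {
  adjustGraphComplexity(e.averageLoad);
});

// Placeholder for whatever adaptation the application does.
function adjustGraphComplexity(load: number): void {
  /* reduce voices, simplify effects, etc. */
}
```
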
2024-04-22

Minutes

Still pending feedback.

2024-07-01

Minutes

Pinged proponents.