#911: Long Animation Frame API (LoAF)

Visit on Github.

Opened Oct 24, 2023

Hello TAG!

I'm requesting a TAG review of Long Animation Frame API.

LoAF is an API that lets a webpage measure long periods of time during which the main thread was busy or congested, resulting in sluggishness. It also exposes information that helps developers understand what caused that busy period and act on it.
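
For illustration, consuming the API is expected to look roughly like this (a minimal sketch based on the explainer; the "long-animation-frame" entry type and the scripts attribution field are taken from the draft spec and may evolve):

// Register an observer for long animation frames (main-thread work
// exceeding the ~50ms threshold described in the explainer).
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log("Long animation frame:", entry.duration, "ms");
    // Per the explainer, each entry carries attribution data (e.g. the
    // scripts that ran) to help pinpoint what caused the busy period.
    console.log(entry.scripts);
  }
});
observer.observe({ type: "long-animation-frame", buffered: true });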

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): WebPerfWG
  • The group where standardization of this work is intended to be done ("unknown" if not known): WebPerfWG
  • Existing major pieces of multi-stakeholder review or discussion of this design: TBD
  • Major unresolved issues with or opposition to this design: Not yet...
  • This work is being funded by: Google

We'd prefer the TAG provide feedback as:

💬 leave review feedback as a comment in this issue and @-notify @noamr

Discussions

Discussed Nov 20, 2023 (See Github)

Rossen: implementability... my top-level understanding is - the authors want to introduce a way for developers to get better details of the user experience... through a set of events they can get back... in order to help with long running animations. The way they are proposing it - having a set of events annotating each part of the render pipeline... when you start layout; end layout; start rendering; end rendering; style, etc... At a high level you can say every browser has these stages - but if you expose this level... how do you correlate the actual triggers for that long running animation? Maybe there was a user action that invalidated a layout rule? If I go back to the motivation... to better explain "why" the "sluggishness" is happening. Not able to connect the dots... Maybe need to [read it deeper]. It also overlaps with event timing... and pre-existing tech.

Dan: is there a cross-browser issue here? Would this only apply to Chromium and not Gecko, for example?

Rossen: possibly - I know that there are differences... I don't know if by exposing this people will only optimise for Chromium... this isn't clear to me.

Peter: I don't see anything in LoAF that would be problematic for Gecko...

Rossen: might see more cross-origin information caused by user interaction... hover an ad, ad calls "bounding client rect" - now the user can be targeted more...

Dan: that triggers a privacy alarm

Rossen: they do delve into this in the privacy & security considerations section... available to all of the same-agent windows -- global state exposure, which is peculiar.

Rossen to formulate further response

Comment by @atanassov Nov 21, 2023 (See Github)

Hi @noamr, thank you for proposing this idea and running it through TAG early on. Helping authors remove as much "sluggishness" as possible is always great to see.

Given the deep level of browser integration I am curious to hear your thoughts on having this interoperable across UAs. If I read the proposal correctly, you're proposing lots of lower-level details to be exposed - stages of the rendering pipeline, threading model of the browser, and additional data that might help in correlating all of this with a user action, e.g. a style recalc triggered by a getBoundingClientRect call.
The above differs a lot between implementations and I'm concerned that micro-optimizations for one browser will tip other browsers in the opposite direction. Have you discussed this with other browser vendors?

In order to make such a feature robust, I am assuming you envision the ability to construct call graphs. If so, how do you envision event trigger correlation - e.g. getBoundingClientRect >> style load >> style parse >> style recalc etc.
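
For context, the kind of forced style/layout referenced above might look like this in author code (an illustrative sketch; the element selector is hypothetical):

// Hypothetical example of a forced synchronous style/layout:
// a style write followed immediately by a geometry read forces the
// engine to recalc style and layout inside the script run, instead
// of batching that work into the normal rendering pipeline.
const element = document.querySelector("#ad"); // hypothetical element
element.style.width = "200px";                 // style write invalidates layout
const rect = element.getBoundingClientRect();  // geometry read forces style + layout now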

Reading the S&P statements in the document and those in the template, there seem to be some inconsistencies. For example, in the explainer you say

LoAF might expose rendering information for a particular document tree that may be cross-origin (same-agent).

while the S&P doc says

A cross-origin iframe only receives timing information for itself.

To me this reads as a bit of a contradiction, i.e. a top-level document (an iframe) can receive cross-origin information from another document in its tree.

Further, this information is to be exposed to all windows of the UA, making it more of a global exposure. I'm probably misreading your points; if not, can you help me alleviate this concern?

Comment by @noamr Nov 21, 2023 (See Github)

Hi @noamr, thank you for proposing this idea and running it through TAG early on. Helping authors remove as much "sluggishness" as possible is always great to see.

Thanks for taking a look!

Given the deep level of browser integration I am curious to hear your thoughts on having this interoperable across UAs. If I read the proposal correctly, you're proposing lots of lower-level details to be exposed - stages of the rendering pipeline, threading model of the browser, and additional data that might help in correlating all of this with a user action, e.g. a style recalc triggered by a getBoundingClientRect call. The above differs a lot between implementations and I'm concerned that micro-optimizations for one browser will tip other browsers in the opposite direction. Have you discussed this with other browser vendors?

We have discussed some of this with other browser vendors, and the conversation goes on. The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts. Other things that might not be interoperable can be optional and in any case I'll be happy to discuss them with the other vendors.

In order to make such a feature robust, I am assuming you envision the ability to construct call graphs. If so, how do you envision event trigger correlation - e.g. getBoundingClientRect >> style load >> style parse >> style recalc etc.

We don't expose things to that level - only the total forced style and layout time of a whole script runtime. Still, we'll have to work out interoperability and I see this particular attribute as optional in the spec.
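
As a rough sketch of what that aggregate could look like to a consumer (attribute names follow the draft spec and may change; illustrative, not normative):

// Read the per-script aggregate: one total forced style/layout time
// for the whole script run - no per-call breakdown or call graph.
const po = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    for (const script of entry.scripts || []) {
      console.log(script.duration, script.forcedStyleAndLayoutDuration);
    }
  }
});
po.observe({ type: "long-animation-frame", buffered: true });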

Reading the S&P statements in the document and those in the template, there seem to be some inconsistencies. For example, in the explainer you say

LoAF might expose rendering information for a particular document tree that may be cross-origin (same-agent).

while the S&P doc says

A cross-origin iframe only receives timing information for itself.

To me this reads as a bit of a contradiction, i.e. a top-level document (an iframe) can receive cross-origin information from another document in its tree.

Further, this information is to be exposed to all windows of the UA, making it more of a global exposure. I'm probably misreading your points; if not, can you help me alleviate this concern?

What those lines mean, and I will clarify this in the spec, is that the timing itself is exposed cross-origin (same-agent) by design, even without this API. You can add timeouts and rAFs to a page, and see if you get delays which are likely due to other same-agent pages doing janky stuff. The API doesn't expose anything other than those delays, and you don't get visibility as to whether they're from other same-agent documents, from general browser slowness, or whatnot.

We send the info to frames that participate in the LoAF in one of the following ways:

  • the frame generated a long task that didn't end up rendering, in which case it gets its own non-rendering LoAF
  • the frame's rendering was updated in this LoAF, meaning that you could sample the delays yourself with rAF/ResizeObserver etc.

We never send script or blocking-duration information to cross-origin frames, only delays that they could otherwise sample themselves. Does this answer your question?

Discussed Nov 27, 2023 (See Github)

Rossen: we had a response. There are standards positions... Good that it's a really early review.

Yves: they did request a position from Moz [and WebKit].

Rossen: Still concerned - we are exposing delays and overall sluggishness caused by long running scripts or animation. One of the arguments is that you can observe cross-origin cross-frame sluggishness today by composing fairly well organised rAFs on the page, observing them, and concluding that the delays may or may not be caused by long running animation or script... what they're doing here is exposing them and making them even easier to observe, gather and aggregate. So the fact that we have this observability cross-origin for the same agent doesn't mean we should make it easier. Making it easier cross-origin is taking an unwanted pattern and making it easier. Imagining worst-case scenarios, e.g. trackers trying to gather information in any possible way, especially cross-origin.

Amy: can we ask for threat modeling for worst-case scenarios?

Rossen: yes.

Lea: for early reviews we should focus more on use cases - how common are they, what's the design space, have they explored other design options, things like that... is this a problem worth solving and is this the right direction?

Amy: if they can articulate the use cases we can encourage them to look for a solution that doesn't increase the attack vectors.

Peter: are we ready to close with additional feedback? or do we need some more back and forth?

Rossen: I wouldn't say this is satisfied... not ready to say unsatisfied, but closer to unsatisfied... Need concrete evidence of why the approach needs more work... The main points are: (a) interop - how well would this work across engines (good to see they've asked Moz and WebKit) and (b) the overall extensibility model - for me the major benefit from this is the ability to create causality - "what is causing a chain of events that is resulting in a long running animation?"... The final bit is: it seems we're taking an unwanted pattern and making it much easier to observe...

Peter: so what's the path forward? We could say "we're [happy] to see this move forward but want to see work on these 3 areas" -

Amy: based on interactions from the privacy task force with web perf it feels like we should push back a little more strongly...

Amy: seems like the argument is based on "if you can get this [data] in other complicated ways then it's better to provide an easier way" which ...

Rossen: I can provide more feedback...

Dan: the text in the ancillary data section of the privacy principles is pretty stable and can be referenced in the feedback.

Discussed Dec 18, 2023 (See Github)

They have replied to Rossen's comment - currently waiting for our follow-up.

triage breakout 2

Hadley, Peter

Comment by @atanassov Dec 20, 2023 (See Github)

Hi @noamr, thanks for the additional context.

We have discussed some of this with other browser vendors, and the conversation goes on.

I can't tell from this statement how well the feature is supported by other browser vendors. Can you please elaborate or point me to observable discussions?

The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts.

I am concerned that LoAF is attempting to expose something that is observable today and make it easier to obtain, cross-origin and cross-frame, without justifying whether it is good for users. In particular, this type of exposure appears to be ancillary user data - is it?

A couple of additional points:

  • Interop (similar to my first question above): I still can't tell how implementable the feature is across engines (great to see you're working with Moz and WebKit).
  • The overall extensibility model: for me, the major benefit of the feature compared to what is possible today (despite not being too easy) is the ability to create causality - "what is causing a chain of events that is resulting in a long running animation?" - and I can't tell what that would look like from the current work. Perhaps you can point me to additional examples, or work in progress I can learn from?

Comment by @noamr Dec 20, 2023 (See Github)

Hi @noamr, thanks for the additional context.

We have discussed some of this with other browser vendors, and the conversation goes on.

I can't tell from this statement how well the feature is supported by other browser vendors. Can you please elaborate or point me to observable discussions?

Of course! Minutes of the discussion at TPAC: https://w3c.github.io/web-performance/meetings/2023/2023-09-TPAC/index.html

Search for "LoAF". According to WebKit folks this is more implementable than long tasks, and according to Firefox this is more in line with benchmarks like Speedometer 3. We filed standards position requests with both (https://github.com/mozilla/standards-positions/issues/929, https://github.com/WebKit/standards-positions/issues/283).

The major details in the spec expose things that are well specified and already observable like the stages in the rendering pipeline. We're making sure that whatever is in the spec matches those existing concepts.

I am concerned that LoAF is attempting to expose something that is observable today and make it easier to obtain, cross-origin and cross-frame, without justifying whether it is good for users. In particular, this type of exposure appears to be ancillary user data - is it?

The data observed in LoAF is how long rendering in your own origin was delayed. This is information you should probably know, and have access to anyway. The way in which this API exposes this information is marginally easier than measuring it yourself - the main thing that this API makes easier is understanding the root cause of delays caused by your own origin. An equivalent would be resource timing, where if you have other origins clogging your network bandwidth it would be observable as slowdowns for your own resources.

I think it follows the principles in https://w3ctag.github.io/privacy-principles/#information: "New APIs which add new ways of getting information must be guarded at least as strongly as the existing ways". This holds here. Information about how much rendering was delayed is not currently guarded in any way (and cannot be guarded, except by means of process isolation).

A couple of additional points:

  • Interop (similar to my first question above): I still can't tell how implementable the feature is across engines (great to see you're working with Moz and WebKit).

Waiting for them to respond to the standards position open issues.

  • The overall extensibility model: for me, the major benefit of the feature compared to what is possible today (despite not being too easy) is the ability to create causality - "what is causing a chain of events that is resulting in a long running animation?" - and I can't tell what that would look like from the current work. Perhaps you can point me to additional examples, or work in progress I can learn from?

Not only "long running animation", but also "slow responsiveness". See the earlier attached TPAC minutes for a use case from Microsoft, who have been successfully using it to reduce scroll jank, and for success stories from RUMVision, where they actively use this (as part of the origin trial) to pinpoint scripts responsible for sluggish websites for their customers.

Comment by @noamr Jan 4, 2024 (See Github)

To demonstrate how this doesn't expose new ancillary data, consider the following. You want to know if other frames in the process are blocking. To do that today, you can run:

// Record a baseline, then schedule a zero-delay timeout; any extra
// delay before it fires means the main thread (shared with other
// same-agent frames) was busy in the meantime.
const before = performance.now();
setTimeout(() => {
  const delay = performance.now() - before;
}, 0);

Same with a requestAnimationFrame if you want to measure rendering delay in particular.
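
A sketch of that requestAnimationFrame variant (illustrative only; the 50ms threshold mirrors the LoAF threshold mentioned below):

// Measure rendering delay by watching the gap between animation
// frames; gaps well beyond the ~16.7ms frame budget mean the main
// thread was busy, without revealing which document caused it.
let last = performance.now();
function sample(now) {
  if (now - last > 50) {
    console.log("long frame:", now - last, "ms");
  }
  last = now;
  requestAnimationFrame(sample);
}
requestAnimationFrame(sample);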

To do that with LoAF, you'd have to register an observer and make sure that the delay amounts to more than 50ms. This makes LoAF a very blunt instrument for measuring things that can be measured in a sharper way today...

Discussed Apr 22, 2024 (See Github)

Tess to ping Dan about this async.

Discussed Jun 17, 2024 (See Github)

Drafted the comment below.

Comment by @martinthomson Jun 17, 2024 (See Github)

@plinss and I looked at this today and it seems broadly acceptable. We have a few concerns here, but none of these really change our overall positive disposition.

We observe that the spec claims that thresholding durations is an effective mitigation strategy for timing attacks. This is not correct. Thresholding only limits the rate at which information can be extracted. The specification rightly points out that these measurements are already possible, but claims this does not make things worse. This is also incorrect. Being able to measure multiple timing sources at the same time makes the rate of information extraction much higher. This is still probably a worthwhile trade-off overall, but please do not pretend like the risk has been eliminated.

We also noted the monkeypatch of WebIDL; hopefully you're talking to the WebIDL folks to get those changes folded in and will be removing the monkeypatch. See our guidance in this area.

Comment by @noamr Jun 25, 2024 (See Github)

@plinss and I looked at this today and it seems broadly acceptable. We have a few concerns here, but none of these really change our overall positive disposition.

We observe that the spec claims that thresholding durations is an effective mitigation strategy for timing attacks. This is not correct. Thresholding only limits the rate at which information can be extracted. The specification rightly points out that these measurements are already possible, but claims this does not make things worse. This is also incorrect. Being able to measure multiple timing sources at the same time makes the rate of information extraction much higher. This is still probably a worthwhile trade-off overall, but please do not pretend like the risk has been eliminated.

We also noted the monkeypatch of WebIDL; hopefully you're talking to the WebIDL folks to get those changes folded in and will be removing the monkeypatch. See our guidance in this area.

Thanks for the review! Indeed, the remaining WebIDL monkeypatches are in the process of being upstreamed (see https://github.com/whatwg/webidl/pull/1400). I will take your comments into account and make the S&P section of the spec more accurate on those points.