design-reviews#324: Event Timing API

Opened Nov 13, 2018

Hello TAG!

I'm requesting a TAG review of:

Name: Event Timing API
Spec: https://wicg.github.io/event-timing/
Explainer: https://github.com/WICG/event-timing
Tests: not yet in WPT but will be moved from here
Primary contacts: @npm1, @tdresser

Further details (optional):

Relevant time constraints or deadlines: Chrome has implemented this API and is running an Origin Trial for it. We'd like to ship by the end of this year if there are no concerns.
We have a Chrome launch bug at https://crbug.com/841224. Sorry, launch bugs have restricted viewing permissions. The bug shows approvals for Privacy and Security from Chrome.

The Self-Review Questionnaire is here: https://docs.google.com/document/d/1fxwq_Fl3wx4YI-djkoDRDBN6-rYzOGUFnNFGChGF58Y/edit?usp=sharing

We'd prefer the TAG provide feedback as (please select one):

open issues in our Github repo for each point of feedback

Discussions

Comment by @foolip Dec 5, 2018 (See Github)

Spec is now up at https://wicg.github.io/event-timing/.

Comment by @torgo Feb 6, 2019 (See Github)

Discussed at Tokyo F2F. Specific feedback:

We would like to see a clearer problem statement - who is this for, when is it used, how does it produce a user benefit?
It would be nice to see an end-to-end example of how the specific data that comes back from this API could be used to improve the user experience
WRT the security & privacy considerations in the explainer: we would like to see some more detail here - e.g. the ways in which this could be misused, if any
Yes, we would like to see the self-review questionnaire filled out, please.

Comment by @travisleithead Feb 6, 2019 (See Github)

Some additional thoughts of mine:

If performance metrics will be gathered from event types for which no event listeners are registered, as the proposal implies, then how is that data useful? E.g., a web site may not have registered any touch events, but the user is using their finger to manipulate the content… does the browser need to record the timing information for what might have happened if the touch events were dispatched?
Is the observer type 'events' too generic? UAs might prefer a more fine-grained event type (e.g., 'mouse-events') to cut down on the perf impact that observing will have…
One assumption in the spec is that this timing data will be useful to help sites determine causal issues for slow events (that may be causing "smoothness" issues in the user experience). However, it might not be good to assume that there is always correlation between the hw timestamp of the UI event and the time that it is dispatched by the UA. For example, servicing multiple queues internally, animations running on the UI thread, decoding/encoding processes, web audio graph processing, etc., to say nothing of browser-external factors like limited memory conditions causing disk-swapping or page-faulting, or large numbers of open apps competing for CPU time by the OS scheduler. All of these things are happening and can lead to high variability in causality for slow events (especially in that time between received timestamp and dispatch). Setting a high bar for emitting the record (50ms) is one way to cut down on the noise, but it may not be sufficient.
In many of the scenarios, the standard performance metrics APIs (mark/measure) are seemingly insufficient because they don't capture the time before the resulting display update (if this assumption is wrong, please correct). I wonder how often events and UI changes/updates are actually correlated as cause and effect. For example, one scenario describes hovering a menu item that triggers a flyout. (We'll assume that this hover/flyout behavior is triggered by an onmouseover event handler, because CSS hover menus do not need script--would those be counted?) There may be an onmouseover event that runs but triggers asynchronous code that eventually causes the menu to appear--in this case, you've lost the causality relationship. I fear there are so many of these kinds of scenarios in author code that the metrics collected from these event timings would be useless.
Perhaps one way to improve the utility of this proposal would be to get more specific--just tracking all mousemoves (for example) may not be terribly useful or provide insightful data. However, if there is a particular element subtree whose mouseover characteristics you are interested in observing, then it could be more interesting (because it is more specific). In this case, you want to be able to register for performance metrics for events in combination with an element, or perhaps layout box?

Comment by @npm1 Feb 8, 2019 (See Github)

Replying to all the comments, sorry in advance for the length.

We would like to see a clearer problem statement - who is this for, when is it used, how does it produce a user benefit?

Did you get a chance to read over https://wicg.github.io/event-timing/#sec-intro? I think that the introduction section of the spec answers these questions.

It would be nice to see an end-to-end example of how the specific data that comes back from this API could be used to improve the user experience

There are some examples at https://wicg.github.io/event-timing/#sec-example.

WRT the security & privacy considerations in the explainer: we would like to see some more detail here - e.g. the ways in which this could be misused, if any

https://wicg.github.io/event-timing/#priv-sec has a bit more detail. We haven’t come up with ways in which the API could be misused because the information is only exposed to the appropriate frame target. Even with clickjacking, the timing information exposed does not seem to be a problem because it will measure work done by the attacker.

Yes, we would like to see the self-review questionnaire filled out, please.

Sure! Here it is https://docs.google.com/document/d/1fxwq_Fl3wx4YI-djkoDRDBN6-rYzOGUFnNFGChGF58Y/edit?usp=sharing

If performance metrics will be gathered from event types for which no event listeners are registered, as the proposal implies, then how is that data useful? E.g., a web site may not have registered any touch events, but the user is using their finger to manipulate the content… does the browser need to record the timing information for what might have happened if the touch events were dispatched?

I think this question mainly concerns first input, although technically this could happen for first input or for any event so let me answer those separately. For first input, we do not want this metric to change with unobservable changes from the user’s perspective. So for example if a website adds an event listener for the whole page, that should not improve its FirstInputTiming. If we disregard events without listeners, then it would. The reason for this is that before adding the page-wide event listener, events not hitting event listeners would be ignored but would take a short time to process. But after adding it, those events which take a short time to process would be considered first inputs. Thus, including events which do not have registered event listeners is more in line with the user experience and therefore we should be including them. Regarding other events, note that in order to be reported they’d need to pass the threshold, which means a lot of work would need to be happening in the rendering pipeline, since event handlers will take no time to run. This is useful to surface (worth investigating if events cause a lot of rendering). Events without listeners can still cause a lot of work (think scrolling, hover effects), so we should inform developers when they are handled slowly.

Is the observer type 'events' too generic? UAs might prefer a more fine-grained event type (e.g., 'mouse-events') to cut down on the perf impact that observing will have…

I disagree with it being too generic if we consider other entries. Generally entryTypes are not that specific (we have one ‘resource’ for ResourceTiming, for example). It’s a good point that we do not want to forward an unreasonable number of entries, which is why we set a high threshold (50 ms) for the |duration| that is required in order for entries to be emitted. In the near future, PerformanceObserver will support more parameters for its observe() method, so we’d like to add a parameter to allow you to further filter the events seen, by event type.
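
To make the threshold and the proposed event-type filter concrete, here is a minimal TypeScript sketch that applies both to plain objects shaped like the entries under discussion. The `TimingEntry` interface, the `filterEntries` helper, the strict comparison, and the 50 ms default are illustrative assumptions, not the spec's algorithm and not a PerformanceObserver option that exists today.

```typescript
// Hypothetical shape mirroring the fields discussed for an event timing entry.
interface TimingEntry {
  name: string;            // event type, e.g. "click"
  startTime: number;       // event timestamp, in ms
  processingStart: number; // when event handlers began running, in ms
  processingEnd: number;   // when event handlers finished, in ms
  duration: number;        // time from startTime until the next display update, in ms
}

// Keep entries whose duration exceeds the threshold, optionally restricted
// to a list of event types. The strict ">" comparison is an assumption.
function filterEntries(
  entries: TimingEntry[],
  thresholdMs: number = 50,
  types?: string[]
): TimingEntry[] {
  return entries.filter(
    (e) =>
      e.duration > thresholdMs &&
      (types === undefined || types.indexOf(e.name) !== -1)
  );
}

// Made-up sample data for illustration only.
const sample: TimingEntry[] = [
  { name: "click", startTime: 0, processingStart: 12, processingEnd: 90, duration: 120 },
  { name: "mousemove", startTime: 200, processingStart: 201, processingEnd: 203, duration: 8 },
  { name: "keydown", startTime: 400, processingStart: 430, processingEnd: 450, duration: 64 },
];

// Keeps only the "click" entry: it is the one click that exceeds 50 ms.
const slowClicks = filterEntries(sample, 50, ["click"]);
```

Filtering by type on the consumer side, as above, still pays the cost of delivering every entry over the threshold; a filter passed to observe() would let the user agent skip that work.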

One assumption in the spec is that this timing data will be useful to help sites determine causal issues for slow events (that may be causing "smoothness" issues in the user experience). However, it might not be good to assume that there is always correlation between the hw timestamp of the UI event and the time that it is dispatched by the UA. For example, servicing multiple queues internally, animations running on the UI thread, decoding/encoding processes, web audio graph processing, etc., to say nothing of browser-external factors like limited memory conditions causing disk-swapping or page-faulting, or large numbers of open apps competing for CPU time by the OS scheduler. All of these things are happening and can lead to high variability in causality for slow events (especially in that time between received timestamp and dispatch). Setting a high bar for emitting the record (50ms) is one way to cut down on the noise, but it may not be sufficient.

That’s true, and it may be possible to obtain entries corresponding to events where the work is not caused by the event, but in aggregate you should be able to notice events that tend to be surfaced more often. Chrome’s input team investigations on input latency show that work is almost always dominated by JS execution. False positives will happen sometimes, but they should be rare.
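
As a rough illustration of what "in aggregate" could look like on the analytics side, the sketch below groups reported entries by event type and computes a count, a mean duration, and a mean delay between the event timestamp and the start of handler processing. The `SlowEvent` and `Summary` shapes and the `summarize` helper are hypothetical names introduced for this example.

```typescript
// Hypothetical subset of the fields an event timing entry exposes.
interface SlowEvent {
  name: string;            // event type, e.g. "click"
  startTime: number;       // event timestamp, in ms
  processingStart: number; // when event handlers began running, in ms
  duration: number;        // time until the next display update, in ms
}

interface Summary {
  count: number;
  meanDuration: number;
  meanInputDelay: number; // mean of (processingStart - startTime)
}

// Group entries by event type so that types surfaced often stand out,
// even if any single entry may be a false positive.
function summarize(entries: SlowEvent[]): { [name: string]: Summary } {
  const totals: { [name: string]: { count: number; duration: number; delay: number } } = {};
  for (const e of entries) {
    const t = totals[e.name] || (totals[e.name] = { count: 0, duration: 0, delay: 0 });
    t.count += 1;
    t.duration += e.duration;
    t.delay += e.processingStart - e.startTime;
  }
  const out: { [name: string]: Summary } = {};
  for (const name of Object.keys(totals)) {
    const t = totals[name];
    out[name] = {
      count: t.count,
      meanDuration: t.duration / t.count,
      meanInputDelay: t.delay / t.count,
    };
  }
  return out;
}

// Made-up sample data for illustration only.
const report = summarize([
  { name: "click", startTime: 0, processingStart: 10, duration: 100 },
  { name: "click", startTime: 1000, processingStart: 1030, duration: 200 },
  { name: "keydown", startTime: 2000, processingStart: 2004, duration: 64 },
]);
```

A single noisy entry shifts a mean only slightly once the count is large, which is the sense in which occasional false positives matter less in aggregate.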

In many of the scenarios, the standard performance metrics APIs (mark/measure) are seemingly insufficient because they don't capture the time before the resulting display update (if this assumption is wrong, please correct). I wonder how often events and UI changes/updates are actually correlated as cause and effect. For example, one scenario describes hovering a menu item that triggers a flyout. (We'll assume that this hover/flyout behavior is triggered by an onmouseover event handler, because CSS hover menus do not need script--would those be counted?) There may be an onmouseover event that runs but triggers asynchronous code that eventually causes the menu to appear--in this case, you've lost the causality relationship. I fear there are so many of these kinds of scenarios in author code that the metrics collected from these event timings would be useless.

It’s true that this API will miss capturing events which trigger async work which is NOT executed before the next time the user agent displays pixels on the screen. But tracking this seems hard - we’d probably require some way for the developer to notify us when the event work has been completed. We have an idea to do this: event.measureUntil (it won’t go in our initial API though). I do think that while some use cases will not be possible due to this limitation, we still provide a lot of value from surfacing events which are problematic because they block the user agent from updating the display.

Perhaps one way to improve the utility of this proposal would be to get more specific--just tracking all mousemoves (for example) may not be terribly useful or provide insightful data. However, if there is a particular element subtree whose mouseover characteristics you are interested in observing, then it could be more interesting (because it is more specific). In this case, you want to be able to register for performance metrics for events in combination with an element, or perhaps layout box?

This assumes a motivated developer. In most cases, we expect this API to be used by RUM analytics vendors, without developer intervention. RUM analytics won't know which elements are important. It also assumes that motivated developers have insight on what's slow, which I don't think is true. Surfacing which element is targeted is something we've discussed, but we decided it could wait for a future version.

Discussed Mar 26, 2019 (See Github)

David: this review was mostly Travis - it would be good if someone else replaced Travis.

Tess: I can try to be Travis.

Peter: Punt 2 weeks

Discussed May 15, 2019 (See Github)

David: I haven't had a chance to look at it.

(Discussion about planned f2f discussion on working mode and how to be more efficient in getting through reviews.)

Comment by @dbaron May 20, 2019 (See Github)

I think this seems like a reasonable proposal that's trying to address a worthwhile goal.

I know I and others have expressed some concerns in the past about how well the web performance specifications fit together in a way that allows multiple implementations to interoperate. Analyzing that without implementing (or at least without more background in implementing the set of specs) is hard, so I haven't attempted to analyze whether this specification makes that problem any worse, but I'm hoping it doesn't.

A few specific comments on the spec:

A few sections say:

This section will be removed once the DOM specification has been modified.

but I'd note it might be desirable to refer to the modified section somehow once the modifications are made, rather than deleting the section.

The Queueing other entries section has some wording (about skipping steps of the event dispatch algorithm when there are no event handlers) that makes me wonder about the choice of "first event". In particular, the first click or the first key might not be the user actually trying to interact with the page functionality; many users do things like select text while they're reading or click in blank areas to control what has focus. If those things count as the first event, the results might be misleading -- especially if those things don't have event handlers. I think Travis raised this before, and I think your reply is a reasonable justification for the current design, although I also think it may be worth calling out in the text of the specification as an issue so that readers are aware of the meaning of firstInput events and how to understand them.

I'd also note that it's not clear from this section how the create an event timing entry definition is used.

The Security and Privacy Considerations section refers to some things that aren't in the specification. In particular, it refers to a choice of a 104ms cutoff that is not documented anywhere in the spec, but seems like it should be.

I'd note that I haven't really dug into the "Processing Model" sections in detail. It seems like it would be useful to have some explanatory material about what those sections do. For example, it seems like this section creates two kinds of timing entries, "firstInput" and "event"; perhaps something should explain in a few sentences which parts of these algorithms create those entries (or change one type to the other) and give them the correct times. Another example of something that might be worth pointing out in prose is that it's possible to get two entries representing the same event (one of each entryType) if an event is both the first input and longer than 104ms. There are probably a bunch of other similar things worth pointing out to readers (including, say, those who are likely to document and teach the specification) that you're aware of but that might not be obvious without very careful consideration of the algorithms in the Processing Model section that might only be done by an implementor.
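
The two-entries case can be sketched as a small model. This is a simplified illustration, not the spec's Processing Model: it assumes a first input is always reported regardless of duration, that other events are reported only when their duration is longer than the threshold, and that the two checks are independent. The `ObservedEvent` shape and the `entryTypesFor` helper are hypothetical names.

```typescript
type EntryType = "firstInput" | "event";

// Hypothetical minimal description of a dispatched event.
interface ObservedEvent {
  name: string;     // event type, e.g. "click"
  duration: number; // time until the next display update, in ms
}

// Return the entry types that this simplified model would queue for one event.
function entryTypesFor(
  ev: ObservedEvent,
  isFirstInput: boolean,
  thresholdMs: number
): EntryType[] {
  const types: EntryType[] = [];
  if (isFirstInput) {
    types.push("firstInput"); // assumed to be reported regardless of duration
  }
  if (ev.duration > thresholdMs) {
    types.push("event"); // reported only when longer than the threshold
  }
  return types;
}

// A slow first click yields one entry of each type in this model.
const slowFirstClick = entryTypesFor({ name: "click", duration: 120 }, true, 104);
// A fast first click yields only the "firstInput" entry.
const fastFirstClick = entryTypesFor({ name: "click", duration: 40 }, true, 104);
```

A consumer that counts slow interactions across both entry types would need to account for this overlap to avoid counting the first input twice.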

Comment by @dbaron May 22, 2019 (See Github)

This seems like it's tightly tied to the work that's done in the Web Performance Working Group -- to what extent has it been discussed with the other participants of that working group, and is there a plan to move it there?

Comment by @dbaron May 22, 2019 (See Github)

@kenchris and I just looked at this during a breakout at the Reykjavík face-to-face meeting. We didn't see any major concerns but are still interested in hearing responses to the issues I raised a few days ago and the one other question above -- and may wish to look at the spec a little bit more in light of those responses.

Comment by @npm1 May 22, 2019 (See Github)

A few sections say:

This section will be removed once the DOM specification has been modified.

but I'd note it might be desirable to refer to the modified section somehow once the modifications are made, rather than deleting the section.

Thanks for the feedback. We will keep this in mind when we get there, though I'm not sure it is possible to have much text other than maybe a non-normative note pointing to where the concepts are defined.

The Queueing other entries section has some wording (about skipping steps of the event dispatch algorithm when there are no event handlers) that makes me wonder about the choice of "first event". In particular, the first click or the first key might not be the user actually trying to interact with the page functionality; many users do things like select text while they're reading or click in blank areas to control what has focus. If those things count as the first event, the results might be misleading -- especially if those things don't have event handlers. I think Travis raised this before, and I think your reply is a reasonable justification for the current design, although I also think it may be worth calling out in the text of the specification as an issue so that readers are aware of the meaning of firstInput events and how to understand them.

Yep, first input may not be meaningful. I have filed an issue to add a note about this.

I'd also note that it's not clear from this section how the create an event timing entry definition is used.

Yep, that section is a little bit confusing. The point is to try to specify what can be done if a user agent wants an event surfaced from outside of the event dispatching logic. There is already an issue for this; I'm leaning towards simply removing the section if it is deemed too confusing.

The Security and Privacy Considerations section refers to some things that aren't in the specification. In particular, it refers to a choice of a 104ms cutoff that is not documented anywhere in the spec, but seems like it should be.

The Events exposed section is vague about it, but I can hard-code the threshold (right now it says "a certain threshold") there if that would make it clearer? The algorithm does include the number.

I'd note that I haven't really dug into the "Processing Model" sections in detail. It seems like it would be useful to have some explanatory material about what those sections do. For example, it seems like this section creates two kinds of timing entries, "firstInput" and "event"; perhaps something should explain in a few sentences which parts of these algorithms create those entries (or change one type to the other) and give them the correct times.

Ok, maybe splitting the algorithm to encapsulate the part that creates firstInput would help?

Another example of something that might be worth pointing out in prose is that it's possible to get two entries representing the same event (one of each entryType) if an event is both the first input and longer than 104ms. There are probably a bunch of other similar things worth pointing out to readers (including, say, those who are likely to document and teach the specification) that you're aware of but that might not be obvious without very careful consideration of the algorithms in the Processing Model section that might only be done by an implementor.

Ok, this sounds like it could go in the introductory section. Anyway, I filed an issue.

This seems like it's tightly tied to the work that's done in the Web Performance Working Group -- to what extent has it been discussed with the other participants of that working group, and is there a plan to move it there?

It has been presented to the group on various occasions and there is general consensus that this is a useful API. The plan is indeed to eventually move it from incubation to that group.

By the way, thank you for providing the feedback!

Discussed Jun 19, 2019 (See Github)

David: Kenneth and I looked at both of these in Reykjavik. They responded, we should read and evaluate their responses. Good thing to do in our regular breakout next week, or maybe asynchronously.

Dan: Should I set the milestone to the 3rd of July? I'm going to do that. Both set to breakout. You two can caucus on this one.

Comment by @dbaron Jul 10, 2019 (See Github)

OK, thanks. I think I'm ok with closing this issue at this point.

Discussed Jul 17, 2019 (See Github)

David: I'm fine with closing this

Kenneth: I can close it then

Comment by @kenchris Jul 17, 2019 (See Github)

Happy that our feedback was useful! Thanks for flying TAG