#878: confidence reporting for PerformanceNavigationTiming


Opened Jul 25, 2023

I'm requesting a TAG review of systemEntropy addition to PerformanceNavigationTiming.

This proposal adds a new ‘systemEntropy’ field to the PerformanceNavigationTiming interface, enabling developers to discern whether the page load occurred during a non-optimal performance state.

[Update: 2025-05: The name of this feature is now "confidence", rather than "systemEntropy"]
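For orientation, a minimal sketch of how a page might read the renamed field; the object shape (a confidence value of "high" or "low" plus the randomizedTriggerRate discussed later in this thread) is assumed from the explainer and may differ from what ultimately ships:

```ts
// Hedged sketch: the field names below follow the explainer and this
// thread, not a finished spec.
interface NavigationConfidence {
  value: "high" | "low";          // noisy signal: did the load happen in a non-optimal state?
  randomizedTriggerRate: number;  // probability that the reported value was randomized
}

type NavigationTimingWithConfidence = PerformanceNavigationTiming & {
  confidence?: NavigationConfidence;
};

const [nav] = performance.getEntriesByType(
  "navigation",
) as NavigationTimingWithConfidence[];

if (nav?.confidence?.value === "low") {
  // The browser reports (noisily) that this navigation happened under a
  // non-optimal performance state, e.g. a cold start; a RUM library might
  // tag the sample before shipping it off for aggregation.
}
```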

  • Explainer¹ (minimally containing user needs and example code): url
  • Specification URL: N/A
  • Tests: not yet
  • User research: N/A
  • Security and Privacy self-review²: url
  • GitHub repo (if you prefer feedback filed there): N/A
  • Primary contacts (and their relationship to the specification):
    • Mike Jackson (mwjacksonmsft), Microsoft
  • Organization(s)/project(s) driving the specification: Microsoft
  • Key pieces of existing multi-stakeholder review or discussion of this specification: N/A
  • External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5186950448283648

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: [please provide]
  • The group where the work on this specification is currently being done: W3C WebPerf WG
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): W3C WebPerf WG
  • Major unresolved issues with or opposition to this specification: N/A
  • This work is being funded by: Microsoft

You should also know that...

[please tell us anything you think is relevant to this review]

We'd prefer the TAG provide feedback as: 💬 leave review feedback as a comment in this issue and @-notify [mwjacksonmsft]

Discussions


Discussed Sep 4, 2023 (See Github)

Peter: entropy is probably the wrong name... I think this is OK but would like to get Rossen's opinion. I have a concern about how this might expose additional fingerprinting surface or allow side-channel attacks. I accept it's "past" state, but it's also coming alongside some high-resolution timers... which can be used for side-channel attacks.

peter to leave comment

Comment by @plinss Sep 7, 2023 (See Github)

We haven't had a chance to dive into this thoroughly yet, but first impressions:

  • The term "entropy" has a number of other connotations and may not be the best term here. Something like "systemLoad" might be more obvious to users.
  • I don't have specific examples, but this leads to concern about possibly introducing additional information for side-channel attacks or user fingerprinting (see battery status API). I accept that this is past data, but it's strongly correlated with high resolution timers. Have crypto and privacy experts evaluated this aspect?

Comment by @mwjacksonmsft Sep 11, 2023 (See Github)

Thanks for the feedback.

  • I'm open to feedback on the name. I have a concern that 'load' implies some system resource usage is high, which may or may not be the case, especially during cold start where there could be lock contention, or delays while loading binaries off disk.

  • Is the concern that a third-party script is included on siteA and siteB, that the script might be able to create a short-term identifier to track the user across the two sites?

Discussed Oct 23, 2023 (See Github)

Rossen: initial lightweight feedback, well received... it's complex. If the system is already under heavy load you'll experience different performance characteristics compared to if it wasn't. This is one way of allowing you to better profile your payloads. Straightforward from an additive point of view.

Peter: I asked about side channels and they answered with a question. Also about whether something comes out of a cache or not... curious if someone can use this flag when the system is lying about other metrics... depends on how it's implemented. There are other cases where the system is going to load it from the cache but pretend it's not loading it from the cache, so you don't get a timing attack. Is this going to report that the system is under high load or not? If the system isn't hitting the network, it's hitting the cache; maybe this would reveal that the other API is lying and expose information.

Rossen: the cache api or the performance metrics api?

Peter: the perf metrics of loading. Faking a network request but pulling something out of the cache

Rossen: the system is under normal load, and your navigation takes a long time vs a short time?

Peter: maybe under normal circumstances, if the resource wasn't in the cache, your system would be under heavy load. But now it's in the cache and the system is pretending to load it, while the system isn't actually under load; by exposing the system load you can infer that this really was cached after all, and we've negated the mitigation in the other place where we were lying about the fact it was in the cache. Not perfect, but it's information. It's like battery status, where people can use very scant information to pull out pretty accurate information.

Rossen: I can remember the battery discussions... the strong feedback from Mozilla at the time. Not sure how this compares to it though.

Peter: not convinced this is a problem. Just concerned that it might be and I'd like people who know better to look at it. I can leave the feedback.

Rossen: trying to wrap my head around how detection of cache usage happens with this new entropy state. Let's see what they come back with.

Comment by @plinss Oct 23, 2023 (See Github)

Is the concern that a third-party script is included on siteA and siteB, that the script might be able to create a short-term identifier to track the user across the two sites?

Possibly, but also that a site may be able to fingerprint a user and tell it's the same user on multiple visits.

We also have concerns about other information leakage; for example, there's work being done to hide usage of the local cache, e.g. loading a resource from a cache but pretending that it's coming from the network (and adding an artificial delay). Being able to measure the entropy may reveal this is happening.

Basically, we'd like to be sure that people who understand these kinds of issues better have reviewed this.

Discussed Nov 13, 2023 (See Github)

Peter: should we defer?

Rossen: yes, I think we should wait for feedback.

Discussed Nov 27, 2023 (See Github)

Peter: still waiting for feedback.

Discussed Dec 11, 2023 (See Github)

Peter: we haven't heard back from them on the last reply to their response...

Rossen: let me ping internally and see what's happening with this. They're still interested in moving forward and are going to update soon.

punted for now

Comment by @mwjacksonmsft Dec 12, 2023 (See Github)

My apologies for the delayed reply. I'm working with the WebPerfWG to ensure that this information is exposed in a way that conforms with the privacy principles outlined here: https://docs.google.com/presentation/d/19TOz4mXRsYt8tkqzH8io_BrYiZBXhqGyD646gJy-x6I/edit#slide=id.p

Discussed Dec 18, 2023 (See Github)

Pending external feedback on how it fits with the privacy principles.

Comment by @hadleybeeman Dec 18, 2023 (See Github)

Thanks, @mwjacksonmsft — we'll wait to hear back from you then.

Discussed Jan 1, 2024 (See Github)

Dan: nothing to do here - still pending - assigned Hadley to the issue

Discussed Jul 1, 2024 (See Github)

Still pending external feedback. Have been waiting for 7 months.

Comment by @mwjacksonmsft Oct 17, 2024 (See Github)

Hi Tag -

We've iterated on the design of this API in the WebPerfWG. The explainer and Chrome Status entry have both been updated. Updated details are captured in the explainer, but the high-level summary is that the field has been renamed to 'confidence', and noise via a randomized response algorithm has been introduced to reduce fingerprinting risks.

The explainer also gives an example of how a RUM provider might debias the aggregate data.
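For illustration only, a sketch of what such debiasing could look like, assuming randomizedTriggerRate (r) is the probability that a uniformly random value was reported instead of the true one; the explainer's own example is authoritative:

```ts
// Assumed noise model: with probability r a fair coin is reported instead
// of the true state, so over a large aggregate
//   observedLowShare = (1 - r) * trueLowShare + r * 0.5
// which a RUM provider can invert:
function debiasLowShare(observedLowShare: number, r: number): number {
  const estimate = (observedLowShare - r / 2) / (1 - r);
  return Math.min(1, Math.max(0, estimate)); // noise can push the estimate outside [0, 1]
}

// Example: 10,000 navigations, 3,500 reported "low", r = 0.5 (the Chromium
// default mentioned later in this thread) -> estimated true "low" share ≈ 0.2.
const trueLowShare = debiasLowShare(3500 / 10000, 0.5);
```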

Discussed Mar 31, 2025 (See Github)

Yoav: this is an effort by MS folks to expose a bimodal dimension on navigation timing... but make it privacy-preserving. They have presented at the WebPerf working group. They are exposing a fuzzy bit on whether the system entropy is high or low... and as a result they can dissect the relevant metrics without knowing the entropy level for any specific navigation.

Jeffrey: I suggested the fuzzy thing and focusing on the privacy implication... and they fixed the name. It seems like they have gone in the right direction.

Yoav and Jeffrey assigned; they will propose a closing comment with 'satisfied'.

Discussed Apr 14, 2025 (See Github)

Hadley stepping down from this issue. Jeffrey and Yoav to continue with it next week.

Discussed May 12, 2025 (See Github)

Martin: Just don't give measurements when the system is loaded?

Jeffrey: They fixed the name, to "confidence".

Max: They don't explain when they have confidence.

Martin: And they're doing randomized response. I don't think that's a good solution, because they haven't capped the sensitivity. How many times will it be submitted?

Jeffrey: Do we have any documentation on how to do randomized responses?

Martin: No. Everyone doing it has to learn it for themselves. If it's just for the one measurement, it doesn't help much.

Jeffrey: I wonder if the Privacy WG could help us with this. Everyone wants to do it well, but the math is hard to learn.

Martin: The debate should be about whether differential privacy is the right tool in the first place. I don't see it here. I think the browser should just not return a value. Instead of providing timing values, say "it didn't work". That can be randomized, and can engage more when the system is loaded. Then the people who are getting the information can rely on the fact that it's good, without guessing about the loading of the system.

Jeffrey: Do you want to send a comment?

Martin: I'll do that and change the title.

Jeffrey: I've lost track of why this provides fingerprinting in the first place.

Martin: It's the "ephemeral" sort. And collecting several samples can let you derandomize.

Jeffrey: Maybe randomize once per active page, and remember it until the system stops being under load?
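A rough sketch of the averaging attack Martin is pointing at, under the same assumed noise model: if fresh noise is drawn every time the same underlying state is sampled (e.g. across repeated navigations while nothing changes), the noise can be averaged out for a single user, which is why randomizing once and remembering the answer helps:

```ts
// Illustrative only: shows why per-query re-randomization is weaker than
// randomizing once per page/state.
function estimateTrueBit(trueLow: boolean, r: number, samples: number): number {
  let lowCount = 0;
  for (let i = 0; i < samples; i++) {
    const randomized = Math.random() < r;               // fresh noise on every sample
    const reportedLow = randomized ? Math.random() < 0.5 : trueLow;
    if (reportedLow) lowCount++;
  }
  // Same inversion as the aggregate case; with enough samples this converges
  // to ~0 or ~1, de-randomizing the bit for one user.
  return (lowCount / samples - r / 2) / (1 - r);
}
```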

Discussed May 19, 2025 (See Github)

Jeffrey: Martin already sent feedback; we'll wait for their response.

Discussed Jun 9, 2025 (See Github)

Jeffrey: I think Martin's right that the performance APIs are a fingerprinting vector, but I don't know how much of a problem it is. We should have a discussion about what the TAG's position should be.

Max: If that's a concern, the design of the API should have some mechanism to mitigate it.

Jeffrey: They did try to mitigate it, but Martin's unhappy with the differential privacy they used.

Max: Need more discussion.

Jeffrey: If we schedule it for Breakout C next week, Yoav might be able to dial in.

Discussed Jun 16, 2025 (See Github)

Yoav: Saw comments from Martin and was a bit surprised by them. It would be good to hash out those concerns. I was involved in guiding the folks who championed this through the privacy principles. We were explicitly going with a fuzzy/DP approach in order to avoid exposing information about the user and their browser/environment.

... At a high level, this allows browsers to tell developers about why the site was slow. Either because the browser was under stress or it was busy doing a lot of other things, such as might happen at startup. This enables aggregated results to be adjusted at scale to ignore those cases, from those browsers. The goal is to avoid revealing whether a particular user was under stress or not. This allows for individual users. I thought that this was in line with privacy principles; we didn't want to reveal new auxiliary information. Was surprised by the comment; if we take that to the extreme, we are saying that performance measurements are a fingerprinting vector, everything related to timers is. Anything could be.

Martin: I was looking at this from an information ?? perspective. The addition is fine. In the greater view, there is a problem here. I don't know that that means we don't do the feature, just that we acknowledge there is exposure here. I don't know if we can even block performance monitoring without extreme isolation. There are designs for systems that operate without clocks, but if you have a web connection, that's not good. The realisation that this isn't as big a problem as it seems to be -- that's what caused me to make these comments. I don't think it affects this API. The question is: what do we do on the larger scale to make measurements more useful and more private?

Yoav: I think that's fair, but -- if we are again going by the Privacy Principles, there is some auxiliary data we are exposing related to performance, but to a large extent it's already exposed through functional APIs that no one is intending to remove, and the performance APIs make the collection of this data either easier or cheaper. We know that people who care about these things, or malicious folks who care about the fingerprinting aspects -- I'm not sure I'm fully buying the fingerprinting aspects -- can all gather this information outside of these APIs. So we made sure, through differential privacy, that it's not exposing anything about a specific user. If you were trying to guess, 50% of your assumptions would be incorrect in either direction.

On the broader topic of fingerprinting, if we assume that people who are slow on site A will also be slow on site B, we can break that down into network connection, the download time...

Martin: there are multiple metrics there. End to end latency from a network perspective, from a CPU perspective, and download speeds. And those are useful info for people using websites. Recognising that someone who is slow on one website is slow on another -- that's useful info.

This one has unbounded sensitivity, so adding info is not much good.

Yoav: for some of these things, we could go the route of aggregated reporting across users. For very specific things, for things where we know we can avoid the exposure through functional APIs. Here, they are all exposed.

Martin: the best I think we have there is to add a little noise, but that noise doesn't give us any value.

Yves: noise can only save some time

Martin: Having thought about this, I think the only thing we can do here is acknowledge that this is a fingerprinting risk. But there is too much compatibility risk to disable those. This one escaped, and anyone looking to build a browser that's more private may be interested in the ways. But it's not going to be perfect.

Yoav: not convinced that any of this can be mitigated in a functional browser. Maybe with full application integrity you could do some preloading and processing, but that's not the web. It might use web tech, but it's not the web.

Lola: My question has been answered already, I think. From what I'm hearing, it sounds like this is something that we want to weigh the risk of. The data that is exposed as a fingerprinting vector vs. the use to developers. It sounds like there is a risk, but it's OK if we add noise.

Yoav: the risk is inherent to the web platform: timers, network requests, and events. We have to accept that risk. If all other fingerprinting bits are solved, this is not what will expose identity. I don't know how many bits this is in practice. For the 10% slowest users, this might have correlation. We should definitely mention this in the various APIs' privacy and security considerations, but I don't think this is something we can solve without fundamental changes to how websites work.

Hadley: Sounds like we are converging on something. Can someone jump in and start drafting some text that we can agree on?

Martin: I think we're better aligned than it seemed. I think Yoav and I are aligned. We cannot fix this problem, and it's not worth trying. But acknowledging the fingerprinting risk is a worthwhile exercise.

Separately, I'm comfortable with this API. The value of epsilon is flexible, no?

Yoav: I think so, they can increase or decrease the value of this... true info vs false info... by default, the value in Chromium is 50/50

Martin: which is a pretty low epsilon. I'm comfortable with that. If we can agree with an epsilon across browsers, that's helpful.

Yoav: if I remember correctly, the API reports the epsilon used. So you can de-bias the data.

Martin: It's just annoying because you need to de-bias on a per-browser/per-epsilon basis.
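To make the epsilon numbers concrete, a sketch under the same assumed model: reporting a fair coin with probability r means the true value is reported with probability 1 - r/2 and flipped with probability r/2, so binary randomized response gives epsilon = ln((1 - r/2) / (r/2)); r = 0.5 (the 50/50 Chromium default mentioned above) gives ln 3 ≈ 1.1. The per-browser/per-epsilon annoyance amounts to grouping samples by the reflected rate before debiasing. All names here are illustrative:

```ts
// Epsilon implied by a given randomizedTriggerRate under the assumed model.
function epsilonFromTriggerRate(r: number): number {
  return Math.log((1 - r / 2) / (r / 2)); // r = 0.5 -> ln(3) ≈ 1.1
}

// Debiasing per implementation: bucket samples by the rate they report,
// then invert the noise within each bucket.
function groupByRate(
  samples: { low: boolean; rate: number }[],
): Map<number, { low: number; total: number }> {
  const groups = new Map<number, { low: number; total: number }>();
  for (const s of samples) {
    const g = groups.get(s.rate) ?? { low: 0, total: 0 };
    g.total += 1;
    if (s.low) g.low += 1;
    groups.set(s.rate, g);
  }
  return groups;
}
```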

Hadley: [some discussion on process and issues and stuff]

Yves: resolution: satisfied with concerns?

Yoav and Martin: No, this API is good!

[Yoav to draft text for a comment, to agree with Martin and Lola. Then they will close the issue, resolution: satisfied]

Discussed Jun 23, 2025 (See Github)

Matthew: where are we with this?

Yoav: Martin and I are in agreement that performance measurements leak privacy-relevant information, but this proposal doesn't make things worse.

Matthew: there seems to be support in the brainstorming issue, so let's go ahead and post that.

Yoav: Done.

Comment by @yoavweiss Jun 26, 2025 (See Github)

We discussed this on the TAG call and we don't have concerns with this particular API. Its use of differential privacy explicitly avoids exposing information about any particular user, which is great.

One point which may benefit web developers would be if the spec mandated or recommended a certain epsilon value across implementations, which would allow developers to debias their metrics consistently. With that said, the fact that the value is reflected using randomizedTriggerRate would still enable developers to handle differences between implementations if those arise.

At the same time, there's some concern related to fingerprinting and web performance APIs in general. Slow users (through various dimensions of slowness: network latency, bandwidth, or CPU) are likely to be slow across different sites, which can contribute fingerprinting bits that would allow for their persistent re-identification across sites. This information is available through ancillary and non-ancillary APIs alike, and hence it seems unlikely that the situation can be improved. Still, it could be good to mention that concern in the relevant APIs' Privacy and Security sections.

Discussed Jun 30, 2025 (See Github)

Yoav: posted what we talked about. What's left is to put a label on it: satisfied?

Martin: Think that's right. I'll do that and close it off.