design-reviews#1198: Incubation: CPU Performance API

#1198: Incubation: CPU Performance API

Visit on Github

Opened

Feb 19, 2026

Explainer

https://github.com/WICG/cpu-performance/

The explainer

Includes the information requested by the Explainer Explainer.
Follows the Web Platform Design Principles.
Includes or links to answers to the Security/Privacy Questionnaire.
Describes user research you did to validate the problem and/or design.

Where and by whom is the work is being done?

GitHub repo: https://github.com/WICG/cpu-performance/
Primary contacts:
- Nikolaos Papaspyrou (@nickie), Google, author
- Dominic Farolino (@domfarolino), Google, spec mentor
- Yoav Weiss (@yoavweiss), Shopify, W3C Web Performance WG co-chair
Organization/project driving the design: Google Chrome
This work is being funded by: Google
Incubation and standards groups that have discussed the design:
- Web Performance WG, TPAC 2025, minutes
Standards group(s) that you expect to discuss and/or adopt this work when it's ready: Web Performance

Feedback so far

Multi-stakeholder feedback:
- Chromium comments: status, intent to prototype
- Mozilla comments: https://github.com/mozilla/standards-positions/issues/1364
- WebKit comments: https://github.com/WebKit/standards-positions/issues/622
- Adobe comments: favorable
- Figma comments: favorable
Major unresolved issues with or opposition to this design: n/a

You should also know that...

Specification URL: https://wicg.github.io/cpu-performance/

Track conversations at https://tag-github-bot.w3.org/gh/w3ctag/design-reviews/1198

Discussions

Comment by @nickie Feb 24, 2026 (See Github)

Just noting that the specification for this API is ready for review.

Discussed Mar 16, 2026 (See Github)

Jeffrey: I've left some thoughts, but we should discuss.

Marcos: This triggers the same concerns as DAS; it feels a bit self-damaging. Will review again to see if it's different from what's been proposed in the past.

Jeffrey: The approach with buckets helps, because you can always run a benchmark and get detail on the CPU's performance. They propose that the buckets be implementation-defined, and that might be a problem.

Marcos: I will do a first-pass reaction, not a position, in the brainstorming repo, and we can go from there.

Discussed Mar 30, 2026 (See Github)

Marcos: Similiar concerns that I raised with the DAS specs. This feels very foot-gunny. There isn't enough context to make these adjustments. The challenge for this spec would be to prove you can do this that is not foot-gunny when all other previous attempts have not ended well or proven useful.

Jeffrey: Can you be more specific about how the developer will shoot themselves in the foot?

Marcos: It makes an assumption about my CPU but in 5 years, those assumptions won't be valid. The developer will degrade the user experience based on information that doesn't apply.

Jeffrey: They have addressed that part. The buckets have a stable meaning; bucket 4 will always mean the same thing, but in five years there may be 8 or more buckets.

Marcos: Do I make a determination about performance over time? If it's point in time, I could be runing a video or AI job that will skew the numbers.

Jeffrey: Sounds like we should wait for a week and you'll post your comments.

Marcos: OK.

Brian: The scaling factor they have in here is going to be tricky because it's allocation specific. So, the tiers don't have specific meaning and they can't over time. Seems very impossible to have this develop well over time. Is there any prior art that does this well?

Jeffrey: In simpler cases, yes: memory buckets. But that's simpler than GPU/CPU performance. The definition cannot be implementation defined; websites can't guess what the implementation has decided. We might be able to say we have consensus that they need to make the buckets not implementation defined and come back to us with a definition of the buckets. That might make the review easier.

Brian: I don't know if that's my take. I don't see how these can have arbitrary meaning that is left entirely to implementations, that will change over time, and developers are just supposed to make good decisions that will evolve over time.

Jeffrey: It is not supposed to change over time. This machine is in bucket N forever.

Brian: This machine is, but the scale is not. When you get to "unknown", will that be because it's too old or too new? I can't imagine how you do this. The message I want to send is more than "define this".

Jeffrey: How about we post: "We haven't finished discussing this, but one thing we're skeptical about is the idea of leaving the buckets implementation-defined. Can you give us a rough algorithm that might be interoperable?"

Comment by @jyasskin Apr 1, 2026 (See Github)

We haven't finished discussing this, but one thing we're skeptical about is the idea of leaving the buckets implementation-defined. Can you give us a rough algorithm that might be interoperable?

Comment by @marcoscaceres Apr 1, 2026 (See Github)

@nickie, just wondering if you've fed this to Gemini for a review?... it confirmed a lot of concerns for me personally that I had with the proposal.

Here's a prompt. Hope that helps in the meantime while we do human reviews.

<details> # Gemini prompt: `navigator.cpuPerformance` self-review

You are a critical reviewer of web platform proposals. Your job is to find real problems, not to validate the proposal. Do not assume good design because the proposal exists. Apply the following tests rigorously.

Proposal: https://github.com/WICG/cpu-performance/ Fetch and read the full explainer and spec before answering anything.

Test 1: Use-case coverage by existing APIs

For each use case listed in the proposal, determine whether an existing W3C or WHATWG API already provides a more direct signal. Specifically:

For video/media use cases: evaluate MediaCapabilities.decodingInfo() and VideoDecoder.isConfigSupported() (WebCodecs). Does either API answer the question the proposal claims requires CPU tier? Quote the relevant spec text for each.
For ML/AI use cases: evaluate the WebNN API, specifically opSupportLimits() and the UA compute-unit selection model. Does WebNN already address on-device vs. server-side inference decisions without requiring CPU tier? Quote the spec.
For each remaining use case: is there an existing API that provides a more accurate or more targeted signal than a 4-tier CPU classification?

Conclude: which use cases, if any, are genuinely unmet by existing APIs?

Test 2: The reproducibility claim vs. OS reality

The spec requires the same device to always return the same tier regardless of current system state ("reproducibility: independent of current load").

Evaluate this claim against:

Thermal throttling behaviour on mobile SoCs (ARM big.LITTLE, Apple M-series, Qualcomm Snapdragon)
OS power profiles (Windows Balanced/Performance/Power Saver, Android Doze, iOS Low Power Mode)
Battery state effects on CPU boost duration
Background process load (OS indexing, antivirus, system updates)

Is "reproducibility independent of current load" achievable in practice? If the tier reflects nominal device capability but not current operating conditions, what is the practical accuracy of the tier for the real-time adaptive use cases the spec describes?

Test 3: Privacy and fingerprinting surface

The spec says it was designed with privacy in mind. Evaluate:

What is the information gain from navigator.cpuPerformance when combined with navigator.hardwareConcurrency, navigator.deviceMemory, and WebGL renderer strings? Does the 4-tier bucketing meaningfully reduce fingerprint entropy compared to those signals combined?
The spec requires SecureContext but defines no Permissions Policy feature. What does this mean for cross-origin iframes and third-party scripts?
The tier is stable by design ("no reclassification"). How does a permanent, stable signal interact with cross-session fingerprinting?
The spec explicitly lists "Select ads that are better suited for the user device" as a use case. Tier 1 devices correlate with lower-income users. Is there a consent mechanism for this use? Should there be?

Test 4: Abstraction level

Chrome internally classifies devices into performance tiers for rendering heuristics (compositor thread budgets, animation scheduling, etc.). Safari does the same.

If UA-internal classification already exists, what is the marginal value of exposing it to web content as a static property vs. as a dynamic media query (e.g. @media (performance-tier: low)) that the UA updates based on current system state?
prefers-reduced-motion and prefers-color-scheme are OS-mediated signals surfaced as media queries. What are the tradeoffs between that model and the navigator.cpuPerformance model for this use case?

Output format

For each test:

State your verdict: passes / fails / partially passes
Give the specific evidence (quoted spec text, API behavior, OS behavior) that supports the verdict
If it fails, state what would need to change for it to pass

End with an overall assessment: does the proposal justify shipping as specified, or does it need rework? If rework, what is the minimum viable change set?

Do not hedge. If the evidence points to a problem, state the problem clearly.

</details>

Comment by @domfarolino Apr 1, 2026 (See Github)

Can you share the output of your Gemini prompt? I'm surprised the prompt reduces things a "pass/fail" verdict, when all of the questions in each test are more discussion oriented, and not binary.

Comment by @marcoscaceres Apr 1, 2026 (See Github)

Sure (Gemini 3)... ran it a few times, gets the same result:

<details> # CPU Performance API - Critical Review

Test 1: Use-case coverage by existing APIs

Verdict: Fails
Evidence:
- Video/media use cases: The proposal claims a static CPU tier is needed to determine "the resolution and format for images and video" and to "decide if advanced video effects can be rendered." However, the W3C Media Capabilities API already explicitly solves this. Calling navigator.mediaCapabilities.decodingInfo(configuration) resolves a promise with a MediaCapabilitiesDecodingInfo object containing supported, smooth, and powerEfficient boolean flags. Similarly, WebCodecs provides VideoDecoder.isConfigSupported(config) to evaluate specific decoding constraints. Both APIs actively answer whether a device can handle 1080p, 720p, etc., at a given framerate—making a generalized 1-4 CPU tier proxy unnecessary and strictly less accurate.
- ML/AI use cases: The proposal claims applications need the tier to "decide whether to attempt running AI tasks locally or delegate to the server." The WebNN API directly addresses this by exposing device-specific hardware acceleration. Developers can request specific hardware via navigator.ml.createContext({ deviceType: 'gpu' | 'npu' | 'cpu' }) and query exact mathematical and tensor limitations via opSupportLimits(). WebNN allows applications to probe precise capabilities rather than guessing based on a coarse CPU bucket.
- Other use cases: The explainer lists "Control non-essential tasks... use or avoid heavy libraries" and "Improve real user monitoring." navigator.hardwareConcurrency and navigator.deviceMemory already provide static heuristics for device class, while the Long Tasks API and Interaction to Next Paint (INP) provide actual measurements of main-thread contention.
What needs to change: The proposal must articulate distinct use cases that definitively cannot be solved by MediaCapabilities for media, WebNN/WebGPU for compute, or PerformanceObserver for monitoring. Currently, it solves no unique problems.

Test 2: The reproducibility claim vs. OS reality

Verdict: Fails
Evidence:
- Spec Requirement: Section "Computing Performance Tier Value" mandates reproducibility: "An implementation should always report the same performance tier for the same user device... The reported performance tier should not depend on the user device's current load or utilization".
- OS/Hardware Reality: Modern mobile SoCs (Apple A/M-series, Snapdragon, ARM big.LITTLE) aggressively thermally throttle under sustained load. A device capable of "Tier 4" burst performance can drop to "Tier 2" sustained performance within minutes of running a heavy video conferencing app.
- Furthermore, OS power profiles (Windows Battery Saver, Android Doze, iOS Low Power Mode) actively cap CPU frequencies and disable high-performance cores to save battery. If an OS caps a "Tier 4" processor to save battery, but navigator.cpuPerformance rigidly reports 4, the web app will blindly schedule heavy background blur and 1080p decoding. This actively undermines the OS's power management, destroys battery life, and causes severe frame drops.
What needs to change: "Reproducibility independent of current load" is a fundamentally flawed concept for real-time adaptation. To be useful, any performance signal must reflect current OS constraints (thermal throttling, low-power modes) rather than nominal, factory-state theoretical maximums.

Test 3: Privacy and fingerprinting surface

Verdict: Fails
Evidence:
- Information gain: The explainer asserts the API mitigates privacy risks by mapping devices to a small number of tiers. However, when combined with navigator.hardwareConcurrency (core count) and navigator.deviceMemory (RAM), a 4-tier CPU bucket provides a highly specific cross-section that increases fingerprint entropy. A device with 4 cores and 8GB RAM could be isolated into distinct SoC families just by adding the cpuPerformance tier.
- Permissions Policy & 3rd Parties: Section 14 of the security questionnaire states: "The API behaves identically in first-party and third-party contexts... allowing third-party content (such as embedded video players) to adapt." Because the spec defines no Permissions Policy feature, tracking pixels and cross-origin iframes get passive access to a stable device hardware fingerprint.
- Cross-session stability: The spec explicitly requires tiers to be stable and immutable over years ("implementations should not redefine tiers"). A permanent, unchangeable device attribute is the exact definition of a high-value fingerprinting vector.
- Ad Targeting: The proposal explicitly lists "Select ads that are better suited for the user device" as a usecase. Exposing hardware class for ad targeting correlates heavily with socioeconomic status (flagship phone vs. budget device) without any user consent mechanism.
What needs to change: The API must be gated behind a Permissions Policy (disallowed in cross-origin iframes by default). The ad-targeting usecase should be removed, as the web platform should not provide socioeconomic proxies for ad networks without explicit consent.

Test 4: Abstraction level

Verdict: Fails
Evidence:
- The proposal exposes a static JavaScript property navigator.cpuPerformance that returns an integer. This forces developers to write rigid, imperative heuristics (as seen in the explainer's switch (navigator.cpuPerformance) example).
- This compares poorly to OS-mediated signals like prefers-reduced-motion or prefers-color-scheme, which are exposed as CSS Media Queries and can change dynamically based on OS state.
- If the goal is to disable heavy effects or animations on low-end devices, a dynamic media query (e.g., @media (prefers-reduced-performance) or @media (performance-tier: low)) would be vastly superior. It would automatically adapt to OS Battery Saver modes, allow declarative CSS degradation without requiring JavaScript, and avoid exposing a static hardware identifier to tracking scripts.
What needs to change: Shift the abstraction from a static, imperative JS hardware classification to a dynamic, OS-mediated state signal (like a media query) that respects current system conditions and user preferences.

Overall Assessment

The proposal does not justify shipping as specified and requires a complete conceptual rework.

By attempting to provide a static, load-independent hardware taxonomy, the API paradoxically becomes useless for its own stated use cases (adapting to device capabilities), which are already better served by MediaCapabilities and WebNN. Furthermore, its insistence on immutable tiers creates a permanent fingerprinting vector that can be exploited by third-party frames for profiling and ad-targeting.

Minimum viable change set for rework:

Abandon the static, load-independent navigator.cpuPerformance property.
Pivot to a dynamic, OS-backed signal (e.g., a prefers-reduced-performance media query) that reflects the device's current operational reality (accounting for low power modes, user preferences, and thermal throttling).
Ensure this signal is designed strictly for adaptive degradation, stripping out the static taxonomy that enables cross-session fingerprinting and ad targeting.

</details>

Comment by @domfarolino Apr 2, 2026 (See Github)

Thanks. I fed it to Gemini so it could scrutinize the AI-generated critique you provided above. Let me know what you think!

<details> As an expert reviewer, I have scrutinized the provided critique of the **CPU Performance API** proposal. While the critique raises common "standard" objections to new web APIs (privacy, fingerprinting, and redundancy), it contains several logical flaws, misrepresentations of existing APIs, and a failure to account for the specific architectural goals of the proposal.

The following is a critical assessment of the critique's "verdicts."

Test 1: Use-case coverage by existing APIs

Verdict in Critique: Fails My Assessment: Faulty. The critique conflates capability with performance class.

Error regarding MediaCapabilities: The critique argues that MediaCapabilities answers the same question. It does not. MediaCapabilities tells a developer if a specific codec/resolution can be decoded smoothly. It does not tell the developer how much headroom remains for concurrent non-media tasks (e.g., real-time background blur, physics engines, or complex React re-renders). A "Tier 4" device and a "Tier 2" device might both report smooth: true for 1080p, but the Tier 2 device might hit 90% CPU to do it, while the Tier 4 device hits 10%.
Error regarding WebNN: The critique claims WebNN's opSupportLimits() solves the "local vs. server" decision. This is incorrect. WebNN identifies if a hardware operation is supported by an accelerator; it does not provide a holistic "system class" signal. A developer may want to avoid local inference entirely on a low-tier device to preserve battery or UI responsiveness, regardless of whether the hardware can technically run the math.
Logical Gap: The critique ignores the "Real User Monitoring" (RUM) use case. Aggregating performance metrics by CPU tier allows developers to identify if a regression is "global" or limited to "budget devices." navigator.hardwareConcurrency is an unreliable proxy for this because a 4-core Intel Atom performs vastly differently than a 4-core Apple M4.

Test 2: The reproducibility claim vs. OS reality

Verdict in Critique: Fails My Assessment: Partially Correct, but logically misaligned.

The "Nominal vs. Current" Fallacy: The critique argues the API fails because it doesn't reflect thermal throttling. This is a misunderstanding of the proposal's Goal 5: Respect obsolete hardware and applications. The proposal explicitly seeks a static baseline.
Conflict of Intent: The critique suggests the API should be dynamic. However, the proposal specifically positions itself as a complement to the Compute Pressure API. The CPU Performance API provides the "Engine Displacement" (static capacity), while Compute Pressure provides the "Tachometer" (current load). Critiquing the static API for not being dynamic is like critiquing a ruler for not being a stopwatch.

Test 3: Privacy and fingerprinting surface

Verdict in Critique: Fails My Assessment: Exaggerated. * Information Gain: The critique claims a 4-tier bucket increases entropy significantly. In reality, the proposal's goal is to ensure each bucket contains "a fairly large number of devices... both as an absolute number and as distinct device models".

The "Benchmarking" Reality: The critique ignores that any site can currently calculate a CPU tier using a 100ms JavaScript loop (as noted in the "Alternatives Considered" section of the explainer). By providing a coarse, 4-tier browser-mediated signal, the API actually reduces the need for invasive, high-entropy active benchmarking by scripts.
Ad-Targeting Critique: This is a subjective policy argument rather than a technical failure. Socioeconomic correlation exists in almost all hardware signals (deviceMemory, screen resolution, etc.).

Test 4: Abstraction level

Verdict in Critique: Fails My Assessment: Logical Error.

Imperative vs. Declarative: The critique argues for a media query (e.g., @media (performance-tier: low)). While elegant for CSS, this does not solve the use cases for JavaScript-heavy applications (e.g., Figma, Google Meet, or WebGL games) that need to make branching logic decisions before downloading large WASM binaries or assets. A media query cannot help a developer decide whether to fetch a 50MB "Ultra" texture pack or a 5MB "Low" pack.

Overall Assessment

The critique needs rework. It adopts a "zero-trust" posture that ignores the proposal's explicit design goals and its relationship with the Compute Pressure API. It incorrectly frames the API as a "failed" version of a dynamic sensor, when it is intended to be a static hardware classifier.

Minimum Viable Changes for the Critique:

Acknowledge the Compute Pressure API: The critique must evaluate the proposal in tandem with Compute Pressure, rather than demanding the CPU Performance API handle dynamic load signals itself.
Address the "Static vs. Dynamic" Tradeoff: Instead of declaring "failure," the critique should discuss why a static signal is or is not better than a dynamic media query for the specific "asset-loading" use cases.
Correct API Misrepresentations: Remove the claim that MediaCapabilities provides the same "system headroom" signal; it does not.

</details> <details> <summary>Edit: Input prompt</summary>

You are an expert reviewer API of web platform API proposals. The attached file is an initial prompt fed to an AI, telling it to review critically the CPU performance API proposal by Google, followed by the critical review itself.

Your job is to find real problems with the critique of the proposal, not to validate it. Do not assume the critique is accurate or substantive, or expresses legitimate concerns about the API just because it exists.

Scrutinize each "verdict" in the critique to check its work. Cite any ways in which the test that produced the verdict is faulty, if you find any. Give specific evidence (quoted spec/critique text, API behavior, OS behavior) that supports your assessment of the critique. If you are critical of the critique, then describe in detail what needs to change in the tests it poses, or errors in the logic that the critique used when evaluating those tests.

End with an overall assessment: does the critique. Does it need rework? If rework, what is the minimum viable change in the critique that would be acceptable?

Do not hedge. If the evidence points to a problem, state the problem clearly.

Proposal: https://github.com/WICG/cpu-performance/

Fetch and read the full explainer and spec, and then the attached markdown file for the critique that you yourself are scrutinizing.

</details>

Comment by @marcoscaceres Apr 2, 2026 (See Github)

Oh! I love this “assessment of the assessment”! 😍 because then it begs the question what is the correct assessment? If we have the right framing and facets, then we can get a better assessment.

My money is still that it will reach the same conclusions and suggestions. I’m on vacation for the next four days, but would love to see what comes out.

An unbiased critique going in blind would be awesome. What do you think? We could run the same prompt against Claude, co-pilot, and Gemini and coalesce the three. Then we throw humans into the mix.

Would be a neat experiment. 🔬

Comment by @marcoscaceres Apr 2, 2026 (See Github)

@domfarolino do you have your prompt still for assessing the assessment? That’s key.

Comment by @domfarolino Apr 2, 2026 (See Github)

What do you think?

Personally, I think this is getting tedious. I'd like to see a broader, meta discussion with the TAG on how we use AI to evaluate web platform proposals. Some of the questions in your input prompt are reasonable, and others aren't, but we shouldn't be litigating the evaluation criteria each time a TAG member creates an elaborate one-off prompt. If we want to do this for real, maybe the TAG can publish a few SKILL.md files and curated prompts with review criteria that the community votes on or generally agrees on, so that we have a sense of the deterministic criteria we're telling LLMs to judge with. This is as opposed to using a likely-AI-generated input prompt that we've never seen before, with a questionable pass/fail structure.

Your prompt doesn't tell Gemini to evaluate the proposal objectively, you tell it to find problems and be critical. Today, LLMs are suggestible enough to where they'll satisfy what they think their user wants more than responsibly apply the objective criteria you might have intended. It's like asking a ghost hunter to come to your house and look around. Do you think they're not going to find ghosts?

Please spend some human time evaluating this, even if it results in short, specific comments like https://github.com/w3ctag/design-reviews/issues/1198#issuecomment-4166581568, which I agree points out a very reasonable concern.

@domfarolino do you have your prompt still for assessing the assessment? That’s key.

Edited the comment to append the input prompt.

Comment by @marcoscaceres Apr 2, 2026 (See Github)

We are doing the human review. At first glance, I personally reached the same conclusions as the AI (which is why I checked my assumptions with it - which validated my early hunches), but other TAG members might reach different conclusions. Human reviews take time (4-6 weeks usually).

I still think this is a worthwhile exercise as a TAG proxy, an early AGENTS.md would be a helpful first pass. Precisely what I was eluding to. If we got it right, it could save everyone a lot of time, frustration, and yield higher quality early proposals.

Anyway, thanks for the chat. Sorry you found this tedious. You’ll hear from us in a couple of weeks.

Discussed Apr 6, 2026 (See Github)

Jeffrey: there is a lot of discussion on the design reviews that could use clarification. What points from Marcos and Jeffrey make sense? We have problems with their use cases (video sites and AI stuff where cpu measurements wont' be measuring the relevant stuff). They think that its still useful to have this, even though load may slow a computer down; it lets you serve the correct content then use compute pressure to adapt to the current load. Maybe we should ask about whether that actually works and what evidence they have that this does the right thing for sites.

Marcos: The architecture is not correct. The static model they propose, that they adjust performance dynamically based on a static snapshot is completely flawed. The use cases should be taken to the appropriate groups. For example, the media one should be taken to the media group because it may duplicate things they are already working on. So, refining all the use cases would be good. At this point, very skeptical that this is useful in its current form.

Jeffrey: Should they refine the use cases and come back?

Marcos: And reach out to the working groups, too.

Heather: What's the point of clarifying use cases if the architecture is broken?

Marcos: If they change to a dynamic measurement, and there are legit use cases, it could be useful.

Heather: Sounds like it would need to be both. Have to fix static vs dynamic architecture.

Marcos: Change wouldn't be dramatic. Field is similar, but gets events or a media query, so it can change every so often.

Jeffrey: Is the a pointer to the chrome/webkit feature?

Marcos: Will try to find where that is.

Jeffrey: I'd be happy to draft a comment and run it by Marcos.

Marcos: should we start with the questions to get them moving so they don't have to wait for the formal position?

Jeffrey: Yes. Please make sure the use cases need a CPU measurement and talk to the working groups that work on those use cases. We don't need to close or set a resolution; we'll wait for an answer to that first.

Discussed Apr 27, 2026 (See Github)

jyasskin: I captured a proposed feedback that I thought distilled them - does it work?

marcoscaceres: I think the intention is there, but let me have another read and I think it will probably be ok

jyasskin: If you approve, go ahead and post it.

Comment by @marcoscaceres Apr 29, 2026 (See Github)

Thanks for sending us this review. After looking at it in a breakout, we have several concerns that we'd like you to address before we set a final position:

First, it's not clear that the motivating use cases are served by a static measure of CPU performance. We had to guess at how "video conferencing applications" would use this API. For anything related to media decoding and encoding, measurements should reflect the available processing hardware, which often isn't inside the CPU. MediaCapabilities and possibly the isConfigSupported() methods in the WebCodecs API are close, and much better than making assumptions based on the CPU. Similarly, the choice of what neural-network models to use needs to take NPUs and GPUs into account, rather than focusing on the CPU. If these applications need to get a signal of headroom, consider extending those APIs rather than adding an API that indicates the headroom of the wrong processing unit. Without a clear user-serving use case, we shouldn't add new APIs to the platform. Even with one, please talk to the WGs that focus on those use cases, to see if there's a more-targeted API they can add, which might make a better low-level vs high-level tradeoff.

We see that the WebKit position is concerned about the choice of a static vs a dynamic value. For example, a web application may need to re-scale its processor use when it gets hot, when the user sets the device into a low-power mode, or when the user is running other processing-intensive workloads in parallel. We see some indication that the choice of a static measurement is intentional, especially in the statements that this API is meant to complement the Compute Pressure API. However, we don't see an explanation of exactly what you expect applications to do with a static measurement. We guess that it's about downloading code that's flexible enough to cover the range of performance levels that a given machine could expose, but we'd like to see some evidence that this is actually how applications will use the data, and that this up-front downloading is better for users than downloading the right code as conditions change.

We're very concerned about the spec's mention of "Select ads that are better suited for the user device." CPU performance correlates with disposable income, but users can be harmed by exposing that to advertisers. Exposing this may be inescapable (e.g. via benchmarks), but API designers should try to obscure it, not treat it as a feature.

We also want to make sure that the top-level page has control over which iframes can get this information.

We look forward to your responses. Thanks!

Comment by @marcoscaceres May 6, 2026 (See Github)

Reopening for clarification on the following:

Why a CPU measurement, not existing targeted APIs? The motivating use cases (video conferencing, ML model selection) depend on GPUs/NPUs, not CPU. Why not extend MediaCapabilities or WebCodecs isConfigSupported() instead? Have you talked to those WGs?
What do apps actually do with a static number? What's the concrete application behavior? Is it "download flexible code up front"? Where's the evidence that's how apps will use it, and that it's better for users than adapting dynamically?

Comment by @marcoscaceres Jun 8, 2026 (See Github)

@nickie or @domfarolino, just following up on the above questions? ☝️

Comment by @nickie Jun 18, 2026 (See Github)

Reopening for clarification on the following:

Why a CPU measurement, not existing targeted APIs? The motivating use cases (video conferencing, ML model selection) depend on GPUs/NPUs, not CPU. Why not extend MediaCapabilities or WebCodecs isConfigSupported() instead? Have you talked to those WGs?

This API is about CPU performance, not GPU/NPU or anything else. It is not about codecs nor is it related to other special-purpose APIs. I think that all this is clearly stated in the explainer and the spec. Whether this API is useful for video conferencing or ML applications depends a lot on the particular application and how much CPU performance affects it. Communications with interested partners and the feedback that we have received (some linked above) suggest that it is useful and that is why we propose its adoption.

What do apps actually do with a static number? What's the concrete application behavior? Is it "download flexible code up front"? Where's the evidence that's how apps will use it, and that it's better for users than adapting dynamically?

While both static and dynamic information are useful, the CPU Performance API is designed to be a proxy for the user’s device hardware class. That is, a powerful device shouldn't be exposed as a low-tier one just because other heavy applications are running, or the battery is low. This stability is important so that when an app has to make up-front decisions that depend on the user's hardware, it can be right most of the time. This includes "download flexible code up front" as you mentioned, and other concrete use cases in the explainer below.

In particular, web applications may want to use performance information to:

Control non-essential tasks and requests; e.g., allow or block 3rd party scripts, use or avoid heavy libraries.

Adjust the complexity of web content; e.g., the resolution and format for images and video, the compression level for uploading data, enable or disable computationally heavy operations such as animations, improve resource management (lazy loading, prefetching, prerendering).

Improve real user monitoring; e.g., better understand if users have faster or slower devices, focus development effort more appropriately.

Run computations on the client side vs. on the server side; e.g., use server-side rendering, run AI applications and LLMs on the client side.

Select ads that are better suited for the user device.

If necessary, this application can also obtain dynamic information through a complementary API like Compute Pressure. But obtaining dynamic information is something that a web application is generally able to do on its own, while there’s no reliable way for a web application to obtain static information about the user device hardware class, so it can make better up-front decisions. In this way, the API solves the immediate use case of several interested partners.

Coming to the last point about ads, which as I can see in Marcos's comment may have been misinterpreted as "better user tracking for ads", this was never the intention of this API and, as is obvious from the spec, we tried as hard as possible to limit the use of this API for fingerprinting.

Insofar as ads may contain videos or other computationally expensive material, then without knowledge of the device's performance tier, ads may try to provide the highest-performance experience, occasionally overwhelming the device's capabilities and providing a worse user experience. This API gives all vendors of content a chance at tailoring their experience to the device's capabilities.

Comment by @RByers Jun 20, 2026 (See Github)

We're discussing the blink I2S for this. For what it's worth I am convinced by the need for a static API (in addition to compute pressure) as it solves a different problem. For example, how should a video player decide whether to enable optional extra UI features (like the dynamic glow around youtube videos) while minimizing the risk of overloading the CPU and causing jank? Perhaps the most obvious case are games. AFAIK it's entirely standard practice for a game to configure default aspects of it's rendering settings based on the hardware capabilities (eg. draw distance or AI complexity) rather than try to predict and adapt dynamically.

I don't understand why we wouldn't want to default to trusting the developers of advanced web apps like from Adobe and Figma who have said they need this. What such rich app developers really WANT (and get on other platforms) is full hardware details for every aspect of the device so they can be empowered to create the best possible user experience. Instead all we're talking about here is giving them a couple bits of information about the power for the CPU (which can be largely inferred if you're willing to waste a lot of CPU cycles). This seems like an incredibly pragmatic and webby tradeoff to me. Is the TAG taking the position that it knows better than these developers about the value of trying to solve such a problem?

one thing we're skeptical about is the idea of leaving the buckets implementation-defined. Can you give us a rough algorithm that might be interoperable?

This is a GREAT question. I am also very unsure about this. Does the TAG have any guidance on where on the spectrum it's better to fall between leaving such heuristics implementation defined, including a full reference implementation (demo) normatively or non-normatively, or putting some text in the spec to cover most of the algorithm while leaving heuristics tweaks to impl-defined?

Comment by @marcoscaceres Jun 21, 2026 (See Github)

Have you reached out to the relevant working groups with the questions above? The TAG can only give general architectural guidance but doesn’t have the expertise in any particular area.

Comment by @yoavweiss Jun 21, 2026 (See Github)

The API was discussed at the Web Performance WG where we talked about the compat/interop aspects of the buckets at length. We haven't talked about the prospects of an independent reference implementation though. We'd be happy to have such a discussion if that's helpful.

Comment by @marcoscaceres Jun 21, 2026 (See Github)

Great! Thanks @yoavweiss. I’ve not seen any discussion with the Media WG though.

There’s still a general “our partners want this” tone to the discussion that is bothersome. I get that in good hands this API could make a difference. The case that proponents seem to be glossing over is what happens in the mis/abuse cases.

Comment by @RByers Jun 22, 2026 (See Github)

The case that proponents seem to be glossing over is what happens in the mis/abuse cases.

Can you clarify what sort of misuse / abuse you anticipate? I don't see abuse discussed above at all really. Or do you just mean fingerprinting?

One potential misuse I was concerned with was where a site uses this feature poorly to degrade functionality unnecessarily for some class of users (i.e. accidentally due to insufficient testing), or to make a tradeoff that may or may not be aligned with the user's interests without giving them input (eg. decreasing resolution to reduce the risk of dropped frames). Of course that's a general concern with app design which browsers can't solve completely, but I think we all feel it's important that the web can be consumed in any way that people choose. To me that's a reason to prefer having such an API vs. leaving sites to make such tradeoffs themselves without assistance from browser APIs. I think the spec is silent on this point, but In Chrome we've implemented this with the following settings UI:

Perhaps the spec should comment on this? If this UI gets used much at all in practice, then I could imagine investing more to make it a site-specific setting if users say they'd prefer that.

Comment by @tunetheweb Jun 22, 2026 (See Github)

I think the spec is silent on this point, but In Chrome we've implemented this with the following settings UI: ... Perhaps the spec should comment on this? If this UI gets used much at all in practice, then I could imagine investing more to make it a site-specific setting if users say they'd prefer that.

To me the better option here is where choices are made but overridable by the site's app UI, rather than on a per-site case in the browser's UI. Think YouTube auto-selecting a certain bandwidth based on your network speed, but the user having the ability to change that up to 4k if they really want to, despite the app thinking it might struggle with that.

Maybe such non-normative suggestion could be added to the spec as you say?

Comment by @RByers Jun 22, 2026 (See Github)

To me the better option here is where choices are made but overridable by the site's app UI, rather than on a per-site case in the browser's UI. Think YouTube auto-selecting a certain bandwidth based on your network speed, but the user having the ability to change that up to 4k if they really want to, despite the app thinking it might struggle with that.

Oh 100% I agree. I was just speculating on where Marcos's concern about "site misuse" might come from, and sites failing to provide such an override UI is one possible misuse I could imagine. I've filed this issue on the spec, let's try to align there?

Discussed Jul 6, 2026 (See Github)

Dan: They are asking to ship this in Chromium, thus the flurry of feedback.

Marcos: I will re-engage with this.