#1198: Incubation: CPU Performance API

Opened Feb 19, 2026

Explainer

https://github.com/WICG/cpu-performance/

The explainer

Where and by whom is the work being done?

  • GitHub repo: https://github.com/WICG/cpu-performance/
  • Primary contacts:
    • Nikolaos Papaspyrou (@nickie), Google, author
    • Dominic Farolino (@domfarolino), Google, spec mentor
    • Yoav Weiss (@yoavweiss), Shopify, W3C Web Performance WG co-chair
  • Organization/project driving the design: Google Chrome
  • This work is being funded by: Google
  • Incubation and standards groups that have discussed the design:
    • Web Performance WG, TPAC 2025, minutes
  • Standards group(s) that you expect to discuss and/or adopt this work when it's ready: Web Performance

Feedback so far

You should also know that...

Specification URL: https://wicg.github.io/cpu-performance/

<!-- Content below this is maintained by @w3c-tag-bot -->

Track conversations at https://tag-github-bot.w3.org/gh/w3ctag/design-reviews/1198

Discussions


Comment by @nickie Feb 24, 2026 (See Github)

Just noting that the specification for this API is ready for review.

Discussed Mar 16, 2026 (See Github)

Jeffrey: I've left some thoughts, but we should discuss.

Marcos: This triggers the same concerns as DAS; it feels a bit self-damaging. Will review again to see if it's different from what's been proposed in the past.

Jeffrey: The approach with buckets helps, because you can always run a benchmark and get detail on the CPU's performance. They propose that the buckets be implementation-defined, and that might be a problem.

Marcos: I will do a first-pass reaction, not a position, in the brainstorming repo, and we can go from there.

Discussed Mar 30, 2026 (See Github)

Marcos: Similar concerns to those I raised with the DAS specs. This feels very foot-gunny. There isn't enough context to make these adjustments. The challenge for this spec would be to prove you can do this in a way that is not foot-gunny, when all other previous attempts have not ended well or proven useful.

Jeffrey: Can you be more specific about how the developer will shoot themselves in the foot?

Marcos: It makes an assumption about my CPU but in 5 years, those assumptions won't be valid. The developer will degrade the user experience based on information that doesn't apply.

Jeffrey: They have addressed that part. The buckets have a stable meaning; bucket 4 will always mean the same thing, but in five years there may be 8 or more buckets.

Marcos: Do I make a determination about performance over time? If it's point in time, I could be running a video or AI job that will skew the numbers.

Jeffrey: Sounds like we should wait for a week and you'll post your comments.

Marcos: OK.

Brian: The scaling factor they have in here is going to be tricky because it's allocation-specific. So the tiers don't have a specific meaning, and they can't acquire one over time. Seems nearly impossible to have this develop well over time. Is there any prior art that does this well?

Jeffrey: In simpler cases, yes: memory buckets. But that's simpler than GPU/CPU performance. The definition cannot be implementation-defined; websites can't guess what the implementation has decided. We might be able to say we have consensus that they need to make the buckets not implementation-defined and come back to us with a definition of the buckets. That might make the review easier.

Brian: I don't know if that's my take. I don't see how these can have arbitrary meaning that is left entirely to implementations, that will change over time, and developers are just supposed to make good decisions that will evolve over time.

Jeffrey: It is not supposed to change over time. This machine is in bucket N forever.

Brian: This machine is, but the scale is not. When you get to "unknown", will that be because it's too old or too new? I can't imagine how you do this. The message I want to send is more than "define this".

Jeffrey: How about we post: "We haven't finished discussing this, but one thing we're skeptical about is the idea of leaving the buckets implementation-defined. Can you give us a rough algorithm that might be interoperable?"

Comment by @jyasskin Apr 1, 2026 (See Github)

We haven't finished discussing this, but one thing we're skeptical about is the idea of leaving the buckets implementation-defined. Can you give us a rough algorithm that might be interoperable?

Comment by @marcoscaceres Apr 1, 2026 (See Github)

@nickie, just wondering if you've fed this to Gemini for a review?... It confirmed a lot of the concerns I personally had with the proposal.

Here's a prompt. Hope that helps in the meantime while we do human reviews.

<details> # Gemini prompt: `navigator.cpuPerformance` self-review

You are a critical reviewer of web platform proposals. Your job is to find real problems, not to validate the proposal. Do not assume good design because the proposal exists. Apply the following tests rigorously.

Proposal: https://github.com/WICG/cpu-performance/

Fetch and read the full explainer and spec before answering anything.


Test 1: Use-case coverage by existing APIs

For each use case listed in the proposal, determine whether an existing W3C or WHATWG API already provides a more direct signal. Specifically:

  • For video/media use cases: evaluate MediaCapabilities.decodingInfo() and VideoDecoder.isConfigSupported() (WebCodecs). Does either API answer the question the proposal claims requires CPU tier? Quote the relevant spec text for each.
  • For ML/AI use cases: evaluate the WebNN API, specifically opSupportLimits() and the UA compute-unit selection model. Does WebNN already address on-device vs. server-side inference decisions without requiring CPU tier? Quote the spec.
  • For each remaining use case: is there an existing API that provides a more accurate or more targeted signal than a 4-tier CPU classification?

Conclude: which use cases, if any, are genuinely unmet by existing APIs?


Test 2: The reproducibility claim vs. OS reality

The spec requires the same device to always return the same tier regardless of current system state ("reproducibility: independent of current load").

Evaluate this claim against:

  • Thermal throttling behaviour on mobile SoCs (ARM big.LITTLE, Apple M-series, Qualcomm Snapdragon)
  • OS power profiles (Windows Balanced/Performance/Power Saver, Android Doze, iOS Low Power Mode)
  • Battery state effects on CPU boost duration
  • Background process load (OS indexing, antivirus, system updates)

Is "reproducibility independent of current load" achievable in practice? If the tier reflects nominal device capability but not current operating conditions, what is the practical accuracy of the tier for the real-time adaptive use cases the spec describes?


Test 3: Privacy and fingerprinting surface

The spec says it was designed with privacy in mind. Evaluate:

  • What is the information gain from navigator.cpuPerformance when combined with navigator.hardwareConcurrency, navigator.deviceMemory, and WebGL renderer strings? Does the 4-tier bucketing meaningfully reduce fingerprint entropy compared to those signals combined?
  • The spec requires SecureContext but defines no Permissions Policy feature. What does this mean for cross-origin iframes and third-party scripts?
  • The tier is stable by design ("no reclassification"). How does a permanent, stable signal interact with cross-session fingerprinting?
  • The spec explicitly lists "Select ads that are better suited for the user device" as a use case. Tier 1 devices correlate with lower-income users. Is there a consent mechanism for this use? Should there be?

Test 4: Abstraction level

Chrome internally classifies devices into performance tiers for rendering heuristics (compositor thread budgets, animation scheduling, etc.). Safari does the same.

  • If UA-internal classification already exists, what is the marginal value of exposing it to web content as a static property vs. as a dynamic media query (e.g. @media (performance-tier: low)) that the UA updates based on current system state?
  • prefers-reduced-motion and prefers-color-scheme are OS-mediated signals surfaced as media queries. What are the tradeoffs between that model and the navigator.cpuPerformance model for this use case?

Output format

For each test:

  1. State your verdict: passes / fails / partially passes
  2. Give the specific evidence (quoted spec text, API behavior, OS behavior) that supports the verdict
  3. If it fails, state what would need to change for it to pass

End with an overall assessment: does the proposal justify shipping as specified, or does it need rework? If rework, what is the minimum viable change set?

Do not hedge. If the evidence points to a problem, state the problem clearly.

</details>
Comment by @domfarolino Apr 1, 2026 (See Github)

Can you share the output of your Gemini prompt? I'm surprised the prompt reduces things to a "pass/fail" verdict, when all of the questions in each test are more discussion-oriented, not binary.

Comment by @marcoscaceres Apr 1, 2026 (See Github)

Sure (Gemini 3)... I ran it a few times; it gets the same result:

<details> # CPU Performance API - Critical Review

Test 1: Use-case coverage by existing APIs

  1. Verdict: Fails
  2. Evidence:
    • Video/media use cases: The proposal claims a static CPU tier is needed to determine "the resolution and format for images and video" and to "decide if advanced video effects can be rendered." However, the W3C Media Capabilities API already explicitly solves this. Calling navigator.mediaCapabilities.decodingInfo(configuration) resolves a promise with a MediaCapabilitiesDecodingInfo object containing supported, smooth, and powerEfficient boolean flags. Similarly, WebCodecs provides VideoDecoder.isConfigSupported(config) to evaluate specific decoding constraints. Both APIs actively answer whether a device can handle 1080p, 720p, etc., at a given framerate—making a generalized 1-4 CPU tier proxy unnecessary and strictly less accurate.
    • ML/AI use cases: The proposal claims applications need the tier to "decide whether to attempt running AI tasks locally or delegate to the server." The WebNN API directly addresses this by exposing device-specific hardware acceleration. Developers can request specific hardware via navigator.ml.createContext({ deviceType: 'gpu' | 'npu' | 'cpu' }) and query exact mathematical and tensor limitations via opSupportLimits(). WebNN allows applications to probe precise capabilities rather than guessing based on a coarse CPU bucket.
    • Other use cases: The explainer lists "Control non-essential tasks... use or avoid heavy libraries" and "Improve real user monitoring." navigator.hardwareConcurrency and navigator.deviceMemory already provide static heuristics for device class, while the Long Tasks API and Interaction to Next Paint (INP) provide actual measurements of main-thread contention.
  3. What needs to change: The proposal must articulate distinct use cases that definitively cannot be solved by MediaCapabilities for media, WebNN/WebGPU for compute, or PerformanceObserver for monitoring. Currently, it solves no unique problems.
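For concreteness, the MediaCapabilities pattern referenced above can be sketched as follows. `buildDecodingQuery` and `pickResolution` are illustrative helper names (not from any spec), and the bitrate heuristic is an assumption; in a browser, `decodingInfo()` resolves with `supported`/`smooth`/`powerEfficient` flags for a concrete configuration:

```javascript
// Illustrative sketch: ask MediaCapabilities directly about a concrete
// configuration instead of inferring capability from a CPU tier.
function buildDecodingQuery(width, height, framerate) {
  return {
    type: 'media-source',
    video: {
      contentType: 'video/webm; codecs="vp9"',
      width,
      height,
      framerate,
      // Rough bitrate guess for the query; real apps would use ladder data.
      bitrate: Math.round(width * height * framerate * 0.07),
    },
  };
}

async function pickResolution(candidates) {
  // Fall back to the smallest candidate where the API is unavailable.
  if (typeof navigator === 'undefined' || !navigator.mediaCapabilities) {
    return candidates[candidates.length - 1];
  }
  for (const c of candidates) {
    const info = await navigator.mediaCapabilities.decodingInfo(
      buildDecodingQuery(c.width, c.height, c.framerate));
    if (info.supported && info.smooth) return c; // first smooth candidate wins
  }
  return candidates[candidates.length - 1];
}
```

Candidates would be ordered from most to least demanding, so the first `smooth` result is the best playable option.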

Test 2: The reproducibility claim vs. OS reality

  1. Verdict: Fails
  2. Evidence:
    • Spec Requirement: Section "Computing Performance Tier Value" mandates reproducibility: "An implementation should always report the same performance tier for the same user device... The reported performance tier should not depend on the user device's current load or utilization".
    • OS/Hardware Reality: Modern mobile SoCs (Apple A/M-series, Snapdragon, ARM big.LITTLE) aggressively thermally throttle under sustained load. A device capable of "Tier 4" burst performance can drop to "Tier 2" sustained performance within minutes of running a heavy video conferencing app.
    • Furthermore, OS power profiles (Windows Battery Saver, Android Doze, iOS Low Power Mode) actively cap CPU frequencies and disable high-performance cores to save battery. If an OS caps a "Tier 4" processor to save battery, but navigator.cpuPerformance rigidly reports 4, the web app will blindly schedule heavy background blur and 1080p decoding. This actively undermines the OS's power management, destroys battery life, and causes severe frame drops.
  3. What needs to change: "Reproducibility independent of current load" is a fundamentally flawed concept for real-time adaptation. To be useful, any performance signal must reflect current OS constraints (thermal throttling, low-power modes) rather than nominal, factory-state theoretical maximums.

Test 3: Privacy and fingerprinting surface

  1. Verdict: Fails
  2. Evidence:
    • Information gain: The explainer asserts the API mitigates privacy risks by mapping devices to a small number of tiers. However, when combined with navigator.hardwareConcurrency (core count) and navigator.deviceMemory (RAM), a 4-tier CPU bucket provides a highly specific cross-section that increases fingerprint entropy. A device with 4 cores and 8GB RAM could be isolated into distinct SoC families just by adding the cpuPerformance tier.
    • Permissions Policy & 3rd Parties: Section 14 of the security questionnaire states: "The API behaves identically in first-party and third-party contexts... allowing third-party content (such as embedded video players) to adapt." Because the spec defines no Permissions Policy feature, tracking pixels and cross-origin iframes get passive access to a stable device hardware fingerprint.
    • Cross-session stability: The spec explicitly requires tiers to be stable and immutable over years ("implementations should not redefine tiers"). A permanent, unchangeable device attribute is the exact definition of a high-value fingerprinting vector.
    • Ad Targeting: The proposal explicitly lists "Select ads that are better suited for the user device" as a use case. Exposing hardware class for ad targeting correlates heavily with socioeconomic status (flagship phone vs. budget device) without any user consent mechanism.
  3. What needs to change: The API must be gated behind a Permissions Policy (disallowed in cross-origin iframes by default). The ad-targeting use case should be removed, as the web platform should not provide socioeconomic proxies for ad networks without explicit consent.

Test 4: Abstraction level

  1. Verdict: Fails
  2. Evidence:
    • The proposal exposes a static JavaScript property navigator.cpuPerformance that returns an integer. This forces developers to write rigid, imperative heuristics (as seen in the explainer's switch (navigator.cpuPerformance) example).
    • This compares poorly to OS-mediated signals like prefers-reduced-motion or prefers-color-scheme, which are exposed as CSS Media Queries and can change dynamically based on OS state.
    • If the goal is to disable heavy effects or animations on low-end devices, a dynamic media query (e.g., @media (prefers-reduced-performance) or @media (performance-tier: low)) would be vastly superior. It would automatically adapt to OS Battery Saver modes, allow declarative CSS degradation without requiring JavaScript, and avoid exposing a static hardware identifier to tracking scripts.
  3. What needs to change: Shift the abstraction from a static, imperative JS hardware classification to a dynamic, OS-mediated state signal (like a media query) that respects current system conditions and user preferences.
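As a sketch of that dynamic model: the `(prefers-reduced-performance)` feature is hypothetical (it does not exist today), but under that assumption a page would subscribe to the signal rather than read a static property:

```javascript
// Hypothetical: subscribe to a dynamic, OS-mediated performance signal.
// '(prefers-reduced-performance)' is not a shipped media feature; it
// illustrates the media-query model described above.
function watchPerformancePreference(onChange) {
  if (typeof matchMedia !== 'function') {
    onChange(false); // unsupported environment: assume full performance
    return;
  }
  const mql = matchMedia('(prefers-reduced-performance)');
  onChange(mql.matches); // initial state
  mql.addEventListener('change', (e) => onChange(e.matches)); // OS updates
}

// Example consumer: toggle heavy visual effects as conditions change.
watchPerformancePreference((reduced) => {
  console.log(`heavy effects: ${reduced ? 'off' : 'on'}`);
});
```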

Overall Assessment

The proposal does not justify shipping as specified and requires a complete conceptual rework.

By attempting to provide a static, load-independent hardware taxonomy, the API paradoxically becomes useless for its own stated use cases (adapting to device capabilities), which are already better served by MediaCapabilities and WebNN. Furthermore, its insistence on immutable tiers creates a permanent fingerprinting vector that can be exploited by third-party frames for profiling and ad-targeting.

Minimum viable change set for rework:

  1. Abandon the static, load-independent navigator.cpuPerformance property.
  2. Pivot to a dynamic, OS-backed signal (e.g., a prefers-reduced-performance media query) that reflects the device's current operational reality (accounting for low power modes, user preferences, and thermal throttling).
  3. Ensure this signal is designed strictly for adaptive degradation, stripping out the static taxonomy that enables cross-session fingerprinting and ad targeting.
</details>
Comment by @domfarolino Apr 2, 2026 (See Github)

Thanks. I fed it to Gemini so it could scrutinize the AI-generated critique you provided above. Let me know what you think!

<details> As an expert reviewer, I have scrutinized the provided critique of the **CPU Performance API** proposal. While the critique raises common "standard" objections to new web APIs (privacy, fingerprinting, and redundancy), it contains several logical flaws, misrepresentations of existing APIs, and a failure to account for the specific architectural goals of the proposal.

The following is a critical assessment of the critique's "verdicts."

Test 1: Use-case coverage by existing APIs

Verdict in Critique: Fails. My Assessment: Faulty; the critique conflates capability with performance class.

  • Error regarding MediaCapabilities: The critique argues that MediaCapabilities answers the same question. It does not. MediaCapabilities tells a developer if a specific codec/resolution can be decoded smoothly. It does not tell the developer how much headroom remains for concurrent non-media tasks (e.g., real-time background blur, physics engines, or complex React re-renders). A "Tier 4" device and a "Tier 2" device might both report smooth: true for 1080p, but the Tier 2 device might hit 90% CPU to do it, while the Tier 4 device hits 10%.
  • Error regarding WebNN: The critique claims WebNN's opSupportLimits() solves the "local vs. server" decision. This is incorrect. WebNN identifies if a hardware operation is supported by an accelerator; it does not provide a holistic "system class" signal. A developer may want to avoid local inference entirely on a low-tier device to preserve battery or UI responsiveness, regardless of whether the hardware can technically run the math.
  • Logical Gap: The critique ignores the "Real User Monitoring" (RUM) use case. Aggregating performance metrics by CPU tier allows developers to identify if a regression is "global" or limited to "budget devices." navigator.hardwareConcurrency is an unreliable proxy for this because a 4-core Intel Atom performs vastly differently than a 4-core Apple M4.

Test 2: The reproducibility claim vs. OS reality

Verdict in Critique: Fails. My Assessment: Partially Correct, but logically misaligned.

  • The "Nominal vs. Current" Fallacy: The critique argues the API fails because it doesn't reflect thermal throttling. This is a misunderstanding of the proposal's Goal 5: Respect obsolete hardware and applications. The proposal explicitly seeks a static baseline.
  • Conflict of Intent: The critique suggests the API should be dynamic. However, the proposal specifically positions itself as a complement to the Compute Pressure API. The CPU Performance API provides the "Engine Displacement" (static capacity), while Compute Pressure provides the "Tachometer" (current load). Critiquing the static API for not being dynamic is like critiquing a ruler for not being a stopwatch.

Test 3: Privacy and fingerprinting surface

Verdict in Critique: Fails. My Assessment: Exaggerated.

  • Information Gain: The critique claims a 4-tier bucket increases entropy significantly. In reality, the proposal's goal is to ensure each bucket contains "a fairly large number of devices... both as an absolute number and as distinct device models".
  • The "Benchmarking" Reality: The critique ignores that any site can currently calculate a CPU tier using a 100ms JavaScript loop (as noted in the "Alternatives Considered" section of the explainer). By providing a coarse, 4-tier browser-mediated signal, the API actually reduces the need for invasive, high-entropy active benchmarking by scripts.
  • Ad-Targeting Critique: This is a subjective policy argument rather than a technical failure. Socioeconomic correlation exists in almost all hardware signals (deviceMemory, screen resolution, etc.).

Test 4: Abstraction level

Verdict in Critique: Fails. My Assessment: Logical Error.

  • Imperative vs. Declarative: The critique argues for a media query (e.g., @media (performance-tier: low)). While elegant for CSS, this does not solve the use cases for JavaScript-heavy applications (e.g., Figma, Google Meet, or WebGL games) that need to make branching logic decisions before downloading large WASM binaries or assets. A media query cannot help a developer decide whether to fetch a 50MB "Ultra" texture pack or a 5MB "Low" pack.
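That pre-download branching might look like the following sketch, assuming the proposed integer tier were available; `pickAssetPack` and its size thresholds are made up for illustration, and `navigator.cpuPerformance` is the proposed (not shipped) property:

```javascript
// Hypothetical pre-download branching on the proposed CPU tier.
// pickAssetPack and its thresholds are illustrative, not from the spec.
function pickAssetPack(tier) {
  // Unknown or invalid tier: download the conservative (small) pack.
  if (!Number.isInteger(tier) || tier < 1) {
    return { name: 'low', megabytes: 5 };
  }
  return tier >= 3
    ? { name: 'ultra', megabytes: 50 }
    : { name: 'low', megabytes: 5 };
}

const tier =
  (typeof navigator !== 'undefined' && navigator.cpuPerformance) || 0;
const pack = pickAssetPack(tier);
console.log(`fetching ${pack.name} pack (~${pack.megabytes} MB)`);
```

A media query cannot drive this kind of fetch decision before assets are requested, which is the point the rebuttal is making.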

Overall Assessment

The critique needs rework. It adopts a "zero-trust" posture that ignores the proposal's explicit design goals and its relationship with the Compute Pressure API. It incorrectly frames the API as a "failed" version of a dynamic sensor, when it is intended to be a static hardware classifier.

Minimum Viable Changes for the Critique:

  1. Acknowledge the Compute Pressure API: The critique must evaluate the proposal in tandem with Compute Pressure, rather than demanding the CPU Performance API handle dynamic load signals itself.
  2. Address the "Static vs. Dynamic" Tradeoff: Instead of declaring "failure," the critique should discuss why a static signal is or is not better than a dynamic media query for the specific "asset-loading" use cases.
  3. Correct API Misrepresentations: Remove the claim that MediaCapabilities provides the same "system headroom" signal; it does not.
</details> <details> <summary>Edit: Input prompt</summary>

You are an expert reviewer of web platform API proposals. The attached file is an initial prompt fed to an AI, telling it to critically review the CPU Performance API proposal by Google, followed by the critical review itself.

Your job is to find real problems with the critique of the proposal, not to validate it. Do not assume the critique is accurate or substantive, or expresses legitimate concerns about the API just because it exists.

Scrutinize each "verdict" in the critique to check its work. Cite any ways in which the test that produced the verdict is faulty, if you find any. Give specific evidence (quoted spec/critique text, API behavior, OS behavior) that supports your assessment of the critique. If you are critical of the critique, then describe in detail what needs to change in the tests it poses, or errors in the logic that the critique used when evaluating those tests.

End with an overall assessment: does the critique hold up, or does it need rework? If rework, what is the minimum viable change in the critique that would be acceptable?

Do not hedge. If the evidence points to a problem, state the problem clearly.

Proposal: https://github.com/WICG/cpu-performance/

Fetch and read the full explainer and spec, and then the attached markdown file for the critique that you yourself are scrutinizing.

</details>
Comment by @marcoscaceres Apr 2, 2026 (See Github)

Oh! I love this “assessment of the assessment”! 😍 Because it raises the question: what is the correct assessment? If we have the right framing and facets, then we can get a better assessment.

My money is still on it reaching the same conclusions and suggestions. I’m on vacation for the next four days, but would love to see what comes out.

An unbiased critique going in blind would be awesome. What do you think? We could run the same prompt against Claude, Copilot, and Gemini and coalesce the three. Then we throw humans into the mix.

Would be a neat experiment. 🔬

Comment by @marcoscaceres Apr 2, 2026 (See Github)

@domfarolino do you have your prompt still for assessing the assessment? That’s key.

Comment by @domfarolino Apr 2, 2026 (See Github)

What do you think?

Personally, I think this is getting tedious. I'd like to see a broader, meta discussion with the TAG on how we use AI to evaluate web platform proposals. Some of the questions in your input prompt are reasonable, and others aren't, but we shouldn't be litigating the evaluation criteria each time a TAG member creates an elaborate one-off prompt. If we want to do this for real, maybe the TAG can publish a few SKILL.md files and curated prompts with review criteria that the community votes on or generally agrees on, so that we have a sense of the deterministic criteria we're telling LLMs to judge with. This is as opposed to using a likely-AI-generated input prompt that we've never seen before, with a questionable pass/fail structure.

Your prompt doesn't tell Gemini to evaluate the proposal objectively; you tell it to find problems and be critical. Today, LLMs are suggestible enough that they'll satisfy what they think their user wants rather than responsibly apply the objective criteria you might have intended. It's like asking a ghost hunter to come to your house and look around. Do you think they're not going to find ghosts?

Please spend some human time evaluating this, even if it results in short, specific comments like https://github.com/w3ctag/design-reviews/issues/1198#issuecomment-4166581568, which I agree points out a very reasonable concern.


@domfarolino do you have your prompt still for assessing the assessment? That’s key.

Edited the comment to append the input prompt.

Comment by @marcoscaceres Apr 2, 2026 (See Github)

We are doing the human review. At first glance, I personally reached the same conclusions as the AI (which is why I checked my assumptions with it, which validated my early hunches), but other TAG members might reach different conclusions. Human reviews take time (4-6 weeks usually).

I still think this is a worthwhile exercise as a TAG proxy; an early AGENTS.md would be a helpful first pass, precisely what I was alluding to. If we got it right, it could save everyone a lot of time and frustration, and yield higher-quality early proposals.

Anyway, thanks for the chat. Sorry you found this tedious. You’ll hear from us in a couple of weeks.

Discussed Apr 6, 2026 (See Github)

Jeffrey: There is a lot of discussion on the design review that could use clarification. What points from Marcos and Jeffrey make sense? We have problems with their use cases (video sites and AI stuff where CPU measurements won't be measuring the relevant things). They think that it's still useful to have this, even though load may slow a computer down; it lets you serve the correct content and then use Compute Pressure to adapt to the current load. Maybe we should ask whether that actually works and what evidence they have that this does the right thing for sites.

Marcos: The architecture is not correct. The static model they propose, adjusting performance dynamically based on a static snapshot, is completely flawed. The use cases should be taken to the appropriate groups. For example, the media one should be taken to the media group because it may duplicate things they are already working on. So refining all the use cases would be good. At this point, I'm very skeptical that this is useful in its current form.

Jeffrey: Should they refine the use cases and come back?

Marcos: And reach out to the working groups, too.

Heather: What's the point of clarifying use cases if the architecture is broken?

Marcos: If they change to a dynamic measurement, and there are legit use cases, it could be useful.

Heather: Sounds like it would need to be both. Have to fix static vs dynamic architecture.

Marcos: Change wouldn't be dramatic. Field is similar, but gets events or a media query, so it can change every so often.

Jeffrey: Is there a pointer to the Chrome/WebKit feature?

Marcos: Will try to find where that is.

Jeffrey: I'd be happy to draft a comment and run it by Marcos.

Marcos: Should we start with the questions to get them moving, so they don't have to wait for the formal position?

Jeffrey: Yes. Please make sure the use cases need a CPU measurement and talk to the working groups that work on those use cases. We don't need to close or set a resolution; we'll wait for an answer to that first.