#1071: Private Proof API


Opened Mar 19, 2025

Hello, TAG!

I'm requesting an early TAG design review of Private Proof API.

This API uses Zero-Knowledge Proofs (ZKPs) to allow analysis of potentially identifiable signals while providing only a limited verdict output. For example, it empowers anti-fraud services to verify whether a user possesses an unmodified stored timestamp older than some provided timestamp without disclosing any additional user data. This approach strikes a balance between user privacy and anti-fraud capabilities by enabling websites to request a reputation signal (such as profile age) on which the user agent can enforce meaningful privacy constraints, while making the signal useful enough to remove the need for other burdensome or invasive checks, and allowing the user to clear said signal at will.
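
To make the verdict shape concrete, here is a rough sketch of how an embedded anti-fraud script might consume such a signal. The method and option names below are placeholders for illustration, not the interface proposed in the explainer (which, as noted later in this thread, involves fetch); the point is that the only output is a single yes/no answer about a stored, signed timestamp.

```ts
// Illustrative sketch only: method and option names are placeholders, not the
// actual Private Proof API surface described in the explainer.
async function hasEstablishedHistory(minAgeDays: number): Promise<boolean> {
  const cutoff = new Date(Date.now() - minAgeDays * 24 * 60 * 60 * 1000);
  try {
    // Ask the browser to prove that a previously stored timestamp from this
    // issuer is older than the cutoff; nothing else about the user is revealed.
    const verdict = await (navigator as any).privateProof.prove({
      issuer: "https://antifraud.example", // placeholder
      olderThan: cutoff,                   // placeholder
    });
    return verdict === true;
  } catch {
    // No stored signal, the user cleared it, or the proof failed: treat as new.
    return false;
  }
}
```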

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Anti-Fraud CG
  • The group where standardization of this work is intended to be done ("unknown" if not known): Anti-Fraud CG
  • This work is being funded by: Google Chrome

Discussions


Comment by @martinthomson Mar 27, 2025 (See Github)

See also #780 for something that is approximately the same shape.

Discussed Mar 31, 2025 (See Github)

Martin: Assuming familiarity with Private State Tokens: a site decides you're trustworthy, and you can take those tokens to another website, which can consume them. This is basically the same, but the website doesn't make an arbitrary trust decision. Instead, when it issues the original tokens, it saves the current time. Then when the user comes back, the website can ask the browser whether the saved time is prior to a specified time. It doesn't learn where it created the signal, or anything else beyond the browser having been seen at least X minutes ago. If you're running an anti-fraud system, you rely on a long history of engagement. If you have a long history, you're more trustworthy. Can use this as a sorting function between likely-safe and unknown visitors.

Martin: This removes some of the problems with the Private State Token solution, where the token could pass arbitrary information: 1 bit could mean anything, like "lives in a particular geography". Apple's system said "person has bought an Apple device". The problem with PST is that the user doesn't know what the signal is. In Private Proof, you know what the signal is. It has the other problems of PSTs: you have to limit the number of entities that can use the API, or else it becomes a great fingerprint. There needs to be a system of accountability, and that system isn't clear. Proponents suggest entropy limits == a cap on the number of sites that can use it in a context. Works, but encourages sites to use services with the oldest user accounts, which creates a centralizing incumbency advantage. Can't rely on a diversity of antifraud providers. They don't have a solution yet. Probably on balance it's ok if those problems can be worked through.

Jeffrey: Sounds like a set of questions for the proponents rather than a conclusion yet.

Martin to draft a comment with questions, and we'll run it by the rest of the TAG.

Lots of questions: https://github.com/w3ctag/design-reviews-private-brainstorming/issues/131#issuecomment-2770984939

Discussed Apr 14, 2025 (See Github)

Hadley: this is with me, and I need to look at it. Let's come back to it in a later meeting.

Discussed May 19, 2025 (See Github)

Hadley: Agree with Martin's draft comment. Numbering the points would make it easier to engage with.

Martin: I will post and re-number my points.

We agree to post Martin's comment ... and posted.


Comment by @martinthomson May 22, 2025 (See Github)

This proposal is substantially similar to the Private State Token API (https://wicg.github.io/trust-token-api/, https://github.com/w3ctag/design-reviews/issues/780), but it seeks to address one of the key privacy concerns with that approach.

Private State Tokens allowed a site to write a bit¹ when it was in a top-level or same-site context and read that bit when it was embedded in a cross-site context. The major problem with that from a privacy perspective is that no one really got to know what the bit meant: the intent was that it signal that the site considered the user to be trusted², but nothing limited it to that meaning.

The change here is to move to a single bit again, but strictly limit what that bit can mean. In the same-site context, a site is able to sign the current time. That genuinely carries a single bit into the browser that the site cannot control. In the cross-site context, the site can then ask the browser for proof that the time it previously signed was before a time of its choice. The site then either receives that proof or not, providing it with a single bit. The site has some control, because they can choose how old the original signature was, but they don't get anything more than that.

For anti-abuse purposes, this is useful because it allows sites to call out to an anti-abuse provider and separate potential abusive visitors into two groups: those with a history longer than some predefined age, and those without. Those with a long history are unable to spoof that (with caveats), which allows the site to (at their discretion) focus their abuse protection mechanisms more closely on those without a long history.
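
As a rough illustration of that sorting function, the sketch below shows how a site might act on the single-bit verdict; `verifyProof` is a placeholder for whatever server-side verification the anti-abuse provider defines, and is not part of the proposal itself.

```ts
// Sketch only: the verifier is provider-defined and passed in as a parameter.
type Triage = "likely-safe" | "unknown";
type ProofVerifier = (proof: Uint8Array, cutoff: Date) => Promise<boolean>;

async function triageVisitor(
  proof: Uint8Array | null,
  cutoff: Date,
  verifyProof: ProofVerifier,
): Promise<Triage> {
  // No proof offered: the visitor is simply "unknown", not blocked outright.
  if (proof === null) return "unknown";
  // A valid proof only establishes that the stored timestamp predates `cutoff`;
  // it carries no other information about the user.
  const valid = await verifyProof(proof, cutoff);
  return valid ? "likely-safe" : "unknown";
}
```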

This strikes a balance between the anti-abuse needs and privacy, for sure. Whether it strikes an appropriate balance depends on several factors, which we'd like to see more detail on:

  1. One of the major drawbacks of Private State Tokens is the natural bias toward centralization. If you don't limit the number of entities that can supply tokens, the API is basically a fingerprint-creation API. This proposal is no different in that regard. In particular, this creates a tendency for sites to partner with anti-abuse vendors who have been around longer and those who have been exposed to more users. The proposal does not appear to (currently) have robust safeguards in place for that bias. We'd like to see some more analysis about what the options might be here. The explainer section on side channels is not really adequate to understand the full privacy implications of this.

  2. The idea of issuer "fungibility" explored in the explainer seems impractical as described, but the general idea that the choice of issuer is not revealed through the proof is worth exploring as a potential mitigation for the natural centralization tendency in the API. We understand that this has a range of challenges -- it isn't clear what would incentivize the necessary cooperation, it requires new and potentially different cryptography, plus it might have other centralization biases -- but we want to encourage further investigation along these lines.

  3. The underlying cryptography includes rate limiting measures, which allow sites to specify a maximum number of times that their signature can be used in a given time period. This exists to prevent a user from sharing the signature they receive with others. There is discussion in the explainer of putting these rate limiting parameters in a .well-known location on a site, but this seems to be subject to change, which might enable abuse. We'd like to see this more clearly specified, with clear guardrails on use so that they cannot be abused. This might require some analysis to support the choices made. (A hypothetical sketch of what such a published policy could look like follows this list.)

  4. We'd like to see some consideration given to device portability of the underlying keying material that is used. Does your trustworthiness reset if you get a new phone? (This might be a desirable property because it means that there are more reasons that someone might appear too "new", more below.)

  5. It's not clear that it is safe to expose enrollment in this system in a cross-site context without revealing additional fingerprinting bits.

  6. Given the high computation cost involved in producing the bit, what safeguards can be put in place by user agents to ensure that this does not lead to a different form of abuse? Particularly on battery-powered devices that tend to have less computation resources.

  7. Some work clearly needs to be done to safeguard against revealing side channels. Is it not possible to unconditionally perform the computation and return an invalid value if no token exists? This also applies to the hasToken() API, which leaks a fingerprinting bit.

  8. The API involves a fetch for all invocations, which seems unnecessary. Though both generation of signatures and validation of proofs might need to occur on servers, imperative JavaScript APIs to manage the necessary interactions are sufficient.

  9. Other similar APIs have included a non-trivial base rate of forced errors, even for users who have valid tokens. This is not so much to provide a measure of privacy, though it could provide a small measure of differential privacy, but more to ensure that sites do not become too dependent on the API being able to produce valid proofs. This forces sites to provide a reasonable user experience for people who cannot satisfy their challenge. After all, this API encodes a bias against new or younger users. What, if anything, do you plan to do to encourage sites to be responsible in that regard? (A sketch of graceful fallback handling also follows this list.)
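
On point 3, here is a hypothetical sketch of what a published rate-limiting policy could look like, expressed as a TypeScript shape. The field names follow the epoch_length and epoch_limit parameters discussed later in this thread; the exact location, format, and units are assumptions, not anything settled in the explainer.

```ts
// Hypothetical shape only: the explainer discusses publishing rate-limiting
// parameters (e.g. at a .well-known location), but the format is not settled.
interface RateLimitPolicy {
  epoch_length: number; // length of one epoch, e.g. in seconds (assumed unit)
  epoch_limit: number;  // maximum presentations of one issuance per epoch
}

// Point 3 asks for guardrails here: if an issuer can change these values
// freely over time, that flexibility itself could be abused.
```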
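
And as an illustration of the posture point 9 asks for, a minimal sketch of site-side handling that treats a missing or failed proof as ordinary friction rather than a hard failure; the helper semantics are assumptions.

```ts
// Sketch only: honest users (new accounts, cleared signals, or deliberately
// forced errors) will regularly be unable to produce a valid proof, so a
// failed proof should route to a fallback, never to denial of service.
async function gateAction(
  proveHistory: () => Promise<boolean>,
): Promise<"allow" | "challenge"> {
  let ok = false;
  try {
    ok = await proveHistory();
  } catch {
    ok = false; // treat errors exactly like "no proof"
  }
  // A failed proof leads to an extra challenge, not a blocked page.
  return ok ? "allow" : "challenge";
}
```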

Footnotes

  1. In practice, allowances in the API for key rotation and other things meant that this was sometimes two bits. And sometimes there could be an additional "hidden" bit. Either way, the intent was that this be just a single bit.

  2. Again, what "trust" means here is vague and whether it is the user or their browser that was trusted is unclear.

Comment by @hadleybeeman Jul 17, 2025 (See Github)

Hi, we are just looking at this in our W3C TAG breakout today. @arichiv @samuelschlesinger @philippp — do you have any thoughts on our feedback above?

Comment by @arichiv Jul 17, 2025 (See Github)

Sorry for the delay, we appreciate the feedback given and are considering it as well as the positioning of the Private Proof API itself right now. We hope to revisit when ready, but this may take some time.

Discussed Jul 21, 2025 (See Github)

Hadley: Nudged them for follow-up on our comments; they came back saying they would revisit it, but it would take some time. Suggested closing it as timed out and letting them reopen it later.

Ehsan: Just sent a comment to the private brainstorming. We could post it to the public repository before closing it.

Hadley: We've given them a lot of comments, do you think it is important to put those comments in or can we hold on to them and post them later?

Ehsan: I consider this an important issue; my understanding is that it can protect users by letting them prove they are not fraudulent. I think this would need clarification.

Hadley: If the TAG agrees, I think this would be important.

Matthew: We can check with Martin whether he agrees with Ehsan's comment and then post it.

Lola: Ok, we'll wait for Martin before we proceed.

Hadley to ping Martin in the private brainstorming.

Comment by @toreini Jul 25, 2025 (See Github)

Hi @arichiv, many thanks for this spec. Can you please have a look at these?

  1. I have a question regarding the design of the protocol: how does this spec resist a fraudulent actor who wants to impersonate an honest user? Consider this scenario:
  • You are a fraudster, or part of a small or medium-sized fraud campaign, and you use a malicious extension to record the message exchange (which is not an unreasonable assumption for fraud).
  • As the message is not bound to a time or a recorded nonce (to prevent tracking), you distribute it among dozens of other user agents. Each one tries to impersonate a user, replaying the same messages in the hope that one of them will pass the browser challenge (if the T* is predictable, which I think it is).
  • The above scenario may work for large-scale or targeted fraud attacks too.
  2. Also, the diagram in the explainer is confusing (and misleading) in many ways: for instance (but not limited to), there is the client private key k and then a response k? Are they different? Does it mean the private key is sent to the server?

Many thanks again, Ehsan

Comment by @SamuelSchlesinger Jul 27, 2025 (See Github)

Hi @toreini. The client's private key k is never sent to the server except as a blinded portion of the request for an issuance (by blinded, I mean the server cannot invert this to discover the value of k), and it is in fact the key thing which prevents the distribution attacks you mention. In particular, we send a proof that there exist authenticated k and i within 0..RATE_LIMIT such that F_k(i || epoch) = RateLimitingToken, and RateLimitingToken is what's actually sent back to the server. Because F_k is cryptographically pseudorandom and our proof is zero-knowledge, we are never revealing the client's identity to the server. Further, because RateLimitingToken is deterministically constructed from k and i, any given credential issuance can only be used RATE_LIMIT times per token. This means that, while we aren't binding the token to the user's device (this was a key design choice to allow all device types to be supported), our token presentations are scarce.
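
For readers following the construction, here is a rough sketch of the determinism argument: because the rate-limiting token is a pseudorandom function of the client key k, the counter i, and the epoch, one issuance can yield at most RATE_LIMIT distinct tokens per epoch. HMAC is used below purely as a stand-in PRF; the actual construction, and the zero-knowledge proof around it, are different.

```ts
import { createHmac } from "node:crypto";

// Stand-in PRF: the real design uses a different construction inside a
// zero-knowledge proof; HMAC here only illustrates the determinism argument.
function rateLimitingToken(k: Buffer, i: number, epoch: number): string {
  return createHmac("sha256", k).update(`${i}||${epoch}`).digest("hex");
}

// With i constrained to 0..RATE_LIMIT, there are at most RATE_LIMIT distinct
// tokens per (k, epoch), so presentations from one issuance are scarce even
// though the server never learns k or i.
const RATE_LIMIT = 5; // illustrative value
function tokensForEpoch(k: Buffer, epoch: number): Set<string> {
  const tokens = new Set<string>();
  for (let i = 0; i < RATE_LIMIT; i++) tokens.add(rateLimitingToken(k, i, epoch));
  return tokens;
}
```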

Comment by @toreini Jul 29, 2025 (See Github)

Hi @SamuelSchlesinger,

Many thanks for your response. Yes, I can see that the proof of ownership is sent instead of the private key k in the spec text, but please make your protocol diagram more accurate. However, here is my current concern.

If a malicious actor can get access to the RateLimitingToken and the proof, then I think they can relay/replay the token later if it is not bound to something. You are essentially relying on the epoch_length and epoch_limit in the verification algorithm to mitigate the distribution attacks, but these two parameters are deterministic. I assume you are relying on economies of scale and saying each token cannot be relayed/replayed more than x times, accepting the risk of some fraud actors here and there, but not a large-scale campaign.

Essentially, my question is this: is the relaying/replaying attack possible if a malicious actor gets access to a legitimate token and proof? If the verification relies on epoch_length and epoch_limit, and they are predictable (which I think they are), then my concern is that a small/medium fraud campaign, or a clever large one, could successfully bypass the current protocol if it has a collection of valid RateLimitingTokens and proofs. Even when using ring signatures for issuer fungibility, as described under honest token sharing, collusion may be possible.

Comment by @SamuelSchlesinger Jul 29, 2025 (See Github)

Yes, indeed that is possible with the current design in the explainer. One mitigation we could apply is binding the proof to a nonce provided by the issuer at challenge time; the underlying cryptography supports this. Still, I should clarify: if you have a sink of valid tokens, then you can generate proofs and rate limiting tokens anyway, as we cannot bind the tokens to a device without seemingly violating some of our aims to allow cross-platform compatibility.

Currently, this API is on hold, but if we pursue it I will be sure to address the concerns you raised about replay attacks and the inaccurate diagram.
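
To illustrate the nonce-binding mitigation mentioned above, here is a sketch of a verifier that only accepts proofs bound to a fresh, server-chosen nonce, so a captured (token, proof) pair cannot be replayed against a later challenge. All names and shapes here are assumptions; the explainer does not define this flow, and the actual cryptographic check is provider-defined.

```ts
// Sketch only: binds a proof to a per-challenge nonce to prevent replay.
import { randomBytes } from "node:crypto";

type Proof = Uint8Array;
// Provider-defined check that the zero-knowledge proof commits to this nonce.
type BoundProofVerifier = (proof: Proof, nonce: string) => Promise<boolean>;

const outstandingNonces = new Set<string>();

// Issued by the verifier alongside each proof request.
function issueChallenge(): string {
  const nonce = randomBytes(16).toString("hex");
  outstandingNonces.add(nonce);
  return nonce;
}

async function verifyBoundProof(
  proof: Proof,
  nonce: string,
  verify: BoundProofVerifier,
): Promise<boolean> {
  // Reject unknown or already-consumed nonces, then check the proof itself.
  if (!outstandingNonces.delete(nonce)) return false;
  return verify(proof, nonce);
}
```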