#823: Early design review request: IPA
Discussions
Discussed
Mar 13, 2023 (See Github)
Hadley: not written from the perspective of the user...
Dan: really good answers to the security & privacy questionnaire...
Dan: https://github.com/w3ctag/design-reviews/issues/724 we said "multiple approaches..." do we want a different thing on each browser?
Hadley: IPA is set up with a multi-party computation model - makes tracking someone against their will a lot harder. This differentiates this proposal... will be better for user?
Dan: also points back to the user PoV..
Peter: do any of these benefit the user or just better ways of doing what advertisers want to do?
Hadley: conditional level of validation that the person has been validated somewhere else...
Peter: ...trust tokens...
Hadley: yes but trust tokens are completely anonymous... IPA protocol is an aggregate report... measure performance of use of advertising... (Reading from the security and privacy questionnaire) a lot can be said about the relative virtues of advertising... not all will say advertising is necessary... however you can't do advertising without being able to measure it. Alternatives involve tracking or surveys & panels... expensive & slow and too inefficient for general use.
Dan: this is what Apple also say that they do... They have not requested a review yet even ...
Comment by @hadleybeeman Mar 15, 2023 (See Github)
Hi @benjaminsavage, @martinthomson and @eriktaubeneck. We are looking at this in our W3C TAG meeting today.
Two questions for you.
-
Use cases. Could you please list the use cases you are designing for, from an end-user's perspective? You mention several throughout the explainer (authentication, measuring the impact of advertising), but it's hard to tell which you are designing for.
-
How does this relate to other approaches? You mention them in the acknowledgements section, but they appear to be competing proposals. How does this differ and how do you anticipate this all to make sense from a developer's point of view?
Comment by @eriktaubeneck Mar 15, 2023 (See Github)
Hi @hadleybeeman, glad to hear that you will be looking at it today! Let me do my best to answer your questions, but please follow up if you need any clarity.
-
We are designing for the use case of measuring the impact of advertising based on cross-site behavior. Previously this was supported via shared cross-site context (3rd party cookies). We believe this is an important use case to support in a manner that doesn't enable tracking end-users individually, so that end-users have access to ad-supported sites which may otherwise require payment, or require end-users to provide PII that can be used for tracking.
-
The Private Advertising Technology Community Group (PATCG), where IPA has been proposed, has published a fairly comprehensive overview of alternative proposals. I'm not sure I understand the question about "mak[ing] sense from a developer's point of view", but the general goal in the PATCG is to recommend a common standard (IPA or otherwise) which can be implemented across browsers, so that developers have only one common API that supports this use case.
Comment by @ShivanKaul Mar 15, 2023 (See Github)
Some general feedback and thoughts:
-
I’m not seeing the user benefit here: the use case pointed out (“measuring the impact of advertising based on cross-site behavior”) is very explicitly a use case that website developers have, not end users (with reference to the W3C priority of constituencies). Users already have choices for preventing Web tracking; most user agents already block 3rd party cookies by default.
-
I also think this has the potential to be actively harmful for users: e.g., this from the Explainer is concerning: “If a match key provider is able (and willing), they could extend this even further by performing user-level linkage to other contexts (e.g., email based matching with offline merchants), then distribute encrypted match keys, enabling businesses to bring offline user activity from these other contexts into the MPC. The impact this may have on the overall ecosystem is not obvious. On one hand, it may drive an increase in sharing of PII between parties in an effort to gain access to this new measurement capability.” Plus, the overall complexity of this is problematic, and largely hidden from users; ISTM that when the system fails, user privacy is harmed in a way that is invisible to users. Similarly…
-
What can the user verify in this system regarding their privacy? Can they verify that the privacy budget is being respected, for example?
-
This proposal will further the consolidation of the Web in favour of large browser vendors. MPC-based systems tend to be expensive to operate, and if IPA is “standardized and implemented across browsers so that developers have only one common API that supports this use case”, then my concern is that this would effectively rule out smaller user agents who can’t afford to pay for such a service.
Comment by @martinthomson Mar 15, 2023 (See Github)
I just want to pick up on this one point:
effectively ruling out smaller user agents who can’t afford to pay for such a service
In IPA, websites (advertisers, publishers, etc...) pay, not user agents.
Comment by @ShivanKaul Mar 16, 2023 (See Github)
It is not clear what the relationship is between the User Agent Vendor and the three Helper Parties, except for the fact that the User Agent must trust the Helper Parties to not collude. Given that (by design) 2 parties colluding would be disastrous for user privacy, there's a strong incentive for the User Agent Vendor to operate one of the Helper Parties.
Comment by @martinthomson Mar 16, 2023 (See Github)
Operating a helper node is something Mozilla has considered, but we're not inclined to do so for a few reasons. Foremost of those is that we're looking to use something like the CA/Browser forum as a reference model for governance. That is, we want to have a common set of helper party networks that are trusted and overseen by a group formed by multiple browsers. In other words, having browsers oversee the operators of those networks. Having the browser involved both in operation and oversight would introduce some fairly gnarly conflicts of interest that seemed best to avoid.
As for the other points you make @ShivanKaul:
Regarding users vs. sites (point 1). Yes, this is directly acknowledged in the explainer. It's come up a number of times in PATCG meetings, specifically in the context of the priority of constituencies. This is something that I recognize each of us weights differently, but we'll note that the priority is necessarily loose, so there are a few ways that I think you can justify doing something like attribution.
The magnitude of the benefit here needs to be considered. The IPA design deliberately imposes a very low cost on users. Leaving aside trivial amounts of bandwidth and compute, the primary cost is the privacy loss (in the formal DP sense) that accrues through providing sites with the ability to perform aggregated attribution. Mozilla's position here is that - provided that we can find an acceptable set of parameters, especially for the $\epsilon$ and $\delta$ in the $(\epsilon, \delta)$-differential privacy - this cost is acceptable for general web browsing cases. That is, acceptable if the corresponding advantage provided to web sites is significant. And for that I think that the case has been made by those in the advertising industry: measurement capabilities provide enormous benefit in terms of being able to profitably run an advertising business.
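The (ε, δ)-differential-privacy trade-off described above can be made concrete. Here is a minimal, illustrative sketch of the Gaussian mechanism for releasing an aggregated attribution count — this is not IPA's actual mechanism, and the function names and parameters are assumptions for illustration only:

```typescript
// Illustrative sketch of an (epsilon, delta)-DP release of an aggregate
// attribution count via the Gaussian mechanism. Not IPA's real design.

// Standard deviation that suffices for (epsilon, delta)-DP:
// sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
function gaussianSigma(sensitivity: number, epsilon: number, delta: number): number {
  return (sensitivity * Math.sqrt(2 * Math.log(1.25 / delta))) / epsilon;
}

// Box-Muller transform for a Gaussian sample with standard deviation sigma.
// (1 - Math.random() avoids taking log(0).)
function sampleGaussian(sigma: number): number {
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  return sigma * Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// If each user contributes at most `sensitivity` to the count, adding
// noise with this sigma gives an (epsilon, delta)-DP release.
function noisyCount(trueCount: number, sensitivity: number, epsilon: number, delta: number): number {
  return trueCount + sampleGaussian(gaussianSigma(sensitivity, epsilon, delta));
}
```

The tuning question raised in the thread is visible directly in `gaussianSigma`: halving ε doubles the noise, so "acceptable parameters" means a point where the noise is small enough for measurement but large enough that individual contributions stay hidden.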
Again, we acknowledge that the benefits users see are likely to be indirect, at best. Access to ad-supported content is not automatic here. The advertising industry has some pretty bad incentive structures and it might be that the current trend away from ad-supported content will continue, with the benefits to users not being realized. But we do believe that advertising has demonstrated an ability to provide support to sites that can be more equitable than other business models, as it largely shifts the burden onto those who are more willing or able to support advertisers. A progressive taxation system, if you will.
Ultimately, there are a lot of things to consider here. It's understandable that you might distrust the advertising industry. There are a lot of shady practices that probably won't stop as a result of us building this stuff. Many actors are unhappy with the share of revenue taken by intermediaries (what's new). Hell, some of that will probably get worse despite our efforts, but that is a risk for a lot of the stuff we build. We could, as I think you are implying, refuse to do anything here, but there are those of us that think that leads to undesirable outcomes, like a far less equitable web. What we are trying to do here is to avoid the worst pitfalls and build safeguards for the rest, technical if possible, procedural and policy-based for the gaps.
You identify a few areas that are particularly challenging for IPA. Some of its flexibility comes with inherent trade-offs around things like user transparency. You also identified one area that continues to be challenging for us with the point you make about match key providers. All I can really say is that these represent some of the harder trade-offs we've made in the design. Having some more discussion about these choices relative to some of the alternatives might be the best way to proceed, because some of those choices can be hard to rationalize without putting them into the broader context. I also want to acknowledge explicitly that the context I'm talking about includes not only how sites receive support, but how browsers support themselves. (You might also add bad behaviour from information brokers and regulatory interventions into what is turning out to be pretty complicated.)
Comment by @ShivanKaul Mar 31, 2023 (See Github)
Sorry for the late response, you know how IETF weeks are...
but we'll note that the priority is necessarily loose, so there are a few ways that I think you can justify doing something like attribution.
I don’t think the Priority of Constituencies is “loose”; it’s plain, and the exceptions that are listed in the “Web Platform Design Principles” document are unrelated to what’s being proposed here.
The IPA design deliberately imposes a very low cost on users. Leaving aside trivial amounts of bandwidth and compute, the primary cost is the privacy loss (in the formal DP sense) that accrues through providing sites with the ability to perform aggregated attribution.
While the (formal, DP) privacy loss for users in IPA is definitely something we should reason about, I suspect that it is also the more attractive one to solve for us as engineers; the more important user concerns here are i. around transparency and trust, and ii. piercing the privacy boundary of the browser by intentionally linking events that happen outside the browser with events that happen within the browser.
The proposed governance model is especially concerning to me: it looks like we’re building complicated and expensive new Web infrastructure/governance structures here, similar to the CA/Browser forum like you mentioned, except that with IPA there is not even a security or other similar benefit to users. I really don’t think CAB is the model to be emulating. This is the first W3C proposal (that we’re aware of) that requires the use of trusted, non-user-auditable centralized servers for privacy protections. Beyond the clear risk of catastrophic privacy harm here (e.g., a misconfigured server), this approach seems incompatible with several TAG findings / W3C principles, including “enhancing individuals’ control and power”, “the web is transparent” and “the web must make it possible for people to verify the information they see”.
This proposal has the goal of intentionally linking behaviors in the browser with behaviors outside the browser. This is a new category of privacy harm that the proposal would enable, and the first time we’ve seen it as an explicit goal in a proposal. This has already resulted in attacks like https://github.com/patcg-individual-drafts/ipa/issues/57.
As best we can tell, this technology is being proposed to benefit sites and browser vendors, and at the risk to users and the openness and transparency of the platform as a whole.
Comment by @martinthomson Apr 1, 2023 (See Github)
Regarding priorities and "loose", I was loosely referring to this important qualification:
Like all principles, this isn’t absolute. Ease of authoring affects how content reaches users. User agents have to prioritize finite engineering resources, which affects how features reach authors. Specification writers also have finite resources, and theoretical concerns reflect underlying needs of all of these groups.
That said, even a strict ordering justifies our conclusion, though it requires acknowledging that some benefits are indirect. That is, the indirect benefit to users as a result of serving the needs of authors (again, via an ability to more effectively support their work with advertising) outweighs or is neutral with the loss associated with those users participating in an aggregated measurement system. And the benefit to authors is potentially significant.
Comment by @benjaminsavage Apr 21, 2023 (See Github)
Thank you everyone for the feedback thus far.
I wanted to update the group about a change that we have recently made to the IPA proposal.
In light of both:
- The risk of events in the browser being linked to events outside of the browser (a risk called out by @csharrison, which we added to the IPA end-to-end doc)
- The attack @bmcase discovered, and posted about (https://github.com/patcg-individual-drafts/ipa/issues/57), which could be waged by a malicious match key provider
We've opted to remove the setMatchKey API from this proposal. Perhaps, in future, we will find solutions to these problems, but until that time, we would like to explore a simpler proposal which only includes a getEncryptedMatchKey() API.
The underlying identifier being secret shared in this case would just be a random number, generated by the user-agent, which would never be revealed to any party, just stored on the device.
We hope this simplification will address a number of the concerns listed above.
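The simplified design — a random identifier generated by the user agent, never revealed to any party, and only ever secret-shared — can be illustrated with three-way XOR secret sharing. This is a hypothetical sketch: the actual IPA protocol uses replicated secret sharing inside the MPC and encrypts each share to a specific helper party, and all names here are illustrative:

```typescript
// Hypothetical sketch of secret-sharing a device-generated match key
// across three helper parties using XOR shares. Illustrative only.

function randomU32(): number {
  return Math.floor(Math.random() * 0x100000000) >>> 0;
}

// Split a secret into three shares. Any two shares together are still
// uniformly random; XOR of all three reconstructs the secret.
function share3(secret: number): [number, number, number] {
  const s1 = randomU32();
  const s2 = randomU32();
  const s3 = (secret ^ s1 ^ s2) >>> 0;
  return [s1, s2, s3];
}

// Only the MPC (all three helpers running the vetted protocol) could
// ever combine the shares; no single party learns the match key.
function reconstruct(shares: [number, number, number]): number {
  return (shares[0] ^ shares[1] ^ shares[2]) >>> 0;
}

// The user agent generates the match key once and keeps it on device;
// sites only ever receive shares (in practice, encrypted to the helpers).
const matchKey = randomU32();
const shares = share3(matchKey);
```

This is why the non-collusion assumption discussed earlier in the thread matters: two colluding helpers plus the third's share (or the protocol transcript) is what would break the scheme.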
Comment by @ShivanKaul Apr 30, 2023 (See Github)
To clarify, Match Key Providers (and their associated API call, set_match_key()) are being removed from IPA, thus removing the cross-device measurement use case. Is that correct? That would help with point ii. of:
... the more important user concerns here are i. around transparency and trust, and ii. piercing the privacy boundary of the browser by intentionally linking events that happen outside the browser with events that happen within the browser.
It would also be good to update the Explainer then.
Discussed
Jun 12, 2023 (See Github)
Hadley: long discussion... api changed
Discussed
Jul 1, 2023 (See Github)
Tess: I think something along these lines is probably the right thing for aggregated reporting...
Discussed
Jul 3, 2023 (See Github)
Amy: They've responded to our qs. Also some conversation and changes to the spec since it was reviewed in Tokyo. We should re-review. They have some open questions, so if we have the right expertise in the TAG we can offer advice, but otherwise we should probably wait for it to settle down
Amy: How does this fit with Attribution Reporting API? (It's the same problem?)
Dan: Any neutral opinion / summary on the three proposals?
Amy: Maybe Nick Doty / Sam Weiler?
Dan: i'll take the action to reach out to Sam Weiler & Nick Doty
Discussed
Feb 12, 2024 (See Github)
Tess: last time there had been a unified approach... do you think it's worth spending time reviewing IPA as it is right now? Should we ask the PATCG to circle back when that happens?
Martin: Right, we've had a number of conversations about iterations that remove a persistent identifier, which is generally an all-round improvement. We might appreciate guidance on the abstract notion - is it a good idea to support this use case - I don't think anything further than that would be useful
Tess: I propose we close this nicely and ask that they file a new request when the new proposal is ready
Peter: so is this a good idea?
Tess: we know that advertisers want to measure effectiveness, and they'll do so in a bunch of ways that are indistinguishable from cross site tracking. Since we're cracking down on cross site tracking, we're also cracking down on anything that resembles cross site tracking, because we can't distinguish. It's a good thing that people are trying to move off technology that is indistinguishable from cross site tracking. I'm happy that the industry is collaborating on something that is reasonably privacy preserving, and I want to encourage that. I'd rather see a convergent solution.
Martin: there are a bunch of nonobvious properties of a system like this. We talked about targeted advertising previously. All advertising is targeted to some extent. Also, being able to measure effectively creates the ability to better target your advertising, whether or not that comes with the ability to gain more information about people is orthogonal. Assuming you have some amount of information about people, the ability to measure the effect of advertising on people gives you the ability to realise which people will be more receptive to your message.
Peter: that's my concern. In general I'm against targeted advertising because it does what it does in a harmful way. I get the business case for wanting to measure the effectiveness. I don't want to add technology to the web to just enable a business case. But as long as we can come up with a way that doesn't cause harm - if it can't be abused, I can't oppose it. There is a fine line between effective commercial advertising and effective manipulation of peoples opinions. There's a lot of evidence that social networks in general have caused all sorts of harm this way.
Martin: we thought a lot about this. We concluded the harm minimisation aspects were significant enough to justify doing it, but only on the basis that the privacy tuning of the proposal was sufficiently good. Effectively making sure that the information made available to advertisers isn't good enough for them to do some of the more sophisticated stuff. We may not ever achieve that - really bad information is still pretty good for some people. That raises some really interesting equity questions. With our primary tool being differential privacy, what are the tuning parameters we put on this? At one end of the scale it's useless, and at the other end you've not moved the privacy needle at all.
Peter: when it comes to harm reduction, there are lots of existing mechanisms used to track and identify people. If we give them a way to get the same effect without that level of invasiveness, but they don't believe the information they get is as good, they'll still do the harmful thing. We have to remove the ability to do the harmful thing, and if we can do that, why do we have to give them a less harmful thing?
Tess: comes back to tuning the parameters... technological intervention to prevent it from happening is... an arms race. If we push things to the point where that happens... If we can push them to a better place and have it be good enough that they're not going to engage in escalation, then that seems like a good thing. One of the nice things about the escalation is that it's more obvious that you're doing it, so it's easier to get to a policy intervention. On the technical side, the status quo in browsers with 3p cookies is widespread cross site tracking that is technologically indistinguishable from a number of otherwise legitimate use cases. Having a mechanism like this means that those things are technologically distinguishable to some extent. It's not as good as what you were doing before - but what you were doing before is going away. So it's adopt this, or do something that's beyond the pale. If we can tune this so it's good enough that people can get 80% of what they want and not run afoul of non-technical interventions, we're hitting the sweet spot. If we push them too hard and they all just do the worst thing, then we're worse off than we were at the beginning. I feel like this is worth trying. I don't know what the end state is.
Peter: so we're not saying no
<blockquote>We talked about this today during our call, and it's our understanding that there is a promising path forward to merge IPA, PAM and the relevant portions of ARA. Given that, we don't think it's prudent to review the details of IPA since this is subject to change.
We're happy to see these attempts to converge on a way of measuring advertising effectiveness that is more privacy preserving. We encourage you to keep fine-tuning the privacy properties of your proposals, and then to open a new design review request when it's ready and we'll take a look then. Thanks!
</blockquote>
Comment by @rhiaro Feb 13, 2024 (See Github)
We talked about this today during our call, and it's our understanding that there is a promising path forward to merge IPA, PAM and the relevant portions of ARA. Given that, we don't think it's prudent to review the details of IPA since this is subject to change.
We're happy to see these attempts to converge on a way of measuring advertising effectiveness that is more privacy preserving. We encourage you to keep fine-tuning the privacy properties of your proposals, and then to open a new design review request when it's ready and we'll take a look then. Thanks!
Comment by @bmcase Feb 13, 2024 (See Github)
That makes sense. More details on this hybrid proposal are forthcoming. Once there has been time for more discussion of it, we'll open a new design review.
OpenedMar 3, 2023
Hello, TAG!
I'm requesting a TAG review of Interoperable Private Attribution (IPA).
IPA proposes a system that enables cross-site attribution. The idea is to provide businesses that use advertising with a way to measure how their advertising is performing without having to rely on tracking. To do this, IPA assigns each user an identifier - a match key - that cannot be used outside of a multi-party computation (MPC) system. The MPC system only executes a specific protocol that has been vetted to ensure that it only provides aggregated information.
Further details:
You should also know that...
The security and privacy questionnaire covers two key challenges, that I will highlight again here:
This proposal uses information - match keys - that might be used to perform cross-site tracking if the protections in the proposal were to fail. The API allows any web site to request and receive this information from user agents. The proposal includes a number of measures that are designed to protect this information.
The aggregated information that is provided to sites is based on the use of match keys. The use of differential privacy ensures that there is some protection for the contribution of individual users. The design limits the rate at which sites gain this information: the amount of information released each week has strict limits, but the cumulative total grows without bound over time.
Any conclusions about the privacy properties of the API will depend on an assessment of the adequacy of these protections.
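The rate-limiting property described above — a strict per-epoch cap whose cumulative total nonetheless grows without bound — can be sketched as a simple budget accountant. This is a hypothetical illustration; IPA's actual budgeting is per-site, per-epoch and considerably more involved, and the class and method names here are assumptions:

```typescript
// Hypothetical per-site privacy budget accountant. Each epoch (e.g. a
// week) a site may consume at most `epochCap` of epsilon; the cap
// resets each epoch, so cumulative consumption grows without bound.
class BudgetAccountant {
  private spentThisEpoch = 0;
  private cumulative = 0;

  constructor(private readonly epochCap: number) {}

  // Returns true (and records the spend) only if the query fits within
  // this epoch's remaining budget.
  tryConsume(epsilon: number): boolean {
    if (this.spentThisEpoch + epsilon > this.epochCap) return false;
    this.spentThisEpoch += epsilon;
    this.cumulative += epsilon;
    return true;
  }

  // Called at an epoch boundary: the per-epoch cap resets, but the
  // cumulative total of information released keeps growing.
  nextEpoch(): void {
    this.spentThisEpoch = 0;
  }

  totalSpent(): number {
    return this.cumulative;
  }
}
```

The sketch makes the second challenge above concrete: `tryConsume` enforces the strict weekly limit, while `totalSpent` is what grows without bound across epochs.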
We'd prefer the TAG provide feedback as 🐛 open issues in our GitHub repo for each point of feedback. We're happy to engage with general feedback, commentary, and questions in this thread; we expect some feedback to be very broad in nature.