#726: Early design review for the Topics API

Opened Mar 25, 2022

Braw mornin' TAG!1

I'm requesting a TAG review of the Topics API.

The intent of the Topics API is to provide callers (including third-party ad-tech or advertising providers on the page that run script) with coarse-grained advertising topics that the page visitor might currently be interested in. These topics will supplement the contextual signals from the current page and can be combined to help find an appropriate advertisement for the visitor.

  • Explainer¹ (minimally containing user needs and example code): https://github.com/jkarlin/topics
  • User research: [url to public summary/results of research]
  • Security and Privacy self-review²: See below
  • GitHub repo (if you prefer feedback filed there): https://github.com/jkarlin/topics
  • Primary contacts (and their relationship to the specification):
    • Josh Karlin, jkarlin@, Google
    • Yao Xiao, xyaoinum@, Google
  • Organization/project driving the design: Chrome Privacy Sandbox
  • External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5680923054964736

Further details:

  • [ x ] I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Either WICG or PATCG
  • The group where standardization of this work is intended to be done ("unknown" if not known): unknown
  • Existing major pieces of multi-stakeholder review or discussion of this design: Lots of discussion on https://github.com/jkarlin/topics/issues/, and a white paper on fingerprintability analysis: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf
  • Major unresolved issues with or opposition to this design: We believe that the proposed API leans heavily towards user privacy in the privacy/utility tradeoff, as it should. But, the API’s utility isn’t yet clear. Until we try the API in an experiment, we can’t know for sure how the API will perform. Some changes are likely going to be needed. Knobs we may tweak include, but are not limited to, topics in the taxonomy, weights of the topics in the taxonomy, how a site might suggest topics for itself, and how we might get topic data from more places than just the domain (e.g., from the url if there is some signal that the url is privacy safe to parse).
  • This work is being funded by: Chrome

You should also know that...

This API was developed in response to feedback that we (Chrome) received on our first interest-based advertising proposal, FLoC. That feedback came from the TAG, other browsers, advertisers, and our users. We appreciate this feedback and look forward to your thoughts on this API.

At the bottom of this issue are both the security survey responses and responses to questions from the TAG about FLoC, answered in terms of Topics.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

☂️ open a single issue in our GitHub repo for the entire review

Self Review Questionnaire: Security & Privacy

2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?

  • It exposes one of the user’s top-5 topics from the previous week to the caller if the calling context’s site also called the Topics API for the user on a page about that topic in the past three weeks. This is information that could have instead been obtained using third-party cookies. The part that might not have been obtained using third-party cookies is that this is a top topic for the user. This is more global knowledge that a single third-party may not have been able to ascertain.
  • 5% of the time the topic is uniformly random.
  • The topic comes from a taxonomy. The initial proposed taxonomy is here: https://github.com/jkarlin/topics/blob/main/taxonomy_v1.md
  • The topic returned (if one of the top 5 and not the random topic) is random among the top 5, and is set per calling top-frame site. So if any frame on a.com calls the API, it might get the topic with index 3, while b.com callers might get topic at index 1 for the week. This reduces cross-site correlation/fingerprintability.
  • Topics are derived only from sites the user visited that called the API.
  • Topics are derived only from the domain of the site, not the url or content of the site. Though this may change depending on utility results.

2.2 Do features in your specification expose the minimum amount of information necessary to enable their intended uses?

Yes. The entire design of the API is to minimize the amount of information about the user that is exposed in order to provide for the use case. We have also provided a theoretical (and applied) analysis of the cross-site fingerprinting information that is revealed: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf

2.3. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?

The API intentionally provides some information about the user to the calling context. We’ve reduced the ability to use this information as a global identifier (cross site fingerprinting surface) as much as possible.

2.4. How do the features in your specification deal with sensitive information?

Sensitive information is reduced by only allowing topics in the taxonomy that Chrome and the IAB have deemed not sensitive (the topics in the proposed initial taxonomy are derived from the two organizations' respective advertising taxonomies).

This does not mean that topics in the taxonomy, or groups of topics learned about the user over time, cannot be correlated with sensitive topics. This may be possible.

2.5. Do the features in your specification introduce new state for an origin that persists across browsing sessions?

The API provides some information about the user’s browsing history, and this is stored in the browser. The filtering mechanism, which provides a topic to a calling context if and only if that context has observed the user on a page about that topic in the past, also stores data. This could be used to learn whether the user has visited a specific site in the past (which third-party cookies can do quite easily today), and we’d like to make that hard. There may be interventions that the browser can take to detect and prevent such abuses.

2.6. Do the features in your specification expose information about the underlying platform to origins?

No.

2.7. Does this specification allow an origin to send data to the underlying platform?

The top-frame site’s domain is read to determine a topic for the site.

2.8. Do features in this specification enable access to device sensors?

No.

2.9. Do features in this specification enable new script execution/loading mechanisms?

No.

2.10. Do features in this specification allow an origin to access other devices?

No.

2.11. Do features in this specification allow an origin some measure of control over a user agent’s native UI?

No.

2.12. What temporary identifiers do the features in this specification create or expose to the web?

The topics that are returned by the API. They are per-epoch (week), per-user, and per-site, and are cleared when the user clears state.

2.13. How does this specification distinguish between behavior in first-party and third-party contexts?

The topic is only returned to the caller if the calling context’s site has also called the API on a domain about that topic with that same user in the past three weeks. So whether the API returns anything or not depends on the calling context’s domain.

2.14. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?

The API returns an empty list in incognito mode. We feel that this is safe because there are many reasons that an empty list might be returned. e.g., because the user is new, because the user is in incognito, because the site has not seen this user on relevant sites with the associated topics in the past three weeks, because the user has disabled the API via UX controls.

This is effectively the same behavior as for a new user, so the API works essentially the same within incognito mode as in regular mode. We could have instead returned random topics in incognito (and for new users), but this has the deleterious effect of significantly polluting the API with noise. Plus, we don’t want to confuse users/developers by having the API return values when they expect it not to (e.g., after disabling the API).
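Since all of these cases yield the same empty result, a caller cannot (and should not try to) distinguish them. A minimal caller-side sketch, assuming the explainer's `document.browsingTopics()` shape; the `doc` parameter stands in for the browser's `document` so the logic can be exercised outside a browser:

```javascript
// Sketch of defensive caller-side handling of the many empty-result cases.
// `doc` stands in for the browser `document` so this runs outside a browser;
// the browsingTopics() shape is assumed from the explainer.
async function getTopicsOrEmpty(doc) {
  if (typeof doc.browsingTopics !== 'function') {
    return []; // API not implemented: behave as if no topics were returned
  }
  try {
    // Empty for new users, incognito, user opt-out, nothing observed, etc.
    return await doc.browsingTopics();
  } catch (e) {
    return []; // e.g. call disallowed in this context
  }
}
```

The point of the sketch is that all paths converge on the same empty array, so serving logic cannot branch on *why* topics are absent.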

2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?

There is no formal specification yet, but the explainer goes into detail on the privacy considerations. The primary security consideration is that the API reveals information beyond third-party cookies, in that learning a topic means that the topic is one of the user’s top topics for the week.

2.16. Do features in your specification enable origins to downgrade default security protections?

No.

2.17. How does your feature handle non-"fully active" documents?

No special considerations.

Responses to questions from the FLoC TAG review, as they apply to Topics

Sensitive categories

The documentation of "sensitive categories" visible so far is on Google ad policy pages. Categories that are considered "sensitive" are, as stated, not likely to be universal, and are also likely to change over time. I'd like to see:

  • an in-depth treatment of how sensitive categories will be determined (by a diverse set of stakeholders, so that the definition of "sensitive" is not biased by the backgrounds of implementors alone);
  • discussion of if it is possible - and desirable (it might not be) - for sensitive categories to differ based on external factors (eg. geographic region);
  • a persistent and authoritative means of documenting what they are that is not tied to a single implementor or company;
  • how such documentation can be updated and maintained in the long run;
  • and what the spec can do to ensure implementers actually abide by restrictions around sensitive categories. Language about erring on the side of user privacy and safety when the "sensitivity" of a category is unknown might be appropriate.

A key difference between Topics and Cohorts is that the Topics taxonomy is human curated, whereas cohorts were the result of a clustering algorithm and had no obvious meaning. The advantage of a topics-based approach is that we can help to clarify which topics are exposed. For instance, the initial taxonomy we intend to use includes topics that are in both the IAB’s content taxonomy and Google’s advertising taxonomy. This ensures that at least two separate entities have reviewed the topics for sensitive categories. Assuming that the API is successful, we would be happy to consider a third-party maintainer of the taxonomy that incorporates both relevant advertising interests as well as up-to-date sensitivities.

Browser support

I imagine not all browsers will actually want to implement this API. Is the result of this, from an advertiser's point of view, that serving personalised ads is not possible in certain browsers? Does this create a risk of platform segmentation in that some websites could detect non-implementation of the API and refuse to serve content altogether (which would severely limit user choice and increase concentration of a smaller set of browsers)? A mitigation for this could be to specify explicitly 'not-implemented' return values for the API calls that are indistinguishable from a full implementation.

The description of the experimentation phase mentions refreshing cohort data every 7 days; is timing something that will be specified, or is that left to implementations? Is there anything about cohort data "expiry" if a browser is not used (or only used to browse opted-out sites) for a certain period?

As always, it is up to each browser to determine which use cases and APIs it wishes to support. Returning empty lists is completely reasonable. Though a caller could still use the UA to determine if the API is really supported or not. I’m not sure that there is a good solution here.

In regards to the duration of a topic, I think that is likely to be per-UA.

In the Topics API, we ensure that each topic has a minimum number of users, by returning responses uniformly at random 5% of the time.

Opting out

I note that "Whether the browser sends a real FLoC or a random one is user controllable" which is good. I would hope to see some further work on guaranteeing that the "random" FLoCs sent in this situation do not become a de-facto "user who has disabled FLoC" cohort. It's worth further thought about how sending a random "real" FLoC affects the personalised advertising the user sees - when it is essentially personalised to someone who isn't them. It might be better for disabling FLoC to behave the same as incognito mode, where a "null" value is sent, indicating to the advertiser that personalised advertising is not possible in this case.

I note that sites can opt out of being included in the input set. Good! I would be more comfortable if sites had to explicitly opt in though. Have you also thought about more granular controls for the end user which would allow them to see the list of sites included from their browsing history (and which features of the sites are used) and selectively exclude/include them?

If I am reading this correctly, sites that opt out of being included in the cohort input data cannot access the cohort information from the API themselves. Sites may have very legitimate reasons for opting out (eg. they serve sensitive content and wish to protect their visitors from any kind of tracking) yet be supported by ad revenue themselves. It is important to better explore the implications of this.

The current plan is for the Topics API to return an empty list in incognito mode.

Sites opt in by using the API. If the API is not used, the site will not be included. Sites can also prevent third parties from calling the API on their site via permissions policy.
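To illustrate that blocking mechanism: as we understand the explainer, the API is gated on a `browsing-topics` permissions policy (with a default allowlist of `*`), so a site can deny all callers, including its own third parties, with a response header such as:

```
Permissions-Policy: browsing-topics=()
```

The policy name and default are taken from the explainer and may evolve as the specification develops.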

In regards to granular controls, we feel that this is possible with Topics (less so with FLoC) and expect to expose via UX the topics that are being returned, allowing users to opt out of the API completely or to disable individual topics.

The API is designed to facilitate ecosystem participation - calling the API is both the way to contribute value to the API and the way to receive value from it. We do not want sites to be able to get topics without also supporting the ecosystem.

Centralisation of ad targeting

Centralisation is a big concern here. This proposal makes it the responsibility of browser vendors (a small group) to determine what categories of user are of interest to advertisers for targeting. This may make it difficult for smaller organisations to compete or innovate in this space. What mitigations can we expect to see for this? How transparent / auditable are the algorithms used to generates the cohorts going to be? When some browser vendors are also advertising companies, how to separate concerns and ensure the privacy needs of users are always put first?

The Topics API helps to address broad, coarse-grained topics-based advertising. For more niche topics, we suggest the usage of alternative sandbox APIs like FLEDGE. In terms of transparency, the API is written plainly in open source code, the design is occurring on GitHub with an active community, and the ML model used to classify topics will be available for anyone to evaluate.

Accessing cohort information

I can't see any information about how cohorts are described to advertisers, other than their "short cohort name". How does an advertiser know what ads to serve to a cohort given the value "43A7"? Are the cohort descriptions/metadata served out of band to advertisers? I would like an idea of what this looks like.

With Topics, the topic's taxonomy name is its semantic meaning.

Security & privacy concerns

I would like to challenge the assertion that there are no security impacts.

  • A large set of potentially very sensitive personal data is being collected by the browser to enable cohort generation. The impact of a security vulnerability causing this data to be leaked could be great.

In Chrome, the renderer is only aware of the topic for the given site. The browser stores information about which callers were on each top-level site, and whether the API was called. This is significantly better than the data stored for third-party cookies.

  • The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.

The Topics API, unlike FLoC, only allows a site to learn topics if the caller has observed the user on a site about that topic. So it is no longer easy to learn more about the user than the caller could have learned without explicit input from the user.

  • Sites which log cohort data for their visitors (with or without supplementary PII) will be able to log changes in this data over time, which may turn into a fingerprinting vector or allow them to infer other information about the user.

Topics is more difficult to use as a cross-site fingerprinting vector because different sites receive different topics during the same week. We have a white paper studying the impact of this: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf. Logging data over time does still increase knowledge about the user, however. We’ve limited this as much as we think is possible.

  • We have seen over past years the tendency for sites to gather and hoard data that they don't actually need for anything specific, just because they can. The temptation to track cohort data alongside any other user data they have with such a straightforward API may be great. This in turn increases the risk to users when data breaches inevitably occur, and correlations can be made between known PII and cohorts.

The filtering mentioned above (not returning the topic if it was observed by the calling context for that user on a site about that topic) significantly cuts down on this hoarding. It’s no longer possible for any arbitrary caller on a page to learn the user’s browsing topics.

  • How many cohorts can one user be in? When a user is in multiple cohorts, what are the correlation risks related to the intersection of multiple cohorts? "Thousands" of users per cohort is not really that many. Membership to a hundred cohorts could quickly become identifying.

There are only 349 topics in the proposed Topics API, and 5% of the time a uniformly random topic is returned. We expect there to be significantly more users per topic than there were per cohort in FLoC.

Discussions

Discussed Apr 1, 2022 (See Github)

"the API’s utility isn’t yet clear"

Amy: they have pre-empted our FLoC questions for Topics

.... loads of issues probably worth reading

... Ad topics eventually sourced from a third party (todo: who?)

... Browser classifies sites.. sites providing their own topics is an "open question".. interesting to think about variation in ads across browsers then?

... 5% random noise, for plausible deniability. Attempts to avoid cross-correlating user between sites.

... Story about sites accumulating topics? Even if only one per week. Aggregating topic data with existing user data (from fingerprinting, cookies, etc)?

... Can sites opt out of being included in the mapping to topics? (Other than by having an 'opaque' host name) - yes (header, and also have to call the API). How many topics can map to a single site?

... Definite improvement over floc.. more transparent

... Risks with colluding sites (documented). Risks with unintended correlations of topics to sensitive topics.

... "better for user privacy than cookies"

... Lots of open questions under discussion

Amy: definitely better than FLOC. It's in the private advertising group. Tons of discussion on the privacy stuff. It's completely different to FLOC. No cohorts. Ways it can be abused but it's documented and discussed. It seems better. Ton of open issues. I'd like to look through a bit more. The direction this is going is saying there's no privacy-preserving way to do targeted advertising ... so maybe they should stop trying?

Amy: up front there are a lot of mitigations - like 5% random noise (less of a fingerprinting vector) - list of topics is human-curated and transparent and would ship with chrome.

Peter: a lot of evidence that targeted advertising can be used for political purposes... to negative effect. Also questions on effectiveness in general for commercial purposes.

Amy: question about multi-stakeholder.

Peter: i'm wondering if we shouldn't have a finding "stop targeted advertising."

[discussion about legal future of surveillance for advertising, collection & sale of user data]

Lea: what about websites that can't afford another business model?

Peter: not saying ban advertising altogether

Dan: whether or not it's correct that targeting affects the real value of an ad, whether it influences someone's behaviour, it is the case that targeted ads make more money. Still an economic issue. If targeted ads went away tomorrow there would be a certain number of websites that have a big income drop. It's a concern. Publishers would not agree with the statement that we should ban targeted advertising. Robin Berjon's take would be interesting.

Lea: unpopular opinion - if targeted ads could be done in a privacy preserving way, it could be better for users - as long as it's opt in, users can see ads that are relevant

Discussed May 1, 2022 (See Github)

Dan: this does require TAG time.. there are some more recent comments from the requester.

Comment by @lknik May 23, 2022 (See Github)

Is it possible to conduct a more formal leak-analysis?

We’ve reduced the ability to use this information as a global identifier (cross site fingerprinting surface) as much as possible.

Comment by @jkarlin May 23, 2022 (See Github)

Please see https://github.com/patcg-individual-drafts/topics/blob/main/topics_analysis.pdf for a more formal analysis.

Comment by @jkarlin May 25, 2022 (See Github)

Also, I'd appreciate your thoughts on if this API belongs in document, navigator, or somewhere else. We chose document.browsingTopics() because the topics are filtered by calling context. But perhaps it should be in navigator since it's more about the state of the user's browsing history?

Discussed Jul 1, 2022 (See Github)

Hadley: [leaves comment about f2f london]

Comment by @hadleybeeman Jul 12, 2022 (See Github)

Hello! We discussed this at our W3C TAG breakout.

We are adding this to our agenda for our upcoming face-to-face in London, and we'll come back to this in more detail then.

Comment by @jkarlin Jul 12, 2022 (See Github)

Great, thanks for the update. Would it be useful for me to be present/available during that time?

Discussed Aug 1, 2022 (See Github)

Amy: I was looking at these this morning trying to figure out what to do individually (for privacy sandbox). We could write something collectively about them. All seem to be aimed at enabling targeted advertising...

Dan: could be scope for a general document. Maybe about framing the issue as the tension between targeted advertising and privacy

Amy: some other thoughts on topics: as you read through the mitigations to make it more privacy preserving - it makes it less and less useful. Lots of people from the ad tech industry in their repo asking how useful it would be. So the question is - does it make it completely ineffective for the use case..? Might be worth asking what's changed in Topics in particular.

Dan: maybe if privacy advocates are complaining and ad tech people are complaining maybe it is actually hitting that middle ground.. maybe it's a compromise. But no good if both groups reject it. What we did not get into in the London session is the wider ad industry feedback, besides google.

Sangwhan: within the scope of Topics related stuff, the new way of doing retargeting, I've seen other companies chime in and have written their version of the proposals - they are actively engaging, so the approach may not be that contentious. The technical details and the amount of data held where is a point of contention. Advertisers, especially the ones that are very data driven, don't want to give up.. they want to give up as little control as possible. There is a middle ground. But definite engagement. About finding a path where the ad companies with a dependency on this look for a plan B where 3p cookies are not a thing. There are people who are not willing to accept that compromise.

Dan: maybe try to get TAG consensus on some of the issues across the board, and documenting the tension between targeted advertising and user privacy?

Sangwhan: we could make a statement about 3p cookie tracking being bad...

Dan: pretty sure we can get consensus on that. We could also talk about the fact that some new proposals are trying to mitigate or to preserve mechanisms for 3p tracking and there is a danger that in doing so we .. that the move towards deprecating 3p cookies is positive and we have to be careful as we develop new advertising enabling tech that is a replacement that we're not just replacing one bad thing with another bad thing. Something measured.

Amy: "what's the point of replacing 3rd party cookies with another set of tech that just mimics the status quo?" but more polite.

Sangwhan: definitely limits the amount of tracking. A step in the right direction. I don't have the expertise to say whether or not this will be a level playing field. It might end up being beneficial to advertisers that do not require significant targeting, who have a lot of volume, compared to those who do small amounts heavily targeted. Can't say if that's good or bad. Definitely the landscape will change. Another thing we want to mention is that 3p cookies do have a legitimate use case - single sign on. That is, 99.9% is creepy tracking, but 0.1% is single sign on; we want to preserve that from completely breaking - we want that in the statement.

Dan: we can point to things like fedcm.

[tangent discussion about single sign on in absence of 3p cookies]

Amy: I'm happy to kick it off by creating a doc and get feedback from others.

Comment by @cynthia Aug 9, 2022 (See Github)

Retroactively: Yes.

Discussed Dec 1, 2022 (See Github)

Amy: we talked about this a lot but haven't updated the issue. I'll go back and find past minutes.

Peter: how does EU ruling about targeted advertising on Meta play with Topics?

Discussed Dec 1, 2022 (See Github)

Amy: they have a paper - analysis of cross-site reidentification risk... not sure if it's up to date with their latest changes. It's great that they did it, but it would be preferable to see this sort of an analysis from someone independent.

Amy: appreciate the work that's gone into this... but also ...

Dan: noting it's Google Research and Chrome people - no outside contributors

Dan: in a world where the topics api is the only way to do any kind of targeted advertising that is not only contextual, this would be a net positive over where we are now.

Amy: noting user control and how much control the user has over it

Dan: considering this would only be used for ads, and further targeting ads, the person would have the option of using a browser that does not ship the topics api and therefore self select...

Amy: unless they detect the api is not implemented and refuse to serve a page in that browser, then we have fragmentation - they have mitigations against some ways to detect if the user has disabled the api or is in incognito mode (empty array returned) - but don't know what happens in the case where the browser hasn't implemented it at all...

Dan: if the behaviour is the same in incog mode as a user who has disabled it, that should not be detectable

Amy: but if a browser hasn't implemented it at all, would you still get an empty array or would you get an error?

Dan: writes comment..

<blockquote> Hi @jkarlin thanks for this. We discussed in [breakout today](https://github.com/w3ctag/meetings/blob/gh-pages/2022/telcons/12-19-agenda.md). One concern we have is the risk that publishers might try to detect whether the user is using a browser with the topics API loaded / enabled and attempt to deny service if the API is not implemented. This is a similar issue to what was discussed in your response to the security & privacy questionnaire for the API's behaviour in incognito mode. In both cases it feels like the result should be that the publisher should not be able to tell whether the topics API is disabled/not implemented. Is this the case? </blockquote>

Amy: then there's the whole Dmarti issue about centralizing power in larger publishers...

Comment by @jkarlin Dec 2, 2022 (See Github)

It's been a while since our last presentation. I just wanted to bring to your attention two changes since then. Namely:

  1. Option to retrieve topics without modifying state of which topics have been observed
  2. Option to send headers as part of fetch request headers. That pull request also mentions document requests, but that's still being debated.
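A rough sketch of what those two changes look like to a caller. The option names (`skipObservation`, `browsingTopics`) follow the explainer's pull requests and may not be final; `doc` and `fetchFn` stand in for `document` and `fetch` so the wiring runs outside a browser:

```javascript
// Illustrative plumbing for the two changes above. Option names follow the
// explainer's pull requests and may not be final; `doc` and `fetchFn` stand
// in for `document` and `fetch` so this can be exercised outside a browser.

// (1) Read the user's topics without recording that this caller observed
// them on the current page.
async function peekTopics(doc) {
  return doc.browsingTopics({ skipObservation: true });
}

// (2) Ask the browser to attach the topics request header to a fetch, so
// topics ride along with the ad request instead of a separate API call.
async function fetchWithTopics(fetchFn, url) {
  return fetchFn(url, { browsingTopics: true });
}
```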

Comment by @torgo Dec 20, 2022 (See Github)

Hi @jkarlin thanks for this. Apologies for the delay in giving further feedback. We discussed in breakout today. One concern we have is the risk that publishers might try to detect whether the user is using a browser with the topics API loaded / enabled and attempt to deny service if the API is not implemented. This is a similar issue to what was discussed in your response to the security & privacy questionnaire for the API's behaviour in incognito mode. In both cases it feels like the result should be that the publisher should not be able to tell whether the topics API is disabled/not implemented. Is this the case?

Comment by @jkarlin Dec 20, 2022 (See Github)

Thanks for the question. The API will return an empty response if: the user opts out, the user cleared relevant history, the user is in incognito mode, the user is new, the user is signed into Chrome in a child account, etc. As you can see, there are many reasons for which the user may have an empty response, meaning that it is not a clear signal of the user's state. Requiring that a user have topics would negatively impact a significant fraction of the site's traffic.

There are two reasons for returning an empty list instead of a random value. One is that it's much more understandable to the user that when the API is disabled/incognito/etc that the API is not sending anything about them (random or not). Second, the number of cases in which no topics are sent is not small. If we were to send random responses in those cases the signal to noise ratio would be significantly impacted.

Comment by @torgo Dec 21, 2022 (See Github)

Thanks @jkarlin for that clarification and thanks @annevk for the further info from the Webkit position. We're going to put this on the agenda again for early in the new year to discuss further.

Discussed Jan 1, 2023 (See Github)

The intention of the Topics API is to enable high-level interests of web users to be shared with third parties in a privacy-preserving way in order to enable targeted advertising, while also protecting users from unwanted tracking and profiling. The TAG's initial view is that this API does not achieve these goals as specified.

The Topics API as proposed puts the browser in a position of sharing information about the user, derived from their browsing history, with any site that can call the API. This is done in such a way that the user has no fine-grained control over what is revealed, and in what context, or to which parties. It also seems likely that a user would struggle to understand what is even happening; data is gathered and sent behind the scenes, quite opaquely. This goes against the principle of [enhancing the user's control](https://w3ctag.github.io/ethical-web-principles/#control), and we believe is not appropriate behaviour for any software purporting to be an agent of a web user.

The responses to the proposal from [Webkit](https://github.com/WebKit/standards-positions/issues/111#issuecomment-1359609317) and [Mozilla](https://github.com/mozilla/standards-positions/issues/622#issuecomment-1372979100) highlight the tradeoffs between serving a diverse global population, and adequately protecting the identities of individuals in a given population. Shortcomings on neither side of these tradeoffs are acceptable for web platform technologies. 

It's also clear from the positions shared by Mozilla and Webkit that there is a lack of multi-stakeholder support. We remain concerned about fragmentation of the user experience if the Topics API is implemented in a limited number of browsers, and sites that wish to use it prevent access to users of browsers without it (a different scenario from the user having disabled it in settings).

We are particularly concerned by the opportunities for sites to use additional data gathered over time by the Topics API in conjunction with other data gathered about a site visitor, whether via other APIs, via out-of-band means, or via existing tracking technologies in place at the same time, such as fingerprinting.

We appreciate the in-depth privacy analyses of the API that have been done so far [by Google](https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf) and [by Mozilla](https://mozilla.github.io/ppa-docs/topics.pdf). If work on this API is to proceed, it would benefit from further analysis by one or more independent (non-browser engine or adtech) parties.

Further, if the API were both effective and privacy-preserving, it could nonetheless be used to customise content in a discriminatory manner, using stereotypes, inferences or assumptions based on the topics revealed (eg. a topic could be used - accurately or not - to infer a [protected characteristic](https://w3ctag.github.io/privacy-principles/#hl-sensitive-information), which is thereby used in selecting an advert to show). Relatedly, there is no binary assessment that can be made over whether a topic is "sensitive" or not. This can vary depending on context, the circumstances of the person it relates to, as well as change over time for the same person.

Giving the web user access to browser settings to configure which topics can be observed and sent, and from/to which parties, would be a necessary addition to an API such as this, and go some way towards restoring agency of the user, but is by no means sufficient. People can become vulnerable in ways they do not expect, and without notice. People cannot be expected to have a full understanding of every possible topic in the taxonomy as it relates to their personal circumstances, nor of the immediate or knock-on effects of sharing this data with sites and advertisers, and nor can they be expected to continually revise their browser settings as their personal or global circumstances change.

A portion of topics returned by the API are proposed to be randomised, in part to enable plausible deniability of the results. The usefulness of this mitigation may be limited in practice; an individual who wants to explain away an inappropriate ad served on a shared computer cannot be expected to understand the low level workings of a specific browser API in a contentious, dangerous or embarrassing situation (assuming a general cultural awareness of the idea of targeted ads being served based on your online activities or even being "listened to" by your devices, which does not exist everywhere, but is certainly pervasive in some places/communities).
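The randomisation being discussed can be sketched as follows (our own illustrative model, not the spec's algorithm; the 5% rate is the figure documented in the Chrome proposal, and all names here are ours):

```python
# Illustrative sketch of the noise mechanism: with probability `noise`,
# the reported topic is drawn uniformly from the whole taxonomy rather
# than from the user's real top topics.
import random

def reported_topic(real_topics, taxonomy, noise=0.05, rng=random):
    if rng.random() < noise:
        return rng.choice(taxonomy)   # the "plausible deniability" branch
    return rng.choice(real_topics)    # a genuine interest

rng = random.Random(42)
taxonomy = [f"topic-{i}" for i in range(349)]
real = [f"topic-{i}" for i in range(5)]

samples = [reported_topic(real, taxonomy, rng=rng) for _ in range(10_000)]
fake_share = sum(s not in real for s in samples) / len(samples)
print(round(fake_share, 3))
```

Any single report is thus nominally deniable, but over many epochs genuine topics dominate statistically — and, as the paragraph above argues, invoking the mechanism in a contentious social situation requires a level of technical understanding users cannot be expected to have.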

While we appreciate the efforts that have gone into this proposal aiming to iteratively improve the privacy-preserving possibilities of targeted advertising, ultimately it falls short. In summary, the proposed API appears to maintain the status quo of inappropriate surveillance on the web, and we do not want to see it proceed further.

Sangwhan: the state with 3p cookies is worse.. this is a counter proposal

Yves: the change Amy proposes about filtering on the set of topics you're willing to share is a good way to make it better.

Amy: I agree it would be an improvement.. But still possible to identify individuals from a set.

Dan: it's presented as a way to do privacy preserving targeted advertising but it doesn't actually achieve that goal. Jumping through a lot of hoops and doesn't do it anyway. Something we should call out. If a lot of people say 'we use the Topics API that means we preserve your privacy' and that's not the case, that's a bad thing for the web

Sangwhan: the information theory perspective.. if what was presented is wrong that's a concern ....

Amy: Not necessarily that it's wrong. Moz analysis is from a different perspective. Google analysis doesn't cover the worst case scenario. The google paper is correct but just not the whole picture.

Sangwhan: please add the user opt-out story into [this comment]. Users should be able to opt out from certain topics. It could be something culturally sensitive.

Amy: lots of reasons... Also shared computers.

Dan: only one user of the browser.

Amy: plausible deniability - random topics sometimes selected - that doesn't help. If we have a collective idea that the web is spying on you then ... given that "it was a randomly selected ad" won't cut it.

Sangwhan: Also because it's so low ...

Amy: Martin's paper goes into it - you get the same set of topics in the period of a week - in the worst case it would be 3 weeks before you'd find a topic is of legit interest.

Sangwhan: the more origins involved ...

Amy: you can spin up subdomains - game it in this way.

Sangwhan: using sampling to get the topics...

Amy: on the point of saying "i want to opt out" - it's a good setting - but you can't possibly know what things are going to get you into trouble in a given moment. You don't make time in your week to curate your ad topics for sensitive topics.

Dan: Don Marti sometimes makes the point that people who don't think they're in a protected class, or that they're vulnerable, could suddenly become a vulnerable person - if they're in an abusive situation, or the law changes .. eg. someone who suffers a miscarriage isn't going to immediately sit down and curate their ad preferences

Amy: and if they're in a particular jurisdiction, suddenly it's dangerous if 'baby stuff' is no longer an interest overnight

Sangwhan: the more efforts like this get delayed the more the current state continues.

Amy: I don't think that needs to be true.

discussion on dynamics of advertising and browsers

Amy: I don't think it's about 3rd party cookies - it's about surveillance on the web and tracking. if we take away 3rd party cookies and replace it with something that has the same problems then that's not OK.

Sangwhan: I'd like [us] to give actionable feedback. I think what webkit and mozilla raised are all valid concerns.

Amy: also getting feedback from an independent third party. I think we're getting iteratively closer. We're moving slowly in the right direction.

Dan: +1

Amy to do some edits and check with group before posting

Discussed Jan 1, 2023 (See Github)

Amy: negative signals from Mozilla and Webkit

Dan: move to breakout C

Comment by @rhiaro Jan 12, 2023 (See Github)

The intention of the Topics API is to enable high level interests of web users to be shared with third parties in a privacy-preserving way in order to enable targeted advertising, while also protecting users from unwanted tracking and profiling. The TAG's initial view is that this API does not achieve these goals as specified.

The Topics API as proposed puts the browser in a position of sharing information about the user, derived from their browsing history, with any site that can call the API. This is done in such a way that the user has no fine-grained control over what is revealed, and in what context, or to which parties. It also seems likely that a user would struggle to understand what is even happening; data is gathered and sent behind the scenes, quite opaquely. This goes against the principle of enhancing the user's control, and we believe is not appropriate behaviour for any software purporting to be an agent of a web user.

The responses to the proposal from Webkit and Mozilla highlight the tradeoffs between serving a diverse global population, and adequately protecting the identities of individuals in a given population. Shortcomings on neither side of these tradeoffs are acceptable for web platform technologies.

It's also clear from the positions shared by Mozilla and Webkit that there is a lack of multi-stakeholder support. We remain concerned about fragmentation of the user experience if the Topics API is implemented in a limited number of browsers, and sites that wish to use it prevent access to users of browsers without it (a different scenario from the user having disabled it in settings).

We are particularly concerned by the opportunities for sites to use additional data gathered over time by the Topics API in conjunction with other data gathered about a site visitor, whether via other APIs, via out-of-band means, or via existing tracking technologies in place at the same time, such as fingerprinting.

We appreciate the in-depth privacy analyses of the API that have been done so far by Google and by Mozilla. If work on this API is to proceed, it would benefit from further analysis by one or more independent (non-browser engine or adtech) parties.

Further, if the API were both effective and privacy-preserving, it could nonetheless be used to customise content in a discriminatory manner, using stereotypes, inferences or assumptions based on the topics revealed (eg. a topic could be used - accurately or not - to infer a protected characteristic, which is thereby used in selecting an advert to show). Relatedly, there is no binary assessment that can be made over whether a topic is "sensitive" or not. This can vary depending on context, the circumstances of the person it relates to, as well as change over time for the same person.

Giving the web user access to browser settings to configure which topics can be observed and sent, and from/to which parties, would be a necessary addition to an API such as this, and go some way towards restoring agency of the user, but is by no means sufficient. People can become vulnerable in ways they do not expect, and without notice. People cannot be expected to have a full understanding of every possible topic in the taxonomy as it relates to their personal circumstances, nor of the immediate or knock-on effects of sharing this data with sites and advertisers, and nor can they be expected to continually revise their browser settings as their personal or global circumstances change.

A portion of topics returned by the API are proposed to be randomised, in part to enable plausible deniability of the results. The usefulness of this mitigation may be limited in practice; an individual who wants to explain away an inappropriate ad served on a shared computer cannot be expected to understand the low level workings of a specific browser API in a contentious, dangerous or embarrassing situation (assuming a general cultural awareness of the idea of targeted ads being served based on your online activities or even being "listened to" by your devices, which does not exist everywhere, but is certainly pervasive in some places/communities).

While we appreciate the efforts that have gone into this proposal aiming to iteratively improve the privacy-preserving possibilities of targeted advertising, ultimately it falls short. In summary, the proposed API appears to maintain the status quo of inappropriate surveillance on the web, and we do not want to see it proceed further.

Comment by @darobin Jan 14, 2023 (See Github)

Quick question: according to this statement and to the following thread, Google believe that they have further arguments relating to Topics that have not been taken into consideration by the TAG. Is the expectation that these arguments will be brought here so that the TAG may review them? What's more, if the TAG still finds Topics problematic after these arguments are brought forth, is the plan to withdraw the proposal, or to ship no matter what?

Comment by @torgo Jan 27, 2023 (See Github)

For avoidance of doubt – the TAG has not closed this review. We fully expect to continue the discussion and review further changes with the hope that the issues we've raised above will be addressed.

Comment by @michaelkleber Feb 9, 2023 (See Github)

Hello TAG, thank you for continuing to review and we appreciate the dialogue. Your response to this proposal, and the related responses from WebKit and Mozilla, make it clear that without changes or new information, Topics in its current form is not likely to gain multi-browser support or progress along the W3C standards track. In the long term, the Privacy Sandbox effort aims to converge with other browsers on APIs that we all agree are appropriate for the Web and useful for online advertising without cross-site tracking.

In the near term, however, Chrome is unable to remove third-party cookies (expected in 2024) without making some privacy-improving replacement technologies available. The Topics API will remain part of the collection of APIs that we expect the ads ecosystem to test during 2023 — and we hope the testing feedback we hear and the implementer experience we gain will be valuable contributions in future work towards cross-browser standards work in this space, however long that takes.

Regarding your comments on the current state of the API, we have some responses and some disagreements, and we would be happy to continue the conversation. Your ideas for iteratively improving the current API will be welcome, if you wish to provide them, even if our need to balance multiple interests prevents us from adopting your recommendations wholesale. We do appreciate the feedback and will always look to it for elements we could incorporate in the meantime. We would be particularly interested in any thoughts on how to modify our API now to ease a transition to a standards approach for interest-based advertising in the future, although it may be too early for this to be a relevant design question.

We understand the TAG is busy, and perhaps you are not interested in further review or discussion on this proposal until its status materially changes (e.g. we gain multi-implementer support, we get new data on utility from ad ecosystem testing, adoption of TEE processing changes the privacy infrastructure landscape, etc). If that is the case, we look forward to picking up a version of this discussion again in the future, likely sometime after Chrome has removed 3rd-party cookies.

Alternatively, if you want to continue discussion on the details of your review, we are happy to make the case for why we continue to feel that replacing 3rd-party cookies with Topics is a tremendous step forward in web privacy despite its trade-offs.

Comment by @torgo Feb 13, 2023 (See Github)

Thanks @michaelkleber just briefly: we're definitely interested in further discussion on the details which we hope can lead to improvements in the areas we've outlined.

Comment by @michaelkleber Feb 16, 2023 (See Github)

Great, thank you @torgo, we will get back to you — may take us a few weeks.

Comment by @lknik Mar 24, 2023 (See Github)

@michaelkleber

In the near term, however, Chrome is unable to remove third-party cookies (expected in 2024) without making some privacy-improving replacement technologies available. The Topics API will remain part of the collection of APIs that we expect the ads ecosystem to test during 2023

Interesting. Could you shed some light on why that is? Is it due to the competition/etc. proceedings? If so, these arguments would be non-technical?

Comment by @jkarlin Apr 10, 2023 (See Github)

Hey folks. Thanks for the discussion so far. Specific responses inline:

The Topics API as proposed puts the browser in a position of sharing information about the user, derived from their browsing history, with any site that can call the API. This is done in such a way that the user has no fine-grained control over what is revealed, and in what context, or to which parties. It also seems likely that a user would struggle to understand what is even happening; data is gathered and sent behind the scenes, quite opaquely.

Note that the number of sites that can both call the API, and receive an unfiltered response, is quite small. This is because the caller would have to have observed the user on a site about that topic in the past to get through the filter. The vast majority of sites that can call the API will actually receive an empty list. For more details about this observer-based filtering, see this part of the explainer.
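The observer-based filtering described here can be sketched as follows (a toy Python model of our own; `filter_topics` and the domain names are hypothetical, and the real algorithm additionally involves epochs and per-site topic classification):

```python
# Illustrative model: a caller only receives a topic if that same caller
# previously observed the user on a site classified under that topic.

def filter_topics(user_top_topics, observations, caller):
    """Return the subset of the user's topics visible to `caller`.

    observations maps caller -> set of topics under which the caller was
    embedded on sites the user visited.
    """
    seen = observations.get(caller, set())
    return [t for t in user_top_topics if t in seen]

user_topics = ["Fitness", "Travel", "News"]
observations = {
    "adtech-a.example": {"Fitness", "News"},  # embedded on fitness & news sites
    "adtech-b.example": set(),                # never observed this user
}

assert filter_topics(user_topics, observations, "adtech-a.example") == ["Fitness", "News"]
assert filter_topics(user_topics, observations, "adtech-b.example") == []
```

Under this filter, a caller with no prior observations of the user receives the empty list, which is the "vast majority of sites" case described above.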

Both users and websites can opt out of the Topics API. Clearing any browsing history prevents those sites from affecting the user’s generated Topics. Generally speaking, UX is not part of specification discussion. That said, there is UX provided within Chrome settings to opt out of individual Topics that have been selected, and we’re looking into UX to opt out of any given topic preemptively. Your criticisms all apply to third-party cookies, but in each case Topics offers a very large step forward in understanding and control.

The responses to the proposal from Webkit and Mozilla highlight the tradeoffs between serving a diverse global population, and adequately protecting the identities of individuals in a given population. Shortcomings on neither side of these tradeoffs are acceptable for web platform technologies.

It is important to point out the underlying physics that we all must adhere to. Any proposal in this space (by any company) has some notion of a data leakage rate built in. This is true regardless of the choice of privacy mechanism. As time passes, the leakage is additive, and eventually a cross-site identifier can be derived. It’s a matter of how long it takes to get there. This point of view applies to WebKit's PCM and Mozilla + Meta's IPA proposals as well: every API here is about tradeoffs.

For the Topics API, our study suggests that it would take tens of weeks of revisiting the same two pages to re-identify the vast majority of users across those pages using only the data from the API. We consider that a substantial win in privacy compared to third-party cookies, where cross-site re-identification takes a single visit. We could make it a worst-case analysis instead of an average-case one (and crank up the random noise), but at a trade-off with utility. These types of analyses and trade-offs are what we expect to continue tuning going forward.
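As a rough intuition for the additive-leakage framing (our toy Monte Carlo, not a reproduction of the whitepaper's analysis), one can simulate two colluding sites pooling one noisy topic report per week and counting how long until one user's profile stands out. The population size, noise rate, and matching rule below are all illustrative:

```python
# Toy simulation: weekly noisy topic reports accumulate until the target
# user's top-5 profile becomes the unique best match in a small population.
import random

random.seed(0)
TAXONOMY = list(range(349))   # taxonomy size from the proposal
NOISE = 0.05                  # chance a report is a uniformly random topic

def weekly_report(profile):
    """One epoch's reported topic for a user with top-5 `profile`."""
    if random.random() < NOISE:
        return random.choice(TAXONOMY)
    return random.choice(profile)

def weeks_to_single_out(profile, population, max_weeks=100):
    """Weeks of pooled observation until `profile` uniquely best matches."""
    observed = []
    for week in range(1, max_weeks + 1):
        observed.append(weekly_report(profile))
        scores = [sum(t in p for t in observed) for p in population]
        best = max(scores)
        if scores.count(best) == 1 and population[scores.index(best)] is profile:
            return week
    return max_weeks

population = [random.sample(TAXONOMY, 5) for _ in range(1_000)]
weeks = weeks_to_single_out(population[0], population)
print(weeks)
```

In toy runs identification takes multiple weeks rather than a single observation — the cookie-vs-Topics distinction argued above — though the absolute number depends heavily on population size and the matching rule, so this should not be read as confirming or contradicting the "tens of weeks" figure.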

It's also clear from the positions shared by Mozilla and Webkit that there is a lack of multi-stakeholder support. We remain concerned about fragmentation of the user experience if the Topics API is implemented in a limited number of browsers, and sites that wish to use it prevent access to users of browsers without it (a different scenario from the user having disabled it in settings).

We’re interested in finding solutions to the use case, especially those that garner multi-stakeholder support. That said, the concerns you mention about browser fragmentation do not seem to have prevented similar privacy-related launches in Mozilla or WebKit that increased fragmentation. And a Chrome migration from third-party cookies to an API like Topics will bring browser behavior much closer together, not drive it further apart.

We are particularly concerned by the opportunities for sites to use additional data gathered over time by the Topics API in conjunction with other data gathered about a site visitor, either via other APIs, via out of band means, and/or via existing tracking technologies in place at the same time, such as fingerprinting.

If these sorts of covert tracking practices are in use, then the Topics API will not provide any new information at all — recall that any party that can recognize a person across the various sites in which the party is embedded already has a large superset of the information available to the Topics algorithm.

While extra correlations might be inferred beyond what the taxonomy provides, Topics has significantly better protections against inferring sensitive correlations, compared to third-party cookies or alternative tracking technologies like fingerprinting possible across all browsers.

Further, if the API were both effective and privacy-preserving, it could nonetheless be used to customise content in a discriminatory manner, using stereotypes, inferences or assumptions based on the topics revealed (eg. a topic could be used - accurately or not - to infer a protected characteristic, which is thereby used in selecting an advert to show). Relatedly, there is no binary assessment that can be made over whether a topic is "sensitive" or not. This can vary depending on context, the circumstances of the person it relates to, as well as change over time for the same person.

These concerns are also discussed in our explainer. In the end, what can be learned from these human-curated topics derived from pages that the user visits is probabilistic, and far less detailed than what cookies can provide with precise cross-site identifiers. While imperfect, this is clearly better for user privacy than cookies. We understand each user cares about different things, and this is why we give controls including to turn off certain topics or to turn off Topics entirely.

Giving the web user access to browser settings to configure which topics can be observed and sent, and from/to which parties, would be a necessary addition to an API such as this, and go some way towards restoring agency of the user, but is by no means sufficient. People can become vulnerable in ways they do not expect, and without notice. People cannot be expected to have a full understanding of every possible topic in the taxonomy as it relates to their personal circumstances, nor of the immediate or knock-on effects of sharing this data with sites and advertisers, and nor can they be expected to continually revise their browser settings as their personal or global circumstances change.

The UX is still evolving here but we already have the ability for users to opt out of the API, and to opt out of individual topics. I generally expect users that have sensitivities to Topics to disable the API as a whole, rather than ferret out individual concerns. You seem to be taking this discussion from the perspective that third-party cookies simply do not exist on the web and that Topics is introducing these behaviors, whereas we’re considering the substantial gain in privacy from where we are with third-party cookies.

A portion of topics returned by the API are proposed to be randomised, in part to enable plausible deniability of the results. The usefulness of this mitigation may be limited in practice; an individual who wants to explain away an inappropriate ad served on a shared computer cannot be expected to understand the low level workings of a specific browser API in a contentious, dangerous or embarrassing situation (assuming a general cultural awareness of the idea of targeted ads being served based on your online activities or even being "listened to" by your devices, which does not exist everywhere, but is certainly pervasive in some places/communities).

I wouldn’t expect users to understand that probabilistic deniability is built into the privacy technology that we use today. That said, you seem to be suggesting that personalized advertising in general is bad because someone might look over the user’s shoulder or use their computer and the user might be embarrassed. I’d note that 1) sharing a computer has far greater embarrassment potential, 2) personalized advertising comes about in many ways (1p data, contextual data, inferences, geo ip, etc) and 3) personalized advertising is often wrong today even with the much more powerful third-party cookies.

I appreciate your feedback and remain open to suggestions you might have on how the API might improve.

Edit: I meant to attribute IPA to Meta + Mozilla but accidentally omitted Meta. Fixed.

Comment by @martinthomson Apr 11, 2023 (See Github)

These types of analysis and trade-offs are what we expect to continue tuning going forward.

It seems to me like this is where there is a large disconnect. Implicit in the Topics design is an assumption that this sort of trade-off has been agreed. PATCG has talked at some length about this, but that is a very narrow slice of the larger community, and very far from representative.

Reaching this conclusion is not natural. Assuming that we all agree that a trade-off is necessary is presumptuous. In part, addressing this assumption is why PATCG is chartered to produce a document that lays out the principles by which it guided its work. The question of whether or not to entertain trade-offs is the primary reason for that work item - at least from my perspective - because none of this work to support advertising works without that work as a foundation.

It might be that specific proposals (like this one) fail on grounds that are less about the principle, but more about execution. For instance, Topics proposes a weekly release of a small amount of data, with some amount of probabilistic protection. You can model that protection as $(\epsilon, \delta)$-differential privacy with $\epsilon\approx10.4$. Maybe the disagreement is about the amount of data release or the value of $\epsilon$. But that doesn't mean that we have consensus about whether there is a trade-off in the first place.
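For intuition, the quoted $\epsilon$ can be related to a textbook randomized-response bound. The following is our simplified single-report model — it does not reproduce the quoted $\epsilon\approx10.4$, which arises from a fuller analysis of the API's multi-topic, multi-epoch behaviour:

```latex
% Single-report randomized response over taxonomy T, substituting a
% uniformly random topic with probability \gamma = 0.05:
\Pr[\text{report } t \mid \text{true } t] = (1-\gamma) + \frac{\gamma}{|T|},
\qquad
\Pr[\text{report } t \mid \text{true } t' \neq t] = \frac{\gamma}{|T|}

% The worst-case likelihood ratio gives
\varepsilon = \ln\frac{(1-\gamma) + \gamma/|T|}{\gamma/|T|}
            = \ln\!\left(1 + \frac{(1-\gamma)\,|T|}{\gamma}\right)
            \approx \ln(6633) \approx 8.8
\quad\text{for } |T| = 349.
```

The gap between this toy figure and the quoted one illustrates Martin's point: the privacy guarantee depends on how the release is modelled, not just on the noise parameter.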

This sounds like a subject where the TAG might add value. Not in terms of determining values for $\epsilon$ - no one can solve that problem - but in terms of convening a discussion in the community about this problem. Then perhaps we might have some better agreement about what those trade-offs might need to look like - if they exist at all.


p.s., Please don't forget Meta when it comes to IPA.

Comment by @dmarti Apr 11, 2023 (See Github)

It might be an oversimplification to consider Topics API purely in the advertising context. It could turn out to be applied more often in personalized pricing, learning management systems, law enforcement, or other areas where site operators want to categorize users by web history and can accept some noise.

Comment by @michaelkleber Apr 11, 2023 (See Github)

Hi Martin: I think we're largely in agreement here. The W3C clearly has not come to anything like agreement regarding the kind of trade-offs made in the Topics design. And indeed Privacy Sandbox includes an entirely separate proposal, FLEDGE, which makes a different set of design decisions here, for just that reason.

As I said above, "we hope the testing feedback we hear and the implementer experience we gain will be valuable contributions in future work" — and that work absolutely includes the sorts of principles discussions you're asking for.

If the TAG wants to pause this review until they / the PATCG / etc. come to some consensus on a higher-level position that would inform it, that's entirely reasonable. I feel like I already made a similar offer, and @torgo indicated a preference to keep the review going.

As @jkarlin's response indicates, we think the Jan 12 "initial view" reflects a variety of misunderstandings on specific implementation choices, and we strongly disagree with the conclusion that it "appears to maintain the status quo". That seems like worthwhile clarification, irrespective of what overarching principles we all agree to use to measure these sorts of proposals in the future.

Comment by @michaelkleber Apr 14, 2023 (See Github)

Hi @lknik, you asked about our need to offer some privacy-improving replacement technologies.

For a long discussion of this, please take a look at the blog post here: https://privacysandbox.com/news/working-together-to-build-a-more-private-internet.

From a POV more focused on web standards: as with any other non-backwards-compatible change to the web platform, we can only proceed with a deprecation and removal after considering the potential breakage. See https://www.chromium.org/blink/launching-features/#feature-deprecations for full details of the Blink process, but note for example that the guidelines ask "What is the cost of removing this feature?" and "What is the suggested alternative? There should be another way for developers to achieve the same functionality." A change that would cause most web sites to lose half of their revenue, without any privacy-improving alternative, is not compatible with our removal process.

You mention "the competition/etc proceedings", and certainly the Commitments that Google made to the UK's Competition and Markets Authority are part of our overall considerations. But our stance here always mirrored theirs: disrupting the advertising ecosystem without a reasonable privacy-improving replacement would harm too many parties — publishers, advertisers, technology providers, and people.

Comment by @jkarlin May 2, 2023 (See Github)

FYI a draft spec for the Topics API is available. Please let me know if you'd like me to create a separate spec review thread for it.

Discussed Jun 1, 2023 (See Github)

reviewing Martin's comment

Amy: from one of Martin's linked papers: "any replacement of third-party cookies may have to seek other avenues to achieve privacy for the web". This paper is specifically about Topics.

Dan: lack of multi-stakeholder

Dan: Topics impacts every single user of the web, everyone who uses a browser other than Chrome. That's a big group. Michael says: "Any UA that wants to pass feature detection but not give out information could implement the API to return an empty set of topics every time." - this means other browsers still need to implement. Forcing Webkit and Mozilla to implement the API in order to avert the situation where websites say "best viewed in Chrome".

Yves: good summary. The fact that you need to force others who don't want that API to implement it so you avoid having two different webs: a Chrome web and the other browsers.

Yves: is site classification local in the browser, or sent to google?

Sangwhan: it's entirely local. [Demonstrates current build]

Amy: I understood that sites also need to opt in to being included in the topic categorisation for users. But seeing this, it's determining topics from sites which have not called the API. Did I misunderstand this?

Sangwhan: maybe that is just about sites having access to topics by using the topics?

Amy: this is problematic. Sites who know they serve a vulnerable population may not want to be used for this to protect their users.

Sangwhan: this version in the I2S does not provide the opt out for the topic as far as I can see it. Strange, because it's normative in the spec. That's a definite problem. Shipping something non-conformant is bizarre.

Sangwhan: imagine a website is misclassified by the web browser, so there is unexpected classification of cohorts that there is no way to opt out from. The fact that my personal website and 4chan are in the same cohort ("online communities" in this case) is not very comforting, and there is no way for me to opt out or to change topics. Seems like a design issue. Seems user hostile.

Sangwhan: thoughts on document vs window? If you want to persist topics .. if you put it on window there's a possibility of being able to use it in a service worker process; if it's on document you have to use it in a document. For extensibility, window makes more sense, or worker global scope.

Discussed Jun 1, 2023 (See Github)

discussion on what happened last week in the privacy sandbox related call

Rossen: they have their third LGTM in BlinkDev so from an API owners pov they are good to go. Suggest going to review.

Yves: from my recollection - we're not against it but there are issues needing to be fixed - like being able to exclude topics for all sites - e.g. "I don't want health topics". Not well enough defined. They have to say that Topics will be a better replacement than shared storage, etc., which replicate the mechanisms of third-party cookies.

Dan: trials?

Rossen: they have done the trials.

Peter: There's a header - there's a lot of discussion on setting topics to empty. If you send the header that opts out of FLoC, that apparently opts you out of Topics as well.

Dan: is it a FLoC-specific header?

Peter: It's a permissions policy setting - a lot of apps, e.g. GitLab, WordPress, Mastodon, set that to no by default. Apparently Topics respects that header.
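The headers under discussion can be sketched as follows. This is illustrative: the FLoC opt-out used the `interest-cohort` Permissions-Policy feature, and Topics is governed by the `browsing-topics` feature; whether a given Chrome version also honours the legacy FLoC header is, per the discussion above, only "apparently" the case.

```http
Permissions-Policy: interest-cohort=(), browsing-topics=()
```

A site sending this response header disables both features for its pages and any embedded third parties, which is the default posture the apps named above reportedly ship with.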

Rossen: there's feature detection for the Topics API - as a developer at a site providing a new feature, I can segregate our customers - and thereby tell people to come back with a browser that supports it.

Dan: see here for discussion with M. Kleber on Topics... Michael said "topics in incognito mode or turned off works the same as the Topics API does for a regular user who has only recently started browsing". So according to him it's not detectable.

Peter: it's true that for a browser that supports Topics, incognito mode is not detectable. But if I'm using a browser that doesn't support it at all, that is detectable.

Rossen: right.

Dan: the browser could return "yes" but return false data.

Rossen: that assumes browsers will do extra work.

Yves: you can use introspection to see if it's been implemented or not...

Peter: still forces other browsers to do work. They can do this but it still puts an onus on them to do something to prevent losing market share.

Dan: agree this a key issue - we need to make sure we do not end up in a "works best in chrome" scenario for other browsers.

discussion on Chromium-based vs. non-chromium based browsers

Peter: I'm presuming it'll be there by default for browsers using chromium...

Dan: this is an interop issue, as Rick Byers asked us to comment on.

Comment by @nightpool Jun 19, 2023 (See Github)

Hi all, Google has publicly committed to bringing the Topics API live in July 2023.[0][1]. Where does that leave this review? There has been (to my knowledge) no multi-stakeholder browser community agreement or consensus related to the Topics API. How does Chrome see this launch as fitting into their commitment to an open and standards-based web platform? Why hasn't there even been at least a public I2S thread about this feature, if Chrome has already publicly committed to shipping it in July?

Comment by @cynthia Jun 19, 2023 (See Github)

There will be an update on this soon.

As for how Chrome is treating this concerning shipping in the broader web, this is the wrong venue to ask. I would suggest asking either the chromium-dev or blink-dev mailing list about that.

Comment by @RByers Jun 23, 2023 (See Github)

There is now an I2S on blink-dev. Input on the interop risk is welcome there.

Comment by @atanassov Jun 26, 2023 (See Github)

Reading through the i2s, the discussion here, and the minutes of our last call with Michael Kleber, a question that wasn't clearly answered for me is: how can we prevent high-volume libs/platforms on the web from resulting in broken interop? Feature detection will enable singling out UAs missing the feature, leading to a broken or at worst disabled experience. I'm sure this is covered somewhere but I couldn't find it.

Comment by @michaelkleber Jun 26, 2023 (See Github)

Any UA that wants to pass feature detection but not give out information could implement the API to return an empty set of topics every time. That's the intended behavior in Chrome when the user hasn't browsed much in recent epochs, or is in incognito mode, or has disabled the API, or when the current API caller happens to be excluded from seeing this page's actual topics because of per-caller topic filtering, etc. So it seems like a safe way to avoid the interop breakage risk.
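The mitigation described here can be sketched as a shim. This is a hedged illustration, not a normative implementation: `document.browsingTopics()` is the page-facing entry point Chrome exposes, but the shim itself and the `getTopicsOrEmpty` helper are assumptions for the sake of the example.

```javascript
// Illustrative sketch: a UA (or polyfill) that passes feature detection
// but always returns an empty topic list, as suggested above.
if (typeof Document !== 'undefined' &&
    !('browsingTopics' in Document.prototype)) {
  Document.prototype.browsingTopics = async function () {
    // Indistinguishable from a fresh profile or incognito mode,
    // which also resolve with no topics.
    return [];
  };
}

// Caller-side code cannot tell an "always empty" UA from a new user:
async function getTopicsOrEmpty(doc) {
  if (typeof doc.browsingTopics !== 'function') return []; // feature absent
  try {
    return await doc.browsingTopics();
  } catch {
    return []; // e.g. blocked by Permissions-Policy
  }
}
```

From the caller's perspective, the empty-set UA, a brand-new profile, and an incognito session all take the same code path, which is the interop argument being made; the counter-argument in the thread is that shipping even this stub is work other vendors are forced into.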

Comment by @martinthomson Jun 27, 2023 (See Github)

More information regarding the privacy aspects: Interest-disclosing Mechanisms for Advertising are Privacy-Exposing (not Preserving) and On the Robustness of Topics API to a Re-Identification Attack. My initial read of both suggests that these could be construed either as vindicating Google's position or as repudiating it, depending on your perspective.

I'll reiterate points I've made before: the effect of platform changes on aggregate metrics only makes sense to the extent that the privacy impact is uniform or approximately uniform. Topics is very specifically individualized and so appears to be far from uniform (a position supported by both papers). Consequently, though the privacy impact of the API for most people might be modest, there are some for whom the effect is significant.

Comment by @jkarlin Jun 27, 2023 (See Github)

We are actively involved with the research community, presenting our research on the privacy properties of the Topics API in papers, reports, and workshop presentations. We are happy to see more external members of the research community engaging with this area.

We're protecting users against general tracking on the web by making it difficult or expensive to track users at scale. These papers show that we're successfully doing so with the Topics API. While the information gained per topic is not uniform, it is so much better than where we are today with third party cookies that we feel that it's a great step forward in protecting users while also funding the sites that they enjoy visiting.

Comment by @plinss Jun 29, 2023 (See Github)

The following comment has come out of TAG discussions this week:

First of all, thanks to @martinthomson for those pointers to two relevant papers.

We've continued to discuss this API across several calls this week. @cynthia also demonstrated the current implementation.

We remain concerned about the points recently raised about interop. Especially given the lack of multi-stakeholder buy-in for this API, how can we really protect against a future where advertising based sites tell users they must switch to a browser that implements Topics? @michaelkleber you've said "Any UA that wants to pass feature detection but not give out information could implement the API to return an empty set of topics every time" however that still implies other UAs would be required to implement the API (at least minimally) when they might not otherwise do so, in order to mitigate privacy harms for their users - so there is a risk here.

We remain concerned about the ability of users to give meaningful consent for their interests to be calculated and tracked from their browsing activity. The spec says:

suggestion that user agents provide UX to give users choice in which Topics are returned

and refers to a "user preference setting" in several places.

We have inferred from this that users are able to disable particular topics in the settings, or the API as a whole, but we don't think that either of these potential configuration options are good enough to protect against potential privacy harms, particularly for marginalised groups. A person's status as vulnerable, at-risk, or marginalised can change over time, and this isn't something most people are necessarily aware of or paying attention to in their day-to-day web use, and nor is it reasonable to expect people to regularly review their browser settings with this in mind. Thus, "opt out of individual topics" is not sufficient to offer meaningful consent to being tracked in this way. Further, from what we have seen of the API as implemented so far, there are no user preference settings relating to specific individual topics. We raised this in our initial review, and don't feel it has yet been considered with the depth warranted.

This issue intersects with others, for example, as pointed out in the Webkit review that the topics list represents a western cultural context, and that the mechanism for sites being classified according to these categories is unclear. We understand from the spec that site classification is automated, based on the domain, but the mechanism for doing this remains opaque, and it is not clear there is any recourse for sites which are misclassified.

We saw in the current implementation that sites in a user's browsing history which do not call the Topics API were being classified under particular topics. We had been led to believe that sites opt-in to being classified by calling the API ("Sites opt in via using the API. If the API is not used, the site will not be included." in the initial review request), but perhaps we misunderstood, or this has changed. The spec refers to "site opt outs", although we weren't able to find how they do this in the spec (please could you point us to the right place if we missed it?).

Questions:

  • Do you have a response to the points raised in Webkit's review?
  • Do you have any analysis or response to the papers that Martin pointed to?
  • Please could you elaborate if it is in fact the case that all sites browsed by a user are included by default as input data for generating a user's topics list?
    • If this is the case, what recourse is there for sites which are misclassified?
  • Can you clarify the situation with regard to definition of user preference / opt out?
  • Have you considered dropping the part where topics are calculated from browsing history, and instead entirely configured by the user in their browser settings? This would be much closer to people being able to meaningfully opt in to targeted advertising, and would make several of the other concerns raised moot.
Comment by @cynthia Jun 30, 2023 (See Github)

As API-surface feedback was also promised on "document, navigator, or somewhere else", adding that to the review comment above. We briefly discussed this, and the current thoughts on where the API belongs are somewhat inconclusive.

While navigator might sound logical given that it will be exposing a lossy representation of the browsing history, this also implies it is global to the user agent - I'm not sure how that would hold in the long term. If there is a necessity to change the behavior so that the API is contextual (e.g. different topics based on the caller's origin), it would definitely be out of place. Also, there are a lot of things somewhat unnecessarily hanging off of navigator, so bloat would be another reason.

This leaves document as the natural location for access via the browsing context. One question on the API surface would be whether there would be a reason to access topics from a worker (e.g. for background/off-thread/SW-based bidding), in which case you would probably want to expose it to WorkerGlobalScope as well. We don't know if it would be a critical use case, but if the ad tax in the main thread can go down as a side effect of this, it would be worth considering.

Discussed Jul 1, 2023 (See Github)

Dan: We left 2 major comments last week and have received no responses yet.

Comment by @hadleybeeman Aug 2, 2023 (See Github)

Hi all. We've looked at this during our W3CTAG f2f. We are still hoping for replies to our previous two comments from @plinss and @cynthia. Any thoughts?

Comment by @siliconvoodoo Aug 2, 2023 (See Github)

Let's sum this up in very layman's terms: Topics = Google money. It's not in users' interest, nor should it be on the agenda of a moral society. We, the people, want an integrally anonymized internet. If your business model can't survive because you can't monetize the data of your visitors, go do something more useful for society. Stochastic plausible deniability is whitewashing of an otherwise dystopian behavior. The pretense that "studies" demonstrated a desire from users to have targeted ads is just done on the back of respondents uneducated about the risks of identifiability, and the freedom of the web in general. And "an improvement over cookies" is just sophistry, as explained by Brave's devs on their blog, I quote

Google claims that these systems, [...], improve privacy because they’re designed to replace third-party cookies. The plain truth is that privacy-respecting browsers [...] have been protecting users against third-party tracking (via cookies or otherwise) for years now.

Google’s proposals are privacy-improving only from the cynical, self-serving baseline of “better than Google today.” Chrome is still the most privacy-harming popular browser on the market, and Google is trying to solve a problem they introduced by taking minor steps meant to consolidate their dominance on the ad tech landscape. Topics does not solve the core problem of Google broadcasting user data to sites, including potentially sensitive information.

Comment by @jkarlin Aug 2, 2023 (See Github)

Thanks for the feedback. I’ve added responses to both plinss and cynthia below:

Do you have a response to the points raised in Webkit's review?

They are similar in nature to what has already been brought up by TAG and discussed in this thread. If there are particular questions I’d be happy to respond.

Do you have any analysis or response to the papers that Martin pointed to?

Yes, please see my previous comment. To add to that, I think it’s important to understand that all of the papers are using different data sets with different modeling assumptions on evolution of user interests, number of users present etc. Our own research utilized real user data, while the others understandably had to generate synthetic web traces and interests, which Jha et al. notes may not be representative of the general population. Nonetheless, they all found that it took a large number of epochs to reidentify the majority of users across sites.

Please could you elaborate if it is in fact the case that all sites browsed by a user are included by default as input data for generating a user's topics list? If this is the case, what recourse is there for sites which are misclassified?

This is not the case. Only sites that call the API are included as input to generating the user’s topics list.

Can you clarify the situation with regard to definition of user preference / opt out?

Users can opt out of the API wholesale within Chrome's privacy preferences. They can also disable topics that have been selected. In the future, they will be able to preemptively remove topics.

Sites can choose not to use the API, in which case user visits to their site will not be included in topics calculation. Sites can further ensure that nobody on their site calls the API via permission policy.

Have you considered dropping the part where topics are calculated from browsing history, and instead entirely configured by the user in their browser settings? This would be much closer to people being able to meaningfully opt in to targeted advertising, and would make several of the other concerns raised moot.

It’s been raised in our public meetings. Folks have raised multiple issues with such an approach. One is that user interests are dynamic, whereas settings are generally quite static. A second is that it seems like many users might not bother to configure this, even if doing so would improve their ads and the revenue of the sites they visit.

This leaves document as the natural location for access via the browsing context. One question on the API surface would be whether there would be a reason to access topics from a worker (e.g. for background/off-thread/SW-based bidding), in which case you would probably want to expose it to WorkerGlobalScope as well. We don't know if it would be a critical use case, but if the ad tax in the main thread can go down as a side effect of this, it would be worth considering.

Excellent, thanks for that guidance. It seems reasonable to expose the API to WorkerGlobalScope, but I don't think it would alleviate any main-thread costs, as the browsingTopics call itself is asynchronous and efficient. If developers start to ask for it, then we can consider adding it more seriously.

Comment by @siliconvoodoo Aug 3, 2023 (See Github)

What happens when one visits the Chinese embassy website, they decide they don't like your topics, and make obtaining a visa difficult or impossible? Or the USA for that matter - it regularly happens: https://techcrunch.com/2019/09/02/denied-entry-united-states-whatsapp/.

Comment by @jkarlin Aug 3, 2023 (See Github)

@siliconvoodoo your hypothetical doesn't make sense. If the authorities were looking at your browser, surely they would be far more interested in your actual browsing history (readily available in the browser) than your topics? And if you cleared your history, then your topics would be cleared too.

Edit: Ah, I was looking at the article you linked to about phones being scanned and missed the first part about the website. In the website case, said website would a) have to have a third-party on it that observed you on such a site and is willing to share that information, b) that topic could very well be noise, c) the taxonomy is coarse grained with highly sensitive topics removed, and finally, compared to third-party cookies (what Chrome is trying to deprecate), topics conveys tiny amounts of information.

Comment by @dmarti Aug 3, 2023 (See Github)

@jkarlin Governments have a limited number of secret police hours to work with. Not all citizens and visitors can be fully observed at all times. Governments will be able to use a lightweight remote screening system like Topics API to identify people for further, more resource-consuming attention like a full device search. Clearing Topics API data or using a browser without Topics API turned on could also be a factor in selection. And the set of possible callers is big enough that we don't know in advance which callers will be owned by, or have a data sharing agreement with, which governments.

The Topics API taxonomy is free of obvious sensitive topics, but can still encode sensitive information (such as people who like music A and food B in country C).

Comment by @siliconvoodoo Aug 4, 2023 (See Github)

@jkarlin Your argument is trying to justify gas burning because coal is worse, when I'm telling you to go nuclear. It's a sort of tu quoque fallacy. The problem is systemic; don't compartmentalize it into pieces to find ad-hoc ways to give incompatible whataboutisms in each case. Surely you must understand that an authority directly looking at your device is one situation, which must be fought, a la the Apple versus FBI case. But it is not the one I'm worried about with Topics; that would be remote mass profiling. The surface of attack against individuals just keeps being magnified; third-party cookies are not a reference standard, as the Brave blog explained. There are enough NGOs to alert about our predicaments: Big Brother Watch, Quadrature du net, Snowden; fictions: Black Mirror, Brave New World... I can't understand how you can willingly be participating in implementing pathways that enable dystopias, instead of pushing for a society with more safety nets against what's coming. Why are you not aiming at tor-like anonymity for all? No cookies, no Topics, fingerprinting jamming, IP spoofing... Surely you've noticed alt-right horrors becoming mainstream; you must be able to picture what fascist powerhouses a la 1984 are becoming enabled to do with all the technology that we provide them. Immigration officers are not motivated to take jobs because they have nothing else to do; it's because they love the power to be nationalist right-wingers and deny brown-skinned people entry on fake excuses. In Russia it will be because you have gay topics. In China, because you visited Uyghur activists' sites. In Iran, because you have feminist interests... They don't have to access any device; they will have your profile in the database, gathered and refined any time you visit an endpoint controlled by the agencies. The more instruments you provide, the more fascist the society veers, and the more risk you expose us to: citizen scores, unjust incarceration, visa denials, lynching, executions or worse.

Comment by @jyasskin Aug 4, 2023 (See Github)

There are two sides of the Topics API: the interface it exposes to pages to tell them what topics a user is probably interested in, and the interface it exposes to users to figure out or guess what topics they're actually interested in. The interface with pages is the traditional realm of web standards and involves a bunch of tradeoffs around the rate that pages can identify users, which @martinthomson has focused on above.

On the other hand, the interface with users is not generally something that we standardize or specify, instead giving user agents wide freedom to do what's best for their users, even if that's very different from what other UAs do. There are some limits here—if pages need to adapt to particular UI, it may be worth constraining the variation—but I don't think Topics falls into that category, and I suspect that the Topics spec actually has too much normative text specifying the user-facing part of its behavior.

Unfortunately, a large fraction of the TAG's review that @plinss recounted focuses on the particular UI that Chrome plans to ship, rather than the question of whether UAs have the freedom to do the right thing for their users. The TAG suggests that many users would appreciate if their interests were "entirely configured by the user in their browser settings", and I agree. As far as I can see, this UI is completely supported by the Topics API and would require no changes to the page-facing API or page behavior. Whether or not Chrome initially ships that UI, other browsers can do so, and Chrome could switch to it in the future. If I'm wrong, and that UI would require changes to the page-facing API, that would be a really good thing to point out soon, so that Chrome can ship a more-compatible API instead.

Comment by @chrisvls Aug 4, 2023 (See Github)

There are a few places where some of the assurances described in the beginning of this TAG discussion (quite a while ago now!), and even some more recently, don't quite track what is in the spec.

  • The discussion here states the taxonomy is coarse-grained, but the spec does not limit the depth of the taxonomy. From this discussion, it may be intentional that the spec would allow a taxonomy of a billion items. A bit more on this in a separate section below.

  • The discussion here states the “taxonomy name is its semantic meaning”, but the spec does not require that a topic have more than an integer ID. There is no requirement for a human-readable taxonomy name, nor for a utility for localizing that name.

  • The discussion here states that the taxonomy will exclude sensitive topics and hews to certain existing taxonomies, but the spec does not provide for any assurance or process for this.

  • The discussion here states that the spec has done as much as it can to allow for user consent, as this is generally left to UX implementation, but it is not clear that the permissions framework wouldn’t offer other options, such as treating each topic as a powerful feature or requiring powerful feature treatment of Topics.

  • The discussion here implies that security concerns are minimized because topics calculation will be done on the domain or url and occur locally in the browser, but the spec would allow the implementer to analyze the entire document in the context of the implementer’s choosing, including a server. While today’s specs allow server-based browser implementation, it is rare, and the marketing for the Privacy Sandbox features on-device processing pretty prominently.

To return for a moment to the assertion that a billion topics would mean no privacy loss because only five may be eligible for reporting out to sites:

  • A billion topics would invalidate all of the applicable analyses of the cross-identification probability. For example, the theoretical limit for leakage (log2(N,k) where N is taxonomy size and k is topics tracked) would go from ~6 bits to ~29 bits for the spec's current limitation that the TopicID be an integer.

  • Users can’t proactively review and opt-out of a billion topics.

  • The five-percent random results would not ensure that all topics had users if there are a billion topics.

  • It is not clear that the security and privacy concerns could be addressed by relying on the fact that cookies have effectively less data than topics.

  • The rewards for site collusion to game the system would be much higher. These may not have been explored in sufficient detail for a coarse-grained taxonomy, or even for trying to game multiple taxonomies; they certainly haven’t for a super-fined-grained one.
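As a rough sanity check on the leakage figures above (a sketch only: the exact combinatorial formula the comment intends is ambiguous, so this uses the simplest bound, that one observed topic drawn from a taxonomy of N entries discloses at most log2(N) bits, ignoring noise and repeat observations):

```javascript
// Upper bound on information disclosed by a single observed topic,
// as a function of taxonomy size N: log2(N) bits.
function perTopicBits(taxonomySize) {
  return Math.log2(taxonomySize);
}

console.log(perTopicBits(349).toFixed(1)); // "8.4"  (Chrome's initial ~350-topic taxonomy)
console.log(perTopicBits(1e9).toFixed(1)); // "29.9" (the hypothetical billion-entry taxonomy)
```

The ~29-bit figure quoted above is consistent with this simple bound for a billion entries; the ~6-bit figure for the current taxonomy is lower than log2(N) presumably because the cited analyses model the API's noise and filtering.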

Finally, one question one might ask: why comment on a spec when there seems so small a chance of cross-browser implementation?

As an enterprise customer of Google Workspace and Chrome, I am already subjected to small, creeping changes to the interpretation of the terms of service – and updates to those terms that are difficult to opt out of. So, even if Chrome is the only full implementer, I would rather see the critical privacy promises in a draft spec so that they stick for longer.

Also, it is really important that implementations match their marketing. There are big implications for the web as a whole if the most popular browser can market a feature as "only local calculation of coarse-grained topics" when we decide to opt in, but then, since they don't think it is a big deal, change that over time.

Comment by @shivanigithub Aug 23, 2023 (See Github)

FYI, Chrome plans to start gating topics API invocation behind the enrollment and attestation mechanism. (explainer, spec PR)

Discussed Jan 1, 2024 (See Github)

Martin: [insert short explanation of difference between topics and protected audience]

Matthew: Topics does the work of collecting data about the user and classifying them, and protected audience does that and also the auctions, attribution, and displaying the ad (with fenced frame)

Martin: correct. Topics gets away with that by collecting information and giving it away in the clear. Protected Audience takes the information into the box and it doesn't leave that box.

Amy: can we close it? We've left them feedback.. not sure what more we can do

Tess: is there harm reduction we can do? it takes a lot of effort

Matthew: if we say no, we have little control. If we try to harm reduce, we still have little control. Is there another appropriate venue?

Hadley: PATCG

Martin: we have a lot of topics, but we're mostly discussing attribution because we have wide agreement that there's something salvageable. We're building trust on that.

Martin: when does the line end for standardisation on the web? Where does w3c or TAG influence end in an API? Someone builds an API .execute() that takes a string - "kitten" or "puppy" - is it up to the browser to decide whether to do it? Do we have an opinion on that?

Tess: stop energy is sometimes the right thing

Amy: ethical web principles, privacy principles are our way of having opinions on this

Hadley: referring back to these as authoritative documents is helpful

Tess: what differentiates w3c from other standards bodies? There are many SDOs which are value-neutral places where companies can show up, hand them money, and get specifications out. There's no vetting of what kind of work happens, or holding it to any standards. W3C is not that place. It has never been value neutral, and we're getting better at expressing those values. It's not a place without scope. In this case, it's in scope, but it doesn't align with our values.

Martin: disagreements - do we have to resolve, or can we accept it? In this case we can say we don't have to worry about this - the fight can continue, but outside of w3c

Hadley: the progression of these reviews has been frustrating. Practically what can we do?

Tess: risk of losing wider interaction with Google...

Martin: propose we just close it linking back to Amy's first review

Amy: +1

Martin: let's discuss in plenary

Matthew: what are the effects on the platform for any of these outcomes?

Amy: impossible to know

Discussed Feb 1, 2024 (See Github)

Martin: is there much more that needs to be said

Dan: the last substantive update was about putting topics behind the attestation mechanism. Jeffrey points out that our previous review focuses on UI

Martin: a lot of attention has been paid to how google is planning to harvest topics, and that is not part of the api surface. We're being asked to ignore the fact the browser will harvest the information about what the user is interested in and only focus on the api surface

Dan: that is an implication.. I don't agree we should be solely reviewing the api in isolation

Martin: the nature of the information it provides and who provides it is a core part of the capability
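
[For reference: the "API surface" being debated here is essentially a single call, `document.browsingTopics()`, which resolves to an array of coarse topic entries drawn from the browser's taxonomy. The browser's observation and classification of the user's browsing happens entirely outside this surface, which is the point being made. A minimal sketch, with field names loosely based on the explainer and a stand-in `document` object of our own so the snippet can run outside a browser:]

```javascript
// Hedged sketch of the Topics API surface. The browser-side harvesting and
// classification of browsing history is NOT visible here -- callers only see
// the resulting topics.
async function getAdTopics(doc) {
  // Feature-detect: the method only exists in browsers implementing Topics.
  if (typeof doc.browsingTopics !== 'function') return [];
  // Each entry is a coarse topic from the taxonomy, e.g. something like
  // { topic: 3, taxonomyVersion: "1", ... } (exact shape per the spec).
  return doc.browsingTopics();
}

// Stand-in "document" (our invention) so the sketch runs outside a browser:
const fakeDocument = {
  browsingTopics: async () => [{ topic: 3, taxonomyVersion: '1' }],
};

getAdTopics(fakeDocument).then((topics) => console.log(topics.length)); // logs 1
getAdTopics({}).then((topics) => console.log(topics.length)); // logs 0
```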

Dan: is there more to say since June?

Martin: I don't think so

Amy: don't think so

Tess: could we say something like while we still think this isn't something we should add to the platform we can see that you've improved it since they first brought it to us? They're trying to mitigate?

Amy: they haven't improved it..

Martin: it's got worse, the number of topics has increased

Peter: one of the risks we were afraid of in the first place. You can make this mechanism as bad as you want it to be.

Dan: we agree with privacy goals?

Martin: important to give a nod towards... acknowledge the fact that browsers are able to implement things outside of standards, engage with their users as they see fit. But we think this is nonetheless bad for the platform.

Dan: the explainer lists the privacy goals

Tess: are they meeting their own privacy goals? Are they meeting our interpretation of their own privacy goals? My impression is they think they've done it. Maybe we have different goals.. are we interpreting their own bar differently than they are? Can we make that explicit? Something they think is not in the threat model but that we think it is.

Their goals

  • It must be difficult to reidentify significant numbers of users across sites using just the API.
  • The API should provide a subset of the capabilities of third-party cookies.
  • The topics revealed by the API should be less personally sensitive about a user than what could be derived using today’s tracking methods.
  • Users should be able to understand the API, recognize what is being communicated about them, and have clear controls. This is largely a UX responsibility but it does require that the API be designed in a way such that the UX is feasible.

Tess: that last one seems to mean that topics shouldn't be opaque ... like arbitrary machine generated clusters...

Martin: concretely it does achieve the UX goal... Chrome's UX isn't good - but the API meets that goal.

Amy: I'm not convinced. I would interpret - even if they fully exposed a list of topics which change every week - it's still not practical for the user to stay on top of them... Even if the browser surfaced each time the topics changed... I don't think they can meet that goal in a way that I think is workable UX... We raised this issue in the first review...

Tess: 3rd goal - "less sensitive" ... obviously you can make the difference epsilon and say you've met this goal...

Martin: it's a non-goal

Tess: it either means nothing or it means substantially different and I haven't seen that it's substantially different. Probably a similar argument to be made for the 2nd one...

Peter: we're trying to get rid of 3rd party cookies...

Tess: ...and replace it with purpose built mechanisms that address specific use cases.. so this is them saying that.

Peter: we want to replace the acceptable use cases, not the bad parts of 3p cookies

Tess: we need to surface that in this comment... We could imagine the interpretation of ...

Dan: the key part is using just the API - comes back to the discussion before - if you ignore the existence of all other apis then maybe. But if this data is being leaked or accessed in other ways, then no. You can't just ignore the existence of other parts of the platform. the other thing that's missing is something we raised very early on - the danger to marginalised groups and the interpretation of.. 'sensitive topics' ... this depends completely on your point of view and where you live and what community you're part of. It's so amorphous.

Peter: and shifts over time

Amy: we've raised the marginalised groups thing a couple of times... no meaningful response.

Tess: yes agree. The privacy properties of this proposal aren't sufficient. Inadequate to your own stated goals. Additionally there are other goals we think should be there you haven't met.

<blockquote> To summarize and close this review, we note that there are some disagreements about goals here that underpin the disconnect.

The goals you have set out in the explainer are:

  • It must be difficult to reidentify significant numbers of users across sites using just the API.
  • The API should provide a subset of the capabilities of third-party cookies.
  • The topics revealed by the API should be less personally sensitive about a user than what could be derived using today’s tracking methods.
  • Users should be able to understand the API, recognize what is being communicated about them, and have clear controls. This is largely a UX responsibility but it does require that the API be designed in a way such that the UX is feasible.

The set of goals also implicitly compares the privacy characteristics of this API to the web with 3rd party cookies (and tracking). In the spirit of "leaving the web better than you found it," we would like to see the design goals achieved whilst also preserving the privacy characteristics of the web without third party cookies.

We do acknowledge that you have arguably achieved the 4th goal, with an API that does not actively prevent the user from understanding and recognizing what is being communicated about them. However the implicit privacy labour that would be required to manage this set of topics on an ongoing basis remains a key question.

Finally, we challenge the assertion that reidentification in the absence of other information is the right benchmark to apply. As we previously noted, the potential for this to affect privacy unevenly across different web users is a risk that is not adequately mitigated.

</blockquote>

We agree to close with an unsatisfied resolution and with the above comment.

Comment by @plinss Feb 27, 2024 (See Github)

To summarize and close this review, we note that there are some disagreements about goals here that underpin the disconnect.

The goals you have set out in the explainer are:

  • It must be difficult to reidentify significant numbers of users across sites using just the API.
  • The API should provide a subset of the capabilities of third-party cookies.
  • The topics revealed by the API should be less personally sensitive about a user than what could be derived using today’s tracking methods.
  • Users should be able to understand the API, recognize what is being communicated about them, and have clear controls. This is largely a UX responsibility but it does require that the API be designed in a way such that the UX is feasible.

The set of goals also implicitly compares the privacy characteristics of this API to the web with 3rd party cookies (and tracking). In the spirit of "leaving the web better than you found it," we would like to see the design goals achieved whilst also preserving the privacy characteristics of the web without third party cookies.

We do acknowledge that you have arguably achieved the 4th goal, with an API that does not actively prevent the user from understanding and recognizing what is being communicated about them. However the implicit privacy labour that would be required to manage this set of topics on an ongoing basis remains a key question.

Finally, we challenge the assertion that reidentification in the absence of other information is the right benchmark to apply. As we previously noted, the potential for this to affect privacy unevenly across different web users is a risk that is not adequately mitigated.