#726: Early design review for the Topics API

Opened Mar 25, 2022

Braw mornin' TAG!

I'm requesting a TAG review of the Topics API.

The intent of the Topics API is to provide callers (including third-party ad-tech or advertising providers on the page that run script) with coarse-grained advertising topics that the page visitor might currently be interested in. These topics will supplement the contextual signals from the current page and can be combined to help find an appropriate advertisement for the visitor.
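
For orientation, here is a minimal sketch of how an embedded caller might query the API, assuming the `document.browsingTopics()` entry point described in the explainer; the shape of the returned entries (in particular the `topic` field) is an assumption for illustration.

```ts
// Minimal sketch (not normative): read the user's topics from an embedded context.
// Assumes the document.browsingTopics() entry point from the explainer; the field
// names on the returned entries are illustrative.
async function fetchAdTopics(): Promise<number[]> {
  const doc = document as unknown as {
    browsingTopics?: () => Promise<Array<{ topic: number }>>;
  };
  if (typeof doc.browsingTopics !== "function") {
    return []; // API not available in this browser
  }
  // Calling the API both returns topics and records that this caller observed the
  // user on the current page, which feeds the filtering in future epochs.
  const entries = await doc.browsingTopics();
  return entries.map((entry) => entry.topic); // integer indices into the taxonomy
}
```

The returned topics would then be combined with the contextual signals from the page when selecting an ad.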

  • Explainer¹ (minimally containing user needs and example code): https://github.com/jkarlin/topics
  • User research: [url to public summary/results of research]
  • Security and Privacy self-review²: See below
  • GitHub repo (if you prefer feedback filed there): https://github.com/jkarlin/topics
  • Primary contacts (and their relationship to the specification):
    • Josh Karlin, jkarlin@, Google
    • Yao Xiao, xyaoinum@, Google
  • Organization/project driving the design: Chrome Privacy Sandbox
  • External status/issue trackers for this feature (publicly visible, e.g. Chrome Status): https://chromestatus.com/feature/5680923054964736

Further details:

  • [ x ] I have reviewed the TAG's Web Platform Design Principles
  • The group where the incubation/design work on this is being done (or is intended to be done in the future): Either WICG or PATCG
  • The group where standardization of this work is intended to be done ("unknown" if not known): unknown
  • Existing major pieces of multi-stakeholder review or discussion of this design: Lots of discussion on https://github.com/jkarlin/topics/issues/, and a white paper on fingerprintability analysis: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf
  • Major unresolved issues with or opposition to this design: We believe that the proposed API leans heavily towards user privacy in the privacy/utility tradeoff, as it should. But, the API’s utility isn’t yet clear. Until we try the API in an experiment, we can’t know for sure how the API will perform. Some changes are likely going to be needed. Knobs we may tweak include, but are not limited to, topics in the taxonomy, weights of the topics in the taxonomy, how a site might suggest topics for itself, and how we might get topic data from more places than just the domain (e.g., from the url if there is some signal that the url is privacy safe to parse).
  • This work is being funded by: Chrome

You should also know that...

This API was developed in response to feedback that we (Chrome) received on our first interest-based advertising proposal, FLoC. That feedback came from the TAG, other browsers, advertisers, and our users. We appreciate this feedback, and look forward to your thoughts on this API.

At the bottom of this issue are both the security survey responses and responses to questions from the TAG about FLoC, answered here in terms of Topics.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

☂️ open a single issue in our GitHub repo for the entire review

Self Review Questionnaire: Security & Privacy

2.1. What information might this feature expose to Web sites or other parties, and for what purposes is that exposure necessary?

  • It exposes one of the user’s top-5 topics from the previous week to the caller, provided the calling context’s site also called the Topics API for that user on a page about that topic in the past three weeks. This is information that could instead have been obtained using third-party cookies. What might not have been obtainable via third-party cookies is the fact that this is one of the user’s top topics overall, which is more global knowledge than a single third party may have been able to ascertain on its own.
  • 5% of the time the topic is uniformly random.
  • The topic comes from a taxonomy. The initial proposed taxonomy is here: https://github.com/jkarlin/topics/blob/main/taxonomy_v1.md
  • The topic returned (when it is one of the top 5 rather than the random topic) is chosen at random from among the top 5, and that choice is fixed per calling top-frame site for the week. So if any frame on a.com calls the API, it might get the topic at index 3, while callers on b.com might get the topic at index 1. This reduces cross-site correlation/fingerprintability (see the sketch after this list).
  • Topics are derived only from sites the user visited that called the API.
  • Topics are derived only from the domain of the site, not the URL or content of the site, though this may change depending on utility results.
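
To make the selection behaviour described above concrete, here is a rough sketch; it is an illustration of these answers, not the normative algorithm, and the per-site hash and topic-ID numbering are assumptions.

```ts
// Illustrative sketch of the per-site selection described above (not the normative
// algorithm): one of the user's top-5 topics for the epoch is returned, chosen
// deterministically per calling top-frame site, and 5% of the time a uniformly
// random topic from the whole taxonomy is returned instead.
function topicForSite(
  topTopics: number[],   // the user's top-5 topic IDs for the epoch
  taxonomySize: number,  // 349 in the initially proposed taxonomy
  siteHash: number       // stable per-epoch, per-top-frame-site value (assumed)
): number {
  if (Math.random() < 0.05) {
    // Noise: every topic gets some users, providing plausible deniability.
    return Math.floor(Math.random() * taxonomySize) + 1;
  }
  // A fixed per-site index, so a.com and b.com may receive different members of
  // the same top-5 set, limiting cross-site correlation.
  return topTopics[siteHash % topTopics.length];
}
```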

2.2 Do features in your specification expose the minimum amount of information necessary to enable their intended uses?

Yes. The entire design of the API is to minimize the amount of information about the user that is exposed in order to provide for the use case. We have also provided a theoretical (and applied) analysis of the cross-site fingerprinting information that is revealed: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf

2.3. How do the features in your specification deal with personal information, personally-identifiable information (PII), or information derived from them?

The API intentionally provides some information about the user to the calling context. We’ve reduced the ability to use this information as a global identifier (cross site fingerprinting surface) as much as possible.

2.4. How do the features in your specification deal with sensitive information?

Sensitive information is reduced by only allowing topics in the taxonomy that Chrome and the IAB have deemed not sensitive (the topics in the proposed initial taxonomy are derived from the two organizations’ respective advertising taxonomies).

This does not mean that topics in the taxonomy, or groups of topics learned about the user over time, cannot be correlated with sensitive topics. That may be possible.

2.5. Do the features in your specification introduce new state for an origin that persists across browsing sessions?

The API provides some information about the user’s browsing history, and this is stored in the browser. The filtering mechanism - which provides a topic to a calling context only if that context has observed the user on a page about that topic in the past - also stores data. This could be used to learn if the user has visited a specific site in the past (which third-party cookies can do quite easily today), and we’d like to make that hard. There may be interventions that the browser can take to detect and prevent such abuses.

2.6. Do the features in your specification expose information about the underlying platform to origins?

No.

2.7. Does this specification allow an origin to send data to the underlying platform?

The top-frame site’s domain is read to determine a topic for the site.

2.8. Do features in this specification enable access to device sensors?

No.

2.9. Do features in this specification enable new script execution/loading mechanisms?

No.

2.10. Do features in this specification allow an origin to access other devices?

No.

2.11. Do features in this specification allow an origin some measure of control over a user agent’s native UI?

No.

2.12. What temporary identifiers do the features in this specification create or expose to the web?

The topics that are returned by the API. They are per-epoch (week), per-user, and per-site, and are cleared when the user clears state.

2.13. How does this specification distinguish between behavior in first-party and third-party contexts?

The topic is only returned to the caller if the calling context’s site has also called the API on a domain about that topic with that same user in the past three weeks. So whether the API returns anything or not depends on the calling context’s domain.
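
A rough sketch of this filtering rule, using assumed data structures (an illustration of the behaviour described above, not the spec's algorithm):

```ts
// Per-caller filtering: a topic is only revealed to a caller that has itself
// observed the user on a page about that topic during one of the last three epochs.
type EpochRecord = {
  perSiteTopic: Map<string, number>;          // per-top-frame-site pick from the top 5
  observedByCaller: Map<string, Set<number>>; // topics each caller site has observed
};

function topicsVisibleTo(
  callerSite: string,
  topFrameSite: string,
  lastThreeEpochs: EpochRecord[]
): number[] {
  const visible: number[] = [];
  for (const epoch of lastThreeEpochs) {
    const candidate = epoch.perSiteTopic.get(topFrameSite);
    if (candidate === undefined) continue;
    // Only reveal the candidate if this caller observed it for this user.
    if (epoch.observedByCaller.get(callerSite)?.has(candidate)) {
      visible.push(candidate);
    }
  }
  return visible;
}
```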

2.14. How do the features in this specification work in the context of a browser’s Private Browsing or Incognito mode?

The API returns an empty list in incognito mode. We feel that this is safe because there are many reasons that an empty list might be returned. e.g., because the user is new, because the user is in incognito, because the site has not seen this user on relevant sites with the associated topics in the past three weeks, because the user has disabled the API via UX controls.

This is effectively the same behavior as the user being new, so this is basically the API working the same within incognito mode as in regular mode. We could have instead returned random topics in incognito (and for new users) but this has the deleterious effect of significantly polluting the API with noise. Plus, we don’t want to confuse users/developers by having the API return values when they expect it not to (e.g., after disabling the API).

2.15. Does this specification have both "Security Considerations" and "Privacy Considerations" sections?

There is no formal specification yet, but the explainer goes into detail on the privacy considerations. The primary security consideration is that the API reveals information beyond what third-party cookies can reveal, in that learning a topic means that the topic is one of the user’s top topics for the week.

2.16. Do features in your specification enable origins to downgrade default security protections?

No.

2.17. How does your feature handle non-"fully active" documents?

No special considerations.

Responses to questions from the FLoC TAG review, as they apply to Topics

Sensitive categories

The documentation of "sensitive categories" visible so far is on Google ad policy pages. Categories that are considered "sensitive" are, as stated, not likely to be universal, and are also likely to change over time. I'd like to see:

  • an in-depth treatment of how sensitive categories will be determined (by a diverse set of stakeholders, so that the definition of "sensitive" is not biased by the backgrounds of implementors alone);
  • discussion of if it is possible - and desirable (it might not be) - for sensitive categories to differ based on external factors (eg. geographic region);
  • a persistent and authoritative means of documenting what they are that is not tied to a single implementor or company;
  • how such documentation can be updated and maintained in the long run;
  • and what the spec can do to ensure implementers actually abide by restrictions around sensitive categories. Language about erring on the side of user privacy and safety when the "sensitivity" of a category is unknown might be appropriate.

A key difference between Topics and Cohorts is that the Topics taxonomy is human curated, whereas cohorts were the result of a clustering algorithm and had no obvious meaning. The advantage of a topics-based approach is that we can help to clarify which topics are exposed. For instance, the initial taxonomy we intend to use includes topics that are in both the IAB’s content taxonomy and Google’s advertising taxonomy. This ensures that at least two separate entities have reviewed the topics for sensitive categories. Assuming that the API is successful, we would be happy to consider a third-party maintainer of the taxonomy that incorporates both relevant advertising interests as well as up-to-date sensitivities.

Browser support

I imagine not all browsers will actually want to implement this API. Is the result of this, from an advertiser's point of view, that serving personalised ads is not possible in certain browsers? Does this create a risk of platform segmentation in that some websites could detect non-implementation of the API and refuse to serve content altogether (which would severely limit user choice and increase concentration of a smaller set of browsers)? A mitigation for this could be to specify explicitly 'not-implemented' return values for the API calls that are indistinguishable from a full implementation.

The description of the experimentation phase mentions refreshing cohort data every 7 days; is timing something that will be specified, or is that left to implementations? Is there anything about cohort data "expiry" if a browser is not used (or only used to browse opted-out sites) for a certain period?

As always, it is up to each browser to determine which use cases and APIs it wishes to support. Returning empty lists is completely reasonable. Though a caller could still use the UA to determine if the API is really supported or not. I’m not sure that there is a good solution here.
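
As a sketch of that detection concern (assuming the `document.browsingTopics()` entry point): an empty result is ambiguous, but the presence or absence of the method itself remains observable.

```ts
// The empty-list result is ambiguous (new user, incognito, user opt-out, no
// qualifying observations), but whether the entry point exists at all is still
// detectable, which is the fragmentation risk discussed in this review.
const supportsTopics =
  typeof (document as { browsingTopics?: unknown }).browsingTopics === "function";

if (!supportsTopics) {
  // A site could branch here, e.g. to push the user towards a browser that
  // implements the API.
}
```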

In regards to the duration of a topic, I think that is likely to be per-UA.

In the Topics API, we ensure that each topic has a minimum number of users, by returning responses uniformly at random 5% of the time.

Opting out

I note that "Whether the browser sends a real FLoC or a random one is user controllable" which is good. I would hope to see some further work on guaranteeing that the "random" FLoCs sent in this situation do not become a de-facto "user who has disabled FLoC" cohort. It's worth further thought about how sending a random "real" FLoC affects the personalised advertising the user sees - when it is essentially personalised to someone who isn't them. It might be better for disabling FLoC to behave the same as incognito mode, where a "null" value is sent, indicating to the advertiser that personalised advertising is not possible in this case.

I note that sites can opt out of being included in the input set. Good! I would be more comfortable if sites had to explicitly opt in though. Have you also thought about more granular controls for the end user which would allow them to see the list of sites included from their browsing history (and which features of the sites are used) and selectively exclude/include them?

If I am reading this correctly, sites that opt out of being included in the cohort input data cannot access the cohort information from the API themselves. Sites may have very legitimate reasons for opting out (eg. they serve sensitive content and wish to protect their visitors from any kind of tracking) yet be supported by ad revenue themselves. It is important to better explore the implications of this.

The current plan is for the Topics API to return an empty list in incognito mode.

Sites opt in via using the API. If the API is not used, the site will not be included. Sites can also prevent third parties from calling the API on their site via permission policy.
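
For example, a site that does not want embedded third parties to call the API on its pages could send a Permissions-Policy response header. The `browsing-topics` feature name below matches the Chrome implementation but should be treated as an assumption here:

```http
Permissions-Policy: browsing-topics=()
```

A value of `browsing-topics=(self)` would instead allow only same-origin callers.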

In regards to granular controls, we feel that this is possible with Topics (less so with FLoC) and expect to expose via UX the topics that are being returned, allowing users to opt out of the API completely or to disable individual topics.

The API is designed to facilitate ecosystem participation: calling the API is both how a site contributes to the API and how it receives value from it. We do not want sites to be able to get topics without also supporting the ecosystem.

Centralisation of ad targeting

Centralisation is a big concern here. This proposal makes it the responsibility of browser vendors (a small group) to determine what categories of user are of interest to advertisers for targeting. This may make it difficult for smaller organisations to compete or innovate in this space. What mitigations can we expect to see for this? How transparent / auditable are the algorithms used to generate the cohorts going to be? When some browser vendors are also advertising companies, how to separate concerns and ensure the privacy needs of users are always put first?

The Topics API helps to address broad, coarse-grained topic-based advertising. For more niche topics, we suggest the usage of alternative sandbox APIs like FLEDGE. In terms of transparency, the API is written plainly in open source code, the design is occurring on GitHub with an active community, and the ML model used to classify topics will be available for anyone to evaluate.

Accessing cohort information

I can't see any information about how cohorts are described to advertisers, other than their "short cohort name". How does an advertiser know what ads to serve to a cohort given the value "43A7"? Are the cohort descriptions/metadata served out of band to advertisers? I would like an idea of what this looks like.

With Topics, the topic’s name in the taxonomy is its semantic meaning.

Security & privacy concerns

I would like to challenge the assertion that there are no security impacts.

  • A large set of potentially very sensitive personal data is being collected by the browser to enable cohort generation. The impact of a security vulnerability causing this data to be leaked could be great.

In Chrome, the renderer is only aware of the topic for the given site. The browser stores information about which callers were on each top-level site, and whether the API was called. This is significantly better than the data stored for third-party cookies.

  • The explainer acknowledges that sites that already know PII about the user can record their cohort - potentially gathering more data about the user than they could ever possibly have access to without explicit input from the user - but dismisses this risk by comparing it to the status quo, and does not mention this risk in the Security & Privacy self-check.

The Topics API, unlike FLoC, only allows a site to learn topics if the caller has observed the user on a site about that topic. So it is no longer easy for a site to learn more about the user than it could have learned without explicit input from the user.

  • Sites which log cohort data for their visitors (with or without supplementary PII) will be able to log changes in this data over time, which may turn into a fingerprinting vector or allow them to infer other information about the user.

Topics is more difficult to use as a cross-site fingerprinting vector because different sites receive different topics during the same week. We have a white paper studying the impact of this: https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf. Logging data over time does still increase knowledge about the user, however. We’ve limited this as much as we think is possible.

  • We have seen over past years the tendency for sites to gather and hoard data that they don't actually need for anything specific, just because they can. The temptation to track cohort data alongside any other user data they have with such a straightforward API may be great. This in turn increases the risk to users when data breaches inevitably occur, and correlations can be made between known PII and cohorts.

The filtering mentioned above (only returning the topic if it was observed by the calling context for that user on a site about that topic) significantly cuts down on this hoarding. It’s no longer possible for any arbitrary caller on a page to learn the user’s browsing topics.

  • How many cohorts can one user be in? When a user is in multiple cohorts, what are the correlation risks related to the intersection of multiple cohorts? "Thousands" of users per cohort is not really that many. Membership to a hundred cohorts could quickly become identifying.

There are only 349 topics in the proposed Topics API, and 5% of the time a uniformly random topic is returned. We expect there to be significantly more users per topic than there were in FLoC.

Discussions

2022-04-18

Minutes

"the API’s utility isn’t yet clear"

Amy: they have pre-empted our FLoC questions for Topics

.... loads of issues probably worth reading

... Ad topics eventually sourced from a third party (todo: who?)

... Browser classifies sites.. sites providing their own topics is an "open question".. interesting to think about variation in ads across browsers then?

... 5% random noise, for plausible deniability. Attempts to avoid cross-correlating user between sites.

... Story about sites accumulating topics? Even if only one per week. Aggregating topic data with existing user data (from fingerprinting, cookies, etc)?

... Can sites opt out of being included in the mapping to topics? (Other than by having an 'opaque' host name) - yes (header, and also have to call the API). How many topics can map to a single site?

... Definite improvement over floc.. more transparent

... Risks with colluding sites (documented). Risks with unintended correlations of topics to sensitive topics.

... "better for user privacy than cookies"

... Lots of open questions under discussion

Amy: definitely better than FLOC. It's in the private advertising group. Tons of discussion on the privacy stuff. It's completely different to FLOC. No cohorts. Ways it can be abused but it's documented and discussed. It seems better. Ton of open issues. I'd like to look through a bit more. The direction this is going is saying there's no privacy-preserving way to do targeted advertising ... so maybe they should stop trying?

Amy: up front there are a lot of mitigations - like 5% random noise (less of a fingerprinting vector) - list of topics is human-curated and transparent and would ship with chrome.

Peter: a lot of evidence that targeted advertising can be used for political purposes... to negative effect. Also questions on effectiveness in general for commercial purposes.

Amy: question about multi-stakeholder.

Peter: i'm wondering if we shouldn't have a finding "stop targeted advertising."

[discussion about legal future of surveillance for advertising, collection & sale of user data]

Lea: what about websites that can't afford another business model?

Peter: not saying ban advertising altogether

Dan: whether or not it's correct that targeting affects the real value of an ad, whether it influences someone's behaviour, it is the case that targeted ads make more money. Still an economic issue. If targeted ads went away tomorrow there would be a certain number of websites that have a big income drop. It's a concern. Publishers would not agree with the statement that we should ban targeted advertising. Robin Berjon's take would be interesting.

Lea: unpopular opinion - if targeted ads could be done in a privacy preserving way, it could be better for users - as long as it's opt in, users can see ads that are relevant

2022-05-23

Minutes

Dan: this does require TAG time.. there are some more recent comments from the requester.

2022-07-11

Minutes

Hadley: [leaves comment about f2f london]

2022-08-08

Minutes

Amy: I was looking at these this morning trying to figure out what to do individually (for privacy sandbox). We could write something collectively about them. All seems to be aimed at enabling targeted advertising...

Dan: could be scope for a general document. Maybe about framing the issue as the tension between targeted advertising and privacy

Amy: some other thoughts on topics: as you read through the mitigations to make it more privacy preserving - it makes it less and less useful. Lots of people from the ad tech industry in their repo asking how useful it would be. So the question is - does it make it completely ineffective for the use case..? Might be worth asking what's changed in Topics in particular.

Dan: maybe if privacy advocates are complaining and ad tech people are complaining maybe it is actually hitting that middle ground.. maybe it's a compromise. But no good if both groups reject it. What we did not get into in the London session is the wider ad industry feedback, besides google.

Sangwhan: within the scope of Topics related stuff, the new way of doing retargeting, I've seen other companies chime in and have written their version of the proposals - they are actively engaging, so the approach may not be that contentious. The technical details and the amount of data held where is a point of contention. Advertisers, especially the ones that are very data driven, don't want to give up.. they want to give up as little control as possible. There is a middle ground. But definite engagement. About finding a path where the ad companies with a dependency on this look for a plan B where 3p cookies are not a thing. There are people who are not willing to accept that compromise.

Dan: maybe try to get TAG consensus on some of the issues across the board, and documenting the tension between targeted advertising and user privacy?

Sangwhan: we could make a statement about 3p cookie tracking being bad...

Dan: pretty sure we can get consensus on that. We could also talk about the fact that some new proposals are trying to mitigate or to preserve mechanisms for 3p tracking and there is a danger that in doing so we .. that the move towards deprecating 3p cookies is positive and we have to be careful as we develop new advertising enabling tech that is a replacement that we're not just replacing one bad thing with another bad thing. Something measured.

Amy: "what's the point of replacing 3rd party cookies with another set of tech that just mimics the status quo?" but more polite.

Sangwhan: definitely limits the amount of tracking. A step in the right direction. I don't have the expertise to say whether or not this will be a level playing field. It might end up being beneficial to advertisers that do not require significant targeting, who have a lot of volume, compared to those who do small amounts heavily targeted. Can't say if that's good or bad. Definitely the landscape will change. Another thing we want to mention is that 3p cookies do have a legitimate use case - single sign on. Maybe 99.9% of it is creepy tracking, but 0.1% is single sign on; we want to keep that from completely breaking - we want that in the statement.

Dan: we can point to things like fedcm.

[tangent discussion about single sign on in absence of 3p cookies]

Amy: I'm happy to kick it off by creating a doc and get feedback from others.

2022-12-12

Minutes

Amy: we talked about this a lot but haven't updated the issue. I'll go back and find past minutes.

Peter: how does EU ruling about targeted advertising on Meta play with Topics?

2022-12-19

Minutes

Amy: they have a paper - analysis of cross-site reidentification risk... not sure if it's up to date with their latest changes. It's great that they did it, but would be preferable to see this sort of an analysis from someone independent.

Amy: appreciate the work that's gone into this... but also ...

Dan: noting it's Google Research and Chrome people - no outside contributors

Dan: in a world where the topics api is the only way to do any kind of targeted advertising that is not only contextual, this would be a net positive over where we are now.

Amy: noting user control and how much control the user has over it

Dan: considering this would only be used for ads, and further targeting ads, the person would have the option of using a browser that does not ship the topics api and therefore self select...

Amy: unless they detect the api is not implemented and refuse to serve a page in that browser, then we have fragmentation - they have mitigations against some ways to detect if the user has disabled the api or is in incognito mode (empty array returned) - but don't know what happens in the case where the browser hasn't implemented it at all...

Dan: if the behaviour is the same in incog mode as a user who has disabled it, that should not be detectable

Amy: but if a browser hasn't implemented it at all, would you still get an empty array or would you get an error?

Dan: writes comment..

<blockquote> Hi @jkarlin thanks for this. We discussed in [breakout today](https://github.com/w3ctag/meetings/blob/gh-pages/2022/telcons/12-19-agenda.md). One concern we have is the risk that publishers might try to detect whether the user is using a browser with the topics API loaded / enabled and attempt to deny service if the API is not implemented. This is a similar issue to what was discussed in your response to the security & privacy questionnaire for the API's behaviour in incognito mode. In both cases it feels like the result should be that the publisher should not be able to tell whether the topics API is disabled/not implemented. Is this the case? </blockquote>

Amy: then there's the whole Dmarti issue about centralizing power in larger publishers...

2023-01-09

Minutes

The intention of the Topics API is to enable high level interests of web users to be shared with third parties in a privacy-preserving way in order to enable targeted advertising, while also protecting users from unwanted tracking and profiling. The TAG's initial view is that this API does not achieve these goals as specified.

The Topics API as proposed puts the browser in a position of sharing information about the user, derived from their browsing history, with any site that can call the API. This is done in such a way that the user has no fine-grained control over what is revealed, and in what context, or to which parties. It also seems likely that a user would struggle to understand what is even happening; data is gathered and sent behind the scenes, quite opaquely. This goes against the principle of [enhancing the user's control](https://w3ctag.github.io/ethical-web-principles/#control), and we believe is not appropriate behaviour for any software purporting to be an agent of a web user.

The responses to the proposal from [Webkit](https://github.com/WebKit/standards-positions/issues/111#issuecomment-1359609317) and [Mozilla](https://github.com/mozilla/standards-positions/issues/622#issuecomment-1372979100) highlight the tradeoffs between serving a diverse global population, and adequately protecting the identities of individuals in a given population. Shortcomings on neither side of these tradeoffs are acceptable for web platform technologies. 

It's also clear from the positions shared by Mozilla and Webkit that there is a lack of multi-stakeholder support. We remain concerned about fragmentation of the user experience if the Topics API is implemented in a limited number of browsers, and sites that wish to use it prevent access to users of browsers without it (a different scenario from the user having disabled it in settings).

We are particularly concerned by the opportunities for sites to use additional data gathered over time by the Topics API in conjunction with other data gathered about a site visitor, whether via other APIs, via out-of-band means, or via existing tracking technologies in place at the same time, such as fingerprinting.

We appreciate the in-depth privacy analyses of the API that have been done so far [by Google](https://github.com/jkarlin/topics/blob/main/topics_analysis.pdf) and [by Mozilla](https://mozilla.github.io/ppa-docs/topics.pdf). If work on this API is to proceed, it would benefit from further analysis by one or more independent (non-browser engine or adtech) parties.

Further, if the API were both effective and privacy-preserving, it could nonetheless be used to customise content in a discriminatory manner, using stereotypes, inferences or assumptions based on the topics revealed (eg. a topic could be used - accurately or not - to infer a [protected characteristic](https://w3ctag.github.io/privacy-principles/#hl-sensitive-information), which is thereby used in selecting an advert to show). Relatedly, there is no binary assessment that can be made over whether a topic is "sensitive" or not. This can vary depending on context, the circumstances of the person it relates to, as well as change over time for the same person.

Giving the web user access to browser settings to configure which topics can be observed and sent, and from/to which parties, would be a necessary addition to an API such as this, and go some way towards restoring agency of the user, but is by no means sufficient. People can become vulnerable in ways they do not expect, and without notice. People cannot be expected to have a full understanding of every possible topic in the taxonomy as it relates to their personal circumstances, nor of the immediate or knock-on effects of sharing this data with sites and advertisers, and nor can they be expected to continually revise their browser settings as their personal or global circumstances change.

A portion of topics returned by the API are proposed to be randomised, in part to enable plausible deniability of the results. The usefulness of this mitigation may be limited in practice; an individual who wants to explain away an inappropriate ad served on a shared computer cannot be expected to understand the low level workings of a specific browser API in a contentious, dangerous or embarrassing situation (assuming a general cultural awareness of the idea of targeted ads being served based on your online activities or even being "listened to" by your devices, which does not exist everywhere, but is certainly pervasive in some places/communities).

While we appreciate the efforts that have gone into this proposal aiming to iteratively improve the privacy-preserving possibilities of targeted advertising, ultimately it falls short. In summary, the proposed API appears to maintain the status quo of inappropriate surveillance on the web, and we do not want to see it proceed further.

Sangwhan: the state with 3p cookies is worse.. this is a counter proposal

Yves: change Amy proposes about filtering on the set of topics you're willing to share is a good way to make it better.

Amy: I agree it would be an improvement.. But still possible to identify individuals from a set.

Dan: it's presented as a way to do privacy preserving targeted advertising but it doesn't actually achieve that goal. Jumping through a lot of hoops and doesn't do it anyway. Something we should call out. If a lot of people say 'we use the Topics API that means we preserve your privacy' and that's not the case, that's a bad thing for the web

Sangwhan: the information theory perspective.. if what was presented is wrong that's a concern ....

Amy: Not necessarily that it's wrong. Moz analysis is from a different perspective. Google analysis doesn't cover the worst case scenario. The google paper is correct but just not the whole picture.

Sangwhan: please add the user opt-out story into [this comment]. Users should be able to opt out from certain topics. It could be something culturally sensitive.

Amy: lots of reasons... Also shared computers.

Dan: only one user of the browser.

Amy: plausible deniability - random topics sometimes selected - that doesn't help. If we have a collective idea that the web is spying on you then ... given that "it was a randomly selected ad" won't cut it.

Sangwhan: Also because it's so low ...

Amy: Martin's paper goes into it - you get the same set of topics in the period of a week - in the worst case it would be 3 weeks before you'd find a topic is of legit interest.

Sangwhan: the more origins involved ...

Amy: you can spin up subdomains - game it in this way.

Sangwhan: using sampling to get the topics...

Amy: on the point of saying "i want to opt out" - it's a good setting - but you can't possibly know what things are going to get you into trouble in a given moment. You don't make time in your week to curate your ad topics for sensitive topics.

Dan: Don Marti sometimes makes the point that people who don't think they're in a protected class, or that they're vulnerable people, could suddenly become a vulnerable person, if they're in an abusive situation, or the law changes .. eg. someone who suffers a miscarriage isn't going to immediately sit down and curate their ad preferences

Amy: and if they're in a particular jurisdiction, suddenly it's dangerous if 'baby stuff' is no longer an interest overnight

Sangwhan: the more efforts like this get delayed the more the current state continues.

Amy: I don't think that needs to be true.

discussion on dynamics of advertising and browsers

Amy: I don't think it's about 3rd party cookies - it's about surveillance on the web and tracking. if we take away 3rd party cookies and replace it with something that has the same problems then that's not OK.

Sangwhan: I'd like [us] to give actionable feedback. I think what webkit and mozilla raised are all valid concerns.

Amy: also getting feedback from an independent third party. I think we're getting iteratively closer. We're moving slowly in the right direction.

Dan: +1

amy to do some edits and check with group before posting

2023-01-09

Minutes

Amy: negative signals from Mozilla and Webkit

Dan: move to breakout C

2023-06-26

Minutes

reviewing Martin's comment

Amy: from one of Martin's linked papers: "any replacement of third-party cookies may have to seek other avenues to achieve privacy for the web". This paper is specifically about Topics.

Dan: lack of multi-stakeholder

Dan: Topics impacts every single user of the web, everyone who uses a browser other than chrome. That's a big group. Michael says: "Any UA that wants to pass feature detection but not give out information could implement the API to return an empty set of topics every time." - this means other browsers still need to implement. Forcing webkit and mozilla to implement the api in order to avert the situation where websites say "best viewed in chrome".

Yves: good summary. The fact that you need to force others who don't want that api to implement it, so you avoid having two different webs - a chrome web and a web for other browsers.

Yves: is site classification local in the browser, or sent to google?

Sangwhan: it's entirely local. [Demonstrates current build]

Amy: I understood that sites also need to opt in to being included in the topic categorisation for users. But seeing this, it's determining topics from sites which have not called the API. Did I misunderstand this?

Sangwhan: maybe that is just about sites having access to topics by using the topics API?

Amy: this is problematic. Sites who know they serve a vulnerable population may not want to be used for this to protect their users.

Sangwhan: this version in the I2S does not provide the opt out for the topic as far as I can see it. Strange, because it's normative in the spec. That's a definite problem. Shipping something non-conformant is bizarre.

Sangwhan: imagine a website is misclassified by the web browser, so there is unexpected classification of cohorts that there is no way to opt out from. The fact that my personal website and 4chan are in the same cohort ("online communities" in this case) is not very comforting, and there is no way for me to opt out of that, or to change topics if I want to. Seems like a design issue. Seems user hostile.

Sangwhan: thoughts on document vs window? If you want to persist topics .. if you put it on window there's a possibility of you being able to use it in service worker process, if it's on document you have to use it in document. For extensibility, window makes more sense, or worker global scope.

<blockquote> The following comment has come out of TAG discussions this week:

First of all, thanks to @martinthomson for those pointers to two relevant papers.

We've continued to discuss this API across several calls this week. @cynthia also demonstrated the current implementation.

We remain concerned about the points recently raised about interop. Especially given the lack of multi-stakeholder buy-in for this API, how can we really protect against a future where advertising based sites tell users they must switch to a browser that implements Topics? @michaelkleber you've said "Any UA that wants to pass feature detection but not give out information could implement the API to return an empty set of topics every time" however that still implies other UAs would be required to implement the API (at least minimally) when they might not otherwise do so, in order to mitigate privacy harms for their users - so there is a risk here.

We remain concerned about the ability of users to give meaningful consent for their interests to be calculated and tracked from their browsing activity. The spec says:

suggestion that user agents provide UX to give users choice in which Topics are returned

and refers to a "user preference setting" in several places.

We have inferred from this that users are able to disable particular topics in the settings, or the API as a whole, but we don't think that either of these potential configuration options are good enough to protect against potential privacy harms, particularly for marginalised groups. A person's status as vulnerable, at-risk, or marginalised can change over time, and this isn't something most people are necessarily aware of or paying attention to in their day-to-day web use, and nor is it reasonable to expect people to regularly review their browser settings with this in mind. Thus, "opt out of individual topics" is not sufficient to offer meaningful consent to being tracked in this way. Further, from what we have seen of the API as implemented so far, there are no user preference settings relating to specific individual topics. We raised this in our initial review, and don't feel it has yet been considered with the depth warranted.

This issue intersects with others, for example, as pointed out in the Webkit review that the topics list represents a western cultural context, and that the mechanism for sites being classified according to these categories is unclear. We understand from the spec that site classification is automated, based on the domain, but the mechanism for doing this remains opaque, and it is not clear there is any recourse for sites which are misclassified.

We saw in the current implementation that sites in a user's browsing history which do not call the Topics API were being classified under particular topics. We had been led to believe that sites opt-in to being classified by calling the API ("Sites opt in via using the API. If the API is not used, the site will not be included." in the initial review request), but perhaps we misunderstood, or this has changed. The spec refers to "site opt outs", although we weren't able to find how they do this in the spec (please could you point us to the right place if we missed it?).

Questions:

  • Do you have a response to the points raised in Webkit's review?
  • Do you have any analysis or response to the papers that Martin pointed to?
  • Please could you elaborate if it is in fact the case that all sites browsed by a user are included by default as input data for generating a user's topics list?
    • If this is the case, what recourse is there for sites which are misclassified?
  • Can you clarify the situation with regard to definition of user preference / opt out?
  • Have you considered dropping the part where topics are calculated from browsing history, and instead entirely configured by the user in their browser settings? This would be much closer to people being able to meaningfully opt in to targeted advertising, and would make several of the other concerns raised moot.
</blockquote>

2023-06-26

Minutes

discussion on what happened last week in the privacy sandbox related call

Rossen: they have their third LGTM in BlinkDev so from an API owners pov they are good to go. Suggest going to review.

Yves: from my recollection - we're not against it but there are issues needing to be fixed - like being able to exclude topics for all sites - e.g. "I don't want health topics". Not well enough defined. They have to say that topics will be a better replacement than shared storage, etc., which replicate the mechanism of third party cookies.

Dan: trials?

Rossen: they have done the trials.

Peter: There's a header - there's a lot of discussions on setting topics to empty. If you send the header that opts out of FLOC that apparently opts you out of TOPICS as well.

Dan: is it a FLOC-specific header?

Peter: It's a permission policy setting- a lot of apps e.g. gitlab, wordpress, mastodon, set that to NO by default. Apparently TOPICS respects that header.

Rossen: there's a feature detection for topics API - as a developer at a site - I'm providing a new feature - as a developer I can segregate our customers - and thereby tell people to come back with a browser that supports it..

Dan: see here for discussion with m. kleber on topics... Michael said "topics in incog mode or turned off works the same as if the topics api does for a regular user who has only recently started browsing". So according to him it's not detectable.

Peter: it's true that for a browser that supports topics, incog mode is not detectable. But what if I'm using a browser that doesn't support it at all - that is detectable.

Rossen: right.

Dan: the browser could return "yes" but return false data.

Rossen: that assumes browsers will do extra work.

Yves: you can use introspection to see if it's been implemented or not...

Peter: still forces other browsers to do work. They can do this but it still puts an onus on them to do something to prevent losing market share.

Dan: agree this is a key issue - we need to make sure we do not end up in a "works best in chrome" scenario for other browsers.

discussion on Chromium-based vs. non-chromium based browsers

Peter: I'm presuming it'll be there by default for browsers using chromium...

Dan: this is an interop issue, as Rick Byers asked us to comment on.

2023-07-03

Minutes

Dan: We left 2 major comments last week and have received no responses yet.

2024-01-london

Minutes

Martin: [insert short explanation of difference between topics and protected audience]

Matthew: Topics does the work of collecting data about the user and classifying them, and protected audience does that and also the auctions, attribution, and displaying the ad (with fenced frame)

Martin: correct. Topics gets away with that by collecting information and giving it away in the clear. Protected Audience takes the information into the box and it doesn't leave that box.

Amy: can we close it? We've left them feedback.. not sure what more we can do

Tess: is there harm reduction we can do? it takes a lot of effort

Matthew: if we say no, we have little control. If we try to harm reduce, we still have little control. Is there another appropriate venue?

Hadley: PATCG

Martin: we have a lot of topics, but we're mostly discussing attribution because we have wide agreement that there's something salvageable. We're building trust on that.

Martin: when does the line end for standardisation on the web? Where does w3c or TAG influence end in an API? Someone builds an API .execute() that takes a string - "kitten" or "puppy" - is it up to the browser to decide whether to do it? Do we have an opinion on that?

Tess: stop energy is sometimes the right thing

Amy: ethical web principles, privacy principles are our way of having opinions on this

Hadley: referring back to these as authoritative documents is helpful

Tess: what differentiates w3c from other standards bodies? There are many SDOs which are value neutral places where companies can show up and hand them money and get specifications out. There's no vetting of what kind of work happens or holding it to any standards. W3C is not that place. It has never been value neutral, and we're getting better at expressing those values. It's not a place without scope. In this case, it's in scope, but it doesn't align with our values.

Martin: disagreements - do we have to resolve, or can we accept it? In this case we can say we don't have to worry about this - the fight can continue, but outside of w3c

Hadley: the progression of these reviews has been frustrating. Practically what can we do?

Tess: risk of losing wider interaction with google...

Martin: propose we just close it linking back to Amy's first review

Amy: +1

Martin: let's discuss in plenary

Matthew: what are the effects on the platform for any of these outcomes?

Amy: impossible to know

2024-02-26

Minutes

Martin: is there much more that needs to be said

Dan: the last substantive update was about putting topics behind the attestation mechanism. Jeffrey points out that our previous review focuses on UI

Martin: a lot of attention has been paid to how google is planning to harvest topics, and that is not part of the api surface. We're being asked to ignore the fact the browser will harvest the information about what the user is interested in and only focus on the api surface

Dan: that is an implication.. I don't agree we should be solely reviewing the api in isolation

Martin: the nature of the information it provides and who provides it is a core part of the capability

Dan: is there more to say since June?

Martin: I don't think so

Amy: don't think so

Tess: could we say something like while we still think this isn't something we should add to the platform we can see that you've improved it since they first brought it to us? They're trying to mitigate?

Amy: they haven't improved it..

Martin: it's got worse, the number of topics has increased

Peter: one of the risks we were afraid of in the first place. You can make this mechanism as bad as you want it to be.

Dan: we agree with privacy goals?

Martin: important to give a nod towards... acknowledge the fact that browsers are able to implement things outside of standards, engage with their users as they see fit. But we think this is nonetheless bad for the platform.

Dan: the explainer lists the privacy goals

Tess: are they meeting their own privacy goals? Are they meeting our interpretation of their own privacy goals? My impression is they think they've done it. Maybe we have different goals.. are we interpreting their own bar differently than they are? Can we make that explicit? Something they think is not in the threat model but that we think it is.

Their goals

  • It must be difficult to reidentify significant numbers of users across sites using just the API.
  • The API should provide a subset of the capabilities of third-party cookies.
  • The topics revealed by the API should be less personally sensitive about a user than what could be derived using today’s tracking methods.
  • Users should be able to understand the API, recognize what is being communicated about them, and have clear controls. This is largely a UX responsibility but it does require that the API be designed in a way such that the UX is feasible.

Tess: that last one seems to mean that topics shouldn't be opaque ... like arbitrary machine generated clusters...

Martin: concretely it does achieve the UX goal... Chrome's UX isn't good - but the API meets that goal.

Amy: I'm not convinced. I would interpret - even if they fully exposed a list of topics which change every week - it's still not practical for the user to stay on top of them... Even if the browser surfaced each time the topics changed... I don't think they can meet that goal in a way that I think is workable UX... We raised this issue in the first review...

Tess: 3rd goal - "less sensitive" ... obviously you can make the difference epsilon and say you've met this goal...

Martin: it's a non-goal

Tess: it either means nothing or it means substantially different and I haven't seen that it's substantially different. Probably a similar argument to be made for the 2nd one...

Peter: we're trying to get rid of 3rd party cookies...

Tess: ...and replace it with purpose built mechanisms that address specific use cases.. so this is them saying that.

Peter: we want to replace the acceptable use cases, not the bad parts of 3p cookies

Tess: we need to surface that in this comment... We could imagine the interpretation of ...

Dan: the key part is using just the API - comes back to the discussion before - if you ignore the existence of all other apis then maybe. But if this data is being leaked or accessed in other ways, then no. You can't just ignore the existence of other parts of the platform. the other thing that's missing is something we raised very early on - the danger to marginalised groups and the interpretation of.. 'sensitive topics' ... this depends completely on your point of view and where you live and what community you're part of. It's so amorphous.

Peter: and shifts over time

Amy: we've raised the marginalised groups thing a couple of times... no meaningful response.

Tess: yes agree. The privacy properties of this proposal aren't sufficient. Inadequate to your own stated goals. Additionally there are other goals we think should be there you haven't met.

<blockquote> To summarize and close this review, we note that there are some disagreements about goals here that underpin the disconnect.

The goals you have set out in the explainer are:

  • It must be difficult to reidentify significant numbers of users across sites using just the API.
  • The API should provide a subset of the capabilities of third-party cookies.
  • The topics revealed by the API should be less personally sensitive about a user than what could be derived using today’s tracking methods.
  • Users should be able to understand the API, recognize what is being communicated about them, and have clear controls. This is largely a UX responsibility but it does require that the API be designed in a way such that the UX is feasible.

The set of goals also implicitly compares the privacy characteristics of this API to the web with 3rd party cookies (and tracking). In the spirit of "leaving the web better than you found it," we would like to see the design goals achieved whilst also preserving the privacy characteristics of the web without third party cookies.

We do acknowledge that you have arguably achieved the 4th goal, with an API that does not actively prevent the user from understanding and recognizing what is being communicated about them. However, the implicit privacy labour that would be required to manage this set of topics on an ongoing basis remains a key question.

Finally, we challenge the assertion that reidentification in the absence of other information is the right benchmark to apply. As we previously noted, the potential for this to affect privacy unevenly across different web users is a risk that is not adequately mitigated.

</blockquote>

We agreed to close this review as unsatisfied, with the above comment.