#650: Distributed Tracing WG: Baggage specification
Discussions
2021-07-26
Peter: a bunch of activity not from us..
Peter's cat: MEOOWW
Dan: they don't have a S&P questionnaire response... [writes comment]
Yves: ask if their header can be [???] ... RFC8941
Dan: ..user identifiable data..
[dan & yves leave some comments]
Peter: lot of privacy concerns that are glossed over...
2021-08-09
Ken: it's about associated data that you propagate. Need to read in detail.
Peter: next week?
2021-09-27
Ken: looks pretty sensible - sending other data along with your distributed tracing... along with tracing you can send your own custom data.. metadata...
Hadley: what kind of tracing?
Ken: OpenTelemetry project... OpenTelemetry uses it...
Dan: isn't metadata sometimes problematic?
Ken: custom data - tracking your own thing - e.g. I'm running on Azure ... could be potential privacy issue... There are a lot of telemetry people who use orchestration.. Kubernetes... based on W3C distributed tracing spec... Microsoft thing, Dapr.. https://docs.dapr.io/developing-applications/building-blocks/observability/tracing-overview/ https://opentelemetry.io/
Hadley: use cases doc?
Rossen: question around motivation.. mostly driven by "this is a thing people are doing, can we standardise it please". I'm not opposed in general. But without a clear use case it's hard to piece together why this needs to be standardised, which pieces of it make sense, and the overall...
Ken: They already standardized dist tracing - this is an extension...
https://docs.dapr.io/developing-applications/building-blocks/observability/tracing-overview/
Rossen: Trace Context spec? Is a Recommendation...
https://www.w3.org/TR/trace-context/
https://www.w3.org/groups/wg/distributed-tracing
Dan: but it's adding a side channel for custom metadata - so absolutely agree with hadley show me the use cases...
Hadley: i'd feel better if we can bottom out what the mitigations are...
Rossen: how did this go to rec without going through tag? In Feb 2020?
Ken: I feel i've seen this - in tag review...
[looking for any TAG review we may have done and not finding it]
Hadley: I'll leave a comment
2021-11-15
Ken: people are already doing this today: sending metadata with their distributed tracing. As I understand the response to Hadley, people are clearly implementing custom headers, which makes it harder to control, so they expect that switching to this unified approach will be better for compatibility. You can already do this today, so this is about standardising how people are doing it, sending this custom data, so that it will work across different libraries. That sounds like the right thing to do. This is not allowing anything new. Was wondering how this ties together with structured headers.
Dan: looks like they are being quite responsive. If this is just a way to standardise a thing people are doing already and there is some consensus around it it doesn't feel like it opens privacy issues. But is it a good design?
Ken: structured headers allows structured data, not just key value pairs and properties. [writes comment]
Ken: subset of structured headers? Make sure it's an actual subset and doesn't have confusing additional features
Peter: we have guidance in design principles about structured headers
Yves: that would be good
Peter: don't want to see yet another header form that requires another parser
Yves: or even worse, intermediaries, like what happened for cookies in the past
Dan: also missing info on multistakeholder support
Hadley: Why is it called baggage and not tracing id?
Ken: this is an extension to the tracing API?
Dan: language idiom issue?
Hadley: yes. Makes it harder to understand
Dan: is there anything in the responses that should be in the explainer?
Hadley: didn't get the use case until I went to the charter. Makes sense to standardise what they're already doing in other ways. Standardising a few different headers, of which baggage is the 'everything else' header. Still not convinced or understanding how this can't be hijacked for stronger user tracking. Am I missing something?
Ken: not worse than it is today. Going to standardise how you store the data, not what data you store. If people start using the same data you could use it for additional tracking, but you could always use it today with the telemetry thing. If it becomes standardised you might learn what something means, which you could use as additional entropy. But it's the same as today: if specific data is common, that would be the same case. Maybe slightly more entropy... if they continue as today, people are not collaborating well, but something becomes common enough that it's a de facto standard. With the new thing, if people just use their own data it's not going to give you more, but if they use the same... it's the same.. not going to give you more entropy. Not standardising the values, just the format to store values, that's not very useful. You could argue there's a larger chance of de facto keys and values on top that could be used.
Hadley: if it's standardised is it more useful to intercept?
Ken: already part of the telemetry thing.. this is just the custom data part. The non custom data part is much more valuable for tracking purposes.
Hadley: right, rather than just the baggage part. Not sure what that means we should do.
Dan: should it behave differently in private browsing?
Hadley: that's tricky. Depends on what you want to do in private browsing. You might want this kind of functionality to work. Whether it gets disrupted or cleared.. doesn't the spec say when you reload the page you get a new id?
Peter: id is provided by the service, the browser doesn't have control of that. Nothing stopping the service generating the same trace id or something they could deanonymise
Hadley: or with your IP address or something else identifiable or consistent
Peter: means private browser has no recourse than dropping the trace id entirely and not sending any telemetry
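[Note: the option Peter describes, dropping the trace headers entirely in private browsing, could be sketched roughly as below. This is only an illustration under stated assumptions: `strip_trace_headers` and the `private_mode` flag are hypothetical names, and nothing in the minutes says any browser behaves this way. `traceparent` and `tracestate` are the Trace Context header names; `baggage` is the header under review.]

```python
# Headers used by distributed tracing: traceparent and tracestate come from
# the Trace Context spec, baggage from the specification under review.
TRACE_HEADERS = {"traceparent", "tracestate", "baggage"}


def strip_trace_headers(headers: dict[str, str], private_mode: bool) -> dict[str, str]:
    """Return a copy of `headers`, without trace headers when in private mode.

    Hypothetical helper: `private_mode` stands in for whatever signal a
    client would use to decide that no telemetry should be sent.
    """
    if not private_mode:
        return dict(headers)
    # Header names are case-insensitive, so compare in lowercase.
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in TRACE_HEADERS
    }
```

[The trade-off is the one raised in the discussion: with the headers removed, the service receives no telemetry at all for that request.]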
2021-12-Madripoor
Hadley: no response to comments, but a new issue that references ours, responding to Yves question
Dan: that's merged. Can ask Yves if he's happy
Ken: it's metadata.. you can carry around a lot because there are no limits, you're in control. You need to understand telemetry, distributed compute.. not something you're going to get from a random website
Dan: I guess that's the point.. could a random website misuse it
Ken: they need to collaborate with someone otherwise it's their own data. The thing is people want.. distributed tracing you want not just from the frontend but from the backend and carry it around so you know how everything works together, front and back end. Normally you do that for yourself. I don't think you can do the tracing across random websites and get it to you, that would be a major issue. I think this is okay.
Dan: they say under privacy of the baggage header.. can contain user identifiable data. Must ensure it does not leak beyond defined trust boundaries... and channel is secured. What does that mean in real terms?
Ken: is there any obvious way it can leak?
Dan: it doesn't define the trust boundaries. What are the trust boundaries? It doesn't say therefore it must be in secure contexts, just says it is secure. The requirement here is good but it's not as strong as it could be. I would say something like...
In the [Privacy & Security Section](https://w3c.github.io/baggage/#privacy-of-the-baggage-header) it states "As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured." But the term "trust boundaries" is not itself defined in the document. Also, maybe you should reference [secure contexts](https://www.w3.org/TR/secure-contexts/) directly instead of just saying "secured", to be clear about what you mean by secured here?
We're also still waiting for a response on the issues raised by Hadley about incognito mode and naming.
Yves: they updated the token to the same RFC, that looks good
Dan: they need to define their privacy and security considerations better
Yves: the question Hadley raised about incognito mode still stands. Waiting on those to be answered.
Hadley: and 'baggage' being a US idiom, haven't heard back on that either
Opened Jun 22, 2021
Ya ya yawm TAG!
I'm requesting a TAG review of Baggage specification
Baggage is a header format and behavior specification that enables the propagation of opaque values along a distributed trace. A distributed trace can span multiple components and types of servers: browsers, proxies and load balancers, server applications, and SaaS services. Baggage from an incoming request is propagated to all related outgoing calls. This enables many interoperability scenarios for end users when monitoring and troubleshooting highly distributed applications. There is a lot of prior art in the form of vendor-specific headers; this specification solves the interoperability issue that arises when many app components may not be controlled by the end user.
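As a rough illustration of the propagation described above, the sketch below parses an incoming `baggage` header and re-emits it for an outgoing call. It assumes the simplified shape of comma-separated `key=value` members with percent-encoded values and optional `;property` suffixes, and it ignores those properties along with any size limits. The function names (`parse_baggage`, `format_baggage`, `propagate`) and the sample keys are hypothetical and do not come from any SDK.

```python
from urllib.parse import quote, unquote


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value into a key -> value mapping.

    Simplified sketch: properties after ';' are dropped and malformed
    members are skipped rather than reported.
    """
    entries: dict[str, str] = {}
    for member in header.split(","):
        # Keep only the "key=value" part, dropping any ";property" suffix.
        key_value = member.split(";", 1)[0].strip()
        if "=" not in key_value:
            continue  # empty or malformed member
        key, value = key_value.split("=", 1)
        entries[key.strip()] = unquote(value.strip())
    return entries


def format_baggage(entries: dict[str, str]) -> str:
    """Serialise a mapping back into a baggage header value."""
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in entries.items()
    )


def propagate(incoming_headers: dict[str, str]) -> dict[str, str]:
    """Return the baggage header to attach to a related outgoing call.

    Assumes header names in `incoming_headers` are already lowercased.
    """
    entries = parse_baggage(incoming_headers.get("baggage", ""))
    return {"baggage": format_baggage(entries)} if entries else {}
```

For example, an incoming value of `userId=alice,serverNode=DF%2028;prop=1` would be parsed into two entries and forwarded as `userId=alice,serverNode=DF%2028`, with the property dropped by this simplified version.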
Further details:
You should also know that...
A previous form of this specification is built into the ASP.NET framework; the current version of the specification is implemented by many OpenTelemetry SDKs.
We found that it is great to have predictability in which field is being automatically propagated, as it increases transparency for the end user, as opposed to numerous vendor-specific headers.
We'd prefer the TAG provide feedback as (please delete all but the desired option):
🐛 open issues in our GitHub repo for each point of feedback