#650: Distributed Tracing WG: Baggage specification

Opened Jun 22, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of the Baggage specification.

Baggage is a header format and behavior specification that enables propagation of opaque values along a distributed trace. A distributed trace can span multiple components and types of servers: browsers, proxies and load balancers, server applications, and SaaS services. Baggage from an incoming request is propagated to all related outgoing calls. This enables many interoperability scenarios for end users when monitoring and troubleshooting highly distributed applications. There is a lot of prior art in the form of vendor-specific headers; this specification solves the interoperability issue that arises when many app components are not controlled by the end user.
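As a rough illustration of the propagation behavior described above, here is a minimal sketch in Python. It assumes the header syntax of the Baggage draft (a comma-separated list of `key=value` members, optional `;`-delimited properties, percent-encoded values) and lowercase header names in the incoming mapping. The function names `parse_baggage`, `format_baggage`, and `propagate` are hypothetical and not taken from the specification or any SDK, and properties are simply ignored here.

```python
from urllib.parse import quote, unquote


def parse_baggage(header: str) -> dict[str, str]:
    """Parse a baggage header value into a key -> value mapping.

    Properties after ';' are ignored in this sketch.
    """
    entries: dict[str, str] = {}
    for member in header.split(","):
        # Drop optional properties such as ";ttl=30".
        key_value = member.split(";", 1)[0]
        if "=" not in key_value:
            continue  # skip malformed or empty members
        key, value = key_value.split("=", 1)
        key = key.strip()
        if key:
            entries[key] = unquote(value.strip())
    return entries


def format_baggage(entries: dict[str, str]) -> str:
    """Serialize a mapping back into a baggage header value."""
    return ",".join(
        f"{key}={quote(value, safe='')}" for key, value in entries.items()
    )


def propagate(incoming_headers: dict[str, str]) -> dict[str, str]:
    """Build headers for an outgoing call, carrying baggage forward unchanged."""
    outgoing: dict[str, str] = {}
    baggage = parse_baggage(incoming_headers.get("baggage", ""))
    if baggage:
        outgoing["baggage"] = format_baggage(baggage)
    return outgoing
```

A service would call something like `propagate(request_headers)` once per outgoing call, so that every downstream component sees the same key-value pairs the request arrived with.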

Further details:

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: we want to get to CR this calendar year
  • The group where the work on this specification is currently being done: Distributed Tracing Working Group
  • The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Distributed Tracing Working Group
  • Major unresolved issues with or opposition to this specification: none
  • This work is being funded by: multiple companies

You should also know that...

A previous form of this specification is built into the ASP.NET framework; the current version of the specification is implemented by many OpenTelemetry SDKs.

We found that it is great to have predictability about which field is automatically propagated, as it increases transparency for end users, as opposed to numerous vendor-specific headers.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

Discussions

2021-07-26

Minutes

Peter: a bunch of activity not from us..

Peter's cat: MEOOWW

Dan: they don't have a S&P questionnaire response... [writes comment]

Yves: ask if their header can be [???] ... RFC8941

Dan: ..user identifiable data..

[dan & yves leave some comments]

Peter: lot of privacy concerns that are glossed over...

2021-08-09

Minutes

Ken: it's about associated data that you propagate. Need to read in detail.

Peter: next week?

2021-09-27

Minutes

Ken: looks pretty sensible - sending other data along with your distributed tracing... along with tracing you can send your own custom data.. metadata...

Hadley: what kind of tracing?

Ken: OpenTelemetry project... OpenTelemetry uses it...

Dan: isn't metadata sometimes problematic?

Ken: custom data - tracking your own thing - e.g. I'm running on Azure ... could be potential privacy issue... There's a lot of telemetry people who use orchestration.. Kubernetes... based on W3C distributed tracing spec... Microsoft thing Dapr.. https://docs.dapr.io/developing-applications/building-blocks/observability/tracing-overview/ https://opentelemetry.io/

Hadley: use cases doc?

Rossen: question around motivation.. mostly driven by this is a thing people are doing it can we standardise it please. I'm not opposed in general. But without the clear use case it's hard to piece together why this needs to be standardised, which pieces of it make sense, and the overall...

Ken: They already standardized dist tracing - this is an extension...

Rossen: Trace Context spec? Is a Recommendation...

https://www.w3.org/TR/trace-context/

https://www.w3.org/groups/wg/distributed-tracing

Dan: but it's adding a side channel for custom metadata - so absolutely agree with hadley show me the use cases...

Hadley: i'd feel better if we can bottom out what the mitigations are...

Rossen: how did this go to rec without going through tag? In Feb 2020?

Ken: I feel i've seen this - in tag review...

[looking for any TAG review we may have done and not finding it]

Hadley: I'll leave a comment

2021-11-15

Minutes

Ken: people are already doing this today, about sending metadata with your distributed tracing. As I understand the response to Hadley, clearly people are implementing custom headers, which makes it harder to control, so they expect that switching to this unified approach is better for compatibility. You can already do this today, so this is about standardising how people are doing it, sending this custom data, so that it will work across different libraries. That sounds like the right thing to do. This is not allowing anything new. Was wondering how this ties together with structured headers.

Dan: looks like they are being quite responsive. If this is just a way to standardise a thing people are doing already and there is some consensus around it it doesn't feel like it opens privacy issues. But is it a good design?

Ken: structured headers allows structured data, not just key value pairs and properties. [writes comment]

Ken: subset of structured headers? Make sure it's an actual subset and doesn't have confusing additional features

Peter: we have guidance in design principles about structured headers

Yves: that would be good

Peter: don't want to see yet another header form that requires another parser

Yves: or even worse, intermediaries, like what happened for cookies in the past

Dan: also missing info on multistakeholder support

Hadley: Why is it called baggage and not tracing id?

Ken: this is an extension to the tracing API?

Dan: language idiom issue?

Hadley: yes. Makes it harder to understand

Dan: is there anything in the responses that should be in the explainer?

Hadley: didn't get the use case until I went to the charter. Makes sense to standardise what they're already doing in other ways. Standardising a few different headers, of which baggage is the 'everything else' header. Still not convinced or understanding how this can't be hijacked for stronger user tracking. Am I missing something?

Ken: not worse than it is today. Going to standardise how you store the data, not what data you store. If people start using the same data you could use it for additional tracking, but you could always use it today with the telemetry thing. If it becomes standardised you might learn what something means you could use as additional entropy. But it's the same as today, if specific data is common, that would be the same case. Maybe slightly more entropy... if they continue today people are not collaborating well, but something becomes common enough that it's a de-facto standard. With the new thing if people just use their own data it's not going to give you more but if they use the same... it's the same.. not going to give you more entropy. Not standardising the values, just the format to store values, that's not very useful. You could argue there's a larger chance of de facto values and key pairs on top that could be used

Hadley: if it's standardised is it more useful to intercept?

Ken: already part of the telemetry thing.. this is just the custom data part. The non custom data part is much more valuable for tracking purposes.

Hadley: right, rather than just the baggage part. Not sure what that means we should do.

Dan: should it behave differently in private browsing?

Hadley: that's tricky. Depends on what you want to do in private browsing. You might want this kind of functionality to work. Whether it gets disrupted or cleared.. doesn't the spec say when you reload the page you get a new id?

Peter: id is provided by the service, the browser doesn't have control of that. Nothing stopping the service generating the same trace id or something they could deanonymise

Hadley: or with your IP address or something else identifiable or consistent

Peter: means private browser has no recourse than dropping the trace id entirely and not sending any telemetry

2021-12-Madripoor

Minutes

Hadley: no response to comments, but a new issue that references ours, responding to Yves question

Dan: that's merged. Can ask Yves if he's happy

Ken: it's meta data.. you can carry around a lot because there's no limits, you're in control. You need to understand telemetry, distributed compute.. not something you're going to get from a random website

Dan: I guess that's the point.. could a random website misuse it

Ken: they need to collaborate with someone otherwise it's their own data. The thing is people want.. distributed tracing you want not just from the frontend but from the backend and carry it around so you know how everything works together, front and back end. Normally you do that for yourself. I don't think you can do the tracing across random websites and get it to you, that would be a major issue. I think this is okay.

Dan: they say under privacy of the baggage header.. can contain user identifiable data. Must ensure it does not leak beyond defined trust boundaries... and channel is secured. What does that mean in real terms?

Ken: is there any obvious way it can leak?

Dan: it doesn't define the trust boundaries. What are the trust boundaries? It doesn't say therefore it must be in secure contexts, just says it is secure. The requirement here is good but it's not as strong as it could be. I would say something like...

In the [Privacy & Security Section](https://w3c.github.io/baggage/#privacy-of-the-baggage-header) it states "As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured." But the term "trust boundaries" is not itself defined in the document. Also maybe you should reference [secure contexts](https://www.w3.org/TR/secure-contexts/) directly instead of just saying "secured" to be clear about what you mean by secured here?

We're also still waiting for a response on the issues raised by Hadley about incognito mode and naming.
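To make the question about trust boundaries concrete, here is one possible reading of the quoted requirement, as a minimal Python sketch. Everything in it is an assumption rather than something the specification defines: the "trust boundary" is modelled as a hypothetical allow-list of hosts (`TRUSTED_HOSTS`), "secured" is approximated as an `https` URL (which is not the same thing as the Secure Contexts definition), and the function name `outgoing_headers` is invented for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical allow-list standing in for a deployment's "trust boundary".
TRUSTED_HOSTS = {"api.example.com", "billing.example.com"}


def outgoing_headers(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of headers, dropping baggage unless the target is trusted and HTTPS."""
    target = urlsplit(url)
    # Assumption: "secured channel" is approximated by the https scheme.
    allowed = target.scheme == "https" and target.hostname in TRUSTED_HOSTS
    if allowed:
        return dict(headers)
    # Outside the boundary (or over plain http): strip the baggage header.
    return {
        name: value for name, value in headers.items() if name.lower() != "baggage"
    }
```

Under this reading, a call to an untrusted host or over plain `http` still goes out, but without the baggage header. Whether that matches what the specification intends is exactly what the comment above asks the group to clarify.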

Yves: they updated the token to the same RFC, that looks good

Dan: they need to define their privacy and security considerations better

Yves: the question Hadley raised about incognito mode still stands. Waiting on those to be answered.

Hadley: and 'baggage' being a US idiom, haven't heard back on that either

2022-02-28

Minutes

Reviewed status, punted to F2F.