design-reviews#650: Distributed Tracing WG: Baggage specification

Opened Jun 22, 2021

Ya ya yawm TAG!

I'm requesting a TAG review of Baggage specification

Baggage is a header format and behavior specification that enables propagation of opaque values along a distributed trace. A distributed trace can span multiple components and types of servers: browsers, proxies and load balancers, server applications, and SaaS services. Baggage from an incoming request is propagated to all related outgoing calls. This enables many interoperability scenarios for end users when monitoring and troubleshooting highly distributed applications. There is a lot of prior art in the form of vendor-specific headers; this specification solves the interoperability issue that arises when many app components are not controlled by the end user.
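As a rough illustration of "baggage from an incoming request is propagated to all related outgoing calls", here is a minimal sketch. It models header maps as plain objects; `HeaderMap`, `getHeader`, and `propagateBaggage` are hypothetical names chosen for this example and do not come from the specification or from any SDK.

```typescript
// Header maps are modelled as plain objects for the sake of the sketch.
type HeaderMap = Record<string, string>;

// Find a header value regardless of the casing used for its name.
function getHeader(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      return headers[key];
    }
  }
  return undefined;
}

// Copy the incoming `baggage` value, unchanged, onto an outgoing request.
// The value is treated as opaque: it is neither parsed nor modified here.
function propagateBaggage(incoming: HeaderMap, outgoing: HeaderMap): HeaderMap {
  const baggage = getHeader(incoming, "baggage");
  if (baggage === undefined) {
    return { ...outgoing };
  }
  return { ...outgoing, baggage };
}

// Example: a service forwards the baggage it received to a downstream call.
const downstreamHeaders = propagateBaggage(
  { Baggage: "userTier=gold, region=eu" },
  { accept: "application/json" },
);
console.log(downstreamHeaders);
```

A real component would apply the same step to every outgoing call made while handling the incoming request.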

Explainer (minimally containing user needs and example code): https://github.com/w3c/distributed-tracing-wg/blob/main/EXPLAINER.md
Specification URL: https://w3c.github.io/baggage/
Tests: WIP
Security and Privacy self-review: WIP
GitHub repo (if you prefer feedback filed there): https://github.com/w3c/baggage/
Primary contacts (and their relationship to the specification):
- Sergey Kanzhelev, [Google], editor
- Daniel Khan, [Dynatrace], editor
Organization(s)/project(s) driving the specification:
Key pieces of existing multi-stakeholder review or discussion of this specification:
External status/issue trackers for this specification (publicly visible, e.g. Chrome Status): https://github.com/w3c/baggage/issues

Further details:

I have reviewed the TAG's Web Platform Design Principles
Relevant time constraints or deadlines: we want to get to CR this calendar year
The group where the work on this specification is currently being done: Distributed Tracing Working Group
The group where standardization of this work is intended to be done (if current group is a community group or other incubation venue): Distributed Tracing Working Group
Major unresolved issues with or opposition to this specification: none
This work is being funded by: multiple companies

You should also know that...

A previous form of this specification is built into the ASP.NET framework; the current version of the specification is implemented by many OpenTelemetry SDKs.

We found that it is great to have predictability in which field is automatically propagated, as it increases transparency for the end user, as opposed to numerous vendor-specific headers.

We'd prefer the TAG provide feedback as (please delete all but the desired option):

🐛 open issues in our GitHub repo for each point of feedback

Discussions

Comment by @otherdaniel Jun 29, 2021 (See Github)

Can you clarify what a user-agent is meant to do on receiving this header, if anything?

This issue came up during Chromium Security & Privacy review of this issue, and we were unclear whether the spec is meant to apply to browsers / user agents or not. The spec seems primarily concerned with "downstream services", but the last example in the explainer shows a web browser echoing a "W3C Trace ID" back to a server, which would make this functionality cookie-equivalent. However, the implications of being cookie-like (life time? origin bound? which permissions apply?) do not seem to be spelled out clearly.

Comment by @dyladan Jul 20, 2021 (See Github)

> Can you clarify what a user-agent is meant to do on receiving this header, if anything?

For the baggage header, the user-agent will not do anything other than send the header. In version 1, there is no response header and we have no plans to introduce one in future versions at the moment.

> This issue came up during Chromium Security & Privacy review of this issue, and we were unclear whether the spec is meant to apply to browsers / user agents or not.

Other than sending the baggage header just like any other header, the user-agent has no responsibilities. It is conceivable that a user-agent would want to add baggage entries, but that is not part of the specification.

> The spec seems primarily concerned with "downstream services", but the last example in the explainer shows a web browser echoing a "W3C Trace ID" back to a server, which would make this functionality cookie-equivalent. However, the implications of being cookie-like (life time? origin bound? which permissions apply?) do not seem to be spelled out clearly.

The "W3C Trace ID" is part of the separate but related W3C Trace Context specification, which is not part of the baggage specification that we are asking to be reviewed.

Comment by @arturjanc Jul 22, 2021 (See Github)

> For the baggage header, the user-agent will not do anything other than send the header.

This on its own is concerning security-wise and should be covered in the spec and explainer. For example, what happens on cross-origin requests (e.g. those made in no-cors mode): would the baggage header be included? How should cross-origin redirects be handled?

I think a conceptually simple way to address this could be to guarantee that the baggage header can only be attached on requests that are same-origin to the response that included the baggage information. The spec already mentions that baggage data "does not leak beyond defined trust boundaries" but my guess is that the folks working on this need to put a little more thought into how this would work on the web.
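The same-origin restriction suggested above could be sketched roughly as follows. This is only an illustration of the idea, not behavior defined by the specification; `isSameOrigin` and `headersFor` are hypothetical helper names, and the URLs are placeholders.

```typescript
// Decide whether a request target is same-origin with the page (or response)
// that the baggage came from.
function isSameOrigin(requestUrl: string, baseUrl: string): boolean {
  // Resolve relative URLs against the base, then compare scheme, host and port.
  const target = new URL(requestUrl, baseUrl);
  const base = new URL(baseUrl);
  return target.origin === base.origin;
}

// Return the extra headers to send: `baggage` is added only for same-origin targets.
function headersFor(
  requestUrl: string,
  baseUrl: string,
  baggage: string,
): Record<string, string> {
  return isSameOrigin(requestUrl, baseUrl) ? { baggage } : {};
}

console.log(headersFor("/api/cart", "https://shop.example/checkout", "userTier=gold"));
console.log(headersFor("https://cdn.example/lib.js", "https://shop.example/checkout", "userTier=gold"));
```

A check like this runs before the request is sent, so on its own it does not answer the question about cross-origin redirects.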

Discussed Jul 26, 2021 (See Github)

Peter: a bunch of activity not from us..

Peter's cat: MEOOWW

Dan: they don't have a S&P questionnaire response... [writes comment]

Yves: ask if their header can be [???] ... RFC8941

Dan: ..user identifiable data..

[dan & yves leave some comments]

Peter: lot of privacy concerns that are glossed over...

Comment by @torgo Jul 26, 2021 (See Github)

Hi @SergeyKanzhelev - thanks for sending us this. We're still looking for some more info related to security & privacy - specifically the response to the questionnaire... Also can you be a bit more clear on how you "forbid encoding user identifiable data"?

Comment by @ylafon Jul 26, 2021 (See Github)

Hi, I didn't see any reference to Structured Headers, can your header format be expressed this way?

Comment by @torgo Jul 26, 2021 (See Github)

Also can you please let us know the multistakeholder status on this? What is the implementation status across other engines/browsers?

Discussed Aug 9, 2021 (See Github)

Ken: it's about associated data that you propagate. Need to read in detail.

Peter: next week?

Comment by @samuelweiler Aug 30, 2021 (See Github)

@torgo, thank you for drawing this to PING's attention by adding a privacy-tracker label.

As a reminder for the WG, when you're ready for a PING review, please follow the instructions at https://www.w3.org/Guide/documentreview/#how_to_get_horizontal_review

Comment by @SergeyKanzhelev Aug 31, 2021 (See Github)

Sorry for the long delay in replies.

> For the baggage header, the user-agent will not do anything other than send the header.

> This on its own is concerning security-wise and should be covered in the spec and explainer. For example, what happens on cross-origin requests (e.g. those made in no-cors mode): would the baggage header be included? How should cross-origin redirects be handled?

At the moment there are no plans to implement baggage as part of the user-agent. It will typically be done in the JavaScript library executing on the page. This JavaScript library will need to be configured to account for CORS correctly.
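One way such a library configuration might look is an explicit list of cross-origin targets that are known to accept the header. The sketch below is an assumption about how a page-level library could be configured; `BaggageConfig` and `mayPropagate` are hypothetical names and the origins are placeholders.

```typescript
// Hypothetical configuration shape for a page-level tracing library.
interface BaggageConfig {
  // Origin of the page the library runs on.
  pageOrigin: string;
  // Cross-origin targets known to accept the `baggage` header,
  // given either as exact origins or as patterns matched against the origin.
  allowedOrigins: Array<string | RegExp>;
}

// Decide whether the library should add `baggage` to a request.
function mayPropagate(requestUrl: string, config: BaggageConfig): boolean {
  const origin = new URL(requestUrl, config.pageOrigin).origin;
  if (origin === new URL(config.pageOrigin).origin) {
    return true; // same-origin requests need no CORS opt-in
  }
  return config.allowedOrigins.some((rule) =>
    typeof rule === "string" ? rule === origin : rule.test(origin),
  );
}

const exampleConfig: BaggageConfig = {
  pageOrigin: "https://shop.example",
  allowedOrigins: ["https://api.example", /^https:\/\/[a-z]+\.svc\.example$/],
};
console.log(mayPropagate("https://api.example/v1/orders", exampleConfig));
console.log(mayPropagate("https://ads.example/pixel", exampleConfig));
```

Keeping the list explicit means a request to an unlisted third party is sent without the header rather than triggering a preflight that the third party may reject.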

> I think a conceptually simple way to address this could be to guarantee that the baggage header can only be attached on requests that are same-origin to the response that included the baggage information. The spec already mentions that baggage data "does not leak beyond defined trust boundaries" but my guess is that the folks working on this need to put a little more thought into how this would work on the web.

We believe that defining trust boundaries is not part of the specification, so it gives no specific details on how to implement CORS or configure trust boundaries on the web. Trust boundaries, especially those involving many components of a distributed application, are typically very specific to the application and cannot easily be generalized in a specification. This, however, may be a whole new topic for the Distributed Tracing Working Group to explore in the future.

> We're still looking for some more info related to security & privacy - specifically the response to the questionnaire... Also can you be a bit more clear on how you "forbid encoding user identifiable data"?

I think this is referring to the privacy section of the specification. The specification does not solve the problem of what data will be propagated through the distributed application components. It solves the problem of standardizing the way data is propagated, so that centralized solutions can be developed to inspect and secure this propagation. Many tools today propagate this information using custom headers that are vendor-specific and not clearly documented, so users of these tools have less control over the data being transmitted. When tools follow the spec, it will be easier for users to control what is transmitted.

Comment by @SergeyKanzhelev Aug 31, 2021 (See Github)

> I didn't see any reference to Structured Headers, can your header format be expressed this way?

We had extensive discussions on whether to use structured headers for this. For multiple reasons we decided to proceed with the cookie-like format and not support strongly typed values. I cannot find the rationale clearly documented; I will take this as an action item.

Discussed Sep 27, 2021 (See Github)

Ken: looks pretty sensible - sending other data along with your distributed tracing... along with tracing you can send your own custom data.. metadata...

Hadley: what kind of tracing?

Ken: open telemetry project... open telemetry uses it...

Dan: isn't metadata sometimes problematic?

Ken: custom data - tracking your own thing - e.g. i'm running on azure ... could be potential privacy issue... There's a lot of telemetry people who use orchestration.. kubernetes... based on w3c distributed tracing spec... microsoft thing dapr.. https://docs.dapr.io/developing-applications/building-blocks/observability/tracing-overview/ https://opentelemetry.io/

Hadley: use cases doc?

Rossen: question around motivation.. mostly driven by this is a thing people are doing it can we standardise it please. I'm not opposed in general. But without the clear use case it's hard to piece together why this needs to be standardised, which pieces of it make sense, and the overall...

Ken: They already standardized dist tracing - this is an extension...

Rossen: trace contexts spec? Is a recommendation...

https://www.w3.org/TR/trace-context/

https://www.w3.org/groups/wg/distributed-tracing

Dan: but it's adding a side channel for custom metadata - so absolutely agree with hadley show me the use cases...

Hadley: i'd feel better if we can bottom out what the mitigations are...

Rossen: how did this go to rec without going through tag? In Feb 2020?

Ken: I feel i've seen this - in tag review...

[looking for any TAG review we may have done and not finding it]

Hadley: I'll leave a comment

Comment by @hadleybeeman Sep 27, 2021 (See Github)

Hi @SergeyKanzhelev. We are looking at this in our W3CTAG breakout, and a couple of questions have come up.

Who is the user for this? What need of theirs is it meeting? It would help us a lot if you could put those points in your explainer.
We are concerned about the privacy/security implications of opening up a metadata channel that the user can't control. Have you thought through any attack scenarios? And if so, what are your thoughts on how to make them less likely to happen?

The security and privacy sections of your spec talk about the responsibilities of application owners and systems. Does your approach let users protect themselves too? Or might it be easier to track them with your proposal?

We'd be grateful for your thoughts on those. Thanks!

Comment by @arturjanc Oct 1, 2021 (See Github)

> At the moment there are no plans to implement baggage as part of the user-agent. It will typically be done in the JavaScript library executing on the page. This JavaScript library will need to be configured to account for CORS correctly.

I'm having a bit of difficulty reconciling this with the description in the explainer: "We also propose a response header which can be used to report a trace ID back to the caller to ... correlate the initial page load of a browser and all subsequent requests to a server side trace" and "The browser uses this information to provide a trace ID for all subsequent requests within this one load cycle".

From your explanation above I understand that the user's browser would simply ignore the header and the responsibility of propagating the baggage value would be on the client-side JS executing on a given page. This brings up a few thoughts:

How will scripts executing on a page learn the value of the baggage header? Are you also proposing a JS API to get this data?
This means that the baggage will -- in practice -- be propagated only for a subset of requests made by a given page, and will exclude no-cors resource loads. I think this means that the value will only be propagated for programmatic loads using the fetch() API if the caller makes sure to add the baggage header.
Pages with some third-party resources that don't opt in via Access-Control-Allow-Headers may have a hard time enabling this. My guess is that most pages are in this situation.

Comment by @SergeyKanzhelev Oct 19, 2021 (See Github)

@hadleybeeman thank you for your questions and sorry for delay.

> Who is the user for this? What need of theirs is it meeting? It would help us a lot if you could put those points in your explainer.

The users of Baggage are application developers or authors, or third-party tools installed into these applications. Baggage is a mechanism that enables specific scenarios of information propagation across components of a distributed system. By standardizing it we are simplifying implementation of these scenarios as well as giving application developers some means of verifying what is shared between systems. At present, custom headers are often used, which makes it harder to audit universally.

> We are concerned about the privacy/security implications of opening up a metadata channel that the user can't control. Have you thought through any attack scenarios? And if so, what are your thoughts on how to make them less likely to happen?

At present those scenarios are often implemented using custom headers, which makes them harder to control. We expect most implementations will want to switch to a unified approach that offers better compatibility across vendors and components. This will make it easier to audit and control what metadata is shared.

> The security and privacy sections of your spec talk about the responsibilities of application owners and systems. Does your approach let users protect themselves too? Or might it be easier to track them with your proposal?

End users must rely on application authors for protection. As with any custom headers an application author may decide to send today, there is no mechanism to control what will be sent via the baggage header. In the future, if baggage becomes widely used in browser JS libraries, browsers may consider restricting where baggage headers are sent.

Comment by @SergeyKanzhelev Oct 19, 2021 (See Github)

> How will scripts executing on a page learn the value of the baggage header? Are you also proposing a JS API to get this data?

Today we only support Ajax calls inside a single page load. We do not propose a JS API for the actual page loads at this point. We were considering using the Server-Timing header to return some correlation data from the server back to the page, but this is not something we are actively working on.

> This means that the baggage will -- in practice -- be propagated only for a subset of requests made by a given page, and will exclude no-cors resource loads. I think this means that the value will only be propagated for programmatic loads using the fetch() API if the caller makes sure to add the baggage header.

Yes, implementations will be JS Library-specific.

> Pages with some third-party resources that don't opt in via Access-Control-Allow-Headers may have a hard time enabling this. My guess is that most pages are in this situation.

Yes, this standard makes the situation a little better, as it defines a unified header where in the past custom headers were typically used. It will be easier to ask for Access-Control-Allow-Headers to allow baggage than any vendor-specific headers, and we may provide some helper methods for this. We had a conversation about including Distributed Tracing Working Group headers in the default set of allowed headers, but I don't think this will happen in the short term.
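For illustration, the server-side opt-in could look roughly like the sketch below, which computes the CORS response headers for a preflight request. `preflightResponseHeaders` and the `trustedOrigins` list are hypothetical and application-specific; only the `Access-Control-*` and `Vary` header names are standard.

```typescript
// Compute CORS headers for a preflight (OPTIONS) response.
function preflightResponseHeaders(
  requestOrigin: string,
  requestedHeaders: string, // value of the Access-Control-Request-Headers request header
  trustedOrigins: string[],
): Record<string, string> {
  if (trustedOrigins.indexOf(requestOrigin) === -1) {
    return {}; // no CORS headers: the browser will not send the actual request
  }
  const requested = requestedHeaders
    .split(",")
    .map((h) => h.trim().toLowerCase())
    .filter((h) => h.length > 0);
  // Only echo back the headers this service is willing to accept.
  const accepted = requested.filter((h) => h === "baggage" || h === "content-type");
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    "Access-Control-Allow-Headers": accepted.join(", "),
    "Vary": "Origin",
  };
}

console.log(
  preflightResponseHeaders("https://shop.example", "baggage, x-vendor-trace", [
    "https://shop.example",
  ]),
);
```

With a single standardized name, a third party only has to allow `baggage` once instead of tracking each vendor's custom header.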

Comment by @dyladan Nov 9, 2021 (See Github)

Hello TAG review team. This is just a friendly ping to see what the current status of this review is. Please let me know if you're waiting on anything from us (Distributed Tracing WG) at this point.

Comment by @plinss Nov 9, 2021 (See Github)

@dyladan thanks for the ping, we’ll get back to this next week

Discussed Nov 15, 2021 (See Github)

Ken: people are already doing this today, about sending metadata with your distributed tracing. As I understand the response to Hadley, clearly people are implementing custom headers which makes it harder to control, so they expect switching to this unified approach which is better from compatibility. You can already do this today, so this is about standardising how people are doing it, sending this custom data, so that it will work across different libraries. That sounds like the right thing to do. This is not allowing anything new. Was wondering how this ties together with structured headers.

Dan: looks like they are being quite responsive. If this is just a way to standardise a thing people are doing already and there is some consensus around it it doesn't feel like it opens privacy issues. But is it a good design?

Ken: structured headers allows structured data, not just key value pairs and properties. [writes comment]

Ken: subset of structured headers? Make sure it's an actual subset and doesn't have confusing additional features

Peter: we have guidance in design principles about structured headers

Yves: that would be good

Peter: don't want to see yet another header form that requires another parser

Yves: or even worse, intermediaries, like what happened for cookies in the past

Dan: also missing info on multistakeholder support

Hadley: Why is it called baggage and not tracing id?

Ken: this is an extension tracing api?

Dan: language idiom issue?

Hadley: yes. Makes it harder to understand

Dan: is there anything in the responses that should be in the explainer?

Hadley: didn't get the use case until I went to the charter. Makes sense to standardise what they're already doing in other ways. Standardising a few different headers, of which baggage is the 'everything else' header. Still not convinced or understanding how this can't be hijacked for stronger user tracking. Am I missing something?

Ken: not worse than it is today. Going to standardise how you store the data, not what data you store. If people start using the same data you could use it for additional tracking, but you could always use it today with the telemetry thing. If it becomes standardised you might learn what something means you could use as additional entropy. But it's the same as today, if specific data is common, that would be the same case. Maybe slightly more entropy... if they continue today people are not collaborating well, but something becomes common enough that it's a de-facto standard. With the new thing if people just use their own data it's not going to give you more but if they use the same... it's the same.. not going to give you more entropy. Not standardising the values, just the format to store values, that's not very useful. You could argue there's a larger chance of de facto values and key pairs on top that could be used.

Hadley: if it's standardised is it more useful to intercept?

Ken: already part of the telemetry thing.. this is just the custom data part. The non custom data part is much more valuable for tracking purposes.

Hadley: right, rather than just the baggage part. Not sure what that means we should do.

Dan: should it behave differently in private browsing?

Hadley: that's tricky. Depends on what you want to do in private browsing. You might want this kind of functionality to work. Whether it gets disrupted or cleared.. doesn't the spec say when you reload the page you get a new id?

Peter: id is provided by the service, the browser doesn't have control of that. Nothing stopping the service generating the same trace id or something they could deanonymise

Hadley: or with your IP address or something else identifiable or consistent

Peter: means private browser has no recourse than dropping the trace id entirely and not sending any telemetry

Comment by @kenchris Nov 15, 2021 (See Github)

I see that you currently support key/value pairs and properties. Do you expect this to cover all your use-cases?

As far as I could see this looks like a subset of structured headers - is that the case? I think this should either use structured headers or a strict subset
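To make the shape under discussion concrete, here is a minimal parsing sketch. It assumes the comma-separated `key=value;property` layout with percent-encoded values from the specification draft, and it does not enforce size limits or character restrictions. `BaggageEntry`, `BaggageProperty`, `splitPair`, and `parseBaggage` are hypothetical names for this example only.

```typescript
interface BaggageProperty { key: string; value?: string }
interface BaggageEntry { key: string; value: string; properties: BaggageProperty[] }

// Split "k=v" at the first "=", trimming optional whitespace around both parts.
function splitPair(text: string): [string, string | undefined] {
  const eq = text.indexOf("=");
  if (eq === -1) {
    return [text.trim(), undefined];
  }
  return [text.slice(0, eq).trim(), text.slice(eq + 1).trim()];
}

function parseBaggage(header: string): BaggageEntry[] {
  const entries: BaggageEntry[] = [];
  for (const member of header.split(",")) {
    const parts = member.split(";");
    const [key, rawValue] = splitPair(parts[0] || "");
    if (key === "" || rawValue === undefined) {
      continue; // skip empty or malformed list members
    }
    let value: string;
    try {
      value = decodeURIComponent(rawValue); // values are assumed percent-encoded
    } catch {
      continue; // skip members whose value is not valid percent-encoding
    }
    // Everything after the first ";" is a property: a bare key or key=value.
    const properties = parts
      .slice(1)
      .filter((p) => p.trim() !== "")
      .map((p): BaggageProperty => {
        const [k, v] = splitPair(p);
        return v === undefined ? { key: k } : { key: k, value: v };
      });
    entries.push({ key, value, properties });
  }
  return entries;
}

console.log(
  JSON.stringify(parseBaggage("key1=value1;property1, key2 = value2, key3=value3; pk=pv")),
);
```

A flat list of string entries with optional properties is all this needs, which is narrower than the typed items and inner lists that Structured Headers can express.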

Comment by @torgo Nov 15, 2021 (See Github)

We're also missing info on multi-stakeholder support. I can't find this on Chrome Status. @dyladan @SergeyKanzhelev can you comment?

Comment by @hadleybeeman Nov 15, 2021 (See Github)

Thanks for the thorough reply to my earlier questions, @SergeyKanzhelev! They were helpful. For what it's worth though, I still didn't really get the use case you're trying to support until I looked at the working group's charter. You might want to take a bit of the scope section from the charter to set the scene in your explainer.

Also, we've been talking about the name of the feature. Baggage is definitely a US/UK idiom, and may not be intuitive to anyone who hasn't been focusing on distributed tracing. Might it be useful to use something like Trace ID additional header? (Maybe best to do some user research on this — the important thing is that it's immediately understandable to as many of your users as possible.)

And finally, in the privacy vein, we talked about whether it would be good for this to behave differently in private/incognito browsing mode. It struck us that there is nothing to stop the application provider from sending headers derived from information tied to the machine or usual user (their IP address, their user ID, etc.). We wondered if it might make sense to ask user agents to ignore the trace ID altogether when in private/incognito mode, so that the user's experience isn't tied to the experience that same user (or whoever they share a browser with) gets when not in private/incognito mode. What are your thoughts on this?

Comment by @ylafon Nov 15, 2021 (See Github)

Also, is there a reason why, in the ABNF, you refer to both rfc2616 and rfc7230? Why not use the same source (ideally the latest) for all the definitions?

Discussed Dec 1, 2021 (See Github)

Hadley: no response to comments, but a new issue that references ours, responding to Yves's question

Dan: that's merged. Can ask Yves if he's happy

Ken: it's meta data.. you can carry around a lot because there's no limits, you're in control. You need to understand telemetry, distributed compute.. not something you're going to get from a random website

Dan: I guess that's the point.. could a random website misuse it

Ken: they need to collaborate with someone otherwise it's their own data. The thing is people want.. distributed tracing you want not just from the frontend but from the backend and carry it around so you know how everything works together, front and back end. Normally you do that for yourself. I don't think you can do the tracing across random websites and get it to you, that would be a major issue. I think this is okay.

Dan: they say under privacy of the baggage header.. can contain user identifiable data. Must ensure it does not leak beyond defined trust boundaries... and channel is secured. What does that mean in real terms?

Ken: is there any obvious way it can leak?

Dan: it doesn't define the trust boundaries. What are the trust boundaries? It doesn't say therefore it must be in secure contexts, just says it is secure. The requirement here is good but it's not as strong as it could be. I would say something like...

In the [Privacy & Security Section](https://w3c.github.io/baggage/#privacy-of-the-baggage-header) it states "As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured." But the term "trust boundaries" is not itself defined in the document. Also maybe you should reference [secure contexts](https://www.w3.org/TR/secure-contexts/) directly instead of just saying "secured" to be clear about what you mean by secured here?

We're also still waiting for a response on the issues raised by Hadley about incognito mode and naming.

Yves: they updated the token to the same RFC, that looks good

Dan: they need to define their privacy and security considerations better

Yves: the question Hadley raised about incognito mode still stands. Waiting on those to be answered.

Hadley: and 'baggage' being a US idiom, haven't heard back on that either

Comment by @torgo Dec 9, 2021 (See Github)

In the Privacy & Security Section it states "As such, the baggage header can contain user-identifiable data. Systems MUST ensure that the baggage header does not leak beyond defined trust boundaries and they MUST ensure that the channel that is used to transport potentially user-identifiable data is secured." But the term "trust boundaries" is not itself defined in the document. Also maybe you should reference secure contexts directly instead of just saying "secured" to be clear about what you mean by secured here?

We're also still waiting for a response on the issues raised by Hadley about incognito mode and naming.

Comment by @kalyanaj Feb 15, 2022 (See Github)

Regarding this comment: https://github.com/w3ctag/design-reviews/issues/650#issuecomment-969128971: we don't require any browser support for this specification, as it is implemented in JavaScript (e.g. OpenTelemetry JS SDKs). OpenTelemetry SDKs across multiple languages implement the Baggage specification.

Comment by @kalyanaj Feb 24, 2022 (See Github)

Hello TAG Review Team: Thanks for your review comments above, I want to consolidate the list of questions discussed above and the current status:

Naming: https://github.com/w3c/baggage/issues/83: Response tracked as part of this issue.
Private Browsing mode: https://github.com/w3c/baggage/issues/84: Response tracked as part of this issue.
Multi-stakeholder support: Covered as part of this response at https://github.com/w3ctag/design-reviews/issues/650#issuecomment-1040777563
Structured Headers: Covered as part of this response at https://github.com/w3ctag/design-reviews/issues/650#issuecomment-909543300
Reference to the right RFC (https://github.com/w3ctag/design-reviews/issues/650#issuecomment-969290866): We have addressed this feedback as part of https://github.com/w3c/baggage/pull/79.

Please let us know your thoughts on the above and if you have any additional comments/ questions.

Discussed Feb 28, 2022 (See Github)

Reviewed status, punted to F2F.

Comment by @torgo Mar 23, 2022 (See Github)

Hi @kalyanaj - thank you so much for preparing that summary addressing all the issues that were raised. We're just coming back to this today at our face-to-face meeting. We're happy with the responses you've provided. We'd still like to see some additional justification re: the structured headers topic (documented in your spec). However we're broadly happy with this.