Design Review #206

#206: `Accept-CH` header is weird

Opened Oct 17, 2017

Following on from #190, the Accept-CH header seems strange. We're trying to understand its motivation.

The latest draft spec notes cache-friendliness as a reason for the header. This motivation isn't entirely clear; aren't servers allowed to specify which headers to Vary on?

Other reasons we have heard include header bloat, but this hasn't been quantified for us. I.e., if a large fraction of high-traffic sites opt-in via Accept-CH, it seems probable that Accept-CH + DPR, Width, etc. data would be larger than simply sending all values all the time.

Lastly, it doesn't seem that there's a privacy motivator, given that this is a server opt-in.

This design complicates some scenarios and makes it harder to design new APIs. If header bloat is a major issue for the platform, the TAG would also like to be aware so as to help avoid designs that would encourage the addition of new headers. As yet, this case hasn't been clearly articulated with data we've seen.

Please help, @mnot, @triblondon, @igrigorik!

Discussions

Comment by @triblondon Oct 18, 2017 (See Github)

The implication in your issue description is that Accept-CH is a request header. It's a response header, so just want to check you're aware of that.

So the alternative is that browsers send this data in all requests all the time? Since the data is relatively small and there's no user-opt-in, I see no obvious problem with that.

Comment by @mnot Oct 19, 2017 (See Github)

Yeah, I can see how that would confuse people; perhaps Allow-CH (to align with the response nature of Allow)?

Comment by @igrigorik Oct 27, 2017 (See Github)

So the alternative is that browsers send this data in all requests all the time? Since the data is relatively small and there's no user-opt-in, I see no obvious problem with that.

Loud and clear feedback from implementers to date is that they are not willing to do this. Exposing "unproven" headers on every request is a very high bar and it doesn't scale, and the goal of CH is to establish a framework that can scale to (many) dozens of different hints.

As a result, we need an origin opt-in mechanism, which is what Accept-CH defines: it's a response header that advertises which hints the origin supports, or is interested in. Also, as noted in #190, there is a companion Accept-CH-Lifetime which persists this preference for specified time, allowing these hints to be delivered on subsequent navigations.

perhaps Allow-CH (to align with the response nature of Allow)?

I'm not sure that Allow makes that any clearer to the uninitiated, and we do have Accept-CH being used in the wild already, so I'd be very hesitant about making this change unless it's a clear and unambiguous win.

Comment by @triblondon Oct 28, 2017 (See Github)

My experience is that developers find the whole Accept-* and Vary system to be the most confusing part of HTTP, and adding a new response header with a similar name to those which have until now only been used in the request (to my knowledge!) adds to that confusion, imho.

I’m also generally having an ick reaction to the idea of a response including metadata that is nothing to do with the resource but is intended to negotiate the client’s behaviour in future requests. This seems like something that should happen prior to the request, like the way we upgrade to H2, for example.

This concern would be mitigated by doing this opt in via origin policy, so I wonder if that could offer a better solution.

I don’t know if any of that qualifies as an unambiguous win.

Comment by @igrigorik Oct 28, 2017 (See Github)

I’m also generally having an ick reaction to the idea of a response including metadata that is nothing to do with the resource but is intended to negotiate the client’s behaviour in future requests. This seems like something that should happen prior to the request, like the way we upgrade to H2, for example.

Shoving negotiation into the TLS handshake is neither simple nor easy, nor does it make any of this more understandable to developers. Not to mention it would be a layer and functional mismatch: we're not negotiating connection-wide properties, we're defining caching policies against a specific resource or origin, which is precisely what HTTP req-resp negotiation is for.

This concern would be mitigated by doing this opt in via origin policy, so I wonder if that could offer a better solution.

Origin Policy doesn't directly address any of this. It still relies on header-based delivery mechanism — which is what we're defining here — and optionally moves delivery of such headers into an origin wide policy file. As such, you still need to define all the headers we're talking about here.

Comment by @mnot Oct 29, 2017 (See Github)

@igrigorik Origin Policy has moved past that, AIUI; it's no longer header-based. That said, I think a more practical argument against OP is that it's still a ways off (unless it starts moving much more quickly).

WRT ick reaction - we have a fair amount of metadata with a variety of scopes already defined in headers. My only ick reaction is when that scope is ill-defined, because it causes confusion and interop problems. OP would be nice if it were both ready and well-suited to this problem.

/cc @mikewest

Comment by @igrigorik Oct 29, 2017 (See Github)

@mnot how so? Is the current draft out of date? Looking at the example the .well-known policy file specifies origin scoped response headers, which is what we're discussing here. No?

I'm all for OP, but it doesn't directly address the problem Accept-CH solves.

Comment by @mnot Oct 29, 2017 (See Github)

OP is one of our... slower-moving specs. See: https://github.com/WICG/origin-policy/issues/19#issuecomment-321229817

Comment by @igrigorik Oct 31, 2017 (See Github)

@mnot ah, neat-o! I like that direction.

I see some activity on the Chrome prototype @ https://bugs.chromium.org/p/chromium/issues/detail?id=751996, but it's not clear to me what the roadmap looks like from here and what hidden risks are yet to be discovered. For example, implications of blocking response processing while policy is fetched and applied (design sketch).

I'd like to suggest that we should pursue these efforts in parallel. We have lots of folks and use cases waiting to be unblocked via Accept-CH + Accept-CH-Lifetime, and if down the road we can enable alternative opt-in functionality via OP.. great!

Comment by @ylafon Nov 2, 2017 (See Github)

Note that the metadata will not be sent on the first hit (i.e., prior to receiving Accept-CH from the server). #190 could benefit from Variants to let the client decide (without even sending metadata to the server in this case).

Comment by @igrigorik Nov 2, 2017 (See Github)

@ylafon correct, a request to an origin that the UA has not visited before would not have any hints. A subsequent visit can, if the origin provides Accept-CH and Accept-CH-Lifetime.

We've been working with @mnot on Variants / CH interop, and I'm happy to replace Key guidance in the spec at a later point, once Variants is stable.

Comment by @torgo Dec 12, 2017 (See Github)

Discussed on today's teleconference. Some concerns persist about the design.

Comment by @slightlyoff Dec 12, 2017 (See Github)

Hey @igrigorik,

We discussed again at today's TAG call, and I remain confused (perhaps you can join a future call?).

There's a deep principle I'd like some clarity on: are we "too bloated" today? It seems the primary value is in reducing request header size, but only on the sites which don't use the values subsequently. This is very concerning! It seems that if we're at (or beyond) some theoretical limit about the total amount we can add, that the TAG should be providing strong guidance to feature developers to avoid new header information and perhaps be working to trim existing bloat back. If that's a cause we need to be signed up to, we'd love to know.

Specifically, would love to understand if there's data that points to a general issue with header bloat and a calculus that describes how the Accept-* pattern helps, particularly given the extra round-trips that might be needed (e.g., for a redirect) to generate alternative body content for a top-level request.

Similarly, does the data suggest that we should be trying to consolidate Accept-* headers? If we're over our budget, would like to avoid opt-in-flag bloat.

Comment by @triblondon Dec 13, 2017 (See Github)

Also @igrigorik, sorry to harp on about it and you may have considered it a poor point but I don't think I saw a response to the suggestion that using Accept as a response header is very hard for developers to reason about. Fundamentally the request-response model of the web is built on the idea that each transaction is relatively stateless, and I'm uneasy about the recent trend to bend those rules with stuff like Clear-Site-Data or HSTS which apply to the entire origin or affect subsequent requests. Origin policy seems like the answer to this concern, and it seems at the very least a shame to ship Accept-CH in the meantime.

You're going to have a lot of developers wanting to use CH, because they're great and they address a lot of performance concerns that we tell people to worry about, but imho this isn't a low level primitive that we expect a small number of exceptionally clever people to build a user-friendly library on top of. So I think of the average developer and I'd like them to find CH simple to understand and use. I see Accept-CH as a bit of a curveball. But my only suggestion is to wait for origin policy, or to not require the opt-in in the first place, which brings us to Alex's points above.

Comment by @mnot Dec 13, 2017 (See Github)

@slightlyoff when you say Accept-* above, are you referring to request headers that fall into that pattern, or the specific semantics of Accept-CH?

Comment by @ylafon Jan 22, 2018 (See Github)

See also httpwg/http-extensions#372 for discussion about the privacy aspect of Accept-CH

Comment by @triblondon Jan 22, 2018 (See Github)

Reminded by @ylafon's post above that this came up again during the TAG call last week. We continued to struggle to understand the need for Accept-CH. @slightlyoff noted that if we are 'running out' of headers, then isn't that a really big problem that goes far beyond client hints? There are lots of new headers that have come into HTTP in the last few years; why aren't those opt-in too?

We noted that there's a danger of sites introducing a mandatory redirect for non-bot users just to capture the CH data before rendering a page. Some sites for example might have a very different experience when Save-Data is enabled (see CNN vs CNN lite).

Personally my primary concern remains that the syntax and nomenclature is confusing to developers who are used to Accept-* being a request header concept, not a response header.

Comment by @igrigorik Jan 29, 2018 (See Github)

There's a deep principle I'd like some clarity on: are we "too bloated" today? It seems the primary value is in reducing request header size, but only on the sites which don't use the values subsequently. This is very concerning! It seems that if we're at (or beyond) some theoretical limit about the total amount we can add, that the TAG should be providing strong guidance to feature developers to avoid new header information and perhaps be working to trim existing bloat back. If that's a cause we need to be signed up to, we'd love to know.

Every new TCP connection starts with a small send+receive window — see flow control & congestion control. As a result, if the client overflows the initial window, it'll incur extra roundtrips before the request can be read and processed by the server. Historically, this value was only 3 packets, and RFC 6928 bumped it to 10 packets. I don't have recent telemetry on adoption of IW10, but let's assume that most clients and servers are using this new limit. Even so, ~14KB does not offer infinite room for growth, especially once you account for large cookies, long URLs, etc.
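A rough back-of-the-envelope version of the figures above. This is a sketch only: it assumes a 1460-byte MSS (a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers) and a window that doubles every round trip, and it ignores loss, TLS records, and HTTP/2 framing. The function names are made up for illustration.

```python
# Estimate how many request bytes fit in the TCP initial congestion window
# before the sender has to wait for an ACK.
# Assumption: MSS of 1460 bytes (1500-byte MTU, IPv4 + TCP headers, no options).
MSS_BYTES = 1460


def initial_window_bytes(segments: int, mss: int = MSS_BYTES) -> int:
    """Bytes that can be sent in a first flight of `segments` packets."""
    return segments * mss


def extra_round_trips(request_bytes: int, segments: int, mss: int = MSS_BYTES) -> int:
    """Extra round trips needed if the request overflows the initial window.

    Simplification: the window doubles on every round trip (slow start).
    """
    window = initial_window_bytes(segments, mss)
    sent, rtts = window, 0
    while sent < request_bytes:
        window *= 2
        sent += window
        rtts += 1
    return rtts


print(initial_window_bytes(3))        # 4380  (the older 3-packet window)
print(initial_window_bytes(10))       # 14600 (IW10, the "~14KB" above)
print(extra_round_trips(16_000, 10))  # 1
```

Under these assumptions, a 16 KB request (large cookies plus a long URL plus many default headers) costs one extra round trip even with IW10, while anything under 14,600 bytes fits in the first flight.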

As a result, the general guidance is to be conservative with adding headers that are emitted by default: omit if possible, use compact form, make them opt-in, etc. That's not a hard and fast rule against adding headers, but a good guiding principle that's true today and will stay true in the future.

As a side note: HTTP/2 helps a bit here with its default dictionary allowing us to compress common header fields, but the client still has to communicate the initial non-default values (e.g. all cookie values, etc) at the start of the connection, so overall calculus stays the same.

Specifically, would love to understand if there's data that points to a general issue with header bloat and a calculus that describes how the Accept-* pattern helps, particularly given the extra round-trips that might be needed (e.g., for a redirect) to generate alternative body content for a top-level request.

As outlined above, we don't have infinite room for growth if we want to avoid bumping against the initial window size and incurring extra roundtrips. On the other hand, the goal of Client Hints is to establish a framework that can allow us to communicate an easily extensible list of hints (well beyond what's in the spec today), some of which may only be useful to a select number of origins.

Hence the opt-in flow provided by Accept-CH. The server indicates to the client which hints it is interested in, alongside an optional preference for how long the client should remember this preference for, and the client can then emit the right hints for each origin without burdening the web at large with every hint we can think of ever.

Does the data suggest that we should be trying to consolidate Accept-* headers? If we're over our budget, would like to avoid opt-in-flag bloat.

No. Modifying default Accept header is something we've historically been very-very careful with, not for reasons of size but due to fear of triggering invalid matches in existing Accept parsers in the wild. A separate Accept-CH header provides clean separation from this problem at a cost of 9 extra bytes — bytes well spent.

Conversely, you also want to deliver hints as separate headers, instead of a single glob value, because the latter is not cache friendly and makes it very hard to craft Vary/Key policies.
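To illustrate the caching point, here is a minimal sketch of how a cache might derive a secondary key from Vary. The `cache_key` helper and the combined `CH` header name are hypothetical, and real caches also normalise header values.

```python
from typing import Mapping, Tuple


def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> Tuple:
    """Key = URL plus the value of each request header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(n.strip().lower() for n in vary.split(",") if n.strip())
    return (url, tuple((n, lowered.get(n, "")) for n in names))


# Two requests that differ only in Width, for a resource that varies on DPR.
a = {"DPR": "2.0", "Width": "320"}
b = {"DPR": "2.0", "Width": "640"}

# Separate headers: the response varies on DPR only, so both share one entry.
print(cache_key("/img.jpg", "DPR", a) == cache_key("/img.jpg", "DPR", b))  # True

# One combined header (hypothetical "CH"): any change to any hint changes
# the whole value, so the cache ends up storing two entries.
a_glob = {"CH": "dpr=2.0, width=320"}
b_glob = {"CH": "dpr=2.0, width=640"}
print(cache_key("/img.jpg", "CH", a_glob) == cache_key("/img.jpg", "CH", b_glob))  # False
```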

Fundamentally the request-response model of the web is built on the idea that each transaction is relatively stateless, and I'm uneasy about the recent trend to bend those rules with stuff like Clear-Site-Data or HSTS which apply to the entire origin or affect subsequent requests. Origin policy seems like the answer to this concern, and it seems at the very least a shame to ship Accept-CH in the meantime.

"Relatively stateless" is a nice myth: see cookies. I hear your concerns but the mechanism we're using here is using well-established HTTP semantics. If and when Origin Policy becomes available, we can certainly look into enabling CH through it for browsers, as an alternative deployment option. Don't forget that CH is not restricted to browsers only; it's also useful to other types of clients to whom OP does not apply, but who also benefit from a standard negotiation mechanism provided by Accept-CH. Not to mention, OP also introduces blocking roundtrips, which are highlighted below as a potential concern.

We noted that there's a danger of sites introducing a mandatory redirect for non-bot users just to capture the CH data before rendering a page. Some sites for example might have a very different experience when Save-Data is enabled (see CNN vs CNN lite).

That's bad implementation. CH is progressive enhancement: you can't assume that all browsers will provide all of the hints all the time — e.g. some browsers may not support some hints at all, some may opt-out from sending them when in incognito, and so on. In other words, think of CH in the same way as you think about ServiceWorker registrations. Is it possible that some sites might choose to wait for the SW registration and force a reload? Yes. But, once again, that's bad implementation. CH is subject to same considerations and best practices:

Server should check for presence of hints on the nav request.
1. If hints are present, because of previous opt-in: use them
2. Otherwise, emit Accept-CH + Accept-CH-Lifetime
  - If the browser supports delivery of requested hints, it will deliver them on subsequent requests to the origin (nav requests, subresource requests, etc); see (1)
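
The flow above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `handle_navigation`, `WANTED_HINTS`, and the one-day lifetime are made-up names and values, and the lifetime is assumed to be expressed in seconds.

```python
from typing import Dict, Mapping, Tuple

WANTED_HINTS = ("DPR", "Width")  # hints this hypothetical origin can use
LIFETIME_SECONDS = 86400         # arbitrary example: remember for one day


def handle_navigation(request_headers: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Return (body, response headers) for a navigation request."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    hints = {h: lowered[h.lower()] for h in WANTED_HINTS if h.lower() in lowered}

    # The body depends on these request headers either way, so caches must know.
    headers = {"Vary": ", ".join(WANTED_HINTS)}

    if hints:
        # 1. Hints are present because of a previous opt-in: use them.
        body = "page tuned for " + ", ".join(f"{k}={v}" for k, v in hints.items())
    else:
        # 2. No hints: serve a sensible default (no redirect) and opt in,
        #    so that later requests to this origin can carry the hints.
        body = "default page"
        headers["Accept-CH"] = ", ".join(WANTED_HINTS)
        headers["Accept-CH-Lifetime"] = str(LIFETIME_SECONDS)
    return body, headers


first = handle_navigation({"User-Agent": "example"})
later = handle_navigation({"User-Agent": "example", "DPR": "2.0"})
print(first)
print(later)
```

The first response is the unoptimised default plus the opt-in headers; the later one uses whatever hints the client chose to send.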

I believe this should address all the outstanding questions here. Let me know if I missed anything.

Comment by @triblondon Feb 2, 2018 (See Github)

Discussed at London F2F. TAG's group conclusion is that this header needs to be reconsidered. Our primary concern is the redirection that it incentivises. We believe many sites will choose to bounce non-bot users through a redirect in order to farm the CH data before rendering a first page. As the number of CH headers increases, this motivation will only get stronger. There is a strong precedent for web authors doing things that implementors think are bad ideas (eg. naive user agent header parsing).

This will leave Client-Hints in a position marginally better than serving a script to sample the data in JS and then setting it in a cookie and doing a redirect, but not by much. It doesn't achieve the goal of client hints.

It's worth noting that there are numerous examples of major sites doing redirects on a first request. And also that many site authors will go to considerable lengths to avoid an inconsistent experience between the first and second page views. Additionally, if subresource requests within a page get different CH headers to the page itself, the subresources run the risk of not being compatible with the page into which they are being loaded.

Comment by @tarunban Feb 2, 2018 (See Github)

I agree that by not including hints on the first request itself, the Accept-CH header incentivizes redirects which can regress performance. This is bad for both the user and the developer.

That's where the Accept-CH-Lifetime header helps. By using the Accept-CH-Lifetime, the origin can specify the list of hints that it's interested in, and a time duration. The user agent will then persist that per-origin list on durable storage. Next time any resource is fetched from the same origin, the user agent would include the specified hints in the request headers. This obviates the need for developers to add an extra redirect since once the origin provides the client hints preference, it is almost guaranteed (except in some privacy-preserving cases) that all subsequent resource requests from the same user (even across app restarts) would contain the specified hints.
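A minimal sketch of the user-agent side of this, assuming the lifetime value is a number of seconds. `HintPreferences` is a hypothetical in-memory stand-in for the durable per-origin store, and it ignores the case where Accept-CH arrives without a lifetime.

```python
import time
from typing import Dict, List, Mapping, Optional, Tuple


class HintPreferences:
    """Hypothetical per-origin store of opted-in hints with an expiry time."""

    def __init__(self) -> None:
        self._prefs: Dict[str, Tuple[List[str], float]] = {}

    def record(self, origin: str, response_headers: Mapping[str, str],
               now: Optional[float] = None) -> None:
        """Persist the origin's preference if both headers are present and valid."""
        now = time.time() if now is None else now
        lowered = {name.lower(): value for name, value in response_headers.items()}
        accept = lowered.get("accept-ch")
        lifetime = lowered.get("accept-ch-lifetime")
        if not accept or not lifetime:
            return
        try:
            seconds = int(lifetime)
        except ValueError:
            return  # malformed lifetime: do not persist anything
        hints = [h.strip() for h in accept.split(",") if h.strip()]
        self._prefs[origin] = (hints, now + seconds)

    def hints_for(self, origin: str, now: Optional[float] = None) -> List[str]:
        """Hints to attach to a request to `origin`, or [] if none or expired."""
        now = time.time() if now is None else now
        entry = self._prefs.get(origin)
        if entry is None:
            return []
        hints, expires_at = entry
        if now >= expires_at:
            del self._prefs[origin]
            return []
        return hints

    def clear(self) -> None:
        """Drop all preferences, e.g. when the user clears site data."""
        self._prefs.clear()


store = HintPreferences()
store.record("https://example.com",
             {"Accept-CH": "DPR, Width", "Accept-CH-Lifetime": "86400"}, now=0)
print(store.hints_for("https://example.com", now=100))     # ['DPR', 'Width']
print(store.hints_for("https://example.com", now=86400))   # []
```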

Comment by @igrigorik Feb 2, 2018 (See Github)

I feel like we're going in circles here, and it's frustrating..

Discussed at London F2F. TAG's group conclusion is that this header needs to be reconsidered. Our primary concern is the redirection that it incentivises. We believe many sites will choose to bounce non-bot users through a redirect in order to farm the CH data before rendering a first page. As the number of CH headers increases, this motivation will only get stronger. There is a strong precedent for web authors doing things that implementors think are bad ideas (eg. naive user agent header parsing).

As stated before, CH is progressive enhancement. Your position is premised on the fact that every hint must be available on every request, from every browser. This will never be true for reasons stated earlier, nor does it account for the very real concerns of overflowing window sizes.

Further, as explained earlier, the combination of Accept-CH + Accept-CH-Lifetime gets us to exactly the same place you're looking for, but addresses the above. Delivery of every hint on every request, without an opt-in, is a non-starter for implementers. I've pushed this rock up this particular hill for 5+ years now, so I can say this with certainty.

Last but not least, we've had CH in production (without delivery on initial request) for a few years now and yes, we heard loud and clear feedback that sites want it on initial request. However, nothing stopped implementers from using the redirect pattern you've described.. and yet, I'm not aware of a single instance of this. On the other hand, we have strong feedback that Accept-CH + Accept-CH-Lifetime is a good solution that would be used once available.

It's worth noting that there are numerous examples of major sites doing redirects on a first request.

Yes, redirects are a thing, and? Are you suggesting that web should get rid of redirects? If so, that's a separate TAG discussion.

And also that many site authors will go to considerable lengths to avoid an inconsistent experience between the first and second page views.

Progressive enhancement is a thing — e.g. ServiceWorker. Does the TAG believe that progressive enhancement is a pattern that we should discourage on the web? How did we rationalize SW?

if subresource requests within a page get different CH headers to the page itself, the subresources run the risk of not being compatible with the page into which they are being loaded.

How does this relate to any of the above, and what makes you believe that subresource requests would receive different hints? The opt-in is origin wide.

Just to be clear, I understand and hear the raised concerns here. The challenge is that the solution space here has tradeoffs on all sides, and I strongly believe — based on past 5+ years of work on this — that given the feedback from browser/client+server+site implementers we've arrived at a good solution that satisfies all the parties. Would everything be simpler if we could just send every hint on every request — of course, that's where we started! But for all the reasons stated above, that's not a tenable solution.

If the TAG has an alternative proposal that can satisfy all of the issues raised above, I'm happy to hear it. In absence of one, "reconsider" is not useful feedback.

Comment by @mnot Feb 2, 2018 (See Github)

I'd note that this is a HTTP WG draft, and if individuals on the TAG want to give feedback on it, the appropriate places are outlined in our contribution policy. Folks over there probably aren't seeing this discussion.

Comment by @mnot Feb 21, 2018 (See Github)

To elaborate on one of Ilya's points, if I may:

Progressive enhancement is a thing — e.g. ServiceWorker. Does the TAG believe that progressive enhancement is a pattern that we should discourage on the web? How did we rationalize SW?

To wit, CH is an optimisation; when the headers aren't in the initial request, the server can choose to degrade the image quality or use a bit more bandwidth, depending on its goals. From the end user standpoint, the page still loads -- just not quite as optimally (in some dimension).

So, I don't understand the seeming conclusion above that redirection is inevitable. The core use case IMO for CH is allowing intermediaries -- servers, reverse proxies, server-side software that isn't deeply integrated into the content -- to serve the optimal response without modifying content. People developing those products tend to understand the perf penalty of an extra round trip; they have other, reasonable options (as per above), and don't have to redirect, because again, it's an optimisation.

Popping up a level - what solution is the TAG proposing to its concerns? Note that removing Accept-CH means that CH would be considered as "passive fingerprinting."

Discussed Mar 1, 2018 (See Github)

Peter: Google was going to go talk to some people...

Alex: I did. I now understand that the reason the hints are not being sent on the initial nav is that some folks who represent the net team are resistant to sending this data. (For reasons not privacy related.) These folks haven't seen the compelling argument yet... Concerns about size (how much can afford to send). Pretty extraordinarily that this will be resolved soon. Plan is to show effectiveness on sub-resources first, then use that data to convince the team to put it on the initial request. Politics. We should note in our review that the initial request should send the data and inquire who can take the feedback.

Peter: Andrew had some strong feedback--please sync with him first.

Comment by @mnot Apr 5, 2018 (See Github)

Right now, the draft defines the opt-in mechanism and requires its use with a SHOULD:

Implementers SHOULD support Client Hints opt-in mechanisms and MUST clear persisted opt-in preferences when site data, browsing history, browsing cache, or similar, are cleared.

Otherwise, it's optional (and this is reinforced in several places).

I think that the minimal change that the TAG wants is to remove/reduce that SHOULD, correct? Or is it preferable to remove the opt-in mechanism completely? What about the Feature Policy integration?

For background, this issue covers some of the privacy discussion.

Comment by @igrigorik Apr 5, 2018 (See Github)

/me nudges @slightlyoff to summarize discussions+resolutions on this. :-)

Comment by @slightlyoff Apr 7, 2018 (See Github)

Hey all,

Thanks for all of the thoughtful discussion.

After talking at some length with both @igrigorik and @mnot, the TAG wanted to close this issue. We're happy to see the Accept-CH system go forward with the following caveats:

Developers clearly want Client-Hints to be sent for top-level navigations, and there are clear use cases for it. We should acknowledge that this is a goal of Client Hints.
It is punitive to introduce the need for a redirect to negotiate alternative top-level content for latency- and data-sensitive users and the sites that serve them. This potentially undermines the utility of URLs for these sites and introduces even more latency. This is only partially addressed by the addition of the Accept-CH-Lifetime header, at the cost of high complexity for both developers and implementers.
It seems odd to need to send both Accept-CH and Accept-CH-Lifetime. Can these be combined? It also further seems odd that the choice of terminology is inconsistent with other lifetime-defining header directives (e.g., Cache-Control, Expect-CT, etc.) which all use max-age sub-values.
It's also incongruous that the header is a response value but is named "Accept-". No other header we're aware of uses this nomenclature in responses (vs requests).
We remain unconvinced by privacy arguments regarding opt-in-free Client-Hints. Perhaps for environments that aren't the web, this would be a meaningful distinction, but for HTML, the reality of CSS media queries, DOM APIs, and other side-channels for determining this information at runtime ensure that there's no net privacy benefit (aside from requiring that fingerprinting parties do so actively and thus visibly, a situation that can be re-created by removing support for Client-Hints). UAs that don't provide regular HTML processing could avoid sending specific hints. A unique case is reflection of data that's not trivially available otherwise (e.g. Save-Data), but given that this information is available via origin opt-in under Accept-CH. We therefore recommend that UAs revisit the decision to require the opt-in at their earliest convenience (again, acknowledging that non-web environments or restricted contexts like a "private browsing" mode may choose to set defaults very differently, up to and including disabling Client-Hints entirely).
Without the privacy argument, the argument for the Accept-CH protocol seems to hinge on an argument about header bloat. This is a question for data to resolve and we're looking forward to learning more about the current state of request header contribution to packet fragmentation and latency. If the TAG should be discouraging the addition of new data to the default set of headers, we'd like to understand the case in detail so we might advise feature developers appropriately (perhaps via our Design Principles, which we request feature designers consult).

We'd love to hear back from Accept-CH implementers regarding the impact of the feature when deployed and how they weigh on the various questions we've raised here. As a step towards a future where this information is available to developers more cheaply, Accept-CH is a reasonable short-term compromise. We'd like to avoid it being a long-term end-state.

Comment by @reschke Apr 12, 2018 (See Github)

@slightlyoff - https://greenbytes.de/tech/webdav/rfc7694.html#accept-encoding makes "Accept-Encoding" a response header field. Furthermore there are https://greenbytes.de/tech/webdav/rfc5789.html#accept-patch and https://www.w3.org/TR/ldp/#header-accept-post