#201: ads.txt

Opened Sep 26, 2017

Hello TAG!

I'm requesting a TAG review of:

Further details (optional):

  • This issue has been opened up by the TAG

Discussions

Comment by @torgo Sep 26, 2017 (See Github)

Some of the issues we have identified in our discussion at our Nice f2f:

  • This spec defines a well known, hard coded URL. There is now a standard for placing these paths within a .well-known prefix, see https://tools.ietf.org/html/rfc5785
  • The spec does not define the format using a formal syntax grammar, e.g. ABNF, making it very hard to understand what would be valid examples of this format. For example, there is no specification for which whitespace characters are acceptable as separators. For examples of good grammar specifications, see https://www.w3.org/TR/tabular-data-model/
  • The spec requires that the ads.txt file is published on a 'root domain'. There is no technical definition of 'root domain' in web architecture, and sites with authority and control over an origin may reasonably not have control over the parent origin.
  • It appears possible that this document is allowing for parseable content to follow on from a comment on the same line as the comment text. This would be so unusual that we suspect that this is not actually the intent of the authors.
  • The document specifies that ads.txt should be available on HTTP and HTTPS. This is enormously concerning, especially since some sites are moving away from listening for HTTP traffic at all, and requiring the use of HTTP for any web specification should be considered contrary to the very principles of good web architecture and detrimental to the future development of the web. See the TAG finding on securing the web.
  • The document contains a normative reference to w3schools regarding URL encoding, which is a site generally regarded as a poor source of information about the web, and certainly not a primary source on any subject. On this point, https://tools.ietf.org/html/rfc3986 would be the correct normative reference.
  • Google has a system called App links and we are wondering why a mechanism like that is not appropriate for this use case.
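
To illustrate the grammar concern in the list above, here is a minimal Python sketch of the kind of detail a formal grammar would pin down. The field layout (domain, account ID, relationship, optional authority ID), the `DATA_RECORD` pattern, the case-sensitive keywords, and the choice of space and tab as the only separator whitespace are all assumptions made for illustration; none of them is taken from the ads.txt document.

```python
import re

# Assumption: only space and tab count as whitespace around fields.
WS = r"[ \t]*"
# Hypothetical: a field contains no comma, no '#', and no whitespace of any kind.
FIELD = r"[^,#\s]+"

# Hypothetical data record: domain, account ID, relationship[, authority ID],
# optionally followed by a '#' comment that runs to the end of the line.
DATA_RECORD = re.compile(
    rf"^{WS}({FIELD}){WS},{WS}({FIELD}){WS},{WS}(DIRECT|RESELLER){WS}"
    rf"(?:,{WS}({FIELD}){WS})?(?:#.*)?$"
)


def parse_record(line: str):
    """Return the fields of a data record, or None if the line does not match."""
    match = DATA_RECORD.match(line)
    if match is None:
        return None
    # Drop the optional fourth field when it is absent.
    return tuple(field for field in match.groups() if field is not None)
```

With a pattern like this, a line that separates fields with a no-break space instead of a comma is rejected rather than silently guessed at, which is the sort of question the current text leaves open.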

We are happy to engage with the authors, and we appreciate the importance of the problem that this is trying to solve. Making this more compatible with web architecture would be appreciated and will help the authors get better buy in from the web community.

(most of the words in this comment by @triblondon)

Comment by @slightlyoff Sep 26, 2017 (See Github)

The author of the document responded to a private ping, noting there's an updated version of the document here.

The 1.0.1 update indicates that crawlers should follow redirects within the same CNAME entry (although the language is woolly regarding "root domain"); e.g. it allows redirects between https://example.com and http://example.com, enabling downgrade of connection security.
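
As a sketch of the downgrade concern, a crawler could refuse any redirect that weakens transport security. The function names `is_downgrade` and `may_follow`, and the "same host only" rule, are hypothetical choices for this example and are stricter than what the ads.txt document describes.

```python
from urllib.parse import urlsplit


def is_downgrade(source_url: str, target_url: str) -> bool:
    """True when a redirect moves from HTTPS to a non-HTTPS scheme."""
    source = urlsplit(source_url)
    target = urlsplit(target_url)
    return source.scheme == "https" and target.scheme != "https"


def may_follow(source_url: str, target_url: str) -> bool:
    """Hypothetical policy: same host only, and never weaken transport security."""
    same_host = urlsplit(source_url).hostname == urlsplit(target_url).hostname
    return same_host and not is_downgrade(source_url, target_url)
```

Under this policy a redirect from https://example.com/ads.txt to http://example.com/ads.txt is not followed, while the reverse (an upgrade to HTTPS) is.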

There appear to be additions for "SUBDOMAIN" which is a redirect type. It does not appear to be well-specified and it's unclear why redirects with an eTLD+1 policy aren't being used instead.

Comment by @tantek Oct 17, 2017 (See Github)

@torgo re: "On this point, https://tools.ietf.org/html/rfc3986 would be the correct normative reference." Why not https://url.spec.whatwg.org/ instead, which I believe more and more W3C RECs are citing? E.g. https://www.w3.org/TR/webmention/#normative-references and https://www.w3.org/TR/websub/#normative-references (the latter a PR, hopefully soon to be a REC).

Comment by @slightlyoff Feb 1, 2018 (See Github)

We've re-visited this at the London F2F meeting. Most of the issues remain. I'm pinging the authors via private mail.

Comment by @triblondon Feb 1, 2018 (See Github)

Up to date list of concerns, referencing the 1.0.1 version of the doc:

  • This spec defines a well known, hard coded URL. There is now a standard for placing these paths within a .well-known prefix, see https://tools.ietf.org/html/rfc5785
  • The spec does not define the format using a formal syntax grammar, e.g. ABNF, making it very hard to understand what would be valid examples of this format. For example, there is no specification for which whitespace characters are acceptable as separators. For examples of good grammar specifications, see https://www.w3.org/TR/tabular-data-model/
  • The spec requires that the ads.txt file is published on a 'root domain'. There is no technical definition of 'root domain' in web architecture, and sites with authority and control over an origin may reasonably not have control over the parent origin.
  • The document specifies that ads.txt should be available on "HTTP and/or HTTPS". This is enormously concerning, especially since some sites are moving away from listening for HTTP traffic at all, and suggesting the use of HTTP for any web specification should be considered contrary to the very principles of good web architecture and detrimental to the future development of the web. See the TAG finding on securing the web.
  • The document contains a normative reference to w3schools regarding URL encoding. W3Schools is a site which has been widely regarded as a poor source of information about the web, and certainly not a primary source on any subject. On this point, https://tools.ietf.org/html/rfc3986 or https://url.spec.whatwg.org/ would be the correct normative reference.
  • The doc indicates that crawlers should follow redirects within the same CNAME entry (although the language is woolly regarding "root domain"); e.g. it allows redirects between https://example.com and http://example.com, enabling downgrade of connection security.
  • There appear to be additions for "SUBDOMAIN" which is a redirect type. It does not appear to be well-specified and it's unclear why redirects with an eTLD+1 policy aren't being used instead.
  • Google has a system called App links and we are wondering why a mechanism like that is not appropriate for this use case.

Comment by @triblondon Feb 1, 2018 (See Github)

Alex and I have pinged IAB people and we'll follow up on a telcon

Discussed Mar 1, 2018 (See Github)

(Alex is having connectivity issues. To be revisited.)

Discussed Mar 1, 2018 (See Github)

Alex: No further interactions on this since we last talked about it. Punt again.

Peter: Face-to-face or later?

Alex: I'll ping, and update at face-to-face.

Comment by @slightlyoff Oct 30, 2018 (See Github)

Met with George several times in February, debriefed in Tokyo. Just pinged again to understand if they plan to publish a new version which will address our concerns.

Comment by @wseltzer Feb 5, 2019 (See Github)

A venue for further discussion could be the Improving Web Advertising BG which has active participation from IAB TechLab.

Comment by @torgo Mar 1, 2019 (See Github)

@wseltzer just following up on this. Does the Web Advertising BG hold regular calls? Can we potentially tee up this discussion point, and maybe members of the TAG could join for that session?

Comment by @wseltzer Mar 6, 2019 (See Github)

@torgo yes, the group meets every 2 weeks, with upcoming calls planned for March 14 and March 28.

Comment by @ylafon May 21, 2019 (See Github)

Note that the current version is https://iabtechlab.com/wp-content/uploads/2019/03/IAB-OpenRTB-Ads.txt-Public-Spec-1.0.2.pdf

Comment by @ylafon May 21, 2019 (See Github)

As of version 1.0.2, we notice that most comments have not yet been addressed, apart from a clarification in the redirect section. In that section, codes other than 302 are allowed, but 308 is missing from the updated list. Section 5.3 would greatly benefit from a clarification of the parsing model, whitespace definition, etc.

We are still concerned about the possible "downgrade redirect" issue, as the current specification still allows redirects from https to http. In general, the specification should mandate the use of https only (and MAY fall back to http if https is not available, with the trust issues associated with its use).
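
One way to read this recommendation is as an ordered fetch plan: HTTPS first, with HTTP only as an explicit opt-in whose result is marked as unauthenticated. The `Attempt` type, the `fetch_plan` function, and the `allow_http_fallback` flag below are hypothetical names for this sketch, not anything defined by the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    url: str
    authenticated: bool  # False: fetched over plain HTTP, so it could have been altered in transit


def fetch_plan(host: str, allow_http_fallback: bool = False) -> list:
    """List the URLs a crawler would try, in order, for a given host."""
    plan = [Attempt(f"https://{host}/ads.txt", authenticated=True)]
    if allow_http_fallback:
        # Fallback is opt-in, and its result is flagged as unauthenticated.
        plan.append(Attempt(f"http://{host}/ads.txt", authenticated=False))
    return plan
```

Keeping the trust flag next to the URL lets downstream consumers decide how much weight to give a file that was only reachable over HTTP.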

Also, as the document defines a document format, it would be better for it to have a proper media type definition rather than using text/plain. At worst, using the generic text/csv would be better. Note that the RFC defining the text/csv media type also defines its grammar (see the comment on section 5.3): https://tools.ietf.org/html/rfc4180

Comment by @torgo May 21, 2019 (See Github)

@wseltzer we are thinking since we haven't made enough progress on this issue that it should be migrated over to the advertising BG. Would the BG be a good forum for discussing ads.txt and feeding back on its design? Let us know and maybe we can migrate the issue over this week.

Discussed Jan 1, 2020 (See Github)

Alice: This has been open a long time. I am extremely confused by it.

Kenneth: It's not really a W3C spec either.

Alice: I suggest we ask Dan in the plenary what he'd like to happen with this.

Yves: Agreed, I would ideally like to close it.

Kenneth: Is anybody using this today? Could we propose something else? If people are already using this, is there anything we can do?

Yves: The issue notes that Alex and Andrew have previously contacted the IAB, but neither of them is still involved with the TAG.

Kenneth: Might also be worth reaching out to Alex to see who he contacted within Google. If nobody is using it, it's not relevant. It would be nice to know who is using it and whether we have contacts to affect usage before we spend more time on it.

Discussed Jan 1, 2020 (See Github)

Alice: my thoughts: this has been open a long time. Kenneth raised - is this in use? The last 3 comments we haven't heard feedback on.

Yves: it would be good to have a group taking care of this. Considering they didn't take our feedback into their revised spec, we should probably drop it.

Dan: proposal is to close with "not satisfied" and make some specific recommendations? Those might be: do this as a real standards group... Isn't there an advertising community group?

Tess: there is a business group.

Dan: let's agree this in plenary

Comment by @cmlight Jan 29, 2020 (See Github)

Hi, ads.txt working group member here. Yes, it would be great to get these concerns addressed in the next ads.txt (and related specs) version update. Items I had previously written down that I'm hoping to make more technically precise include:

  • Character encoding: we see files published in various character encodings which may not be properly interpreted by all platforms. We should specify a character encoding such as UTF-8 for the file content so that validators can consistently flag issues
  • Byte-order mark headers: we see files that have non-visible byte order marks (https://en.wikipedia.org/wiki/Byte_order_mark) which can trip up parsing if not interpreted properly. We should include specifics in the spec about whether these are allowed or not
  • Line endings: the spec does not specify which byte sequences are considered line endings. We've encountered files encoded using atypical (or containing a mix of) line ending types which could trip up parsers. We should update the spec to include specifics of what byte sequences (0a; 0d0a; 0d; etc) are considered valid, parseable line endings.
  • Public suffix list specificity: the publicsuffix.org list contains two sections: an ICANN section and a private section. The ads.txt spec doesn't specify whether the private section is valid for use.
  • SUBDOMAIN= directive specificity and limitations: I'd like to make the spec provide more detail and examples about how SUBDOMAIN= directives behave and interact with each other, along with potentially defining a limit to the number of levels.
  • Security: I'd like to see if we can be more precise in the standard about how to treat HTTPS URLs, when it is permissible to fall back to HTTP, what validations the crawler should perform (e.g. SSL certificate validation), and the valid transport security protocols accepted. We should consider security risks that should be mitigated with precise rules.
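
The encoding, byte order mark, and line ending items above could be made testable with something like the following sketch. It assumes UTF-8 as the required encoding and treats CRLF, LF, and bare CR as valid line endings; both are assumptions for illustration, as are the names `LINE_ENDING`, `split_lines`, and `allow_bom`.

```python
import codecs
import re

# Assumption: CRLF (0d0a), LF (0a), and bare CR (0d) are all accepted line endings.
# CRLF is listed first so it is consumed as a single ending.
LINE_ENDING = re.compile(rb"\r\n|\n|\r")


def split_lines(raw: bytes, allow_bom: bool = True) -> list:
    """Decode an ads.txt body as UTF-8 and split it into lines.

    Raises ValueError for a disallowed BOM and UnicodeDecodeError for
    input that is not valid UTF-8.
    """
    if raw.startswith(codecs.BOM_UTF8):
        if not allow_bom:
            raise ValueError("UTF-8 byte order mark is not allowed")
        raw = raw[len(codecs.BOM_UTF8):]
    # Split on bytes first so every accepted ending is handled the same way;
    # this is safe because UTF-8 multi-byte sequences never contain 0x0a or 0x0d.
    return [chunk.decode("utf-8") for chunk in LINE_ENDING.split(raw)]
```

A file that ends with a line ending yields a final empty string, so a validator built on this would want to decide explicitly whether a trailing empty line is acceptable.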

I will work with @slightlyoff on this.

Stepping back from the specific recommendations in this thread, I was wondering if you have any pointers to documents that explain how to write a good spec, if such a thing exists? Also, I would like to somehow put together a compatibility testing suite that participants can use to confirm that their crawlers and parsers were implemented correctly. If you have any tips on this or examples of well-written solutions that do this, that would be great to learn from.

Comment by @torgo Mar 4, 2020 (See Github)

Hi @cmlight -

First of all, thanks for the visibility on some of the issues you are tackling. It seems like there is active work happening on a new spec. I think what needs to happen is that when a new spec is ready for review, someone files a new design review issue here with us.

Regarding how to write a good spec, we can provide feedback, but we are not really equipped to help write the spec itself, and some of the answer to that question is venue specific. One approach might be to bring this work to a venue where you might have greater opportunity to bring in expertise in spec development and expertise in related web technologies. For example, a W3C community group could be a good low-friction venue. In general, successful web specifications tend to be developed in an open environment and according to a transparent process.