#619: Early design review: Sanitizer API


Opened Mar 23, 2021

I'm requesting an early design review of the Sanitizer API

Provide a browser-maintained "ever-green", safe, and easy-to-use library for user input sanitization as part of the general web platform.

Further details:

We'd prefer the TAG provide feedback as:

Discussions

2021-03-29

Minutes

Lea: I looked yesterday. First off, there are lots of use cases; it's work that should be happening. Sensible defaults. Even though they have a detailed dictionary, there are some things that can't be expressed through it. For example, there are allow lists and block lists, but what about javascript: links on <a> elements? It's strange to take a string as input and return a document, but I can see the reasoning. It's unclear how someone can remove a blocked element from the default list, and it's not clear how you can drop one element from the allow list without specifying the entire allow list again. I also noticed that the config is not exposed on the sanitizer instance. Exposing it could help with customizing the default config.
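[Editorial sketch of the allow-list concern raised above. If the default config were readable, dropping one element could be a small derivation instead of a full restatement. Everything here is an assumption for illustration: the `AllowListSketch` shape, the `allowedElements` field, the short placeholder default list, and the `withoutElements` helper are hypothetical and not taken from the Sanitizer API spec.]

```typescript
// Hypothetical config shape; the field name is an assumption, not the spec's.
interface AllowListSketch {
  allowedElements: string[];
}

// Placeholder standing in for a browser-provided default; a real list would be far longer.
const defaultAllowListSketch: AllowListSketch = {
  allowedElements: ["a", "b", "em", "img", "p", "strong"],
};

// Return a copy of `base` with the named elements removed from its allow list.
// The input config is left unmodified.
function withoutElements(base: AllowListSketch, remove: string[]): AllowListSketch {
  const dropped = remove.map((name) => name.toLowerCase());
  return {
    allowedElements: base.allowedElements.filter(
      (name) => dropped.indexOf(name.toLowerCase()) === -1
    ),
  };
}

// Usage: keep the whole default except <img>.
const noImagesSketch = withoutElements(defaultAllowListSketch, ["img"]);
```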

Hadley: why would you want to do this?

Lea: You want users to be able to write arbitrary HTML, but this opens up XSS security issues. There is a widely used library called DOMPurify that they are basing their work on. It would be good to have something like this in the browser.

[we can come back to it in the plenary]

2021-05-Arakeen

Minutes

Lea: we've given feedback, they've made some changes and explained why they can't address others. I think we're done here. Tess?

Also, sanitize to string vs sanitize: they have two methods, and they've received criticism for that. We pointed out that a document fragment serialization algorithm doesn't belong in this discussion. So I opened an issue in WHATWG. I think it's stalled, but there wasn't pushback against it.

Hadley: Is there a danger if we bake the algorithm into the spec? With crypto, we've said "expect algorithms to evolve faster than the spec."

Tess: Yes and no. The editors of the HTML spec will have to be vigilant and keep this up to date, and browser engineers will have to be vigilant and implement those changes. But that's better than the status quo. Sanitisation is done by JS libraries, which may or may not be maintained, on sites that may or may not be maintained. So this is probably a win over the status quo.

Lea: Agreed. In most cases, it would be tweaking the default config.

Tess: Right. Suppose some browser implements a new element before others, or suppose a browser engine has implementations of non-standard elements that aren't exposed to the web but are exposed to other clients. Ex: WebKit has <attachment>, which is not exposed to the web. It's used for mail clients, to signal the web view.

In WebKit, I would expect the default block list for the sanitiser to include <attachment> when that element is enabled.

Lea: Does their spec allow for default config?

Tess: It's not a browser. The mail client would have this turned on (add attachment to the block list), and would be a great client of this sanitizer API. If you think about parsing a mail message itself, you're parsing the hierarchy of MIME parts. Each step of THAT is untrusted. People try to do buffer overruns in mail clients by having weird names for attachments.

This seems fine.

I agree that sanitising to a string seems misplaced. If we want a way to serialise a document fragment, we should just have one. And you'd just sanitise the string by calling this.

Lea: Re UAs supporting proprietary elements and needing to add to config... could this apply to browsers? Would it be helpful to allow UAs to extend this default config?

Tess: I think UAs should be allowed to extend the default config mostly because... I think we want the default config to be safe. And authors to have to opt in to less safety. But you could imagine a browser like Brave might want one even stricter than default.
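[Editorial sketch of the "UAs may extend the default, but only towards more safety" idea above, using the <attachment> case mentioned earlier in these minutes. The `BlockListSketch` shape, the `blockedElements` field, the `extendBlockList` helper, and the two-entry placeholder default are all hypothetical; "block list" is used in the loose sense of the discussion, not as a claim about the spec's config fields.]

```typescript
// Hypothetical config shape; not the Sanitizer API's actual dictionary.
interface BlockListSketch {
  blockedElements: string[];
}

// Merge UA-specific entries into a spec-level default block list.
// Entries are only ever added, never removed, so the result is at least as strict.
function extendBlockList(specDefault: BlockListSketch, uaExtra: string[]): BlockListSketch {
  const merged = specDefault.blockedElements.map((name) => name.toLowerCase());
  uaExtra.forEach((name) => {
    const lower = name.toLowerCase();
    if (merged.indexOf(lower) === -1) {
      merged.push(lower);
    }
  });
  return { blockedElements: merged };
}

// Placeholder default; a real default would come from the spec, not from this list.
const specDefaultSketch: BlockListSketch = { blockedElements: ["script", "iframe"] };

// An engine that has a non-standard <attachment> element enabled adds it to the default.
const engineDefaultSketch = extendBlockList(specDefaultSketch, ["attachment"]);
```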

Lea: Or the user to customise it.

Tess: sure.

Browsers are free to violate specs, and if they do in practice we can update the spec to match that reality.

We shouldn't add it unless we need it.

Hadley: What is the definition of funky?

Tess: I'm proud that they have a definition of handleFunkyElement. Hyperlinks with JS URLs, form elements whose action attribute has a JS URL, and input/button elements that have a formaction attribute with a JS URL.
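[Editorial sketch of the three cases Tess lists, written as a pure check over a tag name and its attributes. It is not the spec's handleFunkyElement algorithm: `isJavaScriptUrlSketch`, `isFunkySketch`, and the `AttributeMap` type are hypothetical names, and the scheme test only approximates URL parsing by stripping leading control characters and spaces and removing tabs and newlines before comparing.]

```typescript
// Attribute names are assumed to be lowercase already.
type AttributeMap = { [name: string]: string };

// Rough test for a javascript: URL. Approximates URL parsing: leading C0 controls
// and spaces are stripped, and ASCII tabs/newlines are removed anywhere.
function isJavaScriptUrlSketch(value: string): boolean {
  const normalized = value
    .replace(/^[\u0000-\u0020]+/, "")
    .replace(/[\t\n\r]/g, "")
    .toLowerCase();
  return normalized.indexOf("javascript:") === 0;
}

// True when the named attribute is present and holds a javascript: URL.
function hasJsUrl(attrs: AttributeMap, name: string): boolean {
  const value = attrs[name];
  return value !== undefined && isJavaScriptUrlSketch(value);
}

// The three cases from the minutes: <a href>, <form action>,
// and <input>/<button> with formaction.
function isFunkySketch(tagName: string, attrs: AttributeMap): boolean {
  const tag = tagName.toLowerCase();
  if (tag === "a") return hasJsUrl(attrs, "href");
  if (tag === "form") return hasJsUrl(attrs, "action");
  if (tag === "input" || tag === "button") return hasJsUrl(attrs, "formaction");
  return false;
}
```

Note that all three cases are decided by an attribute value, which is the point Lea raises below about the name talking about elements.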

Hadley: 'Funky' is not a very helpful descriptor; it may not be useful for people outside the group.

Lea: I don't have a problem with 'funky'. My main issue is that it talks about elements but 75% of it is attributes.

Tess: moving this functionality into the browser is much better, because the browser is more likely to be aware of new elements in HTML. We added the template element not long ago, and there are lots of sanitiser libraries that don't handle it. I'm inclined to say thanks for doing this important work, let us know how it goes.

Lea: I'll comment and propose closing.