#832: TAG review of the proposal to use the RegExp `v` flag instead of `u` for the HTML `pattern` attribute


Opened Apr 5, 2023

Hello, TAG!

I'm requesting a TAG review of the proposal to use the RegExp v flag instead of u for the HTML pattern attribute.

Summary

Using the new RegExp v flag instead of u would enable the use of set notation, string literal syntax, and Unicode properties of strings within pattern attribute values.

Explainer

This proposal makes the pattern attribute more powerful, enabling the use of set notation, string literal syntax, and Unicode properties of strings.

Differences from the previous u flag-based behavior:

  • [FEATURE] Previously invalid patterns now become valid, e.g.

    pattern="[\p{ASCII_Hex_Digit}--[Ff]]"
    pattern="\p{RGI_Emoji}"
    pattern="[_\q{a|bc|def}]"
    

    For more useful examples of the RegExp v flag, see the relevant feature explainer.

  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

    pattern="[(]"
    pattern="[)]"
    pattern="[[]"
    pattern="[{]"
    pattern="[}]"
    pattern="[/]"
    pattern="[-]"
    pattern="[|]"
    pattern="[&&]"
    pattern="[!!]"
    pattern="[##]"
    pattern="[$$]"
    pattern="[%%]"
    pattern="[**]"
    pattern="[++]"
    pattern="[,,]"
    pattern="[..]"
    pattern="[::]"
    pattern="[;;]"
    pattern="[<<]"
    pattern="[==]"
    pattern="[>>]"
    pattern="[??]"
    pattern="[@@]"
    pattern="[``]"
    pattern="[~~]"
    pattern="[_^^]"
    

    Throwing patterns result in inputElement.validity.valid === true for any input value, so the only compatibility risk is that some value/pattern combinations that would previously result in inputElement.validity.valid === false now result in inputElement.validity.valid === true.

  • Other previously valid patterns still behave the same. (Apart from the above-mentioned features, the v flag differs in behavior from the u flag only with respect to case-insensitive matching, and the pattern attribute uses case-sensitive matching.)

Note that the breaking changes mostly apply to somewhat esoteric edge cases that can easily be avoided. In the worst case, this could cause previously invalid input to now be considered valid (since throwing patterns result in inputElement.validity.valid === true for any input value, as if the pattern attribute weren’t there). In other words, the only Web Compat risk is that a website without server-side validation would suddenly allow submission of values that the client-side pattern previously prevented. All inputs that are accepted today would still be accepted.

IMHO, making the change is worth it given the powerful new functionality it brings and the relatively small compatibility risk. This is reminiscent of the discussion in https://github.com/whatwg/html/issues/439 (but in a different direction).

For context, here are a few pointers to when we decided to implicitly enable the u flag for the pattern attribute in the first place:

Checklist

Further details

  • I have reviewed the TAG's Web Platform Design Principles
  • Relevant time constraints or deadlines: N/A
  • The group where the work on this specification is currently being done: Spec-wise this is a minor change to the existing HTML pattern attribute. The work is in the form of a PR to the WHATWG HTML Standard.
  • Major unresolved issues with or opposition to this specification: So far, all stakeholders seem to be in favor of the proposal. The one open question is whether this change is Web compatible. We’ve implemented a Chrome UseCounter that gives us an upper bound on potential compat issues. None of the UseCounter hits so far (see the analysis starting with comment #11 on the crbug) constitute an issue in practice.
  • This work is being funded by: Google GmbH

We'd prefer the TAG provide feedback as:

💬 leave review feedback as a comment in this issue and @-notify mathiasbynens

Discussions

Discussed Apr 1, 2023

Some pause with respect to backwards compatibility, but after looking into what the v flag can do, we think the tradeoffs are well worth it: very little breakage, in very few cases, and little harm even in those cases, since client-side validation is only a hint anyway.

Propose closing, and plan to close in Plenary.

Comment by @LeaVerou Apr 5, 2023

Hey, could you folks please add an explainer that follows the guidelines in https://tag.w3.org/explainers/ ? If anything, I found this more informative than the explainer actually linked, so please make sure to include this information.

Comment by @LeaVerou Apr 5, 2023

From the same link:

<blockquote>

  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

</blockquote>

Do you have any sense of how common these patterns are in the wild?

Comment by @mathiasbynens Apr 5, 2023

<blockquote>

Hey, could you folks please add an explainer that follows the guidelines in https://tag.w3.org/explainers/ ? If anything, I found this more informative than the explainer actually linked, so please make sure to include this information.

</blockquote>

Updated. PTAL

<blockquote>

From the same link:

<blockquote>

  • [BREAKING CHANGE] Some previously valid patterns are now errors, specifically those with a character class including either an unescaped special character ( ) [ ] { } / - \ | or a double punctuator:

</blockquote>

Do you have any sense of how common these patterns are in the wild?

</blockquote>

So far, the only examples we’ve seen are cases where an unescaped -, {, |, or } occurs within a character class. The hyphen (-) is the most common (although our sample size is small). Interestingly, all of the cases involve the “username” field of a login form, where the username is really an email address; instead of using type=email, the authors chose to use a pattern. See the link under “Major unresolved issues with or opposition to this specification” for more details on each case.

Comment by @LeaVerou Apr 19, 2023

Hi @mathiasbynens,

@ylafon and I looked at this during a breakout today. While we were slightly concerned initially about the backwards compat implications, we do think the benefits v brings far outweigh these concerns, and we are happy to see this go forwards.

Thank you for flying TAG!