#1095: Canvas Text Metrics for Editing, Art and Design

Visit on Github.

Opened May 18, 2025

Hello TAG!

I'm requesting a TAG review of Additional Canvas TextMetrics features.

Web content developers often request improvements to canvas text support to enable richer text rendering and interaction. For rendering text there is a desire to perform per-glyph styling and rendering, where a glyph is the smallest unit of text. For editing, developers need better support for selection and caret position. Broadly speaking, developers would like editing capabilities similar to DOM content, and rendering capabilities similar to SVG text.
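To make the per-glyph rendering use case concrete, here is a hedged sketch based on the shape proposed in the explainer and spec PR: `measureText()` returns a `TextMetrics` object with a `getTextClusters()` method, and clusters are drawn with `fillTextCluster()`. These names come from the in-progress proposal and may still change; `clusterColor` is a hypothetical helper for the example.

```javascript
// Sketch of per-glyph (per-cluster) coloring with the proposed
// Text Clusters API. getTextClusters() and fillTextCluster() are the
// names used in the current explainer/spec PR and may still change.

// Pure helper: pick a color for the i-th cluster from a small palette.
function clusterColor(i, palette = ["#d33", "#3a3", "#36c"]) {
  return palette[i % palette.length];
}

function drawRainbowText(ctx, text, x, y) {
  ctx.font = "32px serif";
  const metrics = ctx.measureText(text);
  // Each TextCluster is the smallest renderable run of glyphs and
  // carries start/end indices into the source string plus a position.
  const clusters = metrics.getTextClusters();
  clusters.forEach((cluster, i) => {
    ctx.fillStyle = clusterColor(i);
    // Render one cluster at its measured position, offset by (x, y).
    ctx.fillTextCluster(cluster, x, y);
  });
}
```

In a browser you would call `drawRainbowText(canvas.getContext("2d"), "Hello", 10, 40)`; the per-cluster loop is what enables styling below the granularity of a whole `fillText()` call.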

Further details:

You should also know that...

Our primary concern is getting TAG feedback on the Text Clusters feature for querying and rendering per-glyph text information.

Discussions

Discussed May 19, 2025 (See Github)

[discussion about when we might decline a review]

jeffrey: I can see accessibility concerns here because this makes it more possible to do things just in canvas, without the normal HTML accessibility features by default.

Discussed May 26, 2025 (See Github)

Jeffrey: Some concerns, and I have an accessibility concern to add

... they should have a good answer that this text stays exposed to assistive technology

... don't have a draft comment yet, Matthew and I should draft one for next week

Dan A: In terms of moving faster, why shouldn't you post it this week?

Jeffrey: Think we don't have enough to say yet

... are at the "ask a bunch of questions" stage

Dan A: From the perspective of showing progress, it would be fine to directly ask the questions

Jeffrey: Works for me.

Discussed Jun 2, 2025 (See Github)

Matthew: This one - I think we have the content of a comment... An example that probably isn't buggy... but looks like it's buggy. We think they should clarify. The other concern... this adds text to the canvas with no automatic fallback. AOM isn't a thing... and also isn't automatic. There's the clunky fallback of canvas fallback content. They didn't mention it in the explainer. We could ask them if they considered this. They probably have considered it. I think we were left with asking for clarification on the example, asking about fallback content, and asking about the actual API... Also, points from Jeffrey.

Jeffrey: I noticed another thing... the fact that they are not splitting the fi ligature is probably a bug. One of the use cases is where you can click to put your editing cursor, and you have to be able to click between the f and the i...

DanC: we were going to ask about i18n... it talks about "left to right"... so maybe they should run it past the i18n folks.

Hadley: add you to this?

DanC: sure (assigns self)

Matthew to draft comment and lightly review among issue assignees before posting. Not closing, will wait for responses

Discussed Jun 9, 2025 (See Github)

Matthew to draft, with Jeffrey's help.

Discussed Jun 16, 2025 (See Github)

Jeffrey: Probably ready to post; any objections?

Martin: Looked fine.

Comment by @alice Jun 18, 2025 (See Github)

3. We're concerned that this proposal doesn't talk about providing equivalents for the text that is placed on the canvas that would be exposed to assistive technologies. One option the platform offers is canvas fallback content, though it has its limitations. Have you considered how this barrier may be overcome, and/or what specific advice should be given to content authors?

(Commenting as a colleague of Stephen's who provided accessibility guidance during the course of this work.)

My understanding is that the current state of the art is (unfortunately) for canvas-based applications to implement self-voicing solutions based on aria-live regions. I believe Stephen is investigating the extent to which fallback content is even used in practice for accessibility-related purposes, but what I hear from various sources is that canvas fallback content isn't really fit for purpose for most production use cases.

I think this represents a serious gap in the platform, and I agree that adding APIs to support more sophisticated use cases for text in canvas does make the need for addressing this gap increasingly pressing. However, I'm not sure what guidance can usefully be given in the context of this API while that gap persists.
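For readers unfamiliar with the mechanism being discussed: "canvas fallback content" is the markup authors place between the `<canvas>` tags. It is not rendered visually when scripting is enabled, but it remains in the accessibility tree. A minimal illustration (element names and structure are just an example):

```html
<!-- Canvas fallback content: the elements inside <canvas> are not
     painted when the canvas is scripted, but they are exposed to
     assistive technologies. Keeping them in sync with what is actually
     drawn on the canvas is left entirely to the author, which is the
     limitation discussed above. -->
<canvas id="editor" width="600" height="200">
  <p>Document text: <span>Hello, world</span></p>
</canvas>
```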

Comment by @jyasskin Jun 18, 2025 (See Github)

Thanks for chiming in, @alice!

A wild thought, but what if these text rendering functions took references to elements in the fallback content instead of plain text strings? They'd probably need to be phrased as callbacks to handle when the text of the elements changes, and we could think of them as special cases for rendering what sighted people need, instead of making the text-only people be the special case. Doing that might also motivate non-accessibility developers to fix the problems with fallback content. And it could allow the full range of multilingual text in those elements instead of assuming the whole string is the canvas's language.

Comment by @schenney-chromium Jun 18, 2025 (See Github)

Ah, taking elements and rendering them in canvas. That's exactly what we're working on with HTML-in-Canvas in an effort to improve accessibility for content best represented by DOM. It raises its own set of a11y concerns, but in general it should make it easier for authors to match their canvas content with their fallback content as you suggest.

In this case we're building on existing canvas text features, and don't want to mess with the API too much due to existing web content.

My collaborator in this work, @fmalita, is looking into the functional questions you raised. I take the point about ligatures and writing modes.

For getIndexFromOffset() (Point 2), canvas has the notion of LTR and RTL but nothing beyond that. The assumption is that an author will use transforms to align text vertically. There is also the problem of mimicking sideways-* which IIRC needs to rotate individual characters in some situations, which I think you would need this proposal to do correctly. Regardless, to get the correct position to use for getIndexFromOffset I believe an author would need to transform the point into the coordinate system of the text string to correctly compute the offset along the string's rendered orientation.
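The "transform the point into the coordinate system of the text string" step can be sketched with plain 2D math. This is a hedged illustration, not the spec's algorithm: `toTextLocal` inverts a translate-then-rotate transform that an author is assumed to have applied before drawing, and `getIndexFromOffset()` is the proposed TextMetrics method whose exact signature may still change.

```javascript
// Hedged sketch: map a pointer position back into a text run's local
// coordinate system before calling the proposed getIndexFromOffset().
// (tx, ty, angle) stand in for whatever transform the author applied
// before drawing the text, e.g. ctx.translate(tx, ty); ctx.rotate(angle).

// Invert translate(tx, ty) followed by rotate(angle): pure 2D math,
// no canvas required.
function toTextLocal(px, py, tx, ty, angle) {
  const dx = px - tx;
  const dy = py - ty;
  const cos = Math.cos(-angle);
  const sin = Math.sin(-angle);
  return { x: dx * cos - dy * sin, y: dx * sin + dy * cos };
}

// Usage sketch (browser only): find the character index under a click
// on text drawn rotated 90 degrees at (100, 50).
// const local = toTextLocal(event.offsetX, event.offsetY, 100, 50, Math.PI / 2);
// const index = ctx.measureText(text).getIndexFromOffset(local.x);
```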

Ligature breaking is possible and I agree should happen, so we will define the correct behavior in any spec text and update the explainer if necessary. For italics and the like we would behave the same as the DOM caretPositionFromPoint method which does not take into account italics. The underlying implementation in browsers uses the editing code paths (also for selection) so the behavior would match what those systems do. In Chromium, at least, the selection rect does not cover italic characters in all cases.

The copying of align and baseline (Point 4) is done to simplify the rendering when measurement and creation of the glyph clusters is done in a different code location. That is, if you create in one function then pass around to another or store for later. It captures the alignment and baseline at measurement time so they can be re-applied at rendering time. I believe this came out of the spec PR discussion. Note Florin and myself have taken over this work from @AndresRPerez12.

Comment by @fmalita Jun 18, 2025 (See Github)

Thanks for reviewing @jyasskin!

  • The example code passes end=text.length, which is the number of code units, but the algorithm throws an exception if it exceeds the number of code points. Since the emoji each take 2 code units to represent a code point, this should throw. However, Intl.Segmenter operates in terms of code units, and it's important to be consistent with that here, so you should probably update your algorithm to use code units too.

Good catch, that looks like an unintended discrepancy. Agreed that consistency with Intl.Segmenter is desirable, plus working with code points would introduce some extra client friction.

We should update the spec PR to use code units.

Interestingly, the Chromium prototype impl is already using code units, thus the example doesn't throw. You can see it running live here if you launch Chromium with --enable-experimental-web-platform-features.

(The example does require a minor tweak to match the current spec PR: TextCluster.begin has been renamed to TextCluster.start.)
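The code-unit versus code-point distinction at issue here is easy to see directly in JavaScript: string indices (and `Intl.Segmenter`'s reported indices) count UTF-16 code units, while spreading a string iterates code points.

```javascript
// JavaScript string indices count UTF-16 code units, and Intl.Segmenter
// reports indices in that same space - which is why the spec PR should
// use code units too.
const text = "hi👍";               // "👍" is one code point, two code units
console.log(text.length);          // 4 (code units)
console.log([...text].length);     // 3 (code points)

const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
for (const { segment, index } of seg.segment(text)) {
  console.log(index, segment);     // indices 0, 1, 2 - code-unit based
}
```

So `end = text.length` is a code-unit count, and with a code-point-based algorithm the emoji example would throw, exactly as the review noted.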

  • We see that the "fi" ligature is getting a single color. We suspect that counts as "their corresponding glyphs cannot be broken down any further" in your spec.

Indeed, the focus is on glyph clusters (as produced by the text shaping engine/text preparation algorithm) that cannot be subdivided any further without breaking the user-perceived "character". They roughly correspond to grapheme clusters, but they also include ligatures because those are single glyphs that cannot be subdivided (I've heard some of our font folks refer to them as "glyphemes").

One thing we should probably emphasize in these docs is that TextClusters are constructed in the glyph (post-shaping) domain, and are indirectly related to the Unicode domain.

  1. The explainer refers to these TextClusters as "grapheme clusters" in a couple places, but "Grapheme clusters are not the same as ligatures."

Point taken, some of the explainers are using loose (or possibly outdated) terminology - we should update them. Note that the spec PR avoids any mention of graphemes and sticks to the accurate description.
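The grapheme-versus-glyph-cluster distinction can be demonstrated with `Intl.Segmenter`: it splits "fi" into two graphemes even though a shaping engine may emit a single "fi" ligature glyph, while a flag emoji is one grapheme built from several code points.

```javascript
// Grapheme clusters are a Unicode-domain concept; TextClusters live in
// the glyph (post-shaping) domain. Intl.Segmenter shows the Unicode side.
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });

const fi = [...seg.segment("fi")].map(s => s.segment);
console.log(fi);            // ["f", "i"] - two graphemes, but a shaper
                            // may render them as one ligature glyph

// Conversely, one grapheme can span several code points:
const flag = [...seg.segment("🇨🇦")].map(s => s.segment);
console.log(flag.length);   // 1 grapheme (2 code points, 4 code units)
```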

  1. One of your use cases for this API is for caret positioning, but the caret has to be able to be positioned between the "f" and "i" in that ligature.

Since ligatures are single glyphs, there is no way to get the partial font metrics required for accurate mid-ligature positioning. AFAIK most editors don't support mid-ligature positioning, and the ones that do rely on approximations/heuristics.

At a high level, this seems to fall onto clients: if caret positioning is important, they should use fonts without ligatures, or disable ligatures during text shaping. (Whether that latter part is achievable with Canvas is a good question... AFAICT, unlike CSS, there is no way to explicitly disable ligatures with Canvas APIs. I found this old proposal that could be extended for font-variant-ligatures, but that's a separate effort. @schenney-chromium)
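For contrast, DOM text can already opt out of ligatures explicitly via CSS, which is the control the canvas 2D `font` shorthand currently lacks; the class name below is just an example.

```css
/* DOM text can disable ligatures explicitly; the canvas 2D `font`
   shorthand currently has no equivalent control. */
.editor-text {
  font-variant-ligatures: none;
  /* or, at the OpenType feature level: */
  /* font-feature-settings: "liga" 0, "clig" 0; */
}
```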

Discussed Jun 23, 2025 (See Github)

Matthew: Situation we are in is that we can't do any better. Jeffrey's idea was interesting, but that would take time to do. The HTML in Canvas group... one way would be for APA to check fallback and see whether we advise that. Alice is also looking into the using-libraries angle. Don't know if we hold this up for that.

Jeffrey: HiC is what inspired my suggestion. Maybe we should provide this only for HiC things, e.g., say you went far enough in this direction so move it towards HiC. May be frustrating for some folks. May be more effective to focus on the good thing that isn't quite ready.

Matthew: Not sure how far AT implementers are on this and whether they'll support it. If we want to resolve this quickly, fine by me but could be decoupled from design-review issue. I don't know if we have TAG consensus on that.

Jeffrey: I wouldn't say we're unsatisfied with this without checking that first. Action for me is to draft a comment where we are leaning towards satisfied, but with the HiC direction.

Matthew: I thought you were saying this could be resolved with the HiC condition?

Jeffrey: My understanding is the whole structure of this API is to draw a string into canvas with no accessible fallback. The HiC version would be a very different API shape.

Matthew: The options are: fallback content works but you have to insist on it - MUST... or it doesn't work and we can't suggest that - which would be the unsatisfied outcome.

Jeffrey: When Alice says fallback content isn't fit for purpose... I feel we haven't ??? to fix it.

Matthew: There'd be a mismatch with the ordering of fallback content. If canvas content moves, how do you communicate that to someone with assistive tech?

Matthew: I think it is more like it technically works, but it wasn't with the idea to ??? It is not that it is not going to work, but it is probably going to miss some movements.

Jeffrey: Related: rearranging the DOM without moving elements. To swap two elements, remove one and insert the other. Haven't thought of AT, but it seems AT would think there is new content. There is work going on to add a moveBefore function on elements to do that swap without taking elements out of the DOM. And AT would have to make a decision on announcing that to users.

Matthew: Hard to say for so many ways this could be used. It might sound like the AT user should be told - they read it - it doesn't matter. If it moves lower down they'll discover it, but if above, they've already read it.

Matthew: AT users typically skip around things. If a header moved to behind... this is a rabbit hole. UX principles should kick in?

Jeffrey: Website uses the live region to say something moved around. Or rearrange the fallback DOM and let the AT decide.

Matthew: The simple case is like creating a static picture where the text is in a circle. There is more text below with funny colours. You can make fallback content that works for that. But Alice is saying that isn't going to work. If it is something moving around, that complicates things a lot. I think it is because the API could be called any time. The clue here is "canvas-based application". Like TVs. Rich stuff.

Jeffrey: Google Docs is a canvas-based application.

Jeffrey: One of the drawbacks with fallback content is if it hasn't been drawn yet. Maybe they should be working on that for this style of CTM.

Jeffrey: will propose a comment in private brainstorming to use fallback content instead of raw text strings.

Sarven: Any discussion of how quickly the canvas might be moving? How frequently it should be communicated to AT? If there are 100 changes/second, what does that mean?

Matthew: Key thing is equitable user experience. Don't tell about every graphical change, but if it means something, it needs to be communicated. In an accessible game, you'd expect sound or haptics to indicate stuff. Don't need to keep re-describing the scene. Canvas is so general that it's hard to give general advice. Lots-of-updates isn't good for AT users. Hard to keep track of even group chat.

Sarven: Is this something that AT could have an event listener that picks up on it?

Matthew: Canvas is a bag of pixels, and that's it.

Jeffrey: canvas can tell the AT there is something important. I think your idea with an event listener is right ???

Matthew: ... but don't want to do that rapidly or frequently.

Jeffrey: Matthew mentioned we could put a "MUST provide fallback content" but I don't think that's effective.

Matthew: I don't disagree. Work is done on this. If we want positive change to happen we can't say stop it or ask too much. Sometimes we have to say it. If we are going to say it, we encourage the best outcome.

Comment by @mbeganyi-a11y Jun 26, 2025 (See Github)

I think this represents a serious gap in the platform, and I agree that adding APIs to support more sophisticated use cases for text in canvas does make the need for addressing this gap increasingly pressing. However, I'm not sure what guidance can usefully be given in the context of this API while that gap persists.

Hi all. Mike here, member of AGWG and APA but my comments are purely my own. New poster here in this thread and just curious about a couple of things:

  1. @schenney-chromium Have you discovered anything about the gap that Alice mentioned, and is there any development? It doesn't seem like there's a workaround at the moment, but thought I'd ask anyway; and

  2. With the ability to customize kerning, skew, orientation, etc., also comes the chance for rendered text in a typically left-to-right language to be presented right-to-left. This is true of the example on the project page that renders the text in a circle, which has upside-down and right-to-left text presentation. I'm unaware of any way to solve for this customized rendering except for having plain text available within close proximity (reading order) to the rendered text, or relying on canvas fallbacks (which are limited). I'm trying to think about other methods to ensure as accessible an experience as possible. Within the proposal, is there room to have an automatic output that's rendered in real text adjacent to the canvas? Thinking out loud here...