#1095: Canvas Text Metrics for Editing, Art and Design

Visit on Github.

Opened May 18, 2025

Hello TAG!

I'm requesting a TAG review of Additional Canvas TextMetrics features.

Web content developers often request improvements to canvas text support to enable richer text rendering and interaction. For rendering text there is a desire to perform per-glyph styling and rendering, where a glyph is the smallest unit of text. For editing, developers need better support for selection and caret position. Broadly speaking, developers would like editing capabilities similar to DOM content, and rendering capabilities similar to SVG text.
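To make the per-glyph rendering use case concrete, here is a hedged sketch based on the shape proposed in the explainer and spec PR: `measureText()` returns a `TextMetrics` object with a `getTextClusters()` method, and clusters are drawn with `fillTextCluster()`. These names come from the in-progress proposal and may still change; `clusterColor` is a hypothetical helper for the example.

```javascript
// Sketch of per-glyph (per-cluster) coloring with the proposed
// Text Clusters API. getTextClusters() and fillTextCluster() are the
// names used in the current explainer/spec PR and may still change.

// Pure helper: pick a color for the i-th cluster from a small palette.
function clusterColor(i, palette = ["#d33", "#3a3", "#36c"]) {
  return palette[i % palette.length];
}

function drawRainbowText(ctx, text, x, y) {
  ctx.font = "32px serif";
  const metrics = ctx.measureText(text);
  // Each TextCluster is the smallest renderable run of glyphs and
  // carries start/end indices into the source string plus a position.
  const clusters = metrics.getTextClusters();
  clusters.forEach((cluster, i) => {
    ctx.fillStyle = clusterColor(i);
    // Render one cluster at its measured position, offset by (x, y).
    ctx.fillTextCluster(cluster, x, y);
  });
}
```

In a browser you would call `drawRainbowText(canvas.getContext("2d"), "Hello", 10, 40)`; the per-cluster loop is what enables styling below the granularity of a whole `fillText()` call.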

Further details:

You should also know that...

Our primary concern is getting TAG feedback on the Text Clusters feature for querying and rendering per-glyph text information.

Discussions

Discussed May 19, 2025 (See Github)

[discussion about when we might decline a review]

jeffrey: I can see accessibility concerns here because this makes it more possible to do things just in canvas, without the normal HTML accessibility features by default.

Discussed May 26, 2025 (See Github)

Jeffrey: Some concerns, and I have an accessibility concern to add

... they should have a good answer that this text stays exposed to assistive technology

... don't have a draft comment yet, Matthew and I should draft one for next week

Dan A: In terms of moving faster, why shouldn't you post it this week?

Jeffrey: Think we don't have enough to say yet

... are at the "ask a bunch of questions" stage

Dan A: From the perspective of showing progress, it would be fine to directly ask the questions

Jeffrey: Works for me.

Discussed Jun 2, 2025 (See Github)

Matthew: This one - I think we have the content of a comment... An example that probably isn't buggy... but looks like it's buggy. We think they should clarify. The other concern... this adds text to the canvas with no automatic fallback. AOM isn't a thing... and also isn't automatic. There's the clunky fallback of canvas fallback content. They didn't mention it in the explainer. We could ask them if they considered this. They probably have considered it. I think we were left with asking for clarification on the example, asking about fallback content, and asking about the actual API... Also, points from Jeffrey.

Jeffrey: I noticed another thing... the fact that they are not splitting the fi ligature is probably a bug. One of the use cases is where you can click to put your editing cursor, and you have to be able to click between the f and the i...

DanC: we were going to ask about i18n... it talks about "left to right"... so maybe they should run it past the i18n folks.

Hadley: add you to this?

DanC: sure (assigns self)

Matthew to draft comment and lightly review among issue assignees before posting. Not closing, will wait for responses

Discussed Jun 9, 2025 (See Github)

Matthew to draft, with Jeffrey's help.

Discussed Jun 16, 2025 (See Github)

Jeffrey: Probably ready to post; any objections?

Martin: Looked fine.

Comment by @alice Jun 18, 2025 (See Github)

3. We're concerned that this proposal doesn't talk about providing equivalents for the text that is placed on the canvas that would be exposed to assistive technologies. One option the platform offers is canvas fallback content, though it has its limitations. Have you considered how this barrier may be overcome, and/or what specific advice should be given to content authors?

(Commenting as a colleague of Stephen's who provided accessibility guidance during the course of this work.)

My understanding is that the current state of the art is (unfortunately) for canvas-based applications to implement self-voicing solutions based on aria-live regions. I believe Stephen is investigating the extent to which fallback content is even used in practice for accessibility-related purposes, but what I hear from various sources is that canvas fallback content isn't really fit for purpose for most production use cases.

I think this represents a serious gap in the platform, and I agree that adding APIs to support more sophisticated use cases for text in canvas does make the need for addressing this gap increasingly pressing. However, I'm not sure what guidance can usefully be given in the context of this API while that gap persists.
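For readers unfamiliar with the mechanism being discussed: "canvas fallback content" is the markup authors place between the `<canvas>` tags. It is not rendered visually when scripting is enabled, but it remains in the accessibility tree. A minimal illustration (element names and structure are just an example):

```html
<!-- Canvas fallback content: the elements inside <canvas> are not
     painted when the canvas is scripted, but they are exposed to
     assistive technologies. Keeping them in sync with what is actually
     drawn on the canvas is left entirely to the author, which is the
     limitation discussed above. -->
<canvas id="editor" width="600" height="200">
  <p>Document text: <span>Hello, world</span></p>
</canvas>
```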

Comment by @jyasskin Jun 18, 2025 (See Github)

Thanks for chiming in, @alice!

A wild thought, but what if these text rendering functions took references to elements in the fallback content instead of plain text strings? They'd probably need to be phrased as callbacks to handle when the text of the elements changes, and we could think of them as special cases for rendering what sighted people need, instead of making the text-only people be the special case. Doing that might also motivate non-accessibility developers to fix the problems with fallback content. And it could allow the full range of multilingual text in those elements instead of assuming the whole string is the canvas's language.

Comment by @schenney-chromium Jun 18, 2025 (See Github)

Ah, taking elements and rendering them in canvas. That's exactly what we're working on with HTML-in-Canvas in an effort to improve accessibility for content best represented by DOM. It raises its own set of a11y concerns, but in general it should make it easier for authors to match their canvas content with their fallback content as you suggest.

In this case we're building on existing canvas text features, and don't want to mess with the API too much due to existing web content.

My collaborator in this work, @fmalita, is looking into the functional questions you raised. I take the point about ligatures and writing modes.

For getIndexFromOffset() (Point 2), canvas has the notion of LTR and RTL but nothing beyond that. The assumption is that an author will use transforms to align text vertically. There is also the problem of mimicking sideways-* which IIRC needs to rotate individual characters in some situations, which I think you would need this proposal to do correctly. Regardless, to get the correct position to use for getIndexFromOffset I believe an author would need to transform the point into the coordinate system of the text string to correctly compute the offset along the string's rendered orientation.
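The "transform the point into the coordinate system of the text string" step can be sketched with plain 2D math. This is a hedged illustration, not the spec's algorithm: `toTextLocal` inverts a translate-then-rotate transform that an author is assumed to have applied before drawing, and `getIndexFromOffset()` is the proposed TextMetrics method whose exact signature may still change.

```javascript
// Hedged sketch: map a pointer position back into a text run's local
// coordinate system before calling the proposed getIndexFromOffset().
// (tx, ty, angle) stand in for whatever transform the author applied
// before drawing the text, e.g. ctx.translate(tx, ty); ctx.rotate(angle).

// Invert translate(tx, ty) followed by rotate(angle): pure 2D math,
// no canvas required.
function toTextLocal(px, py, tx, ty, angle) {
  const dx = px - tx;
  const dy = py - ty;
  const cos = Math.cos(-angle);
  const sin = Math.sin(-angle);
  return { x: dx * cos - dy * sin, y: dx * sin + dy * cos };
}

// Usage sketch (browser only): find the character index under a click
// on text drawn rotated 90 degrees at (100, 50).
// const local = toTextLocal(event.offsetX, event.offsetY, 100, 50, Math.PI / 2);
// const index = ctx.measureText(text).getIndexFromOffset(local.x);
```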

Ligature breaking is possible and I agree should happen, so we will define the correct behavior in any spec text and update the explainer if necessary. For italics and the like we would behave the same as the DOM caretPositionFromPoint method which does not take into account italics. The underlying implementation in browsers uses the editing code paths (also for selection) so the behavior would match what those systems do. In Chromium, at least, the selection rect does not cover italic characters in all cases.

The copying of align and baseline (Point 4) is done to simplify the rendering when measurement and creation of the glyph clusters is done in a different code location. That is, if you create in one function then pass around to another or store for later. It captures the alignment and baseline at measurement time so they can be re-applied at rendering time. I believe this came out of the spec PR discussion. Note Florin and myself have taken over this work from @AndresRPerez12.

Comment by @fmalita Jun 18, 2025 (See Github)

Thanks for reviewing @jyasskin!

  • The example code passes end=text.length, which is the number of code units, but the algorithm throws an exception if it exceeds the number of code points. Since the emoji each take 2 code units to represent a code point, this should throw. However, Intl.Segmenter operates in terms of code units, and it's important to be consistent with that here, so you should probably update your algorithm to use code units too.

Good catch, that looks like an unintended discrepancy. Agreed that consistency with Intl.Segmenter is desirable, plus working with code points would introduce some extra client friction.

We should update the spec PR to use code units.

Interestingly, the Chromium prototype impl is already using code units, thus the example doesn't throw. You can see it running live here if you launch Chromium with --enable-experimental-web-platform-features.

(The example does require a minor tweak to match the current spec PR: TextCluster.begin has been renamed to TextCluster.start.)
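The code-unit versus code-point distinction at issue here is easy to see directly in JavaScript: string indices (and `Intl.Segmenter`'s reported indices) count UTF-16 code units, while spreading a string iterates code points.

```javascript
// JavaScript string indices count UTF-16 code units, and Intl.Segmenter
// reports indices in that same space - which is why the spec PR should
// use code units too.
const text = "hi👍";               // "👍" is one code point, two code units
console.log(text.length);          // 4 (code units)
console.log([...text].length);     // 3 (code points)

const seg = new Intl.Segmenter("en", { granularity: "grapheme" });
for (const { segment, index } of seg.segment(text)) {
  console.log(index, segment);     // indices 0, 1, 2 - code-unit based
}
```

So `end = text.length` is a code-unit count, and with a code-point-based algorithm the emoji example would throw, exactly as the review noted.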

  • We see that the "fi" ligature is getting a single color. We suspect that counts as "their corresponding glyphs cannot be broken down any further" in your spec.

Indeed, the focus is on glyph clusters (as produced by the text shaping engine/text preparation algorithm) that cannot be subdivided any further without breaking the user-perceived "character". They roughly correspond to grapheme clusters, but they also include ligatures because those are single glyphs that cannot be subdivided (I've heard some of our font folks refer to them as "glyphemes").

One thing we should probably emphasize in these docs is that TextClusters are constructed in the glyph (post-shaping) domain, and are indirectly related to the Unicode domain.

  1. The explainer refers to these TextClusters as "grapheme clusters" in a couple places, but "Grapheme clusters are not the same as ligatures."

Point taken, some of the explainers are using loose (or possibly outdated) terminology - we should update them. Note that the spec PR avoids any mention of graphemes and sticks to the accurate description.
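The grapheme-versus-glyph-cluster distinction can be demonstrated with `Intl.Segmenter`: it splits "fi" into two graphemes even though a shaping engine may emit a single "fi" ligature glyph, while a flag emoji is one grapheme built from several code points.

```javascript
// Grapheme clusters are a Unicode-domain concept; TextClusters live in
// the glyph (post-shaping) domain. Intl.Segmenter shows the Unicode side.
const seg = new Intl.Segmenter("en", { granularity: "grapheme" });

const fi = [...seg.segment("fi")].map(s => s.segment);
console.log(fi);            // ["f", "i"] - two graphemes, but a shaper
                            // may render them as one ligature glyph

// Conversely, one grapheme can span several code points:
const flag = [...seg.segment("🇨🇦")].map(s => s.segment);
console.log(flag.length);   // 1 grapheme (2 code points, 4 code units)
```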

  1. One of your use cases for this API is for caret positioning, but the caret has to be able to be positioned between the "f" and "i" in that ligature.

Since ligatures are single glyphs, there is no way to get the partial font metrics required for accurate mid-ligature positioning. AFAIK most editors don't support mid-ligature positioning, and the ones that do rely on approximations/heuristics.

At a high level, this seems to fall onto clients: if caret positioning is important, they should use fonts without ligatures, or disable ligatures during text shaping. (Whether that latter part is achievable with Canvas is a good question... AFAICT, unlike CSS, there is no way to explicitly disable ligatures with Canvas APIs. I found this old proposal that could be extended for font-variant-ligatures, but that's a separate effort. @schenney-chromium)
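For contrast, DOM text can already opt out of ligatures explicitly via CSS, which is the control the canvas 2D `font` shorthand currently lacks; the class name below is just an example.

```css
/* DOM text can disable ligatures explicitly; the canvas 2D `font`
   shorthand currently has no equivalent control. */
.editor-text {
  font-variant-ligatures: none;
  /* or, at the OpenType feature level: */
  /* font-feature-settings: "liga" 0, "clig" 0; */
}
```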

Discussed Jun 23, 2025 (See Github)

Matthew: Situation we are in is that we can't do any better. Jeffrey's idea was interesting, but that would take time to do. The HTML in Canvas group... one way would be for APA to check fallback and see whether we advise that. Alice is also looking into the using-libraries angle. Don't know if we hold this up for that.

Jeffrey: HiC is what inspired my suggestion. Maybe we should provide this only for HiC things, e.g., say you went far enough in this direction so move it towards HiC. May be frustrating for some folks. May be more effective to focus on the good thing that isn't quite ready.

Matthew: Not sure how far AT implementers are on this and whether they'll support it. If we want to resolve this quickly, fine by me but could be decoupled from design-review issue. I don't know if we have TAG consensus on that.

Jeffrey: I wouldn't say we're unsatisfied with this without checking that first. Action for me is to draft a comment where we are leaning towards satisfied, but with the HiC direction.

Matthew: I thought you were saying this could be resolved with the HiC condition?

Jeffrey: My understanding is the whole structure of this API is to draw a string into canvas with no accessible fallback. The HiC version would be a very different API shape.

Matthew: The options are: fallback content works but you have to insist on it - MUST... or it doesn't work and we can't suggest that - which would be the unsatisfied outcome.

Jeffrey: When Alice says fallback content isn't fit for purpose... I feel we haven't ??? to fix it.

Matthew: There'd be a mismatch with the ordering of fallback content. If canvas content moves, how do you communicate that to someone with assistive tech?

Matthew: I think it is more like it technically works, but it wasn't with the idea to ??? It is not that it is not going to work, but it is probably going to miss some movements.

Jeffrey: Related: rearranging the DOM without moving elements. To swap two elements, remove one and insert the other. Haven't thought of AT, but it seems AT would think there is new content. There is work going on to add a moveBefore function on elements to do that swap without taking elements out of the DOM. And AT would have to make a decision on announcing that to users.

Matthew: Hard to say for so many ways this could be used. It might sound like the AT user should be told - they read it - it doesn't matter. If it moves lower down they'll discover it, but if above, they've already read it.

Matthew: AT users typically skip around things. If a header moved to behind... this is a rabbit hole. UX principles should kick in?

Jeffrey: Website uses the live region to say something moved around. Or rearrange the fallback DOM and let the AT decide.

Matthew: The simple case is like creating a static picture where the text is in a circle. There is more text below with funny colours. You can make fallback content that works for that. But Alice is saying that isn't going to work. If it is something moving around, that complicates things a lot. I think it is because the API could be called any time. The clue here is "canvas-based application". Like TVs. Rich stuff.

Jeffrey: Google Docs is a canvas-based application.

Jeffrey: One of the drawbacks with fallback content is if it hasn't been drawn yet. Maybe they should be working on that for this style of CTM.

Jeffrey: will propose a comment in private brainstorming to use fallback content instead of raw text strings.

Sarven: Any discussion of how quickly the canvas might be moving? How frequently it should be communicated to AT? If there are 100 changes/second, what does that mean?

Matthew: Key thing is equitable user experience. Don't tell about every graphical change, but if it means something, it needs to be communicated. In an accessible game, you'd expect sound or haptics to indicate stuff. Don't need to keep re-describing the scene. Canvas is so general that it's hard to give general advice. Lots-of-updates isn't good for AT users. Hard to keep track of even group chat.

Sarven: Is this something that AT could have an event listener that picks up on it?

Matthew: Canvas is a bag of pixels, and that's it.

Jeffrey: canvas can tell the AT there is something important. I think your idea with an event listener is right ???

Matthew: ... but don't want to do that rapidly or frequently.

Jeffrey: Matthew mentioned we could put a "MUST provide fallback content" but I don't think that's effective.

Matthew: I don't disagree. Work is done on this. If we want positive change to happen we can't say stop it or ask too much. Sometimes we have to say it. If we are going to say it, we encourage the best outcome.

Comment by @mbeganyi-a11y Jun 26, 2025 (See Github)

I think this represents a serious gap in the platform, and I agree that adding APIs to support more sophisticated use cases for text in canvas does make the need for addressing this gap increasingly pressing. However, I'm not sure what guidance can usefully be given in the context of this API while that gap persists.

Hi all. Mike here, member of AGWG and APA but my comments are purely my own. New poster here in this thread and just curious about a couple of things:

  1. @schenney-chromium Have you discovered anything about the gap that Alice mentioned, and is there any development? It doesn't seem like there's a workaround at the moment, but thought I'd ask anyway; and

  2. With the ability to customize kerning, skew, orientation, etc., also comes the chance for rendered text in a typically left-to-right language to be presented right-to-left. This is true of the example on the project page that renders the text in a circle, which has upside-down and right-to-left text presentation. I'm unaware of any way to solve for this customized rendering except for having plain text available within close proximity (reading order) to the rendered text, or relying on canvas fallbacks (which are limited). I'm trying to think about other methods to ensure as accessible an experience as possible. Within the proposal, is there room to have an automatic output that's rendered in real text adjacent to the canvas? Thinking out loud here...