textOverflow #1283

Fil · 2023-02-17T14:01:14Z

Adds a textOverflow option to the text mark (and, as a consequence, to the axes). The option respects lineWidth and uses the same text metrics as multiline text, including monospace; but instead of wrapping the text over several lines, it clips it to the given length.

If the option is specified as "clip", the text is simply cut off (a "white clip"); if it is specified as "ellipsis", an ellipsis is added at the end of the clipped string.

The option tries to be a little smart: it does not replace the last character with an ellipsis, and it trims whitespace.

Unless there is an explicit title channel, any text that is clipped that way receives a title with the full text. Texts that are not clipped do not receive a title (it's a bit opinionated, but easy to override by specifying the title channel).
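
For illustration, a minimal sketch of the options described above. The data field name `title` is hypothetical, and lineWidth is assumed to be expressed in ems; the object would be passed as the second argument to `Plot.text`.

```javascript
// Options for a text mark whose labels are clipped rather than wrapped.
const textOptions = {
  text: (d) => d.title, // hypothetical data field
  lineWidth: 12, // maximum line width (assumed to be in ems)
  textOverflow: "ellipsis", // or "clip" to cut without adding an ellipsis
  monospace: false // set to true to use the monospace text metrics
};

// Usage (requires Observable Plot): Plot.text(films, textOptions)
```

Adding an explicit `title` channel to these options would replace the automatic title on clipped text.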

Regarding axes, only the tick labels are clipped: the option is not passed to the axis label.

closes #394

Build available at https://observablehq.com/@fil/plot-text-overflow-1283 if you want to play with it.

mbostock · 2023-02-17T18:58:25Z

This is great! Very excited to see this at last.

I wonder what you think about supporting an ellipsis in the middle, like macOS Finder does? Could that be the default? Or how would you specify whether you want the ellipsis at the start, middle, or end? I see some discussion around a CSS textOverflowMiddle property, but it seems like it never made it into browsers.

Fil · 2023-02-17T23:39:56Z

Yes I think we could have a clipAnchor option that defaults to "end"? In terms of implementation, it should be as simple as (in theory):

"start": reverse the string, clip, reverse
"middle": interlace the first part of the string with the second part flipped, clip, deinterlace
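
In code, the two tricks above might look like the following sketch. It counts length in code points rather than using the text metrics, and it omits the ellipsis; `clipEnd`, `clipStart`, and `clipMiddle` are hypothetical helper names, not functions from this PR.

```javascript
// Splits a string into code points so that surrogate pairs stay together.
const chars = (s) => Array.from(s);

// Keeps the first n code points.
function clipEnd(text, n) {
  return chars(text).slice(0, n).join("");
}

// "start": reverse the string, clip, reverse.
function clipStart(text, n) {
  const reversed = chars(text).reverse().join("");
  return chars(clipEnd(reversed, n)).reverse().join("");
}

// "middle": interlace the first part with the flipped second part, clip,
// then deinterlace (even positions are the head, odd positions the tail).
function clipMiddle(text, n) {
  const c = chars(text);
  const half = Math.ceil(c.length / 2);
  const head = c.slice(0, half);
  const tail = c.slice(half).reverse();
  const interlaced = [];
  for (let i = 0; i < half; ++i) {
    interlaced.push(head[i]);
    if (i < tail.length) interlaced.push(tail[i]);
  }
  const kept = interlaced.slice(0, n);
  const front = kept.filter((_, i) => i % 2 === 0);
  const back = kept.filter((_, i) => i % 2 === 1).reverse();
  return front.concat(back).join("");
}
```

For example, `clipStart("abcdef", 3)` keeps the last three characters and `clipMiddle("abcdef", 4)` keeps two characters from each end.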

Fil · 2023-02-21T16:11:15Z

I implemented the different overflow strategies and extended them to work with accents and emojis. Tricky! but it seems to work.

mbostock

To recap our in-person discussion this morning, we agreed you would:

Add clip-end and ellipsis-end aliases that are equivalent to clip and ellipsis, respectively. (The ellipsis-end alias already appears to be implemented, but it’d be nice if the Text mark constructor implemented the alias explicitly instead of relying on the switch statement/default casing in overflow. I think this means that clip should be promoted to clip-end in the constructor, and ellipsis promoted to ellipsis-end?)
Rewrite the proposed implementation to use the existing strategy of detecting surrogate pairs during iteration (first >= 0xd800) rather than converting the input lines into arrays of characters ([...input]). The existing strategy should be more performant and there probably aren’t too many places where we have to deal with multi-code point characters explicitly (I hope).
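
As a rough illustration of the second point, the sketch below walks a string by index and skips the second half of each surrogate pair, so no intermediate array is allocated. `countCodePoints` is a hypothetical name, and the range checks are stricter than the `first >= 0xd800` test mentioned above; this is not the code in src/marks/text.js.

```javascript
// Counts code points by scanning UTF-16 code units in place.
function countCodePoints(text) {
  let n = 0;
  for (let i = 0; i < text.length; ++i, ++n) {
    const first = text.charCodeAt(i);
    // A high surrogate followed by a low surrogate encodes one code point.
    if (first >= 0xd800 && first <= 0xdbff && i + 1 < text.length) {
      const second = text.charCodeAt(i + 1);
      if (second >= 0xdc00 && second <= 0xdfff) ++i; // skip the low surrogate
    }
  }
  return n;
}
```

For well-formed strings this returns the same count as `[...input].length`, without building the array.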

Let me know when you’ve done that and I’ll take another look. Thanks! 🙏

Fil · 2023-02-21T22:04:35Z

I see two issues with the existing text metrics code, which are a bit difficult to disentangle from this PR:

Emojis are counted as double chars in monospace, which doesn't seem right. We're going to clip them much too soon.
The default to "e" doesn't really work in two common cases: the long dash (em-dash "—"), and emojis. I'm not sure how to address this. Defaulting unknown to 100 (% of 1em) solves it for these two cases but there are probably other characters for which it would not be a good idea?

My changes address these two issues (otherwise the clipping code doesn't work well with emojis) … but I'm not sure that's the right call.

Random note: I lost quite a bit of time trying to understand why a certain emoji is not clipped correctly. Here's the answer (see the added invisible char after the cross). This seems like an indication that I'm missing something?
[..."🐤🐥🍎✖️🦜🎃🦜"].join("-") // "🐤-🐥-🍎-✖-️-🦜-🎃-🦜"
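
A likely explanation for the invisible character: the cross emoji is written as U+2716 followed by the variation selector U+FE0F, and spreading a string splits it by code point rather than by glyph. The sketch below lists the code points to make this visible; `codePoints` is a hypothetical helper, and the string is written with escapes so the selector is explicit.

```javascript
// Lists the code points of a string as "U+XXXX" labels.
function codePoints(text) {
  return Array.from(text, (c) =>
    "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const cross = "\u2716\uFE0F"; // heavy multiplication sign + variation selector-16
const labels = codePoints(cross); // ["U+2716", "U+FE0F"]
```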

mbostock · 2023-02-21T22:13:11Z

If you think the em dash is common, then we should add it to defaultWidthMap; that’s what it’s for. The fallback to the default “e” is for anything uncommon (or “less” common) because it would be prohibitive to include an entry in the map for every possible Unicode character.

plot/src/marks/text.js

Line 286 in 228f62e

const defaultWidthMap = {

We should fix the monospaceWidth implementation so that it counts surrogate pairs as a single character, too. It should be the same as defaultWidth assuming that defaultWidthMap contained 1 (or 100, or whatever) for every character.
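
A minimal sketch of the lookup-with-fallback idea described in this comment. The widths are made-up illustrative numbers (percent of 1em), not the values in Plot's defaultWidthMap, and `approximateWidth` is a hypothetical name.

```javascript
// Illustrative widths only; the real map covers many more characters.
const widthMap = {
  a: 60,
  e: 60,
  i: 27,
  m: 87,
  " ": 31,
  "\u2014": 100 // em dash, added because it is considered common
};

// Sums per-character widths, falling back to the width of "e" for any
// character that has no entry in the map.
function approximateWidth(text) {
  let sum = 0;
  for (const c of text) sum += widthMap[c] ?? widthMap.e;
  return sum;
}
```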

mbostock · 2023-02-21T22:15:20Z

src/marks/text.js

@@ -406,7 +406,9 @@ function defaultWidth(text, start, end) {
 }

 function monospaceWidth(text, start, end) {
-  return 100 * (end - start);
+  let sum = end - start;
+  for (let i = start; i < end; ++i) sum -= isSurrogatePair(text[i], text[i + 1]);


If a surrogate pair starts at i (i.e., [i, i + 1] is a surrogate pair) then we shouldn’t test whether another surrogate pair starts at i + 1 (i.e., [i + 1, i + 2]); we should skip an index. (Hence the i += … in defaultWidth.)
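
One way to write the skip described here, as a sketch rather than the code that was merged. `pairStartsAt` is a hypothetical helper that takes the text and an index (unlike the `isSurrogatePair(text[i], text[i + 1])` call in the diff).

```javascript
// True if a high surrogate at i is followed by a low surrogate at i + 1.
function pairStartsAt(text, i) {
  const hi = text.charCodeAt(i);
  const lo = text.charCodeAt(i + 1);
  return hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff;
}

// Every character counts as 100 (% of 1em); a surrogate pair is one character.
function monospaceWidth(text, start, end) {
  let sum = 0;
  for (let i = start; i < end; ++i) {
    if (i + 1 < end && pairStartsAt(text, i)) ++i; // skip the second half
    sum += 100;
  }
  return sum;
}
```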

Oh, but monospace emoji are very much larger than the other chars! I'm sending what I have, and will review tomorrow with fresh eyes.

We shouldn’t assume that all surrogate pairs are emojis. If you want to add testing specifically for emojis, I suppose we could do that (testing for the Unicode emoticons block), but my thinking is that we shouldn’t assume anything about the characters, at least for the monospace case—we should interpret monospace as “all characters have equal width” even if they don’t in practice.

Fil · 2023-02-22T15:40:08Z

I've reverted all changes to the metrics. Many character ranges are poorly measured (emoji, and non-US), but that's OK. The way to frame this for now is that the user has to adapt the lineWidth to their particular case (for example, if you're working with emojis, you need to give a smaller lineWidth; with Arabic script, a larger one).

Fixing this fully will be for another PR when the time comes, but it's not simple: first, it's hard to cover a wide range of characters without measuring (we don't want to ship a huge database for that).

Second, some emojis are composed of many code points (👩‍❤️‍💋‍👩, 👁️‍🗨️, and there are many modifiers for 🧑🏾👨🏻👧🏼👦🏽🧒🏿🙆🏻‍♂️, sometimes several Fitzpatrick (skin tone) modifiers in a single emoji like 🫱🏻‍🫲🏼). This doesn't matter much (or at all) for the line wrapping algorithm (which is not meant to wrap lines made of many emojis, and only cuts on spaces). But for the clipping algorithm it's ~~important~~ nice if we can avoid cutting a complex emoji in two parts. I think we'll have to solve this independently of the clipping algorithm. I've reached a version where at least it's not resulting in broken characters �, but 👩‍❤️‍💋‍👩 might be clipped as 👩‍❤ (which is just a partial rendering of the composed glyph, and is not great, but acceptable). I've left these examples in the unit test.
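
For reference, one possible way to keep composed emoji whole is to split on grapheme clusters with the standard `Intl.Segmenter` API. This is a different technique from the one used in this PR, shown only as a sketch; `graphemes` and `clipGraphemes` are hypothetical names, and the result depends on the Unicode data shipped with the JavaScript runtime.

```javascript
// Splits text into user-perceived characters (grapheme clusters).
const segmenter = new Intl.Segmenter(undefined, {granularity: "grapheme"});

function graphemes(text) {
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Keeps the first n grapheme clusters, never cutting inside a ZWJ sequence.
function clipGraphemes(text, n) {
  return graphemes(text).slice(0, n).join("");
}

// The "kiss" emoji: eight code points joined by zero width joiners.
const kiss = "\u{1F469}\u200D\u2764\uFE0F\u200D\u{1F48B}\u200D\u{1F469}";
```

Here `Array.from(kiss)` has eight entries, while `graphemes(kiss)` is expected to return a single cluster.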

mbostock · 2023-02-22T17:01:02Z

src/marks/text.js

+  let a;
+  let j = 0;
+  do {
+    a = text.charCodeAt(i + j++);


The mixed use of j++ and ++j is likely wrong here; doesn’t this mean we’re reading the same character we previously read on line 458 in the case of a surrogate pair?

It's deliberate and, I think, correct: this only happens when the codePoint is the zero width joiner. But I should probably set up unit tests.

Hmm, I’ll investigate further as I’m not convinced. 🤔 It means that we’re reading the same character twice.

Also, smaller nit, but I think this function should return the index of the next character/glyph so that the calling code can say i = readGlyph(text, i) (slightly simpler) instead of i += glyphLength(text, i).

yes; I also wanted to explore making a generator that yields each glyph in turn, but it started to impact the line wrapping code.
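
To make the two API shapes discussed in this thread concrete, here is a simplified sketch: a reader that returns the index of the next character, and a generator built on top of it. It only handles surrogate pairs (no zero width joiners or modifiers), and the generator name `characters` is hypothetical.

```javascript
// Returns the index just past the character that starts at index i.
function readCharacter(text, i) {
  const first = text.charCodeAt(i);
  if (first >= 0xd800 && first <= 0xdbff) {
    const second = text.charCodeAt(i + 1); // NaN past the end, so no match
    if (second >= 0xdc00 && second <= 0xdfff) return i + 2;
  }
  return i + 1;
}

// Yields each character in turn; the caller never does index arithmetic.
function* characters(text) {
  for (let i = 0, j; i < text.length; i = j) {
    j = readCharacter(text, i);
    yield text.slice(i, j);
  }
}
```

With this shape the calling loop reads `i = readCharacter(text, i)`, and each code unit is read at most once as the start of a character.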

mbostock · 2023-02-22T17:31:48Z

src/marks/text.js

+    ? textOverflow
+      ? (t) => [overflow(t, lineWidth * 100, measure, textOverflow)]
+      : (t) => lineWrap(t, lineWidth * 100, measure)


I’d like to investigate promoting this into separate functions (similar to what we do for facetAnchor) so there is less case branching during evaluation.
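
A sketch of what "deciding once" could look like: the option is normalized up front and a single splitting function is returned, so nothing branches on textOverflow for each text value. `normalizeTextOverflow` and `makeSplitter` are hypothetical names; `lineWrap`, `overflow`, and `measure` are passed in as parameters, and validation of the option value is omitted.

```javascript
// Promotes the short aliases to their explicit forms.
function normalizeTextOverflow(textOverflow) {
  if (textOverflow == null) return null;
  const value = `${textOverflow}`.toLowerCase();
  if (value === "clip") return "clip-end";
  if (value === "ellipsis") return "ellipsis-end";
  return value; // other values pass through unchanged in this sketch
}

// Chooses the line-splitting function once, at construction time.
function makeSplitter({lineWidth, textOverflow, measure, lineWrap, overflow}) {
  const strategy = normalizeTextOverflow(textOverflow);
  const width = lineWidth * 100;
  return strategy === null
    ? (t) => lineWrap(t, width, measure)
    : (t) => [overflow(t, width, measure, strategy)];
}
```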

mbostock · 2023-02-23T00:56:55Z

src/marks/text.js

+    j = readCharacter(text, i);
+    const char = text.slice(i, j);
+    const l = widthof(text, i, j);
+    if (w < width * p) {


At the point that we test w here, I think we want to already have added l to it. I.e., we want to test whether adding the next character will exceed the allowed maximum line width. I believe that currently the code allows one too many characters before clipping. And we want w <= width * p instead of w < width * p. And we don’t want to pre-emptively subtract widthof(insert, 0, insert.length) from width (because if we can fit the line in the given length, it doesn’t matter how long the insertion text is).
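
The three points above can be sketched for the end-clipping case as follows. This is an illustration under simplifying assumptions (a `widthof(text)` callback, no proportion `p`, iteration by code point), not the implementation that was merged.

```javascript
// Clips text to at most `width`, appending `insert` only when clipping occurs.
function clipEnd(text, width, widthof, insert = "…") {
  // If the whole line fits, the width of the insertion text is irrelevant.
  if (widthof(text) <= width) return text;
  const budget = width - widthof(insert);
  let out = "";
  let w = 0;
  for (const c of text) {
    w += widthof(c); // add the next character first...
    if (w > budget) break; // ...then test, so a character that exactly fits is kept
    out += c;
  }
  return out.trimEnd() + insert;
}

// Monospace metric: every code point is 100 (% of 1em).
const mono = (s) => 100 * Array.from(s).length;
```

With `mono`, `clipEnd("abcde", 500, mono)` returns the text unchanged because it fits exactly, while `clipEnd("abcdefgh", 500, mono)` keeps four characters plus the ellipsis.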

mbostock · 2023-02-23T00:58:00Z

test/plots/text-overflow.js

+        fx: () => "monospace",
+        monospace: true,
+        textOverflow: "ellipsis-end",
+        lineWidth: 13,


Suggested change

-        lineWidth: 13,
+        lineWidth: 13.5,

Giving a fractional width here is useful for testing < vs. <= (per previous comment).

Fil · 2023-02-23T08:09:46Z

0f4342a reverses the meaning of end and start (and changes the default orientation) of the clipping. Was this intentional?

mbostock · 2023-02-23T14:47:07Z

No, it wasn’t intentional to reverse the meaning.

mbostock

👏👏👏

* textOverflow
* more textOverflow options; work with utf8 chars rather than string indices
* format
* the measure for … determines how some strings are clipped
* clip-end, ellipsis-end
* roll back the changes to lineWrap, treat unknown char width as 1em, fix clipping
* monospace emoji is 1 char
* tests
* monospace emoji
* undo changes to the metric linearize the overflow function
* tests
* restore comment, clean up
* clarify the role of this function: it returns the length of the current character; should allow to generalize to multi-code point chars.
* cleaner
* glyph length
* test readCharacter
* isPictographic
* tweak
* cut
* more rigorous clip tests
* add failing tests
* better middle clip; fix names
* center ellipsis
* comments
* splitText
* separate splitting from clipping
* splitLines, clipLine
* inferFontVariant
* maybeTextOverflow
* widthof(text) shorthand
* include ellipsis in default width map
* add a multiline film title to the test, and remove obsolete comments
* optimize and improve readability
* Update README

---------

Co-authored-by: Mike Bostock <[email protected]>

textOverflow

e644e47

Fil requested a review from mbostock February 17, 2023 14:01

mbostock reviewed Feb 17, 2023

more textOverflow options; work with utf8 chars rather than string indices

cd1e6b0

mbostock reviewed Feb 21, 2023

format

51927ff

mbostock reviewed Feb 21, 2023

Fil added 5 commits February 21, 2023 23:05

the measure for … determines how some strings are clipped

73bdb09

clip-end, ellipsis-end

ba53f59

roll back the changes to lineWrap, treat unknown char width as 1em, fix clipping

32f1716

monospace emoji is 1 char

1e074ac

tests

1921321

Fil requested a review from mbostock February 21, 2023 22:12

mbostock reviewed Feb 21, 2023

Fil added 6 commits February 21, 2023 23:43

monospace emoji

08bca9b

undo changes to the metric

59dc2c2

linearize the overflow function

tests

3e69fae

restore comment, clean up

b1ec861

clarify the role of this function: it returns the length of the current character; should allow to generalize to multi-code point chars.

008bb77

cleaner

8e35247

glyph length

f0d0355

mbostock reviewed Feb 22, 2023

mbostock reviewed Feb 22, 2023

mbostock added 3 commits February 22, 2023 16:22

test readCharacter

6fe7852

isPictographic

20f7049

tweak

56d8c01

mbostock reviewed Feb 23, 2023

cut

0f4342a

mbostock added 12 commits February 23, 2023 08:49

more rigorous clip tests

402c1c6

add failing tests

4640b3d

better middle clip; fix names

5b55f8e

center ellipsis

71ef663

comments

93a2ed8

splitText

6c97854

separate splitting from clipping

9742747

splitLines, clipLine

56898a7

inferFontVariant

e351c88

maybeTextOverflow

2491140

widthof(text) shorthand

cfc7e2e

include ellipsis in default width map

f2c4440

mbostock approved these changes Feb 24, 2023

add a multiline film title to the test, and remove obsolete comments

2b58dff

Fil commented Feb 24, 2023

mbostock added 2 commits February 27, 2023 14:58

optimize and improve readability

954ee22

Update README

64d117e

mbostock merged commit e118b95 into main Feb 28, 2023

mbostock deleted the fil/textoverflow branch February 28, 2023 00:06
