-
Notifications
You must be signed in to change notification settings - Fork 187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
textOverflow #1283
textOverflow #1283
Conversation
This is great! Very excited to see this at last. I wonder what you think about supporting an ellipsis in the middle, like macOS Finder does? Could that be the default? Or how would you specify whether you want the ellipsis at the start, middle, or end? I see some discussion around a CSS textOverflowMiddle property, but it seems like it never made it into browsers. |
Yes I think we could have a clipAnchor option that defaults to "end"? In terms of implementation, it should be as simple as (in theory):
|
I implemented the different overflow strategies and extended them to work with accents and emojis. Tricky! but it seems to work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To recap our in-person discussion this morning, we agreed you would:
-
Add clip-end and ellipsis-end aliases that are equivalent to clip and ellipsis, respectively. (The ellipsis-end alias already appears to be implemented, but it’d be nice if the Text mark constructor implemented the alias explicitly instead of it relying on the switch statement/default casing in
overflow
. I think this means that clip should be promoted to clip-end in the constructor, and ellipsis promoted to ellipsis-end?) -
Rewrite the proposed implementation to use the existing strategy of detecting surrogate pairs during iteration (
first >= 0xd800
) rather than converting the input lines into arrays of characters ([...input]
). The existing strategy should be more performant and there probably aren’t too many places where we have to deal with multi-code point characters explicitly (I hope).
Let me know when you’ve done that and I’ll take another look. Thanks! 🙏
I see two issues with the existing text metrics code, which are a bit difficult to disentangle from this PR:
My changes address these two issues (otherwise the clipping code doesn't work well with emojis) … but I'm not sure that's the right call. Random note: I lost quite a bit of time trying to understand why a certain emoji is not clipped correctly. Here's the answer (see the added invisible char after the cross). This seems like an indication that I'm missing something? |
If you think the em dash is common, then we should add it to Line 286 in 228f62e
We should fix the |
src/marks/text.js
Outdated
@@ -406,7 +406,9 @@ function defaultWidth(text, start, end) { | |||
} | |||
|
|||
function monospaceWidth(text, start, end) { | |||
return 100 * (end - start); | |||
let sum = end - start; | |||
for (let i = start; i < end; ++i) sum -= isSurrogatePair(text[i], text[i + 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a surrogate pair starts at i (i.e., [i, i + 1] is a surrogate pair) then we shouldn’t test whether another surrogate pair starts at i + 1 (i.e., [i + 1, i + 2]); we should skip an index. (Hence the i += …
in defaultWidth
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh but monospace emoji are very much larger than the other chars! I'm sending what I have, and will review tomorrow with fresh eyes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn’t assume that all surrogate pairs are emojis. If you want to add testing specifically for emojis, I suppose we could do that (testing for the Unicode emoticons block), but my thinking is that we shouldn’t assume anything about the characters, at least for the monospace case—we should interpret monospace as “all characters have equal width” even if they don’t in practice.
linearize the overflow function
…nt character; should allow to generalize to multi-code point chars.
I've reverted all changes to the metrics. Many character ranges are poorly measured (emoji, and non-US), but that's OK. The way to frame this for now is that the user has to adapt the lineWidth to their particular case (for example, if you're working with emojis, you need to give a smaller lineWidth; with arabic script, a larger one). Fixing this fully will be for another PR when time comes, but it's not simple: first, it's hard to cover a wide range of characters without measuring (we don't want to ship a huge database for that). Second, some emojis are composed of many code points (👩❤️💋👩, 👁️🗨️, and there are many modifiers for 🧑🏾👨🏻👧🏼👦🏽🧒🏿🙆🏻♂️, sometimes several fitz modifiers in a single emoji like 🫱🏻🫲🏼). This doesn't matter much (or at all) for the line wrapping algorithm (which is not meant to wrap lines made of many emojis, and only cuts on spaces). but for the clipping algorithm it's |
src/marks/text.js
Outdated
let a; | ||
let j = 0; | ||
do { | ||
a = text.charCodeAt(i + j++); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mixed use of j++
and ++j
is likely wrong here; doesn’t this mean we’re reading the same character we previously read on line 458 in the case of a surrogate pair?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's deliberate and I think, correct—this only happens when the codePoint is the zero width joiner. But I should probably set up unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I’ll investigate further as I’m not convinced. 🤔 It means that we’re reading the same character twice.
Also, smaller nit, but I think this function should return the index of the next character/glyph so that the calling code can say i = readGlyph(text, i)
(slightly simpler) instead of i += glyphLength(text, i)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes; I also wanted to explore making a generator that yields each glyph in turn, but it started to impact the line wrapping code.
src/marks/text.js
Outdated
? textOverflow | ||
? (t) => [overflow(t, lineWidth * 100, measure, textOverflow)] | ||
: (t) => lineWrap(t, lineWidth * 100, measure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d like to investigate promoting this into separate functions (similar to what we do for facetAnchor) so there is less case branching during evaluation.
src/marks/text.js
Outdated
j = readCharacter(text, i); | ||
const char = text.slice(i, j); | ||
const l = widthof(text, i, j); | ||
if (w < width * p) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the point that we test w
here, I think we want to already have added l
to it. I.e., we want to test whether adding the next character will exceed the allowed maximum line width. I believe that currently the code allows one too many characters before clipping. And we want w <= width * p
instead of w < width * p
. And we don’t want to pre-emptively subtract widthof(insert, 0, insert.length)
from width
(because it we can fit the line in the given length, it doesn’t matter how long the insertion text is).
fx: () => "monospace", | ||
monospace: true, | ||
textOverflow: "ellipsis-end", | ||
lineWidth: 13, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lineWidth: 13, | |
lineWidth: 13.5, |
Giving a fractional width here is useful for testing <
vs. <=
(per previous comment).
0f4342a reverses the meaning of end and start (and changes the default orientation) of the clipping. Was this intentional? |
No, it wasn’t intentional to reverse the meaning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👏👏👏
* textOverflow * more textOverflow options; work with utf8 chars rather than string indices * format * the measure for … determines how some strings are clipped * clip-end, ellipsis-end * roll back the changes to lineWrap, treat unknown char width as 1em, fix clipping * monospace emoji is 1 char * tests * monospace emoji * undo changes to the metric linearize the overflow function * tests * restore comment, clean up * clarify the role of this function: it returns the length of the current character; should allow to generalize to multi-code point chars. * cleaner * glyph length * test readCharacter * isPictographic * tweak * cut * more rigorous clip tests * add failing tests * better middle clip; fix names * center ellipsis * comments * splitText * separate splitting from clipping * splitLines, clipLine * inferFontVariant * maybeTextOverflow * widthof(text) shorthand * include ellipsis in default width map * add a multiline film title to the test, and remove obsolete comments * optimize and improve readability * Update README --------- Co-authored-by: Mike Bostock <[email protected]>
Adds a textOverflow option to the text mark (and by way of consequence to the axes), which respects the lineWidth option (with the same text metrics as the multiline text, including monospace). But, instead of wrapping the text over several lines, clips it to the given length.
If the option is specified as "clip", it's a "white clip", if it's specified as "ellipsis", an ellipsis is added at the end of the clipped string.
The option tries to be a little smart in that it does not replace the last character by an ellipsis, and it trims whitespace.
Unless there is an explicit title channel, any text that is clipped that way receives a title with the full text. Texts that are not clipped do not receive a title (it's a bit opinionated, but easy to override by specifying the title channel).
Regarding axes, only the tick labels are clipped: the option is not passed to the axis label.
closes #394
Build available at https://observablehq.com/@fil/plot-text-overflow-1283 if you want to play with it.