Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

textOverflow #1283

Merged
merged 34 commits into from
Feb 28, 2023
Merged

textOverflow #1283

merged 34 commits into from
Feb 28, 2023

Conversation

Fil
Copy link
Contributor

@Fil Fil commented Feb 17, 2023

Adds a textOverflow option to the text mark (and by way of consequence to the axes), which respects the lineWidth option (with the same text metrics as the multiline text, including monospace). But, instead of wrapping the text over several lines, clips it to the given length.

If the option is specified as "clip", it's a "white clip", if it's specified as "ellipsis", an ellipsis is added at the end of the clipped string.

The option tries to be a little smart in that it does not replace the last character by an ellipsis, and it trims whitespace.

Unless there is an explicit title channel, any text that is clipped that way receives a title with the full text. Texts that are not clipped do not receive a title (it's a bit opinionated, but easy to override by specifying the title channel).

Regarding axes, only the tick labels are clipped: the option is not passed to the axis label.

closes #394

Capture d’écran 2023-02-17 à 15 09 10

Build available at https://observablehq.com/@fil/plot-text-overflow-1283 if you want to play with it.

@Fil Fil requested a review from mbostock February 17, 2023 14:01
@mbostock
Copy link
Member

This is great! Very excited to see this at last.

I wonder what you think about supporting an ellipsis in the middle, like macOS Finder does? Could that be the default? Or how would you specify whether you want the ellipsis at the start, middle, or end? I see some discussion around a CSS textOverflowMiddle property, but it seems like it never made it into browsers.

Screenshot 2023-02-17 at 10 57 11 AM

src/marks/text.js Outdated Show resolved Hide resolved
@Fil
Copy link
Contributor Author

Fil commented Feb 17, 2023

Yes I think we could have a clipAnchor option that defaults to "end"? In terms of implementation, it should be as simple as (in theory):

  • "start": reverse the string, clip, reverse
  • "middle": interlace the first part of the string with the second part flipped, clip, deinterlace

@Fil
Copy link
Contributor Author

Fil commented Feb 21, 2023

I implemented the different overflow strategies and extended them to work with accents and emojis. Tricky! but it seems to work.

README.md Outdated Show resolved Hide resolved
Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To recap our in-person discussion this morning, we agreed you would:

  1. Add clip-end and ellipsis-end aliases that are equivalent to clip and ellipsis, respectively. (The ellipsis-end alias already appears to be implemented, but it’d be nice if the Text mark constructor implemented the alias explicitly instead of it relying on the switch statement/default casing in overflow. I think this means that clip should be promoted to clip-end in the constructor, and ellipsis promoted to ellipsis-end?)

  2. Rewrite the proposed implementation to use the existing strategy of detecting surrogate pairs during iteration (first >= 0xd800) rather than converting the input lines into arrays of characters ([...input]). The existing strategy should be more performant and there probably aren’t too many places where we have to deal with multi-code point characters explicitly (I hope).

Let me know when you’ve done that and I’ll take another look. Thanks! 🙏

@Fil
Copy link
Contributor Author

Fil commented Feb 21, 2023

I see two issues with the existing text metrics code, which are a bit difficult to disentangle from this PR:

  • Emojis are counted as double chars in monospace, which doesn't seem right. We're going to clip them much too soon.
  • The default to "e" doesn't really work in two common cases: the long dash (em-dash "—"), and emojis. I'm not sure how to address this. Defaulting unknown to 100 (% of 1em) solves it for these two cases but there are probably other characters for which it would not be a good idea?

My changes address these two issues (otherwise the clipping code doesn't work well with emojis) … but I'm not sure that's the right call.

Random note: I lost quite a bit of time trying to understand why a certain emoji is not clipped correctly. Here's the answer (see the added invisible char after the cross). This seems like an indication that I'm missing something?
[..."🐤🐥🍎✖️🦜🎃🦜"].join("-") // "🐤-🐥-🍎-✖-️-🦜-🎃-🦜"

@Fil Fil requested a review from mbostock February 21, 2023 22:12
@mbostock
Copy link
Member

If you think the em dash is common, then we should add it to defaultWidthMap; that’s what it’s for. The fallback to the default “e” is for anything uncommon (or “less” common) because it would be prohibitive to include an entry in the map for every possible Unicode character.

const defaultWidthMap = {

We should fix the monospaceWidth implementation so that it counts surrogate pairs as a single character, too. It should be the same as defaultWidth assuming that defaultWidthMap contained 1 (or 100, or whatever) for every character.

@@ -406,7 +406,9 @@ function defaultWidth(text, start, end) {
}

function monospaceWidth(text, start, end) {
return 100 * (end - start);
let sum = end - start;
for (let i = start; i < end; ++i) sum -= isSurrogatePair(text[i], text[i + 1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a surrogate pair starts at i (i.e., [i, i + 1] is a surrogate pair) then we shouldn’t test whether another surrogate pair starts at i + 1 (i.e., [i + 1, i + 2]); we should skip an index. (Hence the i += … in defaultWidth.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh but monospace emoji are very much larger than the other chars! I'm sending what I have, and will review tomorrow with fresh eyes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn’t assume that all surrogate pairs are emojis. If you want to add testing specifically for emojis, I suppose we could do that (testing for the Unicode emoticons block), but my thinking is that we shouldn’t assume anything about the characters, at least for the monospace case—we should interpret monospace as “all characters have equal width” even if they don’t in practice.

@Fil
Copy link
Contributor Author

Fil commented Feb 22, 2023

I've reverted all changes to the metrics. Many character ranges are poorly measured (emoji, and non-US), but that's OK. The way to frame this for now is that the user has to adapt the lineWidth to their particular case (for example, if you're working with emojis, you need to give a smaller lineWidth; with arabic script, a larger one).

Fixing this fully will be for another PR when time comes, but it's not simple: first, it's hard to cover a wide range of characters without measuring (we don't want to ship a huge database for that).

Second, some emojis are composed of many code points (👩‍❤️‍💋‍👩, 👁️‍🗨️, and there are many modifiers for 🧑🏾👨🏻👧🏼👦🏽🧒🏿🙆🏻‍♂️, sometimes several fitz modifiers in a single emoji like 🫱🏻‍🫲🏼). This doesn't matter much (or at all) for the line wrapping algorithm (which is not meant to wrap lines made of many emojis, and only cuts on spaces). but for the clipping algorithm it's important nice if we can avoid cutting a complex emoji in two parts. But I think we'll have to solve this independently of the clipping algorithm. I've reached a version where at least it's not resulting in broken characters �, but 👩‍❤️‍💋‍👩 might be clipped as 👩‍❤ (which is just a partial rendering of the composed glyph, and is not great, but acceptable). I've left these examples in the unit test.

let a;
let j = 0;
do {
a = text.charCodeAt(i + j++);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mixed use of j++ and ++j is likely wrong here; doesn’t this mean we’re reading the same character we previously read on line 458 in the case of a surrogate pair?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's deliberate and I think, correct—this only happens when the codePoint is the zero width joiner. But I should probably set up unit tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I’ll investigate further as I’m not convinced. 🤔 It means that we’re reading the same character twice.

Also, smaller nit, but I think this function should return the index of the next character/glyph so that the calling code can say i = readGlyph(text, i) (slightly simpler) instead of i += glyphLength(text, i).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes; I also wanted to explore making a generator that yields each glyph in turn, but it started to impact the line wrapping code.

README.md Outdated Show resolved Hide resolved
Comment on lines 135 to 137
? textOverflow
? (t) => [overflow(t, lineWidth * 100, measure, textOverflow)]
: (t) => lineWrap(t, lineWidth * 100, measure)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like to investigate promoting this into separate functions (similar to what we do for facetAnchor) so there is less case branching during evaluation.

j = readCharacter(text, i);
const char = text.slice(i, j);
const l = widthof(text, i, j);
if (w < width * p) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the point that we test w here, I think we want to already have added l to it. I.e., we want to test whether adding the next character will exceed the allowed maximum line width. I believe that currently the code allows one too many characters before clipping. And we want w <= width * p instead of w < width * p. And we don’t want to pre-emptively subtract widthof(insert, 0, insert.length) from width (because it we can fit the line in the given length, it doesn’t matter how long the insertion text is).

fx: () => "monospace",
monospace: true,
textOverflow: "ellipsis-end",
lineWidth: 13,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lineWidth: 13,
lineWidth: 13.5,

Giving a fractional width here is useful for testing < vs. <= (per previous comment).

@Fil
Copy link
Contributor Author

Fil commented Feb 23, 2023

0f4342a reverses the meaning of end and start (and changes the default orientation) of the clipping. Was this intentional?

@mbostock
Copy link
Member

No, it wasn’t intentional to reverse the meaning.

Copy link
Member

@mbostock mbostock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏👏👏

src/marks/text.js Outdated Show resolved Hide resolved
@mbostock mbostock merged commit e118b95 into main Feb 28, 2023
@mbostock mbostock deleted the fil/textoverflow branch February 28, 2023 00:06
chaichontat pushed a commit to chaichontat/plot that referenced this pull request Jan 14, 2024
* textOverflow

* more textOverflow options; work with utf8 chars rather than string indices

* format

* the measure for … determines how some strings are clipped

* clip-end, ellipsis-end

* roll back the changes to lineWrap, treat unknown char width as 1em, fix clipping

* monospace emoji is 1 char

* tests

* monospace emoji

* undo changes to the metric
linearize the overflow function

* tests

* restore comment, clean up

* clarify the role of this function: it returns the length of the current character; should allow to generalize to multi-code point chars.

* cleaner

* glyph length

* test readCharacter

* isPictographic

* tweak

* cut

* more rigorous clip tests

* add failing tests

* better middle clip; fix names

* center ellipsis

* comments

* splitText

* separate splitting from clipping

* splitLines, clipLine

* inferFontVariant

* maybeTextOverflow

* widthof(text) shorthand

* include ellipsis in default width map

* add a multiline film title to the test, and remove obsolete comments

* optimize and improve readability

* Update README

---------

Co-authored-by: Mike Bostock <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Truncate long tick labels with ellipses so they don’t get cropped.
2 participants