fix(biome_console): fix printing emojis consisting of multiple codepoints #693

Merged
merged 10 commits into from
Nov 10, 2023

simonxabris
Contributor

Summary

Fixes #455

The bug came from the code using the .chars() iterator, which yields individual Unicode code points. Some characters, such as many emojis, consist of multiple code points, which the code did not handle correctly. As suggested in the PR (and the conclusion I came to as well), I used the unicode_segmentation library to iterate over graphemes instead (a grapheme is basically a character as perceived by humans). This fixes the error by treating multiple code points that represent a single character as one unit.
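To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper name `code_point_count` is made up for illustration and is not part of Biome. It shows that a single visible emoji can yield several items from `.chars()`; an iterator over graphemes (for example `graphemes(true)` from the `unicode_segmentation` crate, which is presumably what the fix relies on) would be expected to yield one item for each of these strings instead.

```rust
/// Hypothetical helper: counts the items that `str::chars()` yields,
/// i.e. Unicode scalar values (code points), not user-perceived characters.
fn code_point_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // Family emoji: man + ZWJ + woman + ZWJ + girl, rendered as one glyph.
    let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
    assert_eq!(code_point_count(family), 5);
    // Each emoji is 4 bytes and each ZWJ is 3 bytes in UTF-8.
    assert_eq!(family.len(), 18);

    // Flag emoji: two regional indicator symbols, rendered as one glyph.
    let flag = "\u{1F1ED}\u{1F1FA}";
    assert_eq!(code_point_count(flag), 2);

    // Plain ASCII text has one code point per perceived character.
    assert_eq!(code_point_count("abc"), 3);

    println!("family: {} code points", code_point_count(family));
    println!("flag: {} code points", code_point_count(flag));
}
```

Any logic that processes such a string one `char` at a time sees the joiner and the component emojis as separate units, which is the situation the summary describes.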

Test Plan

I've added a new test where the input string contains emojis that the CLI couldn't print before, and I'm basically asserting that the output is the exact same as the input.

@github-actions github-actions bot added the A-CLI Area: CLI label Nov 9, 2023
@simonxabris simonxabris changed the title fix: fix printing emojis consisting of multiple codepoints fix(biome_console): fix printing emojis consisting of multiple codepoints Nov 9, 2023
@github-actions github-actions bot added A-Website Area: website A-Changelog Area: changelog labels Nov 9, 2023
@simonxabris
Contributor Author

I see from the test run that the Windows character replacement is not fully working. I'll try to figure something out there!

Member

@ematipico ematipico left a comment

And here I thought that the issue was the formatter :) Great work here and thank you for fixing it!

/// Determines if a unicode grapheme consists only of code points
/// which are considered whitespace characters in ASCII
fn grapheme_is_whitespace(grapheme: &str) -> bool {
    grapheme.chars().all(|c| c.is_whitespace())
}
Member

TIL "grapheme" 😄
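For readers following along, here is a small sketch of how the `grapheme_is_whitespace` predicate from the snippet above behaves on a few inputs. The function body is copied from the reviewed code; the sample inputs are chosen for illustration and are not taken from the PR's tests. One detail worth knowing: `char::is_whitespace` checks the Unicode `White_Space` property, so it is not limited to ASCII whitespace.

```rust
/// Returns true if every code point in the grapheme is whitespace.
/// Body copied from the reviewed snippet.
fn grapheme_is_whitespace(grapheme: &str) -> bool {
    grapheme.chars().all(|c| c.is_whitespace())
}

fn main() {
    // A single space is whitespace.
    assert!(grapheme_is_whitespace(" "));

    // "\r\n" is one grapheme cluster made of two code points, both whitespace.
    assert!(grapheme_is_whitespace("\r\n"));

    // U+00A0 (no-break space) is non-ASCII but has the White_Space property.
    assert!(grapheme_is_whitespace("\u{00A0}"));

    // Thumbs-up plus a skin tone modifier: two code points, neither is whitespace.
    assert!(!grapheme_is_whitespace("\u{1F44D}\u{1F3FD}"));

    // `all` on an empty iterator is true, so an empty string counts as whitespace.
    assert!(grapheme_is_whitespace(""));

    println!("all checks passed");
}
```

Taking a `&str` rather than a `char` is what lets the predicate work on graphemes, since a grapheme may span several code points.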

CHANGELOG.md (outdated review thread, resolved)
website/src/content/docs/internals/changelog.mdx (outdated review thread, resolved)
@ematipico ematipico merged commit 657f83d into biomejs:main Nov 10, 2023
@simonxabris simonxabris deleted the fix/emoji-printing-to-console branch November 10, 2023 16:47
@melMass melMass mentioned this pull request Apr 25, 2024
1 task
Labels
A-Changelog Area: changelog A-CLI Area: CLI A-Website Area: website
Development

Successfully merging this pull request may close these issues.

🐛 biome format breaks emojis when used via stdin
2 participants