🐛 biome format breaks emojis when used via stdin #455

Closed
1 task done
chrisgrieser opened this issue Sep 30, 2023 · 2 comments · Fixed by #693
Labels
A-CLI (Area: CLI), S-Bug-confirmed (Status: report has been confirmed as a valid bug)

Comments

@chrisgrieser
Contributor

chrisgrieser commented Sep 30, 2023

Environment information

CLI:
  Version:                      1.2.2
  Color support:                true

Platform:
  CPU Architecture:             aarch64
  OS:                           macos

Environment:
  BIOME_LOG_DIR:                unset
  NO_COLOR:                     unset
  TERM:                         "xterm-256color"
  JS_RUNTIME_VERSION:           "v20.7.0"
  JS_RUNTIME_NAME:              "node"
  NODE_PACKAGE_MANAGER:         unset

Biome Configuration:
  Status:                       unset

Workspace:
  Open Documents:               0

Discovering running Biome servers...

Server:
  Status:                       stopped

What happened?

When using biome format via stdin, some emojis seem to break. This does not affect all emojis, and it does not affect biome format --write.

(The different emoji sizes are a WezTerm font quirk; I checked that the issue persists when opening the file in TextEdit.)

[Screenshot: Pasted image 2023-09-30 at 13 14 41@2x]

Expected result

Emojis not breaking

Code of Conduct

  • I agree to follow Biome's Code of Conduct
@ematipico
Member

This was also flagged in the Rome repository, and it seems it wasn't fixed.

Here's a possible cause of the bug: rome/tools#3915 (comment)

@ideologism

ideologism commented Oct 30, 2023

I think the problem is in the logic that converts strings to buffers.

Sometimes a Unicode "character" is made up of multiple Unicode scalar values, like the emoji1 in the above example or the "é" character, but a Rust char can only represent one Unicode scalar value.

The conversion logic here turns the second scalar value into char::REPLACEMENT_CHARACTER, which causes the problem. I'm not sure why this conversion is needed, but I think the unicode-segmentation crate might be helpful in solving this problem.

Relevant code:

let is_whitespace = item.is_whitespace();
let is_zero_width = UnicodeWidthChar::width(item).map_or(true, |width| width == 0);
let item = if !is_whitespace && is_zero_width {
    char::REPLACEMENT_CHARACTER
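
To make the effect of the snippet above concrete, here is a minimal std-only sketch of what it appears to do. It is not Biome's actual code. The helper `is_zero_width` is a hypothetical stand-in for `UnicodeWidthChar::width(c)` returning `Some(0)`, hard-coded to three scalar values, and `sanitize` is an illustrative name. The sketch also assumes the emoji and the "é" reach the formatter as a base character followed by a zero-width scalar value (a variation selector or a combining accent).

```rust
// Hypothetical stand-in for "UnicodeWidthChar::width(c) is zero":
// a tiny hard-coded set of zero-width scalar values, for illustration only.
fn is_zero_width(c: char) -> bool {
    matches!(
        c,
        '\u{FE0F}'   // VARIATION SELECTOR-16 (requests emoji presentation)
        | '\u{200D}' // ZERO WIDTH JOINER (glues emoji sequences together)
        | '\u{0301}' // COMBINING ACUTE ACCENT
    )
}

// Mirrors the shape of the quoted logic: every scalar value that is
// zero-width and not whitespace is swapped for U+FFFD.
fn sanitize(input: &str) -> String {
    input
        .chars()
        .map(|c| {
            if !c.is_whitespace() && is_zero_width(c) {
                char::REPLACEMENT_CHARACTER
            } else {
                c
            }
        })
        .collect()
}

fn main() {
    let heart = "\u{2764}\u{FE0F}"; // heart emoji: base + variation selector
    let e_acute = "e\u{0301}"; // "é" in decomposed form: base + combining accent
    for s in [heart, e_acute] {
        println!(
            "{:?}: {} scalar values, sanitized to {:?}",
            s,
            s.chars().count(),
            sanitize(s)
        );
    }
}
```

In both inputs the base character survives and the trailing zero-width scalar value becomes U+FFFD, which matches the "broken emoji" symptom in the report. Iterating over grapheme clusters instead of individual `char`s, as the unicode-segmentation suggestion implies, would keep such sequences intact.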
