-
-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed StringWidth() implementation by using proper Unicode grapheme cluster segmentation. Fixes #28 #29
Conversation
…luster segmentation. Fixes mattn#28
Thanks your contribution. I'm thinking there are terminals which does not support ZWJ yet. So I prefer to disable ZWJ. |
How to add enable/disable ZWJ? |
Using If you want to support terminals that don't render these characters, the question is how do they render them? If the difference is that some terminals render characters rune-by-rune, then a flag like But singling out the zero-width joiner rule (which is only one of 13 rules) is probably not very useful. Let me know what you think about this. |
Most of terminals which does not support ZWJ displays Unicode characters as what we can see. For example, |
So you're saying that these terminals support all kinds of Unicode characters (e.g. flags, Hangul, spacing/prepend marks) but ZWJ is the one they don't support? Do you have an example for such a terminal? |
For example, gnome-terminal displays |
Also ZeroWidthJoiner is already used in some projects. https://github.com/4avka/tview/blob/d7a9d6c70ab205d578fc64f02f4e681ccb5b5180/util.go#L56 |
What about ( |
I'll check it in later. But probably 뢴 is 2 cells |
Interesting. Well, let me know what the result of these other characters is. If we need to be able to turn off individual Unicode breaking rules, then that functionality needs to be implemented in I'll wait for the widths of the characters you wanted to look up and we'll see then how to proceed. |
@rivo Can you revert removing ZeroWidthJoiner? |
Any thought? |
Apologies for the late reply. And thanks to @alecrabbit for providing the screenshot. Well, reintroducing the Also, the If you insist on keeping that flag in there, I think you'll need to ignore this pull request. I think it's generally difficult or even impossible to make this package work on all platforms. One would have to keep a database of how the different platforms render different characters. Since new emojis are introduced all the time, that's whole bunch of work. And it seems that not even popular packages get this right. Here's iTerm on macOS 10.14: Same with Apple's Terminal: In short, I have no solution. There will always be someone who will complain. Unless someone is willing to put the effort into building a huge database of terminals+OSs and how they render characters, there will be mistakes. But personally, I think this PR gets us closer to the correct way to handle this than the current version. |
Could you please fix conflict? |
Conflicts are resolved. However, some tests still fail. Specifically "TestStringWidth" (see my last bullet point in my opening comment above) and "TestZeroWidthJoiner". Regarding "TestZeroWidthJoiner", the problem appears not to be the zero-width joiner but the fact that the white flag 🏳️ (U+1F3F3) is assigned a width of 1 although its actual width is 2. My terminal (iTerm2 on macOS) seems to make the same mistake: And so is Apple's native Terminal application: So this seems to be a problem across applications. In go-runewidth/runewidth_table.go Line 50 in 5885066
I believe it should be in there. If you change the end of this line to {0x1F3F3, 0x1F3F4} the test will pass. (I can do it in this PR if you want.) |
Yes, please |
I made that change and also fixed the test case failures resulting from it. I also fixed the Please note that as mentioned way above, the
I'm not familiar with the |
Thanks. Could you please fix conflicts? |
Could you please tell me what changes were made in the main branch? I cannot fix the conflicts if I don't understand what changes were made in parallel. |
Thanks. This still leaves the failing test regarding the |
Your merge conflict resolutions removed the necessary changes that I made, which is why your test failed. I made them again. Tests pass now. |
Test is still failing. https://travis-ci.org/github/mattn/go-runewidth/jobs/687468815 |
|
With this change this package isn't about |
…day, the rainbow flag only has a width of 1.
Codecov Report
@@ Coverage Diff @@
## master #29 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 2 2
Lines 160 151 -9
=========================================
- Hits 160 151 -9
Continue to review full report at Codecov.
|
Thanks @nojus297 for this hint. It took me quite a long time to figure it out. It's a surprisingly difficult issue and it looks like there is not satisfying solution at the moment. So here we go:
The white flag 🏳️ (which is the basis for the rainbow flag 🏳️🌈) is not contained in that table. It's probably new. The way I understand it, the East Asian Width table only includes emojis for historical reasons (East Asian code points were used when emojis were introduced). Unicode Annex #11 "East Asian Width" says (Section 5, "Recommendations"):
I believe
In any case, for now, I have changed the test such that for the rainbow flag, a width of 1 is expected. This appears to be in line with the official Unicode information I could find. And as mentioned further above, that's how the terminals also treat it. It's not ideal because obviously, the actual flag is wider than one character. But a solution would probably require the Unicode consortium to make changes (or at least clarify). All checks are passing now. |
Any thoughts? |
As I mentioned above, I don't want to remove ZeroWidthJoiner. So any part using uniseg should be separeted with flag ZeroWidthJoiner. |
As I mentioned, my project As soon as this PR is merged, I will remove |
Ah, I see. Thanks your explaining. Looks good to me. I'll check this soon. |
Thank you! |
1 similar comment
Thank you! |
This introduces an implementation of StringWidth() using Unicode grapheme clusters which should be the correct way to split a string into its individual characters. The built-in assumption is that if we have combined runes (emojis, flags etc.), their width is the width of the first non-zero-width rune. Many of these combined runes were previously not handled correctly by this package.
Please note:
rivo/uniseg
over.)TestStringWidth
test case but only the part whereEastAsianWidth = true
. I'm not very familiar with this flag so I don't know how to fix that. You may want to review this.