Support "true" extended grapheme cluster #46
k-takata added a commit that referenced this issue on Dec 1, 2016
Regexp supports Unicode 9.0.0's \X

* meta character \X matches Unicode 9.0.0 characters with some workarounds for UTR #51 Unicode Emoji, Version 4.0 emoji zwj sequences. [Feature #12831] [ruby-core:77586]

The term "character" can have many meanings: bytes, codepoints, combined characters, and so on. "Grapheme cluster" is the highest-level of these terms; it means a user-perceived character.

Unicode Standard Annex #29, UNICODE TEXT SEGMENTATION, specifies how to handle grapheme clusters (extended grapheme clusters). But some specs have not been updated to the current situation, because Unicode Emoji is being extended rapidly without a thorough definition. This breaks the precondition of UAX #29 that "Grapheme cluster boundaries can be easily tested by looking at immediately adjacent characters". (The sentence will be removed in the next version.) Some of the details are described in Unicode Technical Report #51, UNICODE EMOJI, but they are not merged into UAX #29 yet.

http://unicode.org/reports/tr29/
http://unicode.org/reports/tr51/
http://unicode.org/Public/emoji/4.0/
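The distinction the commit message draws between bytes, codepoints, and user-perceived characters can be illustrated with a small Python sketch. This is only an illustration using Python's standard library, not code from Onigmo or Ruby; the sample string is an assumption chosen for the example.

```python
import unicodedata

# Illustrative sample: "e" followed by U+0301 COMBINING ACUTE ACCENT.
# A reader perceives this as a single character.
s = "e\u0301"

# Number of bytes in the UTF-8 encoding.
byte_count = len(s.encode("utf-8"))

# Number of codepoints (Python's len() on str counts codepoints).
code_point_count = len(s)

# NFC normalization composes the pair into the single codepoint U+00E9,
# which shows that both spellings denote the same user-perceived character.
composed = unicodedata.normalize("NFC", s)

print(byte_count, code_point_count, len(composed))
```

The three numbers differ for the same visible text, which is why a regex engine needs a grapheme-cluster-level construct such as \X rather than a per-codepoint match.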
k-takata added four more commits that referenced this issue on Dec 1, 2016
Fixed.
Currently \X is defined as (?>\P{M}\p{M}*). This definition is the same as in Perl 5.10. However, the definition was changed as of Perl 5.12. See pp. 15-18 of this slide. The slide says Onigmo supports legacy grapheme clusters.
See also: UAX #29: Unicode Text Segmentation
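To make the limitation of the (?>\P{M}\p{M}*) definition concrete, here is a minimal Python sketch that emulates it with the standard unicodedata module. The function name legacy_clusters is hypothetical, and the emulation is an approximation: it treats a codepoint as \p{M} when its general category starts with "M", and unlike the regex it starts a new cluster for a leading mark that has no base character.

```python
import unicodedata


def legacy_clusters(text):
    # Emulates the legacy rule: one non-mark codepoint followed by
    # any number of mark codepoints (general category M*).
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and clusters:
            # Attach the mark to the preceding cluster.
            clusters[-1] += ch
        else:
            # A non-mark (or a leading mark) starts a new cluster.
            clusters.append(ch)
    return clusters


# Base letter plus combining accent: kept together, as expected.
accented = legacy_clusters("e\u0301")

# CR LF: split in two, although UAX #29 keeps CR LF as one cluster.
crlf = legacy_clusters("\r\n")

# Two regional indicator symbols (a flag): split in two, because
# regional indicators are not marks.
flag = legacy_clusters("\U0001F1EF\U0001F1F5")

# Emoji joined by ZERO WIDTH JOINER: split in three, because U+200D
# has general category Cf, not M.
zwj = legacy_clusters("\U0001F468\u200D\U0001F469")

print(len(accented), len(crlf), len(flag), len(zwj))
```

Only the first case yields a single cluster; the other three are the kinds of sequences that the extended grapheme cluster rules in UAX #29 are meant to keep together.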