replace graphemer by unicode-segmenter #12617
Signed-off-by: Hyeseong Kim <[email protected]>
Based on the comparison in the link, why wouldn't we just switch to Intl.Segmenter altogether? We only support 2 major versions of Chrome and Firefox plus 2 minor versions of Safari, so they all support it. Are there any wins over using Intl.Segmenter directly?
Yes, for runtime performance and compatibility (if that matters). Those are the same reasons graphemer was originally used.
I'd recommend using |
As compound-web now uses @t3chguy If you’d like to use |
@cometkim the main reason for moving to Segmenter in Compound was bundle size, not for Element but for projects like https://github.com/matrix-org/matrix-authentication-service where it was ~1/3rd of the bundle.
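For readers comparing the options discussed above, here is a minimal sketch of grapheme splitting with the built-in Intl.Segmenter, which needs no extra bundle weight. The helper names `splitGraphemes` and `countGraphemes` are hypothetical and chosen only for illustration; they are not taken from Element, Compound, or either library.

```javascript
// Sketch only: helper names are hypothetical, not part of any project in this thread.
// A single segmenter instance is reused, since constructing one is relatively costly.
const graphemeSegmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

// Split text into user-perceived characters (extended grapheme clusters).
function splitGraphemes(text) {
  return Array.from(graphemeSegmenter.segment(text), ({ segment }) => segment);
}

// Count grapheme clusters without building an intermediate array.
function countGraphemes(text) {
  let count = 0;
  for (const _ of graphemeSegmenter.segment(text)) count++;
  return count;
}
```

With this approach a ZWJ family emoji or a regional-indicator flag counts as one cluster, even though it spans several code points.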
This seems quite interesting in the context of #12582, if it has a way to detect strings which are entirely emoji, excluding textual emoji. cc @robintown
You can use

```js
// Assumption: `segmenter` is an Intl.Segmenter with grapheme granularity.
const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });

for (const { segment } of segmenter.segment(text)) {
  if (/\p{Emoji_Presentation}/u.test(segment)) {
    const emoji = segment;
  }
}
```

However, using unicode-segmenter here has a small performance gain, as the

```js
// This adds 1KB to the gzipped size; alternatively, you can use a Unicode RegExp.
import { isEmojiPresentation } from 'unicode-segmenter/emoji';
import { GraphemeCategory, graphemeSegments } from 'unicode-segmenter/grapheme';

for (const { segment, _cat } of graphemeSegments(text)) {
  if (
    // Check the category first to avoid unnecessary lookups on non-emoji characters
    _cat === GraphemeCategory.Extended_Pictographic &&
    isEmojiPresentation(segment.codePointAt(0))
  ) {
    const emoji = segment;
  }
}
```
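Building on the loop above, the "entirely emoji" question from #12582 could be sketched roughly as follows. This is an illustration under stated assumptions, not the approach adopted in either PR: the function name `isEmojiOnly` is hypothetical, it uses the built-in Intl.Segmenter rather than unicode-segmenter, and it treats a segment as emoji only if it contains a code point with the `Emoji_Presentation` property. That means text-default symbols followed by a variation selector (for example U+2764 U+FE0F) are rejected, which may or may not be the desired behavior.

```javascript
// Sketch only: `isEmojiOnly` is a hypothetical helper, not an API of any library here.
const emojiSegmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const emojiPresentation = /\p{Emoji_Presentation}/u;

// Returns true when the string is non-empty and every grapheme cluster
// contains at least one code point with default emoji presentation.
function isEmojiOnly(text) {
  let sawSegment = false;
  for (const { segment } of emojiSegmenter.segment(text)) {
    sawSegment = true;
    // Bail out on the first cluster that is not an emoji (letters, digits, spaces, ...).
    if (!emojiPresentation.test(segment)) return false;
  }
  return sawSegment;
}
```

Note that whitespace between emoji also makes the check fail; a caller that wants to allow it would need to strip whitespace first.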
A note: |
I see, I think adding support for Emoji Sets to |
I made a Unicode library that is much smaller and faster than graphemer. Check it out: https://github.com/cometkim/unicode-segmenter?tab=readme-ov-file#unicode-segmentergrapheme-vs

It ensures compliance with the latest Unicode data by performing tests and fuzzing against the Intl.Segmenter API.

graphemer is still in the bundle as a transitive dependency of the @vector-im/compound-web package, so I made a PR to it: element-hq/compound-web#181

Note: The library may possibly replace emojibase-regex too.