CommonMark issue: commonmark/commonmark-spec#650
The following chapters are written as an amendment to the original CommonMark specification. Missing chapters, sections, and definitions are the same as in the original specification.
A CJK code point without variation selector is a Unicode code point that meets at least one of the following criteria:
- Meets both of the following criteria:
  - UAX #11 East Asian Width category is either `W`, `F`, or `H`
  - Not in the RGI emoji set (i.e. is not a fully-qualified emoji) defined in UTS #51 Unicode Emoji
- UAX #24 Unicode Script Property is Hangul
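The criteria above can be sketched in Python as follows. This is an illustrative approximation, not a reference implementation: the function name `is_cjk_code_point` and the `rgi_emoji` parameter are hypothetical, the RGI emoji set must be supplied by the caller (the standard library does not ship it), and the Hangul check looks at the character name because the standard library does not expose the UAX #24 Script property.

```python
import unicodedata


def is_cjk_code_point(ch: str, rgi_emoji: frozenset = frozenset()) -> bool:
    """Approximate 'CJK code point without variation selector' for one character.

    `rgi_emoji` is a caller-supplied set of single-code-point RGI emoji
    (hypothetical parameter; an empty set disables the emoji exclusion).
    """
    # Second criterion: Script property is Hangul.
    # Assumption: approximated by the character name containing "HANGUL",
    # since the standard library has no Script property lookup.
    if "HANGUL" in unicodedata.name(ch, ""):
        return True
    # First criterion: East Asian Width is W, F, or H,
    # and the character is not in the RGI emoji set.
    is_wide = unicodedata.east_asian_width(ch) in ("W", "F", "H")
    return is_wide and ch not in rgi_emoji
```

For example, `is_cjk_code_point("あ")` is true because Hiragana is East Asian Wide, while `is_cjk_code_point("\u231a", frozenset({"\u231a"}))` is false because U+231A WATCH is wide but listed as an emoji by the caller. A production implementation would more likely use precomputed ranges such as those in ranges.md.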
An IVS (Ideographic Variation Selector/Sequence) is a Unicode code point in the Variation Selectors Supplement block (U+E0100–U+E01EF).
An SVS (Standard Variation Selector/Sequence) that can follow CJK is a Unicode code point in the Variation Selectors block (U+FE00–U+FE0F), other than U+FE0F, that can follow a CJK code point without variation selector (U+FE00–U+FE02 or U+FE0E as of Unicode 16; see footnote 1).
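Both variation selector definitions reduce to plain range checks. A minimal sketch, using the ranges quoted above as of Unicode 16 (the function names are hypothetical):

```python
def is_ivs(ch: str) -> bool:
    """True if `ch` is in the Variation Selectors Supplement block."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def is_svs_following_cjk(ch: str) -> bool:
    """True if `ch` is an SVS that can follow CJK (Unicode 16 ranges).

    U+FE0F is deliberately excluded, and U+FE03 to U+FE0D are not
    listed as following CJK characters as of Unicode 16.
    """
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE02 or cp == 0xFE0E
```

Because the SVS range depends on the Unicode version, an implementation may want to keep it in one place so it can be updated when the data files change.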
A CJK punctuation character is a Unicode punctuation character that is also a CJK code point without variation selector.
A non-CJK punctuation character is a Unicode punctuation character that is not a CJK punctuation character.
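The two punctuation classes partition the Unicode punctuation characters, which the following sketch makes explicit. It assumes the CommonMark 0.31 definition of a Unicode punctuation character (general category P or S) and takes the CJK test as a parameter; all function names here, and the simplified `wide_only` predicate in the usage lines, are hypothetical.

```python
import unicodedata
from typing import Callable


def is_unicode_punctuation(ch: str) -> bool:
    # Assumption: CommonMark 0.31 definition (general category P* or S*).
    return unicodedata.category(ch)[0] in ("P", "S")


def is_cjk_punctuation(ch: str, is_cjk: Callable[[str], bool]) -> bool:
    """Unicode punctuation that is also a CJK code point."""
    return is_unicode_punctuation(ch) and is_cjk(ch)


def is_non_cjk_punctuation(ch: str, is_cjk: Callable[[str], bool]) -> bool:
    """Unicode punctuation that is not a CJK code point."""
    return is_unicode_punctuation(ch) and not is_cjk(ch)


# Usage with a simplified stand-in predicate (East Asian Width only;
# it ignores the emoji exclusion and the Hangul script criterion).
def wide_only(ch: str) -> bool:
    return unicodedata.east_asian_width(ch) in ("W", "F", "H")


ideographic_full_stop = is_cjk_punctuation("\u3002", wide_only)
ascii_full_stop = is_non_cjk_punctuation(".", wide_only)
```

With this stand-in, U+3002 IDEOGRAPHIC FULL STOP falls in the CJK class and the ASCII full stop in the non-CJK class.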
Note
To see the concrete ranges of each definition, see ranges.md.
Note
Bold italic text marks the parts modified from the original specification.
A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a non-CJK punctuation character, or (2b) followed by a non-CJK punctuation character and preceded by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, (2bγ) a CJK code point without variation selector, (2bδ) an IVS, or (2bε) an SVS that can follow CJK preceded by a CJK code point without variation selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a non-CJK punctuation character, or (2b) preceded by a non-CJK punctuation character and followed by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, or (2bγ) a CJK code point without variation selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
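The two definitions can be sketched as predicates over the text before and after a delimiter run on the same line. This is a minimal illustration under stated assumptions, not the reference algorithm: all names are hypothetical, the CJK test is simplified to East Asian Width only (no emoji exclusion, no Hangul script check), and Unicode punctuation is taken as general category P or S.

```python
import unicodedata


def _is_ws(ch: str) -> bool:
    # "" stands for the beginning or end of the line, which counts as whitespace.
    return ch in ("", "\t", "\n", "\f", "\r") or unicodedata.category(ch) == "Zs"


def _is_cjk(ch: str) -> bool:
    # Simplification: East Asian Width only (see the lead-in).
    return ch != "" and unicodedata.east_asian_width(ch) in ("W", "F", "H")


def _is_non_cjk_punct(ch: str) -> bool:
    return ch != "" and unicodedata.category(ch)[0] in ("P", "S") and not _is_cjk(ch)


def _is_ivs(ch: str) -> bool:
    return ch != "" and 0xE0100 <= ord(ch) <= 0xE01EF


def _is_svs(ch: str) -> bool:
    return ch != "" and (0xFE00 <= ord(ch) <= 0xFE02 or ord(ch) == 0xFE0E)


def is_left_flanking(before: str, after: str) -> bool:
    prev, prev2, nxt = before[-1:], before[-2:-1], after[:1]
    if _is_ws(nxt):                      # (1)
        return False
    if not _is_non_cjk_punct(nxt):       # (2a)
        return True
    return (                             # (2b)
        _is_ws(prev)                                # (2bα)
        or _is_non_cjk_punct(prev)                  # (2bβ)
        or _is_cjk(prev)                            # (2bγ)
        or _is_ivs(prev)                            # (2bδ)
        or (_is_svs(prev) and _is_cjk(prev2))       # (2bε)
    )


def is_right_flanking(before: str, after: str) -> bool:
    prev, nxt = before[-1:], after[:1]
    if _is_ws(prev):                     # (1)
        return False
    if not _is_non_cjk_punct(prev):      # (2a)
        return True
    # (2b): whitespace, non-CJK punctuation, or a CJK code point follows.
    return _is_ws(nxt) or _is_non_cjk_punct(nxt) or _is_cjk(nxt)
```

For the closing `**` in `**テスト。**テスト`, `is_right_flanking("テスト。", "テスト")` is true because `。` is CJK punctuation and therefore does not trigger clause (2b). For `(b)**c`, `is_right_flanking("(b)", "c")` stays false, as in the original specification.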
See implementers-tips.md.
Footnotes
1. The range except for U+FE0E is computed from https://www.unicode.org/Public/16.0.0/ucd/StandardizedVariants.txt (as of Unicode 16) by extracting those that can follow CJK characters. Also, https://unicode.org/Public/16.0.0/ucd/emoji/emoji-variation-sequences.txt shows that U+FE0E can follow some CJK characters.