CommonMark CJK-friendly Amendments Specification

CommonMark issue: commonmark/commonmark-spec#650

The following chapters are written as an amendment to the original CommonMark specification. Missing chapters, sections, and definitions are the same as in the original specification.

2. Preliminaries

2.1 Characters and lines

A CJK code point without variation selector is a Unicode code point that meets at least one of the following criteria:

An IVS (Ideographic Variation Selector/Sequence) is a Unicode code point in the Variation Selectors Supplement block (U+E0100–U+E01EF).

An SVS (Standard Variation Selector/Sequence) that can follow CJK is a Unicode code point in the Variation Selectors block (U+FE00–U+FE0F), other than U+FE0F, that can follow a CJK code point without variation selector (U+FE00–U+FE02 or U+FE0E as of Unicode 16; see footnote 1).

A CJK punctuation character is a Unicode punctuation character that is also a CJK code point without variation selector.

A non-CJK punctuation character is a Unicode punctuation character other than a CJK punctuation character.
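The definitions above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The function names are hypothetical, and `toy_is_cjk` is only a stand-in: the real criteria for a CJK code point without variation selector come from this specification and ranges.md. The sketch also assumes that "Unicode punctuation character" means general category P or S, as in CommonMark 0.31.

```python
import unicodedata
from typing import Callable


def is_ivs(ch: str) -> bool:
    """IVS: Variation Selectors Supplement block (U+E0100-U+E01EF)."""
    return 0xE0100 <= ord(ch) <= 0xE01EF


def is_svs_that_can_follow_cjk(ch: str) -> bool:
    """U+FE00-U+FE02 or U+FE0E (as of Unicode 16); U+FE0F is excluded."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE02 or cp == 0xFE0E


def is_unicode_punctuation(ch: str) -> bool:
    # Assumption: general category P* or S*, as in CommonMark 0.31.
    return unicodedata.category(ch)[0] in ("P", "S")


def classify_punctuation(ch: str, is_cjk: Callable[[str], bool]) -> str:
    """Return "cjk", "non-cjk", or "none" for a single code point.

    `is_cjk` is a caller-supplied predicate for "CJK code point without
    variation selector"; this sketch does not define it.
    """
    if not is_unicode_punctuation(ch):
        return "none"
    return "cjk" if is_cjk(ch) else "non-cjk"


def toy_is_cjk(ch: str) -> bool:
    # Hypothetical stand-in, NOT the specification's definition:
    # covers U+3000-U+30FF and U+4E00-U+9FFF only.
    cp = ord(ch)
    return 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF
```

With the toy predicate, `classify_punctuation("。", toy_is_cjk)` yields `"cjk"` and `classify_punctuation("!", toy_is_cjk)` yields `"non-cjk"`.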

Note

To see the concrete ranges of each definition, see ranges.md.

6. Inlines

6.2 Emphasis and strong emphasis

Note

Text in bold italics marks the parts modified from the original specification.

A left-flanking delimiter run is a delimiter run that is (1) not followed by Unicode whitespace, and either (2a) not followed by a non-CJK punctuation character, or (2b) followed by a non-CJK punctuation character and preceded by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, (2bγ) a CJK code point without variation selector, (2bδ) an IVS, or (2bε) an SVS that can follow CJK preceded by a CJK code point without variation selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

A right-flanking delimiter run is a delimiter run that is (1) not preceded by Unicode whitespace, and either (2a) not preceded by a non-CJK punctuation character, or (2b) preceded by a non-CJK punctuation character and followed by (2bα) Unicode whitespace, (2bβ) a non-CJK punctuation character, or (2bγ) a CJK code point without variation selector. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
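The two flanking rules above might be checked as in the following sketch. It makes several assumptions: `before` and `after` are the text on each side of the delimiter run within the line (an empty string means the line boundary), Unicode whitespace and punctuation follow the CommonMark 0.31 definitions, and `toy_is_cjk` is a hypothetical stand-in for the real "CJK code point without variation selector" test. All function names are illustrative.

```python
import unicodedata
from typing import Callable

CjkPredicate = Callable[[str], bool]


def is_whitespace(ch: str) -> bool:
    # "" stands for the beginning or end of the line, which counts as whitespace.
    return ch == "" or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"


def is_non_cjk_punct(ch: str, is_cjk: CjkPredicate) -> bool:
    # Assumption: punctuation means general category P* or S*.
    return ch != "" and unicodedata.category(ch)[0] in "PS" and not is_cjk(ch)


def is_ivs(ch: str) -> bool:
    return 0xE0100 <= ord(ch) <= 0xE01EF


def is_svs_that_can_follow_cjk(ch: str) -> bool:
    return 0xFE00 <= ord(ch) <= 0xFE02 or ord(ch) == 0xFE0E


def is_left_flanking(before: str, after: str, is_cjk: CjkPredicate) -> bool:
    nxt, prev, prev2 = after[:1], before[-1:], before[-2:-1]
    if is_whitespace(nxt):                      # (1)
        return False
    if not is_non_cjk_punct(nxt, is_cjk):       # (2a)
        return True
    return (                                    # (2b)
        is_whitespace(prev)                     # (2bα)
        or is_non_cjk_punct(prev, is_cjk)       # (2bβ)
        or is_cjk(prev)                         # (2bγ)
        or is_ivs(prev)                         # (2bδ)
        or (is_svs_that_can_follow_cjk(prev)    # (2bε)
            and prev2 != "" and is_cjk(prev2))
    )


def is_right_flanking(before: str, after: str, is_cjk: CjkPredicate) -> bool:
    nxt, prev = after[:1], before[-1:]
    if is_whitespace(prev):                     # (1)
        return False
    if not is_non_cjk_punct(prev, is_cjk):      # (2a)
        return True
    return (                                    # (2b)
        is_whitespace(nxt)                      # (2bα)
        or is_non_cjk_punct(nxt, is_cjk)        # (2bβ)
        or is_cjk(nxt)                          # (2bγ)
    )


def toy_is_cjk(ch: str) -> bool:
    # Hypothetical stand-in, NOT the specification's definition.
    cp = ord(ch)
    return 0x3000 <= cp <= 0x30FF or 0x4E00 <= cp <= 0x9FFF
```

For instance, in `日本語**(括弧)**です` the opening `**` is followed by an ASCII parenthesis and preceded by a CJK character, so `is_left_flanking("日本語", "(括弧)", toy_is_cjk)` is true via (2bγ). The same run after `abc` is not left-flanking.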

Tips for Implementers

See implementers-tips.md.

Unicode data list

| Data name | Latest | Unicode 16 |
| --- | --- | --- |
| East Asian Width | https://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt | https://www.unicode.org/Public/16.0.0/ucd/EastAsianWidth.txt |
| Script | https://www.unicode.org/Public/UCD/latest/ucd/Scripts.txt | https://www.unicode.org/Public/16.0.0/ucd/Scripts.txt |
| Block | https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt | https://www.unicode.org/Public/16.0.0/ucd/Blocks.txt |
| Characters followed by SVS | https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt | https://www.unicode.org/Public/16.0.0/ucd/StandardizedVariants.txt |
| Characters followed by U+FE0E/U+FE0F | https://unicode.org/Public/UCD/latest/ucd/emoji/emoji-variation-sequences.txt | https://unicode.org/Public/16.0.0/ucd/emoji/emoji-variation-sequences.txt |
| Fully-qualified Emojis (without ZWJ) | https://unicode.org/Public/emoji/latest/emoji-sequences.txt | https://unicode.org/Public/16.0.0/emoji/emoji-sequences.txt |
| Emoji qualification test | https://unicode.org/Public/emoji/latest/emoji-test.txt | https://unicode.org/Public/16.0.0/emoji/emoji-test.txt |
| Characters that can be emoji (Useless) | https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt | https://www.unicode.org/Public/16.0.0/ucd/emoji/emoji-data.txt |
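Range-based property files such as EastAsianWidth.txt, Scripts.txt, and Blocks.txt share a simple `code point or range ; value # comment` line shape, so a small parser is enough to turn them into ranges. The sketch below is an assumption-laden illustration: `parse_ucd_ranges` is a hypothetical helper, and the `SAMPLE` lines only imitate the file layout and should be checked against the real data. Files that list sequences (for example StandardizedVariants.txt) have a different first field and would need separate handling.

```python
from typing import Iterator, Tuple


def parse_ucd_ranges(text: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, value) for each data line of a range-style UCD file."""
    for line in text.splitlines():
        # Drop the trailing comment, then skip blank and comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        code_range, value = fields[0], fields[1]
        # A single code point has no "..": treat it as a one-element range.
        first, _, last = code_range.partition("..")
        yield int(first, 16), int(last or first, 16), value


# Illustrative lines in the shape of EastAsianWidth.txt (not a copy of the file).
SAMPLE = """\
# comment-only line
0020           ; Na # Zs         SPACE
3400..4DBF     ; W  # Lo  [6592] CJK UNIFIED IDEOGRAPH-3400..CJK UNIFIED IDEOGRAPH-4DBF
"""

ranges = list(parse_ucd_ranges(SAMPLE))
```

Here `ranges` contains one single-code-point entry for U+0020 and one entry spanning U+3400 to U+4DBF.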

Footnotes

  1. The range, excluding U+FE0E, is computed from https://www.unicode.org/Public/16.0.0/ucd/StandardizedVariants.txt (as of Unicode 16) by extracting the variation selectors that can follow CJK characters. In addition, https://unicode.org/Public/16.0.0/ucd/emoji/emoji-variation-sequences.txt shows that U+FE0E can follow some CJK characters.