diff --git a/spec.txt b/spec.txt
index c62df5bb..7dcfc48e 100644
--- a/spec.txt
+++ b/spec.txt
@@ -294,37 +294,28 @@ of [characters] rather than bytes. A conforming parser may be limited
 to a certain encoding.
 
 A [line](@) is a sequence of zero or more [characters]
-other than newline (`U+000A`) or carriage return (`U+000D`),
+other than line feed (`U+000A`) or carriage return (`U+000D`),
 followed by a [line ending] or by the end of file.
 
-A [line ending](@) is a newline (`U+000A`), a carriage return
-(`U+000D`) not followed by a newline, or a carriage return and a
-following newline.
+A [line ending](@) is a line feed (`U+000A`), a carriage return
+(`U+000D`) not followed by a line feed, or a carriage return and a
+following line feed.
 
 A line containing no characters, or a line containing only spaces
 (`U+0020`) or tabs (`U+0009`), is called a [blank line](@).
 
 The following definitions of character classes will be used in this spec:
 
-A [whitespace character](@) is a space
-(`U+0020`), tab (`U+0009`), newline (`U+000A`), line tabulation (`U+000B`),
-form feed (`U+000C`), or carriage return (`U+000D`).
-
-[Whitespace](@) is a sequence of one or more [whitespace
-characters].
-
 A [Unicode whitespace character](@) is
 any code point in the Unicode `Zs` general category, or a tab (`U+0009`),
-carriage return (`U+000D`), newline (`U+000A`), or form feed
-(`U+000C`).
+line feed (`U+000A`), form feed (`U+000C`), or carriage return (`U+000D`).
 
-[Unicode whitespace](@) is a sequence of one
-or more [Unicode whitespace characters].
+[Unicode whitespace](@) is a sequence of one or more
+[Unicode whitespace characters].
 
-A [space](@) is `U+0020`.
+A [tab](@) is `U+0009`.
 
-A [non-whitespace character](@) is any character
-that is not a [whitespace character].
+A [space](@) is `U+0020`.
 
 An [ASCII control character](@) is a character between `U+0000–1F` (both
 including) or `U+007F`.
@@ -336,14 +327,14 @@ is `!`, `"`, `#`, `$`, `%`, `&`, `'`, `(`, `)`,
 `[`, `\`, `]`, `^`, `_`, `` ` `` (U+005B–0060),
 `{`, `|`, `}`, or `~` (U+007B–007E).
 
-A [punctuation character](@) is an [ASCII
+A [Unicode punctuation character](@) is an [ASCII
 punctuation character] or anything in
 the general Unicode categories `Pc`, `Pd`, `Pe`, `Pf`, `Pi`, `Po`, or `Ps`.
 
 ## Tabs
 
 Tabs in lines are not expanded to [spaces]. However,
-in contexts where whitespace helps to define block structure,
+in contexts where spaces help to define block structure,
 tabs behave as if they were replaced by spaces with a tab stop
 of 4 characters.
 
@@ -871,8 +862,8 @@ Markdown document.
 
 ## Thematic breaks
 
-A line consisting of 0-3 spaces of indentation, followed by a sequence
-of three or more matching `-`, `_`, or `*` characters, each followed
+A line consisting of optionally up to three spaces of indentation, followed by a
+sequence of three or more matching `-`, `_`, or `*` characters, each followed
 optionally by any number of spaces or tabs, forms a
 [thematic break](@).
 
@@ -916,7 +907,7 @@ __
```````````````````````````````` -One to three spaces indent are allowed: +Up to three spaces of indentation are allowed: ```````````````````````````````` example *** @@ -929,7 +920,7 @@ One to three spaces indent are allowed: ```````````````````````````````` -Four spaces is too many: +Four spaces of indentation is too many: ```````````````````````````````` example *** @@ -957,7 +948,7 @@ _____________________________________ ```````````````````````````````` -Spaces are allowed between the characters: +Spaces and tabs are allowed between the characters: ```````````````````````````````` example - - - @@ -980,7 +971,7 @@ Spaces are allowed between the characters: ```````````````````````````````` -Spaces are allowed at the end: +Spaces and tabs are allowed at the end: ```````````````````````````````` example - - - - @@ -1004,7 +995,7 @@ a------ ```````````````````````````````` -It is required that all of the [non-whitespace characters] be the same. +It is required that all of the characters other than spaces or tabs be the same. So, this is not a thematic break: ```````````````````````````````` example @@ -1099,13 +1090,13 @@ An [ATX heading](@) consists of a string of characters, parsed as inline content, between an opening sequence of 1--6 unescaped `#` characters and an optional closing sequence of any number of unescaped `#` characters. -The opening sequence of `#` characters must be followed by a -[space] or by the end of line. The optional closing sequence of `#`s must be -preceded by a [space] and may be followed by spaces only. The opening -`#` character may be indented 0-3 spaces. The raw contents of the -heading are stripped of leading and trailing spaces before being parsed -as inline content. The heading level is equal to the number of `#` -characters in the opening sequence. +The opening sequence of `#` characters must be followed by spaces or tabs, or +by the end of line. 
The optional closing sequence of `#`s must be preceded by +spaces or tabs and may be followed by spaces or tabs only. The opening +`#` character may be preceded by up to three spaces of indentation. The raw +contents of the heading are stripped of leading and trailing space or tabs +before being parsed as inline content. The heading level is equal to the number +of `#` characters in the opening sequence. Simple headings: @@ -1135,7 +1126,7 @@ More than six `#` characters is not a heading: ```````````````````````````````` -At least one space is required between the `#` characters and the +At least one space or tab is required between the `#` characters and the heading's contents, unless the heading is empty. Note that many implementations currently do not require the space. However, the space was required by the @@ -1171,7 +1162,7 @@ Contents are parsed as inlines: ```````````````````````````````` -Leading and trailing [whitespace] is ignored in parsing inline content: +Leading and trailing spaces or tabs are ignored in parsing inline content: ```````````````````````````````` example # foo @@ -1180,7 +1171,7 @@ Leading and trailing [whitespace] is ignored in parsing inline content: ```````````````````````````````` -One to three spaces indentation are allowed: +Up to three spaces of indentation are allowed: ```````````````````````````````` example ### foo @@ -1193,7 +1184,7 @@ One to three spaces indentation are allowed: ```````````````````````````````` -Four spaces are too much: +Four spaces of indentation is too many: ```````````````````````````````` example # foo @@ -1234,7 +1225,7 @@ It need not be the same length as the opening sequence: ```````````````````````````````` -Spaces are allowed after the closing sequence: +Spaces or tabs are allowed after the closing sequence: ```````````````````````````````` example ### foo ### @@ -1243,7 +1234,7 @@ Spaces are allowed after the closing sequence: ```````````````````````````````` -A sequence of `#` characters with 
anything but [spaces] following it +A sequence of `#` characters with anything but spaces or tabs following it is not a closing sequence, but counts as part of the contents of the heading: @@ -1254,7 +1245,7 @@ heading: ```````````````````````````````` -The closing sequence must be preceded by a space: +The closing sequence must be preceded by a space or tab: ```````````````````````````````` example # foo# @@ -1318,8 +1309,8 @@ ATX headings can be empty: ## Setext headings A [setext heading](@) consists of one or more -lines of text, each containing at least one [non-whitespace -character], with no more than 3 spaces indentation, followed by +lines of text, not interrupted by a blank line, of which the first line does not +have more than 3 spaces of indentation, followed by a [setext heading underline]. The lines of text must be such that, were they not followed by the setext heading underline, they would be interpreted as a paragraph: they cannot be @@ -1329,7 +1320,7 @@ interpretable as a [code fence], [ATX heading][ATX headings], A [setext heading underline](@) is a sequence of `=` characters or a sequence of `-` characters, with no more than 3 -spaces indentation and any number of trailing spaces. If a line +spaces of indentation and any number of trailing spaces or tabs. If a line containing a single `-` can be interpreted as an empty [list items], it should be interpreted this way and not as a [setext heading underline]. @@ -1373,7 +1364,7 @@ baz The contents are the result of parsing the headings's raw content as inlines. The heading's raw content is formed by concatenating the lines and removing initial and final -[whitespace]. +spaces or tabs. 
```````````````````````````````` example Foo *bar @@ -1399,8 +1390,8 @@ Foo ```````````````````````````````` -The heading content can be indented up to three spaces, and need -not line up with the underlining: +The heading content can be preceded by up to three spaces of indentation, and +need not line up with the underlining: ```````````````````````````````` example Foo @@ -1418,7 +1409,7 @@ not line up with the underlining: ```````````````````````````````` -Four spaces indent is too much: +Four spaces of indentation is too many: ```````````````````````````````` example Foo @@ -1436,8 +1427,8 @@ Foo ```````````````````````````````` -The setext heading underline can be indented up to three spaces, and -may have trailing spaces: +The setext heading underline can be preceded by up to three spaces of +indentation, and may have trailing spaces or tabs: ```````````````````````````````` example Foo @@ -1447,7 +1438,7 @@ Foo ```````````````````````````````` -Four spaces is too much: +Four spaces of indentation is too many: ```````````````````````````````` example Foo @@ -1458,7 +1449,7 @@ Foo ```````````````````````````````` -The setext heading underline cannot contain internal spaces: +The setext heading underline cannot contain internal spaces or tabs: ```````````````````````````````` example Foo @@ -1474,7 +1465,7 @@ Foo ```````````````````````````````` -Trailing spaces in the content line do not cause a line break: +Trailing spaces or tabs in the content line do not cause a hard line break: ```````````````````````````````` example Foo @@ -1739,8 +1730,8 @@ baz An [indented code block](@) is composed of one or more [indented chunks] separated by blank lines. An [indented chunk](@) is a sequence of non-blank lines, -each indented four or more spaces. The contents of the code block are -the literal contents of the lines, including trailing +each preceded by four or more spaces of indentation. 
The contents of the code +block are the literal contents of the lines, including trailing [line endings], minus four spaces of indentation. An indented code block has no [info string]. @@ -1833,8 +1824,8 @@ chunk3 ```````````````````````````````` -Any initial spaces beyond four will be included in the content, even -in interior blank lines: +Any initial spaces or tabs beyond four spaces of indentation will be included in +the content, even in interior blank lines: ```````````````````````````````` example chunk1 @@ -1861,7 +1852,7 @@ bar ```````````````````````````````` -However, any non-blank line with fewer than four leading spaces ends +However, any non-blank line with fewer than four spaces of indentation ends the code block immediately. So a paragraph may occur immediately after indented code: @@ -1896,7 +1887,7 @@ Heading ```````````````````````````````` -The first line can be indented more than four spaces: +The first line can be preceded by more than four spaces of indentation: ```````````````````````````````` example foo @@ -1923,7 +1914,7 @@ are not included in it: ```````````````````````````````` -Trailing spaces are included in the code block's content: +Trailing spaces or tabs are included in the code block's content: ```````````````````````````````` example foo @@ -1940,11 +1931,11 @@ A [code fence](@) is a sequence of at least three consecutive backtick characters (`` ` ``) or tildes (`~`). (Tildes and backticks cannot be mixed.) A [fenced code block](@) -begins with a code fence, indented no more than three spaces. +begins with a code fence, preceded by up to three spaces of indentation. The line with the opening code fence may optionally contain some text following the code fence; this is trimmed of leading and trailing -whitespace and called the [info string](@). If the [info string] comes +spaces or tabs and called the [info string](@). If the [info string] comes after a backtick fence, it may not contain any backtick characters. 
(The reason for this restriction is that otherwise some inline code would be incorrectly interpreted as the @@ -1954,13 +1945,13 @@ The content of the code block consists of all subsequent lines, until a closing [code fence] of the same type as the code block began with (backticks or tildes), and with at least as many backticks or tildes as the opening code fence. If the leading code fence is -indented N spaces, then up to N spaces of indentation are removed from -each line of the content (if present). (If a content line is not -indented, it is preserved unchanged. If it is indented less than N -spaces, all of the indentation is removed.) +preceded by N spaces of indentation, then up to N spaces of indentation are +removed from each line of the content (if present). (If a content line is not +indented, it is preserved unchanged. If it is indented N spaces or less, all +of the indentation is removed.) -The closing code fence may be indented up to three spaces, and may be -followed only by spaces, which are ignored. If the end of the +The closing code fence may be preceded by up to three spaces of indentation, and +may be followed only by spaces or tabs, which are ignored. 
If the end of the containing block (or document) is reached and no closing code fence has been found, the code block contains all of the lines after the opening code fence until the end of the containing block (or @@ -2173,7 +2164,7 @@ aaa ```````````````````````````````` -Four spaces indentation produces an indented code block: +Four spaces of indentation is too many: ```````````````````````````````` example ``` @@ -2187,8 +2178,8 @@ aaa ```````````````````````````````` -Closing fences may be indented by 0-3 spaces, and their indentation -need not match that of the opening fence: +Closing fences may be preceded by up to three spaces of indentation, and their +indentation need not match that of the opening fence: ```````````````````````````````` example ``` @@ -2224,7 +2215,7 @@ aaa -Code fences (opening and closing) cannot contain internal spaces: +Code fences (opening and closing) cannot contain internal spaces or tabs: ```````````````````````````````` example ``` ``` @@ -2367,7 +2358,7 @@ as raw HTML (and will not be escaped in HTML output). There are seven kinds of [HTML block], which can be defined by their start and end conditions. The block begins with a line that meets a -[start condition](@) (after up to three spaces optional indentation). +[start condition](@) (after up to three optional spaces of indentation). It ends with the first subsequent line that meets a matching [end condition](@), or the last line of the document, or the last line of the [container block](#container-blocks) containing the current HTML @@ -2376,7 +2367,7 @@ the first line meets both the [start condition] and the [end condition], the block will contain just that line. 1. 
**Start condition:** line begins with the string ``, ``, or `` (case-insensitive; it @@ -2407,14 +2398,14 @@ followed by one of the strings (case-insensitive) `address`, `nav`, `noframes`, `ol`, `optgroup`, `option`, `p`, `param`, `section`, `source`, `summary`, `table`, `tbody`, `td`, `tfoot`, `th`, `thead`, `title`, `tr`, `track`, `ul`, followed -by [whitespace], the end of the line, the string `>`, or +by a space, a tab, the end of the line, the string `>`, or the string `/>`.\ **End condition:** line is followed by a [blank line]. 7. **Start condition:** line begins with a complete [open tag] (with any [tag name] other than `script`, `style`, or `pre`) or a complete [closing tag], -followed only by [whitespace] or the end of the line.\ +followed only by a space, a tab, or the end of the line.\ **End condition:** line is followed by a [blank line]. HTML blocks continue until they are closed by their appropriate @@ -2445,7 +2436,7 @@ _world_. ```````````````````````````````` -In this case, the HTML block is terminated by the newline — the `**Hello**` +In this case, the HTML block is terminated by the blank line — the `**Hello**` text remains verbatim — and regular parsing resumes, with a paragraph, emphasised `world` and inline and block HTML following. @@ -2947,7 +2938,8 @@ function matchwo(a,b) ```````````````````````````````` -The opening tag can be indented 1-3 spaces, but not 4: +The opening tag can be preceded by up to three spaces of indentation, but not +four: ```````````````````````````````` example @@ -3023,7 +3015,7 @@ specification, which says: > The only restrictions are that block-level HTML elements — > e.g. `