Normative: Make B.1.2 "String Literals" normative (tc39#1867)
(Part of Annex B reform, see PR tc39#1595.)

B.1.2 makes 2 changes to the EscapeSequence production:
(1) It adds the rhs `NonOctalDecimalEscapeSequence`.
(2) It replaces the rhs:
        `0` [lookahead <! DecimalDigit]
    with:
        LegacyOctalEscapeSequence
    where the latter nonterminal generates `0` among lots of other things.

Change 1 is straightforward, but change 2 is tricky.
In the EscapeSequence production, we can't simply replace
the `0` alternative with LegacyOctalEscapeSequence (as B.1.2 does),
because the `0` alternative must be treated differently
from everything else that LegacyOctalEscapeSequence derives.
(The `0` alternative is allowed in contexts where
everything else that LegacyOctalEscapeSequence derives is forbidden.)
So instead, we redefine LegacyOctalEscapeSequence to exclude the `0` alternative.
Specifically, the 'overlap' comes from:

    LegacyOctalEscapeSequence ::
        OctalDigit [lookahead ∉ OctalDigit]

so we replace that with:

    LegacyOctalEscapeSequence ::
        `0` [lookahead ∈ {`8`, `9`}]
        NonZeroOctalDigit [lookahead ∉ OctalDigit]
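    
The split can be checked with a small classifier. This is only a sketch of the revised grammar, not spec text or engine code: `classifyDigitEscape`, `isOctal`, and the label `"NullEscape"` (used here for the `0` [lookahead ∉ DecimalDigit] alternative of EscapeSequence) are hypothetical names introduced for illustration. The argument is the text that follows the backslash.

```javascript
// Hypothetical helper: true if ch is a single octal digit character.
const isOctal = (ch) => ch !== undefined && ch >= "0" && ch <= "7";

// Classify the digit-initial escape at the start of `s` (the text after `\`).
// Returns null when `s` does not start with a decimal digit.
function classifyDigitEscape(s) {
  const a = s[0], b = s[1], c = s[2];
  if (a === "8" || a === "9") {
    return { kind: "NonOctalDecimalEscapeSequence", length: 1 };
  }
  if (!isOctal(a)) return null;
  if (!isOctal(b)) {
    // `0` [lookahead ∉ DecimalDigit]: stays a separate EscapeSequence alternative.
    if (a === "0" && b !== "8" && b !== "9") {
      return { kind: "NullEscape", length: 1, mv: 0 };
    }
    // `0` [lookahead ∈ {8, 9}]  or  NonZeroOctalDigit [lookahead ∉ OctalDigit]
    return { kind: "LegacyOctalEscapeSequence", length: 1, mv: Number(a) };
  }
  if (a <= "3" && isOctal(c)) {
    // ZeroToThree OctalDigit OctalDigit
    const mv = 64 * Number(a) + 8 * Number(b) + Number(c);
    return { kind: "LegacyOctalEscapeSequence", length: 3, mv };
  }
  // ZeroToThree OctalDigit [lookahead ∉ OctalDigit]  or  FourToSeven OctalDigit
  return { kind: "LegacyOctalEscapeSequence", length: 2, mv: 8 * Number(a) + Number(b) };
}

console.log(classifyDigitEscape("0"));   // kind: "NullEscape"
console.log(classifyDigitEscape("08"));  // kind: "LegacyOctalEscapeSequence", length 1
console.log(classifyDigitEscape("101")); // length 3, mv 65
```

Under these assumptions, a lone `0` never falls into LegacyOctalEscapeSequence unless an `8` or `9` follows it, which is the property the redefinition is meant to guarantee.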

(See Issue tc39#1975 for more details.)
Resolves tc39#1975.
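
Strict mode code is one context where `\0` is allowed while the other legacy forms are rejected, per the Early Errors this commit adds. The following is a rough way to observe that from script, assuming a Node.js-like host where `new Function` is available and whose engine enforces these early errors. `parses` is a hypothetical helper, not an existing API.

```javascript
// Hypothetical helper: compile `source` as a function body and report
// whether it parses. Only SyntaxError is treated as "does not parse".
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

const results = {
  // Non-strict code: legacy octal escapes are accepted.
  sloppyOctal: parses('return "\\7";'),
  // Strict code: `\0` not followed by a decimal digit is still fine.
  strictNull: parses('"use strict"; return "\\0";'),
  // Strict code: EscapeSequence :: LegacyOctalEscapeSequence is an early error.
  strictOctal: parses('"use strict"; return "\\7";'),
  // The literal precedes the directive, as in the note's example.
  octalBeforeDirective: parses('"\\7"; "use strict";'),
};

console.log(results);
```

The last case mirrors the `function invalid() { "\7"; "use strict"; }` example in the diff: the literal appears before the Use Strict Directive but is still subject to the early error.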
jmdyck committed Aug 12, 2021
1 parent 9fed387 commit f79dfd2
Showing 1 changed file with 94 additions and 93 deletions.
187 changes: 94 additions & 93 deletions spec.html
@@ -16440,7 +16440,7 @@ <h1>Static Semantics: MV</h1>
The MV of <emu-grammar>SignedInteger :: `-` DecimalDigits</emu-grammar> is the negative of the MV of |DecimalDigits|.
</li>
<li>
The MV of <emu-grammar>DecimalDigit :: `0`</emu-grammar> or of <emu-grammar>HexDigit :: `0`</emu-grammar> or of <emu-grammar>OctalDigit :: `0`</emu-grammar> or of <emu-grammar>BinaryDigit :: `0`</emu-grammar> is 0.
The MV of <emu-grammar>DecimalDigit :: `0`</emu-grammar> or of <emu-grammar>HexDigit :: `0`</emu-grammar> or of <emu-grammar>OctalDigit :: `0`</emu-grammar> or of <emu-grammar>LegacyOctalEscapeSequence :: `0`</emu-grammar> or of <emu-grammar>BinaryDigit :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>DecimalDigit :: `1`</emu-grammar> or of <emu-grammar>NonZeroDigit :: `1`</emu-grammar> or of <emu-grammar>HexDigit :: `1`</emu-grammar> or of <emu-grammar>OctalDigit :: `1`</emu-grammar> or of <emu-grammar>BinaryDigit :: `1`</emu-grammar> is 1.
@@ -16595,7 +16595,7 @@ <h1>Static Semantics: NumericValue</h1>
</emu-clause>
</emu-clause>

<emu-clause id="sec-literals-string-literals">
<emu-clause id="sec-literals-string-literals" oldids="sec-additional-syntax-string-literals">
<h1>String Literals</h1>
<emu-note>
<p>A string literal is 0 or more Unicode code points enclosed in single or double quotes. Unicode code points may also be represented by an escape sequence. All code points may appear literally in a string literal except for the closing quote code points, U+005C (REVERSE SOLIDUS), U+000D (CARRIAGE RETURN), and U+000A (LINE FEED). Any code points may appear in the form of an escape sequence. String literals evaluate to ECMAScript String values. When generating these String values Unicode code points are UTF-16 encoded as defined in <emu-xref href="#sec-utf16encodecodepoint"></emu-xref>. Code points belonging to the Basic Multilingual Plane are encoded as a single code unit element of the string. All other code points are encoded as two code unit elements of the string.</p>
@@ -16632,11 +16632,11 @@ <h2>Syntax</h2>
EscapeSequence ::
CharacterEscapeSequence
`0` [lookahead &notin; DecimalDigit]
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence
</emu-grammar>
<p>A conforming implementation, when processing strict mode code, must not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.</p>
<emu-grammar type="definition">

CharacterEscapeSequence ::
SingleEscapeCharacter
NonEscapeCharacter
@@ -16653,6 +16653,25 @@ <h2>Syntax</h2>
`x`
`u`

LegacyOctalEscapeSequence ::
`0` [lookahead &isin; {`8`, `9`}]
NonZeroOctalDigit [lookahead &notin; OctalDigit]
ZeroToThree OctalDigit [lookahead &notin; OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

NonZeroOctalDigit ::
OctalDigit but not `0`

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`

NonOctalDecimalEscapeSequence :: one of
`8` `9`

HexEscapeSequence ::
`x` HexDigit HexDigit

@@ -16668,7 +16687,26 @@ <h2>Syntax</h2>
<p>&lt;LF&gt; and &lt;CR&gt; cannot appear in a string literal, except as part of a |LineContinuation| to produce the empty code points sequence. The proper way to include either in the String value of a string literal is to use an escape sequence such as `\\n` or `\\u000A`.</p>
</emu-note>

<emu-clause id="sec-static-semantics-sv" oldids="sec-string-literals-static-semantics-stringvalue" type="sdo" aoid="SV">
<emu-clause id="sec-string-literals-early-errors">
<h1>Static Semantics: Early Errors</h1>
<emu-grammar>
EscapeSequence :: LegacyOctalEscapeSequence

EscapeSequence :: NonOctalDecimalEscapeSequence
</emu-grammar>
<ul>
<li>It is a Syntax Error if the source code matching this production is strict mode code.</li>
</ul>
<emu-note>In non-strict code, this syntax is Legacy.</emu-note>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to enforce the above rules for such literals. For example, the following source text contains a Syntax Error:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>
</emu-clause>

<emu-clause id="sec-static-semantics-sv" oldids="sec-string-literals-static-semantics-stringvalue,sec-additional-syntax-string-literals-static-semantics" type="sdo" aoid="SV">
<h1>Static Semantics: SV</h1>
<p>A string literal stands for a value of the String type. SV produces String values for string literals through recursive application on the various parts of the string literal. As part of this process, some Unicode code points within the string literal are interpreted as having a mathematical value, as described below or in <emu-xref href="#sec-literals-numeric-literals"></emu-xref>.</p>
<ul>
@@ -16865,6 +16903,15 @@ <h1>Static Semantics: SV</h1>
<li>
The SV of <emu-grammar>NonEscapeCharacter :: SourceCharacter but not one of EscapeCharacter or LineTerminator</emu-grammar> is the result of performing UTF16EncodeCodePoint on the code point value of |SourceCharacter|.
</li>
<li>
The SV of <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> is the String value consisting of the code unit whose value is the MV of |LegacyOctalEscapeSequence|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the String value consisting of the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the String value consisting of the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The SV of <emu-grammar>HexEscapeSequence :: `x` HexDigit HexDigit</emu-grammar> is the String value consisting of the code unit whose value is the MV of |HexEscapeSequence|.
</li>
@@ -16883,6 +16930,39 @@ <h1>Static Semantics: SV</h1>
<emu-clause id="sec-string-literals-static-semantics-mv">
<h1>Static Semantics: MV</h1>
<ul>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
<li>
The MV of <emu-grammar>HexEscapeSequence :: `x` HexDigit HexDigit</emu-grammar> is (16 times the MV of the first |HexDigit|) plus the MV of the second |HexDigit|.
</li>
@@ -27100,7 +27180,7 @@ <h1>Forbidden Extensions</h1>
When processing strict mode code, an implementation must not relax the early error rules of <emu-xref href="#sec-numeric-literals-early-errors"></emu-xref>.
</li>
<li>
|TemplateEscapeSequence| must not be extended to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as defined in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
|TemplateEscapeSequence| must not be extended to include |LegacyOctalEscapeSequence| or |NonOctalDecimalEscapeSequence| as defined in <emu-xref href="#sec-literals-string-literals"></emu-xref>.
</li>
<li>
When processing strict mode code, the extensions defined in <emu-xref href="#sec-labelled-function-declarations"></emu-xref>, <emu-xref href="#sec-block-level-function-declarations-web-legacy-compatibility-semantics"></emu-xref>, <emu-xref href="#sec-functiondeclarations-in-ifstatement-statement-clauses"></emu-xref>, and <emu-xref href="#sec-initializers-in-forin-statement-heads"></emu-xref> must not be supported.
@@ -45054,6 +45134,11 @@ <h1>Lexical Grammar</h1>
<emu-prodref name="SingleEscapeCharacter"></emu-prodref>
<emu-prodref name="NonEscapeCharacter"></emu-prodref>
<emu-prodref name="EscapeCharacter"></emu-prodref>
<emu-prodref name="LegacyOctalEscapeSequence"></emu-prodref>
<emu-prodref name="NonZeroOctalDigit"></emu-prodref>
<emu-prodref name="ZeroToThree"></emu-prodref>
<emu-prodref name="FourToSeven"></emu-prodref>
<emu-prodref name="NonOctalDecimalEscapeSequence"></emu-prodref>
<emu-prodref name="HexEscapeSequence"></emu-prodref>
<emu-prodref name="UnicodeEscapeSequence"></emu-prodref>
<emu-prodref name="Hex4Digits"></emu-prodref>
@@ -45401,90 +45486,6 @@ <h1>Additional ECMAScript Features for Web Browsers</h1>
<emu-annex id="sec-additional-syntax">
<h1>Additional Syntax</h1>

<emu-annex id="sec-additional-syntax-string-literals">
<h1>String Literals</h1>
<p>The syntax and semantics of <emu-xref href="#sec-literals-string-literals"></emu-xref> is extended as follows except that this extension is not allowed for strict mode code:</p>
<h2>Syntax</h2>
<emu-grammar type="definition">
EscapeSequence ::
CharacterEscapeSequence
LegacyOctalEscapeSequence
NonOctalDecimalEscapeSequence
HexEscapeSequence
UnicodeEscapeSequence

LegacyOctalEscapeSequence ::
OctalDigit [lookahead &notin; OctalDigit]
ZeroToThree OctalDigit [lookahead &notin; OctalDigit]
FourToSeven OctalDigit
ZeroToThree OctalDigit OctalDigit

ZeroToThree :: one of
`0` `1` `2` `3`

FourToSeven :: one of
`4` `5` `6` `7`

NonOctalDecimalEscapeSequence :: one of
`8` `9`
</emu-grammar>
<p>This definition of |EscapeSequence| is not used in strict mode.</p>
<emu-note>
<p>It is possible for string literals to precede a Use Strict Directive that places the enclosing code in <emu-xref href="#sec-strict-mode-code">strict mode</emu-xref>, and implementations must take care to not use this extended definition of |EscapeSequence| with such literals. For example, attempting to parse the following source text must fail:</p>
<pre><code class="javascript">
function invalid() { "\7"; "use strict"; }
</code></pre>
</emu-note>

<emu-annex id="sec-additional-syntax-string-literals-static-semantics">
<h1>Static Semantics</h1>
<ul>
<li>
The SV of <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> is the String value consisting of the code unit whose value is the MV of |LegacyOctalEscapeSequence|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit</emu-grammar> is (8 times the MV of |ZeroToThree|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: FourToSeven OctalDigit</emu-grammar> is (8 times the MV of |FourToSeven|) plus the MV of |OctalDigit|.
</li>
<li>
The MV of <emu-grammar>LegacyOctalEscapeSequence :: ZeroToThree OctalDigit OctalDigit</emu-grammar> is (64 (that is, 8<sup>2</sup>) times the MV of |ZeroToThree|) plus (8 times the MV of the first |OctalDigit|) plus the MV of the second |OctalDigit|.
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `8`</emu-grammar> is the String value consisting of the code unit 0x0038 (DIGIT EIGHT).
</li>
<li>
The SV of <emu-grammar>NonOctalDecimalEscapeSequence :: `9`</emu-grammar> is the String value consisting of the code unit 0x0039 (DIGIT NINE).
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `0`</emu-grammar> is 0.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `1`</emu-grammar> is 1.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `2`</emu-grammar> is 2.
</li>
<li>
The MV of <emu-grammar>ZeroToThree :: `3`</emu-grammar> is 3.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `4`</emu-grammar> is 4.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `5`</emu-grammar> is 5.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `6`</emu-grammar> is 6.
</li>
<li>
The MV of <emu-grammar>FourToSeven :: `7`</emu-grammar> is 7.
</li>
</ul>
</emu-annex>
</emu-annex>

<emu-annex id="sec-html-like-comments">
<h1>HTML-like Comments</h1>
<p>The syntax and semantics of <emu-xref href="#sec-comments"></emu-xref> is extended as follows except that this extension is not allowed when parsing source code using the goal symbol |Module|:</p>
@@ -45689,7 +45690,7 @@ <h1>Static Semantics: CharacterValue</h1>
</emu-alg>
<emu-grammar>CharacterEscape :: LegacyOctalEscapeSequence</emu-grammar>
<emu-alg>
1. Return the MV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>).
1. Return the MV of |LegacyOctalEscapeSequence| (see <emu-xref href="#sec-string-literals-static-semantics-mv"></emu-xref>).
</emu-alg>
</emu-annex>

@@ -46548,7 +46549,7 @@ <h1>The Strict Mode of ECMAScript</h1>
A conforming implementation, when processing strict mode code, must disallow instances of the productions <emu-grammar>NumericLiteral :: LegacyOctalIntegerLiteral</emu-grammar> and <emu-grammar>DecimalIntegerLiteral :: NonOctalDecimalIntegerLiteral</emu-grammar>.
</li>
<li>
A conforming implementation, when processing strict mode code, may not extend the syntax of |EscapeSequence| to include <emu-xref href="#prod-annexB-LegacyOctalEscapeSequence"></emu-xref> or <emu-xref href="#prod-annexB-NonOctalDecimalEscapeSequence"></emu-xref> as described in <emu-xref href="#sec-additional-syntax-string-literals"></emu-xref>.
A conforming implementation, when processing strict mode code, must disallow instances of the productions <emu-grammar>EscapeSequence :: LegacyOctalEscapeSequence</emu-grammar> and <emu-grammar>EscapeSequence :: NonOctalDecimalEscapeSequence</emu-grammar>.
</li>
<li>
Assignment to an undeclared identifier or otherwise unresolvable reference does not create a property in the global object. When a simple assignment occurs within strict mode code, its |LeftHandSideExpression| must not evaluate to an unresolvable Reference. If it does a *ReferenceError* exception is thrown (<emu-xref href="#sec-putvalue"></emu-xref>). The |LeftHandSideExpression| also may not be a reference to a data property with the attribute value { [[Writable]]: *false* }, to an accessor property with the attribute value { [[Set]]: *undefined* }, nor to a non-existent property of an object whose [[Extensible]] internal slot has the value *false*. In these cases a `TypeError` exception is thrown (<emu-xref href="#sec-assignment-operators"></emu-xref>).