From 2dce2393803a5b213c8da304c9cd23f4720a14a1 Mon Sep 17 00:00:00 2001
From: Michael Dyck
Date: Mon, 19 Aug 2019 20:36:45 -0400
Subject: [PATCH] Normative: Make B.1.4 "Regular Expressions Patterns" normative

(Merge its Syntax, Static Semantics, and Runtime Semantics into the main body.)

(Part of Annex B reform, see PR #1595.)
---
 spec.html | 506 ++++++++++++++++++++----------------------------------
 1 file changed, 190 insertions(+), 316 deletions(-)

diff --git a/spec.html b/spec.html
index 53e350562ba..fbb4d063ade 100644
--- a/spec.html
+++ b/spec.html
@@ -27021,7 +27021,7 @@

Forbidden Extensions

The behaviour of built-in methods which are specified in ECMA-402, such as those named `toLocaleString`, must not be extended except as specified in ECMA-402.
  •
-   The RegExp pattern grammars in and must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape[+U]| when the [U] grammar parameter is present.
+   The RegExp pattern grammars in must not be extended to recognize any of the source characters A-Z or a-z as |IdentityEscape| when the [U] grammar parameter is present.
  • The Syntactic Grammar must not be extended in any manner that allows the token `:` to immediately follow source text that matches the |BindingIdentifier| nonterminal symbol. @@ -32988,6 +32988,7 @@

    RegExp (Regular Expression) Objects

    Syntax for Patterns

    The `RegExp` constructor applies the following grammar to the input pattern String. An error occurs if the grammar cannot interpret the String as an expansion of |Pattern|.

    +

    Some of these productions (indicated by “::!”) introduce ambiguities that are broken by the ordering of alternatives. When parsing using such productions, each alternative is considered only if previous alternatives do not match.

    Patterns

    Pattern[U, N] :: @@ -33001,21 +33002,30 @@

    Patterns

       [empty]
       Alternative[?U, ?N] Term[?U, ?N]

-    Term[U, N] ::
-      Assertion[?U, ?N]
-      Atom[?U, ?N]
-      Atom[?U, ?N] Quantifier
+    Term[U, N] ::!
+      [+U] Assertion[+U, ?N]
+      [+U] Atom[+U, ?N] Quantifier
+      [+U] Atom[+U, ?N]
+      [~U] QuantifiableAssertion[?N] Quantifier
+      [~U] Assertion[~U, ?N]
+      [~U] ExtendedAtom[?N] Quantifier
+      [~U] ExtendedAtom[?N]

     Assertion[U, N] ::
       `^`
       `$`
       `\` `b`
       `\` `B`
-      `(` `?` `=` Disjunction[?U, ?N] `)`
-      `(` `?` `!` Disjunction[?U, ?N] `)`
+      [+U] `(` `?` `=` Disjunction[+U, ?N] `)`
+      [+U] `(` `?` `!` Disjunction[+U, ?N] `)`
+      [~U] QuantifiableAssertion[?N]
       `(` `?` `<=` Disjunction[?U, ?N] `)`
       `(` `?` `<!` Disjunction[?U, ?N] `)`

+    QuantifiableAssertion[N] ::
+      `(` `?` `=` Disjunction[~U, ?N] `)`
+      `(` `?` `!` Disjunction[~U, ?N] `)`
+
     Quantifier ::
       QuantifierPrefix
       QuantifierPrefix `?`
@@ -33028,6 +33038,16 @@

    Patterns

    `{` DecimalDigits[~Sep] `,` `}` `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` + ExtendedAtom[N] ::! + `.` + `\` AtomEscape[~U, ?N] + `\` [lookahead == `c`] + CharacterClass[~U] + `(` Disjunction[~U, ?N] `)` + `(` `?` `:` Disjunction[~U, ?N] `)` + InvalidBracedQuantifier + ExtendedPatternCharacter + Atom[U, N] :: PatternCharacter `.` @@ -33036,6 +33056,14 @@

    Patterns

    `(` GroupSpecifier[?U] Disjunction[?U, ?N] `)` `(` `?` `:` Disjunction[?U, ?N] `)` + InvalidBracedQuantifier :: + `{` DecimalDigits[~Sep] `}` + `{` DecimalDigits[~Sep] `,` `}` + `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}` + + ExtendedPatternCharacter :: + SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|` + PatternCharacter :: SourceCharacter but not SyntaxCharacter @@ -33102,23 +33130,30 @@

    Character Classes

    `-` ClassAtomNoDash[?U] - ClassAtomNoDash[U] :: + ClassAtomNoDash[U, N] ::! SourceCharacter but not one of `\` or `]` or `-` - `\` ClassEscape[?U] + `\` ClassEscape[?U, ?N] + `\` [lookahead == `c`]

    Escapes

    - ClassEscape[U] :: + ClassEscape[U, N] ::! `b` [+U] `-` + [~U] `c` ClassControlLetter CharacterClassEscape[?U] - CharacterEscape[?U] + CharacterEscape[?U, ?N] + + ClassControlLetter :: + DecimalDigit + `_` - AtomEscape[U, N] :: - DecimalEscape + AtomEscape[U, N] ::! + [+U] DecimalEscape + [~U] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_] CharacterClassEscape[?U] - CharacterEscape[?U] + CharacterEscape[?U, ?N] [+N] `k` GroupName[?U] DecimalEscape :: @@ -33161,13 +33196,14 @@

    Escapes

    ControlLetter `_` - CharacterEscape[U] :: + CharacterEscape[U, N] ::! ControlEscape `c` ControlLetter `0` [lookahead <! DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence[?U] - IdentityEscape[?U] + [~U] LegacyOctalEscapeSequence + IdentityEscape[?U, ?N] ControlEscape :: one of `f` `n` `r` `t` `v` @@ -33195,25 +33231,36 @@

    Escapes

    HexNonSurrogate :: Hex4Digits [> but only if the MV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF] - IdentityEscape[U] :: + IdentityEscape[U, N] :: [+U] SyntaxCharacter [+U] `/` - [~U] SourceCharacter but not UnicodeIDContinue + [~U] SourceCharacterIdentityEscape[?N] + + SourceCharacterIdentityEscape[N] :: + [~N] SourceCharacter but not `c` + [+N] SourceCharacter but not one of `c` or `k`
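    As an informal illustration of the |IdentityEscape| split above (not part of the patch), the following sketch probes how an engine that implements this grammar is expected to treat `\a` and `\k`. The helper name `identityEscapeCompiles` is hypothetical, and the results assume an implementation that follows the merged grammar, as current web engines and Node.js are believed to.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function identityEscapeCompiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// [~U]: SourceCharacterIdentityEscape lets `\a` stand for the letter "a".
const legacyLetterEscape = identityEscapeCompiles("\\a");
// [+U]: only SyntaxCharacter or `/` may be identity-escaped.
const unicodeLetterEscape = identityEscapeCompiles("\\a", "u");
// [~N]: with no named groups in the pattern, `\k` is an identity escape.
const bareK = identityEscapeCompiles("\\k");
// [+N]: once a GroupName is present, `\k` must introduce a named backreference.
const bareKWithNamedGroup = identityEscapeCompiles("(?<n>x)\\k");
```

    The [+N] case is the reason |SourceCharacterIdentityEscape| excludes `k` only when the N parameter is set.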
    + +

    Patterns that use the following productions are allowed, but deprecated:

    + + Term ::! QuantifiableAssertion Quantifier + + ExtendedAtom ::! `\` [lookahead == `c`] + + ClassAtomNoDash ::! `\` [lookahead == `c`] + + ClassEscape ::! `c` ClassControlLetter + + CharacterEscape ::! LegacyOctalEscapeSequence + +

    Static Semantics for Patterns

    - -

    A number of productions in this section are given alternative definitions in section .

    -
    - - +

    Static Semantics: Early Errors

    - -

    This section is amended in .

    -
    Pattern :: Disjunction
    • @@ -33229,6 +33276,12 @@

      Static Semantics: Early Errors

      It is a Syntax Error if the MV of the first |DecimalDigits| is larger than the MV of the second |DecimalDigits|.
    + ExtendedAtom ::! InvalidBracedQuantifier +
      +
    • + It is a Syntax Error if any source text matches this rule. +
    • +
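    A rough illustration of the |InvalidBracedQuantifier| early error (not part of the patch): a lone `{` is an |ExtendedPatternCharacter| in a BMP pattern, but a complete braced quantifier with nothing to quantify is rejected. The helper name `bracePatternCompiles` is hypothetical, and the outcomes assume an engine that implements this grammar.

```javascript
// Hypothetical helper: reports whether a pattern source compiles with the given flags.
function bracePatternCompiles(source, flags = "") {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// `{` after `a` does not form a quantifier, so it is an ExtendedPatternCharacter.
const danglingBrace = bracePatternCompiles("a{");
// `{1}` at the start of a pattern matches InvalidBracedQuantifier: early error.
const bareQuantifier = bracePatternCompiles("{1}");
// With the u flag there is no ExtendedAtom, so the dangling brace is rejected.
const danglingBraceUnicode = bracePatternCompiles("a{", "u");
```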
    RegExpIdentifierStart :: `\` RegExpUnicodeEscapeSequence
    • @@ -33244,7 +33297,7 @@

      Static Semantics: Early Errors

      NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges
      • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true*. + It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [U] parameter.
      • It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|. @@ -33253,19 +33306,19 @@

        Static Semantics: Early Errors

        NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges
        • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true*. + It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [U] parameter.
        • It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|.
        - AtomEscape :: DecimalEscape + AtomEscape ::! DecimalEscape
        • It is a Syntax Error if the CapturingGroupNumber of |DecimalEscape| is larger than _NcapturingParens_ ().
        - AtomEscape :: `k` GroupName + AtomEscape ::! `k` GroupName
        • It is a Syntax Error if the enclosing |Pattern| does not contain a |GroupSpecifier| with an enclosed |RegExpIdentifierName| whose CapturingGroupName equals the CapturingGroupName of the |RegExpIdentifierName| of this production's |GroupName|. @@ -33302,9 +33355,6 @@

          Static Semantics: Early Errors

          Static Semantics: CapturingGroupNumber

          - -

          This section is amended in .

          -
          DecimalEscape :: NonZeroDigit 1. Return the MV of |NonZeroDigit|. @@ -33317,36 +33367,32 @@

          Static Semantics: CapturingGroupNumber

          The definitions of “the MV of |NonZeroDigit|” and “the MV of |DecimalDigits|” are in .

          - +

          Static Semantics: IsCharacterClass

          - -

          This section is amended in .

          -
          ClassAtom :: `-` - ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` + ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` - ClassEscape :: `b` + ClassAtomNoDash ::! `\` [lookahead == `c`] - ClassEscape :: `-` + ClassEscape ::! `b` - ClassEscape :: CharacterEscape + ClassEscape ::! `-` + + ClassEscape ::! CharacterEscape 1. Return *false*. - ClassEscape :: CharacterClassEscape + ClassEscape ::! CharacterClassEscape 1. Return *true*.
          - +

          Static Semantics: CharacterValue

          - -

          This section is amended in .

          -
          ClassAtom :: `-` @@ -33354,25 +33400,37 @@

          Static Semantics: CharacterValue

          1. Return the code point value of U+002D (HYPHEN-MINUS). - ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` + ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` 1. Let _ch_ be the code point matched by |SourceCharacter|. 1. Return the code point value of _ch_. - ClassEscape :: `b` + ClassAtomNoDash ::! `\` [lookahead == `c`] + + + 1. Return the code point value of U+005C (REVERSE SOLIDUS). + + + ClassEscape ::! `b` 1. Return the code point value of U+0008 (BACKSPACE). - ClassEscape :: `-` + ClassEscape ::! `-` 1. Return the code point value of U+002D (HYPHEN-MINUS). - CharacterEscape :: ControlEscape + ClassEscape ::! `c` ClassControlLetter + + 1. Let _ch_ be the code point matched by |ClassControlLetter|. + 1. Let _i_ be _ch_'s code point value. + 1. Return the remainder of dividing _i_ by 32. + + CharacterEscape ::! ControlEscape 1. Return the code point value according to . @@ -33484,23 +33542,27 @@

          Static Semantics: CharacterValue

          - CharacterEscape :: `c` ControlLetter + CharacterEscape ::! `c` ControlLetter 1. Let _ch_ be the code point matched by |ControlLetter|. 1. Let _i_ be _ch_'s code point value. 1. Return the remainder of dividing _i_ by 32. - CharacterEscape :: `0` [lookahead <! DecimalDigit] + CharacterEscape ::! `0` [lookahead <! DecimalDigit] 1. Return the code point value of U+0000 (NULL).

          `\\0` represents the <NUL> character and cannot be followed by a decimal digit.
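          The note above can be checked informally as follows (a sketch, not part of the patch). It assumes an engine that implements the [~U] |LegacyOctalEscapeSequence| alternative added to |CharacterEscape| by this change; the variable names are illustrative only.

```javascript
// `\0` not followed by a decimal digit is the NUL character in both modes.
const matchesNul = new RegExp("\\0").test("\u0000");
const matchesNulUnicode = new RegExp("\\0", "u").test("\u0000");

// In a BMP pattern, `\01` falls through to LegacyOctalEscapeSequence (MV 1).
const matchesOctalOne = new RegExp("\\01").test("\u0001");

// In a Unicode pattern no alternative matches `\01`, so construction fails.
let unicodeRejectsOctal = false;
try {
  new RegExp("\\01", "u");
} catch (e) {
  unicodeRejectsOctal = e instanceof SyntaxError;
}
```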

          - CharacterEscape :: HexEscapeSequence + CharacterEscape ::! HexEscapeSequence 1. Return the MV of |HexEscapeSequence|. + CharacterEscape ::! LegacyOctalEscapeSequence + + 1. Return the MV of |LegacyOctalEscapeSequence| (see ). + RegExpUnicodeEscapeSequence :: `u` HexLeadSurrogate `\u` HexTrailSurrogate 1. Let _lead_ be the CharacterValue of |HexLeadSurrogate|. @@ -33560,7 +33622,7 @@

          Static Semantics: CapturingGroupName

          - +

          Runtime Semantics for Patterns

          A regular expression pattern is converted into an Abstract Closure using the process described below. An implementation is encouraged to use more efficient algorithms than the ones listed below, as long as the results are the same. The Abstract Closure is used as the value of a RegExp object's [[RegExpMatcher]] internal slot.

          A |Pattern| is either a BMP pattern or a Unicode pattern depending upon whether or not its associated flags contain a `u`. A BMP pattern matches against a String interpreted as consisting of a sequence of 16-bit values that are Unicode code points in the range of the Basic Multilingual Plane. A Unicode pattern matches against a String interpreted as consisting of Unicode code points encoded using UTF-16. In the context of describing the behaviour of a BMP pattern “character” means a single 16-bit Unicode BMP code point. In the context of describing the behaviour of a Unicode pattern “character” means a UTF-16 encoded code point (). In either context, “character value” means the numeric value of the corresponding non-encoded code point.

          @@ -33715,18 +33777,18 @@

          Alternative

          Term

          With parameter _direction_.

          -

          The production Term :: Assertion evaluates as follows:

          +

          The production Term ::! Assertion evaluates as follows:

          1. Return the Matcher that is the result of evaluating |Assertion|.

          The resulting Matcher is independent of _direction_.

          -

          The production Term :: Atom evaluates as follows:

          +

          The production Term ::! Atom evaluates as follows:

          1. Return the Matcher that is the result of evaluating |Atom| with argument _direction_. -

          The production Term :: Atom Quantifier evaluates as follows:

          +

          The production Term ::! Atom Quantifier evaluates as follows:

          1. Evaluate |Atom| with argument _direction_ to obtain a Matcher _m_. 1. Evaluate |Quantifier| to obtain the three results: a non-negative integer _min_, a non-negative integer (or +∞) _max_, and Boolean _greedy_. @@ -33738,6 +33800,11 @@

          Term

          1. Assert: _c_ is a Continuation. 1. Return ! RepeatMatcher(_m_, _min_, _max_, _greedy_, _x_, _c_, _parenIndex_, _parenCount_).
          +

          ----

          +

          In the above algorithm, references to Atom :: `(` GroupSpecifier Disjunction `)` are to be interpreted as meaning Atom :: `(` GroupSpecifier Disjunction `)` or ExtendedAtom ::! `(` Disjunction `)` .

          +

          The production Term ::! QuantifiableAssertion Quantifier evaluates the same as the production Term ::! Atom Quantifier but with |QuantifiableAssertion| substituted for |Atom|.

          +

          The production Term ::! ExtendedAtom Quantifier evaluates the same as the production Term ::! Atom Quantifier but with |ExtendedAtom| substituted for |Atom|.

          +

          The production Term ::! ExtendedAtom evaluates the same as the production Term ::! Atom but with |ExtendedAtom| substituted for |Atom|.
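          The effect of the Term ::! QuantifiableAssertion Quantifier production can be sketched as follows (an illustration, not part of the patch). It assumes an engine that implements the [~U] alternatives; the variable names are illustrative only.

```javascript
// In a BMP pattern a lookahead may carry a quantifier.
const quantifiedLookahead = new RegExp("(?=a)*b");
const matchesPlainB = quantifiedLookahead.test("b");
const matchesAB = quantifiedLookahead.test("ab");

// In a Unicode pattern only Atom may be quantified, so the same source is rejected.
let unicodeRejectsQuantifiedLookahead = false;
try {
  new RegExp("(?=a)*b", "u");
} catch (e) {
  unicodeRejectsQuantifiedLookahead = e instanceof SyntaxError;
}
```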

          @@ -33895,6 +33962,11 @@

          Assertion

          1. If _r_ is not ~failure~, return ~failure~. 1. Return _c_(_x_). +

          The production Assertion :: QuantifiableAssertion evaluates as follows:

          + + 1. Evaluate |QuantifiableAssertion| to obtain a Matcher _m_. + 1. Return _m_. +

          The production Assertion :: `(` `?` `<=` Disjunction `)` evaluates as follows:

          1. Evaluate |Disjunction| with -1 as its _direction_ argument to obtain a Matcher _m_. @@ -33925,6 +33997,8 @@

          Assertion

          1. If _r_ is not ~failure~, return ~failure~. 1. Return _c_(_x_).
          +

          ----

          +

          The evaluation rules for the Assertion :: `(` `?` `=` Disjunction `)` and Assertion :: `(` `?` `!` Disjunction `)` productions are also used for the |QuantifiableAssertion| productions, but with |QuantifiableAssertion| substituted for |Assertion|.

          @@ -34004,6 +34078,11 @@

          Atom

          1. Return the Matcher that is the result of evaluating |AtomEscape| with argument _direction_. +

          The production ExtendedAtom ::! `\` [lookahead == `c`] evaluates as follows:

          + + 1. Let _A_ be the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). +

          The production Atom :: CharacterClass evaluates as follows:

          1. Evaluate |CharacterClass| to obtain a CharSet _A_ and a Boolean _invert_. @@ -34037,6 +34116,14 @@

          Atom

          1. Return the Matcher that is the result of evaluating |Disjunction| with argument _direction_. +

          The production ExtendedAtom ::! ExtendedPatternCharacter evaluates as follows:

          + + 1. Let _ch_ be the character represented by |ExtendedPatternCharacter|. + 1. Let _A_ be a one-element CharSet containing the character _ch_. + 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). + +

          ----

          +

          The evaluation rules for the |Atom| productions except for Atom :: PatternCharacter are also used for the |ExtendedAtom| productions, but with |ExtendedAtom| substituted for |Atom|.

          @@ -34208,10 +34295,28 @@

          NonemptyClassRanges

          1. Evaluate the first |ClassAtom| to obtain a CharSet _A_. 1. Evaluate the second |ClassAtom| to obtain a CharSet _B_. 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRange(_A_, _B_). + 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). 1. Return the union of _D_ and _C_.
          + +

          + CharacterRangeOrUnion ( + _A_: a CharSet, + _B_: a CharSet, + ) +

          +
          +
          +
          +   1. If _Unicode_ is *false*, then
          +     1. If _A_ does not contain exactly one character or _B_ does not contain exactly one character, then
          +       1. Let _C_ be the CharSet containing the single character `-` U+002D (HYPHEN-MINUS).
          +       1. Return the union of CharSets _A_, _B_ and _C_.
          +   1. Return ! CharacterRange(_A_, _B_).
          +
          +
          +
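          As an informal check of CharacterRangeOrUnion (not part of the patch): when one side of a `-` in a BMP character class is a class escape such as `\w`, the result is the union of both sides plus the hyphen itself, while the same source is an early error under the [U] parameter. The variable names below are illustrative, and the results assume an engine that implements this algorithm.

```javascript
// BMP pattern: `\w-a` is not a range; it is the union of \w, "a" and "-".
const unionClass = new RegExp("[\\w-a]");
const unionMatchesHyphen = unionClass.test("-");
const unionMatchesWordChar = unionClass.test("z");
const unionMatchesBang = unionClass.test("!");

// Unicode pattern: the early error for IsCharacterClass applies instead.
let unicodeRejectsClassRange = false;
try {
  new RegExp("[\\w-a]", "u");
} catch (e) {
  unicodeRejectsClassRange = e instanceof SyntaxError;
}
```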

          CharacterRange ( @@ -34250,7 +34355,7 @@

          NonemptyClassRangesNoDash

          1. Evaluate |ClassAtomNoDash| to obtain a CharSet _A_. 1. Evaluate |ClassAtom| to obtain a CharSet _B_. 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRange(_A_, _B_). + 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). 1. Return the union of _D_ and _C_. @@ -34278,25 +34383,32 @@

          ClassAtom

          ClassAtomNoDash

          -

          The production ClassAtomNoDash :: SourceCharacter but not one of `\` or `]` or `-` evaluates as follows:

          +

          The production ClassAtomNoDash ::! SourceCharacter but not one of `\` or `]` or `-` evaluates as follows:

          1. Return the CharSet containing the character matched by |SourceCharacter|. -

          The production ClassAtomNoDash :: `\` ClassEscape evaluates as follows:

          +

          The production ClassAtomNoDash ::! `\` ClassEscape evaluates as follows:

          1. Return the CharSet that is the result of evaluating |ClassEscape|. +

          The production ClassAtomNoDash ::! `\` [lookahead == `c`] evaluates as follows:

          + + 1. Return the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). + + This production can only be reached from the sequence `\c` within a character class where it is not followed by an acceptable control character.
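          The two `\c` cases inside a character class can be sketched as follows (an illustration, not part of the patch). It assumes an engine that implements ClassAtomNoDash ::! `\` [lookahead == `c`] and ClassEscape ::! `c` ClassControlLetter as specified here; the variable names are illustrative only.

```javascript
// `[\c]`: `\c` is not followed by a control letter, so the class contains
// REVERSE SOLIDUS (from the lookahead production) and then the literal "c".
const bareControl = new RegExp("[\\c]");
const bareMatchesBackslash = bareControl.test("\\");
const bareMatchesC = bareControl.test("c");

// `[\c1]`: ClassControlLetter is a DecimalDigit; 0x31 mod 32 is 0x11.
const digitControl = new RegExp("[\\c1]");
const digitMatchesDC1 = digitControl.test("\u0011");
const digitMatchesOne = digitControl.test("1");

// `[\c_]`: ClassControlLetter is `_`; 0x5F mod 32 is 0x1F.
const underscoreMatchesUS = new RegExp("[\\c_]").test("\u001F");
```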

          ClassEscape

          The |ClassEscape| productions evaluate as follows:

          - ClassEscape :: `b` + ClassEscape ::! `b` + + ClassEscape ::! `-` - ClassEscape :: `-` + ClassEscape ::! `c` ClassControlLetter - ClassEscape :: CharacterEscape + ClassEscape ::! CharacterEscape 1. Let _cv_ be the CharacterValue of this |ClassEscape|. @@ -34304,7 +34416,7 @@

          ClassEscape

          1. Return the CharSet containing the single character _c_.
          - ClassEscape :: CharacterClassEscape + ClassEscape ::! CharacterClassEscape 1. Return the CharSet that is the result of evaluating |CharacterClassEscape|. @@ -34317,13 +34429,13 @@

          ClassEscape

          AtomEscape

          With parameter _direction_.

          -

          The production AtomEscape :: DecimalEscape evaluates as follows:

          +

          The production AtomEscape ::! DecimalEscape evaluates as follows:

          1. Evaluate |DecimalEscape| to obtain an integer _n_. 1. Assert: _n_ ≤ _NcapturingParens_. 1. Return ! BackreferenceMatcher(_n_, _direction_). -

          The production AtomEscape :: CharacterClassEscape evaluates as follows:

          +

          The production AtomEscape ::! CharacterClassEscape evaluates as follows:

          1. Evaluate |CharacterClassEscape| to obtain a CharSet _A_. 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). @@ -34331,13 +34443,13 @@

          AtomEscape

          An escape sequence of the form `\\` followed by a non-zero decimal number _n_ matches the result of the _n_th set of capturing parentheses (). It is an error if the regular expression has fewer than _n_ capturing parentheses. If the regular expression has _n_ or more capturing parentheses but the _n_th one is *undefined* because it has not captured anything, then the backreference always succeeds.
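          The backreference behaviour described in this note can be illustrated as follows (a sketch, not part of the patch). The last two cases rely on the [~U] guard on |DecimalEscape| in the grammar above, under which an out-of-range `\n` in a BMP pattern is not a backreference at all; the results assume an engine that implements that grammar, and the variable names are illustrative only.

```javascript
// Group 1 does not participate when the "b" branch is taken, so `\1`
// refers to an undefined capture and succeeds without consuming input.
const undefinedCapture = /(?:(a)|b)\1/.exec("b");

// Unicode pattern with no capturing groups: `\1` is an early error.
let unicodeRejectsMissingGroup = false;
try {
  new RegExp("\\1", "u");
} catch (e) {
  unicodeRejectsMissingGroup = e instanceof SyntaxError;
}

// BMP pattern with no capturing groups: `\1` is a LegacyOctalEscapeSequence
// and `\8` is an identity escape for "8".
const legacyOctalOne = new RegExp("\\1").test("\u0001");
const legacyIdentityEight = new RegExp("\\8").test("8");
```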

          -

          The production AtomEscape :: CharacterEscape evaluates as follows:

          +

          The production AtomEscape ::! CharacterEscape evaluates as follows:

          1. Evaluate |CharacterEscape| to obtain a character _ch_. 1. Let _A_ be a one-element CharSet containing the character _ch_. 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). -

          The production AtomEscape :: `k` GroupName evaluates as follows:

          +

          The production AtomEscape ::! `k` GroupName evaluates as follows:

          1. Search the enclosing |Pattern| for an instance of a |GroupSpecifier| containing a |RegExpIdentifierName| which has a CapturingGroupName equal to the CapturingGroupName of the |RegExpIdentifierName| contained in |GroupName|. 1. Assert: A unique such |GroupSpecifier| is found. @@ -34444,12 +34556,13 @@

          CharacterClassEscape

          CharacterEscape

          The |CharacterEscape| productions evaluate as follows:

          - CharacterEscape :: + CharacterEscape ::! ControlEscape `c` ControlLetter `0` [lookahead <! DecimalDigit] HexEscapeSequence RegExpUnicodeEscapeSequence + LegacyOctalEscapeSequence IdentityEscape @@ -45195,9 +45308,13 @@

          Regular Expressions

          + + + + @@ -45214,6 +45331,7 @@

          Regular Expressions

          + @@ -45233,6 +45351,7 @@

          Regular Expressions

          + @@ -45410,252 +45529,7 @@

          HTML-like Comments

          Regular Expressions Patterns

          -

          The syntax of is modified and extended as follows. These changes introduce ambiguities that are broken by the ordering of grammar productions and by contextual information. When parsing using the following grammar, each alternative is considered only if previous production alternatives do not match.

          -

          This alternative pattern grammar and semantics only changes the syntax and semantics of BMP patterns. The following grammar extensions include productions parameterized with the [U] parameter. However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [U] parameter present on the goal symbol.

          -

          Syntax

          -
          -   Term[U, N] ::
          -     [+U] Assertion[+U, ?N]
          -     [+U] Atom[+U, ?N] Quantifier
          -     [+U] Atom[+U, ?N]
          -     [~U] QuantifiableAssertion[?N] Quantifier
          -     [~U] Assertion[~U, ?N]
          -     [~U] ExtendedAtom[?N] Quantifier
          -     [~U] ExtendedAtom[?N]
          -
          -   Assertion[U, N] ::
          -     `^`
          -     `$`
          -     `\` `b`
          -     `\` `B`
          -     [+U] `(` `?` `=` Disjunction[+U, ?N] `)`
          -     [+U] `(` `?` `!` Disjunction[+U, ?N] `)`
          -     [~U] QuantifiableAssertion[?N]
          -     `(` `?` `<=` Disjunction[?U, ?N] `)`
          -     `(` `?` `<!` Disjunction[?U, ?N] `)`
          -
          -   QuantifiableAssertion[N] ::
          -     `(` `?` `=` Disjunction[~U, ?N] `)`
          -     `(` `?` `!` Disjunction[~U, ?N] `)`
          -
          -   ExtendedAtom[N] ::
          -     `.`
          -     `\` AtomEscape[~U, ?N]
          -     `\` [lookahead == `c`]
          -     CharacterClass[~U]
          -     `(` Disjunction[~U, ?N] `)`
          -     `(` `?` `:` Disjunction[~U, ?N] `)`
          -     InvalidBracedQuantifier
          -     ExtendedPatternCharacter
          -
          -   InvalidBracedQuantifier ::
          -     `{` DecimalDigits[~Sep] `}`
          -     `{` DecimalDigits[~Sep] `,` `}`
          -     `{` DecimalDigits[~Sep] `,` DecimalDigits[~Sep] `}`
          -
          -   ExtendedPatternCharacter ::
          -     SourceCharacter but not one of `^` `$` `\` `.` `*` `+` `?` `(` `)` `[` `|`
          -
          -   AtomEscape[U, N] ::
          -     [+U] DecimalEscape
          -     [~U] DecimalEscape [> but only if the CapturingGroupNumber of |DecimalEscape| is ≤ _NcapturingParens_]
          -     CharacterClassEscape[?U]
          -     CharacterEscape[?U, ?N]
          -     [+N] `k` GroupName[?U]
          -
          -   CharacterEscape[U, N] ::
          -     ControlEscape
          -     `c` ControlLetter
          -     `0` [lookahead <! DecimalDigit]
          -     HexEscapeSequence
          -     RegExpUnicodeEscapeSequence[?U]
          -     [~U] LegacyOctalEscapeSequence
          -     IdentityEscape[?U, ?N]
          -
          -   IdentityEscape[U, N] ::
          -     [+U] SyntaxCharacter
          -     [+U] `/`
          -     [~U] SourceCharacterIdentityEscape[?N]
          -
          -   SourceCharacterIdentityEscape[N] ::
          -     [~N] SourceCharacter but not `c`
          -     [+N] SourceCharacter but not one of `c` or `k`
          -
          -   ClassAtomNoDash[U, N] ::
          -     SourceCharacter but not one of `\` or `]` or `-`
          -     `\` ClassEscape[?U, ?N]
          -     `\` [lookahead == `c`]
          -
          -   ClassEscape[U, N] ::
          -     `b`
          -     [+U] `-`
          -     [~U] `c` ClassControlLetter
          -     CharacterClassEscape[?U]
          -     CharacterEscape[?U, ?N]
          -
          -   ClassControlLetter ::
          -     DecimalDigit
          -     `_`
          -
          -

          When the same left hand sides occurs with both [+U] and [\~U] guards it is to control the disambiguation priority.

          -
          - - -

          Static Semantics: Early Errors

          -

          The semantics of is extended as follows:

          - ExtendedAtom :: InvalidBracedQuantifier -
            -
          • - It is a Syntax Error if any source text matches this rule. -
          • -
          -

          Additionally, the rules for the following productions are modified with the addition of the highlighted text:

          - NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges -
            -
          • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *true* or IsCharacterClass of the second |ClassAtom| is *true* and this production has a [U] parameter. -
          • -
          • - It is a Syntax Error if IsCharacterClass of the first |ClassAtom| is *false* and IsCharacterClass of the second |ClassAtom| is *false* and the CharacterValue of the first |ClassAtom| is larger than the CharacterValue of the second |ClassAtom|. -
          • -
          - NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges -
            -
          • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *true* or IsCharacterClass of |ClassAtom| is *true* and this production has a [U] parameter. -
          • -
          • - It is a Syntax Error if IsCharacterClass of |ClassAtomNoDash| is *false* and IsCharacterClass of |ClassAtom| is *false* and the CharacterValue of |ClassAtomNoDash| is larger than the CharacterValue of |ClassAtom|. -
          • -
          -
          - - -

          Static Semantics: IsCharacterClass

          -

          The semantics of is extended as follows:

          - - ClassAtomNoDash :: `\` [lookahead == `c`] - - - 1. Return *false*. - -
          - - -

          Static Semantics: CharacterValue

          -

          The semantics of is extended as follows:

          - - ClassAtomNoDash :: `\` [lookahead == `c`] - - - 1. Return the code point value of U+005C (REVERSE SOLIDUS). - - ClassEscape :: `c` ClassControlLetter - - 1. Let _ch_ be the code point matched by |ClassControlLetter|. - 1. Let _i_ be _ch_'s code point value. - 1. Return the remainder of dividing _i_ by 32. - - CharacterEscape :: LegacyOctalEscapeSequence - - 1. Return the MV of |LegacyOctalEscapeSequence| (see ). - -
          - - -

          Pattern Semantics

          -

          The semantics of is extended as follows:

          -

          Within reference to “Atom :: `(` GroupSpecifier Disjunction `)` ” are to be interpreted as meaning “Atom :: `(` GroupSpecifier Disjunction `)` ” or “ExtendedAtom :: `(` Disjunction `)` ”.

          - -

          Term () includes the following additional evaluation rules:

          -

          The production Term :: QuantifiableAssertion Quantifier evaluates the same as the production Term :: Atom Quantifier but with |QuantifiableAssertion| substituted for |Atom|.

          -

          The production Term :: ExtendedAtom Quantifier evaluates the same as the production Term :: Atom Quantifier but with |ExtendedAtom| substituted for |Atom|.

          -

          The production Term :: ExtendedAtom evaluates the same as the production Term :: Atom but with |ExtendedAtom| substituted for |Atom|.

          - -

          Assertion () includes the following additional evaluation rule:

          -

          The production Assertion :: QuantifiableAssertion evaluates as follows:

          - - 1. Evaluate |QuantifiableAssertion| to obtain a Matcher _m_. - 1. Return _m_. - - -

          Assertion () evaluation rules for the Assertion :: `(` `?` `=` Disjunction `)` and Assertion :: `(` `?` `!` Disjunction `)` productions are also used for the |QuantifiableAssertion| productions, but with |QuantifiableAssertion| substituted for |Assertion|.

          - -

          Atom () evaluation rules for the |Atom| productions except for Atom :: PatternCharacter are also used for the |ExtendedAtom| productions, but with |ExtendedAtom| substituted for |Atom|. The following evaluation rules, with parameter _direction_, are also added:

          -

          The production ExtendedAtom :: `\` [lookahead == `c`] evaluates as follows:

          - - 1. Let _A_ be the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). - 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). - -

          The production ExtendedAtom :: ExtendedPatternCharacter evaluates as follows:

          - - 1. Let _ch_ be the character represented by |ExtendedPatternCharacter|. - 1. Let _A_ be a one-element CharSet containing the character _ch_. - 1. Return ! CharacterSetMatcher(_A_, *false*, _direction_). - - -

          CharacterEscape () includes the following additional evaluation rule:

          -

          The production CharacterEscape :: LegacyOctalEscapeSequence evaluates as follows:

          - - 1. Let _cv_ be the CharacterValue of this |CharacterEscape|. - 1. Return the character whose character value is _cv_. - - -

          NonemptyClassRanges () modifies the following evaluation rule:

          -

          The production NonemptyClassRanges :: ClassAtom `-` ClassAtom ClassRanges evaluates as follows:

          - - 1. Evaluate the first |ClassAtom| to obtain a CharSet _A_. - 1. Evaluate the second |ClassAtom| to obtain a CharSet _B_. - 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). - 1. Return the union of _D_ and _C_. - - -

          NonemptyClassRangesNoDash () modifies the following evaluation rule:

          -

          The production NonemptyClassRangesNoDash :: ClassAtomNoDash `-` ClassAtom ClassRanges evaluates as follows:

          - - 1. Evaluate |ClassAtomNoDash| to obtain a CharSet _A_. - 1. Evaluate |ClassAtom| to obtain a CharSet _B_. - 1. Evaluate |ClassRanges| to obtain a CharSet _C_. - 1. Let _D_ be ! CharacterRangeOrUnion(_A_, _B_). - 1. Return the union of _D_ and _C_. - - -

          ClassEscape () includes the following additional evaluation rule:

          -

          The production ClassEscape :: `c` ClassControlLetter evaluates as follows:

          - - 1. Let _cv_ be the CharacterValue of this |ClassEscape|. - 1. Let _c_ be the character whose character value is _cv_. - 1. Return the CharSet containing the single character _c_. - - -

          ClassAtomNoDash () includes the following additional evaluation rule:

          -

          The production ClassAtomNoDash :: `\` [lookahead == `c`] evaluates as follows:

          - - 1. Return the CharSet containing the single character `\\` U+005C (REVERSE SOLIDUS). - - - This production can only be reached from the sequence `\c` within a character class where it is not followed by an acceptable control character. - - -

          - CharacterRangeOrUnion ( - _A_: a CharSet, - _B_: a CharSet, - ) -

          -
          -
          - - 1. If _Unicode_ is *false*, then - 1. If _A_ does not contain exactly one character or _B_ does not contain exactly one character, then - 1. Let _C_ be the CharSet containing the single character `-` U+002D (HYPHEN-MINUS). - 1. Return the union of CharSets _A_, _B_ and _C_. - 1. Return ! CharacterRange(_A_, _B_). - -
          -
          +

          Some of the syntax and semantics of BMP patterns ([~U]) used to be normative optional.