updates to EscapedChars.md #34

Merged
merged 2 commits into from
Jul 6, 2015
15 changes: 8 additions & 7 deletions EscapedChars.md
@@ -9,20 +9,20 @@ in the ES2015 specification. The characters included in the list are the followi

|Character | Why escape it?
|-----------|--------------|
| `^` | So that `new RegExp(RegExp.escape('^') + "a")` will match `"^a"` rather than the `^` being treated as a negation or start of sentencecontrol construct. |
| `^` | So that `new RegExp(RegExp.escape('^') + "a")` will match `"^a"` rather than the `^` being treated as a negation or start-of-input control construct. |
| `$` | So that `new RegExp("a" + RegExp.escape('$'))` will match `"a$"` rather than the `$` being treated as an end-of-input control construct. |
| `\` | So that `new RegExp(RegExp.escape("\\"))` won't throw a syntax error and instead match `"\\"`, and more generally that `\` won't be treated as an escape control construct. |
| `.` | So that `new RegExp(RegExp.escape("."))` won't match arbitrary single characters like `"a"` but only an actual dot ("."), and more generally that `.` won't be treated as an "any character" control construct. |
| `*` | So that `new RegExp(RegExp.escape("*"))` won't throw a syntax error but instead match against an actual star ("*"), and more generally that `*` won't be treated as a "zero or more times" quantifier. |
| `+` | So that `new RegExp(RegExp.escape("+"))` won't throw a syntax error but instead match against an actual plus sign ("+"), and more generally that `+` won't be treated as a "one or more times" quantifier. |
| `?` | So that `new RegExp(RegExp.escape("?"))` won't throw a type error but instead match against an actual question mark sign ("?"), and more generally that `?` won't be treated as a "once or not at all" quantifier. |
| `?` | So that `new RegExp(RegExp.escape("?"))` won't throw a syntax error but instead match against an actual question mark ("?"), and more generally that `?` won't be treated as a "once or not at all" quantifier. Also so that `new RegExp("("+RegExp.escape("?=")+")")` will match a literal question mark followed by an equals sign, instead of introducing a lookahead, and more generally that `?` won't make groups become assertions or non-capturing. |
| `(` | So that `new RegExp(RegExp.escape("("))` won't throw a syntax error but instead match against an actual opening parenthesis ("("), and more generally that `(` won't be treated as a "start of a capturing group" logical operator. |
| `)` | So that `new RegExp(RegExp.escape(")"))` won't throw a syntax error but instead match against an actual closing parenthesis (")"), and more generally that `)` won't be treated as an "end of a capturing group" logical operator. |
| `[` | So that `new RegExp(RegExp.escape("["))` won't throw a syntax error but instead match against an actual opening bracket ("["), and more generally that `[` won't be treated as a "start of a character class" construct. |
| `]` | This construct is needed to allow escaping inside character classes. `new RegExp("]")` is perfectly valid but we want to allow `new RegExp("["+RegExp.escape("]...")+"]")` in which the `]` needs to be taken literally (and not as the closing "end of character class" character). |
| `{` | So that `new RegExp("a" + RegExp.escape("{1,2}"))` will not match `"aaa"`, and more generally that `{` is taken literally and not as a quantifier. |
| `}` | So that `new RegExp("a" + RegExp.escape("{1,2}"))` will not match `"aaa"`, and more generally that `}` is taken literally and not as a quantifier. |
| `|` | So that `|` will be treated literally and `new RegExp(Regxp.escape("a|b"))` will produce a string that matches `"a|b"` instead of the | being treated as a logical "or" operator. |
| `|` | So that `|` will be treated literally and `new RegExp(RegExp.escape("a|b"))` will produce a regular expression that matches `"a|b"` instead of the `|` being treated as the alternation operator. |

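The rows above can be checked with a small sketch. `escapeSyntaxChars` is a hypothetical stand-in for the proposed `RegExp.escape`, written here only so the examples run; it covers just the syntax characters listed in the table and is not the proposal's reference implementation.

```javascript
// Hypothetical helper (not part of the proposal text): prefixes each syntax
// character from the table above with a backslash.
function escapeSyntaxChars(text) {
  return String(text).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}

// `^` is matched literally instead of acting as an anchor.
const caret = new RegExp(escapeSyntaxChars("^") + "a");
console.log(caret.test("^a")); // true
console.log(caret.test("a")); // false

// `{1,2}` is matched literally instead of acting as a quantifier.
const braces = new RegExp("a" + escapeSyntaxChars("{1,2}"));
console.log(braces.test("aaa")); // false
console.log(braces.test("a{1,2}")); // true

// `?=` inside a group is matched literally instead of starting a lookahead.
const group = new RegExp("(" + escapeSyntaxChars("?=") + ")");
console.log(group.test("?=")); // true
```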

### "Safe with extra escape set" Proposal.
@@ -31,21 +31,22 @@ This proposal additionally escapes `-` for context sensitive inside-character-cl

|Character | Why escape it?
|-----------|--------------|
| `-` | This construct is needed to allow escaping inside character classes. `new RegExp("-")` is perfectly valid but we want to allow `new RegExp("[a"+RegExp.escape("-")+"b]")` in which the `-` needs to be taken literally (and not as a character range character. |
| `-` | This construct is needed to allow escaping inside character classes. `new RegExp("-")` is perfectly valid but we want to allow `new RegExp("[a"+RegExp.escape("-")+"b]")` in which the `-` needs to be taken literally (and not as a character range character). |

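A minimal sketch of the `-` case follows. `escapeWithHyphen` is a hypothetical helper, not the proposal's implementation, and the class uses `a` and `z` rather than the `a` and `b` of the table row so that the range and the literal reading give visibly different results.

```javascript
// Hypothetical helper: the syntax characters plus `-`.
function escapeWithHyphen(text) {
  return String(text).replace(/[\\^$.*+?()[\]{}|-]/g, "\\$&");
}

// Without escaping, `-` forms the range a..z inside the character class.
const asRange = new RegExp("[a" + "-" + "z]");
// With escaping, the class contains exactly `a`, `-`, and `z`.
const asLiteral = new RegExp("[a" + escapeWithHyphen("-") + "z]");

console.log(asRange.test("m")); // true
console.log(asLiteral.test("m")); // false
console.log(asLiteral.test("-")); // true
```
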
And __only at the start__ of strings:

|Character | Why escape it?
|-----------|--------------|
| `0-9` | So that in `new RegExp("(foo)\\1" + RegExp.escape(1))` the back reference will still refer to the first group and not the 11th, and the `1` will be taken literally - see [this issue](https://github.com/benjamingr/RegExp.escape/issues/17) for more details. |
| `0-9a-fA-F` | So that `new RegExp("\\u41" + RegExp.escape("B"))` will not match the letter "Л" (`\u041B`) but rather the sequence "AB", or more generally that a leading hexadecimal character may not continue a preceding escape sequence - see [this issue](https://github.com/benjamingr/RegExp.escape/issues/29) for more details. |

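The leading-digit row can be illustrated as follows. `escapeLeadingDigit` is a hypothetical helper, and encoding the digit as a `\xHH` hex escape is an assumption made for this sketch; the text above does not say which encoding the proposal uses.

```javascript
// Hypothetical helper: rewrites a leading decimal digit as a \xHH escape so it
// cannot extend a preceding back reference. The \xHH encoding is an assumption.
function escapeLeadingDigit(text) {
  return String(text).replace(
    /^[0-9]/,
    (digit) => "\\x" + digit.charCodeAt(0).toString(16)
  );
}

// Plain concatenation yields /(foo)\11/, where `\11` is no longer
// "group 1 followed by the character 1".
const naive = new RegExp("(foo)\\1" + "1");
// Escaping the leading digit yields /(foo)\1\x31/.
const safe = new RegExp("(foo)\\1" + escapeLeadingDigit(1));

console.log(naive.test("foofoo1")); // false
console.log(safe.test("foofoo1")); // true
```
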
Note that if we ever introduce named capturing groups to a subclass of the default `RegExp` those would also need to escape those characters. On escaping hex digits see https://github.com/benjamingr/RegExp.escape/issues/29 .
Note that if we ever introduce named capturing groups to a subclass of the default `RegExp`, that subclass would also need to escape those characters.

### Extended "Safe" Proposal

This proposal escapes a maximal set of characters and ensures compatibility with edge cases like passing the result to `eval`.

|Character | Why escape it?
|-----------|--------------|
| `/` | So that `eval("/"+RegExp.espace("/")+"/")` will produce as a valid regular expression. More generally so that regular expressions will be passable to `eval` if sent from elsewhere with `/`. Note that [data](https://github.com/benjamingr/RegExp.escape/tree/master/data) indicates this is not a common use case. |
| [`WhiteSpace`](http://www.ecma-international.org/ecma-262/6.0/index.html#table-32) | So that `eval("/"+RegExp.espace("/")+"/")` will produce as a valid regular expression. More generally so that regular expressions will be passable to `eval` if sent from elsewhere with `/`. Note that [data](https://github.com/benjamingr/RegExp.escape/tree/master/data) indicates this is not a common use case. |
| `/` | So that `eval("/"+RegExp.escape("/")+"/")` will produce a valid regular expression. More generally so that regular expressions will be passable to `eval` if sent from elsewhere with `/`. Note that [data](https://github.com/benjamingr/RegExp.escape/tree/master/data) indicates this is not a common use case. |
| [`WhiteSpace`](http://www.ecma-international.org/ecma-262/6.0/index.html#table-32) | So that `eval("/"+RegExp.escape("\r\n")+"/")` will produce a valid regular expression. More generally so that regular expressions will be passable to `eval` if sent from elsewhere with `/`. This also improves readability of the `escape` output. See [this issue](https://github.com/benjamingr/RegExp.escape/issues/30) for more details. Note that [data](https://github.com/benjamingr/RegExp.escape/tree/master/data) indicates this is not a common use case. |
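
The `eval` cases can be sketched as below. `escapeForLiteral` is a hypothetical helper that handles only `/`, `\r`, and `\n` in addition to the syntax characters; it does not cover the full `WhiteSpace` set and is not the proposal's implementation.

```javascript
// Hypothetical helper: syntax characters plus `/`, with CR and LF rewritten as
// the two-character escapes \r and \n so the result fits in a regex literal.
function escapeForLiteral(text) {
  return String(text)
    .replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Unescaped, the `/` or the raw line break would terminate or break the literal.
const literalSource = "/" + escapeForLiteral("a/b\r\n") + "/";
const fromEval = eval(literalSource);

console.log(fromEval instanceof RegExp); // true
console.log(fromEval.test("a/b\r\n")); // true
```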