RFC: ES6-style unicode string escaping. #446

Merged (2 commits) on Dec 11, 2014
69 changes: 69 additions & 0 deletions text/0000-es6-unicode-escapes.md
- Start Date: 2014-11-05
- RFC PR:
- Rust Issue:

# Summary

Remove `\u203D` and `\U0001F4A9` unicode string escapes, and add
[ECMAScript 6-style](https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point)
`\u{1F4A9}` escapes instead.

# Motivation

The syntax of `\u` followed by four hexadecimal digits dates from when Unicode
was a 16-bit encoding, and only went up to U+FFFF.
`\U` followed by eight hex digits was added as a band-aid
when Unicode was extended to U+10FFFF,
but neither four nor eight digits particularly make sense now.

Having two syntaxes that mean the same thing but apply
to different ranges of values is inconsistent and arbitrary.
This proposal unifies them into a single syntax that has a precedent
in ECMAScript (a.k.a. JavaScript).


# Detailed design

In terms of the grammar in [The Rust Reference](
http://doc.rust-lang.org/reference.html#character-and-string-literals),
replace:

```
unicode_escape : 'u' hex_digit 4
| 'U' hex_digit 8 ;
```

with

```
unicode_escape : 'u' '{' hex_digit+ 6 '}' ;
```

That is, `\u{` followed by one to six hexadecimal digits, followed by `}`.

The behavior would otherwise be identical.
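
As an illustration of the proposed grammar, here is a minimal sketch that assumes a compiler implementing this RFC. The constant and function names are made up for this example; only the escape syntax itself comes from the proposal.

```rust
// Sketch only: assumes a compiler that accepts the `\u{...}` syntax
// proposed above. All item names here are hypothetical.

/// U+203D, previously written `\u203D`.
pub const INTERROBANG: char = '\u{203D}';

/// U+1F4A9, previously written `\U0001F4A9`.
pub const PILE_OF_POO: char = '\u{1F4A9}';

/// The same escape works in string literals as well as char literals.
pub fn e_grave() -> &'static str {
    "\u{e8}"
}

/// Anything from one to six hex digits is allowed, in either case,
/// so these three spellings all denote U+00E8.
pub fn spellings_agree() -> bool {
    '\u{e8}' == '\u{E8}' && '\u{e8}' == '\u{0000E8}'
}
```

One syntax now covers the whole code point range, so there is no need to choose between `\u` and `\U` based on the value.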


# Drawbacks

* This is a breaking change, and updating code for it manually is annoying.
It is, however, very mechanical, and we could provide scripts to automate it.
* Formatting templates already use curly braces.
Having multiple pairs of curly braces in the same string with very
different meanings can be surprising:
`format!("\u{e8}_{e8}", e8 = "é")` would be `"è_é"`.
However, there is a precedent for overloading characters:
`\` can start an escape sequence both in the Rust lexer for strings
and in regular expressions.
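
To give a sense of how mechanical the migration in the first drawback is, here is a rough sketch of the core rewrite. `migrate_escapes` is a hypothetical helper, not an existing tool. It works on raw source text and does not check whether an escape sits inside a string or char literal, so a real script would also need to skip raw strings and comments.

```rust
/// Hypothetical helper: rewrite old-style `\uXXXX` and `\UXXXXXXXX`
/// escapes in `src` as `\u{...}`, dropping leading zeros.
/// Assumption: `src` is source text in which every backslash starts an
/// escape (raw strings and comments are not handled).
pub fn migrate_escapes(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len());
    let mut i = 0;
    while i < chars.len() {
        // Copy ordinary characters (and a lone trailing backslash) as-is.
        if chars[i] != '\\' || i + 1 >= chars.len() {
            out.push(chars[i]);
            i += 1;
            continue;
        }
        // Number of hex digits the old syntax requires after `\u` or `\U`.
        let width = match chars[i + 1] {
            'u' => 4,
            'U' => 8,
            _ => 0,
        };
        let start = i + 2;
        let end = start + width;
        let is_old_escape = width > 0
            && end <= chars.len()
            && chars[start..end].iter().all(|c| c.is_ascii_hexdigit());
        if is_old_escape {
            let digits: String = chars[start..end].iter().collect();
            let trimmed = digits.trim_start_matches('0');
            out.push_str("\\u{");
            out.push_str(if trimmed.is_empty() { "0" } else { trimmed });
            out.push('}');
            i = end;
        } else {
            // Any other escape, including `\\` and an already migrated
            // `\u{...}`, is copied through unchanged.
            out.push(chars[i]);
            out.push(chars[i + 1]);
            i += 2;
        }
    }
    out
}
```

Consuming the backslash and the character after it as a pair keeps an escaped backslash followed by `u203D` from being mistaken for a Unicode escape.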


# Alternatives

* Status quo: don’t change the escaping syntax.
* Add the new `\u{…}` syntax, but also keep the existing `\u` and `\U` syntax.
This is what ES 6 does, but only to keep compatibility with ES 5.
We don’t have that constraint pre-1.0.

# Unresolved questions

None so far.