From 387ed90b8136e71686d145a841273ab4d3b7b2f2 Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Wed, 5 Nov 2014 16:34:10 -0800
Subject: [PATCH 1/2] RFC: ES6-style unicode string escaping.

---
 text/0000-es6-unicode-escapes.md | 62 ++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)
 create mode 100644 text/0000-es6-unicode-escapes.md

diff --git a/text/0000-es6-unicode-escapes.md b/text/0000-es6-unicode-escapes.md
new file mode 100644
index 00000000000..d429cd47803
--- /dev/null
+++ b/text/0000-es6-unicode-escapes.md
@@ -0,0 +1,62 @@
+- Start Date: 2014-11-05
+- RFC PR:
+- Rust Issue:
+
+# Summary
+
+Remove `\u203D` and `\U0001F4A9` unicode string escapes, and add
+[ECMAScript 6-style](https://mathiasbynens.be/notes/javascript-escapes#unicode-code-point)
+`\u{1F4A9}` escapes instead.
+
+# Motivation
+
+The syntax of `\u` followed by four hexadecimal digits dates from when Unicode
+was a 16-bit encoding, and only went up to U+FFFF.
+`\U` followed by eight hex digits was added as a band-aid
+when Unicode was extended to U+10FFFF,
+but neither four nor eight digits particularly make sense now.
+
+Having two different syntaxes with the same meaning but that apply
+to different ranges of values is inconsistent and arbitrary.
+This proposal unifies them into a single syntax that has a precedent
+in ECMAScript a.k.a. JavaScript.
+
+
+# Detailed design
+
+In terms of the grammar in [The Rust Reference](
+http://doc.rust-lang.org/reference.html#character-and-string-literals),
+replace:
+
+```
+unicode_escape : 'u' hex_digit 4
+               | 'U' hex_digit 8 ;
+```
+
+with
+
+```
+unicode_escape : 'u' '{' hex_digit+ 6 '}'
+```
+
+That is, `\u{` followed by one to six hexadecimal digits, followed by `}`.
+
+The behavior would otherwise be identical.
+
+
+# Drawbacks
+
+This is a breaking change and updating code for it manually is annoying.
+It is however very mechanical, and we could provide scripts to automate it.
+
+
+# Alternatives
+
+* Status quo: don’t change the escaping syntax.
+* Add the new `\u{…}` syntax, but also keep the existing `\u` and `\U` syntax.
+  This is what ES 6 does, but only to keep compatibility with ES 5.
+  We don’t have that constraint pre-1.0.
+
+# Unresolved questions
+
+None so far.

From b310c87c62347fdc2097fe4b867f771596e0c7ac Mon Sep 17 00:00:00 2001
From: Simon Sapin
Date: Fri, 7 Nov 2014 20:01:15 +0000
Subject: [PATCH 2/2] ES6-style escaping: add drawback: overriding {} with format!

---
 text/0000-es6-unicode-escapes.md | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/text/0000-es6-unicode-escapes.md b/text/0000-es6-unicode-escapes.md
index d429cd47803..d5a61221c5f 100644
--- a/text/0000-es6-unicode-escapes.md
+++ b/text/0000-es6-unicode-escapes.md
@@ -46,8 +46,15 @@ The behavior would otherwise be identical.
 
 # Drawbacks
 
-This is a breaking change and updating code for it manually is annoying.
-It is however very mechanical, and we could provide scripts to automate it.
+* This is a breaking change and updating code for it manually is annoying.
+  It is however very mechanical, and we could provide scripts to automate it.
+* Formatting templates already use curly braces.
+  Having multiple curly braces pairs in the same strings that have a very
+  different meaning can be surprising:
+  `format!("\u{e8}_{e8}", e8 = "é")` would be `"è_é"`.
+  However, there is a precedent of overriding characters:
+  `\` can start an escape sequence both in the Rust lexer for strings
+  and in regular expressions.
 
 
 # Alternatives
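
The escape form proposed in this series (`\u{`, one to six hexadecimal digits, `}`) and the `format!` interaction listed under Drawbacks can be illustrated with a small Rust sketch. This is not code from the patch: `decode_braced_escape` and `escape_next_to_placeholder` are hypothetical names introduced only for illustration, and rejecting surrogate code points is an assumption based on what a Rust `char` can hold, not something the RFC text states.

```rust
/// Hypothetical helper (not part of the RFC or the compiler): decodes one
/// escape written as `\u{` + one to six hex digits + `}`.
fn decode_braced_escape(src: &str) -> Option<char> {
    // Require the exact `\u{` prefix and the closing `}`.
    let digits = src.strip_prefix("\\u{")?.strip_suffix('}')?;

    // The proposed grammar allows between one and six hexadecimal digits.
    if digits.is_empty() || digits.len() > 6 {
        return None;
    }
    if !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }

    let value = u32::from_str_radix(digits, 16).ok()?;
    // Assumption: values that are not valid `char`s (surrogates, or anything
    // above U+10FFFF) are rejected.
    char::from_u32(value)
}

/// The example from the Drawbacks section: the first brace pair belongs to
/// the string escape, the second is a named `format!` placeholder.
fn escape_next_to_placeholder() -> String {
    format!("\u{e8}_{e8}", e8 = "é")
}
```

Under these assumptions, `decode_braced_escape("\\u{1F4A9}")` yields `Some('\u{1F4A9}')`, while the old fixed-width form `\u203D` and an empty `\u{}` are both rejected. In `escape_next_to_placeholder`, only `{e8}` is substituted by `format!`, because the lexer has already consumed `\u{e8}` before the formatting template is parsed.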