- Feature Name: raw_keywords
- Start Date: 2021-03-05
- RFC PR: rust-lang/rfcs#0000
- Rust Issue: rust-lang/rust#0000
Reserve k#keyword in edition 2021 and beyond as a general syntax for adding keywords mid-edition instead of needing speculative reservations.
There were a few attempts to reserve keywords for the 2018 edition. Some of those proved controversial, and the language team eventually decided not to accept any reservations for not-yet-approved features:
[...] felt particularly strongly that up-front reservations are wrong and a mistake in the initial Edition proposal, basically for the reasons I've already outlined in the thread: they force up-front decisions about surface issues of features that are not yet fully proposed, let alone accepted or implemented. That just seems totally backwards and is going to keep leading to unworkable discussions. We both feel that the role of Editions here is that they can absorb any keyword-flags that have accumulated in the meantime.
In all, there is certainly no consensus to merge this RFC as-is, and I think there are no objections to instead closing it, under the assumption that we'll add a keyword-flag mechanism (or something like it) as needed later.
This RFC is thus a proposal to add that general mechanism.
The other thing that was learned with the 2018 edition is that the period between editions is long enough that the normal "stability without stagnation" principle of "it can just wait for the next train" doesn't work. Instead, it encouraged rushing to try to get things in on time, which had negative quality of life consequences for many contributors. As such, it's important that an alternative mechanism be made available so that missing an edition train doesn't mean having to wait another 3 years -- even if that alternative has syntax that's slightly less nice until the next train.
As an additional bonus, this gives a space in which experimental syntax can be implemented on nightly without risking breakage. In the past, this was sometimes done in conjunction with other keywords, for example do catch { ... } instead of just catch { ... } to avoid the grammar conflict with a struct initializer. With this RFC, it could instead have been implemented as k#catch { ... } directly without worry.
Pretend the year is 2023 and Rust has just stabilized trust_me { ... } blocks as a clearer syntax for unsafe { ... } blocks. The blog post in which they stabilize might say something like this.
This release stabilizes "trust me" blocks! Newcomers to Rust are often confused by the difference between unsafe functions and unsafe blocks, as they do very different things. So these do a better job of emphasizing that these blocks are the place in which you can call unsafe code.
Because of Rust's commitment to its stability guarantees, these are available to edition 2021 code using the syntax k#trust_me { ... do unsafe things here ... } to avoid breaking hypothetical code using trust_me as a function/type/etc name. In another year when the next edition comes out on its usual train, trust_me will be a reserved keyword in it and the edition migration will remove the k# for you. But for now you'll need to keep it.
(This RFC is, of course, not actually proposing "trust me" blocks.)
k#keyword is never valid Rust code on its own, so this is only relevant inside calls to macros, where it will affect tokenization.
For example, consider this code in the 2018 edition:
macro_rules! demo {
    ( $x:tt ) => { "one" };
    ( $a:tt $b:tt $c:tt ) => { "three" };
}

fn main() {
    dbg!(demo!(k#keyword));
    dbg!(demo!(r#keyword));
    dbg!(demo!(k#struct));
    dbg!(demo!(r#struct));
    dbg!(demo!(k #struct));
    dbg!(demo!(r #struct));
}
It produces the following output:
[src/main.rs:7] demo!(k # keyword) = "three"
[src/main.rs:8] demo!(r#keyword) = "one"
[src/main.rs:9] demo!(k # struct) = "three"
[src/main.rs:10] demo!(r#struct) = "one"
[src/main.rs:11] demo!(k # struct) = "three"
[src/main.rs:12] demo!(r # struct) = "three"
In the 2021 edition and beyond it will instead be
[src/main.rs:7] demo!(k#keyword) = "one"
[src/main.rs:8] demo!(r#keyword) = "one"
[src/main.rs:9] demo!(k#struct) = "one"
[src/main.rs:10] demo!(r#struct) = "one"
[src/main.rs:11] demo!(k # struct) = "three"
[src/main.rs:12] demo!(r # struct) = "three"
So it will only affect you if you're making calls with all three of those tokens directly adjacent. The edition pre-migration fix will update such calls to add spaces around the # such that the called macro will continue to see three tokens.
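The effect of that fix can be checked on any edition: once there is whitespace around the #, the macro sees three separate tokens. The following is a minimal sketch reusing the demo macro from the example; the `single` argument is only an illustrative placeholder identifier.

```rust
macro_rules! demo {
    ( $x:tt ) => { "one" };
    ( $a:tt $b:tt $c:tt ) => { "three" };
}

fn main() {
    // With whitespace around `#`, each call passes three token trees,
    // so the second macro arm is selected.
    println!("{}", demo!(k # keyword)); // prints "three"
    println!("{}", demo!(k # struct)); // prints "three"

    // A single identifier is one token tree, so the first arm is selected.
    println!("{}", demo!(single)); // prints "one"
}
```

Because the spaced form tokenizes the same way before and after the edition change, a macro call rewritten this way keeps its meaning across the migration.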
For a feature using a new keyword foo, follow these steps:
- Implement it in nightly as k#foo, ensuring that all uses of k#foo are feature-gated in the parsing code.
- Test and debug the feature as you would any other feature.
- Pause here until ready to stabilize.
- Add an edition pre-migration fix to replace all uses of foo with r#foo.
- Make it parse as both foo and k#foo in edition vNext.
- Add an edition post-migration fix to replace all uses of k#foo with foo.
- Be sure to reference the test for those steps in the stabilization report for FCP.
A new tokenizer rule is introduced:
RAW_KEYWORD : k# IDENTIFIER_OR_KEYWORD
Unlike RAW_IDENTIFIER, this doesn't need the crate/self/super/Self exclusions, as those are all keywords anyway.
Analogously to raw identifiers, raw keywords are always interpreted as keywords and never as plain identifiers, regardless of context. They are also treated as equivalent to a keyword that wasn't raw.
For contextual keywords, that means that a raw keyword is only accepted where it's being used as a keyword, not as an identifier. For example, k#union Foo { x: i32, y: u32 } is valid, but fn k#union() {} is not.
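For comparison, current stable Rust already accepts union in both positions, because it is a contextual keyword. The sketch below shows the two uses side by side; under this proposal, only the first would be writable with the k# prefix. The function body is just an illustration to make the example runnable.

```rust
// `union` in keyword position: this declares a union type.
// Under this RFC it could also be spelled `k#union Foo { ... }`.
#[allow(dead_code)]
union Foo {
    x: i32,
    y: u32,
}

// `union` in identifier position: this is an ordinary function name.
// Under this RFC, `fn k#union()` would be rejected here.
fn union() -> u32 {
    let foo = Foo { x: 7 };
    // Reading `y` reinterprets the bits written through `x`; both fields
    // are 32-bit integers, so every bit pattern is a valid `u32`.
    unsafe { foo.y }
}

fn main() {
    println!("{}", union()); // prints 7
}
```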
In a Rust version where k#pineapple is not a known keyword, it causes a tokenization error. (Like using r#$pineapple does today, and like how r#pineapple did before raw identifiers were a thing.)
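The rule and its error case can be sketched as a small standalone function. This is only an illustration under stated assumptions, not rustc's lexer: `KNOWN_KEYWORDS`, `LexError`, and `lex_raw_keyword` are hypothetical names, the keyword list is a tiny illustrative subset, and identifier characters are approximated with `is_alphabetic`/`is_alphanumeric` rather than the real XID rules.

```rust
/// Illustrative subset of keywords the hypothetical compiler knows about.
const KNOWN_KEYWORDS: &[&str] = &["struct", "union", "await"];

#[derive(Debug, PartialEq)]
enum LexError {
    /// The input does not start with `k#` followed by an identifier.
    NotRawKeyword,
    /// The input has the right shape, but the keyword is not known.
    UnknownKeyword(String),
}

/// Tries to lex a raw keyword at the start of `input`.
/// On success, returns the keyword and the remaining input.
fn lex_raw_keyword(input: &str) -> Result<(&str, &str), LexError> {
    let rest = input.strip_prefix("k#").ok_or(LexError::NotRawKeyword)?;

    // Find where the identifier-like run ends.
    let end = rest
        .char_indices()
        .find(|&(_, c)| !(c == '_' || c.is_alphanumeric()))
        .map_or(rest.len(), |(idx, _)| idx);
    let word = &rest[..end];

    // An identifier must start with a letter or underscore.
    let starts_ok = word
        .chars()
        .next()
        .map_or(false, |c| c == '_' || c.is_alphabetic());
    if !starts_ok {
        return Err(LexError::NotRawKeyword);
    }

    if KNOWN_KEYWORDS.contains(&word) {
        Ok((word, &rest[end..]))
    } else {
        // Mirrors the rule above: an unknown raw keyword is an error.
        Err(LexError::UnknownKeyword(word.to_string()))
    }
}

fn main() {
    println!("{:?}", lex_raw_keyword("k#union Foo"));
    println!("{:?}", lex_raw_keyword("k#pineapple"));
    println!("{:?}", lex_raw_keyword("r#union"));
}
```

Treating the unknown case as a hard error, rather than falling back to three tokens, is what keeps the syntax available for keywords added later.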
The pre-migration fix will look for the tokens "k # ident" in a macro call without whitespace between either pair, and will add a single space on either side of the #.
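A rough, text-level sketch of that rewrite is shown here. It is an assumption-laden simplification: `space_out_k_hash` is a hypothetical helper that works on a source string, whereas a real fix would operate on the token stream of each macro call and would therefore not touch string literals or comments.

```rust
/// Hypothetical helper: rewrites every adjacent `k#ident` in `src`
/// to `k # ident`, leaving everything else untouched.
fn space_out_k_hash(src: &str) -> String {
    let chars: Vec<char> = src.chars().collect();
    let mut out = String::with_capacity(src.len() + 2);
    let is_ident_char = |c: char| c == '_' || c.is_alphanumeric();
    let is_ident_start = |c: char| c == '_' || c.is_alphabetic();

    let mut i = 0;
    while i < chars.len() {
        // `k` must be a whole identifier, not the tail of one like `ok`.
        let standalone_k = chars[i] == 'k' && (i == 0 || !is_ident_char(chars[i - 1]));
        let followed_by_hash = chars.get(i + 1) == Some(&'#');
        let then_ident = chars.get(i + 2).map_or(false, |&c| is_ident_start(c));

        if standalone_k && followed_by_hash && then_ident {
            // Add a single space on either side of the `#`.
            out.push_str("k # ");
            i += 2;
        } else {
            out.push(chars[i]);
            i += 1;
        }
    }
    out
}

fn main() {
    println!("{}", space_out_k_hash("demo!(k#keyword)"));
    println!("{}", space_out_k_hash("demo!(k #struct)"));
}
```

Calls that already contain whitespace, such as demo!(k #struct), are left alone, since they tokenize identically in every edition.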
A new tokenizer rule is introduced:
RAW_KEYWORD : r#$ IDENTIFIER_OR_KEYWORD
This is supported for use in 2015 and 2018, as well as in 2021 for edition migration purposes. In 2024 and beyond, this will no longer be supported.
However, it's strongly recommended that everyone migrate to a current edition rather than use r#$. For example, code wanting to use async.await should just move to the 2018 edition, not use .r#$await.
Semantically, it will do the same as the equivalent k#, just with different syntax.
There is a warn-by-default lint against using r#$pineapple in 2021, which will be included as a post-migration --fix lint, so that code using foo.r#$await in 2018 will be changed to using foo.k#await in 2021.
- This adds more ways of writing the same thing.
- This makes macro token rules even more complicated than they already were.
- This only works for keywords that will match the existing IDENTIFIER_OR_KEYWORD category.
- This is more complicated than just telling people to wait for the next edition.
- This cannot be done in the 2015 and 2018 editions, with the proposed regex.
There are a few fundamental differences between raw keywords and raw identifiers:
- It was important that old editions support raw identifiers, but old editions do not need to support raw keywords.
  Raw identifiers in 2015 were needed so that pre-migration fixes could be applied to rename async -> r#async separately from updating the edition number. There's no 2015 nor 2018 edition code that needs raw keywords, however. Editions are meant to be adopted, so it's fine to expect actively-developed code that wants to write (necessarily) new code using new features to move to a new edition in order to do so.
- Raw identifiers can be forced on you by another crate, but raw keywords are up to you.
  If a crate you're using has a method named r#try, then you're stuck using a raw identifier to call it (unless you fork the crate). But nothing going on in an external crate can force you to use a feature that needs a raw keyword. If you want to only use things once they're available in the new edition as full keywords, you can do that.
- We hope that code won't need raw identifiers, but expect people will use raw keywords.
  Part of the decision process for a new keyword involves looking at the impact it would have. That's not to say it's a controlling factor -- we don't need to pick a suboptimal keyword just to avoid breakage -- but the goal is that it not create a pervasive issue. Whereas accepting a new feature implies that it's useful enough that many people will likely wish to use it immediately, despite the extra lexical wart.
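The second difference in the list above can be seen in today's Rust. In this minimal sketch, the `legacy` module is a hypothetical stand-in for an external crate that exposes an item whose name later became a reserved word; the caller has no choice but to use the raw identifier.

```rust
// Stand-in for an upstream crate written before `try` was reserved.
// Declaring it as `r#try` compiles on every edition.
mod legacy {
    pub fn r#try() -> &'static str {
        "called try"
    }
}

fn main() {
    // On the 2018 edition and later, `legacy::try()` is a syntax error,
    // so the call site is forced to spell the name as `r#try`.
    println!("{}", legacy::r#try()); // prints "called try"
}
```

Nothing comparable exists for raw keywords: a dependency cannot make downstream code write k#, because keywords never appear in a crate's public names.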
In concert, these push for a particular tradeoff:
It's better for raw keywords to be nice on 2021 than for them to be consistent with 2015
Arguably they never should be used in 2015 (or even in 2018, since there are no features planned to use this before 2021 stabilizes), as it's always better to move to the newest-available edition before adopting new features, but they're available with a worse syntax there for completeness.
This is patterned on RFC #2151, raw_identifiers.
Some scripting languages take the opposite approach and essentially reserve all unprefixed identifiers as keywords, requiring a sigil (such as $foo) to have it be interpreted as an identifier in an expression. This is clearly infeasible for Rust, due to the extraordinary churn it would require.
C reserves all identifiers starting with an underscore, and uses that along with #define to add features. For example, it added _Bool, and made that available as bool only when #include <stdbool.h> is specified. Rust doesn't need this for types (as i32 and friends are not keywords), but could add new syntax constructs as macros.
C# releases new versions irregularly, major versions of which may include source-breaking changes such as new keywords. Rust could decide to just roll editions more often instead of introducing features in the middle of them.
C# also leverages contextual keywords heavily. For example, await is only a keyword inside functions using the async contextual keyword, so they could be introduced as non-breaking. This kind of contextual behaviour is more awkward for Rust, which needs to be able to parse an expr to pass it to a macro.
Python uses future statements to allow use of the new features on a per-module basis before those features become standard. Rust's #![feature(foo)] on nightly is similar here.
Haskell has the LANGUAGE pragma, which ghc also supports as command line parameters. This is again similar to Rust's #![feature(foo)] on nightly.
None
- Since an edition fix that can do it is required anyway, it may be good to have a lint on by default that suggests removing superfluous k#s.