Suffixed string and char literals are accepted if they don't go through parser #60494

petrochenkov · 2019-05-02T23:54:12Z

Things like "abc"suffix are lexically valid tokens, but they are reported as errors when parsed with the Rust Language Parser.

However, macro inputs are not parsed by the Rust Language Parser; each macro has its own language that it parses by itself.

This means that suffixed string and char literals can be passed to procedural macros (a procedural macro can process them however it wants and make them legal and meaningful in its own language), or to declarative macros (if they are ignored).

Example:

macro_rules! blackhole { ($tt:tt) => () }

fn main() {
    blackhole!("string"suffix); // OK
}
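A slightly larger sketch of the same behavior, for illustration only. The `token_text!` macro and the `suffixed_token_text` function are hypothetical names introduced here; they are not part of the compiler or of the original report. The sketch assumes that `stringify!` keeps the suffix when it renders the token.

```rust
// Swallows any single token tree, as in the example above.
macro_rules! blackhole {
    ($tt:tt) => {};
}

// Hypothetical helper: renders the single token tree it receives as text.
macro_rules! token_text {
    ($tt:tt) => {
        stringify!($tt)
    };
}

// The suffixed literal only ever exists as a token inside the macro input.
fn suffixed_token_text() -> &'static str {
    token_text!("string"suffix)
}

fn main() {
    blackhole!("string"suffix); // OK: the token never reaches the Rust parser
    blackhole!('c'suffix); // OK: the same holds for char literals

    // Uncommenting the next line makes the program fail to compile, because
    // here the literal is parsed as a Rust expression:
    // let s = "string"suffix;

    println!("{}", suffixed_token_text());
}
```

The commented-out line is the case the parser rejects; the macro invocations are the case this issue is about.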

Possible solutions:

Alternative 1: Prohibit this during lexing. String/character literal tokens won't be able to be suffixed, and proc macros won't be able to use them in their DSLs.
Alternative 2: Do nothing. String/character literal tokens can be suffixed (but not in the Rust Language proper), and proc macros will be able to use them in their DSLs.


petrochenkov · 2019-05-03T13:02:22Z

This is listed as an unresolved question in the RFC introducing literal suffixes into the lexer (rust-lang/rfcs#463).
@nrc argued for the currently implemented behavior in rust-lang/rfcs#463 (comment).

petrochenkov · 2019-05-03T13:07:34Z

@rust-lang/lang needs to make some decision and document it in the reference.

Centril · 2019-05-03T13:38:16Z

@petrochenkov What are the advantages to Alt 1. as you see it? Also, do we have some sort of crater data?

petrochenkov · 2019-05-04T20:53:20Z

> What are the advantages to Alt 1. as you see it?

This is just a somewhat exceptional case of a token which is lexically valid, but always rejected by the parser.
I think ~ is the only other example of such a token right now (and also <-, but that's only because const generics like Type<-1> are not properly parsed yet).

After some thought, yeah, there's probably not much advantage.

Centril · 2019-05-05T06:56:53Z

Yeah my inclination is that this may be useful for macro authors somehow so I think we can leave this accepted.

eddyb · 2019-05-05T11:06:04Z

I regret not coming up with the "joint" system for composed tokens much earlier, and doing at least literal suffixes with it. But overall, I'm in favor of as much flexibility as possible for proc macro authors.

The only useful constraint I see, in terms of the token model, is having balanced ()/[]/{}, because they are used in the invocation syntax, and therefore allow unambiguous parsing around the invocation.
String literals and comments must have a fixed grammar specifically because they can contain those characters, and knowing whether it's part of the source or part of a literal/comment is crucial.

(Although I don't know if other @rust-lang/lang members share this view)
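To illustrate the point about delimiters and fixed literal grammar, here is a minimal sketch (added for illustration, not part of the original comment). The `count_tts!` macro and the `count_example` function are hypothetical names. Brackets inside string literals do not affect delimiter balancing, because each literal is lexed as a single token, with or without a suffix.

```rust
// Hypothetical helper: counts the token trees passed to it.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn count_example() -> usize {
    // "("        : one literal token; the `(` inside it is not a delimiter
    // [ "]" ]    : one delimited group; the `]` inside the string does not close it
    // "}"suffix  : one suffixed literal token
    count_tts!("(" [ "]" ] "}"suffix)
}

fn main() {
    println!("token trees: {}", count_example());
}
```

The invocation is unambiguous because the lexer decides where each string literal ends before any delimiter matching takes place.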

nikomatsakis · 2019-05-05T11:52:36Z

I agree with @eddyb as well =)

petrochenkov · 2019-05-31T12:21:29Z

Looks like there's some consensus to keep the existing behavior.
I made a PR to the reference documenting it - rust-lang/reference#612.

petrochenkov added A-frontend Area: Compiler frontend (errors, parsing and HIR) A-grammar Area: The grammar of Rust labels May 2, 2019

petrochenkov mentioned this issue May 3, 2019

introduce unescape module #60261

Merged

5 tasks

jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. labels May 3, 2019

petrochenkov added I-needs-decision Issue: In need of a decision. T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels May 3, 2019

petrochenkov mentioned this issue May 31, 2019

Document that literals with any suffixes are valid as tokens rust-lang/reference#612

Merged

Centril closed this as completed in rust-lang/reference#612 May 31, 2019

ehuss mentioned this issue Jun 28, 2019

Update suffixed literal support rust-lang/wg-grammar#49

Open
