Suffixed string and char literals are accepted if they don't go through parser #60494

Closed
petrochenkov opened this issue May 2, 2019 · 8 comments · Fixed by rust-lang/reference#612
Labels
A-frontend Area: Compiler frontend (errors, parsing and HIR) A-grammar Area: The grammar of Rust A-parser Area: The parsing of Rust source code to an AST I-needs-decision Issue: In need of a decision. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@petrochenkov
Contributor

Things like "abc"suffix are lexically valid tokens, but they are reported as errors when parsed by the Rust language parser.

However, macro inputs are not parsed by the Rust language parser; each macro defines its own language and parses it itself.

This means that suffixed string and char literals can be passed to procedural macros (a procedural macro can process them however it wants and make them legal and meaningful in its language), or to declarative macros (if the macro ignores them).

Example:

macro_rules! blackhole { ($tt:tt) => () }

fn main() {
    blackhole!("string"suffix); // OK
}
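
A variant of the example above that makes the token visible instead of discarding it. This is only a sketch: `show_token` and `rendered_suffixed_literal` are hypothetical names, and the exact text produced by `stringify!` is not guaranteed by the language, so the printed form should be treated as illustrative.

```rust
// Hypothetical helper macro: forwards a single token tree to `stringify!`,
// so the token is only turned into text and never parsed as an expression.
macro_rules! show_token {
    ($tt:tt) => {
        stringify!($tt)
    };
}

// Returns the textual form of a suffixed string literal token.
fn rendered_suffixed_literal() -> &'static str {
    show_token!("string"suffix)
}

fn main() {
    println!("{}", rendered_suffixed_literal());

    // Uncommenting the next line is expected to fail to compile, because here
    // the literal is parsed as an ordinary Rust expression:
    // let s = "string"suffix;
}
```

The suffix survives as part of the literal token, which is what lets a macro see it at all.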

Possible solutions:

  • Alternative 1: Prohibit this during lexing. String/character literal tokens can no longer be suffixed, and proc macros cannot use suffixed literals in their DSLs.
  • Alternative 2: Do nothing. String/character literal tokens can be suffixed (but not in the Rust language proper), and proc macros can use them in their DSLs.
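
To illustrate what Alternative 2 would allow, here is a rough sketch of how a macro author might give a suffix meaning. It assumes the macro already has the literal's source text (for example from `stringify!`, or from the string form of a proc-macro literal); `split_suffix` is a hypothetical helper, not an existing API, and it only handles plain, non-raw string literals.

```rust
// Hypothetical helper: splits the text of a plain (non-raw) string literal
// into the quoted part and the trailing suffix. A suffix is an identifier
// and cannot contain `"`, so the last quote marks the end of the literal.
fn split_suffix(lit: &str) -> Option<(&str, &str)> {
    if !lit.starts_with('"') {
        return None;
    }
    let end = lit.rfind('"')?;
    if end == 0 {
        // Only an opening quote was found.
        return None;
    }
    Some((&lit[..=end], &lit[end + 1..]))
}

fn main() {
    // `stringify!` hands us the token text without parsing it as an expression.
    let text = stringify!("10 20"px);
    match split_suffix(text) {
        Some((body, suffix)) => println!("body = {body}, suffix = {suffix:?}"),
        None => println!("not a plain string literal: {text}"),
    }
}
```

Under Alternative 1 the `stringify!` call in `main` would be rejected at the lexing stage, so a DSL like this could not exist.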
@petrochenkov petrochenkov added A-frontend Area: Compiler frontend (errors, parsing and HIR) A-grammar Area: The grammar of Rust labels May 2, 2019
@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. labels May 3, 2019
@petrochenkov
Contributor Author

This is listed as an unresolved question in the RFC that introduced literal suffixes into the lexer (rust-lang/rfcs#463).
@nrc argued for the currently implemented behavior in rust-lang/rfcs#463 (comment).

@petrochenkov petrochenkov added I-needs-decision Issue: In need of a decision. T-lang Relevant to the language team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels May 3, 2019
@petrochenkov
Contributor Author

@rust-lang/lang needs to make some decision and document it in the reference.

@Centril
Contributor

Centril commented May 3, 2019

@petrochenkov What are the advantages to Alt 1. as you see it? Also, do we have some sort of crater data?

@petrochenkov
Contributor Author

> What are the advantages to Alt 1. as you see it?

This is just a somewhat exceptional case of a token that is lexically valid but always rejected by the parser.
I think ~ is the only other example of such a token right now (and also <-, but that's only because const generics like Type<-1> are not properly parsed yet).

After some thought, yeah, there's probably not much advantage.
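
The comparison with ~ can be checked with a small sketch: like a suffixed string literal, ~ is accepted as a single token tree by a macro even though it has no meaning in ordinary Rust code. `count_tts` and the two wrapper functions are hypothetical names used only for this illustration.

```rust
// Hypothetical helper macro: counts how many token trees it receives.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

// `~` alone is one token tree.
fn tilde_count() -> usize {
    count_tts!(~)
}

// A suffixed string literal is one token tree, followed by `~` as a second.
fn literal_and_tilde_count() -> usize {
    count_tts!("abc"suffix ~)
}

fn main() {
    println!("{} {}", tilde_count(), literal_and_tilde_count());
}
```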

@Centril
Contributor

Centril commented May 5, 2019

Yeah, my inclination is that this may be useful for macro authors somehow, so I think we can leave this accepted.

@eddyb
Member

eddyb commented May 5, 2019

I regret not coming up with the "joint" system for composed tokens much earlier, and doing at least literal suffixes with it. But overall, I'm in favor of as much flexibility as possible for proc macro authors.

The only useful constraint I see, in terms of the token model, is having balanced ()/[]/{}, because they are used in the invocation syntax, and therefore allow unambiguous parsing around the invocation.
String literals and comments must have a fixed grammar specifically because they can contain those characters, and knowing whether it's part of the source or part of a literal/comment is crucial.

(Although I don't know if other @rust-lang/lang members share this view)
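
A small sketch of the delimiter point, assuming a hypothetical `delimiters_inside_literal` function: brackets inside a string literal are part of that literal token and need not balance, while bare brackets in the macro input must.

```rust
// The parentheses and the brace below sit inside string literal tokens,
// so the macro invocation is still delimited unambiguously.
fn delimiters_inside_literal() -> &'static str {
    stringify!("(((" "}")
}

fn main() {
    println!("{}", delimiters_inside_literal());

    // By contrast, an unbalanced bare delimiter is expected to be rejected
    // before any macro sees it, for example:
    // let _ = stringify!( ( );
}
```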

@nikomatsakis
Contributor

I agree with @eddyb as well =)

@petrochenkov
Contributor Author

Looks like there's some consensus to keep the existing behavior.
I made a PR to the reference documenting it: rust-lang/reference#612.
