Stabilize proc_macro Span::source_text #101991

jam1garner · 2022-09-18T17:58:24Z

In the interest of improving error/diagnostic quality for macros I'd like to try and get some of the proc_macro_span feature stabilized.

While some of these APIs don't seem to have consensus on their effect on IDE/language server support, the source_text API seems both well-tested and to have consensus that its result only being invalidated by edits to the contents of the macro makes it a non-issue for IDE support (as noted by niko in #54725).

API

impl Span {
    pub fn source_text(&self) -> Option<String>;
}

@rustbot label +T-libs-api +A-macros +A-proc-macros

The text was updated successfully, but these errors were encountered:

joshtriplett · 2022-09-19T20:43:50Z

Nominating to consider for partial stabilization.

I remember we had a lot of conversation about some of the proc_macro APIs in the past. I don't remember this one in particular being of concern.

Does anyone on @rust-lang/libs-api know of any blockers for stabilizing this?

jhpratt · 2022-09-20T00:05:54Z

Most of the discussion was related to either user-defined diagnostics or the line/column information being exposed. I see no reason this can't be stabilized.

joshtriplett · 2022-09-20T13:35:58Z

@rfcbot merge

rfcbot · 2022-09-20T13:36:01Z

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

~~a question of line-endings~~ resolved by Stabilize proc_macro Span::source_text #101991 (comment)
~~span which escapes the macro~~ resolved by Stabilize proc_macro Span::source_text #101991 (comment)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

BurntSushi · 2022-09-23T12:13:04Z

While some of these APIs don't seem to have consensus on their effect on IDE/language server support, the source_text API seems both well-tested and to have consensus that its result only being invalidated by edits to the contents of the macro makes it a non-issue for IDE support (as noted by niko in #54725).

@matklad Do you have thoughts here?

BurntSushi · 2022-09-23T12:15:04Z

It would be nice to have an example in the docs for this API, if possible.

One other thing that pops out at me is the return type of Option<String>. My guess is that perf isn't generally a huge concern in this context, and therefore it's okay. But I don't know that for sure. Otherwise, this return type is going to force a clone/allocation on every call. I don't know enough about proc macro internals to judge whether that's reasonable or not.

matklad · 2022-09-23T12:20:20Z

Yeah, I think getting the text of what's passed into a macro shouldn't be problematic. What could problematic is the macro somehow getting a span which "escapes" the macro (so, eg, the span for the whole file containing the macro) and querying the source of that, but I assume that's not possible.

There's also a question of line-endings. Does the returned text preserve original line-endings, or is it allowed to normalize to \n? I'd prefer to not expose original line endings.

rfcbot · 2022-09-24T19:33:13Z

🔔 This is now entering its final comment period, as per the review above. 🔔

dtolnay · 2022-09-24T21:42:32Z

There's also a question of line-endings. Does the returned text preserve original line-endings, or is it allowed to normalize to \n? I'd prefer to not expose original line endings.

@rfcbot concern a question of line-endings

dtolnay · 2022-09-24T21:45:24Z

What could problematic is the macro somehow getting a span which "escapes" the macro (so, eg, the span for the whole file containing the macro) and querying the source of that, but I assume that's not possible.

Not the whole file, but it can certainly escape a single macro call, and can include totally unrelated items. It sounds similarly problematic as getting a span for the whole file.

# Cargo.toml

[package]
name = "repro"
version = "0.0.0"
edition = "2021"

[lib]
proc-macro = true

// src/lib.rs

#![feature(proc_macro_span)]

use proc_macro::TokenStream;

#[proc_macro]
pub fn source_text(input: TokenStream) -> TokenStream {
    let mut iter = input.into_iter();
    let begin = iter.next().unwrap().span();
    let end = iter.next().unwrap().span().resolved_at(begin);
    let joined = begin.join(end).unwrap();
    let source_text = joined.source_text().unwrap();
    format!("{source_text:?}").parse().unwrap()
}

// src/main.rs

macro_rules! source_text {
    ($end:ident) => {
        repro::source_text!(start $end)
    };
}

// huh
fn sick_fn_bro() {}

fn main() {
    println!("{}", source_text!(end));
}

$ cargo run
start $end)
    };
}

// huh
fn sick_fn_bro() {}

fn main() {
    println!("{}", source_text!(end

@rfcbot concern span which escapes the macro

Is join supposed to return None here? Is source_text supposed to return None?

jam1garner · 2022-09-25T23:18:16Z

I would personally expect that to fail at the source_text call (good find), unless we want to make more stringent guarantees about external spans being impossible to construct. (Personally my expectations of Span::join are that failures are about spans impossible to represent (from incompatible sources), not ones we choose not to let the user do things with)

est31 · 2022-09-26T13:50:00Z

I wonder whether one should do indoc style removals of indentation before returning the source text, or whether to not do that. Ideally, adding indentation to a macro invocation / struct declaration should not issue a re-run of the macro. Edit: I'm wondering about this for two reasons, first because it looks weird, and second for the use case of extending incremental compilation to parsing.

indoc just iterates all lines of the input and takes the minimum of all indentations. This is not precisely what I had in mind, it was more something that integrates the start of the macro invocation. So if you have:

fn main() {
    let s = python!(
        print("hi")
    );
    println!("{}", s);
}

For a macro:

#[proc_macro]
pub fn python(_input: TokenStream) -> TokenStream {
    let source_text = proc_macro::Span::call_site().source_text().unwrap();
    format!("{source_text:?}").parse().unwrap()
}

Then it currently outputs:

python!(
        print("hi")
    )

Which isn't that nice. It's not just not what shows up in the code, as the ) is aligned with the python (my first concern) but also if you increase the level of indentation by one you are getting:

python!(
            print("hi")
        )

I wonder if there is some way of processing that we could arrive at something like:

python!(
    print("hi")
)

est31 · 2022-09-26T14:05:43Z

(FTR, @dtolnay 's example above is using join which is unstable, but there are hacks to do join with stable code as well)

jam1garner · 2022-09-26T15:05:22Z

I think modifying whitespace outside of newline normalization is a bad idea. I'm not sure what usecase the API would be useful for if that were the case. Even for diagnostics/errors, changing the code formatting is detrimental?

Or well, I guess for indoc-style it is consistent across the contents of the macro so it's maybe fine? I would be worried about how it'd effect mapping spans to the contents of source_text (such as pointing to specific tokens in an error)

joshtriplett · 2022-10-04T15:31:39Z

I don't think we should do any modification to the source text, even newline normalization. I think we should just return the source text verbatim. If a user wants to normalize it, they can do so (or use any number of crates to do so); however, a user can't un-normalize it if we've normalized it.

matklad · 2022-10-04T15:39:21Z

To clarify why normalizing might be a good idea:

What line endings are used is already somewhat ill-defined. The source code of the same git project checked out on Linux and on Windows would typically have different line-endings.

If we don't normalize line-endings and expose them to proc-macros, that opens another avenue why, eg, cargo build --target wasm32-unknown-unknown might give different results depending on the host OS the project is compiled on.

I don't think that in practice there will be a lot of these sorts of problems, but if we are committed to reproducible builds in principle, it seems worthwhile to not add fundamental non-determinism, if we can help it.

Amanieu · 2022-10-04T15:46:00Z

I think that differing newlines will already cause issues with reproducible builds since all of the spans will have different offsets, the debuginfo will be different, etc. I think it's reasonable to require configuring git to preserve line endings if you want reproducible builds.

matklad · 2022-10-04T15:49:27Z

... and I now recall that we actually do normalize newlines already?

rust/compiler/rustc_span/src/lib.rs

Lines 1763 to 1844 in e1d7dec

    
           /// Normalizes the source code and records the normalizations. 
        
           fn normalize_src(src: &mut String, start_pos: BytePos) -> Vec<NormalizedPos> { 
        
               let mut normalized_pos = vec![]; 
        
               remove_bom(src, &mut normalized_pos); 
        
               normalize_newlines(src, &mut normalized_pos); 
        
               // Offset all the positions by start_pos to match the final file positions. 
        
               for np in &mut normalized_pos { 
        
                   np.pos.0 += start_pos.0; 
        
               } 
        
               normalized_pos 
        
           } 
        
           /// Removes UTF-8 BOM, if any. 
        
           fn remove_bom(src: &mut String, normalized_pos: &mut Vec<NormalizedPos>) { 
        
               if src.starts_with('\u{feff}') { 
        
                   src.drain(..3); 
        
                   normalized_pos.push(NormalizedPos { pos: BytePos(0), diff: 3 }); 
        
               } 
        
           } 
        
           /// Replaces `\r\n` with `\n` in-place in `src`. 
        
           /// 
        
           /// Returns error if there's a lone `\r` in the string. 
        
           fn normalize_newlines(src: &mut String, normalized_pos: &mut Vec<NormalizedPos>) { 
        
               if !src.as_bytes().contains(&b'\r') { 
        
                   return; 
        
               } 
        
               // We replace `\r\n` with `\n` in-place, which doesn't break utf-8 encoding. 
        
               // While we *can* call `as_mut_vec` and do surgery on the live string 
        
               // directly, let's rather steal the contents of `src`. This makes the code 
        
               // safe even if a panic occurs. 
        
               let mut buf = std::mem::replace(src, String::new()).into_bytes(); 
        
               let mut gap_len = 0; 
        
               let mut tail = buf.as_mut_slice(); 
        
               let mut cursor = 0; 
        
               let original_gap = normalized_pos.last().map_or(0, |l| l.diff); 
        
               loop { 
        
                   let idx = match find_crlf(&tail[gap_len..]) { 
        
                       None => tail.len(), 
        
                       Some(idx) => idx + gap_len, 
        
                   }; 
        
                   tail.copy_within(gap_len..idx, 0); 
        
                   tail = &mut tail[idx - gap_len..]; 
        
                   if tail.len() == gap_len { 
        
                       break; 
        
                   } 
        
                   cursor += idx - gap_len; 
        
                   gap_len += 1; 
        
                   normalized_pos.push(NormalizedPos { 
        
                       pos: BytePos::from_usize(cursor + 1), 
        
                       diff: original_gap + gap_len as u32, 
        
                   }); 
        
               } 
        
               // Account for removed `\r`. 
        
               // After `set_len`, `buf` is guaranteed to contain utf-8 again. 
        
               let new_len = buf.len() - gap_len; 
        
               unsafe { 
        
                   buf.set_len(new_len); 
        
                   *src = String::from_utf8_unchecked(buf); 
        
               } 
        
               fn find_crlf(src: &[u8]) -> Option<usize> { 
        
                   let mut search_idx = 0; 
        
                   while let Some(idx) = find_cr(&src[search_idx..]) { 
        
                       if src[search_idx..].get(idx + 1) != Some(&b'\n') { 
        
                           search_idx += idx + 1; 
        
                           continue; 
        
                       } 
        
                       return Some(search_idx + idx); 
        
                   } 
        
                   None 
        
               } 
        
               fn find_cr(src: &[u8]) -> Option<usize> { 
        
                   src.iter().position(|&b| b == b'\r') 
        
               } 
        
           }

dtolnay · 2022-10-04T19:42:58Z

I don't feel that either of these warrants blocking the stabilization of source_text:

@rfcbot resolve a question of line-endings
@rfcbot resolve span which escapes the macro

For line endings, either normalizing or not is okay with me. For spans that cross boundaries of a macro invocation, I'd probably prefer if join would return None in this situation but it's pretty far down the list of things wrong/broken with macros already, so I don't find it concerning for this stabilization.

rfcbot · 2022-10-04T19:43:02Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2022-10-14T19:45:16Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

est31 · 2022-10-14T20:23:51Z

For spans that cross boundaries of a macro invocation, I'd probably prefer if join would return None in this situation but it's pretty far down the list of things wrong/broken with macros already, so I don't find it concerning for this stabilization.

@dtolnay how would you deal with join being already exposed on stable, without an Option like interface? I've pointed that out above in the thread, linking to a comment that links to a join primitive in your syn crate. Given that the API is already stable and doesn't return an Option, it should IMO be source_text that returns a None if a span is unhygienic. Maybe spans could contain a bool param on whether they are hygienic or not, which is set by join as well as join-like primitives. Then source_text checks that bool and makes its descision based on that. This stuff can I think be done after source_text stabilization, too.

m-ou-se · 2022-10-18T14:15:04Z

how would you deal with join being already exposed on stable, without an Option like interface?

That hack doesn't result in a joined Span. That hack creates a TokenStream using both the start and the end span such that the resulting compiler error will join the two spans, but it doesn't give your proc_macro a joined Span.

est31 · 2022-10-18T14:54:45Z

@m-ou-se good point. I've tried building a similar primitive that joins spans in a similar fashion, but it doesn't work:

fn join_spans(sp1: Span, sp2: Span) -> Span {
    let begin = TokenTree::Ident(Ident::new("A", sp1));
    let end = TokenTree::Ident(Ident::new("B", sp2));
    let group = Group::new(Delimiter::None, [begin, end].into_iter().collect());
    group.span()
}

If you try that on the example from dtolnay given above, you get as output:

        repro::source_text!(start $end)

It's I think because the group wholly ignores the spans of its children and instead assumes the span of the macro invocation that created it. I guess this combination trick can only be used to issue compilation errors. So things work differently than I thought they would work. I'm sorry for having brought this up. Case rested :).

est31 · 2022-10-18T15:11:35Z

Stabilization PR: #103197

…text, r=petrochenkov Stabilize proc_macro::Span::source_text Splits `proc_macro::Span::source_text` into a new feature gate and stabilizes it. The [FCP is complete](rust-lang#101991 (comment)). ```Rust impl Span { pub fn source_text(&self) -> Option<String>; } ``` Closes rust-lang#101991

…etrochenkov Stabilize proc_macro::Span::source_text Splits `proc_macro::Span::source_text` into a new feature gate and stabilizes it. The [FCP is complete](rust-lang/rust#101991 (comment)). ```Rust impl Span { pub fn source_text(&self) -> Option<String>; } ``` Closes #101991

rustbot added A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-proc-macros Area: Procedural macros T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 18, 2022

joshtriplett added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Sep 19, 2022

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 20, 2022

rfcbot added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. labels Sep 24, 2022

rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Sep 24, 2022

dtolnay mentioned this issue Sep 24, 2022

Expose proc_macro::Span::source_text dtolnay/proc-macro2#349

Closed

est31 mentioned this issue Sep 26, 2022

Tracking issue for proc_macro::Span inspection APIs #54725

Open

rfcbot added the final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. label Oct 4, 2022

rfcbot removed the proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. label Oct 4, 2022

rfcbot added finished-final-comment-period The final comment period is finished for this PR / Issue. and removed final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. labels Oct 14, 2022

rfcbot added the to-announce Announce this issue on triage meeting label Oct 14, 2022

m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Oct 18, 2022

est31 mentioned this issue Oct 18, 2022

Stabilize proc_macro::Span::source_text #103197

Merged

bors closed this as completed in 62bb0c6 Oct 20, 2022

apiraino removed the to-announce Announce this issue on triage meeting label Jan 5, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize proc_macro Span::source_text #101991

Stabilize proc_macro Span::source_text #101991

jam1garner commented Sep 18, 2022

joshtriplett commented Sep 19, 2022

jhpratt commented Sep 20, 2022

joshtriplett commented Sep 20, 2022

rfcbot commented Sep 20, 2022 •

edited

Loading

BurntSushi commented Sep 23, 2022

BurntSushi commented Sep 23, 2022

matklad commented Sep 23, 2022

rfcbot commented Sep 24, 2022

dtolnay commented Sep 24, 2022

dtolnay commented Sep 24, 2022 •

edited

Loading

jam1garner commented Sep 25, 2022

est31 commented Sep 26, 2022 •

edited

Loading

est31 commented Sep 26, 2022

jam1garner commented Sep 26, 2022 •

edited

Loading

joshtriplett commented Oct 4, 2022

matklad commented Oct 4, 2022

Amanieu commented Oct 4, 2022

matklad commented Oct 4, 2022 •

edited

Loading

dtolnay commented Oct 4, 2022

rfcbot commented Oct 4, 2022

rfcbot commented Oct 14, 2022

est31 commented Oct 14, 2022

m-ou-se commented Oct 18, 2022

est31 commented Oct 18, 2022

est31 commented Oct 18, 2022

Stabilize proc_macro Span::source_text #101991

Stabilize proc_macro Span::source_text #101991

Comments

jam1garner commented Sep 18, 2022

API

joshtriplett commented Sep 19, 2022

jhpratt commented Sep 20, 2022

joshtriplett commented Sep 20, 2022

rfcbot commented Sep 20, 2022 • edited Loading

BurntSushi commented Sep 23, 2022

BurntSushi commented Sep 23, 2022

matklad commented Sep 23, 2022

rfcbot commented Sep 24, 2022

dtolnay commented Sep 24, 2022

dtolnay commented Sep 24, 2022 • edited Loading

jam1garner commented Sep 25, 2022

est31 commented Sep 26, 2022 • edited Loading

est31 commented Sep 26, 2022

jam1garner commented Sep 26, 2022 • edited Loading

joshtriplett commented Oct 4, 2022

matklad commented Oct 4, 2022

Amanieu commented Oct 4, 2022

matklad commented Oct 4, 2022 • edited Loading

dtolnay commented Oct 4, 2022

rfcbot commented Oct 4, 2022

rfcbot commented Oct 14, 2022

est31 commented Oct 14, 2022

m-ou-se commented Oct 18, 2022

est31 commented Oct 18, 2022

est31 commented Oct 18, 2022

rfcbot commented Sep 20, 2022 •

edited

Loading

dtolnay commented Sep 24, 2022 •

edited

Loading

est31 commented Sep 26, 2022 •

edited

Loading

jam1garner commented Sep 26, 2022 •

edited

Loading

matklad commented Oct 4, 2022 •

edited

Loading