Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert EMPTY_LINE_AFTER_OUTER_ATTR and EMPTY_LINE_AFTER_OUTER_ATTR lint into early lints #13658

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

GuillaumeGomez
Copy link
Member

This is needed to help the compiler attributes API rewrite.

cc @jdonszelmann @xFrednet

changelog: Convert EMPTY_LINE_AFTER_OUTER_ATTR and EMPTY_LINE_AFTER_OUTER_ATTR lint into early lints

@rustbot
Copy link
Collaborator

rustbot commented Nov 5, 2024

r? @y21

rustbot has assigned @y21.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties label Nov 5, 2024
@jdonszelmann
Copy link
Contributor

yayyy

@@ -690,7 +611,7 @@ fn check_attrs(cx: &LateContext<'_>, valid_idents: &FxHashSet<String>, attrs: &[
Some(("fake".into(), "fake".into()))
}

if suspicious_doc_comments::check(cx, attrs) || empty_line_after::check(cx, attrs) || is_doc_hidden(attrs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It said this check was to avoid spurious warnings somewhere, but we don't do it anymore. Is that a problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran cargo uibless so I think it's ok if nothing new got picked up. To be confirmed by someone more knowledgeable though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's intended for e.g. #12917

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what should we do here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may cause some confusion but if attrs are being reworked it's something we could live without unless we want to add another early -> late storage hack

impl Stop {
fn convert_to_inner(&self) -> (Span, String) {
let inner = match self.kind {
// #|[...]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, but this should be #! not #| I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah indeed!

@@ -5,7 +5,7 @@ LL | / /// for the crate
LL | |
| |_
LL | fn first_in_crate() {}
| ------------------- the comment documents this function
| ---------------------- the comment documents this function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this now points to the entire function, while previously it pointed to the signature of the function. That'd mean that if the function is longer than 0 lines it'd also highlight the block. Do we want that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I supposed it was fine but I can try to see if it can be shortened somehow.

Copy link
Contributor

@jdonszelmann jdonszelmann Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustc_ast::FnSig has a span

image

@@ -71,8 +74,11 @@ error: empty line after outer attribute
LL | / #[repr(C)]
LL | |
| |_
LL | struct Foo {
| ---------- the attribute applies to this struct
LL | / struct Foo {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here I guess, you can see it better here

@GuillaumeGomez GuillaumeGomez force-pushed the migrate-empty_after branch 2 times, most recently from 7089e72 to 1f9ccc2 Compare November 5, 2024 21:02
@jdonszelmann
Copy link
Contributor

much better

@Alexendoo
Copy link
Member

Is there a link to why this is needed?

@jdonszelmann
Copy link
Contributor

So as part of rust-lang/compiler-team#796, I bumped into a few issues with clippy where some lints are late, but use a lot of information that mostly relates to the AST, and have very little reason to interact with the hir. After lowering, clippy is the only part of the compiler using information such as the AttrId, AttrStyle and Span on attributes.

By moving these lints to be early:

  1. attributes, after lowering, can be only 8 bytes instead of their current 48
  2. the refactor to pre-parsed attributes becomes easier
  3. moving one of the lints to early (not this PR) we actually solved a bug with attributes being emitted twice

Hope that clears some things up :)

@jdonszelmann
Copy link
Contributor

for 2), that might mean that at some point in the future I will revisit these lints and make them use the new attribute parsing logic. By making them early, they can enjoy the old parsing logic for a little longer while the new parsing logic can be implemented without having to fix clippy at the same time as introducing a large refactoring in the compiler.

@flip1995
Copy link
Member

flip1995 commented Nov 6, 2024

Lintcheck diff is quite big and some of them look like FPs. Now I'm even more concerned that rust-lang/rust#132598 might've introduced FPs... We will see tomorrow, once I can do the sync.

@GuillaumeGomez
Copy link
Member Author

A lot of them are because of message changes, like:

 8 |   macro_rules! test_for_each_provider {
-  |   ----------------------------------- the comment documents this macro
+  |   ----------------------------------- the comment documents this macro definition

or of span changes like:

 51 |   struct FlatMapConsumer<'f, C, F> {
-   |   -------------------------------- the comment documents this struct
+   |   ---------------------- the comment documents this struct

For changes like:

Added clippy::doc_lazy_continuation at [url-2.5.2/src/lib.rs:379](https://docs.rs/url/2.5.2/src/url/lib.rs.html#379)

```
warning: doc list item without indentation
   --> target/lintcheck/sources/url-2.5.2/src/lib.rs:379:9
    |
379 |     /// Without it, the last path component is considered to be a “file” name
    |         ^
    |
    = help: if this is supposed to be its own paragraph, add a blank line
    = note: `--force-warn clippy::doc-lazy-continuation` implied by `--force-warn clippy::all`
help: indent this line
    |
379 |     ///   Without it, the last path component is considered to be a “file” name
    |         ++
```

are because of what was discussed here. In short: since it's not part of the doc lints anymore (one of the 3 that was triggering an early return), some more lints are emitted. It's not incorrect but will generate more warnings.

@flip1995
Copy link
Member

flip1995 commented Nov 6, 2024

Just went through all of them: There's actually only 1 FP and I'm not sure if it is related to this PR:

warning: backticks are unbalanced
   --> target/lintcheck/sources/chrono-0.4.38/src/month.rs:191:9
    |
191 |       /// `Month::from_i64(n: i64)`: | `1`                  | `2`                   | ... | `12`
    |  _________^
192 | |     /// ---------------------------| -------------------- | --------------------- | ... | -----
193 | |     /// ``:                        | Some(Month::January) | Some(Month::February) | ... | Some(Month::December)
    | |_______________________________________________________________________________________________________________^
    |
    = help: a backtick may be missing a pair

In chrono-0.4.38/src/month.rs:191. Might be because of the double backticks in the 3rd line. But the span is then wrong.

@GuillaumeGomez
Copy link
Member Author

That seems like a bug into another lint that might have been uncovered by this PR.

@bors
Copy link
Contributor

bors commented Nov 21, 2024

☔ The latest upstream changes (presumably 8298da7) made this pull request unmergeable. Please resolve the merge conflicts.

@jdonszelmann
Copy link
Contributor

hey all, rust-lang/rust#135726 currently can't be merged because of this. The last status seems to be that it's not this PR that introduces a bug. But, someone probably needs to take a decision whether the small regression is worth it or something needs to be done about it right now. (cc @flip1995, you might be the person or know who to defer such decisions to). However, I guess this is my gentle reminder to see if we can resolve and merge this PR :)

I don't mind doing the rebase, though I'm not sure that I can since this branch is by @GuillaumeGomez so I think I'll be unable to push to it.

@GuillaumeGomez
Copy link
Member Author

Didn't know it was blocking your PR. :o

Rebasing it then.

@jdonszelmann
Copy link
Contributor

jdonszelmann commented Jan 24, 2025

Well, you couldn't lol. I didn't know either. When we talked about this in november it seems I anticipated that some day it'd block the PR I made last week and that it should be fixed early hahaha, just seems we never got to it afterwards

xkcd future self, 1421

@GuillaumeGomez
Copy link
Member Author

Fixed merge conflicts.

@jdonszelmann
Copy link
Contributor

May I ask when the next merge to rust-lang/rust is? no need to rush it for me, the other PR might be open for a little longer, just curious :3

@GuillaumeGomez
Copy link
Member Author

Better ask the question on zulip. ;)

Copy link
Member

@y21 y21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, I feel like this lint actually made some sense as a late lint seeing how it needs to do things like access the parent node which Just Works on the HIR, but now needs to be manually implemented with a stack of item info, but I guess that's fine

self.items.pop();
}

fn check_item(&mut self, cx: &EarlyContext<'_>, item: &Item) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably also implement check_impl_item and check_trait_item -- they are no longer linted with this change looking at the lintcheck output

self.items.push(ItemInfo {
kind: item.kind.descr(),
name: item.ident.name,
span: if item.span.contains(item.ident.span) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is this false? Is this for dummy spans?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I"m trying to figure out but the original impl didn't do this. @GuillaumeGomez ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for proc-macro spans.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

@jdonszelmann
Copy link
Contributor

@y21 there's a complicated trade-off here. You probably know that, but I guess I'll just document it anyway for future readers.


On the one hand, we want to simplify the way attributes are represented, just like we want to simplify how all elements of a programming language are represented throughout a compiler. for loops become while loops, async becomes state machines, conditionals become jumps and basic blocks, etc. All complexity is reduced until the complexity matches that of what hardware can execute. As the complexity reduces, it becomes simpler to extract information from it in a way in which we can guarantee, to some level, is correct, by being able to reason about it. At the level of the AST, we can't yet reason about types and flows of lifetimes, so we first reduce the complexity. We do what we can or have to at the AST level (name resolution, expansion) until we can safely discard some details and reduce the complexity. This isn't some magical insight of mine or something, I just like this way of thinking about it.

Attributes are interesting, because their complexity is (was) never reduced. On all levels of the compiler, we reason about them as if they're at the syntactical level. While, after expansion, they aren't really used as syntax anymore. They're used as a small set of annotations to guide processes such as code generation. At this point, we don't really care about how the user wrote these annotations, we just care about the list of annotations themselves. This item has C representation, should be used, is stable since 1.85, etc. Those properties are true whether you annotated the item from the inside with #![repr(C)] or from the outside with #[repr(C)]. Similarly, we don't really care anymore whether you said #[repr(C)] #[repr(packed)] struct Foo; or #[repr(C, packed)] struct Foo; because they mean the same. So, there is complexity we can reduce here to make later passes easier. I think this is in most cases a desirable change.

From that point of view, lints that tell users that the syntax of their attributes is wrong, is supposed to happen early. In a stage of the compiler where we still care about syntaxes of things. While lints that care about the structure of attributes,
let's say a hypothetical lint that says repr(C) and repr(Rust) are incompatible, is fine to go at the hir level of abstraction


On the other hand, and that's the downside we see here of course, we sometimes need information spanning these abstraction levels. I wanted to say that this is especially true for clippy. For example, we sometimes need information about the syntax of an attribute, and the type of an item.

However, the compiler has this situation very often too. if the compiler itself needed this kind of spanning information, it'd make some kind of table. Name resolution is an example of this. At the AST level we already analyse code for names, and store that in a bunch of maps. Then, later parts of the compiler can query what name refers to what definition. Type checking is a similar operation, and so are spans to later be used in diagnostics.

Clippy here has the problem that it might need information in a table like this, to span abstraction levels, that the compiler does not. So far, where this happened it lead to awkwardness. Here it's needing parent nodes (information that is only analysed after HIR creation) together with the purely syntactical representation of attributes. Another place you see this in is around format_args!() leading to what we now call "early -> late storage hacks".

I'm not necessarily advocating for those hacks, but in some ways, this is kind of how rustc works, and how compilers work all the time. It's just that clippy can usually benefit from the storage and analyses of the compiler, and only rarely needs to do these kinds of hacks for just itself. The few times that it does we call it a hack because it's so exceptional.


It seems that representing attributes at the AST-complexity level throughout the entire compiler – while lowering the complexity of all other constructs (to HIR, and then MIR, and further) – isn't maintainable. It's unfortunate that this, at least at the moment, gives clippy slightly less information from the compiler side. Still, I'm somewhat convinced that this is the right direction anyway. Clippy lints dealing with the syntactical representation of attributes should be early, since that's where we deal with syntactical constructs. But, I'm not maintaining clippy, so that might be easy for me to say.

I guess this long comment is the reasoning I have behind these changes, and how I like looking at it from, admittedly, a little idealized point of view of what compilers should look like. Maybe that reasoning is good to have documented somewhere, so I guess for now that's here. However, neither rustc nor clippy are even close to ideal, I think we can all mostly agree on that. So, I realise that if the overhead is exceedingly large everywhere except in the compiler, it's well possible I could be wrong. I'm happy to hear either alternative opinions or maybe ways to go forward with this and keep the situation better in clippy as well.

<3

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants