RFC: transition to Roslyn's model for trivia #6584
Comments
🤨 Presumably the downside is that it might be easier to lose the "trivia" with modifications that replace tokens? Or there might be cases (like comments at the end of a line) where it would be more natural for the trivia to attach to the prior token. |
Yeah, there might be! For example, I expect that sometimes one wants to lose the spacing: when doing "inline variable", the space around the expression should be removed. Here's an example of similar wrangling from Roslyn: https://github.com/dotnet/roslyn/blob/4d8cada2584dd2e58b391c32c84f10f29442fdde/src/Analyzers/CSharp/CodeFixes/InlineDeclaration/CSharpInlineDeclarationCodeFixProvider.cs#L165-L174 (aside: if you think that IDEs are like math, where you juggle recursive ADTs with finesse and precision, you are wrong: every time you need to touch a CST, it's just a giant pile of special cases within special cases)
The Swift document goes into a bit more detail on how this is represented. Tokens actually have leading and trailing trivia, and a run of whitespace can be split into two pieces of trivia. Generally, tokens include the indentation before and the newline after. The trivia between tokens on the same line is split more or less arbitrarily (it is attached to the following token, but I think attaching it to the previous one would work as well). For trivia at the very end of the file, there's a clever trick: each file ends with a (zero-width) EOF token, so it's a natural anchor for everything that's left. |
I really like the shallower trees where the meat of a refactor is easier to get at. I find it a little awkward that trivia can be attached to both ends (I think that might make certain operations more awkward), but maybe that's not a big deal in practice if you stick to a convention like Swift's. Though now that I'm rereading some of the Swift docs, I suspect a lot of refactoring may either be ignoring trivia or zeroing it out at both ends and reconstructing it. |
Would attaching it to tokens mean that rowan has to be changed to be trivia aware? |
If I wanted to do this, I imagine that, at a very basic level, these things would need to be done: in rowan
NOT SURE:
Not related, but cool:
in RA
Sorry for asking a lot of questions; having an overview would really be helpful. And maybe I forgot some key point? Thanks! |
@ShuiRuTian sorry, I've missed your comment :( I don't think we are quite ready to do this, as I am still on the fence. I don't have a very detailed plan for how we'd do this if we want, I'd probably just start hacking rowan. |
Another case from today:
    let workspace_build_data = match self.fetch_build_data_queue.last_op_result() {
        Some(Ok(it)) => Some(it.clone()),
        None => None,
        $0
        Some(Err(err)) => None,
    };
In the Roslyn model of trivia / Swift's model of punctuation, the match arms completely cover the match arm list, so the cursor is considered to be on a match arm. In our current model, the cursor is considered to be between them, so we don't trigger the "merge match arms" assist. |
Is there anything going on with this RFC in |
You can try Rome's fork of |
Lossless syntax trees (rowan), by definition, need to represent whitespace and comments (trivia).
There are two approaches to representing them in the syntax tree (1 + 1 in the example):
Attach to nodes:
Attach to tokens:
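A minimal sketch of the two shapes for 1 + 1, written in Rust. The `Elem` type, the kind names, and the helper functions are hypothetical and only for illustration; this is not rowan's actual API or encoding. It shows that both shapes reproduce the same text, while the binary expression has five children with floating trivia and three with attached trivia.

```rust
// Hypothetical types for illustration; this is not rowan's actual API.
#[derive(Debug)]
enum Elem {
    Node { kind: &'static str, children: Vec<Elem> },
    // `leading_trivia` is always empty in the floating model.
    Token { kind: &'static str, leading_trivia: &'static str, text: &'static str },
}

fn node(kind: &'static str, children: Vec<Elem>) -> Elem {
    Elem::Node { kind, children }
}

fn token(kind: &'static str, leading_trivia: &'static str, text: &'static str) -> Elem {
    Elem::Token { kind, leading_trivia, text }
}

/// Trivia attached to nodes: whitespace tokens are children of BIN_EXPR.
fn floating() -> Elem {
    node("BIN_EXPR", vec![
        node("LITERAL", vec![token("INT_NUMBER", "", "1")]),
        token("WHITESPACE", "", " "),
        token("PLUS", "", "+"),
        token("WHITESPACE", "", " "),
        node("LITERAL", vec![token("INT_NUMBER", "", "1")]),
    ])
}

/// Trivia attached to tokens: each space rides on the token that follows it.
fn attached() -> Elem {
    node("BIN_EXPR", vec![
        node("LITERAL", vec![token("INT_NUMBER", "", "1")]),
        token("PLUS", " ", "+"),
        node("LITERAL", vec![token("INT_NUMBER", " ", "1")]),
    ])
}

/// Reconstructs the source text; both models have to be lossless.
fn source_text(elem: &Elem) -> String {
    match elem {
        Elem::Node { children, .. } => children.iter().map(source_text).collect(),
        Elem::Token { leading_trivia, text, .. } => format!("{leading_trivia}{text}"),
    }
}

/// Number of direct children of a node (tokens have none).
fn child_count(elem: &Elem) -> usize {
    match elem {
        Elem::Node { children, .. } => children.len(),
        Elem::Token { .. } => 0,
    }
}

fn main() {
    for (name, tree) in [("floating", floating()), ("attached", attached())] {
        println!("{name}: {:?} with {} children", source_text(&tree), child_count(&tree));
        println!("{tree:#?}");
    }
}
```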
The first approach is what's used by rust-analyzer today and by IntelliJ. Here, we attach trivia which sits between two nodes to the parent node.
The second approach is what's used by Roslyn and Swift's libSyntax. It's a bit more hacky: a piece of trivia is attached to the following non-trivia token. That is, each token conceptually stores a leading_trivia: Vec<Trivia> (but of course the encoding is optimized for common cases like a single whitespace between tokens). See this doc for a more thorough description.
Why go with this strange hack with "fat" tokens? There are several benefits to it:
fixed structure of the syntax tree. With floating trivia, any node can have any number of children. If we attach trivia to tokens, we can classify nodes into two buckets:
This in turn gives us O(1) access to a specific child
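A rough sketch of what that could look like, assuming a binary expression is a node kind with exactly three slots. All types and names here (`Kind`, `Token`, `FatToken`, `BinExpr`, `nth_significant`) are made up for illustration and are not rowan's data structures, and the example uses 1 + 2 so the operands are distinguishable. With floating trivia, finding the right operand means scanning past whitespace; with attached trivia, it is a direct index.

```rust
// Hypothetical, simplified model; not rowan's real data structures.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Whitespace,
    IntNumber,
    Plus,
}

/// A token in the floating model: whitespace is a token like any other.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: Kind,
    text: &'static str,
}

/// A token in the attached model: it owns the trivia in front of it.
#[derive(Debug, Clone, PartialEq)]
struct FatToken {
    leading_trivia: &'static str,
    kind: Kind,
    text: &'static str,
}

/// Children of `1 + 2` with floating trivia.
fn floating_children() -> Vec<Token> {
    vec![
        Token { kind: Kind::IntNumber, text: "1" },
        Token { kind: Kind::Whitespace, text: " " },
        Token { kind: Kind::Plus, text: "+" },
        Token { kind: Kind::Whitespace, text: " " },
        Token { kind: Kind::IntNumber, text: "2" },
    ]
}

/// Floating model: the n-th significant child is found by a linear scan,
/// because any amount of trivia may sit in between.
fn nth_significant(children: &[Token], n: usize) -> Option<&Token> {
    children.iter().filter(|t| t.kind != Kind::Whitespace).nth(n)
}

/// Attached model: assumed fixed layout of [lhs, operator, rhs].
struct BinExpr {
    slots: [FatToken; 3],
}

impl BinExpr {
    const OP: usize = 1;
    const RHS: usize = 2;

    /// Direct index, no scanning.
    fn op(&self) -> &FatToken {
        &self.slots[Self::OP]
    }

    /// Direct index, no scanning.
    fn rhs(&self) -> &FatToken {
        &self.slots[Self::RHS]
    }
}

/// `1 + 2` with trivia attached to the following token.
fn attached_bin_expr() -> BinExpr {
    BinExpr {
        slots: [
            FatToken { leading_trivia: "", kind: Kind::IntNumber, text: "1" },
            FatToken { leading_trivia: " ", kind: Kind::Plus, text: "+" },
            FatToken { leading_trivia: " ", kind: Kind::IntNumber, text: "2" },
        ],
    }
}

fn main() {
    let children = floating_children();
    let rhs = nth_significant(&children, 2).expect("third significant child");
    println!("floating rhs: {rhs:?}");

    let expr = attached_bin_expr();
    println!("attached op: {:?}, rhs: {:?}", expr.op(), expr.rhs());
}
```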
better programming model. I hypothesize that having trivia attached to tokens makes certain refactors "just work". Specifically, type-safe modifications are naturally trivia-preserving. For example, if you have two blocks, and you want to append the content of the first block to the second one, you can do roughly:
This works with attached trivia but, with floating trivia, you'd need to transfer trivia nodes explicitly, or resort to an untyped API. Note that this is a hypothesis: I haven't worked closely with a Roslyn-style API, so I don't know how important it is in practice.
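To make the hypothesis concrete, here is a small sketch of the block example with fat tokens. Everything in it (`FatToken`, `Stmt`, `Block`, `append_stmts_of`, `one_stmt_block`) is a hypothetical toy model, not rowan's or rust-analyzer's API. Each statement's first token carries the newline and indentation in front of it, so a purely typed "copy the statements over" operation keeps the layout without touching trivia explicitly.

```rust
// Hypothetical toy model of fat tokens; not rowan's or rust-analyzer's API.
#[derive(Debug, Clone)]
struct FatToken {
    leading_trivia: String,
    text: String,
}

fn tok(leading_trivia: &str, text: &str) -> FatToken {
    FatToken { leading_trivia: leading_trivia.to_string(), text: text.to_string() }
}

impl FatToken {
    fn source_text(&self) -> String {
        format!("{}{}", self.leading_trivia, self.text)
    }
}

#[derive(Debug, Clone)]
struct Stmt {
    tokens: Vec<FatToken>,
}

#[derive(Debug, Clone)]
struct Block {
    l_curly: FatToken,
    stmts: Vec<Stmt>,
    r_curly: FatToken,
}

impl Block {
    /// Reconstructs the text of the block from its tokens and their trivia.
    fn source_text(&self) -> String {
        let mut out = self.l_curly.source_text();
        for stmt in &self.stmts {
            for token in &stmt.tokens {
                out.push_str(&token.source_text());
            }
        }
        out.push_str(&self.r_curly.source_text());
        out
    }

    /// Typed edit: copies statements only. Their indentation comes along
    /// because it is stored inside each statement's first token.
    fn append_stmts_of(&mut self, other: &Block) {
        self.stmts.extend(other.stmts.iter().cloned());
    }
}

/// Builds a block containing a single call statement, `name();`, on its own line.
fn one_stmt_block(name: &str) -> Block {
    Block {
        l_curly: tok("", "{"),
        stmts: vec![Stmt {
            tokens: vec![tok("\n    ", name), tok("", "("), tok("", ")"), tok("", ";")],
        }],
        r_curly: tok("\n", "}"),
    }
}

fn main() {
    let first = one_stmt_block("a");
    let mut second = one_stmt_block("b");
    second.append_stmts_of(&first);
    println!("{}", second.source_text());
}
```

In the floating model, the same typed operation would see only the statements, and the whitespace siblings between them would have to be moved by hand.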
better performance. This is also hypothetical, but, with token interning, storing trivia inside tokens probably won't increase the overall storage for tokens that much. However, we'd spend half as much memory on storing pointers to tokens, because roughly half of the tokens are trivia.
I think I lean towards trying the Roslyn trivia model: it seems like it can be better in the long term. I wish we could experiment with this in a simple way, though :-(