-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change the way trivia extraction works #125
Conversation
I need to look into why the tests didn't fail, I'll try today or tomorrow. |
I'm curious what the motivation for this, and wonder if splitting up the whitespace trivia into multiple pieces might be a way to address the underlying issue. |
The reasoning behind this was because currently with one of my projects (https://github.com/JohnnyMorganz/StyLua), I deal with each statement individually, and in this, new lines are placed at the end of statements. The problem here was that full moon would parse new line as leading trivia for the next token. I discussed this with Kampfkarren, and he was open to the PR. I also believe it makes more logical sense for the new line to trail the end of the line it was related to. I'm not sure if splitting whitespace further would help, as the issue here isn't the parsing of the whitespace tokens itself, but how the trivia gets bound to the closest token. |
Hmm. Looking closer at the definition of the whitespace token, it appears that it is an optional newline, followed by non-newline whitespace (note that this does not match the regex in the comment in |
This is a very good point which I seemed to have missed, thanks! I'll take a look tomorrow |
The second point in #128 might be relevant to you, particularly if you are changing how whitespace trivia works. |
Does that answer your question as to why the tests aren't failing then? |
I'm not entirely sure what you are referring to. If you are talking about the comment above about whitespace parsing, I think that is a different issue (although changing this would also cause tests to fail!) You can see what I mean by "tests should have failed" if you delete the |
Ah, I see. I will look into it today, promise. |
I've been rethinking over this PR, and I wanted to get a second opinion. I've been thinking of the new line character here as a line terminator, in contrast to the original rule. I believe it is a valid interpretation, but is it the interpretation we want to go for in the library? Before I proceed any further, I just wanted to ask you both what you think makes more sense? (I can most likely accommodate my code for either scenario.) |
Personally, any way works for me, as long as:
I think StyLua is a good test case for this--choose whichever one makes your life easier. |
Ah, I know why tests didn't fail--leading/trailing trivia wasn't in the tests to begin with.
This is confusing, but I'll figure it out. |
Okay, it's because the tests are using |
I was looking into this, and there seems to be a flaw in my rules to determine when to stop extracting trivia. Currently, we only stop at comments after newlines, so that the comment can be parsed as leading trivia for the next token. But this means, for something like: do [\n]
[\t] call() [\n]
end the indent trivia (in this case, a tab character preceding
This in turn should also make the rule simpler, and it seems to be well established. EDIT: You may find https://sharplab.io/ useful, if you want to visualise what C# does |
#134 is merged. Update your CHANGELOG and lets see if the CI passes. |
Re: Your last comment, is this still the PR you want to go with? |
Yeah, I researched into this further and found that this is what the Roslyn compiler does. I simplified my logic (see the edit for the original PR message) greatly to follow what they do aswell - no need for "state" anymore. Since it's used in practice by quite a large parser, I think it is an OK change to make |
triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125
triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125
* Upgrade full_moon to 0.13.1 and add default roblox feature triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125 * Fix newline handling
Previously, trivia was bound to tokens using the rules defined for swift-syntax
This PR changes the first rule to follow how the Rosyln Compiler works.
To quote their page:
Examples of change
I will explicitly show the newline character to give a sense of how it was parsed.
Previously, the old rules would parse something like this:
i.e., the newline character is added to the leading trivia of the
local bar = baz
line.Our new rules change this around
The newline character is now added as trailing trivia for
local foo = bar
Similarly, if there is a line break in between the two statements, such as:
The first newline will be parsed as trailing trivia for
local foo = bar
, whilst the second is left as leading trivia forlocal bar = baz
(previously, they were both leading trivia forlocal bar = baz
)I will also show another example with comments.
In this example,
-- comment 1
will still be parsed as trailing trivia tolocal foo = bar
. We then have a newline character. Previously, this was parsed as leading trivia oflocal bar = baz
, but now it is parsed as trailing trivia forlocal foo = bar
.We then see
-- comment 2
. As we have already seen a newline character, we stop taking in any more trailing trivia, so-- comment 2
is extracted as leading trivia forlocal bar = baz
.Old Suggestion
Instead of stopping at the first newline trivia found when parsing trailing trivia, we now continue taking it in, except toggle a state indicating that we have seen a newline. We then use this state if we then reach a comment. If we have already seen a newline, we will not parse in the comment as trailing trivia - it will be left as leading trivia for the next token. If we haven't seen a newline, the comment will be parsed as trailing trivia as normal.This is best explained through some examples. I will explicitly show the newline character to give a sense of how it was parsed.
Previously, the old rules would parse something like this:
i.e., the newline character is added to the leading trivia of the
local bar = baz
line.Our new rules change this around
The newline character is now added as trailing trivia for
local foo = bar
Similarly, if there is a line break in between the two statements, such as:
Both of these newlines will now be parsed as trailing trivia for
local foo = bar
(previously, they were leading trivia forlocal bar = baz
)I will also show another example with comments. This shows the reasoning behind our "state".
In this example,
-- comment 1
will still be parsed as trailing trivia tolocal foo = bar
. We then have a newline character. Previously, this was parsed as leading trivia oflocal bar = baz
, but now it is parsed as trailing trivia forlocal foo = bar
.We then see
-- comment 2
. As we have already seen a newline character, we stop taking in any more trailing trivia, so-- comment 2
is extracted as leading trivia forlocal bar = baz
.This behaviour regarding comments remains the same as before. This PR should not change how comments are extracted - it only changes newlines.
This PR is a breaking change for any users of the library who relied on newline trivia. Comment trivia logic should remain the same.