Change the way trivia extraction works #125

JohnnyMorganz · 2021-02-26T12:53:07Z

Previously, trivia was bound to tokens using the rules defined for swift-syntax

This PR changes the first rule to follow how the Rosyln Compiler works.
To quote their page:

In general, a token owns any trivia after it on the same line up to the next token. Any trivia after that line is associated with the following token. The first token in the source file gets all the initial trivia, and the last sequence of trivia in the file is tacked onto the end-of-file token, which otherwise has zero width.

Examples of change

I will explicitly show the newline character to give a sense of how it was parsed.
Previously, the old rules would parse something like this:

local foo = bar
[\n] local bar = baz

i.e., the newline character is added to the leading trivia of the local bar = baz line.
Our new rules change this around

local foo = bar [\n]
local bar = baz

The newline character is now added as trailing trivia for local foo = bar
Similarly, if there is a line break in between the two statements, such as:

local foo = bar [\n]
[\n]
local bar = baz

The first newline will be parsed as trailing trivia for local foo = bar, whilst the second is left as leading trivia for local bar = baz (previously, they were both leading trivia for local bar = baz)

I will also show another example with comments.

local foo = bar -- comment 1[\n]
-- comment 2[\n]
local bar = baz

In this example, -- comment 1 will still be parsed as trailing trivia to local foo = bar. We then have a newline character. Previously, this was parsed as leading trivia of local bar = baz, but now it is parsed as trailing trivia for local foo = bar.
We then see -- comment 2. As we have already seen a newline character, we stop taking in any more trailing trivia, so -- comment 2 is extracted as leading trivia for local bar = baz.

Old Suggestion

Instead of stopping at the first newline trivia found when parsing trailing trivia, we now continue taking it in, except toggle a state indicating that we have seen a newline. We then use this state if we then reach a comment. If we have already seen a newline, we will not parse in the comment as trailing trivia - it will be left as leading trivia for the next token. If we haven't seen a newline, the comment will be parsed as trailing trivia as normal.

This is best explained through some examples. I will explicitly show the newline character to give a sense of how it was parsed.
Previously, the old rules would parse something like this:

local foo = bar
[\n] local bar = baz

i.e., the newline character is added to the leading trivia of the local bar = baz line.
Our new rules change this around

local foo = bar [\n]
local bar = baz

The newline character is now added as trailing trivia for local foo = bar
Similarly, if there is a line break in between the two statements, such as:

local foo = bar [\n]
[\n]
local bar = baz

Both of these newlines will now be parsed as trailing trivia for local foo = bar (previously, they were leading trivia for local bar = baz)

I will also show another example with comments. This shows the reasoning behind our "state".

local foo = bar -- comment 1[\n]
-- comment 2[\n]
local bar = baz

In this example, -- comment 1 will still be parsed as trailing trivia to local foo = bar. We then have a newline character. Previously, this was parsed as leading trivia of local bar = baz, but now it is parsed as trailing trivia for local foo = bar.
We then see -- comment 2. As we have already seen a newline character, we stop taking in any more trailing trivia, so -- comment 2 is extracted as leading trivia for local bar = baz.

This behaviour regarding comments remains the same as before. This PR should not change how comments are extracted - it only changes newlines.

This PR is a breaking change for any users of the library who relied on newline trivia. Comment trivia logic should remain the same.

full-moon/src/ast/mod.rs

Kampfkarren · 2021-02-26T23:37:55Z

I need to look into why the tests didn't fail, I'll try today or tomorrow.

tomprince · 2021-03-02T09:43:45Z

I'm curious what the motivation for this, and wonder if splitting up the whitespace trivia into multiple pieces might be a way to address the underlying issue.

JohnnyMorganz · 2021-03-02T10:33:44Z

I'm curious what the motivation for this, and wonder if splitting up the whitespace trivia into multiple pieces might be a way to address the underlying issue.

The reasoning behind this was because currently with one of my projects (https://github.com/JohnnyMorganz/StyLua), I deal with each statement individually, and in this, new lines are placed at the end of statements. The problem here was that full moon would parse new line as leading trivia for the next token. I discussed this with Kampfkarren, and he was open to the PR. I also believe it makes more logical sense for the new line to trail the end of the line it was related to.

I'm not sure if splitting whitespace further would help, as the issue here isn't the parsing of the whitespace tokens itself, but how the trivia gets bound to the closest token.

tomprince · 2021-03-02T21:55:58Z

Hmm. Looking closer at the definition of the whitespace token, it appears that it is an optional newline, followed by non-newline whitespace (note that this does not match the regex in the comment in tokenizer::parse_whitespace). Without having used this in practice, I think if you are making the change describe above, the newline should be terminating, rather than starting a whitespace token, so that the indent on the following line is still trivia of the first token on it.

JohnnyMorganz · 2021-03-02T22:08:59Z

Hmm. Looking closer at the definition of the whitespace token, it appears that it is an optional newline, followed by non-newline whitespace (note that this does not match the regex in the comment in tokenizer::parse_whitespace). Without having used this in practice, I think if you are making the change describe above, the newline should be terminating, rather than starting a whitespace token, so that the indent on the following line is still trivia of the first token on it.

This is a very good point which I seemed to have missed, thanks! I'll take a look tomorrow

tomprince · 2021-03-02T23:08:36Z

This is a very good point which I seemed to have missed, thanks! I'll take a look tomorrow

The second point in #128 might be relevant to you, particularly if you are changing how whitespace trivia works.

Kampfkarren · 2021-03-02T23:16:20Z

Does that answer your question as to why the tests aren't failing then?

JohnnyMorganz · 2021-03-02T23:44:44Z

Does that answer your question as to why the tests aren't failing then?

I'm not entirely sure what you are referring to. If you are talking about the comment above about whitespace parsing, I think that is a different issue (although changing this would also cause tests to fail!)

You can see what I mean by "tests should have failed" if you delete the ast.json in one of the test folders where the source contained a new line (try anonymous-function-1 for example). Running the tests again, you will see when ast.json is recreated, there is a diff in the file - the tests aren't picking this up. I think this is an issue in the pass_cases.rs test

Kampfkarren · 2021-03-02T23:50:59Z

Ah, I see. I will look into it today, promise.

JohnnyMorganz · 2021-03-03T00:39:11Z

I've been rethinking over this PR, and I wanted to get a second opinion. I've been thinking of the new line character here as a line terminator, in contrast to the original rule. I believe it is a valid interpretation, but is it the interpretation we want to go for in the library?

Before I proceed any further, I just wanted to ask you both what you think makes more sense? (I can most likely accommodate my code for either scenario.)

Kampfkarren · 2021-03-03T02:40:51Z

Personally, any way works for me, as long as:

It's useful for immutable consumers, such as Selene, without much boilerplate. I believe the only place this is done in that codebase in particular is https://github.com/Kampfkarren/selene/blob/9a522dca0ad0c06a6df2e9a26350f5594d9d4b32/selene-lib/src/lint_filtering.rs#L96.
It's useful for mutable consumers, such as your own StyLua. I don't have any projects that are like this, so I trust your judgment on that whatever it ends up being.

I think StyLua is a good test case for this--choose whichever one makes your life easier.

Kampfkarren · 2021-03-04T03:38:52Z

Ah, I know why tests didn't fail--leading/trailing trivia wasn't in the tests to begin with.

there is a diff in the file - the tests aren't picking this up

This is confusing, but I'll figure it out.

Kampfkarren · 2021-03-04T03:44:23Z

Okay, it's because the tests are using Eq, but Eq passes even if the trivia is different! I'll file a fix for that, and you can throw it into your branch.

Kampfkarren · 2021-03-04T03:51:44Z

#134

JohnnyMorganz · 2021-03-04T23:17:32Z

the newline should be terminating, rather than starting a whitespace token, so that the indent on the following line is still trivia of the first token on it.

I was looking into this, and there seems to be a flaw in my rules to determine when to stop extracting trivia. Currently, we only stop at comments after newlines, so that the comment can be parsed as leading trivia for the next token. But this means, for something like:

do [\n]
[\t] call() [\n]
end

the indent trivia (in this case, a tab character preceding call(), is still parsed as trailing trivia for do.
To solve this, I think we should follow what the Rosyln Compiler (compiler for C#/.NET) does instead for trivia.
To quote their page:

In general, a token owns any trivia after it on the same line up to the next token. Any trivia after that line is associated with the following token. The first token in the source file gets all the initial trivia, and the last sequence of trivia in the file is tacked onto the end-of-file token, which otherwise has zero width.

This in turn should also make the rule simpler, and it seems to be well established.

EDIT: You may find https://sharplab.io/ useful, if you want to visualise what C# does

Kampfkarren · 2021-03-05T23:56:10Z

#134 is merged. Update your CHANGELOG and lets see if the CI passes.

…o extract-trivia

Kampfkarren · 2021-03-07T01:10:19Z

Re: Your last comment, is this still the PR you want to go with?

JohnnyMorganz · 2021-03-07T11:41:18Z

Re: Your last comment, is this still the PR you want to go with?

Yeah, I researched into this further and found that this is what the Roslyn compiler does. I simplified my logic (see the edit for the original PR message) greatly to follow what they do aswell - no need for "state" anymore.

Since it's used in practice by quite a large parser, I think it is an OK change to make

triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125

* Upgrade full_moon to 0.13.1 and add default roblox feature triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125 * Fix newline handling

JohnnyMorganz added 2 commits February 26, 2021 12:33

Change logic of how trivia is extracted

88c3b0b

Update test cases

edec4bd

Kampfkarren reviewed Feb 26, 2021

View reviewed changes

full-moon/src/ast/mod.rs Outdated Show resolved Hide resolved

JohnnyMorganz added 2 commits February 26, 2021 14:44

Change to unreachable and fix clippy warning

e19aa61

Update CHANGELOG.md

ce08475

JohnnyMorganz mentioned this pull request Feb 28, 2021

Empty table with comment wrongly collapses closing parenthesis into comment JohnnyMorganz/StyLua#64

Closed

Kampfkarren mentioned this pull request Mar 2, 2021

Start position of tokens at the beginning of the line are recorded as being at the end of the previous line. #128

Closed

JohnnyMorganz mentioned this pull request Mar 3, 2021

Strange formatting at pasting code JohnnyMorganz/StyLua#67

Closed

Kampfkarren added this to the 1.0.0 milestone Mar 5, 2021

JohnnyMorganz added 5 commits March 5, 2021 11:42

Change whitespace parser to make newline terminating

0c539ef

Update test tokens relating to whitespace

3a245ac

Change trailing trivia extraction to Roslyn method

1b6f94f

Update fail case tokens

e15b4d7

Update test case AST

806d2e7

JohnnyMorganz and others added 6 commits March 6, 2021 00:27

Merge branch 'master' of https://github.com/Kampfkarren/full-moon int…

ff415df

…o extract-trivia

Update test files

610f659

Merge branch 'master' into extract-trivia

72fffe6

Update Luau tests

c208f53

Merge branch 'master' of https://github.com/Kampfkarren/full-moon int…

47ec121

…o extract-trivia

Update Lua 5.2 tests

4caf5b1

Merge branch 'master' into extract-trivia

7a61d4e

Kampfkarren merged commit 713a293 into Kampfkarren:master Mar 7, 2021

kennethloeffler added a commit to kennethloeffler/moonwave that referenced this pull request Oct 2, 2021

Upgrade full_moon to 0.13.1 and add default roblox feature

9514237

triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125

kennethloeffler added a commit to kennethloeffler/moonwave that referenced this pull request Oct 2, 2021

Upgrade full_moon to 0.13.1 and add default roblox feature

baf5d33

triple_dash and free_function tests currently fail, likely due to Kampfkarren/full-moon#125

kennethloeffler mentioned this pull request Oct 2, 2021

Update to full_moon 0.13.1 and add roblox feature evaera/moonwave#40

Merged

JohnnyMorganz deleted the extract-trivia branch October 19, 2021 16:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change the way trivia extraction works #125

Change the way trivia extraction works #125

JohnnyMorganz commented Feb 26, 2021 •

edited

Loading

Kampfkarren commented Feb 26, 2021

tomprince commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

tomprince commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

tomprince commented Mar 2, 2021

Kampfkarren commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

Kampfkarren commented Mar 2, 2021

JohnnyMorganz commented Mar 3, 2021

Kampfkarren commented Mar 3, 2021

Kampfkarren commented Mar 4, 2021 •

edited

Loading

Kampfkarren commented Mar 4, 2021

Kampfkarren commented Mar 4, 2021

JohnnyMorganz commented Mar 4, 2021 •

edited

Loading

Kampfkarren commented Mar 5, 2021

Kampfkarren commented Mar 7, 2021

JohnnyMorganz commented Mar 7, 2021

Change the way trivia extraction works #125

Change the way trivia extraction works #125

Conversation

JohnnyMorganz commented Feb 26, 2021 • edited Loading

Kampfkarren commented Feb 26, 2021

tomprince commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

tomprince commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

tomprince commented Mar 2, 2021

Kampfkarren commented Mar 2, 2021

JohnnyMorganz commented Mar 2, 2021

Kampfkarren commented Mar 2, 2021

JohnnyMorganz commented Mar 3, 2021

Kampfkarren commented Mar 3, 2021

Kampfkarren commented Mar 4, 2021 • edited Loading

Kampfkarren commented Mar 4, 2021

Kampfkarren commented Mar 4, 2021

JohnnyMorganz commented Mar 4, 2021 • edited Loading

Kampfkarren commented Mar 5, 2021

Kampfkarren commented Mar 7, 2021

JohnnyMorganz commented Mar 7, 2021

JohnnyMorganz commented Feb 26, 2021 •

edited

Loading

Kampfkarren commented Mar 4, 2021 •

edited

Loading

JohnnyMorganz commented Mar 4, 2021 •

edited

Loading