Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify location tracking in the lexer. #41

Merged
merged 4 commits into from
Jun 13, 2021
Merged

Simplify location tracking in the lexer. #41

merged 4 commits into from
Jun 13, 2021

Conversation

Kangz
Copy link
Owner

@Kangz Kangz commented Jun 13, 2021

This commit changes the location tracking in lexer.rs to be almost
completely implicit, except for line tracking. The offset can be
computed from the start of the &str and the current position of the
innermost iterator. The final location information is only created in
Lexer::next, at the start and the end of the currently build token.

To simplify the control flow for each char consumed, inner iterators
are changed to not use Peekable but instead do copies of inner iterators
as needed.

This commit changes the location tracking in lexer.rs to be almost
completely implicit, except for line tracking. The offset can be
computed from the start of the &str and the current position of the
innermost iterator. The final location information is only created in
Lexer::next, at the start and the end of the currently build token.

To simplify the control flow for each char consumed, inner iterators
are changed to not use Peekable but instead do copies of inner iterators
as needed.
@Kangz Kangz requested review from pjoe and JCapucho June 13, 2021 12:34
@Kangz
Copy link
Owner Author

Kangz commented Jun 13, 2021

PTAL, it took me forever to finish this PR but it's finally there. I think it is a good improvement because we don't need to manually count offset anymore, and the code in Lexer becomes completely agnostic to Location information, except for Lexer::next

Copy link
Collaborator

@JCapucho JCapucho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall it looks good, I just have some concerns

Comment on lines +47 to +49
let mut peek_inner = self.inner.clone();
if peek_inner.next() == Some('\r') {
self.inner = peek_inner;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like this pattern, it forces us to copy the entire struct which in this case is roughly 16 bytes so it's not much but even then it also incurs some instructions to copy everything and move it later if needed to restore it, whilst peek would only add plus 4 bytes (because a char is 4 bytes but has inhabited values) and save us some copying and moving and has the added benefit that peeking is handled implicitly instead of always needing to remember when to restore, also in the future when naga's msrv allows it we could use https://doc.rust-lang.org/std/iter/struct.Peekable.html#method.next_if

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was going to be the controversial change in this PR :)

The problem is that a peekable for CharsAndLine forces peekable for SkipComents and LexerInnerIterator too (because Peekable is not Cloneable). My concern with using a lot of chained peekables is that it adds a bunch of control flow for each LexerInnerIterator.next() that's not needed because our iterators are almost stateless.

If you want I can try to undo this change, and put it in a separate PR so we can discuss there.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't quite know how the change that makes LexerToken contain a CoW will look like, but it's very likely that it will make the iterators even more stateless because we likely need to pass the whole &str from the outside of the iterators (which wouldn't be iterators anymore).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to undo this change I just find Peekable more idiomatic but it's hard to measure performance here without running a proper benchmark since there are things like cache locality which shouldn't really affect this too much but you never know. I think we can merge this has is and in the future if we feel like it look at performance here

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.

Comment on lines +58 to +60
let mut peek_inner = self.inner.clone();
if peek_inner.next() == Some('\n') {
self.inner = peek_inner;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing


if current.0 != '/' {
debug_assert!(current.0 != COMMENT_SENTINEL_VALUE);
return Some(current);
}

match self.inner.peek() {
let mut peek_inner = self.inner.clone();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing

@JCapucho
Copy link
Collaborator

JCapucho commented Jun 13, 2021

There seems to be some weird formatting going on maybe we should think about adding formatting to the CI checks

Kangz added 3 commits June 13, 2021 16:53
 - using &str::as_ptr
 - introducing a next_char_if helper.
@Kangz Kangz merged commit 08cf72a into main Jun 13, 2021
@Kangz Kangz deleted the lexer_location branch June 13, 2021 16:02
Copy link
Collaborator

@pjoe pjoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit late to the party, but I'll give it a thumbs up 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants