-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify location tracking in the lexer. #41
Conversation
This commit changes the location tracking in lexer.rs to be almost completely implicit, except for line tracking. The offset can be computed from the start of the &str and the current position of the innermost iterator. The final location information is only created in Lexer::next, at the start and the end of the currently build token. To simplify the control flow for each char consumed, inner iterators are changed to not use Peekable but instead do copies of inner iterators as needed.
PTAL, it took me forever to finish this PR but it's finally there. I think it is a good improvement because we don't need to manually count offset anymore, and the code in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good, I just have some concerns
let mut peek_inner = self.inner.clone(); | ||
if peek_inner.next() == Some('\r') { | ||
self.inner = peek_inner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like this pattern, it forces us to copy the entire struct which in this case is roughly 16 bytes so it's not much but even then it also incurs some instructions to copy everything and move it later if needed to restore it, whilst peek would only add plus 4 bytes (because a char is 4 bytes but has inhabited values) and save us some copying and moving and has the added benefit that peeking is handled implicitly instead of always needing to remember when to restore, also in the future when naga's msrv allows it we could use https://doc.rust-lang.org/std/iter/struct.Peekable.html#method.next_if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was going to be the controversial change in this PR :)
The problem is that a peekable for CharsAndLine
forces peekable for SkipComents
and LexerInnerIterator
too (because Peekable is not Cloneable). My concern with using a lot of chained peekables is that it adds a bunch of control flow for each LexerInnerIterator.next()
that's not needed because our iterators are almost stateless.
If you want I can try to undo this change, and put it in a separate PR so we can discuss there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't quite know how the change that makes LexerToken contain a CoW will look like, but it's very likely that it will make the iterators even more stateless because we likely need to pass the whole &str
from the outside of the iterators (which wouldn't be iterators anymore).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to undo this change I just find Peekable
more idiomatic but it's hard to measure performance here without running a proper benchmark since there are things like cache locality which shouldn't really affect this too much but you never know. I think we can merge this has is and in the future if we feel like it look at performance here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you.
let mut peek_inner = self.inner.clone(); | ||
if peek_inner.next() == Some('\n') { | ||
self.inner = peek_inner; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing
|
||
if current.0 != '/' { | ||
debug_assert!(current.0 != COMMENT_SENTINEL_VALUE); | ||
return Some(current); | ||
} | ||
|
||
match self.inner.peek() { | ||
let mut peek_inner = self.inner.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing
There seems to be some weird formatting going on maybe we should think about adding formatting to the CI checks |
- using &str::as_ptr - introducing a next_char_if helper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit late to the party, but I'll give it a thumbs up 👍
This commit changes the location tracking in lexer.rs to be almost
completely implicit, except for line tracking. The offset can be
computed from the start of the &str and the current position of the
innermost iterator. The final location information is only created in
Lexer::next, at the start and the end of the currently build token.
To simplify the control flow for each char consumed, inner iterators
are changed to not use Peekable but instead do copies of inner iterators
as needed.