Skip to content

Fix typo in comment: “occured” → “occurred” in scanner docs #801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 3, 2025
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions src/blog/scanner.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,9 +26,9 @@ The scanner [chooses](https://cs.chromium.org/chromium/src/v8/src/scanner.cc?rcl

## Whitespace

Tokens can be separated by various types of whitespace, e.g., newline, space, tab, single line comments, multiline comments, etc. One type of whitespace can be followed by other types of whitespace. Whitespace adds meaning if it causes a line break between two tokens: that possibly results in [automatic semicolon insertion](https://tc39.es/ecma262/#sec-automatic-semicolon-insertion). So before scanning the next token, all whitespace is skipped keeping track of whether a newline occured. Most real-world production JavaScript code is minified, and so multi-character whitespace luckily isn’t very common. For that reason V8 uniformly scans each type of whitespace independently as if they were regular tokens. E.g., if the first token character is `/` followed by another `/`, V8 scans this as a single-line comment which returns `Token::WHITESPACE`. That loop simply continues scanning tokens [until](https://cs.chromium.org/chromium/src/v8/src/scanner.cc?rcl=edf3dab4660ed6273e5d46bd2b0eae9f3210157d&l=671) we find a token other than `Token::WHITESPACE`. This means that if the next token is not preceded by whitespace, we immediately start scanning the relevant token without needing to explicitly check for whitespace.
Tokens can be separated by various types of whitespace, e.g., newline, space, tab, single line comments, multiline comments, etc. One type of whitespace can be followed by other types of whitespace. Whitespace adds meaning if it causes a line break between two tokens: that possibly results in [automatic semicolon insertion](https://tc39.es/ecma262/#sec-automatic-semicolon-insertion). So before scanning the next token, all whitespace is skipped keeping track of whether a newline occurred. Most real-world production JavaScript code is minified, and so multi-character whitespace luckily isn’t very common. For that reason V8 uniformly scans each type of whitespace independently as if they were regular tokens. E.g., if the first token character is `/` followed by another `/`, V8 scans this as a single-line comment which returns `Token::WHITESPACE`. That loop simply continues scanning tokens [until](https://cs.chromium.org/chromium/src/v8/src/scanner.cc?rcl=edf3dab4660ed6273e5d46bd2b0eae9f3210157d&l=671) we find a token other than `Token::WHITESPACE`. This means that if the next token is not preceded by whitespace, we immediately start scanning the relevant token without needing to explicitly check for whitespace.

The loop itself however adds overhead to each scanned token: it requires a branch to verify the token that we’ve just scanned. It would be better to continue the loop only if the token we have just scanned could be a `Token::WHITESPACE`. Otherwise we should just break out of the loop. We do this by moving the loop itself into a separate [helper method](https://cs.chromium.org/chromium/src/v8/src/parsing/scanner-inl.h?rcl=d62ec0d84f2ec8bc0d56ed7b8ed28eaee53ca94e&l=178) from which we return immediately when we’re certain the token isn’t `Token::WHITESPACE`. Even though these kinds of changes may seem really small, they remove overhead for each scanned token. This especially makes a difference for really short tokens like punctuation:
The loop itself however adds overhead to each scanned token: it requires a branch to verify the token that we’ve just scanned. It would be better to continue the loop only if the token we have just scanned could be a `Token::WHITESPACE`. Otherwise, we should just break out of the loop. We do this by moving the loop itself into a separate [helper method](https://cs.chromium.org/chromium/src/v8/src/parsing/scanner-inl.h?rcl=d62ec0d84f2ec8bc0d56ed7b8ed28eaee53ca94e&l=178) from which we return immediately when we’re certain the token isn’t `Token::WHITESPACE`. Even though these kinds of changes may seem really small, they remove overhead for each scanned token. This especially makes a difference for really short tokens like punctuation:

![](/_img/scanner/punctuation.svg)

Expand Down