Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(css_parser): better handling for Dimensions #1375

Merged
merged 3 commits into from
Dec 30, 2023

Conversation

faultyserver
Copy link
Contributor

Summary

#268. Closes #1370.

CSS Dimensions (and percentages) are pretty straightforward, but also complicated to deal with in Biome/Rowan's token system. Tokens here are all lightweight, direct representations of a string of text, which is great for efficiency, but limits how much information can be stored in them (also a good thing most of the time!). Dimensions and percentages, however, rely on a bit of additional context to be parsed correctly.

<dimension-token>: Spec: https://drafts.csswg.org/css-syntax-3/#dimension-token-diagram

A <dimension-token> is a singular token that consists of a number immediately followed by an identifier (as shown in the railroad diagram in the link). No whitespace is allowed. The spec also calls out that for a <dimension>, it must be a "unit identifier", but does not define what a unit identifier actually is. For this reason, most parsers seem to treat <dimension> exactly like <dimension-token> (which the spec says is correct) and just allow any identifier, where the result of parsing is either a "regular" dimension for known unit values, or an "unknown" dimension otherwise.

While lexing a number and an identifier is easy enough, the restriction on whitespace presents a problem when the rest of the grammar effectively ignores whitespace. One possibility is to use a new LexContext when attempting to lex a dimension versus a number, but then if there's a failure, the lexer has to re-lex the token, and knowing when to attempt a dimension and when to stop at the end of the number is very difficult to thread through without duplicating work. It's also not really correct, either! It would be invalid to parse a number and an identifier with no whitespace between them as anything other than a dimension, because that is the syntax definition.

So we always want to parse the entirety of the dimension token together, but we also still want to know which part of the token is the number and which part is the unit. And this is where it gets complicated. In other parsers, they handle this by just creating a Dimension token that has multiple fields. One for the number and one for the unit. Servo's rust-cssparser does exactly this, for example. But with Rowan, we are limited to just tracking the raw text of the token and can't add fields to it all willy-nilly.

What do we do then? We cheat a little and use some intermediate representation! This PR introduces a new token type, CSS_DIMENSION_VALUE, that represents a number literal that is immediately followed by an identifier. It does not include the identifier, but just checks that the next bytes of the token stream will be an identifier. By returning this special kind of token, the parser can then distinguish between a plain number and a dimension value, and if it sees a dimension value it can continue to consume the identifier as well and create a CssRegularDimension from it (or CssUnknownDimension for unknown unit values). Before creating the node, it will also re-cast the value token into a CSS_NUMBER_LITERAL, keeping the behavior purely internal and opaque to the resulting syntax tree.

<percentage> is a very similar token type in the CSS spec: a number followed immediately by a % character. While not technically a dimension, we can treat them the same way, which this PR does by also introducing a CSS_PERCENTAGE_VALUE that does similar checks as the CSS_DIMENSION_VALUE described above, and again the parser consumes and converts the value as needed.
 
Other effects

This change touches a surprising amount of code! As an overview of everything that's changed:

  • Added CssUnknownDimension to handle dimensions with unknown unit types. This is the end-user solution for 📎 Handle and recover from unknown dimensions #1370, where 0\0 was not being parsed as a dimension, even though \0 is a valid identifier.
  • Added the new token types as described above, and adjusted the lexer to handle them.
  • Refactored css_dimension to understand and re-cast the dimension value tokens into Dimension nodes as needed.
  • Refactored the parsing for @keyframes to look for CSS_PERCENTAGE_VALUE tokens rather than CSS_NUMBER_LITERAL, which will also make effective recovery simple.
  • Refactored parse_pseudo_class_nth to treat the An+B microsyntax as a possible Dimension followed by an offset. This is awkward because the definition for the microsyntax is very specific but also flexible. The result still passes all existing tests, so I believe it is sound.
  • Updated snapshots for all tests that use Dimensions.
  • Added tests to cover the usae of the 0\0 hack.

Test Plan

All of the snapshots for the relevant features should be updated now, and they all pass. I want to wait on #1371 to get the updated recovery for @keyframes and then update that snapshot as well, since it's quite different and will change with this PR as well (a good change, but will probably cause git conflicts and such).

Copy link

netlify bot commented Dec 30, 2023

Deploy Preview for biomejs canceled.

Name Link
🔨 Latest commit 55b68b8
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/65908bfbd4785a0007796087

@github-actions github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools L-CSS Language: CSS labels Dec 30, 2023
Copy link
Contributor

Parser conformance results on

js/262

Test result main count This PR count Difference
Total 49701 49701 0
Passed 48721 48721 0
Failed 980 980 0
Panics 0 0 0
Coverage 98.03% 98.03% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 40 40 0
Passed 37 37 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.50% 92.50% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 6322 6322 0
Passed 2036 2036 0
Failed 4286 4286 0
Panics 0 0 0
Coverage 32.20% 32.20% 0.00%

ts/babel

Test result main count This PR count Difference
Total 662 662 0
Passed 592 592 0
Failed 70 70 0
Panics 0 0 0
Coverage 89.43% 89.43% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 17646 17646 0
Passed 13452 13452 0
Failed 4192 4192 0
Panics 2 2 0
Coverage 76.23% 76.23% 0.00%

Copy link

codspeed-hq bot commented Dec 30, 2023

CodSpeed Performance Report

Merging #1375 will create unknown performance changes

⚠️ No base runs were found

Falling back to comparing faulty/css-backslash-recover (55b68b8) with main (5e8db95)

Summary

⚠️ No benchmarks were detected in both the base of the PR and the PR.

Copy link
Contributor

@denbezrukov denbezrukov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice trick for handling a whitespace token.

I'm wondering if it would be better not to skip whitespaces automatically. But that's another story! 😄

@denbezrukov
Copy link
Contributor

I'd like to mention that you did a great job with the summary and comments inside the code. I really appreciate that!

@faultyserver faultyserver force-pushed the faulty/css-backslash-recover branch from 7a6d6eb to 55b68b8 Compare December 30, 2023 21:30
@faultyserver
Copy link
Contributor Author

The CI failure here is the benchmarking issue that's currently known, so I'm going to merge anyway.

@faultyserver faultyserver merged commit 8061a77 into main Dec 30, 2023
20 of 21 checks passed
@faultyserver faultyserver deleted the faulty/css-backslash-recover branch December 30, 2023 21:48
@Conaclos Conaclos added the A-Changelog Area: changelog label Jan 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools L-CSS Language: CSS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

📎 Handle and recover from unknown dimensions
3 participants