feat(css_parser): better handling for Dimensions #1375

faultyserver · 2023-12-30T04:59:03Z

Summary

CSS Dimensions (and percentages) are pretty straightforward, but also complicated to deal with in Biome/Rowan's token system. Tokens here are all lightweight, direct representations of a string of text, which is great for efficiency, but limits how much information can be stored in them (also a good thing most of the time!). Dimensions and percentages, however, rely on a bit of additional context to be parsed correctly.

<dimension-token>: Spec: https://drafts.csswg.org/css-syntax-3/#dimension-token-diagram

A <dimension-token> is a singular token that consists of a number immediately followed by an identifier (as shown in the railroad diagram in the link). No whitespace is allowed. The spec also calls out that for a <dimension>, it must be a "unit identifier", but does not define what a unit identifier actually is. For this reason, most parsers seem to treat <dimension> exactly like <dimension-token> (which the spec says is correct) and just allow any identifier, where the result of parsing is either a "regular" dimension for known unit values, or an "unknown" dimension otherwise.

While lexing a number and an identifier is easy enough, the restriction on whitespace presents a problem when the rest of the grammar effectively ignores whitespace. One possibility is to use a new LexContext when attempting to lex a dimension versus a number, but then if there's a failure, the lexer has to re-lex the token, and knowing when to attempt a dimension and when to stop at the end of the number is very difficult to thread through without duplicating work. It's also not really correct, either! It would be invalid to parse a number and an identifier with no whitespace between them as anything other than a dimension, because that is the syntax definition.

So we always want to parse the entirety of the dimension token together, but we also still want to know which part of the token is the number and which part is the unit. And this is where it gets complicated. In other parsers, they handle this by just creating a Dimension token that has multiple fields. One for the number and one for the unit. Servo's rust-cssparser does exactly this, for example. But with Rowan, we are limited to just tracking the raw text of the token and can't add fields to it all willy-nilly.

What do we do then? We cheat a little and use some intermediate representation! This PR introduces a new token type, CSS_DIMENSION_VALUE, that represents a number literal that is immediately followed by an identifier. It does not include the identifier, but just checks that the next bytes of the token stream will be an identifier. By returning this special kind of token, the parser can then distinguish between a plain number and a dimension value, and if it sees a dimension value it can continue to consume the identifier as well and create a CssRegularDimension from it (or CssUnknownDimension for unknown unit values). Before creating the node, it will also re-cast the value token into a CSS_NUMBER_LITERAL, keeping the behavior purely internal and opaque to the resulting syntax tree.

<percentage> is a very similar token type in the CSS spec: a number followed immediately by a % character. While not technically a dimension, we can treat them the same way, which this PR does by also introducing a CSS_PERCENTAGE_VALUE that does similar checks as the CSS_DIMENSION_VALUE described above, and again the parser consumes and converts the value as needed.

Other effects

This change touches a surprising amount of code! As an overview of everything that's changed:

Added CssUnknownDimension to handle dimensions with unknown unit types. This is the end-user solution for 📎 Handle and recover from unknown dimensions #1370, where 0\0 was not being parsed as a dimension, even though \0 is a valid identifier.
Added the new token types as described above, and adjusted the lexer to handle them.
Refactored css_dimension to understand and re-cast the dimension value tokens into Dimension nodes as needed.
Refactored the parsing for @keyframes to look for CSS_PERCENTAGE_VALUE tokens rather than CSS_NUMBER_LITERAL, which will also make effective recovery simple.
Refactored parse_pseudo_class_nth to treat the An+B microsyntax as a possible Dimension followed by an offset. This is awkward because the definition for the microsyntax is very specific but also flexible. The result still passes all existing tests, so I believe it is sound.
Updated snapshots for all tests that use Dimensions.
Added tests to cover the usae of the 0\0 hack.

Test Plan

All of the snapshots for the relevant features should be updated now, and they all pass. I want to wait on #1371 to get the updated recovery for @keyframes and then update that snapshot as well, since it's quite different and will change with this PR as well (a good change, but will probably cause git conflicts and such).

netlify · 2023-12-30T04:59:07Z

✅ Deploy Preview for biomejs canceled.

Name	Link
🔨 Latest commit	`55b68b8`
🔍 Latest deploy log	https://app.netlify.com/sites/biomejs/deploys/65908bfbd4785a0007796087

github-actions · 2023-12-30T05:03:17Z

Parser conformance results on

js/262

Test result	`main` count	This PR count	Difference
Total	49701	49701	0
Passed	48721	48721	0
Failed	980	980	0
Panics	0	0	0
Coverage	98.03%	98.03%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6322	6322	0
Passed	2036	2036	0
Failed	4286	4286	0
Panics	0	0	0
Coverage	32.20%	32.20%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	662	662	0
Passed	592	592	0
Failed	70	70	0
Panics	0	0	0
Coverage	89.43%	89.43%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	17646	17646	0
Passed	13452	13452	0
Failed	4192	4192	0
Panics	2	2	0
Coverage	76.23%	76.23%	0.00%

codspeed-hq · 2023-12-30T05:14:07Z

CodSpeed Performance Report

Merging #1375 will create unknown performance changes

_{⚠️ No base runs were found}

_{Falling back to comparing faulty/css-backslash-recover (55b68b8) with main (5e8db95)}

Summary

⚠️ No benchmarks were detected in both the base of the PR and the PR.

denbezrukov

Nice trick for handling a whitespace token.

I'm wondering if it would be better not to skip whitespaces automatically. But that's another story! 😄

denbezrukov · 2023-12-30T18:04:25Z

I'd like to mention that you did a great job with the summary and comments inside the code. I really appreciate that!

faultyserver · 2023-12-30T21:48:13Z

The CI failure here is the benchmarking issue that's currently known, so I'm going to merge anyway.

github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools L-CSS Language: CSS labels Dec 30, 2023

faultyserver requested a review from denbezrukov December 30, 2023 07:26

denbezrukov approved these changes Dec 30, 2023

View reviewed changes

faultyserver added 3 commits December 30, 2023 18:35

feat(css_parser): better handling for Dimensions

8243999

use tokens directly in Dimension/Percentage nodes

0a98c2e

update keyframes with new recovery

55b68b8

faultyserver force-pushed the faulty/css-backslash-recover branch from 7a6d6eb to 55b68b8 Compare December 30, 2023 21:30

faultyserver merged commit 8061a77 into main Dec 30, 2023
20 of 21 checks passed

faultyserver deleted the faulty/css-backslash-recover branch December 30, 2023 21:48

Conaclos added the A-Changelog Area: changelog label Jan 7, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(css_parser): better handling for Dimensions #1375

feat(css_parser): better handling for Dimensions #1375

faultyserver commented Dec 30, 2023

netlify bot commented Dec 30, 2023 •

edited

Loading

github-actions bot commented Dec 30, 2023

codspeed-hq bot commented Dec 30, 2023 •

edited

Loading

denbezrukov left a comment

denbezrukov commented Dec 30, 2023

faultyserver commented Dec 30, 2023

feat(css_parser): better handling for Dimensions #1375

feat(css_parser): better handling for Dimensions #1375

Conversation

faultyserver commented Dec 30, 2023

Summary

Test Plan

netlify bot commented Dec 30, 2023 • edited Loading

✅ Deploy Preview for biomejs canceled.

github-actions bot commented Dec 30, 2023

Parser conformance results on

js/262

jsx/babel

symbols/microsoft

ts/babel

ts/microsoft

codspeed-hq bot commented Dec 30, 2023 • edited Loading

CodSpeed Performance Report

Merging #1375 will create unknown performance changes

Summary

denbezrukov left a comment

Choose a reason for hiding this comment

denbezrukov commented Dec 30, 2023

faultyserver commented Dec 30, 2023

netlify bot commented Dec 30, 2023 •

edited

Loading

codspeed-hq bot commented Dec 30, 2023 •

edited

Loading