feat(css_parser): Parse `border` exactly #1448

faultyserver · 2024-01-05T23:36:44Z

Summary

#268. This implements exact parsing for the border property, according to the spec: https://drafts.csswg.org/css-backgrounds/#propdef-border

 border =
     <line-width>  ||
     <line-style>  ||
     <color>       
 
 <line-width> =
     <length [0,∞]>  |
     thin            |
     medium          |
     thick           
 
 <line-style> =
     none    |
     hidden  |
     dotted  |
     dashed  |
     solid   |
     double  |
     groove  |
     ridge   |
     inset   |
     outset

(well, almost exact. We don't validate the numeric range of length yet, but we can at least assert that it is a Length Dimension).
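
The missing range check could later be layered on top of the parsed value. A minimal sketch, assuming a hypothetical `LineWidth` type that stands in for the real nodes (it is not part of Biome's API):

```rust
/// Hypothetical, simplified stand-in for a parsed `<line-width>` value.
/// This is an illustration only, not one of Biome's generated node types.
#[derive(Debug, PartialEq)]
enum LineWidth {
    Length { value: f64, unit: String },
    Thin,
    Medium,
    Thick,
}

/// Returns true when the value satisfies `<length [0,∞]>`:
/// keywords are always valid, lengths must be non-negative.
fn is_in_range(width: &LineWidth) -> bool {
    match width {
        LineWidth::Length { value, .. } => *value >= 0.0,
        _ => true,
    }
}
```

With this shape, `10px` would pass the check and `-1px` would be rejected, while the parser itself still only asserts that the token is a length dimension.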

Implementing this makes me realize that there is going to be a massive amount of code needed to support parsing all of the CSS properties. I think a lot of that is just inevitable. There are hundreds, and each can have 2, 3, 4, maybe even 8 sub-node types within the value definition. I'm not sure how or even if we'll be able to keep the number of node types reasonable, and the nodes.rs file and other code generated stuff is going to become massive lol, but it is pretty neat that we can parse and represent all of these things exactly with a CST. I think that's pretty special and unique now.

The reason I implemented this first is to show off the unordered syntax in the grammar! It worked out perfectly, but I did have to adjust the codegen a little bit to handle some errors that it had when nested inside an Any and such.

Test Plan

Added a snapshot test with a variety of permutations for the values.

netlify · 2024-01-05T23:36:48Z

✅ Deploy Preview for biomejs canceled.

Name	Link
🔨 Latest commit	`c6d84e9`
🔍 Latest deploy log	https://app.netlify.com/sites/biomejs/deploys/6598cf26386bad000858f811

github-actions · 2024-01-05T23:41:32Z

Parser conformance results on

js/262

Test result	`main` count	This PR count	Difference
Total	49701	49701	0
Passed	48721	48721	0
Failed	980	980	0
Panics	0	0	0
Coverage	98.03%	98.03%	0.00%

jsx/babel

Test result	`main` count	This PR count	Difference
Total	40	40	0
Passed	37	37	0
Failed	3	3	0
Panics	0	0	0
Coverage	92.50%	92.50%	0.00%

symbols/microsoft

Test result	`main` count	This PR count	Difference
Total	6322	6322	0
Passed	2036	2036	0
Failed	4286	4286	0
Panics	0	0	0
Coverage	32.20%	32.20%	0.00%

ts/babel

Test result	`main` count	This PR count	Difference
Total	662	662	0
Passed	592	592	0
Failed	70	70	0
Panics	0	0	0
Coverage	89.43%	89.43%	0.00%

ts/microsoft

Test result	`main` count	This PR count	Difference
Total	17646	17646	0
Passed	13452	13452	0
Failed	4192	4192	0
Panics	2	2	0
Coverage	76.23%	76.23%	0.00%

faultyserver · 2024-01-05T23:40:06Z

xtask/codegen/css.ungram

+CssBorder =
+	line_width: AnyCssLineWidth ||
+	line_style: CssLineStyle ||
+	color: CssColor
+
+// TODO: This should be narrowed down to CssLength when we can.
+AnyCssLineWidth =
+	CssRegularDimension
+	| CssLineWidthKeyword
+
+CssLineWidthKeyword = keyword: ('thin' | 'medium' | 'thick')
+
+CssLineStyle =
+	keyword: ( 'none' | 'hidden' | 'dotted' | 'dashed' | 'solid' | 'double'  | 'groove' | 'ridge' | 'inset' | 'outset')


I think it would be great if we could just make this a generic "CssKeywordValue" and have the parser enforce what the value is, otherwise we're going to end up with 100 of these Css*Keyword node types and I think that's not really all that helpful.

I would rather be able to write this whole thing as:

 CssBorder = width: CssLineWidth || ...
 AnyCssLineWidth = CssRegularDimension | keyword: ('thin' | 'medium' | 'thick')

But our current implementation of codegen won't understand that and just generates a regular node with 4 different keyword_token members lol. Thankfully that will mostly be an infrastructure change and I think we can safely handle that in the future.

faultyserver · 2024-01-05T23:41:01Z

xtask/codegen/src/css_kinds_src.rs

+        "thin",
+        "medium",
+        "thick",
+        "none",
+        "hidden",
+        "dotted",
+        "dashed",
+        "solid",
+        "double",
+        "groove",
+        "ridge",
+        "inset",
+        "outset",


This is just a single property... and there are plenty with even more keywords. I think it might be worth just treating these as plain identifiers rather than expanding this list to something like 1000 keywords, but I'm not really sure. Both ways have benefits and drawbacks.

I agree, I don't have a certain opinion about it.

I like having them in the ungram to see which keywords I can use, and to assert in a factory if one wants to use an invalid keyword. Another thing to consider is that implementing a binary search in the lexer could offer a performance benefit, as it would allow us to compare enum variants instead of strings.
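
The binary-search idea can be sketched as follows. This is only an illustration under assumptions: the `Keyword` enum, the `KEYWORDS` table and `lookup_keyword` are hypothetical names, and the real lexer may resolve keywords differently.

```rust
/// Hypothetical subset of keyword kinds; the real list lives in
/// css_kinds_src.rs and is much longer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Keyword {
    Dashed,
    Dotted,
    Medium,
    Solid,
    Thick,
    Thin,
}

/// Keyword table sorted by text so that it can be binary-searched.
const KEYWORDS: &[(&str, Keyword)] = &[
    ("dashed", Keyword::Dashed),
    ("dotted", Keyword::Dotted),
    ("medium", Keyword::Medium),
    ("solid", Keyword::Solid),
    ("thick", Keyword::Thick),
    ("thin", Keyword::Thin),
];

/// Maps identifier text to a keyword kind once, in the lexer.
/// Later stages can then compare enum variants instead of strings.
fn lookup_keyword(text: &str) -> Option<Keyword> {
    KEYWORDS
        .binary_search_by(|(name, _)| name.cmp(&text))
        .ok()
        .map(|index| KEYWORDS[index].1)
}
```

The table must stay sorted for the search to be correct, so a real implementation would probably want codegen or a test to enforce that ordering rather than relying on manual edits.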

faultyserver · 2024-01-05T23:41:50Z

crates/biome_css_parser/tests/css_test_suite/ok/property/property_border.css.snap

+                                line_width: CssRegularDimension {
+                                    value_token: [email protected] "10" [] [],
+                                    unit_token: [email protected] "px" [] [],
+                                },
+                                line_style: CssLineStyle {
+                                    keyword: [email protected] "groove" [] [Whitespace("  ")],
+                                },
+                                color: CssColor {
+                                    hash_token: [email protected] "#" [] [],
+                                    value_token: [email protected] "fff" [] [Whitespace(" ")],
+                                },


Look mom! The AST can track these in any order!

Let's GOOOOO!

faultyserver · 2024-01-05T23:42:20Z

crates/biome_css_parser/tests/css_test_suite/ok/property/property_border.css.snap

+              2: [email protected]
+                0: [email protected]
+                  0: [email protected] "groove" [] [Whitespace("  ")]
+                1: [email protected]
+                  0: [email protected] "#" [] []
+                  1: [email protected] "fff" [] [Whitespace(" ")]
+                2: [email protected]
+                  0: [email protected] "10" [] []
+                  1: [email protected] "px" [] []


But the CST preserves the actual token order from the source!
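
One way to picture how both views can coexist is a node that stores its children in source order plus a small map from grammar slot to child index. The sketch below is a heavily simplified assumption, not Biome's actual slot_map codegen; `BorderNode`, `field` and `example_node` are hypothetical names.

```rust
/// Hypothetical, heavily simplified model of an unordered node. Biome's
/// generated nodes and slot_map handling are more involved than this.
struct BorderNode {
    /// Children in source order: the CST view.
    children: Vec<&'static str>,
    /// `slot_map[i]` is the index in `children` of grammar field `i`
    /// (0: line_width, 1: line_style, 2: color), or None if absent.
    slot_map: [Option<usize>; 3],
}

impl BorderNode {
    /// Resolves a grammar slot to the child that fills it, if any.
    fn field(&self, slot: usize) -> Option<&'static str> {
        self.slot_map[slot].map(|index| self.children[index])
    }

    fn line_width(&self) -> Option<&'static str> {
        self.field(0)
    }

    fn line_style(&self) -> Option<&'static str> {
        self.field(1)
    }

    fn color(&self) -> Option<&'static str> {
        self.field(2)
    }
}

/// Builds the node for the source `border: groove #fff 10px`.
fn example_node() -> BorderNode {
    BorderNode {
        children: vec!["groove", "#fff", "10px"],
        slot_map: [Some(2), Some(0), Some(1)],
    }
}
```

Typed accessors read through the map, so `line_width()` finds `10px` even though it is the last child, while iterating `children` still yields the tokens in the order they were written.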

faultyserver · 2024-01-05T23:45:28Z

crates/biome_css_parser/src/syntax/property/border.rs

+    let mut map = [false; 3];
+    let mut any = false;
+
+    loop {
+        if !map[0] && is_at_line_width(p) {
+            parse_any_line_width(p).ok();
+            map[0] = true;
+            any = true;
+        } else if !map[1] && is_at_line_style(p) {
+            parse_line_style(p).ok();
+            map[1] = true;
+            any = true;
+        } else if !map[2] && is_at_color(p) {
+            parse_color(p).ok();
+            map[2] = true;
+            any = true;
+        } else {
+            break;
+        }
+    }
+
+    if !any {
+        p.error(expect_one_of(
+            &["line width", "line style", "color"],
+            p.cur_range(),
+        ));
+        m.abandon(p);
+        return Absent;
+    }


I'd really like to wrap this up into a nice macro. It could look like:

 parse_unordered_some! {
     "line width" => is_at_line_width(p) => parse_any_line_width(p),
     "line style" => is_at_line_style(p) => parse_line_style(p),
     "color" => is_at_color(p) => parse_color(p)
 };

And it would generate exactly the code I've selected here. But in the meantime, rewriting this loop isn't terrible.

parse_unordered_all would also assert that all of the possible branches are filled, to match the && combinator.

Or we could implement a trait like the one we have for lists. I'm thinking about error recovery.

A trait sounds good too! Especially for recovery. I'm definitely a little out of my current depth in that regard, so if you have any ideas I'd be very happy to hear and talk about them to see what can be the best approach.
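
As a rough illustration of the shared helper discussed in this thread, here is a function-based sketch rather than a macro or a trait. Everything in it is an assumption made for the example: `Branch`, the function form of `parse_unordered_some`, and the toy `Words` parser are hypothetical and do not use Biome's parser, markers, or error recovery.

```rust
/// One branch of an unordered (`||`) group: a label for diagnostics,
/// a lookahead predicate, and a parse function.
struct Branch<P> {
    label: &'static str,
    is_at: fn(&P) -> bool,
    parse: fn(&mut P),
}

/// Repeatedly parses the first not-yet-seen branch that matches.
/// Returns which branches were filled, or the expected labels when
/// nothing matched at all.
fn parse_unordered_some<P>(
    p: &mut P,
    branches: &[Branch<P>],
) -> Result<Vec<bool>, Vec<&'static str>> {
    let mut seen = vec![false; branches.len()];
    loop {
        let next = branches
            .iter()
            .enumerate()
            .find(|(i, b)| !seen[*i] && (b.is_at)(&*p));
        match next {
            Some((i, branch)) => {
                (branch.parse)(p);
                seen[i] = true;
            }
            None => break,
        }
    }
    if seen.iter().any(|&filled| filled) {
        Ok(seen)
    } else {
        Err(branches.iter().map(|b| b.label).collect())
    }
}

/// Toy parser over pre-split words, only for demonstration.
struct Words {
    words: Vec<&'static str>,
    pos: usize,
}

impl Words {
    fn cur(&self) -> Option<&'static str> {
        self.words.get(self.pos).copied()
    }

    fn bump(&mut self) {
        self.pos += 1;
    }
}

fn is_at_width(p: &Words) -> bool {
    matches!(p.cur(), Some("thin" | "medium" | "thick"))
}

fn is_at_style(p: &Words) -> bool {
    matches!(p.cur(), Some("solid" | "dashed" | "dotted"))
}

fn is_at_color(p: &Words) -> bool {
    p.cur().map_or(false, |word| word.starts_with('#'))
}

/// The three branches of the toy `border` grammar.
fn border_branches() -> [Branch<Words>; 3] {
    [
        Branch { label: "line width", is_at: is_at_width, parse: Words::bump },
        Branch { label: "line style", is_at: is_at_style, parse: Words::bump },
        Branch { label: "color", is_at: is_at_color, parse: Words::bump },
    ]
}
```

A `parse_unordered_all` variant could reuse the same loop and additionally require every entry of `seen` to be true, matching the `&&` combinator. Returning the `seen` flags also gives a natural place to hook in recovery for missing or duplicated branches.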

faultyserver · 2024-01-05T23:46:27Z

crates/biome_css_parser/src/syntax/property/border.rs

+const LINE_STYLE_TOKEN_SET: TokenSet<CssSyntaxKind> = token_set![
+    T![none],
+    T![hidden],
+    T![dotted],
+    T![dashed],
+    T![solid],
+    T![double],
+    T![groove],
+    T![ridge],
+    T![inset],
+    T![outset]
+];


Maybe we could do something in the css_kinds_src or in the codegen to group these keywords ahead of time and let this be p.cur().is_line_style_keyword() or something instead of having to rebuild token sets? Definitely will be useful if we keep these as keywords.

Sounds good.
We could try splitting them into groups, because right now we have to maintain the order inside the keywords array, which is fragile.
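
A grouped predicate of the kind suggested above might look roughly like this. It is a sketch under assumptions: `Kind` is a hypothetical stand-in for the generated syntax kind enum, and the method names follow the `is_line_style_keyword()` idea from the comment rather than any existing API.

```rust
/// Hypothetical subset of the syntax kinds, standing in for the
/// generated enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Ident,
    ThinKw,
    MediumKw,
    ThickKw,
    NoneKw,
    HiddenKw,
    DottedKw,
    DashedKw,
    SolidKw,
    DoubleKw,
    GrooveKw,
    RidgeKw,
    InsetKw,
    OutsetKw,
}

impl Kind {
    /// Group predicate that codegen could emit from a grouped keyword list,
    /// replacing a hand-built token set in the parser.
    const fn is_line_style_keyword(self) -> bool {
        matches!(
            self,
            Kind::NoneKw
                | Kind::HiddenKw
                | Kind::DottedKw
                | Kind::DashedKw
                | Kind::SolidKw
                | Kind::DoubleKw
                | Kind::GrooveKw
                | Kind::RidgeKw
                | Kind::InsetKw
                | Kind::OutsetKw
        )
    }

    /// Same idea for the `<line-width>` keywords.
    const fn is_line_width_keyword(self) -> bool {
        matches!(self, Kind::ThinKw | Kind::MediumKw | Kind::ThickKw)
    }
}
```

Because the predicates match on variants by name, they do not depend on the order in which the keywords are declared.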

codspeed-hq · 2024-01-06T00:08:34Z

CodSpeed Performance Report

Merging #1448 will degrade performances by 24.77%

Comparing `faulty/css-parse-border` (c6d84e9) with `main` (86688e4)

Summary

❌ 1 regressions
✅ 92 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

	Benchmark	`main`	`faulty/css-parse-border`	Change
❌	`big5-added.json[cached]`	2.3 ms	3 ms	-24.77%

denbezrukov · 2024-01-06T19:32:15Z

crates/biome_css_syntax/src/generated/nodes.rs

+pub enum AnyCssBorderPropertyValue {
+    CssBogusPropertyValue(CssBogusPropertyValue),
+    CssBorder(CssBorder),
+    CssUnknownPropertyValue(CssUnknownPropertyValue),
+    CssWideKeyword(CssWideKeyword),
+}


denbezrukov

It's a brilliant idea for handling random order in the AST!

feat(css_parser): Parse border exactly

ce10126

github-actions bot added the A-Parser (Area: parser), A-Formatter (Area: formatter), A-Tooling (Area: internal tools), and L-CSS (Language: CSS) labels Jan 5, 2024

faultyserver added 2 commits January 6, 2024 03:36

clean up clippy from codegen

5ee1f9c

fix clippy for slot_map codegen

c6d84e9

denbezrukov reviewed Jan 6, 2024

denbezrukov approved these changes Jan 6, 2024

faultyserver merged commit 166cbab into main Jan 7, 2024
19 of 20 checks passed

faultyserver deleted the faulty/css-parse-border branch January 7, 2024 03:46

Conaclos added the A-Changelog (Area: changelog) label Jan 7, 2024
