feat(css_parser): Parse border exactly #1448

Merged: 3 commits, Jan 7, 2024


faultyserver
Contributor

Summary

#268. This implements exact parsing for the border property, according to the spec: https://drafts.csswg.org/css-backgrounds/#propdef-border

 border =
     <line-width>  ||
     <line-style>  ||
     <color>       
 
 <line-width> =
     <length [0,∞]>  |
     thin            |
     medium          |
     thick           
 
 <line-style> =
     none    |
     hidden  |
     dotted  |
     dashed  |
     solid   |
     double  |
     groove  |
     ridge   |
     inset   |
     outset

(Well, almost exact: we don't validate the numeric range of the length yet, but we can at least assert that it is a length dimension.)
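To make the grammar above concrete, here is a minimal sketch in plain Rust that decides which `border` slot a single whitespace-separated component could fill. It also performs the `[0,∞]` range check that this PR defers. Everything in it is hypothetical: `classify_component`, the short `LENGTH_UNITS` list, and the hex-only color check are illustrative assumptions, and it works on strings rather than on Biome's tokens.

```rust
/// Which slot of `border` a single component can fill.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BorderComponent {
    LineWidth,
    LineStyle,
    Color,
}

// Keyword lists copied from the spec grammar.
const LINE_WIDTH_KEYWORDS: [&str; 3] = ["thin", "medium", "thick"];
const LINE_STYLE_KEYWORDS: [&str; 10] = [
    "none", "hidden", "dotted", "dashed", "solid", "double", "groove", "ridge", "inset", "outset",
];
// Assumption: only a handful of length units, for illustration.
const LENGTH_UNITS: [&str; 6] = ["px", "em", "rem", "pt", "cm", "mm"];

/// `<length [0,∞]>`: a non-negative number followed by a known length unit.
/// A bare `0` is also accepted, since CSS allows unitless zero lengths.
pub fn is_non_negative_length(token: &str) -> bool {
    if token == "0" {
        return true;
    }
    // Split at the first character that cannot belong to the number.
    let split = token
        .find(|c: char| !(c.is_ascii_digit() || c == '.' || c == '-' || c == '+'))
        .unwrap_or(token.len());
    let (number, unit) = token.split_at(split);
    let unit = unit.to_ascii_lowercase();
    match number.parse::<f64>() {
        // The range check: negative lengths are rejected here.
        Ok(value) => value >= 0.0 && LENGTH_UNITS.contains(&unit.as_str()),
        Err(_) => false,
    }
}

/// Hex colors only (`#rgb`, `#rgba`, `#rrggbb`, `#rrggbbaa`); named colors
/// and color functions are left out of this sketch.
pub fn is_hex_color(token: &str) -> bool {
    match token.strip_prefix('#') {
        Some(hex) => matches!(hex.len(), 3 | 4 | 6 | 8) && hex.chars().all(|c| c.is_ascii_hexdigit()),
        None => false,
    }
}

/// Returns the slot this component could fill, or `None` if it fits none.
pub fn classify_component(token: &str) -> Option<BorderComponent> {
    // CSS keywords are ASCII case-insensitive.
    let lower = token.to_ascii_lowercase();
    if LINE_WIDTH_KEYWORDS.contains(&lower.as_str()) || is_non_negative_length(token) {
        Some(BorderComponent::LineWidth)
    } else if LINE_STYLE_KEYWORDS.contains(&lower.as_str()) {
        Some(BorderComponent::LineStyle)
    } else if is_hex_color(token) {
        Some(BorderComponent::Color)
    } else {
        None
    }
}
```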

Implementing this made me realize how much code we are going to need to support parsing all of the CSS properties. I think a lot of that is just inevitable: there are hundreds of properties, and each can have 2, 3, 4, maybe even 8 sub-node types within its value definition. I'm not sure how, or even if, we'll be able to keep the number of node types reasonable, and nodes.rs and the other generated code are going to become massive lol. Still, it's pretty neat that we can parse and represent all of these things exactly with a CST. I think that's pretty special and unique.

The reason I implemented this first is to show off the unordered (||) syntax in the grammar! It worked out perfectly, but I did have to adjust the codegen a little to handle some errors it produced when the unordered node was nested inside an Any node and such.

Test Plan

Added a snapshot test with a variety of permutations for the values.
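For reference, the inputs such a snapshot needs to cover can be enumerated mechanically: a `||` group accepts every non-empty selection of its components, in any order, each at most once. The helper below is a hypothetical sketch for generating those value strings; it is not part of this PR, and the component values are just examples.

```rust
/// Every ordered, non-empty selection of `parts` without repeats:
/// exactly the inputs a `||` group must accept.
pub fn unordered_permutations(parts: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut used = vec![false; parts.len()];
    let mut current = Vec::new();
    collect_selections(parts, &mut used, &mut current, &mut out);
    out
}

fn collect_selections<'a>(
    parts: &[&'a str],
    used: &mut [bool],
    current: &mut Vec<&'a str>,
    out: &mut Vec<String>,
) {
    // Record the selection built so far (skipping the empty one).
    if !current.is_empty() {
        out.push(current.join(" "));
    }
    // Try appending each part that has not been used yet, then backtrack.
    for (i, &part) in parts.iter().enumerate() {
        if !used[i] {
            used[i] = true;
            current.push(part);
            collect_selections(parts, used, current, out);
            current.pop();
            used[i] = false;
        }
    }
}
```

With three components this yields 3 + 6 + 6 = 15 values; each could then be wrapped in a rule such as `a { border: <value>; }` to form a fixture.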

netlify bot commented Jan 5, 2024

Deploy Preview for biomejs canceled.

🔨 Latest commit c6d84e9
🔍 Latest deploy log https://app.netlify.com/sites/biomejs/deploys/6598cf26386bad000858f811

@github-actions bot added A-Parser Area: parser A-Formatter Area: formatter A-Tooling Area: internal tools L-CSS Language: CSS labels Jan 5, 2024
Contributor

github-actions bot commented Jan 5, 2024

Parser conformance results on

js/262

Test result  main count  This PR count  Difference
Total        49701       49701          0
Passed       48721       48721          0
Failed       980         980            0
Panics       0           0              0
Coverage     98.03%      98.03%         0.00%

jsx/babel

Test result  main count  This PR count  Difference
Total        40          40             0
Passed       37          37             0
Failed       3           3              0
Panics       0           0              0
Coverage     92.50%      92.50%         0.00%

symbols/microsoft

Test result  main count  This PR count  Difference
Total        6322        6322           0
Passed       2036        2036           0
Failed       4286        4286           0
Panics       0           0              0
Coverage     32.20%      32.20%         0.00%

ts/babel

Test result  main count  This PR count  Difference
Total        662         662            0
Passed       592         592            0
Failed       70          70             0
Panics       0           0              0
Coverage     89.43%      89.43%         0.00%

ts/microsoft

Test result  main count  This PR count  Difference
Total        17646       17646          0
Passed       13452       13452          0
Failed       4192        4192           0
Panics       2           2              0
Coverage     76.23%      76.23%         0.00%

Comment on lines +583 to +596
CssBorder =
    line_width: AnyCssLineWidth ||
    line_style: CssLineStyle ||
    color: CssColor

// TODO: This should be narrowed down to CssLength when we can.
AnyCssLineWidth =
    CssRegularDimension
    | CssLineWidthKeyword

CssLineWidthKeyword = keyword: ('thin' | 'medium' | 'thick')

CssLineStyle =
    keyword: ( 'none' | 'hidden' | 'dotted' | 'dashed' | 'solid' | 'double' | 'groove' | 'ridge' | 'inset' | 'outset')
Contributor Author

I think it would be great if we could just make this a generic "CssKeywordValue" and have the parser enforce what the value is. Otherwise we're going to end up with 100 of these Css*Keyword node types, and I don't think that's all that helpful.

I would rather be able to write this whole thing as:

CssBorder =
  width: CssLineWidth || ...
  
AnyCssLineWidth =
  CssRegularDimension
  | keyword: ('thin' | 'medium' | 'thick')

But our current codegen won't understand that; it just generates a regular node with 4 different keyword_token members lol. Thankfully that will mostly be an infrastructure change, and I think we can safely handle it in the future.
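The trade-off can be sketched outside the codegen: one generic keyword node that stores only the text, with the allowed set supplied by whichever parse function is running. `KeywordValue`, `parse_keyword`, and `LINE_WIDTH_KEYWORDS` are hypothetical names for illustration, not Biome's API.

```rust
/// A single generic keyword node: the tree stores only the text, and the
/// parser decides which keywords are legal in each position.
#[derive(Debug, Clone, PartialEq)]
pub struct KeywordValue {
    pub text: String,
}

/// Accepts `token` only if it is one of `allowed` (ASCII case-insensitive,
/// as CSS keywords are); otherwise returns a diagnostic message.
pub fn parse_keyword(token: &str, allowed: &[&str]) -> Result<KeywordValue, String> {
    if allowed.iter().any(|k| k.eq_ignore_ascii_case(token)) {
        Ok(KeywordValue { text: token.to_string() })
    } else {
        Err(format!("expected one of {}, found `{}`", allowed.join(", "), token))
    }
}

// Each property position passes its own list instead of having its own node type.
pub const LINE_WIDTH_KEYWORDS: &[&str] = &["thin", "medium", "thick"];
```

The cost of this design is that the node type no longer documents which keywords are valid; that knowledge moves into the parser (and any factory that wants to validate input).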

Comment on lines +95 to +107
"thin",
"medium",
"thick",
"none",
"hidden",
"dotted",
"dashed",
"solid",
"double",
"groove",
"ridge",
"inset",
"outset",
Contributor Author

This is just a single property, and there are plenty with even more keywords. I think it might be worth treating these as plain identifiers rather than expanding this list to something like 1000 keywords, but I'm not really sure. Both ways have benefits and drawbacks.

Contributor

I agree; I don't have a strong opinion about it.

I like having them in the ungram so I can see which keywords are allowed, and so a factory can assert when someone tries to use an invalid keyword. Another thing to consider: implementing a binary search in the lexer could offer a performance benefit, since it would let us compare enum variants instead of strings.
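A sketch of that lookup, assuming a hypothetical table sorted by keyword text that maps to an enum: after a single binary search in the lexer, later checks compare enum variants instead of strings. Only the `border` keywords are included, and `Keyword`, `KEYWORDS`, and `lookup_keyword` are illustrative names.

```rust
/// Hypothetical keyword enum covering only the `border` keywords.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Keyword {
    Dashed,
    Dotted,
    Double,
    Groove,
    Hidden,
    Inset,
    Medium,
    None,
    Outset,
    Ridge,
    Solid,
    Thick,
    Thin,
}

/// Must stay sorted by text, or the binary search gives wrong answers.
const KEYWORDS: &[(&str, Keyword)] = &[
    ("dashed", Keyword::Dashed),
    ("dotted", Keyword::Dotted),
    ("double", Keyword::Double),
    ("groove", Keyword::Groove),
    ("hidden", Keyword::Hidden),
    ("inset", Keyword::Inset),
    ("medium", Keyword::Medium),
    ("none", Keyword::None),
    ("outset", Keyword::Outset),
    ("ridge", Keyword::Ridge),
    ("solid", Keyword::Solid),
    ("thick", Keyword::Thick),
    ("thin", Keyword::Thin),
];

/// Maps an identifier to its keyword variant with a binary search.
/// CSS keywords are ASCII case-insensitive, so normalize first.
pub fn lookup_keyword(ident: &str) -> Option<Keyword> {
    let lower = ident.to_ascii_lowercase();
    KEYWORDS
        .binary_search_by_key(&lower.as_str(), |&(text, _)| text)
        .ok()
        .map(|index| KEYWORDS[index].1)
}
```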

Comment on lines +528 to +538
line_width: CssRegularDimension {
    value_token: … "10" [] [],
    unit_token: … "px" [] [],
},
line_style: CssLineStyle {
    keyword: … "groove" [] [Whitespace(" ")],
},
color: CssColor {
    hash_token: … "#" [] [],
    value_token: … "fff" [] [Whitespace(" ")],
},
Contributor Author

Look mom! The AST can track these in any order!

Contributor

Let's GOOOOO!

Comment on lines +1074 to +1082
2: …
  0: …
    0: … "groove" [] [Whitespace(" ")]
  1: …
    0: … "#" [] []
    1: … "fff" [] [Whitespace(" ")]
  2: …
    0: … "10" [] []
    1: … "px" [] []
Contributor Author

But the CST preserves the actual token order from the source!
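The two snapshots show the same node from two angles: the AST exposes each component by its role, while the CST keeps the children in the order they appeared in the source. A minimal sketch of that split, using a hypothetical `BorderNode` that stores (role, text) pairs in source order and looks them up by role. Biome's real generated nodes are built on the syntax tree itself, not on a `Vec` like this.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    LineWidth,
    LineStyle,
    Color,
}

/// Hypothetical node: children are kept in source order (like the CST),
/// while the accessors find them by role (like the AST fields).
#[derive(Debug, Default)]
pub struct BorderNode<'a> {
    children: Vec<(Role, &'a str)>,
}

impl<'a> BorderNode<'a> {
    /// Appends a child in the order the parser encountered it.
    pub fn push(&mut self, role: Role, text: &'a str) {
        self.children.push((role, text));
    }

    fn find(&self, role: Role) -> Option<&'a str> {
        self.children.iter().find(|(r, _)| *r == role).map(|&(_, text)| text)
    }

    pub fn line_width(&self) -> Option<&'a str> {
        self.find(Role::LineWidth)
    }

    pub fn line_style(&self) -> Option<&'a str> {
        self.find(Role::LineStyle)
    }

    pub fn color(&self) -> Option<&'a str> {
        self.find(Role::Color)
    }

    /// Rebuilds the value text by walking children in their stored order.
    pub fn source_text(&self) -> String {
        self.children.iter().map(|&(_, text)| text).collect::<Vec<_>>().join(" ")
    }
}
```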

Comment on lines +58 to +86
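// map[i] records whether slot i (line width, line style, color) is filled;
// `any` records whether at least one component was parsed.
let mut map = [false; 3];
let mut any = false;

// Each component may appear at most once, in any order.
loop {
    if !map[0] && is_at_line_width(p) {
        parse_any_line_width(p).ok();
        map[0] = true;
        any = true;
    } else if !map[1] && is_at_line_style(p) {
        parse_line_style(p).ok();
        map[1] = true;
        any = true;
    } else if !map[2] && is_at_color(p) {
        parse_color(p).ok();
        map[2] = true;
        any = true;
    } else {
        // No unfilled component starts at the current token: stop.
        break;
    }
}

// `||` requires at least one component to be present.
if !any {
    p.error(expect_one_of(
        &["line width", "line style", "color"],
        p.cur_range(),
    ));
    m.abandon(p);
    return Absent;
}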
Contributor Author

I'd really like to wrap this up into a nice macro. It could look like:

parse_unordered_some! {
  "line width" => is_at_line_width(p) => parse_any_line_width(p),
  "line style" => is_at_line_style(p) => parse_any_line_style(p),
  "color" => is_at_color(p) => parse_any_color(p)
};

It would generate exactly the code I've selected here. In the meantime, though, writing this loop by hand isn't terrible.

A parse_unordered_all variant would also assert that all of the possible branches are filled, to match the && combinator.
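Until such a macro exists, the same shape can be sketched as an ordinary function that takes a table of branches, each with a name, a lookahead, and a parse function. This is a hedged toy version over whitespace-split strings: `Cursor`, `Branch`, `Mode`, `parse_unordered`, and the deliberately crude lookaheads are all hypothetical, and `Mode::AllOf` stands in for the `parse_unordered_all` / `&&` variant.

```rust
/// Toy token cursor standing in for the parser `p`.
pub struct Cursor {
    tokens: Vec<String>,
    pos: usize,
}

impl Cursor {
    pub fn new(src: &str) -> Self {
        Cursor { tokens: src.split_whitespace().map(String::from).collect(), pos: 0 }
    }
    pub fn current(&self) -> Option<&str> {
        self.tokens.get(self.pos).map(String::as_str)
    }
    pub fn bump(&mut self) -> Option<String> {
        let token = self.tokens.get(self.pos).cloned();
        if token.is_some() {
            self.pos += 1;
        }
        token
    }
}

/// One branch: a name for diagnostics, a lookahead, and a parse function.
pub struct Branch {
    pub name: &'static str,
    pub is_at: fn(&Cursor) -> bool,
    pub parse: fn(&mut Cursor) -> String,
}

/// `||` needs at least one branch; `&&` needs all of them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    OneOrMore,
    AllOf,
}

/// Fills branches in any order, each at most once. Returns the slots
/// (indexed like `branches`) or the names of the missing branches.
pub fn parse_unordered(
    p: &mut Cursor,
    branches: &[Branch],
    mode: Mode,
) -> Result<Vec<Option<String>>, Vec<&'static str>> {
    let mut slots: Vec<Option<String>> = vec![None; branches.len()];
    'outer: loop {
        for (i, branch) in branches.iter().enumerate() {
            if slots[i].is_none() && (branch.is_at)(p) {
                slots[i] = Some((branch.parse)(p));
                continue 'outer;
            }
        }
        // No empty branch matches the current token: the group ends here.
        break;
    }
    let missing: Vec<&'static str> = branches
        .iter()
        .zip(&slots)
        .filter(|(_, slot)| slot.is_none())
        .map(|(branch, _)| branch.name)
        .collect();
    let satisfied = match mode {
        Mode::OneOrMore => missing.len() < branches.len(),
        Mode::AllOf => missing.is_empty(),
    };
    if satisfied { Ok(slots) } else { Err(missing) }
}

// Crude, hypothetical lookaheads for the demo.
fn is_at_line_width(p: &Cursor) -> bool {
    matches!(p.current(), Some(t) if t.ends_with("px") || ["thin", "medium", "thick"].contains(&t))
}
fn is_at_line_style(p: &Cursor) -> bool {
    matches!(p.current(), Some("none" | "solid" | "dashed" | "groove"))
}
fn is_at_color(p: &Cursor) -> bool {
    p.current().is_some_and(|t| t.starts_with('#'))
}
fn bump_token(p: &mut Cursor) -> String {
    p.bump().unwrap_or_default()
}

pub fn border_branches() -> [Branch; 3] {
    [
        Branch { name: "line width", is_at: is_at_line_width, parse: bump_token },
        Branch { name: "line style", is_at: is_at_line_style, parse: bump_token },
        Branch { name: "color", is_at: is_at_color, parse: bump_token },
    ]
}
```

Compared with a macro, a table of function pointers is easier to read and debug, at the cost of an indirect call per lookahead; a macro could expand to the same straight-line loop as the hand-written version.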

Contributor

Or we could implement a trait like the one we have for lists; I'm thinking about error recovery.

Contributor Author

A trait sounds good too! Especially for recovery. I'm definitely a little out of my current depth in that regard, so if you have any ideas I'd be very happy to hear and talk about them to see what can be the best approach.
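One way a trait could own both the loop and the recovery policy is sketched below. `UnorderedGroup` and `BorderGroup` are hypothetical and only loosely inspired by the idea of the list-parsing traits mentioned above (their actual API is not shown in this thread). The recovery policy here, which collects unexpected tokens as bogus and stops at `;` or `}`, is just one possible choice.

```rust
/// Hypothetical trait: implementors describe the branches, and the
/// provided method drives the unordered loop and the recovery.
pub trait UnorderedGroup {
    /// Number of branches in the group.
    fn branch_count(&self) -> usize;
    /// Does `token` start branch `index`?
    fn is_at(&self, index: usize, token: &str) -> bool;
    /// Tokens at which the group stops (e.g. the end of a declaration).
    fn is_recovery_stop(&self, token: &str) -> bool;

    /// Fills branches in any order, each at most once. Tokens that start no
    /// unfilled branch are collected as bogus instead of aborting the group.
    fn parse_group(&self, tokens: &[&str]) -> (Vec<Option<String>>, Vec<String>) {
        let mut slots: Vec<Option<String>> = vec![None; self.branch_count()];
        let mut bogus: Vec<String> = Vec::new();
        for &token in tokens {
            if self.is_recovery_stop(token) {
                break;
            }
            let hit = (0..self.branch_count()).find(|&i| slots[i].is_none() && self.is_at(i, token));
            match hit {
                Some(i) => slots[i] = Some(token.to_string()),
                None => bogus.push(token.to_string()),
            }
        }
        (slots, bogus)
    }
}

/// Toy `border` group with crude, illustrative lookaheads.
pub struct BorderGroup;

impl UnorderedGroup for BorderGroup {
    fn branch_count(&self) -> usize {
        3
    }
    fn is_at(&self, index: usize, token: &str) -> bool {
        match index {
            0 => token.ends_with("px") || matches!(token, "thin" | "medium" | "thick"),
            1 => matches!(token, "none" | "solid" | "dashed" | "groove"),
            2 => token.starts_with('#'),
            _ => false,
        }
    }
    fn is_recovery_stop(&self, token: &str) -> bool {
        token == ";" || token == "}"
    }
}
```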

Comment on lines +115 to +126
const LINE_STYLE_TOKEN_SET: TokenSet<CssSyntaxKind> = token_set![
    T![none],
    T![hidden],
    T![dotted],
    T![dashed],
    T![solid],
    T![double],
    T![groove],
    T![ridge],
    T![inset],
    T![outset]
];
Contributor Author

Maybe we could do something in css_kinds_src or in the codegen to group these keywords ahead of time, so this could be p.cur().is_line_style_keyword() or something instead of rebuilding token sets? That will definitely be useful if we keep these as keywords.

Contributor

Sounds good.
We could try splitting them into groups; right now we have to maintain the order inside the keywords array by hand, and that can be fragile.
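Here is a sketch of what generated keyword groups could look like, using a hypothetical `Kind` enum rather than Biome's real `CssSyntaxKind` or `css_kinds_src`. Membership written out with `matches!` survives reordering of the enum, while the range-based check shows why order-dependent groupings are fragile.

```rust
/// Hypothetical stand-in for a generated syntax-kind enum.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Ident,
    // line-width keywords
    ThinKw,
    MediumKw,
    ThickKw,
    // line-style keywords
    NoneKw,
    HiddenKw,
    DottedKw,
    DashedKw,
    SolidKw,
    DoubleKw,
    GrooveKw,
    RidgeKw,
    InsetKw,
    OutsetKw,
}

pub const ALL_KINDS: [Kind; 14] = [
    Kind::Ident, Kind::ThinKw, Kind::MediumKw, Kind::ThickKw, Kind::NoneKw,
    Kind::HiddenKw, Kind::DottedKw, Kind::DashedKw, Kind::SolidKw, Kind::DoubleKw,
    Kind::GrooveKw, Kind::RidgeKw, Kind::InsetKw, Kind::OutsetKw,
];

impl Kind {
    /// Group membership listed by name: reordering the enum cannot break it.
    pub fn is_line_width_keyword(self) -> bool {
        matches!(self, Kind::ThinKw | Kind::MediumKw | Kind::ThickKw)
    }

    pub fn is_line_style_keyword(self) -> bool {
        matches!(
            self,
            Kind::NoneKw | Kind::HiddenKw | Kind::DottedKw | Kind::DashedKw | Kind::SolidKw
                | Kind::DoubleKw | Kind::GrooveKw | Kind::RidgeKw | Kind::InsetKw | Kind::OutsetKw
        )
    }

    /// Order-dependent alternative: only correct while the line-style
    /// variants stay contiguous between `NoneKw` and `OutsetKw`.
    pub fn is_line_style_keyword_by_range(self) -> bool {
        (Kind::NoneKw as u8..=Kind::OutsetKw as u8).contains(&(self as u8))
    }
}
```

A generator that emits the `matches!` form from a grouped keyword list would give the cheap `p.cur().is_line_style_keyword()` call without depending on declaration order.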

codspeed-hq bot commented Jan 6, 2024

CodSpeed Performance Report

Merging #1448 will degrade performance by 24.77%

Comparing faulty/css-parse-border (c6d84e9) with main (86688e4)

Summary

❌ 1 regression
✅ 92 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                main    faulty/css-parse-border  Change
big5-added.json[cached]  2.3 ms  3 ms                     -24.77%

Comment on lines +5937 to +5942
pub enum AnyCssBorderPropertyValue {
    CssBogusPropertyValue(CssBogusPropertyValue),
    CssBorder(CssBorder),
    CssUnknownPropertyValue(CssUnknownPropertyValue),
    CssWideKeyword(CssWideKeyword),
}
Contributor

Magic :D

Contributor

@denbezrukov left a comment

It's a brilliant idea for handling random order in the AST!

@faultyserver merged commit 166cbab into main Jan 7, 2024
19 of 20 checks passed
@faultyserver deleted the faulty/css-parse-border branch January 7, 2024 03:46
@Conaclos added the A-Changelog Area: changelog label Jan 7, 2024
Labels
A-Changelog Area: changelog A-Formatter Area: formatter A-Parser Area: parser A-Tooling Area: internal tools L-CSS Language: CSS
3 participants