-
-
Notifications
You must be signed in to change notification settings - Fork 535
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(css_parser): Parse border
exactly
#1448
Conversation
✅ Deploy Preview for biomejs canceled.
|
Parser conformance results onjs/262
jsx/babel
symbols/microsoft
ts/babel
ts/microsoft
|
CssBorder = | ||
line_width: AnyCssLineWidth || | ||
line_style: CssLineStyle || | ||
color: CssColor | ||
|
||
// TODO: This should be narrowed down to CssLength when we can. | ||
AnyCssLineWidth = | ||
CssRegularDimension | ||
| CssLineWidthKeyword | ||
|
||
CssLineWidthKeyword = keyword: ('thin' | 'medium' | 'thick') | ||
|
||
CssLineStyle = | ||
keyword: ( 'none' | 'hidden' | 'dotted' | 'dashed' | 'solid' | 'double' | 'groove' | 'ridge' | 'inset' | 'outset') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be great if we could just make this a generic "CssKeywordValue" and have the parser enforce what the value is, otherwise we're going to end up with 100 of these Css*Keyword
node types and I think that's not really all that helpful.
I would rather be able to write this whole thing as:
CssBorder =
width: CssLineWidth || ...
AnyCssLineWidth =
CssRegularDimension
| keyword: ('thin' | 'medium' | 'thick')
But our current implementation of codegen won't understand that and just generates a regular node with 4 different keyword_token
members lol. Thankfully that will mostly be an infrastructure change and I think we can safely handle that in the future.
"thin", | ||
"medium", | ||
"thick", | ||
"none", | ||
"hidden", | ||
"dotted", | ||
"dashed", | ||
"solid", | ||
"double", | ||
"groove", | ||
"ridge", | ||
"inset", | ||
"outset", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a single property....and there are plenty with even more keywords. I think it might be worth just treating these as plain identifiers rather than expanding this list to like 1000 keywords, but I'm not really sure. Both ways have benefits and drawbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, I don't have a certain opinion about it.
I like having them in the ungram to see which keywords I can use, and to assert in a factory if one wants to use an invalid keyword. Another thing to consider is that implementing a binary search in the lexer could offer a performance benefit, as it would allow us to compare enum variants instead of strings.
line_width: CssRegularDimension { | ||
value_token: [email protected] "10" [] [], | ||
unit_token: [email protected] "px" [] [], | ||
}, | ||
line_style: CssLineStyle { | ||
keyword: [email protected] "groove" [] [Whitespace(" ")], | ||
}, | ||
color: CssColor { | ||
hash_token: [email protected] "#" [] [], | ||
value_token: [email protected] "fff" [] [Whitespace(" ")], | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look mom! The AST can track these in any order!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's GOOOOO!
2: [email protected] | ||
0: [email protected] | ||
0: [email protected] "groove" [] [Whitespace(" ")] | ||
1: [email protected] | ||
0: [email protected] "#" [] [] | ||
1: [email protected] "fff" [] [Whitespace(" ")] | ||
2: [email protected] | ||
0: [email protected] "10" [] [] | ||
1: [email protected] "px" [] [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the CST preserves the actual token order from the source!
let mut map = [false; 3]; | ||
let mut any = false; | ||
|
||
loop { | ||
if !map[0] && is_at_line_width(p) { | ||
parse_any_line_width(p).ok(); | ||
map[0] = true; | ||
any = true; | ||
} else if !map[1] && is_at_line_style(p) { | ||
parse_line_style(p).ok(); | ||
map[1] = true; | ||
any = true; | ||
} else if !map[2] && is_at_color(p) { | ||
parse_color(p).ok(); | ||
map[2] = true; | ||
any = true; | ||
} else { | ||
break; | ||
} | ||
} | ||
|
||
if !any { | ||
p.error(expect_one_of( | ||
&["line width", "line style", "color"], | ||
p.cur_range(), | ||
)); | ||
m.abandon(p); | ||
return Absent; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd really like to wrap this up into a nice macro. it could look like:
parse_unordered_some! {
"line width" => is_at_line_width(p) => parse_any_line_width(p),
"line style" => is_at_line_style(p) => parse_any_line_style(p),
"color" => is_at_color(p) => parse_any_color(p)
};
And would generate the exact code I've selected here. But for the meantime, rewriting this loop isn't terrible.
parse_unordered_all
would also assert that all of the possible branches are filled, to match the &&
combinator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can implement a trait like we have for lists, I'm thinking about an error recovery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A trait sounds good too! Especially for recovery. I'm definitely a little out of my current depth in that regard, so if you have any ideas I'd be very happy to hear and talk about them to see what can be the best approach.
const LINE_STYLE_TOKEN_SET: TokenSet<CssSyntaxKind> = token_set![ | ||
T![none], | ||
T![hidden], | ||
T![dotted], | ||
T![dashed], | ||
T![solid], | ||
T![double], | ||
T![groove], | ||
T![ridge], | ||
T![inset], | ||
T![outset] | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we could do something in the css_kinds_src
or in the codegen to group these keywords ahead of time and let this be p.cur().is_line_style_keyword()
or something instead of having to rebuild token sets? Definitely will be useful if we keep these as keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds good.
We can try to split them into groups, because now we have to maintain the order inside the keywords array, and it's fragile sometimes.
CodSpeed Performance ReportMerging #1448 will degrade performances by 24.77%Comparing Summary
Benchmarks breakdown
|
pub enum AnyCssBorderPropertyValue { | ||
CssBogusPropertyValue(CssBogusPropertyValue), | ||
CssBorder(CssBorder), | ||
CssUnknownPropertyValue(CssUnknownPropertyValue), | ||
CssWideKeyword(CssWideKeyword), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Magic :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a brilliant idea for handling random order in the AST!
Summary
#268. This implements exact parsing for the
border
property, according to the spec: https://drafts.csswg.org/css-backgrounds/#propdef-border(well, almost exact. We don't validate the numeric range of
length
yet, but we can at least assert that it is a Length Dimension).Implementing this makes me realize that there is going to be a massive amount of code needed to support parsing all of the CSS properties. I think a lot of that is just inevitable. There are hundreds, and each can have 2, 3, 4, maybe even 8 sub-node types within the value definition. I'm not sure how or even if we'll be able to keep the number of node types reasonable, and the
nodes.rs
file and other code generated stuff is going to become massive lol, but it is pretty neat that we can parse and represent all of these things exactly with a CST. I think that's pretty special and unique now.The reason I implemented this first is to show off the unordered syntax in the grammar! It worked out perfectly, but I did have to adjust the codegen a little bit to handle some errors that it had when nested inside and
Any
and such.Test Plan
Added a snapshot test with a variety of permutations for the values.