-
Notifications
You must be signed in to change notification settings - Fork 656
Conversation
Deploying with
|
Latest commit: |
9a96362
|
Status: | ✅ Deploy successful! |
Preview URL: | https://6b426c46.tools-8rn.pages.dev |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. Overall it looks good! Great work! :)
Re: the |
declarators: JsVariableDeclarator* | ||
|
||
JsVariableDeclarator = | ||
id: JsBinding '=' init: JsExpression? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, we should change declarators
to (JsVariableDeclarator (',' JsVariableDeclarator)*)
(maybe with trailing)?
|
||
JsTemplateElement = JsStringTemplateElement | JsExpression | ||
|
||
JsStringTemplateElement = value: 'js_string_template_element' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created JsStringTemplateElement
because unions should always be over nodes and not tokens.
I created a js_string_template_element
because it isn't a quoted string as it is the case for js_string
crates/rome-js-grammar/src/rfc.md
Outdated
Having fine granular nodes has the advantage that the API allows querying for a very specific node when, for example, using `node.descendants::<PreIncrementExpression>()`. | ||
|
||
Having more coarse-grained nodes on the other hand has the advantage that it's easier to implement common behaviour that, for example, applies to all Binary expressions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get the best of both worlds by having additional specific AST nodes that allow you to refine a coarse node? The specific nodes would need additional checks on the can_cast
method (that would probably be hand-written in many cases). And we'd implement From<SpecificNode> for CoarseNode
to make them easier to use.
Alternately, we could add granularity through utility functions: node.descendants().filter_map(UnaryExpression::is_pre_increment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and I think this is overlapping with what @ematipico proposed here. For example, we could define a BinaryExpression
union that contains the types for each specific operator. You can then call BinaryExpression::cast(specific_expr.syntax())
to get a BinaryExpression
out of it or just do specific_expr.into()
in a place where a BinaryExpression
is expected.
The only downside is that these fine-grained unions will expand, for example, into the Expr
union. But that might be less of a problem because these matches will not be exhaustive for most use cases?
|
||
|
||
JsIdentifier = | ||
name: 'js_identifier' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be named like JsStatic*Name
or just get folded into JsStaticMemberName
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make sense to unify them (I also see it as an option to use JsIdentifier
in place of JsStatic*Name
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kinda worry about using the more generic sounding JsIdentifier
because the other types have been split out:
JsBindingIdentifier
JsIdentifierAssignmentTarget
JsIdentifier
JsReferenceIdentifierExpression
A user might match on JsIdentifier
thinking it will return all kinds of identifiers, especially since that is the case in Babel/etc.
I think we should also take a moment to consider how some of these nodes relate to the TypeScript and JSX grammars. For example there are a number of different types of identifiers that we have already:
TypeScript has parallels for almost all of these (except the type TsBindingIdentifier<
TsBindingIdentifier extends TsIdentifierReferenceExpression = TsIdentifierReferenceExpression
> = {
TsStaticMemberName: TsIdentifierReferenceExpression.TsStaticMemberName
} JSX also has a few: <>
{/* member expression */}
<jsxIdentifier.jsxIdentifier jsxIdentifier/>
<JsxReferenceIdentifier.jsxIdentifier/>
{/* namespaced name */}
<JsxReferenceIdentifier:jsxIdentifier jsxIdentifier:jsxIdentifier/>
{/* tsx */}
<JsxReferenceIdentifier<TsReferenceIdentifier> />
</> Initially I was leaning towards saying that we should reuse grammar names when possible (i.e. No separate For the people who do want to work with identifiers in the abstract, we could have separate tools for doing so: // just as an example, each identifier-like node could implement a shared trait
trait Identifier { fn name() -> String }
impl Identifier for JsBindingIdentifier { .. }
impl Identifier for JsIdentifierAssignmentTarget { .. }
// or belong to some enum
enum AnyIdentifier { .. }
// etc...
if node.is_identifier() { .. } // check if a node is some identifier-like node
any_identifier.name(); // get the name of some identifier-like node We should come up with general categories for these in that case:
Maybe even further into the grammar:
If we do want to go further into the grammar, we might want to consider splitting some nodes like |
crates/rome-js-grammar/src/rfc.md
Outdated
|
||
## Ungrammar extension proposals | ||
|
||
Add support for `///` comments that we can use to document nodes and tokens. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSDoc and fam use /** ... */
fairly universally. Did you want to parse inside these comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have explained a bit more.
My understanding is that ungrammar doesn't provide the means for us today to generate documentation for the AST nodes and fields. Meaning, that our AST facade is undocumented.
What I was imagining is that we could write that documentation in the grammar. Having docs right above the grammar definition is not only useful for users of our facade but also if you try to figure out what code a node represents.
/// Some documentation that the code gen will add to the generated `AnyExpression` type
type AnyExpression = ...
type JsArrayExpression =
'['
/// The elements of the array. The elements are separated by a comma. (or something more useful)
elements: (AnyJsArrayElemet (',' AnyJsArrayElement)* ','?)
generates
/// Some documentation that the code gen will add to the generated `AnyExpression` type
pub enum AnyExpression {... }
impl JsArrayExpression {
/// The elements of the array. The elements are separated by a comma. (or something more useful)
pub fn elements(&self) -> AstSeparatedList<AnyJsArrayElement> {}
}
I haven't spent much time thinking about that yet (maybe I should have). My first question would be if we feel comfortable moving forward with a JS only grammar for now and figure out some of JS's dialects after (or while implementing the JS grammar). My motivation for delaying the decision is that implementing the JS grammar will give us a better understanding of what worked well and what doesn't. I'm very much undecided right now on which approach to take and would need to look closer into how the grammars differ but I'm probably more leaning toward different nodes if they differ significantly. My main thoughts from a grammar perspective are:
I'm leaning towards an enum here because we already have the infrastructure in place to define it inside of our grammar and testing if something is an instance of it ``AnyJsIdentifier::cast(node.syntax())`
I like this categorisation and having these "abstract" types will simplify writing analysis a lot. Roslyn has the concept of
I actually liked Rome's approach of computed/static a bit better because it gets away with fewer node types but it certainly makes sense to split it if it helps to categorise the node types. |
23e0a27
to
7f3ccfb
Compare
9c84e7d
to
a1a8f04
Compare
Moved the files to the RFC folder. Updated the JS grammar with the proposed changes. |
Renames and restructures the following statements as we agreed to in our new JS grammar (#1719): * BlockStatement: Rename node and rename `stmts` child to `statements` * DebuggerStatement: Rename node * EmptyStatement: Rename node * ExpressionStatement: Rename node, `expr` child to `expression`, and add optional trailing semicolon * ReturnStatement: Rename node, and rename 'value' child to 'argument' * LabeledStatement: Rename node, change type of `label` child to a token, and rename `stmt` child to `body` * WithStatement: * Rename node * inline "condition" (it's not a condition) * rename `condition` to `object` * rename `cons` to `body` * Add parser test
…1774) Changes the js grammar to match our new AST facade as defined in #1725 (and proposed in #1719) * `Continue` Statement * Rename from `ContinueStmt` to `JsContinueStatement` * Inline the label identifier (It's not a normal "variable" reference but a label reference) * `Break` Statement * Rename from `BreakStmt` to `JsBreakStatement` * Inline the label identifier (It's not a normal "variable" reference but a label reference) * `While` Statement * Rename from `WhileStmt` to `JsWhileStatement` * Rename `cons` to `body` * Inline the condition and rename `condition` to `test` * `DoWhileStatement` * Rename to `JsDoWhileStatement` * Inline condition * rename `cons` to `body` * rename `condition` to `test`
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
Changes the js grammar of the variable declaration to match our new AST facade as defined in #1725 (and proposed in #1719) * Split `VarDecl` into `JsVariableDeclarationStatement` and `JsVariableDeclaration` to correctly account for variable declarations inside for loops (that aren't statements). * Change the parser to emit a `let` token instead of an ident if let is used inside a variable declaration (let is a valid identifier which is why the lexer returns an `ident` token) * Rename `Declarator` to `JsVariableDeclarator` * Split out the `init` into a `JsEqualValueClause` that holds the `=` token and the initializer expression.
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
Refines the AST Facade for literals to match the grammar proposed in #1719. This change is part of #1725. * Introduce new `JsAnyLiteral` * Split `Literal` into `JsStringLiteral`, `JsBooleanLiteral`, `JsNullLiteral`, `JsRegexLiteral`, and `JsBigIntLiteral`. This allows to implement custom methods on the corresponding literal nodes. * Renames the `number` and kinds to `JS_NUMBER_LITERAL_TOKEN` `JS_STRING_LITERAL_TOKEN` to avoid conflicts with the TS `number` keyword (and keep symmetry). * Removed some unused keywords and special handling inside of the code gen.
…rrow function expression AST tree structures (#1786) Updates the facade for function declarations and function expression to match the grammar defined in #1719. * Function declaration * Rename from `FnDecl` to `JsFunctionDeclaration` * Rename `ParameterList` to `JsParameterList` * Introduce new `JsIdentifierBinding` type for `id` * Parses `...rest` as `JS_REST_PARAMETER` instead of `JS_REST_PATTERN`. * Split out `TsReturnType` for the `:` and the return type * Function expression * Rename from `FnExpr` to `JsFunctionExpression` * Otherwise the same as for declaration * Arrow function expression * Rename from `ArrowExpr` to `JsArrowFunctionExpression` * Rename `ExprOrBlock` to `JsAnyArrowFunctionBody` * Otherwise the same as for declaration. This PR cleans up some of the function parsing (extracts helpers rather than using boolean params). This improves error reporting for the `func_decl` use case if the name is missing. This PR also removes the support for decorators. The parser had some support for decorators but they haven't been part of the grammar. We should re-add them when we decide to properly support them (and add tests)
Summary
RFC for a re-structured AST Facade (see
/grammar.ungram
file). The main motivations for restructuring the grammar are (#1718 goes into more details of why):Err
if a mandatory element is missing.else
token and thealternative
branch of anif
statement must both either be present or absent. Therefore, add an optionalElseClause
to theif
statement that has the mandatoryelse
token andalternative
branch children` (if one is present, the other must as well).UpdatExpression
(x++
or++x
) must be modelled as two distinct nodes to guarantee that the operator and identifier are always in the same order for each node type (PreUpdateExpression
,PostUpdateExpression
)The grammar does not yet cover typescript. This is my first time trying to design an AST. I'm certain there are some missing children, cases I didn't think of and some confusing names.
Scope
crates/rome-js-grammar/grammar.ungram
defines the JS grammarcrates/rome-js-grammar/rfc.md
lists open questions, proposed decisions, code gen extensions etc.Test Plan
My imagination... There's no parser generating the defined AST today. I sketched some trees by hand but that's all. Changing the parser will bring up any cases that I missed.