Improve C# grammar generator. #73340
s_normalizationRegex.Replace(name.EndsWith("Syntax") ? name[..^"Syntax".Length] : name, "_").ToLower(), | ||
ImmutableArray.Create(name)); | ||
|
||
// Converts a PascalCased name into snake_cased name. | ||
private static readonly Regex s_normalizationRegex = new Regex( | ||
"(?<=[A-Z])(?=[A-Z][a-z]) | (?<=[^A-Z])(?=[A-Z]) | (?<=[A-Za-z])(?=[^A-Za-z])", | ||
"(?<=[A-Z])(?=[A-Z][a-z0-9]) | (?<=[^A-Z])(?=[A-Z]) | (?<=[A-Za-z0-9])(?=[^A-Za-z0-9])", |
ensures that utf8 stays together as a single word, rather than being split into utf_8.
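To make the effect of the regex change concrete, here is a minimal sketch that runs both patterns from the diff over the same rule name. The `ToSnakeCase` helper is a hypothetical wrapper around the `Replace(...).ToLower()` call shown above, and `RegexOptions.IgnorePatternWhitespace` is an assumption: the constructor arguments are not visible in the diff, but the patterns contain spaces around `|`, so some such option is presumably in play.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumption: IgnorePatternWhitespace is used, since both patterns have spaces around '|'.
var oldRegex = new Regex(
    "(?<=[A-Z])(?=[A-Z][a-z]) | (?<=[^A-Z])(?=[A-Z]) | (?<=[A-Za-z])(?=[^A-Za-z])",
    RegexOptions.IgnorePatternWhitespace);
var newRegex = new Regex(
    "(?<=[A-Z])(?=[A-Z][a-z0-9]) | (?<=[^A-Z])(?=[A-Z]) | (?<=[A-Za-z0-9])(?=[^A-Za-z0-9])",
    RegexOptions.IgnorePatternWhitespace);

// Hypothetical helper mirroring the expression in the diff: strip a trailing
// "Syntax", insert '_' at every zero-width word boundary, then lowercase.
string ToSnakeCase(Regex regex, string name)
    => regex.Replace(name.EndsWith("Syntax") ? name[..^"Syntax".Length] : name, "_").ToLower();

Console.WriteLine(ToSnakeCase(oldRegex, "Utf8StringLiteralToken")); // utf_8_string_literal_token
Console.WriteLine(ToSnakeCase(newRegex, "Utf8StringLiteralToken")); // utf8_string_literal_token
```

The only difference is that digits are now treated like letters, so the boundary between `f` and `8` no longer matches.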
|
||
var seen = new HashSet<string>(); | ||
|
||
// Define a few major sections to help keep the grammar file naturally grouped. | ||
var majorRules = ImmutableArray.Create( | ||
"CompilationUnitSyntax", "MemberDeclarationSyntax", "TypeSyntax", "StatementSyntax", "ExpressionSyntax", "XmlNodeSyntax", "StructuredTriviaSyntax"); | ||
"CompilationUnitSyntax", "MemberDeclarationSyntax", "TypeSyntax", "StatementSyntax", "ExpressionSyntax", "XmlNodeSyntax", "StructuredTriviaSyntax", "Utf8Suffix"); |
ensures we write out all the utf8 rules first before printing out the utf8 suffix.
rules.Add("Utf8StringLiteralToken", [Join(" ", [RuleReference("StringLiteralToken"), RuleReference("Utf8Suffix")])]); | ||
rules.Add("Utf8MultiLineRawStringLiteralToken", [Join(" ", [RuleReference("MultiLineRawStringLiteralToken"), RuleReference("Utf8Suffix")])]); | ||
rules.Add("Utf8SingleLineRawStringLiteralToken", [Join(" ", [RuleReference("SingleLineRawStringLiteralToken"), RuleReference("Utf8Suffix")])]); | ||
rules.Add("Utf8Suffix", [new("'u8'"), new("'U8'")]); |
adds a few pseudo rules so that the generated grammar has fewer 'see lexical specification' productions.
@333fred ptal :-) |
| 'false' | ||
| 'null' | ||
| 'true' | ||
| '__arglist' |
caused by move to case insensitive sorting.
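For anyone wondering why `'__arglist'` moved to the end of the list, here is a small sketch of the ordering difference. It assumes the generator switched from an ordinal comparison to something like `StringComparer.OrdinalIgnoreCase`; the exact comparer used in the PR is not shown in the diff, and the keyword list below is just the four entries visible in this hunk.

```csharp
using System;
using System.Linq;

// The four keywords from the hunk above, without their quotes.
var keywords = new[] { "true", "__arglist", "null", "false" };

// Ordinal: '_' (0x5F) sorts before lowercase letters (0x61 and up).
var ordinal = keywords.OrderBy(k => k, StringComparer.Ordinal).ToArray();

// OrdinalIgnoreCase (assumed comparer): letters are compared as uppercase
// (0x41-0x5A), so '_' now sorts after them.
var ignoreCase = keywords.OrderBy(k => k, StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine(string.Join(", ", ordinal));    // __arglist, false, null, true
Console.WriteLine(string.Join(", ", ignoreCase)); // false, null, true, __arglist
```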
| utf_8_string_literal_token | ||
| utf8_multi_line_raw_string_literal_token | ||
| utf8_single_line_raw_string_literal_token | ||
| utf8_string_literal_token |
tweaked so that numbers don't start a new 'word' in snake_casing.
base_argument_list | ||
: argument_list | ||
| bracketed_argument_list | ||
syntax_token |
fleshed out this construct.
IEnumerable<Production> repeat(Production production, int count) | ||
=> Enumerable.Repeat(production, count); | ||
|
||
IEnumerable<Production> anyCasing(string value) |
a helper to produce all casing variations of an initial piece of text. e.g. anyCasing("ul") is what generates UL/Ul/uL/ul. Useful for not having to specify a bunch of casing changes in several token places.
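A rough sketch of how such a helper could work, for illustration only. The body of `anyCasing` is not shown in the diff, so the `AnyCasing` function below is a hypothetical stand-in that returns plain strings instead of `Production` values and enumerates the variations with a bitmask.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for anyCasing: yields every upper/lower combination
// of the input, starting with all-uppercase and ending with all-lowercase.
IEnumerable<string> AnyCasing(string value)
{
    var n = value.Length;
    for (var mask = 0; mask < (1 << n); mask++)
    {
        var chars = new char[n];
        for (var i = 0; i < n; i++)
        {
            // The highest bit of the mask controls the first character.
            var lower = (mask & (1 << (n - 1 - i))) != 0;
            chars[i] = lower ? char.ToLowerInvariant(value[i]) : char.ToUpperInvariant(value[i]);
        }
        yield return new string(chars);
    }
}

Console.WriteLine(string.Join("/", AnyCasing("ul"))); // UL/Ul/uL/ul
```

Note that this sketch emits duplicates if the input contains characters without a case distinction (digits, for example), so it is only suitable for purely alphabetic suffixes.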
rules.Add("SingleCharacter", [new("""/* ~['\\\u000D\u000A\u0085\u2028\u2029] anything but ', \\, and new_line_character */""")]); | ||
} | ||
|
||
IEnumerable<Production> productionRange(char start, char end) |
so you can say productionRange('a', 'f') instead of having to spell it out by hand.
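As an illustration of the idea, a minimal sketch of a range helper follows. The real `productionRange` returns `Production` values and its body is not shown in the diff; the `ProductionRange` function here is a hypothetical version that yields quoted character literals as strings, in the same style as the `'u8'` literals above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for productionRange: one quoted literal per
// character in the inclusive range [start, end].
IEnumerable<string> ProductionRange(char start, char end)
{
    for (var c = start; c <= end; c++)
        yield return $"'{c}'";
}

Console.WriteLine(string.Join(" | ", ProductionRange('a', 'f')));
// 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
```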
Gentle poke @333fred . This is just a qol update for me :) |
Fleshes out several rules that previously would say "/* see lexical specification */".
Added rules for many token types (identifiers, keywords, modifiers, operators, punctuation, numerics, strings).