Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JavaScript] Improved identifier and number lexing. #1427

Merged
merged 3 commits into from
Mar 14, 2018

Conversation

Thom1729
Copy link
Collaborator

Implemented the identifier rules from the spec.

The syntax previously used \b in a lot of rules. This is not quite right, and it leads to some bugs:

function$NotAFunction

Where \b followed identifier characters, I replaced it with a negative lookahead. (Testing showed that there was no effect on performance.) Where \b preceded identifier characters, we would need a lookbehind, which the fast regexp engine does not support. However, we don't actually need assertions at the beginning of matches if we maintain the invariant that the "cursor" should never pause with identifier characters on both sides. This is guaranteed if every match that ends with an identifier character ends with the negative lookahead.

Implementing this correctly required changing the implementation of numeric literals. I added more tests and improved the behavior of numbers in some corner cases.

@michaelblyons
Copy link
Collaborator

Ruby may need this sort of thing, too.

@wbond wbond merged commit a1320a4 into sublimehq:master Mar 14, 2018
@Thom1729 Thom1729 deleted the javascript-exact-identifiers branch March 14, 2018 21:19
identifier: '{{identifier_start}}{{identifier_part}}*{{identifier_break}}'
constant_identifier: '[[:upper:]]{{identifier_part}}*{{identifier_break}}'
dollar_only_identifier: '\${{identifier_break}}'
dollar_identifier: '(\$){{identifier_part}}*{{identifier_break}}+'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Thom1729 This dollar_identifier regular expression seems to cause errors when the JavaScript language definition is used within syntect.

In particular, it uses + on {{identifier_break}}, which is a negative-lookahead group: (?!...)+. Is this valid regex syntax? Could/should the + be removed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid but redundant. I don't know why syntect doesn't handle this, but if you open up a PR to remove the + I'll endorse it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for anyone arriving here, @sharkdp raised a PR: #1525)

deathaxe pushed a commit to deathaxe/sublime-packages that referenced this pull request Jun 9, 2019
…tifiers

[JavaScript] Improved identifier and number lexing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants