Support new Unicode properties in regexps for syntax definitions #2193

Closed
Thom1729 opened this issue Feb 15, 2018 · 5 comments

Comments

Thom1729 commented Feb 15, 2018

Several languages, including Python, JavaScript, and Rust, define their identifier syntax using new Unicode properties (ID_Start, ID_Continue, XID_Start, and XID_Continue). Ideally, these languages' Sublime syntax definitions should refer to those Unicode properties to ensure correctness. Emulating those properties manually is not easy and is likely to be a source of syntax bugs. My JavaScript PR is not truly comprehensive, because it does not consider the properties Other_ID_Start, Pattern_Syntax, or Pattern_White_Space.
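To illustrate why manual emulation is error-prone, here is a minimal Python sketch that approximates ID_Start and ID_Continue from general categories alone. The helper names (`approx_id_start`, `approx_id_continue`, `approx_identifier`) are hypothetical and not part of any syntax definition. The sketch deliberately ignores Other_ID_Start, Pattern_Syntax, and Pattern_White_Space, as well as language-specific additions such as `_` as a start character, so it should be read as an approximation with known gaps rather than a correct implementation.

```python
import unicodedata

# General categories that the ID_Start derivation begins with.
START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
# ID_Continue additionally admits marks, decimal digits, and connector punctuation.
CONTINUE_CATEGORIES = START_CATEGORIES | {"Mn", "Mc", "Nd", "Pc"}


def approx_id_start(ch: str) -> bool:
    """Category-only approximation of ID_Start (ignores Other_ID_Start)."""
    return unicodedata.category(ch) in START_CATEGORIES


def approx_id_continue(ch: str) -> bool:
    """Category-only approximation of ID_Continue (ignores Other_ID_Continue)."""
    return unicodedata.category(ch) in CONTINUE_CATEGORIES


def approx_identifier(text: str) -> bool:
    """True if text looks like an identifier under the approximation above."""
    return (
        bool(text)
        and approx_id_start(text[0])
        and all(approx_id_continue(ch) for ch in text[1:])
    )


# U+2118 SCRIPT CAPITAL P is a symbol by category but is listed in Other_ID_Start.
WEIERSTRASS_P = "\u2118"

print(approx_identifier("na\u00efve"))   # ordinary letters
print(approx_identifier("2x"))           # starts with a digit
print(approx_id_start(WEIERSTRASS_P))    # category-only check
print(WEIERSTRASS_P.isidentifier())      # Python's own XID-based check
```

The last two lines disagree: the category-only check rejects U+2118, while Python's `str.isidentifier()`, which is based on XID_Start and XID_Continue, accepts it. Gaps like this are what a hand-written character class in a syntax definition has to patch one by one.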

Sublime currently uses Oniguruma 6.9.1, which does implement these properties. However, Sublime's custom regexp engine does not implement them. As a result, any syntax rule that uses one of these properties is matched by the Oniguruma engine instead of the custom one. For something as central as parsing identifiers, this would likely cause an unacceptable performance regression.

@Thom1729
Author

Implemented in build 3186.

@wbond wbond added this to the Build 3186 milestone Feb 10, 2019
@wbond
Member

wbond commented Feb 10, 2019

Just a note, since I don’t think the change log included it: we now use Oniguruma 6.9.1.

@Thom1729 Thom1729 reopened this Mar 11, 2019
@Thom1729
Author

Reopening this issue because, as I've discovered, Sublime's internal engine does not support these properties. Using them in a regexp causes that regexp to be handled by the Oniguruma engine instead. For something as central as parsing identifiers, this would probably be an unacceptable performance regression for most syntaxes.

Now that Sublime's Oniguruma version implements these properties, I hope that it may make sense to add them to the internal engine.

@FichteFoll
Collaborator

FichteFoll commented Mar 11, 2019

For issue tracking purposes, I believe it makes more sense to address this in a new issue.

@Thom1729
Author

Makes sense, especially since this one's assigned to a milestone.
