consider implementing the means of abstraction used by Perl5/PCRE regexes #962
Comments
I think this pushes regexes, at least as they are currently conceived, too far. I rarely if ever see this sort of advanced syntax used in the wild, and in my experience it isn't well supported outside of Perl and PCRE. If you get to the point where your regex is so complicated as to benefit from this, then I'd rather see you use your programming language features to assemble the regex. (In Rust, that might be
That’s a shame. The whole point of adding advanced features is to promote them to a new audience outside of Perl and Raku (and Python and Ruby and…) who might not otherwise have access to them.
True, as presented here it could be done purely with Of course, I would also like to see some regex crates that don’t embed the regexes in strings at all. Something akin to Emacs’ rx syntax would be really nice to have as well. Of course that is completely orthogonal to the question of which features the crate has.
I don't think using Basically what this means is that there has to be some "reasonableness" threshold at which point we say "no there are differences here and we aren't going to resolve them." I think this sort of complex syntax falls well beyond that reasonableness threshold for me personally.
I was looking through this repo to see whether anyone had translated from emacs regexp syntax using regex-syntax yet (my eventual goal is to produce an emacs native module). I'm pretty sure I could do that by transpiling the emacs regexp pattern string to a As mentioned by @db48x in #962 (comment), emacs provides a high-level At first glance I think e.g. @BurntSushi is that how you would suggest architecting this if I wanted to interface other regexp interfaces to EDIT: started this in https://github.com/cosmicexplorer/emacs-regexp! |
Thanks for the question! Although I'd love for these kinds of things to be new Discussion questions so that they can be curated and made discoverable for others.
Maybe? I've never done the exercise before, so I'm not sure. It's plausible the right target is Either target is reasonable IMO. With that said, if you need to deal with Unicode, then targeting the |
Done! #1167
Perl and PCRE provide a way to give names to parts of a regex and then reuse those parts by name, much as one defines functions in Rust and then calls them by name.
An example:
I’m not very familiar with the implementation details, but I believe that most of the changes needed to implement this would be in regex-syntax, plus the part of the compiler that builds the Hir. The compiler would replace each call by the definition and then compile that. The resulting Hir would then be identical to one where the named expressions were written out by hand.
Given the design goals of this crate, I would recommend keeping the language regular by returning an error when a recursive call is detected. This can be done during compilation by simply keeping a stack of calls and returning an error if a call is made to a group that is already in the stack.
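To illustrate the stack-based check described above, here is a minimal sketch that works on pattern strings and not on the crate's real AST or Hir. Everything in it is an assumption made for illustration: the `expand` function, the map of definitions, and the choice of Perl's `(?&name)` call syntax are hypothetical and are not part of regex-syntax. It also ignores escaping, so a literal `\(?&` would be misread as a call.

```rust
use std::collections::HashMap;

/// Hypothetical helper: replaces every `(?&name)` call in `pattern` with the
/// body of the named definition, recursively. `stack` holds the names of the
/// definitions currently being expanded, so a call to a name that is already
/// on the stack is reported as an error and is never expanded.
fn expand(
    pattern: &str,
    defs: &HashMap<String, String>,
    stack: &mut Vec<String>,
) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("(?&") {
        // Copy everything before the call unchanged.
        out.push_str(&rest[..start]);
        let after = &rest[start + 3..];
        let end = after
            .find(')')
            .ok_or_else(|| "unterminated call".to_string())?;
        let name = &after[..end];
        // Recursion check: the callee must not already be on the stack.
        if stack.iter().any(|n| n == name) {
            return Err(format!("recursive call to `{}`", name));
        }
        let body = defs
            .get(name)
            .ok_or_else(|| format!("undefined group `{}`", name))?;
        stack.push(name.to_string());
        let expanded = expand(body, defs, stack)?;
        stack.pop();
        // Wrap the body in a non-capturing group so that a quantifier
        // following the call applies to the whole body.
        out.push_str("(?:");
        out.push_str(&expanded);
        out.push(')');
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With `digit` defined as `[0-9]` and `octet` defined as `(?&digit){1,3}`, expanding `^(?&octet)$` yields `^(?:(?:[0-9]){1,3})$`, which is an ordinary regular pattern. A pair of definitions that call each other is rejected with an error and the expansion stops.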
However, it would still be desirable to keep the list of definitions around even after the regex is fully compiled. This would allow them to be reused while building new regexes. Suppose one had a regex containing definitions, called a grammar:
This grammar could be reused to create multiple new regexes by substitution:
Obviously, one could imagine additional APIs beyond simple textual substitution, but that is a separate topic. Similarly, the exact syntax for definitions and calls could be different. PCRE supports three variations originating from Perl5, Python, and Ruby; we could choose any of them or invent our own.
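As a rough sketch of the simple textual-substitution API, a grammar could be a table of already-expanded definitions plus a method that instantiates templates against it. The `Grammar` type, its `instantiate` method, and the use of `(?&name)` as the call syntax are all hypothetical names chosen for this illustration. Nothing like them exists in the crate today, and the sketch produces pattern strings and not compiled regexes.

```rust
use std::collections::HashMap;

/// Hypothetical container for named sub-patterns whose bodies have already
/// been fully expanded, i.e. they contain no further calls.
struct Grammar {
    defs: HashMap<String, String>,
}

impl Grammar {
    /// Builds a grammar from `(name, body)` pairs.
    fn new(defs: &[(&str, &str)]) -> Self {
        let defs = defs
            .iter()
            .map(|&(name, body)| (name.to_string(), body.to_string()))
            .collect();
        Grammar { defs }
    }

    /// Returns a new pattern string in which each `(?&name)` in `template`
    /// is replaced by the stored body, wrapped in a non-capturing group.
    /// Bodies are inserted verbatim, so this is a single pass.
    fn instantiate(&self, template: &str) -> Result<String, String> {
        let mut out = String::new();
        let mut rest = template;
        while let Some(start) = rest.find("(?&") {
            out.push_str(&rest[..start]);
            let after = &rest[start + 3..];
            let end = after
                .find(')')
                .ok_or_else(|| "unterminated call".to_string())?;
            let name = &after[..end];
            let body = self
                .defs
                .get(name)
                .ok_or_else(|| format!("undefined group `{}`", name))?;
            out.push_str("(?:");
            out.push_str(body);
            out.push(')');
            rest = &after[end + 1..];
        }
        out.push_str(rest);
        Ok(out)
    }
}
```

A grammar holding `year` as `[0-9]{4}` and `month` as `0[1-9]|1[0-2]` could then back both `^(?&year)-(?&month)$` and `(?&month)/(?&year)` without repeating either definition. The resulting strings could be handed to the existing regex constructor unchanged.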