Factor positive lookaheads better into find optimizations #112107
A positive lookahead at the start of a pattern can be used to determine find optimizations even when the non-zero-width portions of the pattern can't. This helps particularly when the positive lookahead contains an anchor or a literal. This change also extends the existing alternation reduction optimization to factor out anchors that begin every branch of an alternation.
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
...ies/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexPrefixAnalyzer.cs
src/libraries/System.Text.RegularExpressions/tests/UnitTests/RegexReductionTests.cs
src/libraries/System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexNode.cs
.../System.Text.RegularExpressions/src/System/Text/RegularExpressions/RegexFindOptimizations.cs
As an example of how this applies, this is one of the expressions in our list:
Note that the pattern begins with a positive lookahead and that what comes after it isn't particularly searchable. This leads to the following generated TryFindNextPossibleStartingPosition routine:
That routine effectively always returns true (as long as the input is long enough), so we end up attempting a match at nearly every position.
Now with the change, we get this:
Note the new `pos == 0` check. That happens because we now factor in the `^` from the positive lookahead's alternation.
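To illustrate why factoring the anchor out is safe, here is a minimal sketch using the public `Regex` API. The pattern is hypothetical and is not the expression from the list mentioned above; the class and member names (`LookaheadAnchorDemo`, `Original`, `Factored`, `FirstMatchIndex`) are invented for this example. It compares a pattern whose leading lookahead has `^` at the start of every alternation branch against a hand-factored version with the `^` pulled out in front, assuming default options (no `RegexOptions.Multiline`).

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadAnchorDemo
{
    // Hypothetical pattern (not taken from the PR): every branch of the
    // leading positive lookahead's alternation begins with ^.
    public const string Original = @"(?=^abc|^abd)\w+";

    // Hand-factored form: the shared ^ is moved in front of the lookahead.
    // Without RegexOptions.Multiline, ^ can only succeed at position 0,
    // so neither pattern can match anywhere else.
    public const string Factored = @"^(?=abc|abd)\w+";

    // Returns the index of the first match, or -1 if there is none.
    public static int FirstMatchIndex(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Index : -1;
    }

    // True when both patterns report the same first-match index for the input.
    public static bool Agree(string input) =>
        FirstMatchIndex(Original, input) == FirstMatchIndex(Factored, input);
}
```

Calling `LookaheadAnchorDemo.FirstMatchIndex(LookaheadAnchorDemo.Original, "xxabcdef")` returns -1, because the only position where `^` can succeed is 0 and the lookahead fails there. This is the property a find routine can exploit by giving up as soon as `pos` is not 0 instead of attempting a match at each later position.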