Regex.Split() with \G stops after first item #44957

Closed
sharpjs opened this issue Nov 19, 2020 · 2 comments · Fixed by #44975

Comments

sharpjs commented Nov 19, 2020

Description

When the \G anchor is used in a (?<=...) construct, the Regex.Split(string, string) method stops splitting after the first item.

This bug is also exposed in PowerShell 7.1.0: PowerShell/PowerShell#14112

Example

Regex.Split("aabbccdd", @"(?<=\G..)(?=..)")

| Case          | Result                     |
|---------------|----------------------------|
| Expected      | { "aa", "bb", "cc", "dd" } |
| .NET 5.0      | { "aa", "bbccdd" }         |
| .NET Core 3.1 | { "aa", "bb", "cc", "dd" } |

Reproduction

To reproduce, clone sharpjs/Net50RegexSplitBug and run tests.
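
For a quick check without cloning the repository, the example can also be wrapped in a small console program. This is only a sketch: the class and member names (`SplitRepro`, `Run`, `Expected`) are made up for illustration and are not taken from the linked repository. It prints the runtime description next to the result so that output from different .NET versions can be compared.

```csharp
using System;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text.RegularExpressions;

// Hypothetical stand-alone repro; names are illustrative only.
public static class SplitRepro
{
    // Pattern from the report: split at each position that is preceded by
    // two characters starting at \G and followed by two more characters.
    public const string Pattern = @"(?<=\G..)(?=..)";

    // Result the report lists as expected for "aabbccdd".
    public static string[] Expected => new[] { "aa", "bb", "cc", "dd" };

    public static string[] Run(string input) => Regex.Split(input, Pattern);

    public static void Main()
    {
        string[] actual = Run("aabbccdd");
        Console.WriteLine("Runtime: " + RuntimeInformation.FrameworkDescription);
        Console.WriteLine("Actual:  { " + string.Join(", ", actual) + " }");
        Console.WriteLine("Matches expected: " + actual.SequenceEqual(Expected));
    }
}
```

The last line should report `True` on a runtime without the regression and `False` on an affected one.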

Configuration

.NET 5.0.100
Windows 20H2 (OS Build 19042.630) x64
Not specific to this configuration; easily reproducible using PowerShell 7.1.0 on Linux.

docker run -it --rm mcr.microsoft.com/powershell:preview
[regex]::Split("aabbccdd", "(?<=\G..)(?=..)")

Regression?

Yes, in .NET 5.0.100.
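
Code that has to run on an affected runtime could sidestep the regression by not using `\G` at all. The following is a minimal sketch of one such workaround, not something proposed in the report: the helper name `FixedWidthSplit.Split` is hypothetical. It is intended to mirror what the pattern `(?<=\G..)(?=..)` does for a piece size of 2, including that a trailing remainder shorter than two full pieces stays attached to the last piece.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; not part of .NET or of the original report.
public static class FixedWidthSplit
{
    public static string[] Split(string input, int size)
    {
        if (input is null) throw new ArgumentNullException(nameof(input));
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        var pieces = new List<string>();
        int pos = 0;

        // Cut off a piece only while at least `size` characters would remain
        // after it, which corresponds to the (?=..) lookahead in the pattern.
        while (input.Length - pos >= 2 * size)
        {
            pieces.Add(input.Substring(pos, size));
            pos += size;
        }

        // Whatever is left (possibly empty) becomes the final piece.
        pieces.Add(input.Substring(pos));
        return pieces.ToArray();
    }

    public static void Main()
    {
        // prints: aa, bb, cc, dd
        Console.WriteLine(string.Join(", ", Split("aabbccdd", 2)));
    }
}
```

This trades the one-line regex for explicit index arithmetic, so it behaves the same on every runtime version.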

Dotnet-GitSync-Bot added the area-System.Text.RegularExpressions and untriaged labels on Nov 19, 2020

ghost commented Nov 19, 2020

Tagging subscribers to this area: @eerhardt, @pgovind, @jeffhandley
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: sharpjs
Assignees: -
Labels: area-System.Text.RegularExpressions, untriaged

Milestone: -

stephentoub self-assigned this on Nov 19, 2020
stephentoub added the bug label and removed the untriaged label on Nov 19, 2020
stephentoub added this to the 6.0.0 milestone on Nov 19, 2020

stephentoub (Member) commented

Thanks for the repro. This is a one-line fix. I'll put up a PR tonight.

ghost locked as resolved and limited conversation to collaborators on Dec 20, 2020
stephentoub modified the milestones: 6.0.0, 5.0.1 on Jan 5, 2021