[Bug / Unclear Specs] Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052) #97544

Closed
Unknown6656 opened this issue Jan 26, 2024 · 4 comments


Unknown6656 commented Jan 26, 2024

Description

I am developing a small library that implements custom string-to-double parsing. The parser also handles "unconventional" Unicode symbols, e.g. numbers prefixed with \u02D6/\uFE62/\uFF0B/\u208A/\u207A instead of +.

I noticed the following (possible) bug when filtering for Unicode minus/hyphen signs:

string one = 1.ToString();
bool starts = one.StartsWith("\U00011052"); // 𑁒
bool equals_inv = one.Equals("\U00011052", System.StringComparison.InvariantCulture);
bool equals = one.Equals("\U00011052");

System.Console.WriteLine(starts);
System.Console.WriteLine(equals_inv);
System.Console.WriteLine(equals);

prints the following output:

True
True
False

Reproduction Steps

See code snippet in the previous section or see the following minimal working example on sharplab.

Expected behavior

Either

False
True
False

or

False
True
False

Actual behavior

True
True
False

This can also be observed here:

string A = "1";
string B = "𑁒";

_ = A.StartsWith(B); // True
_ = A.Equals(B); // False

The two methods .StartsWith and .Equals do not behave consistently when called without an explicit StringComparison value.

Other information

I understand that https://www.compart.com/en/unicode/U+11052 defines the code point as "Brahmi Number One". However, I'm not sure why '1' and '𑁒' are considered equal, given how different they look. Furthermore, the documentation for String.StartsWith(String) does not mention that its behaviour differs from that of String.Equals(String) when no explicit StringComparison value is supplied (see https://learn.microsoft.com/en-us/dotnet/api/system.string.startswith?view=net-8.0#system-string-startswith(system-string)).

Question

Would it be possible to clarify this behaviour in the C# docs, or alternatively to fix the method in the runtime itself?

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 26, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 26, 2024
Author

Unknown6656 commented Jan 26, 2024

@Unknown6656 Unknown6656 changed the title from "Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052)" to "[Bug / Unclear Specs] Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052)" Jan 26, 2024
@EgorBo EgorBo added area-System.Globalization and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jan 26, 2024
ghost commented Jan 26, 2024

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.


@huoyaoyuan
Member

There are plenty of existing issues about this, e.g. #72770 and #72992. See also: https://github.com/dotnet/runtime/issues?q=is%3Aissue+label%3Aarea-System.Globalization+StartsWith+

Member

tarekgh commented Jan 26, 2024

Please have a look at the comment #72992 (comment), which explains the behavior. To get the behavior you want, use StringComparison.Ordinal when calling StartsWith.
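For readers following along, here is a minimal sketch of that suggestion; it is not part of the original comment. The class name OrdinalPrefixDemo and the helper StartsWithOrdinal are hypothetical names introduced only for illustration. The culture-sensitive call is printed without an expected value, because its result may depend on the globalization backend and the current culture.

```csharp
using System;

public static class OrdinalPrefixDemo
{
    // Hypothetical helper (not a runtime API): prefix check that compares
    // UTF-16 code units only, with no linguistic collation rules applied.
    public static bool StartsWithOrdinal(string text, string prefix) =>
        text.StartsWith(prefix, StringComparison.Ordinal);

    public static void Main()
    {
        string one = "1";
        string brahmiOne = "\U00011052"; // U+11052 BRAHMI NUMBER ONE

        // StartsWith(string) without a StringComparison argument is
        // culture-sensitive; the issue reports True here, but the result
        // may vary by platform and configuration, so none is asserted.
        Console.WriteLine(one.StartsWith(brahmiOne));

        // Ordinal comparisons treat the two strings as different.
        Console.WriteLine(StartsWithOrdinal(one, brahmiOne));               // False
        Console.WriteLine(one.Equals(brahmiOne, StringComparison.Ordinal)); // False
    }
}
```

Passing StringComparison.Ordinal explicitly to both StartsWith and Equals makes the two calls use the same comparison rules, which removes the inconsistency described in the issue.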

@tarekgh tarekgh closed this as completed Jan 26, 2024
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Jan 26, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 26, 2024