[Bug / Unclear Specs] Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052) #97544

Unknown6656 · 2024-01-26T11:55:30Z

Description

I am developing a small library that uses custom string-to-double parsing. The parsing function also handles "unconventional" Unicode symbols, e.g. numbers prefixed with \u02D6/\uFE62/\uFF0B/\u208A/\u207A instead of +.

I noticed the following (possible) bug when filtering for Unicode minus/hyphen signs:

string one = 1.ToString();
bool starts = one.StartsWith("\U00011052"); // 𑁒
bool equals_inv = one.Equals("\U00011052", System.StringComparison.InvariantCulture);
bool equals = one.Equals("\U00011052");

System.Console.WriteLine(starts);
System.Console.WriteLine(equals_inv);
System.Console.WriteLine(equals);

prints the following output:

True
True
False

Reproduction Steps

See code snippet in the previous section or see the following minimal working example on sharplab.

Expected behavior

Either

False
True
False

or

False
True
False

Actual behavior

True
True
False

This can also be observed here:

string A = "1";
string B = "𑁒";

_ = A.StartsWith(B); // True
_ = A.Equals(B); // False

The two methods .StartsWith and .Equals do not behave consistently when called without passing a value for StringComparison.

Other information

I understand that https://www.compart.com/en/unicode/U+11052 defines the code point as "Brahmi Number One"; however, I'm not sure why '1' and '𑁒' are considered equal, given their different visual appearance. Furthermore, the documentation for String.StartsWith(String) does not mention that its behaviour differs from String.Equals(String) when no explicit StringComparison value is supplied (see https://learn.microsoft.com/en-us/dotnet/api/system.string.startswith?view=net-8.0#system-string-startswith(system-string)).

Question

Would it be possible to clarify this behaviour somehow in the C# docs? Or, alternatively, to fix the method in the runtime itself?

Unknown6656 · 2024-01-26T11:56:45Z

(transferred from dotnet/csharplang#7876)

Possibly relevant:

Improving the developer experience with regard to default string globalization #43956

ghost · 2024-01-26T12:20:17Z

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	Unknown6656
Assignees:	-
Labels:	`area-System.Globalization`, `untriaged`
Milestone:	-

huoyaoyuan · 2024-01-26T12:32:50Z

There are plenty of issues around this: #72770 #72992 and so on: https://github.com/dotnet/runtime/issues?q=is%3Aissue+label%3Aarea-System.Globalization+StartsWith+

tarekgh · 2024-01-26T17:59:29Z

Please have a look at the comment #72992 (comment) explaining the behavior. If you want to get the behavior you desire, please use StringComparison.Ordinal when calling StartsWith.
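
As an illustration of the suggestion above, here is a minimal sketch that passes StringComparison.Ordinal explicitly. The class and variable names are made up for this example; only the String.StartsWith and String.Equals overloads taking a StringComparison are real API. The last call makes explicit what StartsWith(String) does by default (a culture-sensitive comparison using the current culture), and its result is the one reported in this issue rather than something guaranteed on every platform or globalization mode.

```csharp
using System;

class OrdinalPrefixCheck // hypothetical name, for illustration only
{
    static void Main()
    {
        string a = "1";
        string b = "\U00011052"; // 𑁒 (Brahmi Number One)

        // Ordinal comparison looks at the raw UTF-16 code units, so '1' (U+0031)
        // can never match the surrogate pair that encodes U+11052.
        bool startsOrdinal = a.StartsWith(b, StringComparison.Ordinal);
        bool equalsOrdinal = a.Equals(b, StringComparison.Ordinal);

        Console.WriteLine(startsOrdinal); // False
        Console.WriteLine(equalsOrdinal); // False

        // StartsWith(string) without a StringComparison argument is culture-sensitive
        // (current culture), whereas Equals(string) is ordinal. Spelling the comparison
        // out makes that difference visible; the issue above reports True for this call.
        bool startsCulture = a.StartsWith(b, StringComparison.CurrentCulture);
        Console.WriteLine(startsCulture);
    }
}
```

Passing the comparison type explicitly at every call site avoids relying on the differing defaults of the two methods.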

ghost added the `untriaged` label (New issue has not been triaged by the area owner) Jan 26, 2024

dotnet-issue-labeler bot added the `needs-area-label` label (An area label is needed to ensure this gets routed to the appropriate area owners) Jan 26, 2024

Unknown6656 changed the title ~~Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052)~~ [Bug / Unclear Specs] Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052) Jan 26, 2024

EgorBo added the `area-System.Globalization` label and removed the `needs-area-label` label Jan 26, 2024

tarekgh closed this as completed Jan 26, 2024

ghost removed the `untriaged` label Jan 26, 2024

github-actions bot locked and limited conversation to collaborators Feb 26, 2024
