[Bug / Unclear Specs] Strings starting with '1' (0x31) are identical to strings starting with '𑁒' (\U00011052) #97544
Comments
(transferred from dotnet/csharplang#7876) Possibly relevant:
Tagging subscribers to this area: @dotnet/area-system-globalization

Issue Details

Description

I am developing a small library, in which custom string to double parsing is employed. The parsing function also takes care of "unconventional" Unicode symbols, i.e. numbers prefixed with `\u02D6`/`\uFE62`/`\uFF0B`/`\u208A`/`\u207A` instead of `+`, etc.

I noticed the following (possible) bug when filtering for Unicode minus/hyphen signs:

```csharp
string one = 1.ToString();
bool starts = one.StartsWith("\U00011052"); // 𑁒
bool equals_inv = one.Equals("\U00011052", System.StringComparison.InvariantCulture);
bool equals = one.Equals("\U00011052");
System.Console.WriteLine(starts);
System.Console.WriteLine(equals_inv);
System.Console.WriteLine(equals);
```

prints the following output:

Reproduction Steps

See code snippet in the previous section or see the following minimal working example on sharplab.

Expected behavior

Either
or

Actual behavior

This can also be observed here:

```csharp
string A = "1";
string B = "𑁒";
_ = A.StartsWith(B); // True
_ = A.Equals(B); // False
```

The two expressions `.StartsWith` and `.Equals` do not behave consistently when called without passing a value for `StringComparison`.

Other information

Now, I get that https://www.compart.com/en/unicode/U+11052 defines the codepoint as "Brahmi Number One", however, I'm not sure why '1' and '𑁒' are considered equal, given their different optical appearance. Furthermore, the documentation on `String.StartsWith(String)` does not document the differing behaviour with regard to `String.Equals(String)` when not supplied with an explicit value for `StringComparison` (see https://learn.microsoft.com/en-us/dotnet/api/system.string.startswith?view=net-8.0#system-string-startswith(system-string)).

Question

Would it be possible to clarify this behaviour somehow in the C# docs? Or, alternatively, to fix the method in the runtime itself?
There are plenty of issues around this: #72770, #72992, and so on: https://github.com/dotnet/runtime/issues?q=is%3Aissue+label%3Aarea-System.Globalization+StartsWith+

Please have a look at the comment #72992 (comment) explaining the behavior. If you want to get the behavior you desire, please use
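
The recommendation above is cut off in this copy, so the following is only a guess at one workable approach, not a quote from the maintainers. `String.StartsWith(String)` compares using the current culture, while `String.Equals(String)` compares ordinally, so passing `StringComparison.Ordinal` explicitly makes both calls follow the same rules. A minimal sketch, in which `PrefixCheck.StartsWithOrdinal` is a hypothetical helper name:

```csharp
using System;

static class PrefixCheck
{
    // Hypothetical helper: prefix test on UTF-16 code units, independent of culture data.
    public static bool StartsWithOrdinal(string value, string prefix) =>
        value.StartsWith(prefix, StringComparison.Ordinal);
}

class Program
{
    static void Main()
    {
        string one = "1";
        string brahmiOne = "\U00011052"; // 𑁒 (Brahmi Number One)

        // Culture-sensitive by default; the result depends on the collation data in use.
        bool cultureStarts = one.StartsWith(brahmiOne);

        // Ordinal: different code points never match.
        bool ordinalStarts = PrefixCheck.StartsWithOrdinal(one, brahmiOne);

        // Equals(string) is already ordinal.
        bool ordinalEquals = one.Equals(brahmiOne);

        Console.WriteLine($"culture-sensitive StartsWith: {cultureStarts}");
        Console.WriteLine($"ordinal StartsWith: {ordinalStarts}"); // False
        Console.WriteLine($"ordinal Equals: {ordinalEquals}");     // False
    }
}
```

With the explicit ordinal comparison, the prefix check and the equality check agree for this input regardless of platform or culture settings.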
Description
I am developing a small library, in which custom string to double parsing is employed. The parsing function also takes care of "unconventional" unicode symbols, i.e. numbers prefixed with `\u02D6`/`\uFE62`/`\uFF0B`/`\u208A`/`\u207A` instead of `+`, etc.
I noticed the following (possible) bug when filtering for unicode minus/hyphen signs:
prints the following output:
Reproduction Steps
See code snippet in the previous section or see the following minimal working example on sharplab.
Expected behavior
Either
or
Actual behavior
This can also be observed here:
The two expressions `.StartsWith` and `.Equals` do not behave consistently when called without passing a value for `StringComparison`.
Other information
Now, I get that https://www.compart.com/en/unicode/U+11052 defines the codepoint as "Brahmi Number One", however, I'm not sure why '1' and '𑁒' are considered equal due to their different optical appearance. Furthermore, the documentation on `String.StartsWith(String)` does not document the varying behaviour in regards to `String.Equals(String)` when not supplied with an explicit value for `StringComparison`. (see https://learn.microsoft.com/en-us/dotnet/api/system.string.startswith?view=net-8.0#system-string-startswith(system-string))
Question
Would it be possible to clarify this behaviour somehow in the C# docs? Or alternatively fixing the method in the runtime itself?
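
For the parsing use case in the description, a prefix filter can avoid culture-sensitive matching altogether. The sketch below assumes the goal is to drop a single leading plus-sign variant before parsing. `SignPrefix` and `StripLeadingPlus` are hypothetical names, and the list holds only the code points quoted in the description plus ASCII `+`:

```csharp
using System;

static class SignPrefix
{
    // Plus-sign variants quoted in the description, plus ASCII '+'.
    static readonly string[] PlusSigns =
        { "+", "\u02D6", "\uFE62", "\uFF0B", "\u208A", "\u207A" };

    // Hypothetical helper: removes one leading plus-sign variant.
    // Ordinal matching ensures only these exact code points are treated as signs.
    public static string StripLeadingPlus(string input)
    {
        foreach (string sign in PlusSigns)
        {
            if (input.StartsWith(sign, StringComparison.Ordinal))
            {
                return input.Substring(sign.Length);
            }
        }
        return input;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(SignPrefix.StripLeadingPlus("\uFF0B1.5")); // 1.5
        Console.WriteLine(SignPrefix.StripLeadingPlus("1.5"));       // 1.5
    }
}
```

Because the comparison is ordinal, a string such as `"1"` is never reported as starting with a character that the collation data happens to treat as ignorable.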