The CompareInfo.IndexOf(..., CompareOptions.OrdinalIgnoreCase) functions on ICU use u_toupper, though they should really use u_foldCase. Case mapping (e.g., u_toupper and u_tolower) is meant for converting strings to a standard casing. Case folding (u_foldCase) should be used when comparing strings for ordinal / non-linguistic equality. In particular, we should use simple case folding instead of full case folding. Example line that exhibits the problem: runtime/src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c, line 545 in 5cc6019.
This means that, for instance, the strings "ß" (U+00DF LATIN SMALL LETTER SHARP S) and "ẞ" (U+1E9E LATIN CAPITAL LETTER SHARP S) will be treated as unequal under an OrdinalIgnoreCase comparer.
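To make the difference concrete, here is a minimal Python sketch (not the runtime's code) that compares two strings code point by code point after applying either a simple uppercase mapping or simple case folding. The mapping functions are hypothetical toy stand-ins for ICU's u_toupper and u_foldCase(c, U_FOLD_CASE_DEFAULT) that cover only ASCII letters plus U+00DF and U+1E9E; a real implementation would use the full Unicode data, for example through ICU. Python's own str.upper() and str.casefold() are deliberately not used, because they apply full mappings ("ß".upper() == "SS").

```python
# Hypothetical toy stand-ins for ICU's u_toupper (simple uppercase mapping)
# and u_foldCase with U_FOLD_CASE_DEFAULT (simple case folding).
# Only ASCII letters plus U+00DF / U+1E9E are covered; real code must use the
# full UnicodeData.txt / CaseFolding.txt data (e.g. via ICU).


def simple_upper(cp: int) -> int:
    if ord("a") <= cp <= ord("z"):
        return cp - 0x20
    # U+00DF and U+1E9E have no simple uppercase mapping in UnicodeData.txt,
    # so both map to themselves.
    return cp


def simple_fold(cp: int) -> int:
    if ord("A") <= cp <= ord("Z"):
        return cp + 0x20
    if cp == 0x1E9E:  # CaseFolding.txt: "1E9E; S; 00DF"
        return 0x00DF
    # U+00DF has only a full ("F") folding to "ss", so simple folding keeps it.
    return cp


def ordinal_ignore_case_equals(a: str, b: str, mapping) -> bool:
    # Simple mappings are one code point to one code point, so strings of
    # different lengths never match and we can compare position by position.
    if len(a) != len(b):
        return False
    return all(mapping(ord(x)) == mapping(ord(y)) for x, y in zip(a, b))


small, capital = "\u00DF", "\u1E9E"  # ß, ẞ
print(ordinal_ignore_case_equals(small, capital, simple_upper))  # False
print(ordinal_ignore_case_equals(small, capital, simple_fold))   # True
```

Because simple mappings never change the number of code points, match positions line up with the original string, which plausibly matters for APIs like IndexOf that report an offset; full folding (ß → "ss") would not preserve that.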
To be fair, the current behavior of performing an uppercase mapping does match the Windows NLS behavior, but the Windows NLS behavior is a legacy behavior that for compatibility reasons cannot be updated to match Unicode best practices as described in https://unicode.org/faq/casemap_charprop.html. In the ICU code paths we should follow the Unicode recommendations as often as we can.
GrabYourPitchforks changed the title from "ICU: OrdinalIgnoreCase functions convert to upper instead of using case folding" to "ICU: OrdinalIgnoreCase comparison functions convert to upper instead of using case folding" on Jan 31, 2020.
I am working on optimizing ordinal casing in general, but I am not going to change the way we use toupper. The reason is that it is easier to describe the ordinal casing behavior as simply what UnicodeData.txt provides. Also, case folding can introduce some cases that might not be expected for ordinal comparisons. I would not risk doing that now; we can look at it in the future if we need to. Closing this, but feel free to let me know if you disagree.
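As one illustration of how folding can widen the set of matches (an example chosen here, not necessarily the case the comment had in mind): CaseFolding.txt folds U+212A KELVIN SIGN to ASCII "k" ("212A; C; 006B"), while U+212A has no simple uppercase mapping. The sketch uses hypothetical toy mapping functions that cover only ASCII letters and U+212A.

```python
# Hypothetical toy tables, covering only the entries this example needs.
# UnicodeData.txt: U+212A KELVIN SIGN has no simple uppercase mapping.
# CaseFolding.txt: "212A; C; 006B" and "004B; C; 006B".


def simple_upper(cp: int) -> int:
    if ord("a") <= cp <= ord("z"):
        return cp - 0x20
    return cp  # U+212A maps to itself


def simple_fold(cp: int) -> int:
    if ord("A") <= cp <= ord("Z"):
        return cp + 0x20
    if cp == 0x212A:  # KELVIN SIGN folds to ASCII 'k'
        return ord("k")
    return cp


def equals_ignore_case(a: str, b: str, mapping) -> bool:
    # Position-by-position comparison after a 1:1 per-code-point mapping.
    return len(a) == len(b) and all(
        mapping(ord(x)) == mapping(ord(y)) for x, y in zip(a, b)
    )


kelvin = "\u212A"
print(equals_ignore_case("k", kelvin, simple_upper))  # False
print(equals_ignore_case("k", kelvin, simple_fold))   # True
```

Under folding, an ASCII-only needle such as "k" or "K" would match a non-ASCII character in the haystack, which an uppercase-mapping comparison does not do.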
@tarekgh Sure, that sounds reasonable. Do you think there's still interest in a new dedicated "perform case folding" API? You've previously pointed out the one in corefxlab, but I'm trying to figure out when (if ever) we'd port it to runtime.