ICU: OrdinalIgnoreCase comparison functions convert to upper instead of using case folding #26961

GrabYourPitchforks · 2020-01-31T22:25:20Z

The CompareInfo.IndexOf(..., CompareOptions.OrdinalIgnoreCase) functions on ICU use u_toupper, though they should really use u_foldCase. Case mapping (e.g., u_toupper and u_tolower) is meant for converting strings to a standard casing. Case folding (u_foldCase) is meant for comparing strings for ordinal / non-linguistic equality. In particular, we should use simple case folding instead of full case folding.
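To illustrate the simple vs. full distinction, here is a minimal sketch. It assumes hand-written excerpts of CaseFolding.txt in place of ICU; the helpers `simple_fold` and `full_fold` are hypothetical, while real code would call ICU's `u_foldCase` (simple) or `u_strFoldCase` (full). Full folding can expand one code point into several (U+00DF folds to "ss"). Simple folding always maps one code point to one code point, so a per-code-point comparison like the existing `u_toupper(one) == u_toupper(two)` still works, and match offsets returned by IndexOf presumably stay aligned with the original string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of CaseFolding.txt covering ASCII and the sharp s
 * characters only; not a complete folding table. */

/* Simple (1:1) case folding, i.e. CaseFolding.txt status C + S entries. */
static uint32_t simple_fold(uint32_t c)
{
    if (c >= 'A' && c <= 'Z') return c + 0x20;  /* status C: A-Z -> a-z */
    if (c == 0x1E9E) return 0x00DF;              /* status S: CAPITAL SHARP S -> U+00DF */
    return c;                                    /* U+00DF has no C/S entry */
}

/* Full case folding, i.e. status C + F entries. It may expand one code
 * point into several; writes into out and returns the count written. */
static size_t full_fold(uint32_t c, uint32_t out[2])
{
    if (c == 0x00DF || c == 0x1E9E) {            /* status F: 0073 0073 */
        out[0] = 's';
        out[1] = 's';
        return 2;
    }
    out[0] = simple_fold(c);                     /* remaining cases here are 1:1 */
    return 1;
}
```

Because `full_fold` can change the length of the input, it cannot be dropped into a loop that compares one code point at a time; `simple_fold` can.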

Example line that exhibits the problem:

From runtime/src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c, line 545 at commit 5cc6019:

return u_toupper(one) == u_toupper(two);

This means that, for instance, the strings "ß" (U+00DF LATIN SMALL LETTER SHARP S) and "ẞ" (U+1E9E LATIN CAPITAL LETTER SHARP S) will be treated as unequal under an OrdinalIgnoreCase comparer.
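A sketch of the per-code-point check, comparing the current uppercase approach against simple case folding. It assumes hypothetical stand-ins `simple_upper` and `simple_fold` for ICU's `u_toupper` and `u_foldCase(c, U_FOLD_CASE_DEFAULT)`, covering only ASCII letters plus the code points discussed here, with values taken from UnicodeData.txt and CaseFolding.txt. Besides ß/ẞ, U+212A KELVIN SIGN is included as a second example: it has no uppercase mapping of its own but folds to "k".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for u_toupper: simple uppercase mapping. */
static uint32_t simple_upper(uint32_t c)
{
    if (c >= 'a' && c <= 'z') return c - 0x20;
    /* U+00DF, U+1E9E and U+212A have no simple uppercase mapping. */
    return c;
}

/* Hypothetical stand-in for u_foldCase: simple case folding (C + S). */
static uint32_t simple_fold(uint32_t c)
{
    if (c >= 'A' && c <= 'Z') return c + 0x20;
    if (c == 0x1E9E) return 0x00DF;   /* LATIN CAPITAL LETTER SHARP S -> U+00DF */
    if (c == 0x212A) return 'k';      /* KELVIN SIGN -> U+006B */
    return c;
}

/* Current approach, mirroring pal_collation.c line 545. */
static bool equal_by_upper(uint32_t one, uint32_t two)
{
    return simple_upper(one) == simple_upper(two);
}

/* Proposed approach: compare simple case folds. */
static bool equal_by_fold(uint32_t one, uint32_t two)
{
    return simple_fold(one) == simple_fold(two);
}
```

With these tables, `equal_by_upper(0x00DF, 0x1E9E)` is false while `equal_by_fold(0x00DF, 0x1E9E)` is true; both agree on ordinary ASCII pairs such as 'a' and 'A'.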

To be fair, performing an uppercase mapping does match the Windows NLS behavior, but that is legacy behavior which, for compatibility reasons, cannot be updated to follow the Unicode best practices described in https://unicode.org/faq/casemap_charprop.html. In the ICU code paths we should follow the Unicode recommendations wherever we can.


tarekgh · 2020-08-12T17:04:55Z

I am working on optimizing ordinal casing in general, but I am not going to change the way we do the toupper. The reason is that it is easier to describe the ordinal casing behavior as whatever UnicodeData.txt provides. Also, case folding can introduce some cases that may not be expected for ordinal comparisons. I would not risk doing that; we can revisit it in the future if we need to. Closing it, but feel free to let me know if you disagree.
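The comment does not name specific cases. One illustration (not taken from the thread) of how switching could change existing results is U+0131 LATIN SMALL LETTER DOTLESS I. Its simple uppercase mapping in UnicodeData.txt is U+0049 'I', so an uppercase comparison treats "ı", "i" and "I" as equal; under default (non-Turkic) simple case folding, U+0131 has no entry and folds to itself, so it would no longer match either. The sketch below assumes hypothetical table-based helpers `simple_upper` and `simple_fold` in place of ICU's `u_toupper` and `u_foldCase`, limited to ASCII letters and U+0131.

```c
#include <stdint.h>

/* Hypothetical stand-in for u_toupper, limited to ASCII and U+0131. */
static uint32_t simple_upper(uint32_t c)
{
    if (c >= 'a' && c <= 'z') return c - 0x20;
    if (c == 0x0131) return 'I';   /* UnicodeData.txt: uppercase of U+0131 is U+0049 */
    return c;
}

/* Hypothetical stand-in for u_foldCase with default (non-Turkic) options. */
static uint32_t simple_fold(uint32_t c)
{
    if (c >= 'A' && c <= 'Z') return c + 0x20;  /* status C: I -> i */
    return c;                                    /* U+0131 has no CaseFolding.txt entry */
}
```

Under `simple_upper`, U+0131 and 'i' compare equal; under `simple_fold` they do not, which is the kind of behavior change an existing OrdinalIgnoreCase caller could notice.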

GrabYourPitchforks · 2020-08-14T19:00:31Z

@tarekgh Sure, that sounds reasonable. Do you think there's still interest in a new dedicated "perform case folding" API? You've previously pointed out the one in corefxlab, but I'm trying to figure out when (if ever) we'd port it to runtime.

tarekgh · 2020-08-14T19:04:20Z

We should look at case folding support APIs in the next releases. I still see them as useful to support, and we already have requests for them.

GrabYourPitchforks added bug area-System.Globalization labels Jan 31, 2020

GrabYourPitchforks added this to the 5.0 milestone Jan 31, 2020

GrabYourPitchforks self-assigned this Jan 31, 2020

GrabYourPitchforks changed the title from "ICU: OrdinalIgnoreCase functions convert to upper instead of using case folding" to "ICU: OrdinalIgnoreCase comparison functions convert to upper instead of using case folding" Jan 31, 2020

GrabYourPitchforks mentioned this issue Jan 31, 2020

ICU comparison routines should use case folding, not case mapping #27540 (Closed)

GrabYourPitchforks mentioned this issue Feb 8, 2020

Clean up usage of string.IndexOf / ToUpper / ToLower / Trim throughout the framework #31968 (Merged)

GrabYourPitchforks mentioned this issue Jul 6, 2020

ICU comparison routines should use case folding, not case mapping GrabYourPitchforks/runtime#8 (Closed)

tarekgh closed this as completed Aug 12, 2020

ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
