Handle non-ASCII strings in GetNonRandomizedHashCodeOrdinalIgnoreCase #44688
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs
@GrabYourPitchforks I'm not sure how to fix the https://github.com/dotnet/runtime/blob/master/src/libraries/System.Collections/tests/Generic/Dictionary/HashCollisionScenarios/OutOfBoundsRegression.cs#L194-L196 test now. It relies on the fact that we can easily generate collisions for a specific hash, but 99.9..% of the generated strings are non-ASCII, so they take the slow path and don't produce collisions. So I need to somehow generate 100 ASCII strings with the same hash code 🤔
@EgorBo I think the only reason they're not producing the same hash code is that we're calling the Marvin routine. If we update the fallback logic to perform an uppercase conversion but still use the naïve bit-shifting routines already present in |
ah ok, let me rewrite it then, thanks! |
@GrabYourPitchforks updated, could you please take a look and check whether it's what you meant?
LGTM! I left some perf nit comments. These don't need to be addressed; they're mainly to point out areas of low-hanging fruit, in case we did end up harming performance and are looking for easy ways to win back a few percentage points here and there.
I've reproed the unit test issue, one sec and I'll get a workaround out to you. |
@EgorBo The commit GrabYourPitchforks@113d1e6 in my private branch includes three changes that will be of interest here:
Might be worth merging the patch into this PR so that both issues can be closed at once? (I was going to submit my patch as its own PR to address #44695, but since that patch relies on this PR being committed, if I were to submit it prematurely all of the unit tests would fail.) |
Is there a more limited version of the change that might be lower risk to backport? |
The least code churn version of this change for servicing purposes would be:
Together, these will essentially disable the perf optimization that was done in #36252, and |
Hmm, that would be a significant takeback. |
…domizedHashCodeOrdinalIgnoreCase
/azp run runtime |
Azure Pipelines successfully started running 1 pipeline(s). |
src/libraries/System.Private.CoreLib/src/System/String.Comparison.cs
    return (int)(hash1 + (hash2 * 1566083941));

NotAscii:
    return GetNonRandomizedHashCodeOrdinalIgnoreCaseSlow(this, hash1, hash2);
I do not think it is correct to pass hash1 and hash2 into the slow method. We could have processed some number of characters already and so hash1 and hash2 may not have their original values.
It would be nice to add a test case that covers this case.
oops, indeed
I am not sure it can be tested in any way other than by hardcoding the expected hash code value 🤔
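The concern in this thread can be illustrated with a small, simplified model. This is only a sketch, not the code in String.Comparison.cs: it hashes one char at a time, the seed and the `Mix` helper are assumptions chosen for illustration, and `HashFallbackSketch`, `Slow`, `SlowFromSeed`, and the `passPartialState` flag are hypothetical names. Only the `1566083941` multiplier is taken from the snippet under review.

```csharp
using System;

// Simplified, hypothetical model of a two-path case-insensitive hash.
// NOT the runtime's implementation: the seed, the Mix function, and the
// char-at-a-time loop are assumptions made for illustration.
public static class HashFallbackSketch
{
    private const uint Seed = (5381u << 16) + 5381u; // assumed seed
    private const uint Multiplier = 1566083941u;     // constant from the snippet under review

    // Rotate left by 5, add, then xor in the next value.
    private static uint Mix(uint hash, uint value) =>
        (((hash << 5) | (hash >> 27)) + hash) ^ value;

    // Slow path: upper-cases the whole string and hashes it from index 0,
    // starting from whatever state the caller hands in.
    public static int Slow(string s, uint hash1, uint hash2)
    {
        string upper = s.ToUpperInvariant();
        for (int i = 0; i < upper.Length; i++)
        {
            if ((i & 1) == 0) hash1 = Mix(hash1, upper[i]);
            else hash2 = Mix(hash2, upper[i]);
        }
        return unchecked((int)(hash1 + hash2 * Multiplier));
    }

    // Slow path started from the original seed values.
    public static int SlowFromSeed(string s) => Slow(s, Seed, Seed);

    // Fast path for ASCII input, falling back to the slow path on the
    // first non-ASCII char.
    public static int Hash(string s, bool passPartialState)
    {
        uint hash1 = Seed, hash2 = Seed;
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (c > 0x7F)
            {
                // With passPartialState == true the slow path receives state
                // that already mixed in s[0..i) and then mixes those chars again.
                return passPartialState ? Slow(s, hash1, hash2) : SlowFromSeed(s);
            }

            uint lower = (uint)c | 0x20u; // crude ASCII lowercase normalization
            if ((i & 1) == 0) hash1 = Mix(hash1, lower);
            else hash2 = Mix(hash2, lower);
        }
        return unchecked((int)(hash1 + hash2 * Multiplier));
    }
}
```

In this model, `Hash(s, passPartialState: false)` always equals `SlowFromSeed(s)` for a string containing a non-ASCII char, wherever that char appears. With `passPartialState: true` the result also depends on the ASCII prefix already consumed by the fast path, so it can differ from `SlowFromSeed(s)` unless the first char is non-ASCII. Comparing the fallback result against the slow path invoked directly is one possible way to cover the case without hardcoding a hash value, assuming a test can reach the slow path on its own.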
The new test is failing on Mono. Could you please investigate? |
I used runfo to pull the logs from the helix machine, but there don't seem to be any failures recorded? The zip file I downloaded contains the script used to run the tests, but I don't see anything that resembles output from the test run. Will dig in further tomorrow. |
I see these failures when I go to
Ah, it's because iOS is always invariant atm (I'm integrating ICU there).
Will ignore that test. |
Passed CI on Linux before. CI failures are #45061 |
/backport to release/5.0 |
Started backporting to release/5.0: https://github.com/dotnet/runtime/actions/runs/376611056 |
Fixes #44681