Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tests sorted according to random index instead of random hash #2070

Merged
merged 1 commit into from
Nov 17, 2020

Conversation

loximann
Copy link
Contributor

Description

Use a random number instead of a hash to sort the tests.

Motivation

In v2.x, test cases with names differing only in the last character are almost always run one after the other. For example, for 4 tests called "a1", "a2", "b1" and "b2", only 8 combinations out of the possible 24 were observed in 1000 runs with different seeds.

@horenmar
Copy link
Member

Something like this was one of my big reservations about the change.

However, uniform_int_distribution is problematic, as it is not reproducible across different platforms. At first glance, I also don't see how this change keeps the random order subset-invariant (this means that if a subset of all tests is selected, that subset will have the same relative order given same seed).

@codecov
Copy link

codecov bot commented Oct 23, 2020

Codecov Report

Merging #2070 (d654b2c) into v2.x (8dd25b0) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             v2.x    #2070      +/-   ##
==========================================
+ Coverage   88.75%   88.79%   +0.05%     
==========================================
  Files         138      138              
  Lines        5651     5683      +32     
==========================================
+ Hits         5015     5046      +31     
- Misses        636      637       +1     

@loximann
Copy link
Contributor Author

About reproducibility I am not completely sure? std::minstd_rand is defined to have the same parameters always http://www.cplusplus.com/reference/random/minstd_rand/, so it should be reproducible. std::uniform_int_distribution is just a mapping from the output of the engine to an int, but sure, it would be safer to cast the output of the engine to a uint64_t.

In any case, you are absolutely right that subset invariance is not preserved. My bad.

So scrape that. I pushed a new proposal, which is kind of hacky, but it works preserving the current features. It basically adds always the same string to all test cases, in this way the spread of the hashes improves a lot.

@horenmar
Copy link
Member

horenmar commented Oct 26, 2020

The transformation within uniform_int_distribution is unspecified, so every implementation does its own thing.


e: I will also need to investigate how come that your original PR passed tests, because there should be a test for the subset invariance.

@loximann
Copy link
Contributor Author

Yeah, I was going to ask about that, because I ran the tests and I thought that this would be checked.

But thanks a lot for bothering to illustrate the issue with std::uniform_int_distribution! :-D Good to know.

@loximann
Copy link
Contributor Author

Hm, how can I get the .clang-tidy file used by Travis? It would be faster to get that sorted locally.

If the approach I am suggesting makes sense, of course.

@horenmar
Copy link
Member

Catch2 doesn't use clang-tidy. It is just a plain warning on specific version of Clang.

@loximann
Copy link
Contributor Author

OK, addressed, it's building now.

Would it make sense to mention that clang should be used in contributing.md?

@horenmar
Copy link
Member

Eh, the CI also uses GCC and MSVC, all of them across bunch of different versions.

Also I will have to look at a better way of doing this, the current proposal would have significant runtime overhead... I have 2 ideas right now

  1. Actually use the generated bytes from RNG not to seed the hash, but instead to append them at the end.
  2. Significantly shorter append string (also avoid having it as std::string and relying on optimizer to avoid the allocation if it can)

@jbytheway Ping.

@loximann
Copy link
Contributor Author

Sure, I would also use a shorter string. 2^64/1099511628211 ~ 2^16, so if this estimate is any reasonable (which I don't know 😬) two characters should suffice. I did a quick test appending "abcd" and it seems to do the trick, the distribution of the 24 combinations seems reasonably uniform...

@jbytheway
Copy link
Contributor

Yeah, I'd expect mixing in just a couple of additional characters ought to suffice. But since these are arbitrary extra mixing, it's probably better to use full uint64_t values rather than just chars. That is probably just as fast and ought to mix better? You might get away with adding just one xor-and-multiply?

@loximann
Copy link
Contributor Author

That sounds better, yeah. Any suggestions for the value of the constant?

@loximann
Copy link
Contributor Author

I played around with two large random numbers, and the spread was not great. I get better results with 4 numbers, even if they are not very large. I'll look into this and suggest a new proposal, unless someone has a clear idea :-D

@loximann
Copy link
Contributor Author

OK, so here is a new proposal! The original authors suggest one recipe to improve dispersion: XOR the upper half of the hash with the lower half. This of course reduces the hash to 32 bits, but for the current purpose I guess that it is more than enough. This is discussed in the wikipedia article on the algorithm:
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function#Non-cryptographic_hash
However I was playing with it, and XOring didn't cut it. I tried multiplying, and that seems to work. As this is outside my area of expertise, I checked in with Fowler (the F in FVN-1a), let's see what he has to say...

@horenmar
Copy link
Member

horenmar commented Nov 7, 2020

Okay, I think there is some perf budget that can be given to the hashing as an extra -- I am thinking 4 numbers drawn from the PRNG. Then there are 2 things to do

  1. Add that extra "suffix" to the hashing
  2. Find out why the initial PR was not failed by CI.

I am going to open a new issue for 2). @loximann Do you want to make that change, or should I do it?

@loximann
Copy link
Contributor Author

loximann commented Nov 8, 2020

Sorry, which change do you mean? I can investigate 2, and for 1 I was thinking that now that the request seems reasonable, I can do a more proper analysis of the different alternatives before deciding for one or another. How does this sound?

@horenmar
Copy link
Member

horenmar commented Nov 8, 2020

Right, sorry. I meant the change in 1, and I would definitely like to see a more thorough analysis.

@loximann
Copy link
Contributor Author

loximann commented Nov 9, 2020

Good good, I'll get on with it! I'll post my results here at some point this week.

@loximann
Copy link
Contributor Author

Alright, so this little thing turned out being much more fun than expected, and I ended up doing this unnecessarily deep analysis:
https://gitlab.com/loximann/testing-hashers-for-shuffling-strings

I tested a bunch of possibilities, and the one that worked the best was the original FNV-1a algorithm, fusing the seed as suffix, followed by multiplication folding (multiplying the lower 32 bits with the upper 32 bits). Speed-wise it looks OK, and I would expect that for up to ~100 000 test cases it should work perfectly fine. Even for more test cases, when hash collisions are sure to happen, I would wager that the impact should be negligible for the needs of randomization in Catch2.

I look forward to your comments, if there is anything else you think I should check, please let me know.

@horenmar
Copy link
Member

@loximann That looks really nice, good work. 👍

On the code:

  • You have bit of a type confusion in there. Making the hasher return hash_t when it is typedefed as 64bit unsigned int and the implementation transforms the value into 32bit unsigned int causes warnings (rightfully), which is what fails the builds.

Some notes on the text:

  • The graphs are not shown inline, not sure why.
  • Catch2's microbenchmarking support would give you the error bars and other statistical measures 😉
  • I think it is interesting enough that you should link it somewhere
    • I recommend giving it a quick look under a spellcheck first
    • I would also note that std::default_random_engine is platform-dependent (if you were to run your tests on Windows, you would probably find it much slower).

@loximann
Copy link
Contributor Author

Thank you! And thanks for the nice comments!

I'll definitely check out the microbenchmarking facilities, i had no idea they existed and they'll come in handy. Actually, I'm calculating uncertainties, but they are much smaller than the variability I observe from run to run, so that's why I decided to skip them.

And I'll fix the build... I thought I had sparkled enough static_casts around, but I guess I missed a spot.

Any recommendations other than default_random_engine? minstd_rnd?

@loximann
Copy link
Contributor Author

OK, this looks promising. What is the policy for the final merge? Should I squash the commits? Anything else?

@loximann
Copy link
Contributor Author

Hm, I am not quite sure what broke the last build attempt? The log only shows "An error occurred while generating the build script.".

@horenmar
Copy link
Member

Just random TravisCI breakage. As for PR, the preference is for small atomic commit, which for this one should be 1 I guess.

@horenmar horenmar merged commit 3ba7455 into catchorg:v2.x Nov 17, 2020
@horenmar
Copy link
Member

I also did some investigations into the (not)failing tests, and I've got nothing. If I intentionally break the ordering by doing plain shuffle, then it breaks.

Either the passing commit was lucky, and the seeds just happened to work, or there is some weirder edge case that is harder to reproduce.

@loximann
Copy link
Contributor Author

Yay!

Could you point me to which test case is checking that? I could have a look.

@horenmar
Copy link
Member

add_test(NAME RandomTestOrdering COMMAND ${PYTHON_EXECUTABLE}
${CATCH_DIR}/projects/TestScripts/testRandomOrder.py $<TARGET_FILE:SelfTest>)

https://github.com/catchorg/Catch2/blob/v2.x/projects/TestScripts/testRandomOrder.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants