Tests sorted according to random index instead of random hash #2070
Something like this was one of my big reservations about the change. However,
Codecov Report
@@            Coverage Diff            @@
##             v2.x    #2070    +/-   ##
==========================================
+ Coverage   88.75%   88.79%   +0.05%
==========================================
  Files         138      138
  Lines        5651     5683      +32
==========================================
+ Hits         5015     5046      +31
- Misses        636      637       +1
About reproducibility, I am not completely sure. std::minstd_rand is defined to always have the same parameters (http://www.cplusplus.com/reference/random/minstd_rand/), so it should be reproducible. std::uniform_int_distribution is just a mapping from the output of the engine to an int, but sure, it would be safer to cast the output of the engine to a uint64_t. In any case, you are absolutely right that subset invariance is not preserved. My bad, so scrap that. I pushed a new proposal, which is kind of hacky, but it preserves the current features: it always appends the same string to every test case name, and this improves the spread of the hashes a lot.
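The reproducibility point can be checked in code. The C++ standard fixes the parameters of std::minstd_rand and even requires the 10000th output of a default-constructed engine to be 399268537, whereas it does not specify the algorithm std::uniform_int_distribution uses, so the distribution's results may differ between standard libraries. The following is a minimal sketch, assuming portable values are the goal, that takes the raw engine output (cast to uint64_t) instead of going through the distribution; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns the 10000th output of a default-constructed std::minstd_rand.
// The C++ standard requires this to be 399268537 on every conforming
// implementation, because the engine's parameters are fixed.
std::uint32_t minstdTenThousandth() {
    std::minstd_rand rng;  // default seed
    std::minstd_rand::result_type value = 0;
    for (int i = 0; i < 10000; ++i) {
        value = rng();
    }
    return static_cast<std::uint32_t>(value);
}

// Hypothetical helper: uses the engine output directly instead of passing it
// through std::uniform_int_distribution, whose mapping is not specified by
// the standard and can differ between standard library implementations.
std::uint64_t portableDraw(std::minstd_rand& rng) {
    return static_cast<std::uint64_t>(rng());
}
```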
The transformation within e: I will also need to investigate why your original PR passed the tests, because there should be a test for subset invariance.
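Subset invariance means that when only some tests are selected, they run in the same relative order they would have in the full run. Sorting on a key derived only from each test's name has this property automatically, while sorting on random indices drawn in registration order does not, because each test's key depends on how many tests were drawn before it. The sketch below checks the property for both approaches; std::hash<std::string> stands in for Catch2's hasher purely for illustration (its values are not portable across standard libraries), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Orders names by a key computed from each name alone; ties are broken by name.
std::vector<std::string> sortByHash(std::vector<std::string> names) {
    std::hash<std::string> h;
    std::sort(names.begin(), names.end(),
              [&](const std::string& a, const std::string& b) {
                  const std::size_t ha = h(a), hb = h(b);
                  return ha != hb ? ha < hb : a < b;
              });
    return names;
}

// Orders names by random indices drawn from a seeded PRNG in list order,
// so a test's key depends on its position in the list, not only its name.
std::vector<std::string> sortByRandomIndex(const std::vector<std::string>& names,
                                           std::uint32_t seed) {
    std::minstd_rand rng(seed);
    std::vector<std::pair<std::uint64_t, std::string>> keyed;
    for (const auto& n : names) {
        keyed.emplace_back(static_cast<std::uint64_t>(rng()), n);
    }
    std::sort(keyed.begin(), keyed.end());
    std::vector<std::string> out;
    for (const auto& kv : keyed) {
        out.push_back(kv.second);
    }
    return out;
}

// True if `subsetOrder` lists its names in the same relative order
// as they appear in `fullOrder`.
bool preservesRelativeOrder(const std::vector<std::string>& fullOrder,
                            const std::vector<std::string>& subsetOrder) {
    std::vector<std::string> filtered;
    for (const auto& n : fullOrder) {
        if (std::find(subsetOrder.begin(), subsetOrder.end(), n) != subsetOrder.end()) {
            filtered.push_back(n);
        }
    }
    return filtered == subsetOrder;
}
```

With seed 1, for example, the random-index version orders the subset {"a2", "b1", "c2"} differently from how those three tests appear in the full run of {"a1", "a2", "b1", "b2", "c1", "c2"}, while the hash-based version agrees for any seed or hash function.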
Yeah, I was going to ask about that, because I ran the tests and thought this would be checked. But thanks a lot for taking the time to illustrate the issue with std::uniform_int_distribution! :-D Good to know.
Hm, how can I get the .clang-tidy file used by Travis? It would be faster to sort that out locally, assuming the approach I am suggesting makes sense, of course.
Catch2 doesn't use clang-tidy. It is just a plain warning on a specific version of Clang.
OK, addressed, it's building now. Would it make sense to mention in contributing.md that Clang should be used?
Eh, the CI also uses GCC and MSVC, each across a bunch of different versions. Also, I will have to look for a better way of doing this; the current proposal would have significant runtime overhead... I have 2 ideas right now
@jbytheway Ping.
Sure, I would also use a shorter string. 2^64/1099511628211 ≈ 2^24 (that prime is just above 2^40), so if this estimate is at all reasonable (which I don't know 😬), a few characters should suffice. I did a quick test appending "abcd" and it seems to do the trick: the distribution of the 24 possible orderings looks reasonably uniform...
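Because FNV-1a consumes its input one byte at a time, appending a constant string does not require building a new string per test: the hasher can simply keep consuming the suffix bytes after the name. The sketch below assumes the standard 64-bit FNV-1a parameters, a caller-supplied starting basis, and a suffix such as "abcd"; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard 64-bit FNV-1a parameters.
constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ull;
constexpr std::uint64_t kFnvPrime = 1099511628211ull;

// Feeds the bytes of `text` into an FNV-1a state and returns the updated state.
std::uint64_t fnv1aUpdate(std::uint64_t hash, const std::string& text) {
    for (unsigned char c : text) {
        hash ^= c;
        hash *= kFnvPrime;
    }
    return hash;
}

// Hashes `name` followed by a constant `suffix`, starting from `basis`,
// without allocating the concatenated string.
std::uint64_t hashWithSuffix(const std::string& name, const std::string& suffix,
                             std::uint64_t basis) {
    return fnv1aUpdate(fnv1aUpdate(basis, name), suffix);
}
```

This yields exactly the same value as hashing the concatenation `name + suffix`, but avoids the per-test allocation and copy that a literal concatenation would cost.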
Yeah, I'd expect that mixing in just a couple of additional characters would suffice. But since this is arbitrary extra mixing, it's probably better to use full
That sounds better, yeah. Any suggestions for the value of the constant?
I played around with two large random numbers, and the spread was not great. I got better results with 4 numbers, even if they are not very large. I'll look into this and suggest a new proposal, unless someone has a clear idea :-D
OK, so here is a new proposal! The original authors of FNV suggest one recipe to improve dispersion: XOR the upper half of the hash with the lower half. This of course reduces the hash to 32 bits, but for the current purpose I guess that is more than enough. This is discussed in the Wikipedia article on the algorithm:
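The folding step on its own is a one-liner. This is a minimal sketch, assuming a 64-bit hash such as FNV-1a's as input; `xorFold` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// XOR-folds a 64-bit hash into 32 bits by combining its upper and lower halves.
std::uint32_t xorFold(std::uint64_t hash) {
    return static_cast<std::uint32_t>(hash ^ (hash >> 32));
}
```

In a multiplication-based hash like FNV, bit k of the product depends only on bits 0 to k of the value being multiplied, so the lower half is the weaker-mixed one; folding lets every output bit also depend on a better-mixed high bit.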
Okay, I think there is some perf budget that can be spent on extra hashing; I am thinking of 4 numbers drawn from the PRNG. Then there are 2 things to do
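One way to read the "4 numbers drawn from the PRNG" idea is to draw the numbers once per run from an engine seeded with the run's seed, then mix each one into the name's FNV-1a hash as an extra xor-and-multiply round. The sketch below assumes std::minstd_rand as the engine (chosen because its output is fully specified by the standard) and uses hypothetical names throughout; it is an illustration, not Catch2's code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <random>
#include <string>

constexpr std::uint64_t kFnvOffsetBasis = 14695981039346656037ull;
constexpr std::uint64_t kFnvPrime = 1099511628211ull;

using Mixers = std::array<std::uint64_t, 4>;

// Draws four mixing values once per run from an engine seeded with the run seed.
Mixers drawMixers(std::uint32_t seed) {
    std::minstd_rand rng(seed);
    Mixers mixers{};
    for (auto& m : mixers) {
        m = static_cast<std::uint64_t>(rng());
    }
    return mixers;
}

// FNV-1a over the test name, then one extra xor-and-multiply round per mixer.
std::uint64_t hashWithMixers(const std::string& name, const Mixers& mixers) {
    std::uint64_t hash = kFnvOffsetBasis;
    for (unsigned char c : name) {
        hash ^= c;
        hash *= kFnvPrime;
    }
    for (std::uint64_t m : mixers) {
        hash ^= m;
        hash *= kFnvPrime;
    }
    return hash;
}
```

XOR with a constant and multiplication by an odd prime modulo 2^64 are both bijections, so the extra rounds never merge two distinct name hashes; they only reshuffle them, at a cost of four rounds per test.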
I am going to open a new issue for 2). @loximann Do you want to make that change, or should I do it?
Sorry, which change do you mean? I can investigate 2, and for 1, now that the request seems reasonable, I was thinking I could do a more thorough analysis of the different alternatives before settling on one. How does this sound?
Right, sorry. I meant the change in 1, and I would definitely like to see a more thorough analysis.
Good, good, I'll get on with it! I'll post my results here at some point this week.
Alright, so this little thing turned out to be much more fun than expected, and I ended up doing an unnecessarily deep analysis: I tested a bunch of possibilities, and the one that worked best was the original FNV-1a algorithm, mixing in the seed as a suffix, followed by multiplication folding (multiplying the lower 32 bits by the upper 32 bits). Speed-wise it looks OK, and I would expect it to work perfectly fine for up to ~100 000 test cases. Even with more test cases, when hash collisions are sure to happen, I would wager that the impact is negligible for Catch2's randomization needs. I look forward to your comments; if there is anything else you think I should check, please let me know.
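The recipe described above can be sketched as follows. This is an illustration assuming the standard 64-bit FNV-1a parameters and a 64-bit seed, not the code that was merged; `seededTestHash` and `orderTests` are hypothetical names. Equal 32-bit keys are broken by name, so collisions can only reduce randomness, never make the order nondeterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// FNV-1a over the name, the seed mixed in as a final "suffix" round,
// then multiplication folding of the two 32-bit halves.
std::uint32_t seededTestHash(const std::string& name, std::uint64_t seed) {
    const std::uint64_t prime = 1099511628211ull;
    std::uint64_t hash = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (unsigned char c : name) {
        hash ^= c;
        hash *= prime;
    }
    hash ^= seed;  // seed used as a suffix
    hash *= prime;
    const std::uint64_t low = hash & 0xFFFFFFFFull;
    const std::uint64_t high = hash >> 32;
    // Multiply the halves in 64-bit arithmetic and keep the low 32 bits.
    return static_cast<std::uint32_t>(low * high);
}

// Orders test names by their seeded hash; equal hashes fall back to name order.
std::vector<std::string> orderTests(std::vector<std::string> names, std::uint64_t seed) {
    std::vector<std::pair<std::uint32_t, std::string>> keyed;
    keyed.reserve(names.size());
    for (auto& n : names) {
        const std::uint32_t key = seededTestHash(n, seed);  // hash before moving n
        keyed.emplace_back(key, std::move(n));
    }
    std::sort(keyed.begin(), keyed.end());
    std::vector<std::string> ordered;
    ordered.reserve(keyed.size());
    for (auto& kv : keyed) {
        ordered.push_back(std::move(kv.second));
    }
    return ordered;
}
```

Because each key depends only on the test's own name and the seed, a filtered subset of tests keeps the same relative order as in the full run, which is the subset-invariance property raised earlier in the thread.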
@loximann That looks really nice, good work. 👍 On the code:
Some notes on the text:
Thank you! And thanks for the nice comments! I'll definitely check out the microbenchmarking facilities; I had no idea they existed and they'll come in handy. Actually, I am calculating uncertainties, but they are much smaller than the variability I observe from run to run, so I decided to skip them. And I'll fix the build... I thought I had sprinkled enough static_casts around, but I guess I missed a spot. Any recommendations other than default_random_engine? minstd_rand?
OK, this looks promising. What is the policy for the final merge? Should I squash the commits? Anything else?
Hm, I am not quite sure what broke the last build attempt. The log only shows "An error occurred while generating the build script."
Just random Travis CI breakage. As for the PR, the preference is for small atomic commits, which for this one should be just 1, I guess.
I also investigated the (not) failing tests, and I've got nothing. If I intentionally break the ordering by doing plain Either the passing commit was lucky and the seeds just happened to work, or there is some weirder edge case that is harder to reproduce.
Yay! Could you point me to the test case that checks that? I could have a look.
Catch2/projects/CMakeLists.txt Lines 483 to 484 in dc7e705
https://github.com/catchorg/Catch2/blob/v2.x/projects/TestScripts/testRandomOrder.py
Description
Use a random number instead of a hash to sort the tests.
Motivation
In v2.x, test cases with names differing only in the last character are almost always run one after the other. For example, for 4 tests called "a1", "a2", "b1" and "b2", only 8 combinations out of the possible 24 were observed in 1000 runs with different seeds.
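The measurement above can be reproduced with a small harness that counts how many distinct orderings of a set of names appear across many seeds (4 names have 4! = 24 possible orderings). The hasher below, which uses a seed-derived 64-bit value as the FNV-1a starting state, is only an assumed approximation of the earlier scheme; std::mt19937_64 is used because its output is fully specified by the standard. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Hasher = std::function<std::uint64_t(const std::string&, std::uint32_t)>;

// Assumed approximation of a seed-dependent scheme: a 64-bit basis derived
// from the seed, then plain FNV-1a over the name.
std::uint64_t seededBasisHash(const std::string& name, std::uint32_t seed) {
    std::mt19937_64 rng(seed);
    std::uint64_t hash = rng();
    for (unsigned char c : name) {
        hash ^= c;
        hash *= 1099511628211ull;
    }
    return hash;
}

// Counts how many distinct orderings of `names` appear for seeds 1..runs.
std::size_t countDistinctOrders(const std::vector<std::string>& names,
                                const Hasher& hasher, std::uint32_t runs) {
    std::set<std::vector<std::string>> seen;
    for (std::uint32_t seed = 1; seed <= runs; ++seed) {
        std::vector<std::pair<std::uint64_t, std::string>> keyed;
        for (const auto& n : names) {
            keyed.emplace_back(hasher(n, seed), n);
        }
        std::sort(keyed.begin(), keyed.end());
        std::vector<std::string> order;
        for (const auto& kv : keyed) {
            order.push_back(kv.second);
        }
        seen.insert(order);
    }
    return seen.size();
}
```

Under this approximation, two names that differ only in their last byte reach the final multiplication with states that differ by a small amount d (at most 255), so their keys differ by d × 1099511628211 modulo 2^64, which is tiny compared with 2^64. Such names therefore almost always sort next to each other, consistent with the clustering described above.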