-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Paranoia settings are a bit confused #261
Comments
You're right there. Forgot to change it in that place.
Truth be told, I'm still no big fan of that bastard hash. From my point of view it neither adds real security, nor performance advantages. Personally I'd even vote to make a single |
Agreed, was thinking this also; anyone who can't live with the risk of 512 bit hash collisions isn't going to be satisfied by 1024 bits either. Let's drop |
Here's a comparison table based on our standard /usr test with warm caches, R5 1600 cpu:
Another thing I came across in the process when reading about hash collision attacks: if I'm reading them correctly, if two files have identical murmur hashes then they will also have identical seeded murmur hashes (http://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/). So our MURMUR256 and MURMUR512 hashes (which are just multiple MURMUR128 hashes with different seeds) actually offer no better collision resistance than MURMUR128. I suspect the same may also be true for our CITY256 and CITY512 concoctions... |
Cool table! 👍 I guess your point here is also that paranoid is pretty much as fast as
You're right here, but that's due the way we seeded the initial hash. In the end it's 4 times the same hash xor'd with the initial seed value. We'd have to do the seeding like for the sha1 digests (i.e. feed murmur with a constant seed in the beginning for the second half in case of murmur256). But after all, I also don't mind if we remove those "creations" from our codebase. I guess we support by far enough established hash functions. |
Checksum code cleanup well underway: https://github.com/SeeSpotRun/rmlint/tree/checksum-dev |
Hmmm, seems that none of our non-cryptographic hashes have correct streaming implementations. Feeding the data to the hasher in several increments gives a different checksum to when the same data is hashed all at once. The current strategy for these hashes is to use the result from the previous increment as seed for the next increment. That's probably still going to give a hash result with the desirable properties of randomness and collision resistance, but there's really no way to be sure. For murmur in particular there is an issue because it only uses the first 32 bits of the previous increment as seed for the next. So files which differ at the start but have identical endings will have a 1 in 2^32 chance of hash collision instead of the expected 1 in 2^128. My thought is we should drop most of these hashes from
The excellent analysis at https://github.com/rurban/smhasher gives some good clues on which non-crypt hash to go with. |
I generally agree, but should note that hash functions generally work blockwise anyways. Streaming implementations usually do nothing more than a XOR (or similar) of the previous block value with the current block value.
True, that's a problem.
I don't mind dropping some less used ones (like farmhash/city e.g.). But instead of doing implementing "streaming" (which is not a native concept and merely an API concept, as said above) we could do also the following:
Both options are pretty general and work for each non-crypotgraphic hash we have.
Very nice link. |
I've taken this further in https://github.com/SeeSpotRun/rmlint/tree/checksum-dev. I converted murmur and xxhash to streaming form. They have fairly small state structures (40 & 88 bytes respectively). I ditched spooky, city and farmhash because they required large state structs and didn't offer any particular advantages. Based on https://github.com/rurban/smhasher the best-performed 128-bit hash is metro hash, so I have added that to Test results for a synthetic test (256 pairs of duplicate ~4MB sparse files grouped into 32 size groups of 16 files each; 2 cpu cores):
The metro hashes live up to the promise of https://github.com/rurban/smhasher. The 256 and 128 bit variants are my choices for The sha3 results are disappointingly slow, despite the speed results at https://www.linkedin.com/pulse/sha-3-shake-keccak-final-version-william-buchanan. I suspect there is a flaw in our implementation (either the hash function itself or the way we manage threads and buffering). Running Sha512 is way faster than blake and looks like a good candidate for default hash. In which case there is no good candidate for -p other than paranoid. The paranoid hash would be a good candidate for the default hash except that: |
Well even after copying the sha3 code directly from rhash source, sha3 was still extremely slow. So I had a look at the compiler settings and realised that no compiler optimisation is applied with Lines 613 to 618 in dcf23a0
Anyway with -Os the times come down to something more reasonable, and can match the rhash sha3 times if I compile with -O2 .
For the same synthetic benchmark as above:
The 5 hashes highlighted with EDIT: significantly better sha3 performance if compiled with clang vs gcc (clang times added above) |
Definitely. Using
I like your benchmarks, but keep in mind that the performance here is pretty hard to measure. It also depends a lot on the CPU that you are using (especially the instruction set), the chose compiler (you noticed that), what settings you pass to the compiler and also how optimized the implementation is (you tied different ones, but one might perform better on Intel than on AMD). Still you're doing very good work to get the much needed "feeling" for performance here.
This already sounds very reasonable to me (with |
Ah ok thanks for clarifying |
I guess we can close this since #265 was merged. |
Hello, $ rmlint -pp
|
It's still there. From the manpage:
|
thank you, it was confusing since -pp is still in the archlinux manpage: v 2.7.0-1 -p --paranoid / -P --less-paranoid (default) It sounded like -pp was more secure than -p (paranoid) |
Current hashes for different paranoia levels are:
The
-p
option is actually less secure than the default option, due to the default being upgraded to blake2b.Branch https://github.com/SeeSpotRun/rmlint/tree/paranoia proposes a new sequence for discussion. This includes a re-write of bastard hash to provide a more secure option than blake2b:
The text was updated successfully, but these errors were encountered: