1101.integration.xxh #2580
Hmm, from a "keep code amount we have to care for minimal" perspective: How about using blake2b? It's also faster than sha512 and we already depend on it anyway (and we already have code taking it from an external lib, if available). As we use it for the index files only currently, it (due to lower volume, some GBs) is not as speed critical as the checksums for the backup data written into the repository (TBs). I'd also like something better than crc32 for the repo, but that is not going to happen soon - so we could delay adding xxhash until we decide to change the repo format (borg 2.0) if we feel the added speed over blake2b is worth it.
As a general note: I vaguely remember reading that the size of the checksum (relative to the size of the data being checksummed) is important. So, e.g., it is bad to use a big checksum/hash (e.g. 256 or 512bit) if there is only little data (few bytes, like e.g. a chunk header).
chunks.archive.d is large (x-xxx GB) and generation / reading is already CPU constrained. Blake2b is a bit faster than SHA-512, but not by much in the implementation we're using and, more importantly, not universally. Edit: This may become a bit less relevant with 1.2 if such operations were parallelized. I'm open to alternate suggestions, but none of the algorithms we are using so far are a convincingly good fit. Though I could live with Blake2. I don't use small iron; I just try to be considerate towards those who do.
That's correct. Considering a channel with a defined BER, an optimal length exists for each individual checksum where the system BER (sum of the true positives and false positives due to corrupted checksum) is minimal.
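To make the checksum-length trade-off concrete, here is a minimal sketch under a toy model that is not from this thread: bits flip independently with probability `ber`, a corrupted checksum over intact data counts as a false alarm, and corrupted data slips past a `c`-bit checksum with probability `2**-c`. The function names and the example bit error rate are made up for illustration.

```python
import math


def failure_probability(data_bits, checksum_bits, ber):
    """P(false alarm) + P(undetected error) for one block in the toy model."""
    # Probability that at least one of n independent bits flips: 1 - (1 - ber)**n,
    # computed with expm1/log1p to stay accurate for tiny bit error rates.
    p_data_bad = -math.expm1(data_bits * math.log1p(-ber))
    p_sum_bad = -math.expm1(checksum_bits * math.log1p(-ber))
    # Data intact, but the stored checksum itself got corrupted.
    false_alarm = (1.0 - p_data_bad) * p_sum_bad
    # Data corrupted, and the checksum happens to match anyway (idealized).
    undetected = p_data_bad * 2.0 ** -checksum_bits
    return false_alarm + undetected


def best_checksum_bits(data_bits, ber, candidates=range(1, 129)):
    """Checksum length (in bits) that minimizes the combined failure probability."""
    return min(candidates, key=lambda c: failure_probability(data_bits, c, ber))


if __name__ == "__main__":
    for label, bits in (("8-byte header", 64), ("8 MiB chunk", 8 * 2**20 * 8)):
        print(label, best_checksum_bits(bits, ber=1e-12))
```

In this model the best length grows with the amount of data covered, which matches the remark that a 256 or 512 bit hash is oversized for a few header bytes. Real checksums and real storage errors do not follow the model exactly.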
Codecov Report
@@            Coverage Diff            @@
##           master    #2580     +/-   ##
=========================================
+ Coverage   83.62%   83.66%   +0.03%
=========================================
  Files          22       22
  Lines        8166     8183     +17
  Branches     1390     1391      +1
=========================================
+ Hits         6829     6846     +17
  Misses        956      956
  Partials      381      381
@enkore do you have throughput numbers of the sha512 / blake2b / xxhash code as used by us? oh, and i forgot about chunks.archive.d... - yeah, that is more volume, unfortunately.
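For anyone who wants rough numbers on their own machine, a minimal sketch of such a throughput comparison follows. It uses only what the Python standard library ships (`hashlib.sha512`, `hashlib.blake2b`, `zlib.crc32`), so it does not measure xxh64 and it does not measure the code paths Borg actually uses; the helper names and the buffer size are assumptions for illustration.

```python
import hashlib
import time
import zlib


def throughput_mb_s(func, data, repeat=5):
    """Best-of-N throughput of func(data) in MB/s (1 MB = 10**6 bytes)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


def benchmark(size=16 * 1024 * 1024):
    """Measure one-shot hashing of a zero-filled buffer of the given size."""
    data = bytes(size)
    candidates = {
        "sha512": lambda d: hashlib.sha512(d).digest(),
        "blake2b": lambda d: hashlib.blake2b(d).digest(),
        "crc32": lambda d: zlib.crc32(d),
    }
    return {name: throughput_mb_s(func, data) for name, func in candidates.items()}


if __name__ == "__main__":
    for name, rate in benchmark().items():
        print(f"{name:8s} {rate:10.1f} MB/s")
```

Results depend heavily on CPU, build options, and the library backing `hashlib`, so they only indicate relative order on one machine.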
hmm, is there no libxxh or something for this we could use alternatively?
With SHA512 merging indices becomes about 75 % (80 s) slower compared […]
I made another series of tests to look at the difference between base […]
That works out to about 12 GB/s inside the application. Simpler […]
The data set is basically the same I used in #2572 and I also merged […]
The relative impact on writing chunks.archive.d is of course lower […]
Not that I'm aware of. There is a PyPI package which also vendors the code, but that'd be a pip dependency (still no OS packages) and I'd have to review the code to see whether it's bad or not. I don't have to do that here, since I know it's good. As an aside, for many users of XXH it would not make much sense to use a shared library, or, in other words, it would be bad to do so. It's typical for checksumming to be stitched into other processing, e.g. if you're a compressor, you'd stitch the checksum and copying/writing to the output buffer; you want to avoid extra function calls there. (This is essentially the same technique that's used in authenticated crypto for fused EtM, and also by AEADs like AES-GCM or Chapoly).
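The "stitched" structure can be sketched as a single pass that feeds each chunk to the checksum and to the output in the same loop. This is only an illustration of the data flow: `copy_with_checksum` is a hypothetical helper, `hashlib.blake2b` stands in for xxh64 (which the standard library does not provide), and in Python every step is still a function call, so the inlining benefit described above does not show up here.

```python
import hashlib
import io


def copy_with_checksum(src, dst, hasher, chunk_size=64 * 1024):
    """Copy src to dst while updating hasher, touching each chunk only once."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)  # checksum the chunk while it is at hand
        dst.write(chunk)      # then hand the same chunk to the output
    return hasher.digest()


if __name__ == "__main__":
    payload = b"index data " * 10000
    out = io.BytesIO()
    digest = copy_with_checksum(io.BytesIO(payload), out, hashlib.blake2b())
    print(digest.hex()[:16], len(out.getvalue()))
```

The streaming `update` / `digest` shape mirrors the `XXH64_update` / `XXH64_digest` style of API: the digest over the chunks equals the digest over the whole buffer.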
can we have a test on ppc (BE) and some 32bit system before merging this?
src/borg/algorithms/checksums.pyx
XXH64_hash_t XXH64_digest (const XXH64_state_t* statePtr);

void XXH64_canonicalFromHash(XXH64_canonical_t* dst, XXH64_hash_t hash);
XXH64_hash_t XXH64_hashFromCanonical(const XXH64_canonical_t* src);
can we have consistent style? compare 30..32 and 34..35 (and other places).
You mean the parentheses placement?
yes, or more in general: futile attempts to align things.
#define XXH_STATIC_ASSERT(c) { enum { XXH_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
XXH_PUBLIC_API unsigned XXH_versionNumber (void) { return XXH_VERSION_NUMBER; }

#ifndef XXH_NO_LONG_LONG
what if that is defined? do we have to care?
If we defined this configure-like option, XXH64 would not be compiled in.
but it is not like we lose XXH64 on some platform / compiler we otherwise support?
No.
(Note: Borg already uses xxhash by way of LZ4 which uses xxh32 for its checksum, and contains a verbatim copy of the xxhash sources in its source tree)
I'll run a test on PPC, I don't expect any surprises though. I don't think I operate x86 any more, so can't test that one easily now. It does work stand-alone with -m32.
Ok.
FUSE tests fail for me in wheezy32 (looks like a missing modprobe), everything else is ok, incl. checksums.
the problem here for the vagrant wheezy32 machine is that a reboot is required as the fuse module does not fit the running kernel after security updates.
ppc BE test was also ok?
Yes (see above)
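For context on why big-endian and 32-bit tests matter here: a 64-bit hash stored as raw native integer bytes would differ between LE and BE hosts, which is what the canonical form (`XXH64_canonicalFromHash` / `XXH64_hashFromCanonical`) avoids by fixing a big-endian byte order. A minimal Python sketch of the same idea, with hypothetical function names that only mirror the C API:

```python
import struct


def canonical_from_hash(value):
    """Serialize a 64-bit hash value as 8 bytes in big-endian (canonical) order."""
    return struct.pack(">Q", value)


def hash_from_canonical(raw):
    """Parse 8 canonical (big-endian) bytes back into a 64-bit hash value."""
    return struct.unpack(">Q", raw)[0]


if __name__ == "__main__":
    h = 0x0102030405060708
    print(canonical_from_hash(h).hex())   # same bytes on any host
    print(struct.pack("<Q", h).hex())     # little-endian layout, for contrast
```

Using an explicit byte order in `struct` makes the on-disk bytes independent of the host, so a file written on x86 verifies on PPC and the other way round.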
68fa1c7 to 6c91a75
lgtm
A cryptographic hash is slow for providing features not required here. I don't want to use a CRC since we already know from both theory and practice (use in Repository) that they are not that good at detecting storage errors. So xxh64 it is, specifically designed as a fast storage checksum over large data blocks (used in LZ4, zstd and various media/footage-management products).
(If we ever redo the Repository format, using it as a checksum instead makes sense. The implementation is light on cache, unlike table-based CRC, but as fast as crc32_clmul - and for longer blocks it's a bit faster, without requiring more than plain C compiled to plain x64).
Part of my ongoing series to realize #1101 (#1688 #2496 #2502 #2568)