
1101.integration.xxh #2580

Merged 2 commits, Jun 2, 2017

Conversation

@enkore (Contributor) commented Jun 1, 2017

A cryptographic hash is slow because it provides guarantees that aren't required here. I don't want to use a CRC, since we already know from both theory and practice (their use in Repository) that CRCs are not that good at detecting storage errors. So xxh64 it is: specifically designed as a fast storage checksum over large data blocks (used in LZ4, zstd and various media/footage-management products).

(If we ever redo the Repository format, using it as the checksum there makes sense as well. The implementation is light on cache, unlike table-based CRC, yet as fast as crc32_clmul - and for longer blocks it's a bit faster, without requiring anything beyond plain C compiled for plain x64.)
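
For illustration, this is roughly how the vendored streaming API is driven; a minimal sketch, with the buffer, seed and output formatting chosen arbitrarily:

#include <stdio.h>
#include <string.h>
#include "xxhash.h"

int main(void)
{
    const char data[] = "some data to checksum";    /* illustrative input */
    XXH64_state_t *state = XXH64_createState();
    if (state == NULL)
        return 1;
    XXH64_reset(state, 0);                          /* seed 0 */
    XXH64_update(state, data, strlen(data));        /* call repeatedly for streamed data */
    XXH64_hash_t hash = XXH64_digest(state);        /* can also be read mid-stream */
    XXH64_freeState(state);
    printf("xxh64: %016llx\n", (unsigned long long)hash);
    return 0;
}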

Part of my ongoing series to realize #1101 (#1688 #2496 #2502 #2568)

@ThomasWaldmann (Member) commented Jun 1, 2017

Hmm, from a "keep the amount of code we have to care for minimal" perspective:

How about using blake2b? It's also faster than sha512, and we already depend on it anyway (and we already have code taking it from an external lib, if available).

As we currently use it only for the index files, it is (due to the lower volume, some GBs) not as speed-critical as the checksums for the backup data written into the repository (TBs).

I'd also like something better than crc32 for the repo, but that is not going to happen soon - so we could delay adding xxhash until we decide to change the repo format (borg 2.0), if we feel the added speed over blake2b is worth it.

@ThomasWaldmann (Member):

As a general note: I vaguely remember reading that the size of the checksum (relative to the size of the data being checksummed) matters.

So, e.g., it is bad to use a big checksum/hash (e.g. 256 or 512 bit) if there is only little data (a few bytes, like a chunk header).

@enkore (Contributor, author) commented Jun 1, 2017

chunks.archive.d is large (x-xxx GB) and generating / reading it is already CPU-constrained. Blake2b is a bit faster than SHA-512, but not by much - in the implementation we're using - and, more importantly, not universally. Edit: this may become a bit less relevant with 1.2, if such operations are parallelized.

I'm open to alternative suggestions, but none of the algorithms we use so far is a convincingly good fit. Though I could live with Blake2. I don't use small iron; I just try to be considerate towards those who do.

As a general note: I vaguely remember reading some stuff about that the size of the checksum (relative to the size of the data being checksummed) is important.

That's correct. For a channel with a given BER, each checksum has an optimal length at which the system error rate (undetected corruptions plus false alarms caused by corruption of the checksum itself) is minimal.
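
A back-of-the-envelope model of that trade-off (my own sketch, assuming independent bit errors with rate $p$, a $k$-bit block, and an ideal $n$-bit checksum with collision probability $2^{-n}$):

\begin{aligned}
P_{\mathrm{fa}} &\approx 1-(1-p)^{n} \approx np && \text{(checksum corrupted, data intact)} \\
P_{\mathrm{ud}} &\approx \bigl(1-(1-p)^{k}\bigr)\,2^{-n} \approx kp\,2^{-n} && \text{(data corrupted, checksum collides)} \\
F(n) &= np + kp\,2^{-n}, \qquad F'(n) = p - kp\ln(2)\,2^{-n} = 0 \;\Rightarrow\; n^{*} = \log_2(k\ln 2)
\end{aligned}

So in this toy model the optimal checksum length grows only logarithmically with the block size, which is why an oversized hash on a tiny record buys nothing and only adds failure surface.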

@codecov-io commented Jun 1, 2017

Codecov Report

Merging #2580 into master will increase coverage by 0.03%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #2580      +/-   ##
==========================================
+ Coverage   83.62%   83.66%   +0.03%     
==========================================
  Files          22       22              
  Lines        8166     8183      +17     
  Branches     1390     1391       +1     
==========================================
+ Hits         6829     6846      +17     
  Misses        956      956              
  Partials      381      381
Impacted Files Coverage Δ
src/borg/repository.py 87.11% <100%> (ø) ⬆️
src/borg/crypto/file_integrity.py 97.24% <100%> (+0.36%) ⬆️
src/borg/archiver.py 83.01% <100%> (ø) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 23d591c...6c91a75.

@ThomasWaldmann (Member) commented Jun 1, 2017

@enkore do you have throughput numbers for the sha512 / blake2b / xxhash code as used by us?

oh, and i forgot about chunks.archive.d... - yeah, that is more volume, unfortunately.

@ThomasWaldmann (Member):

hmm, is there no libxxh or something for this we could use alternatively?
debian, you know, likes to rip out bundled 3rd party code...

@enkore (Contributor, author) commented Jun 1, 2017

With SHA512, merging indices becomes about 75 % slower (80 s) compared to XXH64 (45.5 s). I didn't test with BLAKE2b (because I'd have to write a new streaming wrapper for it to do so); rule of three from the numbers we know works out to about 55 % slower. Baseline, without checking, is 44 s.

I made another series of tests to look at the difference between base (no checking) and XXH64. This is quicker to do, since I can just patch the DICF usage out of the cache (no need to re-generate the cache). The difference (with fairly low variance) comes out to 1.7 s extra CPU time, or close to +4 %.

That works out to about 12 GB/s inside the application. Simpler testing suggests 15 GB/s, the same as crc32_clmul; the difference is likely because during the sync a whole lot of other code and data is hit, competing for cache, and streaming has higher overhead (it requires additional logic and buffer management on the function's part).

The data set is basically the same one I used in #2572, and I also merged #2572 into a temporary branch to create the chunks.archive.d; otherwise this testing would take a lot longer...

The relative impact on writing chunks.archive.d is of course lower, since that process simply takes much longer overall. On the other hand, a file in chunks.archive.d is written only once, but read more than once: for every cache sync until the corresponding archive is deleted.

@enkore (Contributor, author) commented Jun 1, 2017

hmm, is there no libxxh or something for this we could use alternatively?

Not that I'm aware of. There is a PyPI package, which also vendors the code, but that would be a pip dependency (still no OS packages), and I'd have to review that code to see whether it's bad or not. I don't have to do that here, since I know this code is good.

As an aside: for many users of XXH it would not make much sense to use a shared library; in other words, it would be bad to do so. It's typical for checksumming to be stitched into other processing: e.g. a compressor would stitch together the checksum and the copying/writing to the output buffer, to avoid extra function calls. (This is essentially the same technique that's used in authenticated crypto for fused EtM, and also by AEADs like AES-GCM or Chapoly.)
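
A minimal sketch of that stitching pattern (illustrative only - the function name and block size are made up, and this is not how this PR integrates XXH64):

#include <string.h>
#include "xxhash.h"

/* Fold the checksum into the copy loop so each block is hashed while
 * still hot in cache, instead of making a second pass over the data. */
static void copy_and_checksum(char *dst, const char *src, size_t len,
                              XXH64_state_t *state)
{
    const size_t block = 64 * 1024;
    for (size_t off = 0; off < len; off += block) {
        size_t n = len - off < block ? len - off : block;
        memcpy(dst + off, src + off, n);
        XXH64_update(state, dst + off, n);   /* data still in cache */
    }
}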

@ThomasWaldmann (Member) left a comment

can we have a test on ppc (BE) and some 32bit system before merging this?

XXH64_hash_t XXH64_digest (const XXH64_state_t* statePtr);

void XXH64_canonicalFromHash(XXH64_canonical_t* dst, XXH64_hash_t hash);
XXH64_hash_t XXH64_hashFromCanonical(const XXH64_canonical_t* src);
@ThomasWaldmann (Member):

can we have consistent style? compare 30..32 and 34..35 (and other places).

@enkore (Contributor, author) Jun 1, 2017:

You mean the parentheses placement?

@ThomasWaldmann (Member):

yes, or more in general: futile attempts to align things.

#define XXH_STATIC_ASSERT(c) { enum { XXH_static_assert = 1/(int)(!!(c)) }; } /* use only *after* variable declarations */
XXH_PUBLIC_API unsigned XXH_versionNumber (void) { return XXH_VERSION_NUMBER; }

#ifndef XXH_NO_LONG_LONG
@ThomasWaldmann (Member):

what if that is defined? do we have to care?

@enkore (Contributor, author):

If we defined this configure-like option, then XXH64 would not be compiled in.

@ThomasWaldmann (Member):

but it is not like we lose XXH64 on some platform / compiler we otherwise support?

@enkore (Contributor, author) Jun 1, 2017:

No.

(Note: Borg already uses xxhash by way of LZ4, which uses xxh32 for its checksum and contains a verbatim copy of the xxhash sources in its source tree.)

@enkore (Contributor, author) Jun 1, 2017:

I'll run a test on PPC; I don't expect any surprises, though. I don't think I operate x86 any more, so I can't easily test that one now. It does work stand-alone with -m32.
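
(On endianness specifically: the canonical form quoted above serializes the hash into a fixed big-endian byte order, so digests are comparable across LE/BE hosts. A quick sanity check, with an arbitrary illustrative input:)

#include <stdio.h>
#include "xxhash.h"

int main(void)
{
    XXH64_hash_t h = XXH64("endianness test", 15, 0);
    XXH64_canonical_t canon;
    XXH64_canonicalFromHash(&canon, h);      /* fixed big-endian byte order */
    for (int i = 0; i < 8; i++)              /* same bytes on LE and BE hosts */
        printf("%02x", canon.digest[i]);
    printf("\n");
    return XXH64_hashFromCanonical(&canon) != h;   /* round-trips to the same value */
}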

@enkore (Contributor, author):

Ok.

@enkore (Contributor, author) Jun 1, 2017:

FUSE tests fail for me on wheezy32 (looks like a missing modprobe); everything else is OK, incl. checksums.

@ThomasWaldmann (Member) Jun 2, 2017:

the problem here for the vagrant wheezy32 machine is that a reboot is required, as the fuse module does not fit the running kernel after security updates.

@ThomasWaldmann (Member):

ppc BE test was also ok?

@enkore (Contributor, author):

Yes (see above)

@enkore enkore force-pushed the issue/1101.integration.xxh branch from 68fa1c7 to 6c91a75 on June 1, 2017 19:26
@ThomasWaldmann (Member) left a comment

lgtm

@enkore enkore merged commit 4789407 into borgbackup:master Jun 2, 2017
@enkore enkore deleted the issue/1101.integration.xxh branch June 2, 2017 14:23