Chunk refcount limited to 2**31 (4e10) #169

Closed
sourcejedi opened this issue Aug 29, 2015 · 6 comments

@sourcejedi
Contributor

We need to 1) check for overflow and 2) raise the limit.

I think this limit is a bit tight, and we need to add a quick check for this overflow :). (Note that index implementation details may interfere a little: I think the bit patterns 0xffffffff and 0xfffffffe are not allowed to be used anyway, and there is no check for that either.)

To fix it I think we only need to change the index format. In theory the index can be rebuilt if your attic doesn't recognize it, so in some sense we'd already have backwards and forwards compatibility. (I know attic has some repair code, though I haven't tried just deleting the index and seeing what happens yet :).

@ThomasWaldmann pointed out that my test results indicate very large chunks (8M, the maximum size?) for the zero data. (Which also sounds suspicious to me, but hey :). That should mitigate the problem somewhat. OTOH I believe the chunker is keyed when encryption is used, so the size should be variable.

To overflow the refcount in the chunk index, you need 2^31 references at ~64 KiB per chunk (of zeros or another repeating pattern, again assuming the default chunk size).

There's not a big margin there, so we need to make sure we have an overflow check. For example, you can include a 500 GB sparse VM image (mostly zero) in the backup. If it never changes then "incremental" backups are plenty fast, but you overflow after roughly 256 daily backups.

See also: jborg/attic#315

@ThomasWaldmann ThomasWaldmann changed the title Chunk refcount limited to 2**31 (4e10). Need to 1) check for overflow 2) raise the limit Chunk refcount limited to 2**31 (4e10) Aug 29, 2015
@ThomasWaldmann
Member

True about the chunker table being randomly keyed with encryption.

I don't find it suspicious that lots of zeros give 8 MB chunks. The window contents of the rolling hash will always be all-zero in such a case, and the probability that some specific data has a hash value ending in N zero bits is about 1 in 2^N.

@ThomasWaldmann
Member

Some other ideas:

  • if it is int32_t now, we can check about using uint32_t (not that it helps much, but still).
  • overflow/range check: yes, we should have that ASAP.
  • users could still continue in a fresh, empty repo if we detect/avoid an overflow (and just abort / rollback then).

@ThomasWaldmann
Member

2^31 * 2^23 bytes = 2^54 bytes = 16 PiB of zeros could be a problem.
But I guess if you have that many zeros in your files and you even back them up, then you have other, bigger problems. :)

@verygreen
Contributor

Well, sparse files! (Sparse files are not yet handled in a real way by borg, and even once they are, the same problem might still exist depending on the implementation.) Also raw block device backups, where sparse handling would not help.

Also, I'm not sure the refcount value needs to be all that accurate. Typically the only decision is whether it is 0 or not, i.e. whether the chunk can be deleted.

So just have a "magic" value of -1 (or whatever else is nonsensical), and every time you would cross over (overflow), forcefully set it to the magic value. Never add or subtract anything from the magic value, so it just means "this is a really in-high-demand chunk, never delete it".

Similar things are done in filesystems for directory refcounts. Typically, as you create a subdirectory, the parent directory's nlink is incremented; but to avoid the pretty low 32k subdirectory limit, once it is crossed, filesystem code often sets the parent's nlink to 1, which in itself is nonsensical for a directory and means "the nlink for this directory is not accurate, but who cares, just don't let it go to 0".

@verygreen
Contributor

Speaking of overflows, I just realized there are more (worth having a separate ticket for that?): when chunk counts are converted to bytes and you have many chunks, something inside overflows and shows negative numbers, which is nonsensical; better handling would go a long way.
This is evidenced in #886, for example.

@enkore
Contributor

enkore commented Apr 16, 2016

Fixed by #891, since the refcount cannot overflow anymore. The limit is also doubled now (although you'll still find it tight, if 2**31-1 was tight before ;)).

Merged into 1.0-maint, released in 1.0.2, and already merged back into master. #925 updates a last remainder of this in master.

@enkore enkore closed this as completed Apr 16, 2016