Data integrity error after running out of disk space #356
The error message doesn't contain much information, but most likely either a segment file or the repo index was corrupted or left incomplete because the disk was full. You could try the following (if the data in that repo is important, make a backup of the repo first):
If that did not help, you could retry like this:
In general, avoid running out of free space (or even getting close to it).
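Before any repair attempt, the repository copy recommended above could look like the following minimal Python sketch. `backup_repository` and `repo_size` are hypothetical helpers, not part of attic. The sketch refuses to start when the destination filesystem lacks room for a full copy, because a second out-of-space failure during the backup would not help.

```python
import shutil
import time
from pathlib import Path


def repo_size(repo: Path) -> int:
    # Total size of all regular files below the repository directory.
    return sum(f.stat().st_size for f in repo.rglob("*") if f.is_file())


def backup_repository(repo, backup_root) -> Path:
    """Hypothetical helper: copy a repository directory before repairing it."""
    repo, backup_root = Path(repo), Path(backup_root)
    if not repo.is_dir():
        raise NotADirectoryError(f"not a directory: {repo}")
    backup_root.mkdir(parents=True, exist_ok=True)
    # Refuse to start if the copy cannot fit; a half-written backup is of little use.
    needed = repo_size(repo)
    free = shutil.disk_usage(backup_root).free
    if needed >= free:
        raise OSError(f"need {needed} bytes in {backup_root}, only {free} free")
    # Timestamped name so repeated backups do not overwrite each other.
    target = backup_root / f"{repo.name}-backup-{time.strftime('%Y%m%d-%H%M%S')}"
    shutil.copytree(repo, target)  # raises FileExistsError if target already exists
    return target
```

For example, `backup_repository("/path/to/repo", "/other/disk/backups")` (hypothetical paths) returns the path of the new copy. Ideally the backup root is on a different disk than the full one.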
Should I go ahead and remove index.* and hints.* and run
Yes, try.
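Instead of deleting index.* and hints.* outright, a more cautious variant moves them to a directory outside the repository, so they can be put back if the repair goes wrong. This is a minimal Python sketch: `move_index_and_hints` is hypothetical, and it assumes index.N and hints.N files sit in the repository root as the question implies. The repair run itself is not shown.

```python
import shutil
from pathlib import Path
from typing import List


def move_index_and_hints(repo, quarantine) -> List[Path]:
    """Hypothetical helper: move index.* and hints.* out of the repo root."""
    repo, quarantine = Path(repo), Path(quarantine)
    quarantine.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in ("index.*", "hints.*"):
        # Only the repository root is globbed; segment files under data/ stay put.
        for f in sorted(repo.glob(pattern)):
            if f.is_file():
                dest = quarantine / f.name
                shutil.move(str(f), str(dest))
                moved.append(dest)
    return moved
```

The quarantine directory should live outside the repository, so the moved files are not picked up as repository content.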
Hmm, it seems to crash in msgpack (a third-party library used to pack/unpack binary data). About the archives: you could try deleting all archives named "*.checkpoint". These are intermediate archives made while a backup is running, and they are superseded once the backup reaches its end. Also, I see Debian Jessie ships msgpack 0.4.2, which might be an issue too (see other issues here). A general question: is there important data to recover from this corrupted repo, or could you just start from scratch using more recent code?
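Since "*.checkpoint" matches archive names, not files on disk, selecting the checkpoint archives is a name filter over the archive list. The Python sketch below assumes the names have already been obtained (for example from `attic list REPO`). Both helpers are hypothetical, and `delete_commands` only builds `attic delete REPO::ARCHIVE` command lines without running them.

```python
from fnmatch import fnmatchcase
from typing import List


def checkpoint_archives(archive_names: List[str]) -> List[str]:
    # fnmatchcase: case-sensitive glob match, identical on every platform.
    return [n for n in archive_names if fnmatchcase(n, "*.checkpoint")]


def delete_commands(repo: str, names: List[str]) -> List[List[str]]:
    # One "attic delete REPO::ARCHIVE" command per archive (not executed here).
    return [["attic", "delete", f"{repo}::{n}"] for n in names]
```

Printing the commands first and running them only after a manual review keeps a typo from deleting a regular archive.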
There do not seem to be any *.checkpoint files. This repository contains redundant data created with my conversion script rdiff-backup2attic, so I could regenerate it, but I would rather not, because the conversion took over a week. More importantly, if it's possible to run into unrecoverable errors with attic, then I don't think I'll dare use it for real :( It seems the newest version of attic in Debian is 0.13. I think I will file a bug downstream as well. What do you think they could do about fixing this issue in Debian Jessie? AFAIK, version upgrades are typically not done in Debian Stable, but fixes can be cherry-picked/backported.
*.checkpoint is an archive name; see your log output above. Trying a newer version of both attic AND msgpack may help (but that is not certain; to be sure, we would need to point at the changeset that fixed your issue). If a binary release of attic works on your system, you could try that rather easily. But I am not even sure how exactly it produced the "unhashable type: list" error it is now failing on, or how best to deal with it. I guess that would need a debugging session on a system with the source code and the corrupt data set. You could also try borgbackup: it sometimes gives better error messages and has more fixes applied than attic, but I am not sure whether your issue is fixed there. Trying to convert a corrupt repository would also be something new and rather adventurous, with an unknown outcome.
I guess trying a newer attic version is up next.
Attic 0.16:
So attic 0.13 (the version in Debian Jessie) can corrupt the repository so badly that not even the latest attic version can recover it. This seems like a grave issue to me.
Well, we can't do much about Debian packaging old releases and then sticking to them; that is just their usual policy for "stable". Sometimes they make exceptions, and this might well be such a case (but as the attic developers do not make or maintain these packages, one would need to take this to the Debian packagers). But before doing that, it would be really useful to identify the root cause and create a fix (if possible). See above for my offer of a debugging session.
I'm afraid I can't provide you with access to the data or a debugging session. But I imagine it shouldn't be difficult to reproduce, and running out of disk space is something a backup tool should certainly be tested for (and recover from). If it helps: while the repository is 15 GiB, its 229 increments contain very few changes, mostly added data. I don't know how attic works internally, but I can imagine it ran out of disk space while writing metadata rather than actual new data. So to reproduce it, I would try to generate lots of increments with very few data changes on a partition with an almost full disk.
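The reproduction idea (many increments, each with a tiny additive change) could be scripted roughly as follows. This Python sketch is hypothetical: `make_increments` takes the backup step as a callable so it can be exercised without attic, and `attic_create` assumes the `attic create REPO::ARCHIVE PATH` command line with an attic binary on PATH. The repository itself would live on the nearly full partition.

```python
import subprocess
from pathlib import Path
from typing import Callable


def attic_create(repo: str, archive: str, src: Path) -> None:
    # Assumes attic is on PATH; check=True raises CalledProcessError on failure.
    subprocess.run(["attic", "create", f"{repo}::{archive}", str(src)], check=True)


def make_increments(src, count: int, backup: Callable[[str, Path], None]) -> None:
    """Hypothetical helper: one backup after each tiny, purely additive change."""
    src = Path(src)
    src.mkdir(parents=True, exist_ok=True)
    log = src / "changes.txt"
    for i in range(count):
        with log.open("a") as f:
            f.write(f"change {i}\n")  # append one short line per increment
        backup(f"increment-{i:04d}", src)
```

For example, `make_increments(Path("/tmp/src"), 229, lambda name, s: attic_create("/mnt/small/repo", name, s))` would mirror the 229 increments mentioned above. Once the disk fills up, `check=True` makes the loop stop at the first failed run.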
One way to test a near-full disk condition is to use loopback files:
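A minimal Python sketch of such a setup, assuming Linux with coreutils `truncate`, e2fsprogs `mkfs.ext4`, and `sudo` for the loop mount. The image size, paths, and helper names are hypothetical. Building the command lines separately from running them makes them easy to inspect first.

```python
import subprocess
from typing import List


def loopback_commands(image: str, size_mb: int, mountpoint: str) -> List[List[str]]:
    """Hypothetical helper: commands for a small fixed-size filesystem image."""
    return [
        ["truncate", "-s", f"{size_mb}M", image],  # sparse file of the given size
        ["mkfs.ext4", "-q", "-F", image],          # -F: allow a regular file as target
        ["mkdir", "-p", mountpoint],
        ["sudo", "mount", "-o", "loop", image, mountpoint],
    ]


def setup_loopback(image: str, size_mb: int, mountpoint: str) -> None:
    # Needs root for the mount (via sudo); check=True stops at the first failure.
    for cmd in loopback_commands(image, size_mb, mountpoint):
        subprocess.run(cmd, check=True)
```

After `setup_loopback("/tmp/small.img", 64, "/mnt/small")`, the mount point is owned by root, so it may need a `sudo chown` before an unprivileged attic run can write there. `sudo umount /mnt/small` removes it again.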
Quick testing didn't yield the same error, but this is interesting too (attic 0.13):
Yet the archive is created. Shouldn't attic avoid creating the archive if errno 28 is raised and it bails out?
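The expectation that a failed run leaves no archive behind is commonly met with a write-then-rename pattern. The Python sketch below shows that generic pattern for a single file. It is not attic's actual commit logic, which this thread does not show, and `write_atomically` is hypothetical. Data goes to a temporary file in the same directory and replaces the target only after every chunk was written and fsync'ed, so an ENOSPC midway leaves the previous state untouched.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def write_atomically(target, chunks: Iterable[bytes]) -> None:
    """Hypothetical helper: `target` ends up complete or unchanged, never partial."""
    target = Path(target)
    # Temp file in the same directory, so os.replace() is a same-filesystem rename.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=target.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)  # may raise OSError(ENOSPC) when the disk is full
            f.flush()
            os.fsync(f.fileno())  # data must be on disk before the rename
        os.replace(tmp_name, target)
    except BaseException:
        # Any failure, including ENOSPC: drop the partial temp file and re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

The same idea scales up to a whole backup run: nothing becomes visible as "the archive" until a final, small commit step succeeds.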
FWIW, here's a link to the downstream bug report in Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=802619
That's always a possibility, but I find it easier to believe that the handling of running out of free disk space is simply lacking (IIRC I saw a lot of errors instead of attic bailing out on the first one). And if the Debian version of attic is known to depend on a broken python3-msgpack, I don't know if there's any need to look further. I think it would be good if attic (or any backup tool, for that matter) had a test suite that emulated various amounts of available disk space to check that the error handling always leaves the repository in a consistent state. Run it on btrfs raid1 with ECC memory or something, if you're worried about faulty hardware...
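The test-suite idea (emulating various amounts of free space) can be sketched in Python by wrapping the file object the code under test writes to. `LimitedSpaceWriter` and `run_with_budgets` are hypothetical names. This is not borg's or attic's actual test code, and a real test would additionally check the repository's consistency after each budget.

```python
import errno
import io
import os


class LimitedSpaceWriter(io.RawIOBase):
    """Hypothetical file-like wrapper that pretends only `free_bytes` are left."""

    def __init__(self, raw, free_bytes: int):
        self._raw = raw
        self._free = free_bytes

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        data = bytes(b)
        if len(data) > self._free:
            # Simplification: a real full disk may accept a partial write first;
            # this sketch rejects the whole write with errno ENOSPC (28 on Linux).
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self._free -= len(data)
        return self._raw.write(data)


def run_with_budgets(operation, budgets):
    """Run `operation(writer)` once per free-space budget and record the outcome."""
    results = {}
    for budget in budgets:
        writer = LimitedSpaceWriter(io.BytesIO(), budget)
        try:
            operation(writer)
            results[budget] = "ok"
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise  # only the emulated out-of-space error is expected here
            results[budget] = "ENOSPC"
    return results
```

For example, `run_with_budgets(lambda f: f.write(b"x" * 5), [3, 5, 10])` reports ENOSPC for the 3-byte budget and success for the other two.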
@hoxu Emulating out of disk space in the test suite: that's what I did. I guess it could be used for attic in a similar way. I could NOT reproduce your issue with it, though; that's why I linked to the other ticket, where debugging traced exactly the same crash to defective RAM.
@ThomasWaldmann Curious. Did you test attic 0.13 and python3-msgpack 0.4.2?
No, I just tested with current borgbackup code.
Well, to verify whether the problem exists and whether it has been fixed already, it would have to be reproduced with those versions. How easy would it be to run that test suite for attic 0.13?
Not trivial, but possible, I guess. We use py.test as the test runner and the package name is "borg" instead of "attic", but a lot of the general project structure is still the same.
I ran into "attic: Error: Data integrity error" after a bunch of "[Errno 28] No space left on device" errors. The attic repository contains only 15 GiB of data.
This is Debian 8.2, so the attic version is 0.13-1.
Is there any way to recover from this corruption?