Crash in check --repair #87
This is very weird. It looks like the archive metadata for archive servers-2014-05-17-092237 is corrupt even though the repository-level checksum is valid. Did you see any "Archive metadata damage detected" message on stderr during the repair? Here's a completely untested patch which might give some additional debug information about the crash. It also skips all archives except the problematic one to make it faster to reproduce the crash. https://gist.github.com/jborg/93edaa92afa6d7ca815a So if you can, please apply the patch and run:
|
Thanks for the help! No "Archive metadata damage detected" messages. Here's the complete output with the patch. Unfortunately the debug statement didn't get triggered; it looks like the check here is crashing in the outer loop?
|
Oops, my bad, it should be "enumerate(zip(...))". I've corrected the gist: https://gist.github.com/jborg/93edaa92afa6d7ca815a/81104a39cfbcf30486623ec98575bc4b1114efa8 |
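For readers following along, here is a minimal sketch of the enumerate(zip(...)) pattern mentioned above: it walks two sequences in lockstep while keeping a running index, which is what a debug statement needs in order to report which entry failed. The names chunk_ids, chunk_data and indexed_pairs are made up for illustration and are not taken from the gist.

```python
# Hypothetical stand-ins for two sequences that are walked together.
chunk_ids = ["id-a", "id-b", "id-c"]
chunk_data = [b"alpha", b"beta", b"gamma"]


def indexed_pairs(ids, data):
    """Yield (index, id, payload) for each pair of the two sequences."""
    # zip() pairs the items up; enumerate() adds the position of each pair.
    # Note the parentheses around the pair: enumerate yields (index, pair).
    for i, (chunk_id, payload) in enumerate(zip(ids, data)):
        yield i, chunk_id, payload


for i, chunk_id, payload in indexed_pairs(chunk_ids, chunk_data):
    print(i, chunk_id, len(payload))
```

Writing enumerate(ids, data) or zip(...) alone either passes the wrong arguments or loses the index, which is an easy slip to make in a quick patch.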
Okay, it says:
Can I find and send this chunk to you or perhaps try looking into it myself somehow? |
Yeah, that would be really helpful. I created a tiny script to dump an archive's metadata to a bunch of files: https://gist.github.com/jborg/78bbb98f3b95275131d3 Run it like this: $ python attic_dump_archive_metadata.py /path/to/repository::servers-2014-05-17-092237 And email me files 422_XXXXX, 423_YYYY and 424_ZZZ (or all of them). Please note that these files contain file metadata such as file names, but no actual file contents. One more thing: can you verify that this command also crashes? (I suspect it will.) $ attic list /path/to/repository::servers-2014-05-17-092237 |
I think I can confirm your theory about bad memory. After analysing the files you sent me, I found that a single bit in the 423-file has been flipped. The byte at offset 0x3ef8 is 0x8e but should be 0xce. Since this corruption was not detected by the data checksums, it happened at archive creation time, before the checksum was calculated. Ideally --repair should be able to handle all types of corruption, but corruption that happens before the initial checksum is calculated is especially hard to deal with, and sometimes impossible. So unfortunately I'm closing this ticket now, since a system with broken RAM can produce all kinds of corrupted data that is in most cases impossible to properly detect and fix in a good way. |
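A small sketch of the two claims above, for anyone who wants to check them: XOR-ing the two byte values shows they differ in exactly one bit, and a checksum computed over data that is already corrupt will happily verify later. SHA-256 is used here only as a stand-in checksum, and flipped_bits is a made-up helper name; nothing below is attic's actual code.

```python
import hashlib


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where two bytes differ."""
    diff = a ^ b
    return [bit for bit in range(8) if diff & (1 << bit)]


# Values from the thread: the byte found on disk vs. the byte it should have been.
found, expected = 0x8E, 0xCE
print(flipped_bits(found, expected))  # [6], i.e. a single flipped bit (0x40)

# If the flip happens in RAM *before* the checksum is calculated, the checksum
# is taken over the corrupt bytes, so a later verification still passes.
corrupt_payload = bytes([found])
intended_payload = bytes([expected])
stored_checksum = hashlib.sha256(corrupt_payload).hexdigest()

verifies = hashlib.sha256(corrupt_payload).hexdigest() == stored_checksum
matches_intended = hashlib.sha256(intended_payload).hexdigest() == stored_checksum
print(verifies, matches_intended)  # True False
```

This is why the repository-level check reports the object as valid even though its contents no longer decode.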
Oh, well, if --repair could just delete the single backup run where the problem is, that would be nice, but I suppose I'll start a new repository then. I did pull out a couple of the RAM sticks the other day after running memtest86+ for 24 hours and getting 1 error. Thanks for looking into it. |
Since --repair is already supposed to handle cases where data is either missing or detectably corrupt, the easiest way to get your repository back into a working state is simply to delete the corrupt object. You can delete object 3d90991d450e7d1fefa5b0f2c885389f90a31fbe1dc4cfa13ced87304410ded6 like this: $ python attic_delete_objects.py /path/to/repo.attic 3d90991d450e7d1fefa5b0f2c885389f90a31fbe1dc4cfa13ced87304410ded6 https://gist.github.com/jborg/d5bd7ceb419becdbc8f0 After that, let me know if it works. / Jonas |
Sorry for the slow reply, a couple of things happened (including a new motherboard). I did archive the old repository and started a new one, but tried the delete script anyway now. Unfortunately, it dies with
So unless you're still interested in trying this out, I think I'll delete the old archive in a couple of weeks when the backups in there are too old to be interesting. |
Was that the first time you ran that script? The "attic.repository.DoesNotExist" exception simply indicates that the object the script tried to delete did not exist. So either the object has already been deleted by a previous invocation, or the object id is incorrect. |
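To illustrate why a second run fails, here is a toy sketch of a delete that raises when the object is already gone, plus a wrapper that treats that case as harmless. ToyRepository, delete_if_present and the local DoesNotExist class are hypothetical stand-ins written for this example; they are not attic's API, and the object id is a placeholder.

```python
class DoesNotExist(Exception):
    """Hypothetical stand-in for the exception named in the thread."""


class ToyRepository:
    """A dict-backed object store that mimics 'delete raises if missing'."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def delete(self, object_id):
        if object_id not in self._objects:
            raise DoesNotExist(object_id)
        del self._objects[object_id]


def delete_if_present(repo, object_id):
    """Delete the object; return False instead of raising if it is already gone."""
    try:
        repo.delete(object_id)
        return True
    except DoesNotExist:
        return False


repo = ToyRepository({"example-id": b"corrupt chunk"})
print(delete_if_present(repo, "example-id"))  # True: the first run removes it
print(delete_if_present(repo, "example-id"))  # False: the second run finds nothing
```

In other words, the exception on a repeat run is a sign that the earlier deletion succeeded, not that the repository got worse.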
Aha. Okay, maybe I did run it twice - I had trouble getting the include paths to work correctly (long story involving Python 3.3 and 3.4) so it could have happened. I tried check --repair and it seems to have worked:
And I can now list the contents of the previously totally broken snapshot. Thanks! |
Hi again, I'm beginning to suspect there's a problem with either a disk or the memory on this machine, but in any case I hit this problem on pruning
then run check --repair which crashes with