Cannot restart after shutdown / Ctrl+C (etcdmain: database file does not match with snapshot) #5857
Comments
Related: After removing the data+wal dir, restarting (without removing the member first) prints a stack trace:
|
Apparently simply deleting the database file helped too: Edit: This workaround does not work anymore on 3.0.1 due to #5841 😢 |
@binwiederhier Removing the data-dir should not work. Removing the data-dir is equivalent to losing the member. Shutting down the machine should be fine. Can you provide more details about how to reproduce the restart failure? |
Thanks for responding so quickly. Overnight (standby-[8hrs]->wakeup) I observed that my failing member ("rly4") healed itself, so it is now possible to kill/restart etcd without this nasty error. I presume there is some regular background process that writes the db file or something. However, after I removed the member again (kill/Ctrl+C/"systemctl stop etcd" -> "etcdctl member remove .." -> "rm -rf data+wal") and re-added it, it was not possible to kill+restart it without the error above (database file does not match ...). I attached all relevant files in this archive (issue5857.zip); I hope this helps. rly1+rly5 can be restarted fine, rly4 is the problem child. |
@binwiederhier Thanks. Should fix shortly if I can reproduce. |
@binwiederhier Do you still have the log of rly4 from the very beginning (the first startup after it re-joined)?
@xiang90 Since I can reproduce this reliably, here's a log that shows rly4 being started as a fresh member, then Ctrl+C, then restarted:
|
Try applying the patch in #5862. Update the binary of all members, including rly1 and rly5, then see if you can still reproduce it. |
Sure thing. My first encounter with "go". Have to install all the deps to get it to build :-) |
I'm trying to build a new version with your patch to test if it works, but I have no idea how to. |
@binwiederhier It seems GOPATH is not set correctly. For vendoring to work, etcd needs to be at $GOPATH/src/github.com/coreos/etcd. Can you do |
This is really hard. Do you have instructions on how to set this stuff up from scratch? I feel I'm randomly poking around and not getting anywhere. I managed to put things in the right place, but now when I run
Edit: |
@binwiederhier You can clean up your GOPATH and use godep, which can be found at https://github.com/tools/godep, instead of |
@gyuho I finally got it to build. Turns out the PPA version of go is needed. The one in Ubuntu 14.04 upstream does not work. Here are my instructions (if you want to add them somewhere):
@xiang90 I can confirm that your fix worked. |
@gyuho Can you help update the build section in our docs? It currently seems difficult for people who are not familiar with the Go environment. (godep is not needed in my opinion; ./build will set up the dependencies correctly. They need to set GOPATH and install the correct version of Go.) |
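For readers who hit the same build problem, the layout discussed in this thread (a GOPATH, with etcd at $GOPATH/src/github.com/coreos/etcd, built with ./build) can be sketched roughly as follows. This is a minimal sketch, not the project's official instructions: the default GOPATH of $HOME/go, the clone URL, and the helper names etcd_src_dir, in_etcd_src_dir, and build_etcd are assumptions made for illustration, and it assumes GOPATH holds a single directory.

```shell
#!/bin/sh
# Sketch: put the etcd checkout where the Go toolchain expects it.
# ASSUMPTION: $HOME/go is only an example GOPATH; any single directory works.
GOPATH="${GOPATH:-$HOME/go}"
export GOPATH

# Print the directory the checkout must live in for a given GOPATH.
etcd_src_dir() {
    printf '%s\n' "$1/src/github.com/coreos/etcd"
}

# Succeed only if the current directory is that checkout directory.
in_etcd_src_dir() {
    [ "$(pwd -P)" = "$(cd "$(etcd_src_dir "$GOPATH")" 2>/dev/null && pwd -P)" ]
}

# Clone the source if it is missing, then run the repository's ./build script.
# Defined but not called automatically, so sourcing this file has no side effects.
# ASSUMPTION: the clone URL is inferred from the import path named in the thread.
build_etcd() {
    src="$(etcd_src_dir "$GOPATH")"
    mkdir -p "$(dirname "$src")" || return 1
    [ -d "$src" ] || git clone https://github.com/coreos/etcd.git "$src" || return 1
    (cd "$src" && ./build)
}
```

After sourcing the file, `build_etcd` performs the clone and build, and `in_etcd_src_dir` is a quick check that the shell is in the directory the vendoring setup expects. It does not install Go itself; as noted above, the Go version still has to be recent enough.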
@xiang90 Ok will do. |
Just wanted to say thanks. You guys have been great! Blazing fast responses, awesome tool!! Much appreciated 👍 |
I have a cluster of 3 and I'm testing failover scenarios right now. I simulate power outages (shutting off a VM) and service failures (Ctrl+C on etcd). As of now, shutting off an instance has terrible consequences, as it cannot recover by simply rebooting the machine / restarting the service:
The only way to recover from this right now is to delete the data+wal dir, remove the member, re-add it and then restart the daemon. However, I fear that if all fail at the same time (power failure), I will lose all data.
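The recovery sequence described above (stop the daemon, remove the member, delete the data and WAL directories, re-add the member, restart) could be scripted along these lines. This is a hedged sketch only: the function names run and recover_member, the dry-run switch, and all argument values are hypothetical, and the `etcdctl member add NAME PEER_URL` form assumes the v2 etcdctl command syntax. It also assumes etcdctl is pointed at a healthy member, since the local one is stopped.

```shell
#!/bin/sh
# Dry-run by default: commands are printed instead of executed,
# because the sequence includes a destructive rm -rf.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: recover_member MEMBER_ID NAME PEER_URL DATA_DIR WAL_DIR
# ASSUMPTION: etcdctl talks to a healthy member of the cluster, and the
# restarted daemon is configured to join an existing cluster
# (initial cluster state "existing"), not to bootstrap a new one.
recover_member() {
    member_id="$1"; name="$2"; peer_url="$3"; data_dir="$4"; wal_dir="$5"
    run systemctl stop etcd || return 1
    run etcdctl member remove "$member_id" || return 1
    run rm -rf "$data_dir" "$wal_dir" || return 1
    run etcdctl member add "$name" "$peer_url" || return 1
    run systemctl start etcd
}
```

With the default dry-run setting, a call such as `recover_member 1a2b rly4 http://10.0.0.4:2380 /var/lib/etcd/data /var/lib/etcd/wal` (all values made up) only lists the five commands; setting `DRY_RUN=0` runs them. Printing first is a deliberate choice so the paths can be reviewed before anything is deleted.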