Discovery failing on EC2 #576

drnic · 2014-02-16T21:38:07Z

Do etcd and its -discovery mode work on EC2, or is it just failing for me?

$ etcd -discovery https://discovery.etcd.io/madeup
[etcd] Feb 16 21:19:57.292 WARNING   | Using the directory 51e7ef11-e39b-41a9-a8eb-f8ed387ac5cc.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 16 21:19:57.292 INFO      | Discovery via https://discovery.etcd.io using prefix /madeup.
[etcd] Feb 16 21:19:57.703 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

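Security group for the EC2 VM it's running in:

[image: ec2_management_console] https://f.cloud.github.com/assets/108/2181480/9849a4d2-9752-11e3-9d75-e328359ad5df.png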


polvi · 2014-02-16T21:41:16Z

You have to use a valid token, which you get by hitting discovery.etcd.io/new.

drnic · 2014-02-16T21:50:20Z

That also didn't work. Sorry, I can show that output too.


drnic · 2014-02-16T21:51:49Z

$ curl https://discovery.etcd.io/new
https://discovery.etcd.io/02c009dbb1888fc8b710c650fbf55642
$ etcd -discovery https://discovery.etcd.io/02c009dbb1888fc8b710c650fbf55642
[etcd] Feb 16 21:50:50.595 WARNING   | Using the directory 51e7ef11-e39b-41a9-a8eb-f8ed387ac5cc.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 16 21:50:50.595 INFO      | Discovery via https://discovery.etcd.io using prefix /02c009dbb1888fc8b710c650fbf55642.
[etcd] Feb 16 21:50:51.006 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

drnic · 2014-02-16T21:52:16Z

Though I thought I read you can make up your own tokens.

drnic · 2014-02-16T21:53:09Z

etcd must work on AWS, at least in a VPC, since the Cloud Foundry team uses it in production for http://run.pivotal.io, so I'm not sure what I'm doing wrong. That said, they aren't using -discovery, and I think they're on v0.2.0.

polvi · 2014-02-16T21:53:50Z

Grab a new token, then use that URL for all the -discovery args. Add -vv to etcd as well and gist the log output. Could you also confirm that you can manually connect to the remote etcd server port with telnet or something?
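A minimal sketch of that reachability check, assuming bash with /dev/tcp redirection enabled and GNU coreutils `timeout`. The helper name `check_port` and its 3-second default are hypothetical, not part of etcd; it simply attempts a TCP connect the way `telnet HOST PORT` would, and separates a connect that hangs from one that fails outright.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of etcd): test whether HOST:PORT accepts a TCP
# connection, like "telnet HOST PORT". Needs bash /dev/tcp and coreutils timeout.

# check_port HOST PORT [SECONDS]
# Prints "open:", "timeout:" or "failed:" followed by HOST:PORT.
# Returns 0 only when the connection succeeds.
check_port() {
  local host=$1 port=$2 secs=${3:-3} rc
  # The child bash opens fd 3 on /dev/tcp/HOST/PORT. A refused or unroutable
  # connect makes the redirection fail; timeout kills a hanging connect (124).
  timeout "$secs" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null && rc=0 || rc=$?
  if (( rc == 0 )); then
    echo "open: ${host}:${port}"
  elif (( rc == 124 )); then
    echo "timeout: ${host}:${port}"
  else
    echo "failed: ${host}:${port}"
  fi
  return $(( rc == 0 ? 0 : 1 ))
}

# Example, run on each node against every other node:
#   check_port 10.0.0.184 7001   # peer port
#   check_port 10.0.0.184 4001   # client port
```

On EC2, a `timeout` result more likely points at security-group or routing rules silently dropping packets, while an immediate `failed` more likely means nothing is listening on that port.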

drnic · 2014-02-16T21:55:58Z

-vv isn't generating any additional output; odd.

$ etcd -discovery https://discovery.etcd.io/896ac583ef83bebad23a4ba51277de11 -vv
[etcd] Feb 16 21:55:09.813 WARNING   | Using the directory 51e7ef11-e39b-41a9-a8eb-f8ed387ac5cc.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 16 21:55:09.814 INFO      | Discovery via https://discovery.etcd.io using prefix /896ac583ef83bebad23a4ba51277de11.
[etcd] Feb 16 21:55:10.225 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

drnic · 2014-02-16T21:57:21Z

@polvi thx for trying to help, btw

polvi · 2014-02-16T22:38:44Z

Could you show the output from all the machines? Please include the full command-line args from each host.


drnic · 2014-02-16T22:41:04Z

This was a standalone example; no other nodes. It doesn't pause and wait; it just fails.

When I ran three nodes, etcd startup failed so quickly that there was no sign they were really trying to communicate with each other. So I tried to reproduce the error with a single node.


drnic · 2014-02-16T22:42:49Z

I can't get etcd on my Ubuntu 10.04 EC2 VMs to do anything other than fail immediately when I use the -discovery option.


philips · 2014-02-17T03:23:10Z

@drnic Even with discovery, you need to provide the -addr and -peer-addr arguments so that discovery uploads the right IP addresses.
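One way to apply that advice on EC2 is to look up the instance's private IP and pass it explicitly. The helpers below are a hypothetical sketch, not part of etcd: `ec2_private_ip` assumes the EC2 instance metadata service answers a plain GET at 169.254.169.254 (instances that enforce IMDSv2 need a session token first), and `etcd_discovery_cmd` only prints a command line built from the flags used in this thread, with the 4001/7001 ports and the NAME.etcd data-dir layout as assumptions. It also rejects 127.x addresses, since in a multi-host cluster other machines cannot reach a loopback address registered with the discovery service.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of etcd): build an etcd command line that
# advertises a routable private IP instead of 127.0.0.1.

# Print this instance's private IPv4 from the EC2 metadata service
# (assumes unauthenticated IMDSv1-style GETs are allowed).
ec2_private_ip() {
  curl -fs --max-time 2 http://169.254.169.254/latest/meta-data/local-ipv4
}

# etcd_discovery_cmd IP DISCOVERY_URL [NAME]
# Prints the command; returns 1 for an empty or loopback IP.
etcd_discovery_cmd() {
  local ip=$1 url=$2 name=${3:-$(hostname -s)}
  case $ip in
    '' | 127.*)
      echo "refusing to advertise '${ip}': not reachable from other hosts" >&2
      return 1
      ;;
  esac
  # 4001 = client port (-addr), 7001 = peer port (-peer-addr), as in this thread.
  echo "etcd -name ${name} -data-dir ${name}.etcd" \
    "-addr ${ip}:4001 -peer-addr ${ip}:7001 -discovery ${url}"
}

# Usage: fetch ONE token from https://discovery.etcd.io/new and reuse it on
# every node, then print the command to run on this host:
#   DISCOVERY=https://discovery.etcd.io/<token>
#   etcd_discovery_cmd "$(ec2_private_ip)" "$DISCOVERY"
```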

drnic · 2014-02-17T05:47:45Z

$ etcd -peer-addr 127.0.0.1:7001 -addr 127.0.0.1:4001 -discovery https://discovery.etcd.io/2d0cf5e9cf33d601caf861bb304117a1
[etcd] Feb 17 05:47:23.743 WARNING   | Using the directory 51e7ef11-e39b-41a9-a8eb-f8ed387ac5cc.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 17 05:47:23.744 INFO      | Discovery via https://discovery.etcd.io using prefix /2d0cf5e9cf33d601caf861bb304117a1.
[etcd] Feb 17 05:47:24.156 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

drnic · 2014-02-17T05:49:15Z

@philips sorry, I thought these had default values if not specified

drnic · 2014-02-17T05:49:27Z

Still getting the same issue on AWS EC2.

drnic · 2014-02-17T06:07:45Z

Happy to try other debugging ideas. Sorry it's not working for me :/

drnic · 2014-02-17T06:08:06Z

Unlikely to be relevant but...

$ uname -a
Linux 51e7ef11-e39b-41a9-a8eb-f8ed387ac5cc 3.0.0-32-virtual #51~lucid1-Ubuntu SMP Fri Mar 22 18:13:07 UTC 2013 x86_64 GNU/Linux

polvi · 2014-02-17T06:29:35Z

Try with -addr, -peer-addr, and a fresh -discovery URL.


drnic · 2014-02-17T06:41:08Z

With a fresh token and an explicit name:

$ etcd -peer-addr 127.0.0.1:7001 -addr 127.0.0.1:4001 -discovery https://discovery.etcd.io/2d3468dabe9d36a54b50c5e05ceec623 -name aaa
[etcd] Feb 17 06:40:45.138 WARNING   | Using the directory aaa.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 17 06:40:45.138 INFO      | Discovery via https://discovery.etcd.io using prefix /2d3468dabe9d36a54b50c5e05ceec623.
[etcd] Feb 17 06:40:45.550 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

yichengq · 2014-02-17T18:38:52Z

I tried to reproduce it exactly as you describe, and it works well for me.

I suspect it may be a version problem.
Which version of etcd did you use?
I checked out https://github.com/coreos/etcd.git and built it manually.

drnic · 2014-02-17T18:40:46Z

I was using the v0.3.0 release. I can try building from HEAD if you think something has been fixed since 0.3.0.

Out of curiosity, could you download v0.3.0 and see if it works?


philips · 2014-02-17T18:47:03Z

@drnic I just tried a two-node cluster on an EC2 machine using the release tarball, and it works fine:

$ cd etcd-v0.3.0-linux-amd64
$ ./etcd -discovery https://discovery.etcd.io/07a71c7632415ffde6a3cf14533e88f3
[etcd] Feb 17 18:43:40.755 WARNING   | Using the directory ip-10-244-134-157.etcd as the etcd curation directory because a directory was not specified.
[etcd] Feb 17 18:43:40.756 INFO      | Discovery via https://discovery.etcd.io using prefix /07a71c7632415ffde6a3cf14533e88f3.
[etcd] Feb 17 18:43:41.212 INFO      | Discovery _state was empty, so this machine is the initial leader.
[etcd] Feb 17 18:43:41.213 INFO      | ip-10-244-134-157: state changed from 'stopped' to 'follower'.
[etcd] Feb 17 18:43:41.213 INFO      | ip-10-244-134-157: state changed from 'follower' to 'leader'.
[etcd] Feb 17 18:43:41.213 INFO      | ip-10-244-134-157: leader changed from '' to 'ip-10-244-134-157'.
[etcd] Feb 17 18:43:41.260 INFO      | etcd server [name ip-10-244-134-157, listen on [::]:4001, advertised url http://127.0.0.1:4001]
[etcd] Feb 17 18:43:41.261 INFO      | peer server [name ip-10-244-134-157, listen on [::]:7001, advertised url http://127.0.0.1:7001]

$ ./etcd -discovery 'https://discovery.etcd.io/07a71c7632415ffde6a3cf14533e88f3' -peer-addr 127.0.0.1:7002 -addr 127.0.0.1:4002 -name 2 -data-dir 2.etcd
[etcd] Feb 17 18:46:23.235 INFO      | Discovery via https://discovery.etcd.io using prefix /07a71c7632415ffde6a3cf14533e88f3.
[etcd] Feb 17 18:46:23.782 INFO      | Discovery found peers [http://127.0.0.1:7001]
[etcd] Feb 17 18:46:23.783 INFO      | 2: state changed from 'stopped' to 'follower'.
[etcd] Feb 17 18:46:23.824 INFO      | etcd server [name 2, listen on [::]:4002, advertised url http://127.0.0.1:4002]
[etcd] Feb 17 18:46:23.825 INFO      | peer server [name 2, listen on [::]:7002, advertised url http://127.0.0.1:7002]
[etcd] Feb 17 18:46:23.836 INFO      | 2: leader changed from '' to 'ip-10-244-134-157'.
[etcd] Feb 17 18:46:23.888 INFO      | 2: peer added: 'ip-10-244-134-157'

yichengq · 2014-02-17T22:38:02Z

@drnic Could you build HEAD and try again?
I've made etcd print more debug info for the discovery process.

lnguyen · 2014-02-18T17:10:53Z

OK, this is very odd... we have monit monitoring etcd, and it keeps trying to bring it up. It seems that eventually (no idea why) discovery does work. See the gist for logs: https://gist.github.com/longnguyen11288/077267961ac2287cd210

yichengq · 2014-02-18T18:32:10Z

@drnic @longnguyen11288
https://github.com/unihorn/etcd/tree/8
Could you build this one and try again?
It prints more debug info for the discovery process to the file testlogfile.

zeisss · 2014-02-20T09:42:25Z

I have the same problem with my CoreOS VM on VirtualBox with etcd v0.3.0.

$ sudo /usr/bin/etcd -vv -bind-addr 192.168.53.4:4001 -peer-addr 192.168.53.4:7001 -discovery=https://discovery.etcd.io/7cdbad7b454f233575c4315788490f06 -data-dir /var/lib/etcd -name dockzero-03 -f
[etcd] Feb 20 09:41:46.218 INFO      | Discovery via https://discovery.etcd.io using prefix /7cdbad7b454f233575c4315788490f06.
[etcd] Feb 20 09:41:48.625 CRITICAL  | Discovery failed and a backup peer list wasn't provided: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

Removing the -discovery argument brings up etcd correctly. I also removed the data-dir in case there was some invalid state left over, but that didn't help either.

@unihorn I also tried your branch - no testlogfile was written.

EDIT: Using coreos/master (46d817f), I got a bit more output:

$ sudo ./bin/etcd -vv -bind-addr=192.168.53.4:4001 -peer-addr=192.168.53.4:7001 -discovery=https://discovery.etcd.io/7cdbad7b454f233575c4315788490f06
[etcd] Feb 20 10:16:55.192 WARNING   | Using the directory dockzero-03.etcd as the etcd curation directory because a directory was not specified. 
[etcd] Feb 20 10:16:55.192 DEBUG     | open dockzero-03.etcd/snapshot: no such file or directory
[raft]10:16:55.193038 log.open.open  dockzero-03.etcd/log
[raft]10:16:55.193659 log.open.create  dockzero-03.etcd/log
[etcd] Feb 20 10:16:55.194 INFO      | dockzero-03: state changed from 'stopped' to 'follower'.
[raft]10:16:55.195338 Name: dockzero-03, State: follower, Term: 0, CommitedIndex: 0 
[etcd] Feb 20 10:16:55.195 INFO      | Discovery via https://discovery.etcd.io using prefix /7cdbad7b454f233575c4315788490f06.
[etcd] Feb 20 10:16:57.603 WARNING   | Discovery encountered an error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]
[etcd] Feb 20 10:16:57.603 INFO      | URLs:  / dockzero-03 ()
[etcd] Feb 20 10:16:57.603 CRITICAL  | Discovery failed, no available peers in backup list, and no log data

bfosberry · 2014-03-05T13:55:01Z

I have the same issue, same version, inside CoreOS. Using a discovery URL or token fails for any node being spun up. Running a standalone node and giving the other nodes an explicit peer list works, but that kind of defeats the purpose. :P

core@core-02 ~ $ etcd --version
v0.3.0
core@core-02 ~ $ uname -a
Linux core-02 3.13.2+ #2 SMP Mon Feb 17 22:49:34 UTC 2014 x86_64 Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz GenuineIntel GNU/Linux

yichengq · 2014-03-06T03:37:04Z

@zeisss @bfosberry
Could you try this branch again?
https://github.com/unihorn/etcd/tree/28

It prints the debug information from go-etcd, which will help me a lot in finding the cause of the error.
Sorry for the late response.

newhoggy · 2014-03-10T09:39:48Z

I also have the same problem inside CoreOS:

core@ip-10-0-0-184 ~ $ etcd --version
v0.3.0
core@ip-10-0-0-184 ~ $ uname -a
Linux ip-10-0-0-184 3.13.5+ #2 SMP Wed Mar 5 08:34:30 UTC 2014 x86_64 Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz GenuineIntel GNU/Linux

Logs:

Mar 10 10:59:13 ip-10-0-0-183 systemd[1]: Stopping etcd...
Mar 10 10:59:14 ip-10-0-0-183 systemd[1]: Starting etcd...
Mar 10 10:59:14 ip-10-0-0-183 systemd[1]: Started etcd.
Mar 10 10:59:14 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:14.537 INFO      | Discovery via https://discovery.etcd.io using prefix /e99066c9e19b4472b002826217bd3f28.
Mar 10 10:59:15 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:15.748 INFO      | Discovery found peers [http://10.0.0.184:7001 http://10.0.0.182:7001]
Mar 10 10:59:15 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:15.749 INFO      | 10.0.0.183: state changed from 'stopped' to 'follower'.
Mar 10 10:59:17 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:17.101 WARNING   | Attempt to join via 10.0.0.184:7001 failed: Error during join version check: Get http://10.0.0.184:7001/version: net/http: timeout awaiting response heade
Mar 10 10:59:18 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:18.452 WARNING   | Attempt to join via 10.0.0.182:7001 failed: Error during join version check: Get http://10.0.0.182:7001/version: net/http: timeout awaiting response heade
Mar 10 10:59:18 ip-10-0-0-183 etcd-bootstrap[20016]: [etcd] Mar 10 10:59:18.453 WARNING   | Unable to join the cluster using any of the peers [10.0.0.184:7001 10.0.0.182:7001]. Retrying in 10.0 seconds

zeisss · 2014-03-10T10:39:26Z

@unihorn: Strange things happened. My first try failed again, but later in the process (see https://gist.github.com/ZeissS/9462727).
Afterwards my colleague told me he had upgraded Vagrant and his problem disappeared, so I did the same (Vagrant 1.3.5 to 1.4.3).
CoreOS itself (CoreOS 247 with etcd v0.3.0) does not seem to work, but your version (unihorn/28) works just fine. So I guess updating Vagrant and using master fixes this for me.

Is there a schedule for the next etcd release that will go into CoreOS?

philips · 2014-03-10T18:15:49Z

@zeisss There isn't a schedule for the next release. I would like to make a release after this bug is closed, but we need to track it down first.

newhoggy · 2014-03-10T22:12:27Z

Have you managed to reproduce it on EC2?

I'm currently using a single host cluster to work around the problem.

yichen · 2014-03-12T18:19:58Z

Hey, looks like I have the exact same problem. I have three EC2 instances. Two of them can discover each other, the third one will fail with the "Unable to join the cluster using any of the peers" error shown above.

After the third instance failed to join, the other two instances showed the etcd warning: heartbeat time out: 'instance3'.

philips · 2014-03-12T18:29:50Z

@yichen That sounds like discovery is actually working but there is a network partition for some reason or the etcd service isn't running on instance3. Can you try restarting instance3 and see if it comes back?

viliamjr · 2014-03-12T18:47:10Z

Additional info:

https://github.com/coreos/coreos-vagrant/blob/master/cluster/Vagrantfile => same issue.
https://github.com/coreos/coreos-vagrant/blob/master/Vagrantfile => works fine!

yichen · 2014-03-13T16:23:52Z

Hey guys, thanks for the quick response. This problem is resolved. I'm not entirely sure what the root cause was, but here is what I did:

Noticed that one of the instances was using the local address 127.0.0.1, which resulted in that address showing up under the discovery URL. It might eventually have expired, but I created a new discovery "prefix" to start with a clean slate.
Deleted all the working directories and restarted all the etcd instances, making sure the first one started without the --discovery parameter and the other two with it.

philips · 2014-03-13T17:39:25Z

Last night I had a strange network partition on EC2 that reproduced this behavior. I had three machines: L was the leader, and A and B were followers.

A was unable to join the cluster nor curl any of L's endpoints. B was participating just fine.

Then I set up ncat on L listening on another port and ran echo foobar | ncat L 7002. It hung for around 3 seconds, and then foobar showed up on L.

yichengq · 2014-03-14T01:28:44Z

@philips It could be a problem with the connection timeout settings and the boot order of the raft and peer servers. I think @xiangli-cmu is fixing it now: #626

drnic · 2014-03-14T01:33:25Z

I wasn't seeing partial failure as you've been seeing above. Sorry I haven't made time to try out the debug version. :(


bfosberry · 2014-03-15T01:31:25Z

@unihorn That branch (28) worked for me. Based on what @viliamjr said, I tried adding

config.vm.provider :virtualbox do |vb, override|
  vb.customize ["modifyvm", :id, "--natdnshostresolver1", "on"]
  vb.customize ["modifyvm", :id, "--natdnsproxy1", "on"]
end

to my Vagrant config for VirtualBox in case it was a routing error, but no luck. I can also manually hit the URL outside of etcd.

Your branch worked great, though. Any idea (a) when it will be released and (b) when it will be incorporated into CoreOS?

yichengq · 2014-03-25T07:47:44Z

@bfosberry It is really weird that it only fails sometimes.
We refactored the listen code in #626 and hope that helps.
There are other issues open for it as well.
I expect it to be eliminated, or at least explained, after these PRs.
Please use the -vvv flag when reporting once #653 is merged.

bfosberry · 2014-03-27T19:17:43Z

The latest Vagrant CoreOS image works!

From https://github.com/coreos/coreos-vagrant/blob/master/Vagrantfile

yichengq · 2014-03-27T21:57:52Z

@bfosberry Great! :) :)

bataras · 2014-09-05T05:55:30Z

I'm getting discovery failures and also SLOW discovery. The cluster has 5 machines, 2 in us-west and 3 in us-east, in 2 VPCs with a VPN between them. The networking is solid: I can ssh between all nodes and ping consistently at 1ms within a region and 60ms between regions. 4 of the 5 nodes have clustered, but the 5th node in us-east keeps giving this error...

$etcdctl ls
Error: 501: All the given peers are not reachable (Tried to connect to each peer twice and failed) [0]

Of note: I created 2 nodes in us-west first, and they clustered immediately. Then I added a node in us-east. It took a long time to join the cluster, about 5 to 10 minutes, but it finally did. Again, the networking between all nodes and to the internet is fine.

And I started with a new discovery token.

yichengq · 2014-09-08T21:13:31Z

@bataras etcd doesn't fully support multi-datacenter deployments yet; see #964.
The current workaround is to set a higher heartbeat interval and election timeout.
You could try a heartbeat interval of 150ms and an election timeout of 1s.
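A hedged sketch of that workaround. The helper name `etcd_tuning_flags` is hypothetical, and the flag names `-peer-heartbeat-interval` and `-peer-election-timeout` (both in milliseconds) are an assumption based on etcd 0.4.x; confirm them with `etcd -help` for the version you run. The helper refuses an election timeout that is not larger than the heartbeat interval, since followers would otherwise start elections between normal heartbeats.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print etcd tuning flags for high-latency (cross-region)
# peers. Flag names assume etcd 0.4.x; both values are in milliseconds.

# etcd_tuning_flags HEARTBEAT_MS ELECTION_MS
etcd_tuning_flags() {
  local hb=$1 et=$2
  if ! [[ $hb =~ ^[0-9]+$ && $et =~ ^[0-9]+$ ]]; then
    echo "both values must be whole numbers of milliseconds" >&2
    return 1
  fi
  # A follower starts an election when no heartbeat arrives within the
  # election timeout, so the timeout must be strictly larger than the interval.
  if (( et <= hb )); then
    echo "election timeout (${et}ms) must exceed heartbeat interval (${hb}ms)" >&2
    return 1
  fi
  echo "-peer-heartbeat-interval=${hb} -peer-election-timeout=${et}"
}

# The values suggested above (150ms heartbeat, 1s election timeout); the
# unquoted substitution deliberately splits into two arguments:
#   etcd $(etcd_tuning_flags 150 1000) -discovery https://discovery.etcd.io/<token> ...
```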

yichengq mentioned this issue on Mar 5, 2014 in coreos/go-etcd#112, "chore(requests): more error info for SendRequest" (closed).

This was referenced on Mar 25, 2014 by:
#624 "chore(server/transporter): set RequestTimout reasonable" (merged)
#653 "chore(etcd): print out go-etcd log in VeryVeryVerbose Mode" (merged)

xiang90 closed this as completed on Aug 23, 2014.

bataras mentioned this issue on Sep 8, 2014 in #964, "support Multiple datacenter/region" (closed).
