Suspected loss of ARP packets causing EHOSTUNREACH ('No route to host') #1184

awh · 2015-07-17T13:28:47Z

We've received several reports of processes suffering intermittent 'No route to host' errors in weave networks with large numbers of containers. The most likely cause is ARP resolution failure, which results in an ICMP unreachable message being delivered to the client process. Although this could be due to UDP loss in the underlying network, it may well be that our pcap process is dropping packets under load. I am currently investigating increasing the value of sysctl net.ipv4.neigh.ethwe.mcast_solicit as a mitigation.
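As a rough sketch of that mitigation, the helpers below read and raise mcast_solicit for one interface through procfs. The function names and the NEIGH_ROOT override are hypothetical (the override only exists so the helpers can be tried against a scratch directory), and the value 10 in the usage note is illustrative, not a tested recommendation.

```shell
# Hypothetical helpers. NEIGH_ROOT defaults to the kernel's per-interface
# neighbour settings and can be pointed at a scratch directory for a dry run.
NEIGH_ROOT="${NEIGH_ROOT:-/proc/sys/net/ipv4/neigh}"

# Print the number of ARP solicitations sent before resolution is failed.
get_mcast_solicit() {
    cat "$NEIGH_ROOT/$1/mcast_solicit"
}

# Raise mcast_solicit to $2 on interface $1; never lowers an existing value.
raise_mcast_solicit() {
    current=$(get_mcast_solicit "$1") || return 1
    if [ "$2" -gt "$current" ]; then
        echo "$2" > "$NEIGH_ROOT/$1/mcast_solicit"
    fi
}
```

For example, `raise_mcast_solicit ethwe 10`, run with sufficient privileges in the network namespace that owns the interface, should have the same effect as `sysctl -w net.ipv4.neigh.ethwe.mcast_solicit=10`.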

awh · 2015-07-29T09:27:20Z

Proof that three missed ARP responses result in EHOSTUNREACH ('No route to host') being returned to userspace:

Telnet to a non-existent address on an attached subnet:

$ date +%s.%N; telnet 10.0.2.100; date +%s.%N
1438161687.988999879
Trying 10.0.2.100...
telnet: Unable to connect to remote host: No route to host
1438161690.988471276

Contemporaneous tcpdump:

$ sudo tcpdump -n -i eth0 -tt arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
1438161687.990593 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28
1438161688.987015 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28
1438161689.987014 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28

$ sysctl net.ipv4.neigh.eth0.mcast_solicit 
net.ipv4.neigh.eth0.mcast_solicit = 3
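The timings above fit a simple estimate: the two date stamps are about 3.0 s apart, and the three ARP requests are spaced roughly 1 s apart. A minimal sketch of that arithmetic follows, assuming failure is declared after mcast_solicit unanswered requests spaced retrans_time_ms apart, and that retrans_time_ms is at its usual default of 1000 (it was not printed in this session). The function name is hypothetical.

```shell
# Approximate time (in ms) before the kernel gives up on ARP resolution.
# $1 = mcast_solicit, $2 = retrans_time_ms.
# Assumption: no application-level solicits (app_solicit = 0).
arp_failure_timeout_ms() {
    echo $(( $1 * $2 ))
}

# With the values from this host: 3 solicits, 1000 ms apart.
arp_failure_timeout_ms 3 1000
```

Under the same assumption, raising mcast_solicit lengthens the window in which a delayed or retried ARP reply can still arrive, at the cost of slower failure for addresses that are genuinely unreachable.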

awh · 2015-07-29T09:29:12Z

From @rade

Another thing to experiment with is CPU shares and quotas:
https://docs.docker.com/reference/run/#runtime-constraints-on-resources

This might allow us to prioritise the weave router to ensure we service the packet capture buffer in a timely fashion.
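A minimal sketch of the CPU-shares idea, for experimentation only. `--cpu-shares` is a `docker run` flag that sets a relative CPU weight (the default is 1024); the wrapper name, the DOCKER override, the image and the weight shown are all hypothetical. How the weave router container itself would be given a higher weight is not covered in this thread, so the sketch only lowers the weight of application containers.

```shell
# Hypothetical wrapper: start an application container with a reduced CPU
# weight so that, under contention, other containers (such as the router)
# receive a larger share of CPU time.
# DOCKER can be overridden (e.g. DOCKER=echo) to preview the command.
DOCKER="${DOCKER:-docker}"

# $1 = CPU weight; remaining arguments = image and command.
run_low_priority() {
    shares=$1
    shift
    "$DOCKER" run -d --cpu-shares="$shares" "$@"
}
```

For example, `run_low_priority 256 busybox sleep 60` would start a container with a quarter of the default weight. Shares only take effect when CPUs are contended, so any improvement would need to be measured under load.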

rade · 2015-09-09T19:59:04Z

@awh I suggest contacting the reporters of this problem and recommending an upgrade to weave 1.1.0, which contains #1283; we hope that will help diagnose the issue.

awh added the bug label Jul 17, 2015

awh mentioned this issue Jul 17, 2015

report pcap stats #1185

Closed

rade modified the milestones: 1.1.0, 1.0.2 Jul 17, 2015

awh mentioned this issue Jul 24, 2015

getent failures in 210_dns_cross_hosts_test.sh #1171

Closed

rade modified the milestones: current, 1.0.2 Jul 29, 2015

rade modified the milestone: current Aug 5, 2015

alph486 mentioned this issue Dec 31, 2015

Using containers that mount /var/run/docker.sock causes No Route To Host in others #1846

Closed

rade added this to the n/a milestone Feb 18, 2016

rade added the resolution/irreproducible label Feb 18, 2016

rade closed this as completed Feb 18, 2016
