This repository was archived by the owner on Jun 20, 2024. It is now read-only.

Suspected loss of ARP packets causing EHOSTUNREACH ('No route to host') #1184

Closed
awh opened this issue Jul 17, 2015 · 3 comments

Comments

@awh
Contributor

awh commented Jul 17, 2015

We've received several reports of processes suffering intermittent 'No route to host' errors in weave networks with large numbers of containers. ARP resolution failure is the most likely cause, resulting in an ICMP unreachable message being delivered to the client process. Although this could be due to UDP loss in the underlying network, it may well be the case that our pcap process is dropping packets under load. We are currently investigating increasing the value of sysctl net.ipv4.neigh.ethwe.mcast_solicit as a mitigation strategy.
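
For anyone wanting to try the mitigation, here is a minimal sketch of what raising the setting might look like. It only prints the command instead of running it, since the real thing would need root in the network namespace that owns ethwe. The helper names and the value 6 are illustrative assumptions, not a tested recommendation.

```shell
#!/bin/sh
# Build the per-interface neighbour sysctl key,
# e.g. net.ipv4.neigh.ethwe.mcast_solicit.
neigh_key() {
    printf 'net.ipv4.neigh.%s.%s\n' "$1" "$2"
}

# Print (rather than execute) the command that would raise the number of
# ARP requests sent before resolution is declared failed.
# Hypothetical helper; the count passed in is the caller's choice.
raise_mcast_solicit_cmd() {
    iface=$1
    count=$2
    printf 'sysctl -w %s=%s\n' "$(neigh_key "$iface" mcast_solicit)" "$count"
}

# Dry run: show the command for the weave interface with an example value.
raise_mcast_solicit_cmd ethwe 6
```

A larger count gives a lost request more chances to be retried, at the cost of a longer wait before a genuinely absent host is reported as unreachable.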

@awh awh added the bug label Jul 17, 2015
@awh awh mentioned this issue Jul 17, 2015
@rade rade modified the milestones: 1.1.0, 1.0.2 Jul 17, 2015
@rade rade modified the milestones: current, 1.0.2 Jul 29, 2015
@awh
Contributor Author

awh commented Jul 29, 2015

Proof that three missed ARP responses return EHOSTUNREACH ('No route to host') to userspace:

Telnet to a non-existent address on an attached subnet:

$ date +%s.%N; telnet 10.0.2.100; date +%s.%N
1438161687.988999879
Trying 10.0.2.100...
telnet: Unable to connect to remote host: No route to host
1438161690.988471276

Contemporaneous tcpdump:

$ sudo tcpdump -n -i eth0 -tt arp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
1438161687.990593 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28
1438161688.987015 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28
1438161689.987014 ARP, Request who-has 10.0.2.100 tell 10.0.2.15, length 28
$ sysctl net.ipv4.neigh.eth0.mcast_solicit 
net.ipv4.neigh.eth0.mcast_solicit = 3
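
The timing above (three requests about one second apart, failure reported roughly three seconds after the first) is consistent with a give-up time of about mcast_solicit times the retransmit interval. A rough sketch of that arithmetic follows. The 1000 ms default is an assumption taken from the spacing in the trace (the corresponding sysctl is net.ipv4.neigh.eth0.retrans_time_ms), and the function name is made up for illustration.

```shell
#!/bin/sh
# Estimate how long the kernel keeps retrying ARP before giving up.
# $1: mcast_solicit (number of ARP requests sent)
# $2: retransmit interval in ms (assumed 1000 if omitted)
arp_give_up_ms() {
    solicit=$1
    retrans_ms=${2:-1000}
    echo $(( solicit * retrans_ms ))
}

# With the value shown by sysctl above:
arp_give_up_ms 3

# What doubling the retry count would mean for the worst-case wait:
arp_give_up_ms 6
```

This ignores any additional unicast or application probes, so treat it as an estimate rather than an exact model of the neighbour state machine.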

@awh
Contributor Author

awh commented Jul 29, 2015

From @rade

another thing to experiment with are CPU shares & quotas
https://docs.docker.com/reference/run/#runtime-constraints-on-resources

This might allow us to prioritise the weave router to ensure we service the packet capture buffer in a timely fashion.
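
As a rough illustration of the CPU shares idea, the sketch below prints a docker run command that gives a container twice Docker's default weight of 1024. The image and container names are hypothetical placeholders, not how the weave router is actually launched, and the share value is an arbitrary example.

```shell
#!/bin/sh
# Hypothetical placeholders; neither name comes from this issue.
ROUTER_IMAGE=example/router-image
ROUTER_NAME=router

# Print (rather than execute) a docker run command with a CPU share weight.
# --cpu-shares is a relative weight that only matters under CPU contention.
router_run_cmd() {
    shares=$1
    printf 'docker run -d --name=%s --cpu-shares=%s %s\n' \
        "$ROUTER_NAME" "$shares" "$ROUTER_IMAGE"
}

# Dry run with double the default weight of 1024.
router_run_cmd 2048
```

Because shares are relative, this would only help if the router is competing for CPU with other containers on the same host.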

@rade rade modified the milestone: current Aug 5, 2015
@rade
Member

rade commented Sep 9, 2015

@awh I suggest contacting the reporters of this problem and suggesting they upgrade to weave 1.1.0, which contains #1283, which we hope will help diagnose the issue.
