[solo] Fix skydns - two wrongs make a right #899
Skydns now needs to be compiled with go 1.5+.
TL;DR When two DNS servers don't work, add one more!

When running some integration tests with HeliosSoloDeployment on Docker hosts that use a local unbound instance as their DNS resolver (i.e. specified in `/etc/resolv.conf` on the Docker host), we saw test failures due to failed SRV queries to skydns. Skydns runs in the solo container and forwards DNS queries it doesn't know about to the unbound instance via logic in `start.sh`. The skydns error output from the helios solo container spawned by HeliosSoloDeployment looked like:

```
skydns: failure to forward request "dns: failed to unpack truncated message"
```

Our guess is that large UDP responses from the upstream unbound have the "Message Truncated" DNS flag set. When this type of response reaches skydns, skydns blows up and doesn't tell the client about the error. The client times out without retrying in TCP mode. The client would have retried if it had received an error message from skydns.

Running `dig` against skydns works. We think this is because `dig` adds an OPT record to its query that sets "udp payload size: 4096".

Here are outstanding issues in skydns that seem related:

* skynetservices/skydns#242
* skynetservices/skydns#45

Solution: We start an unbound instance in the solo container and have it speak only TCP to its upstream, skydns (in the same container), with `tcp-upstream: yes`. This forces skydns to speak only TCP with its own upstream. No UDP truncation shit. Things are fixed. :)

We admit this is super funky, but there's no way to force skydns to speak only TCP right now.
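To make the truncation guess above concrete, here is a minimal Python sketch of the two wire-format details involved: the "Message Truncated" (TC) bit in the DNS header, and the EDNS0 OPT record that advertises a larger UDP payload size (the kind of record we think `dig` adds). The function names, the query ID, and the queried name are hypothetical and chosen only for illustration; this is not code from skydns, unbound, or helios.

```python
import struct
from typing import Optional

TC_FLAG = 0x0200   # "Message Truncated" bit in the 16-bit DNS header flags
TYPE_SRV = 33      # SRV record type
TYPE_OPT = 41      # EDNS0 OPT pseudo-record type
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        if label:  # skip empty labels (e.g. from a trailing dot)
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = TYPE_SRV, query_id: int = 0x1234,
                udp_payload_size: Optional[int] = None) -> bytes:
    """Build a DNS query; append an OPT record if udp_payload_size is given."""
    arcount = 1 if udp_payload_size is not None else 0
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    opt = b""
    if udp_payload_size is not None:
        # OPT record: root name, TYPE=OPT, CLASS carries the payload size,
        # TTL=0 (extended RCODE and flags), RDLENGTH=0
        opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_payload_size, 0, 0)
    return header + question + opt


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in a DNS message header."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)
```

A query built with `udp_payload_size=4096` is 11 bytes longer than the plain one and tells the server that responses larger than the classic 512-byte UDP limit are acceptable, which would explain why such queries avoid the truncated path.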
👍
We should somehow pin our skydns build artifact. Right now skydns is rebuilt from master (https://github.com/skynetservices/skydns) whenever someone manually builds the helios solo base image. We have no way of knowing afterwards which version of skydns we built.
Nice explanation in the commit message 👍🏻 Did you mean to commit the -verbose flag to skydns in here? I wonder if the output would become large if solo ran for a while and sent lots of DNS requests (I'm not sure how docker logs stores things). As for the unpredictable nature of building SkyDNS, is it available as a package to install with apt-get?
And it shouldn't. What it should do is properly forward the response with the truncation flag set to the requester (who would then retry using TCP). Instead it just blows up and never responds to the requester at all :(
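The behaviour described above can be sketched in Python as follows: a forwarder that relays the upstream response unchanged, truncation flag included, and a requester that retries over TCP when it sees that flag. The transport functions are passed in as plain callables so the logic can be read without real sockets. All names here are hypothetical, and this is only a sketch of the expected behaviour, not how skydns is implemented.

```python
import struct
from typing import Callable

TC_FLAG = 0x0200  # "Message Truncated" bit in the DNS header flags

Transport = Callable[[bytes], bytes]  # takes a query, returns a response


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in a DNS message header."""
    (flags,) = struct.unpack("!H", message[2:4])
    return bool(flags & TC_FLAG)


def frame_for_tcp(message: bytes) -> bytes:
    """DNS over TCP prefixes each message with a two-byte length field."""
    return struct.pack("!H", len(message)) + message


def forward(query: bytes, send_upstream: Transport) -> bytes:
    """Relay the upstream response as-is, even when it is marked truncated."""
    return send_upstream(query)


def resolve(query: bytes, send_udp: Transport, send_tcp: Transport) -> bytes:
    """Try UDP first; if the response has the TC bit set, retry over TCP."""
    response = send_udp(query)
    if is_truncated(response):
        return send_tcp(query)
    return response
```

With this shape, a truncated upstream answer costs the requester one extra round trip over TCP instead of a timeout.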
We add yet another unbound between skydns and the upstream DNS resolver. We saw that the previous commit broke docker-machine on my OS X machine. For some reason, the upstream DNS resolver doesn't speak TCP. So this second unbound translates TCP from skydns back to UDP for this upstream. I can't even...
aade797 to e006705
This doesn't work on my Mac now. See #900 for an alternative approach.