This repository has been archived by the owner on May 22, 2020. It is now read-only.

Use CNI plugin #115

Merged
merged 1 commit into from
Jul 28, 2016

Conversation

zreigz

@zreigz zreigz commented Jun 17, 2016

This solution gets rid of the docker bootstrap service and uses the CNI plugin instead.
The newest hyperkube must be used for this solution.

Test results:

Summarizing 7 Failures:

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl describe [It] should check if kubectl describe prints relevant information for rc and pods [Conformance] [Flaky] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1291

[Fail] [k8s.io] Secrets [It] should be consumable from pods in volume [Conformance] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/framework/util.go:1670

[Fail] [k8s.io] Pods [It] should have monotonically increasing restart count [Conformance] [Slow] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/pods.go:58

[Fail] [k8s.io] Kubectl client [k8s.io] Kubectl run job [It] should create a job from an image when restart is Never [Conformance] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1190

[Fail] [k8s.io] DNS [It] should provide DNS for the cluster [Conformance] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/dns.go:194

[Fail] [k8s.io] ClusterDns [Feature:Example] [It] should create pod that uses dns [Conformance] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/example_cluster_dns.go:130

[Fail] [k8s.io] Kubectl client [k8s.io] Guestbook application [It] should create and stop a working application [Conformance] 
/go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/test/e2e/kubectl.go:1369

Ran 94 of 293 Specs in 571.681 seconds
FAIL! -- 87 Passed | 7 Failed | 0 Pending | 199 Skipped 

@zreigz
Author

zreigz commented Jun 17, 2016

cc @cheld @luxas @mikedanese

@zreigz
Author

zreigz commented Jun 17, 2016

If you want to test it with vagrant you can use project: https://github.com/FujitsuEnablingSoftwareTechnologyGmbH/kube-deploy-vagrant

It creates a cluster with a master and a node.

@cheld
Contributor

cheld commented Jun 17, 2016

I tested it and also checked the network settings. Everything seems to work and to be set up properly.

Finally, we have a starting point. Yippee. The first step is to upgrade CNI to the latest version.

@luxas
Contributor

luxas commented Jun 17, 2016

Nice. I have a PR up for CNI that makes it compile on arm64/ppc64le too.

@zreigz I checked out your patches and noticed that instead of downloading CNI at runtime on the host, we should package the latest binaries in hyperkube. @mikedanese already sent a patch for this, but it seems like that version is now too old to be useful.

Generally I think everything that hyperkube needs should be packed inside the container.

Good that we've taken steps forward with this :)


@cheld
Contributor

cheld commented Jun 17, 2016

We should also package a default configuration for flannel into hyperkube, but probably not at the default location.

@@ -18,7 +18,7 @@

kube::multinode::main(){
LATEST_STABLE_K8S_VERSION=$(kube::helpers::curl "https://storage.googleapis.com/kubernetes-release/release/stable.txt")
K8S_VERSION=${K8S_VERSION:-${LATEST_STABLE_K8S_VERSION}}
K8S_VERSION=v1.3.0-beta.1
Contributor

Please leave as-is.
Instead, we should provide two "routes" of configuration as long as k8s supports v1.2, i.e. we may remove the current docker-bootstrap solution when v1.4 is released.

@luxas
Contributor

luxas commented Jun 28, 2016

@zreigz Are you going to update this anytime soon? We'll look at this one after #164 is merged.

@zreigz
Author

zreigz commented Jun 29, 2016

Yes, I will continue with this. Something new should be done by the end of the week.

@cheld
Contributor

cheld commented Jun 29, 2016

We are blocked until CNI is updated in Hyperkube, right?

@zreigz
Author

zreigz commented Jun 29, 2016

Yes, but I can make some changes to support both solutions (split the code into cni and bootstrap scripts, extract common code, do some code refactoring, etc.).

@zreigz
Author

zreigz commented Jun 29, 2016

By default, docker-bootstrap will be enabled for now.

@zreigz
Author

zreigz commented Jul 4, 2016

PTAL
The simplest way to test with the CNI plugin is to replace gcr.io/google_containers/hyperkube-${ARCH}:${K8S_VERSION} with fest/hyperkube-amd64:latest and set the flag USE_NETWORK_CNI_PLUGIN=true.

@cheld
Contributor

cheld commented Jul 4, 2016

I suggest switching between bootstrap and CNI based on the Kubernetes version. In other words: if the user is running Kubernetes 1.3, then they must use CNI. This way we get rid of docker-bootstrap with the next version.
A code snippet to parse the Kubernetes version number is already in the code.

I tend to name the files
common.sh
common-1.2.sh
common-1.3.sh

This indicates how the files belong together. (CNI is just an implementation detail.)
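The version-based switch could be sketched along these lines (the helper name and the parsing here are illustrative, not the PR's actual snippet; only the common-1.2.sh / common-1.3.sh file names come from the comment above):

```shell
# Sketch: pick the network setup script based on the Kubernetes version,
# so docker-bootstrap stays the path for 1.2 and CNI takes over for 1.3+.
kube::multinode::choose_network_script() {
  local version="$1"                      # e.g. v1.2.4 or v1.3.0
  local minor
  minor=$(echo "${version#v}" | cut -d. -f2)
  if [ "${minor}" -ge 3 ]; then
    echo "common-1.3.sh"                  # CNI path
  else
    echo "common-1.2.sh"                  # docker-bootstrap path
  fi
}
```

For example, `kube::multinode::choose_network_script v1.2.4` prints common-1.2.sh.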

@zreigz
Author

zreigz commented Jul 5, 2016

@luxas PTAL

kube::multinode::restart_docker_systemd
else
DOCKER_CONF="/etc/default/docker"
sed -i.bak 's/^\(MountFlags=\).*/\1shared/' $DOCKER_CONF
Contributor

MountFlags is a systemd option, not a docker option.

We could implement it like this:

kubernetes/dashboard#969

Or maybe, we just remove the flag...
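As an alternative to sed-editing the unit file, a systemd drop-in could set the flag without touching the packaged unit (a sketch following systemd conventions; the drop-in path and file name are not from this PR):

```ini
# /etc/systemd/system/docker.service.d/10-mountflags.conf
[Service]
MountFlags=shared
```

After writing it, `systemctl daemon-reload && systemctl restart docker` would apply it.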

Contributor

I ran manual mount tests and hyperkube. Everything seems to work without the mount flag. Also, I found a somewhat related bug: docker/machine#3029
I think it is better to remove/comment out the line in the systemd unit file.

Author

Setting shared for this flag shows clear intention and purpose and doesn't require any comments. If we get rid of this flag, it can be unclear to others and we would have to add extra comments.

Contributor

Ok, as long as it is set by the docker installer, we should leave it. But you will remove the following code, right?

 DOCKER_CONF="/etc/default/docker"
 sed -i.bak 's/^\(MountFlags=\).*/\1shared/' $DOCKER_CONF

Author

Yes; for this one it doesn't make sense because it is not a service file.

@luxas
Contributor

luxas commented Jul 11, 2016

@zreigz First off, thanks for writing this! I'm excited to begin using CNI now, but there are a few blockers. I guess kubernetes/kubernetes#28673 is one of them. I suppose you've run this with your own, patched hyperkube images when testing this?

Sorry, but there's too much duplicated code here. I made a first pass at the script and ended up with something like 341 insertions(+), 241 deletions(-), with some rebase conflicts yet to fix. For instance, my cni-plugin.sh file looks like this:

kube::cni::restart_docker(){
  if kube::helpers::command_exists systemctl; then
    DOCKER_CONF=$(systemctl cat docker | head -1 | awk '{print $2}')

    # If we can find MountFlags but not MountFlags=shared, set MountFlags to shared
    if [[ ! -z $(grep "MountFlags" ${DOCKER_CONF}) && -z $(grep "MountFlags=shared" ${DOCKER_CONF}) ]]; then
      sed -i.bak 's/^\(MountFlags=\).*/\1shared/' ${DOCKER_CONF}
      systemctl daemon-reload
      systemctl restart docker

      kube::log::status "Restarted docker with the new flannel settings"
    fi
  fi
}

No more. Again, we should keep the code here to an absolute minimum. I'll upload my branch as soon as I have CNI working; I haven't got it fully working yet. Logs:

I0711 20:23:46.866358   29710 plugins.go:141] Loaded network plugin "cni"
E0711 20:23:53.804717   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'kube-addon-manager-172.20.10.6' - Unexpected address output  
E0711 20:23:54.178849   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-proxy-172.20.10.6' - Unexpected address output  
E0711 20:23:55.278803   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-proxy-172.20.10.6' - Unexpected address output  
E0711 20:23:55.329294   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-master-172.20.10.6' - Unexpected address output  
E0711 20:23:55.406653   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'kube-addon-manager-172.20.10.6' - Unexpected address output  
E0711 20:23:56.641932   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-master-172.20.10.6' - Unexpected address output  
E0711 20:23:56.687804   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'kube-addon-manager-172.20.10.6' - Unexpected address output  
E0711 20:23:57.779685   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-master-172.20.10.6' - Unexpected address output  
E0711 20:23:58.890347   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'k8s-master-172.20.10.6' - Unexpected address output  
E0711 20:24:04.550820   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'kube-dns-v17-os7ce' - Unexpected command output Device "eth0" does not exist.
E0711 20:24:04.616757   29710 docker_manager.go:345] NetworkPlugin cni failed on the status hook for pod 'kubernetes-dashboard-v1.1.0-omg7w' - Unexpected command output Device "eth0" does not exist.
E0711 20:24:06.581472   29710 cni.go:166] Error adding network: could not add IP address to "cni0": file exists
E0711 20:24:06.581551   29710 cni.go:123] Error while adding to cni network: could not add IP address to "cni0": file exists
E0711 20:24:06.581590   29710 docker_manager.go:1968] Failed to setup network for pod "kube-dns-v17-os7ce_kube-system(682a3089-47a5-11e6-90ee-e81132c07bd7)" using network plugins "cni": could not add IP address to "cni0": file exists; Skipping pod
E0711 20:24:07.102705   29710 pod_workers.go:183] Error syncing pod 682a3089-47a5-11e6-90ee-e81132c07bd7, skipping: failed to "SetupNetwork" for "kube-dns-v17-os7ce_kube-system" with SetupNetworkError: "Failed to setup network for pod \"kube-dns-v17-os7ce_kube-system(682a3089-47a5-11e6-90ee-e81132c07bd7)\" using network plugins \"cni\": could not add IP address to \"cni0\": file exists; Skipping pod"

I think this might be related to an upstream issue... (kubernetes/kubernetes#28178)

However, how is your docker daemon configured? I might have to turn off iptables and ipmasq in the docker daemon startup, they default to true. And how about flannel's ipmasq flag? Have you had it set to true or false?
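For reference, the two daemon options in question could be turned off at startup like this (an illustrative dockerd invocation; whether disabling them is actually needed is exactly the open question above):

```shell
# Both flags default to true; disabling them hands iptables and
# masquerading duties over to the network plugin/flannel instead.
dockerd --iptables=false --ip-masq=false
```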

@zreigz
Author

zreigz commented Jul 12, 2016

Thanks for the review. I will check my code again and try to get rid of the duplicates. I hope we make it better during the review process.

I use the default settings for the docker daemon startup.

Officially we still don't have a hyperkube with CNI support, but unofficially we use an image prepared for this purpose: fest/hyperkube-amd64:latest. This image is based on the 1.3 release. Until kubernetes/kubernetes#28673 is merged I use fest/hyperkube-amd64:latest. You can also try it.

So for the next iteration I will make the code as simple as possible and rebase it.

@luxas
Contributor

luxas commented Jul 12, 2016

@zreigz Have you come across the errors I got above? What options are you passing to docker, and what is FLANNEL_IPMASQ set to when you're running it?

@cheld
Contributor

cheld commented Jul 12, 2016

@luxas do you mean duplicated code between docker-bootstrap.sh and cni-plugin.sh?

My idea was that we do not touch docker-bootstrap.sh in the future (only important bugfixes). The whole file will be dropped with the 1.4 release, so the duplicated code will not hurt that much. (Under this assumption it might simplify the code.)

But I am fine with refactoring, too.


# Utility functions for Kubernetes in docker setup and for cni network plugin.

kube::cni::restart_docker(){
Contributor

Maybe we could rename this to ensure_shared_mount

@luxas
Contributor

luxas commented Jul 20, 2016

Got it working, but a few gotchas:

  • It seems to require CONFIG_CGROUP_PIDS (which wasn't present on my Pine64 :/)
  • /etc/resolv.conf doesn't include DNS servers for the internet, but it doesn't seem to be a CNI thing. Has anyone encountered the same with v1.3?
  • If the --bip and --mtu options are left over from the "old" docker bootstrap config, CNI seems to fail
  • Is it possible for you to turn down and start again when using CNI? For me it's a bit flaky: the cni0 bridge isn't removed, and therefore kubelet complains that it already exists.

@cheld
Contributor

cheld commented Jul 21, 2016

DNS:
Hmm... I stumbled over the /etc/resolv.conf problem. I even created an issue:

kubernetes/kubernetes#27177

We launch kubelet with the --cluster-dns flag. This value is added to /etc/resolv.conf.

I didn't test it, but skydns has a nameservers flag which is intended to forward DNS requests. I cannot remember exactly, but I believe the behavior did not change with CNI. As we have both versions, we can check easily.

Restart:
I think we tested a reboot of the machine; this should work. It is possible that we forgot to test multiple startup/turndown cycles.

BTW: @zreigz is on holiday this week
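The skydns forwarding mentioned in the DNS section above might look like this in standalone form (a sketch; flag names follow the standalone skydns README, and the addresses are placeholders, not this deployment's actual configuration):

```shell
# Illustrative standalone skydns invocation: -nameservers forwards
# queries outside the cluster domain to the given upstream servers.
skydns -machines=http://127.0.0.1:4001 \
       -addr=0.0.0.0:53 \
       -domain=cluster.local. \
       -nameservers=8.8.8.8:53,8.8.4.4:53
```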

@zreigz zreigz force-pushed the cni-plugin branch 2 times, most recently from 469d1b3 to 06f1ec2 Compare July 25, 2016 11:29
@zreigz
Author

zreigz commented Jul 25, 2016

I am back. PTAL

@cheld
Contributor

cheld commented Jul 25, 2016

@zreigz could you verify that pods cannot resolve public internet names without CNI?


kube::helpers::parse_version ${K8S_VERSION}

if [[ ${USE_CNI} == "true" && \
Contributor

Just put ${USE_CNI} == true here.

Author

You mean without checking version and architecture?

Contributor

Yes, I'll add that later.
CNI should be chosen automatically with version v1.4.0-alpha.2 and above, once we consider it stable.

@zreigz
Author

zreigz commented Jul 27, 2016

Today I will run some extra tests to verify/fix the mentioned issues.

systemctl daemon-reload
systemctl restart docker

kube::log::status "Restarted docker with the new flannel settings"
Contributor

This should be Restarted docker with MountFlags=shared

@luxas
Contributor

luxas commented Jul 27, 2016

Great! I'll try to get this through today!
Test it thoroughly (maybe with conformance too), and report the results here.
I'll also test it and then we'll see what we come up with.

@zreigz
Author

zreigz commented Jul 27, 2016

@cheld @luxas Regarding DNS: I've started the cluster in docker-bootstrap mode, created some pods, and entered one of them:
docker exec -it 684fed694fa2 /bin/bash
or
kubectl exec -it frontend-708336848-94efi /bin/bash

In the console I executed ping google.com and got correct responses. I've checked it with versions v1.3.3 and v1.4.0-alpha.1.

@zreigz zreigz force-pushed the cni-plugin branch 2 times, most recently from 88c01a8 to 11e35a8 Compare July 27, 2016 12:04
@zreigz
Author

zreigz commented Jul 27, 2016

Regarding:

  • If the --bip and --mtu options are there from the "old" docker bootstrap config, cni seems to fail

I've added a check for this in cni-plugin.sh

@zreigz
Author

zreigz commented Jul 28, 2016

Regarding this:

  • Is it possible for you to turndown and start again when using CNI? For me it's a bit flaky, the cni0 bridge isn't removed, and therefore kubelet complains it does exist already.

I can confirm the problem. I can take this on today.

@zreigz
Author

zreigz commented Jul 28, 2016

I've pushed new code with a workaround for the cni0 bridge:
https://github.com/kubernetes/kube-deploy/pull/115/files#diff-144f411bd451f064e4e09136ff82f1cdR328

but I think the problem should really be resolved in the CNI plugin code. I will try to take a look at that code; maybe I will be able to propose something. Right now the script uses the brctl command to remove the cni0 bridge in the turndown function.
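The turndown workaround could be sketched as follows (the function name is illustrative, not the PR's actual code; brctl comes from bridge-utils, which the PR installs):

```shell
# Sketch: if the cni0 bridge was left behind, bring it down and delete
# it so kubelet can recreate it cleanly on the next start. A no-op when
# the bridge does not exist.
kube::cni::teardown_bridge() {
  if ip link show cni0 >/dev/null 2>&1; then
    ip link set cni0 down
    brctl delbr cni0
  fi
}
```

Called from the turndown path, this keeps repeated turndown/start cycles from tripping over a stale bridge.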

}

# Install network utils: ifconfig, brctl
kube::multinode::install_network_utils() {
case "${lsb_dist}" in
Contributor

Please just check for yum or apt-get and install the packages if one of those package managers exists.
I'd like to get rid of the OS-specific code.
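A package-manager probe replacing the ${lsb_dist} switch could look like this (a sketch; the helper names are illustrative, not from this PR):

```shell
# Detect which supported package manager is available, if any.
kube::helpers::detect_pkg_manager() {
  if command -v apt-get >/dev/null 2>&1; then
    echo "apt-get"
  elif command -v yum >/dev/null 2>&1; then
    echo "yum"
  else
    return 1
  fi
}

# Install network utils (ifconfig, brctl) without OS-specific branches.
kube::multinode::install_network_utils() {
  local pkg
  pkg=$(kube::helpers::detect_pkg_manager) || {
    echo "No supported package manager (apt-get/yum) found" >&2
    return 1
  }
  "${pkg}" install -y bridge-utils net-tools
}
```

The remaining wrinkle would be distros where the package names differ, but probing the tool rather than the distro keeps the common case simple.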

Author

Good point, I will do it.

@luxas
Contributor

luxas commented Jul 28, 2016

Yes, it clearly should be solved in the CNI code itself.
Notify me if you come up with a PR for Kubernetes, and I can review it.

@zreigz
Author

zreigz commented Jul 28, 2016

Yes sure

@luxas
Contributor

luxas commented Jul 28, 2016

Thanks! LGTM

We can iterate more on this in new PRs.

@luxas luxas merged commit d4792f2 into kubernetes-retired:master Jul 28, 2016
@zreigz zreigz deleted the cni-plugin branch July 29, 2016 11:25
@zreigz
Author

zreigz commented Jul 29, 2016

@luxas You mentioned getting rid of OS distribution dependencies. If you don't mind, I can take this task. What do you think?

@luxas
Contributor

luxas commented Jul 29, 2016

Sounds good. We like to keep them as few as possible.


@zreigz
Author

zreigz commented Jul 29, 2016

Ok, thanks. I will create a PR for it after the weekend.
