kube-router is stuck in CrashLoopBackoff #5328

maeteem opened this issue Dec 6, 2024 · 4 comments

maeteem commented Dec 6, 2024

Before creating an issue, make sure you've checked the following:

  • You are running the latest released version of k0s
  • Make sure you've searched for existing issues, both open and closed
  • Make sure you've searched for PRs too, a fix might've been merged already
  • You're looking at docs for the released version, "main" branch docs are usually ahead of released versions.

Platform

Linux 5.14.0-503.14.1.el9_5.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Nov 15 12:04:32 UTC 2024 x86_64 GNU/Linux
NAME="Rocky Linux"
VERSION="9.5 (Blue Onyx)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.5"
PLATFORM_ID="platform:el9"
PRETTY_NAME="Rocky Linux 9.5 (Blue Onyx)"
ANSI_COLOR="0;32"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:rocky:rocky:9::baseos"
HOME_URL="https://rockylinux.org/"
VENDOR_NAME="RESF"
VENDOR_URL="https://resf.org/"
BUG_REPORT_URL="https://bugs.rockylinux.org/"
SUPPORT_END="2032-05-31"
ROCKY_SUPPORT_PRODUCT="Rocky-Linux-9"
ROCKY_SUPPORT_PRODUCT_VERSION="9.5"
REDHAT_SUPPORT_PRODUCT="Rocky Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.5"

Version

v1.31.2+k0s.0

Sysinfo

`k0s sysinfo`
Total memory: 1.4 GiB (pass)
File system of /var/lib/k0s: xfs (pass)
Disk space available for /var/lib/k0s: 31.4 GiB (pass)
Relative disk space available for /var/lib/k0s: 91% (pass)
Name resolution: localhost: [::1 127.0.0.1] (pass)
Operating system: Linux (pass)
  Linux kernel release: 5.14.0-503.14.1.el9_5.x86_64 (pass)
  Max. file descriptors per process: current: 524288 / max: 524288 (pass)
  AppArmor: unavailable (pass)
  Executable in PATH: modprobe: /sbin/modprobe (pass)
  Executable in PATH: mount: /bin/mount (pass)
  Executable in PATH: umount: /bin/umount (pass)
  /proc file system: mounted (0x9fa0) (pass)
  Control Groups: version 2 (pass)
    cgroup controller "cpu": available (is a listed root controller) (pass)
    cgroup controller "cpuacct": available (via cpu in version 2) (pass)
    cgroup controller "cpuset": available (is a listed root controller) (pass)
    cgroup controller "memory": available (is a listed root controller) (pass)
    cgroup controller "devices": available (device filters attachable) (pass)
    cgroup controller "freezer": available (cgroup.freeze exists) (pass)
    cgroup controller "pids": available (is a listed root controller) (pass)
    cgroup controller "hugetlb": available (is a listed root controller) (pass)
    cgroup controller "blkio": available (via io in version 2) (pass)
  CONFIG_CGROUPS: Control Group support: built-in (pass)
    CONFIG_CGROUP_FREEZER: Freezer cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_PIDS: PIDs cgroup subsystem: built-in (pass)
    CONFIG_CGROUP_DEVICE: Device controller for cgroups: built-in (pass)
    CONFIG_CPUSETS: Cpuset support: built-in (pass)
    CONFIG_CGROUP_CPUACCT: Simple CPU accounting cgroup subsystem: built-in (pass)
    CONFIG_MEMCG: Memory Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_HUGETLB: HugeTLB Resource Controller for Control Groups: built-in (pass)
    CONFIG_CGROUP_SCHED: Group CPU scheduler: built-in (pass)
      CONFIG_FAIR_GROUP_SCHED: Group scheduling for SCHED_OTHER: built-in (pass)
        CONFIG_CFS_BANDWIDTH: CPU bandwidth provisioning for FAIR_GROUP_SCHED: built-in (pass)
    CONFIG_BLK_CGROUP: Block IO controller: built-in (pass)
  CONFIG_NAMESPACES: Namespaces support: built-in (pass)
    CONFIG_UTS_NS: UTS namespace: built-in (pass)
    CONFIG_IPC_NS: IPC namespace: built-in (pass)
    CONFIG_PID_NS: PID namespace: built-in (pass)
    CONFIG_NET_NS: Network namespace: built-in (pass)
  CONFIG_NET: Networking support: built-in (pass)
    CONFIG_INET: TCP/IP networking: built-in (pass)
      CONFIG_IPV6: The IPv6 protocol: built-in (pass)
    CONFIG_NETFILTER: Network packet filtering framework (Netfilter): built-in (pass)
      CONFIG_NETFILTER_ADVANCED: Advanced netfilter configuration: built-in (pass)
      CONFIG_NF_CONNTRACK: Netfilter connection tracking support: module (pass)
      CONFIG_NETFILTER_XTABLES: Netfilter Xtables support: built-in (pass)
        CONFIG_NETFILTER_XT_TARGET_REDIRECT: REDIRECT target support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_COMMENT: "comment" match support: module (pass)
        CONFIG_NETFILTER_XT_MARK: nfmark target and match support: module (pass)
        CONFIG_NETFILTER_XT_SET: set target and match support: module (pass)
        CONFIG_NETFILTER_XT_TARGET_MASQUERADE: MASQUERADE target support: module (pass)
        CONFIG_NETFILTER_XT_NAT: "SNAT and DNAT" targets support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: "addrtype" address type match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_CONNTRACK: "conntrack" connection tracking match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_MULTIPORT: "multiport" Multiple port match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_RECENT: "recent" match support: module (pass)
        CONFIG_NETFILTER_XT_MATCH_STATISTIC: "statistic" match support: module (pass)
      CONFIG_NETFILTER_NETLINK: module (pass)
      CONFIG_NF_NAT: module (pass)
      CONFIG_IP_SET: IP set support: module (pass)
        CONFIG_IP_SET_HASH_IP: hash:ip set support: module (pass)
        CONFIG_IP_SET_HASH_NET: hash:net set support: module (pass)
      CONFIG_IP_VS: IP virtual server support: module (pass)
        CONFIG_IP_VS_NFCT: Netfilter connection tracking: built-in (pass)
        CONFIG_IP_VS_SH: Source hashing scheduling: module (pass)
        CONFIG_IP_VS_RR: Round-robin scheduling: module (pass)
        CONFIG_IP_VS_WRR: Weighted round-robin scheduling: module (pass)
      CONFIG_NF_CONNTRACK_IPV4: IPv4 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_REJECT_IPV4: IPv4 packet rejection: module (pass)
      CONFIG_NF_NAT_IPV4: IPv4 NAT: unknown (warning)
      CONFIG_IP_NF_IPTABLES: IP tables support: module (pass)
        CONFIG_IP_NF_FILTER: Packet filtering: module (pass)
          CONFIG_IP_NF_TARGET_REJECT: REJECT target support: module (pass)
        CONFIG_IP_NF_NAT: iptables NAT support: module (pass)
        CONFIG_IP_NF_MANGLE: Packet mangling: module (pass)
      CONFIG_NF_DEFRAG_IPV4: module (pass)
      CONFIG_NF_CONNTRACK_IPV6: IPv6 connetion tracking support (required for NAT): unknown (warning)
      CONFIG_NF_NAT_IPV6: IPv6 NAT: unknown (warning)
      CONFIG_IP6_NF_IPTABLES: IP6 tables support: module (pass)
        CONFIG_IP6_NF_FILTER: Packet filtering: module (pass)
        CONFIG_IP6_NF_MANGLE: Packet mangling: module (pass)
        CONFIG_IP6_NF_NAT: ip6tables NAT support: module (pass)
      CONFIG_NF_DEFRAG_IPV6: module (pass)
    CONFIG_BRIDGE: 802.1d Ethernet Bridging: module (pass)
      CONFIG_LLC: module (pass)
      CONFIG_STP: module (pass)
  CONFIG_EXT4_FS: The Extended 4 (ext4) filesystem: module (pass)
  CONFIG_PROC_FS: /proc file system support: built-in (pass)

What happened?

kube-router is stuck in CrashLoopBackOff on Rocky Linux 9.5.

NAMESPACE     NAME                              READY   STATUS             RESTARTS        AGE
kube-system   coredns-c698ff8d6-9nh99           1/1     Running            0               28m
kube-system   kube-proxy-cqjvb                  1/1     Running            0               28m
kube-system   kube-router-9gkp2                 0/1     CrashLoopBackOff   10 (2m1s ago)   28m
kube-system   metrics-server-78c4ccbc7f-p4t62   0/1     Running            0               28m

Steps to reproduce

  1. Install a fresh Rocky Linux 9.5
  2. Install single-node k0s
curl --proto '=https' --tlsv1.2 -sSf https://get.k0s.sh | sudo sh
sudo /usr/local/bin/k0s install controller --single
sudo systemctl daemon-reload
sudo /usr/local/bin/k0s start
  3. Wait for k0s to start, then get the pod status
sudo /usr/local/bin/k0s kc get po -A
NAMESPACE     NAME                              READY   STATUS             RESTARTS        AGE
kube-system   coredns-c698ff8d6-9nh99           1/1     Running            0               28m
kube-system   kube-proxy-cqjvb                  1/1     Running            0               28m
kube-system   kube-router-9gkp2                 0/1     CrashLoopBackOff   10 (2m1s ago)   28m
kube-system   metrics-server-78c4ccbc7f-p4t62   0/1     Running            0               28m
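
To pick out the failing pod from the output of the last step, the table can be filtered. This is only a minimal sketch: `crashlooping_pods` is a hypothetical helper name, and it assumes the default column layout shown above, with STATUS in the fourth column.

```shell
# Hypothetical helper: reads `kubectl get po -A` output on stdin and prints
# "namespace/name" for every pod whose STATUS column is CrashLoopBackOff.
crashlooping_pods() {
  # NR > 1 skips the header row; $4 is the STATUS column in the default layout.
  awk 'NR > 1 && $4 == "CrashLoopBackOff" { print $1 "/" $2 }'
}

# Usage on the node (same invocation as in the steps above):
#   sudo /usr/local/bin/k0s kc get po -A | crashlooping_pods
```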

Expected behavior

The status of the kube-router pod should be Running.

Actual behavior

kube-router status is CrashLoopBackOff

$ sudo /usr/local/bin/k0s kc get po -A
NAMESPACE     NAME                              READY   STATUS             RESTARTS        AGE
kube-system   coredns-c698ff8d6-9nh99           1/1     Running            0               28m
kube-system   kube-proxy-cqjvb                  1/1     Running            0               28m
kube-system   kube-router-9gkp2                 0/1     CrashLoopBackOff   10 (2m1s ago)   28m
kube-system   metrics-server-78c4ccbc7f-p4t62   0/1     Running            0               28m

Screenshots and logs

The kube-router container log shows that iptables v1.8.9 (legacy) fails to initialize the `filter` and `nat` tables, whereas the kube-proxy rules were registered successfully.

$ sudo cat /var/log/containers/kube-router-9gkp2_kube-system_kube-router-04f3f5773cfab3a3c7d2194660104eb0da4b4ce525376f7626fa020eddfe6169.log
2024-12-06T12:05:58.383137532+07:00 stderr F I1206 05:05:58.382909   21274 version.go:66] Running /usr/local/bin/kube-router version v2.2.1, built on 2024-08-09T16:15:52+0200, go1.22.7
2024-12-06T12:05:58.484372457+07:00 stderr F I1206 05:05:58.484241   21274 metrics_controller.go:232] Starting metrics controller
2024-12-06T12:05:58.491492228+07:00 stderr F I1206 05:05:58.491369   21274 network_routes_controller.go:1651] Could not find annotation `kube-router.io/bgp-local-addresses` on node object so BGP will listen on node IP: [172.19.85.191] addresses.
2024-12-06T12:05:58.494517018+07:00 stderr F E1206 05:05:58.494395   21274 network_routes_controller.go:184] Failed to enable IP forwarding of traffic from pods: failed to run iptables command: running [/sbin/iptables -t filter -C FORWARD -m comment --comment allow outbound traffic from pods -i kube-bridge -j ACCEPT --wait]: exit status 3: iptables v1.8.9 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
2024-12-06T12:05:58.494532516+07:00 stderr F Perhaps iptables or your kernel needs to be upgraded.
2024-12-06T12:05:58.498947164+07:00 stderr F I1206 05:05:58.498839   21274 network_policy_controller.go:163] Starting network policy controller
2024-12-06T12:05:58.500081047+07:00 stderr F E1206 05:05:58.499957   21274 network_routes_controller.go:218] Error cleaning up old/bad Pod egress rules: failed to lookup iptables rule: running [/sbin/iptables -t nat -C POSTROUTING -m set --match-set kube-router-pod-subnets src -m set ! --match-set kube-router-pod-subnets dst -j MASQUERADE --wait]: exit status 3: Warning: Extension set is not supported, missing kernel module?
2024-12-06T12:05:58.500096646+07:00 stderr F iptables v1.8.9 (legacy): can't initialize iptables table `nat': Table does not exist (do you need to insmod?)
2024-12-06T12:05:58.500098846+07:00 stderr F Perhaps iptables or your kernel needs to be upgraded.
2024-12-06T12:05:58.50034582+07:00 stderr F F1206 05:05:58.500047   21274 network_policy_controller.go:448] failed to run iptables command to create KUBE-ROUTER-INPUT chain due to running [/sbin/iptables -t filter -S KUBE-ROUTER-INPUT 1 --wait]: exit status 3: iptables v1.8.9 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
2024-12-06T12:05:58.500354319+07:00 stderr F Perhaps iptables or your kernel needs to be upgraded.

$ sudo iptables -nvL INPUT
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
  477 31830 KUBE-PROXY-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
93820  104M KUBE-NODEPORTS  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes health check service ports */
  477 31830 KUBE-EXTERNAL-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
 103K  157M KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0

Additional context

From the Networking documentation, I found that not all components use the same iptables mode.
k0s selects nftables mode and kube-proxy also uses nftables mode, but kube-router uses legacy mode.
I guess iptables legacy may not work on Rocky Linux 9.5.

$ sudo journalctl -u k0scontroller --no-pager | grep iptables
Dec 06 11:28:52 localhost.localdomain k0s[18152]: time="2024-12-06 11:28:52" level=info msg="Trying to detect iptables mode"
Dec 06 11:28:52 localhost.localdomain k0s[18152]: time="2024-12-06 11:28:52" level=info msg="Selecting iptables-nft: /usr/sbin/iptables --version: iptables v1.8.10 (nf_tables)"
Dec 06 11:28:52 localhost.localdomain k0s[18152]: time="2024-12-06 11:28:52" level=info msg="using iptables-nft"
...

$ sudo nsenter -t $(pidof kube-proxy) -m iptables -V
iptables v1.8.10 (nf_tables)
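
The mode comparison above relies on the suffix that iptables 1.8 and later print in their version string. As a rough sketch, the check can be wrapped in a small function; `iptables_mode` is a hypothetical name, and version strings without a suffix are reported as `unknown` rather than guessed.

```shell
# Hypothetical helper: maps an `iptables -V` version string to its backend mode.
iptables_mode() {
  case "$1" in
    *"(nf_tables)"*) echo nft ;;
    *"(legacy)"*)    echo legacy ;;
    *)               echo unknown ;;  # e.g. older releases that print no suffix
  esac
}

# Usage on the node, reusing the nsenter invocation from above:
#   iptables_mode "$(sudo nsenter -t "$(pidof kube-proxy)" -m iptables -V)"
```

Running it once against the host binary and once inside each component's mount namespace makes a mismatch like the one described here easy to see.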

I replaced the kube-router image with the latest version from Docker Hub and found that it runs on Rocky Linux 9.5 (I have not verified its functionality, but it at least started and picked up nftables mode).

==== use cloudnativelabs/kube-router latest version from docker hub ====

$ sudo /usr/local/bin/k0s kc -n kube-system edit daemonset kube-router
.....
    spec:
      containers:
      - args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=false
        - --bgp-graceful-restart=true
        - --enable-ipv4=true
        - --enable-ipv6=false
        - --auto-mtu=true
        - --metrics-port=8080
        - --hairpin-mode=true
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: KUBE_ROUTER_CNI_CONF_FILE
          value: /etc/cni/net.d/10-kuberouter.conflist
        image: cloudnativelabs/kube-router
        imagePullPolicy: IfNotPresent
....
$ sudo /usr/local/bin/k0s kc get po -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   coredns-c698ff8d6-9nh99           1/1     Running   0          87m
kube-system   kube-proxy-cqjvb                  1/1     Running   0          87m
kube-system   kube-router-jdnk7                 1/1     Running   0          44s
kube-system   metrics-server-78c4ccbc7f-p4t62   0/1     Running   0          87m

$ sudo nsenter -t $(pidof kube-router) -m /sbin/iptables -V
iptables v1.8.9 (nf_tables)
maeteem added the bug label Dec 6, 2024

maeteem (Author) commented Dec 9, 2024

Update:
The difference between the kube-router images from k0sproject and cloudnativelabs is that /sbin/iptables in the k0sproject image is a symlink to xtables-legacy-multi, whereas /sbin/iptables in the cloudnativelabs image is a symlink to iptables-wrapper.

$ docker run --rm --entrypoint /bin/bash quay.io/k0sproject/kube-router:v2.2.1-iptables1.8.9-0 -c "ls -l /sbin/iptables"
lrwxrwxrwx    1 root     root            20 Sep 25 14:44 /sbin/iptables -> xtables-legacy-multi
$ docker run --rm --entrypoint /bin/bash cloudnativelabs/kube-router:v2.2.1 -c "ls -l /sbin/iptables"
lrwxrwxrwx    1 root     root            22 Aug  9 14:24 /sbin/iptables -> /sbin/iptables-wrapper
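
The same comparison can be scripted by looking only at the symlink target. The following is a sketch under assumptions: `classify_iptables_link` is a hypothetical helper, and it recognizes only the target names seen in this issue plus `xtables-nft-multi`, the nft counterpart shipped with iptables.

```shell
# Hypothetical helper: classifies an iptables symlink by the name of its target.
classify_iptables_link() {
  # readlink fails if the path is not a symlink.
  target=$(readlink "$1") || { echo "not-a-symlink"; return 1; }
  case "${target##*/}" in            # strip any directory prefix from the target
    xtables-legacy-multi) echo legacy ;;
    xtables-nft-multi)    echo nft ;;
    iptables-wrapper)     echo wrapper ;;  # backend is chosen at runtime
    *)                    echo "other:$target" ;;
  esac
}

# Usage inside a container of the image under test:
#   classify_iptables_link /sbin/iptables
```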

The k0sproject kube-router image already includes iptables-wrapper, but the installer script is not run when the image is built.

$ docker run --rm --entrypoint /bin/bash quay.io/k0sproject/kube-router:v2.2.1-iptables1.8.9-0 -c "ls -l /"
total 1892
drwxr-xr-x    1 root     root            18 Sep 25 14:44 bin
drwxr-xr-x    5 root     root           340 Dec  9 03:52 dev
drwxr-xr-x    1 root     root            25 Dec  9 03:52 etc
drwxr-xr-x    2 root     root             6 Sep  6 11:36 home
-rwxr-xr-x    1 root     root       1921176 Sep 25 14:45 iptables-wrapper
-rwxr-xr-x    1 root     root          4238 Sep 25 14:44 iptables-wrapper-installer.sh
drwxr-xr-x    1 root     root            61 Sep 25 14:44 lib
drwxr-xr-x    5 root     root            44 Sep  6 11:36 media
drwxr-xr-x    2 root     root             6 Sep  6 11:36 mnt
drwxr-xr-x    2 root     root             6 Sep  6 11:36 opt
dr-xr-xr-x  211 nobody   nobody           0 Dec  9 03:52 proc
drwx------    2 root     root             6 Sep  6 11:36 root
drwxr-xr-x    1 root     root            42 Dec  9 03:52 run
drwxr-xr-x    1 root     root          4096 Sep 25 14:44 sbin
drwxr-xr-x    2 root     root             6 Sep  6 11:36 srv
dr-xr-xr-x   13 nobody   nobody           0 Dec  9 03:16 sys
drwxrwxrwt    2 root     root             6 Sep  6 11:36 tmp
drwxr-xr-x    1 root     root            19 Sep  6 11:36 usr
drwxr-xr-x    1 root     root            17 Sep  6 11:36 var

I was able to launch kube-router successfully by adding a command to the kube-router daemonset that calls the iptables-wrapper installer before launching kube-router.

$ sudo /usr/local/bin/k0s kc -n kube-system edit daemonset kube-router
....
    spec:
      containers:
      - args:
        - --run-router=true
        - --run-firewall=true
        - --run-service-proxy=false
        - --bgp-graceful-restart=true
        - --enable-ipv6=false
        - --enable-ipv4=true
        - --auto-mtu=true
        - --metrics-port=8080
        - --hairpin-mode=true
        command: [ "/bin/sh", "-c", "cd /; ./iptables-wrapper-installer.sh; cd /root; /usr/local/bin/kube-router $@" ]
....
$ sudo /usr/local/bin/k0s kc get po -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE
kube-system   coredns-c698ff8d6-9nh99           1/1     Running   2 (39m ago)   2d23h
kube-system   kube-proxy-qgxvd                  1/1     Running   1 (39m ago)   48m
kube-system   kube-router-dzgnn                 1/1     Running   0             23s
kube-system   metrics-server-78c4ccbc7f-p4t62   0/1     Running   4 (38m ago)   2d23h

$ sudo nsenter -t $(pidof kube-router) -m /sbin/iptables --version
iptables v1.8.9 (nf_tables)

$ sudo iptables -nvL INPUT
Chain INPUT (policy ACCEPT 713 packets, 377K bytes)
 pkts bytes target     prot opt in     out     source               destination
  828  396K KUBE-ROUTER-INPUT  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kube-router netpol - 4IA2OSFRMVNDXBVV */
   13   780 KUBE-PROXY-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes load balancer firewall */
  713  377K KUBE-NODEPORTS  0    --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes health check service ports */
   13   780 KUBE-EXTERNAL-SERVICES  0    --  *      *       0.0.0.0/0            0.0.0.0/0            ctstate NEW /* kubernetes externally-visible service portals */
  713  377K KUBE-FIREWALL  0    --  *      *       0.0.0.0/0            0.0.0.0/0
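
One detail of the `command` line in the DaemonSet workaround may be worth double-checking. Kubernetes appends `args` to `command`, so the container effectively runs `/bin/sh -c '<script>' --run-router=true ...`, and `sh -c` assigns the first operand after the script to `$0`, not to `$@`. The sketch below shows that behavior in isolation; the function names and the placeholder word `kube-router` are illustrative assumptions, not part of the original workaround.

```shell
# Without a placeholder, the first argument lands in $0 and is missing from "$@".
without_placeholder() {
  sh -c 'printf "%s\n" "$*"' "$@"
}

# With a placeholder word occupying the $0 slot, all arguments reach "$@".
with_placeholder() {
  sh -c 'printf "%s\n" "$*"' kube-router "$@"
}

# An untested variant of the workaround that keeps every argument might be:
#   command: [ "/bin/sh", "-c",
#              "cd /; ./iptables-wrapper-installer.sh; cd /root; exec /usr/local/bin/kube-router \"$@\"",
#              "kube-router" ]
```

In the workaround as posted, this would mean `--run-router=true` is not passed to kube-router, so the router only runs if that is also the flag's default.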

So running the iptables-wrapper installer script during the kube-router image build could fix the iptables compatibility issue.

till (Contributor) commented Dec 10, 2024

@maeteem, when you run iptables-save with one of the binaries from /var/lib/k0s/bin versus the "correct" one, do you also see the warning about legacy rules? Or is that unrelated?

twz123 (Member) commented Dec 11, 2024

Can you maybe try the fixed kube-router image?

apiVersion: k0s.k0sproject.io/v1beta1
kind: ClusterConfig
metadata:
  name: k0s
  namespace: kube-system
spec:
  images:
    kuberouter:
      cni:
        image: quay.io/k0sproject/kube-router
        version: v2.2.1-iptables1.8.9-1
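
After the configuration above has been applied, one way to confirm that the DaemonSet picked up the new image is to read it back with a jsonpath query. This is a sketch under assumptions: `kube_router_image` and the `KC` override are hypothetical, and it assumes kube-router is the first container in the pod template, as in the DaemonSet excerpts earlier in this issue.

```shell
# Hypothetical helper: prints the image of the first container in the
# kube-router DaemonSet. KC may override the kubectl command; the default
# matches the invocation used throughout this issue.
kube_router_image() {
  # Intentionally unquoted so the command string splits into words.
  ${KC:-sudo /usr/local/bin/k0s kc} -n kube-system get daemonset kube-router \
    -o jsonpath='{.spec.template.spec.containers[0].image}'
}

# Usage on the node:
#   kube_router_image
```

With the image and version from the configuration above, the expected result would presumably be `quay.io/k0sproject/kube-router:v2.2.1-iptables1.8.9-1`.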

twz123 (Member) commented Dec 14, 2024

The new v1.31.3+k0s.0 ships the fixed kube-router image by default now. Please ping here if the problem persists.

@twz123 twz123 closed this as completed Dec 14, 2024