Default deployment of rke2-ingress-nginx has load balancer service enabled in RKE2 1.21.2 #1446

Closed
Martin-Weiss opened this issue Jul 22, 2021 · 16 comments

@Martin-Weiss

Martin-Weiss commented Jul 22, 2021

Environmental Info:
RKE2 Version: 1.21.2-rke2r1

Cluster Configuration:
cis-1.6

write-kubeconfig-mode: "0640"
cluster-cidr: "192.168.0.0/17"
service-cidr: "192.168.128.0/20"
cluster-dns: "192.168.128.10"
private-registry: /etc/rancher/rke2/registries.yaml
agent-token: xxx
token: yyy
profile: cis-1.6
tls-san:
  - "<clusterfqdn>"
node-label:
  - "cluster=<clustername>"

Describe the bug:
kubectl get services -n kube-system shows a LoadBalancer service for the ingress controller.

Steps To Reproduce:
Install RKE2 1.21.2 and check the services in kube-system.

Expected behavior:
The LoadBalancer service for nginx-ingress should not be present (it was not created in pre-1.21 versions).

Actual behavior:
The LoadBalancer service is there and stuck in Pending.
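
A quick way to confirm the symptom (a sketch, assuming the default chart name and the kube-system namespace mentioned above):

kubectl -n kube-system get svc rke2-ingress-nginx-controller
# TYPE shows LoadBalancer and EXTERNAL-IP stays <pending> when no cloud load balancer controller is present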

@nilathedragon

Can confirm, the ingress is now successfully wasting one of my IP addresses. Any workarounds?

@brandond
Member

brandond commented Jul 26, 2021

This comes from the upstream ingress-nginx chart, which ships with service.enabled=true:
https://github.com/kubernetes/ingress-nginx/blob/main/charts/ingress-nginx/values.yaml#L396-L397

We previously shipped an old (no longer supported) version of the chart that defaulted this to false. If you want the old behavior back, you can provide an rke2-ingress-nginx HelmChartConfig manifest that sets the value back to false.

@Martin-Weiss
Author

Martin-Weiss commented Jul 26, 2021 via email

@mstrent

mstrent commented Jul 26, 2021

For those who don't know, you need to add this to "/var/lib/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml"

---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      publishService:
        enabled: false
      service:
        enabled: false

Before I did this, not only did I have the extra LoadBalancer ingress showing up, but my other ingresses showed as pending or incomplete.
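
RKE2 applies manifests dropped into that directory automatically; a minimal sketch for verifying that the override took effect (assuming the default object and namespace names used above):

kubectl -n kube-system get helmchartconfig rke2-ingress-nginx   # the override object should exist
kubectl -n kube-system get svc                                  # the rke2-ingress-nginx-controller LoadBalancer service should be gone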

@brandond
Member

Just out of curiosity, why do you have 'extra' ingresses? Are you deploying your own ingress controller alongside the built-in one?

@JDB1976

JDB1976 commented Jul 27, 2021

I don't know if this is relevant, but I have just deployed v1.21.3+rke2r1 (fresh install, CentOS 8; previously deployed with rancherd, but cleaned that off and made a separate VM for Rancher to run in). The default NGINX ingress controller (I'm using the built-in one) is sitting in a Pending state and none of my L7 Ingress definitions will initialize. Not sure what is going on here.


@mstrent

mstrent commented Jul 27, 2021

@brandond sorry, I'm new to the space so my language may not be precise here. :) What I was trying to describe is the same thing as the OP and as pictured by @JDB1976. Not using any other ingress controllers but the OOTB NGINX one.

Modifying the helm chart settings with service.enabled: false gets rid of the new "rke2-ingress-nginx-controller" LoadBalancer service (the one that stays at Pending), but then the ingress pod crashes. Adding publishService.enabled: false as well stops the crashing. This seems to get us back to the previous behavior.

@JDB1976

JDB1976 commented Jul 27, 2021

Might it be because the cluster deploys the NGINX service as an L4 external load balancer service by default? I've just replicated this behavior with my K8S CentOS 8 fresh install where I deployed a fresh NGINX. Once I reinstalled the deployment as a daemonset with helm and moved the service type to "Cluster IP (internal only)", all was right in the world again. :)

Along the lines of this:

helm install ingress-nginx ingress-nginx/ingress-nginx --set controller.hostNetwork=true,controller.service.type="",controller.kind=DaemonSet -n kube-system

I have not yet looked at this for the RKE2 install, however. CONFIRMED: the default deployment of the NGINX ingress controller in RKE2 is a daemonset, but the service is defined as an L4 external load balancer. When I changed it to "Cluster IP (internal only)", the pending state disappeared and all my defined ingresses deployed and now function!

So maybe there should be a note somewhere to remind us newbies that NGINX deploys its service definition by default as an external L4 load balancer, and also deploys as a replica set, and that you need to change it to something conducive to your local environment if you don't use such a service and/or want to run it as a daemonset. I know, RTFM, but there are SO many FMs in this area it's hard to read them all. :)
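
In RKE2 terms, the same change can be expressed as a HelmChartConfig override rather than a fresh helm install. A minimal sketch (assuming the upstream chart's controller.service.type value) that keeps the service but avoids the L4 LoadBalancer requirement:

---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      service:
        # ClusterIP instead of the default LoadBalancer, so nothing sits in Pending
        type: ClusterIP

Placed in /var/lib/rancher/rke2/server/manifests/, this keeps the controller service reachable in-cluster without leaving a LoadBalancer stuck in Pending.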

@brandond
Member

brandond commented Jul 27, 2021

@erikwilson can you take a look at this since you worked on the nginx helm chart most recently? It sounds like the current version of the chart doesn't work without an external LB controller.

brandond added this to the v1.21.4+rke2r1 milestone Jul 27, 2021
@mstrent

mstrent commented Jul 27, 2021

Would this be a good time to investigate deploying as a daemonset by default as well? Then extra steps like this won't be necessary: https://rancher.com/docs/rancher/v2.5/en/installation/resources/k8s-tutorials/ha-rke2/#5-configure-nginx-to-be-a-daemonset

@brandond
Member

It is already a daemonset by default.

@cjellick
Contributor

We need to understand why this was not caught in our testing/automation.

We need QA to verify what happens on upgrade. Will this break functional setups upgrading from 1.20.x?

We may need to modify our 1.21.2 and 1.21.3 release notes to call out this regression.

@brandond
Member

@cjellick QA was only testing with the AWS cloud provider configured; we've asked them to also test without a cloud provider to duplicate standalone environments. We also cleared up some confusion about how to test Ingress resources.

rancher/rke2-charts#123 modifies the defaults to match the previously shipped configuration.

@nickgerace

nickgerace commented Jul 28, 2021

Ironically, we just hit this in Rancher provisioning v2: rancher/rancher#33775
cc: @cbron @kravciak @kinarashah

EDIT: we first hit this for RKE1 not too long ago: rancher/rancher#30356

@brandond
Member

/forwardport v1.22.0+rke2r1

fapatel1 added this to the v1.21.3+rke2r2 milestone Aug 9, 2021
@rancher-max
Member

Validated on v1.21.3-rc4+rke2r2

  • The load balancer service is no longer enabled by default:
$ k get svc -n kube-system
NAME                                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
rke2-coredns-rke2-coredns                 ClusterIP   10.43.0.10     <none>        53/UDP,53/TCP   47m
rke2-ingress-nginx-controller-admission   ClusterIP   10.43.148.53   <none>        443/TCP         47m
rke2-metrics-server                       ClusterIP   10.43.30.100   <none>        443/TCP         47m
  • Ingress resources are correctly assigned an external IP:
$ k get ingress
NAME                CLASS    HOSTS       ADDRESS                                                  PORTS   AGE
othertest-ingress   <none>   test1.com   18.xxx.yyy.zzz,3.xxx.yyy.zzz,3.xxx.yyy.zzz,3.xxx.yyy.zzz   80      8m42s
