
FR: include priorityClassName: system-node-critical to controller deployments #3800

Closed
1 task done
gladiatr72 opened this issue Apr 14, 2023 · 13 comments · Fixed by #3802

Comments

@gladiatr72

gladiatr72 commented Apr 14, 2023

Describe the bug

flux/cli bootstrap deploys controller deployments without specifying a priorityClass. This puts them front-of-the-line for eviction if the resources of the underlying node come under even the slightest pressure and can result in flux objects (lookin' at you, helmRelease 😁) being left in an inconsistent state.

Steps to reproduce

flux deployments do not include deployment.spec.template.spec.priorityClassName. By the pod priority/eviction docs, this gives them a default priority of 0 or, in symbolic parlance, the priority of a sacrificial goat. In rolling-update scenarios for non-Flux workloads (managed by Flux), it is often the case that resource pressure leads to Flux controller eviction.

I would like to suggest bootstrap set controllers' priorityClass to system-node-critical since it is a long-standing kubernetes default priorityClass.
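For reference, the field in question lives at deployment.spec.template.spec; a minimal sketch of what the request amounts to (controller name illustrative, not the shipped manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: flux-system
spec:
  template:
    spec:
      # Absent this field, the pods get priority 0 and are first in
      # line for eviction under node pressure.
      priorityClassName: system-node-critical
```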

Expected behavior

When a source makes new data available, the responding fluxcd controllers should stay around long enough to see the defined changes applied.

Screenshots and recordings

No response

OS / Distro

n/a

Flux version

v0.40.1
edit: the specific concern was already addressed with the v0.41.1 release

Flux check

n/a

Git provider

n/a

Container Registry provider

n/a

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@makkes
Member

makkes commented Apr 15, 2023

Please take a look at the documentation that explains how to customize the manifests to your needs.

@gladiatr72
Author

I don't need to be told how to do it.

I've been deploying with flux2 since 0.10. I am quite familiar with the docs and make use of Flux to configure Flux the way it should be configured.

I'm talking about defaults. I might mention that when gitops is written about, fluxcd is included in the list of gitops systems maybe a third of the time (exceptions include pieces written by outfits that are partnered in some way with fluxcd)

If you go to the issues for this and related Flux repos, you find many of them involve problems with controller stability. There are real code issues and design decisions that are still being worked out, but often the problem boils down to the controller dying in the middle of reconciliation.

Unfortunately, as the gutting of kube's built-in drivers continues and more of said drivers are pushed to the nodes as daemonsets, the likelihood of Flux controllers dying from goat-level prioritization rises. The defaults pretty much guarantee instability during deployment (actually using fluxcd for its defined purpose).

If the cluster operator must update a parameter on every default deployment, then it is, by definition, a default, which is what this issue concerns.

@stefanprodan
Member

if the resources of the underlying node come under even the slightest pressure and can result in flux objects (lookin' at you, helmRelease 😁) being left in an inconsistent state.

This is no longer the case: starting with Flux v0.41.0, helm-controller handles SIGTERM sent by the kubelet and aborts all in-flight Helm operations by setting the releases' status to failed. When it restarts, those failed releases are retried, as they are no longer left in an inconsistent state.

I would like to suggest bootstrap set controllers' priorityClass to system-node-critical

The Flux controllers are neither static pods nor a daemonset; IMO a more suitable class would be system-cluster-critical, which is second after system-node-critical. If things go wrong and Kubernetes must evict pods, I prefer to have Flux evicted before CoreDNS or the CNIs. Also, setting this priority class on all Flux controllers doesn't feel right to me; we could use it only for the source/kustomize/helm controllers, which are "critical" to cluster sync operations.
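Scoping the class to just those three controllers could be sketched today as a flux-system kustomization patch (an untested sketch; file names assume the standard bootstrap layout):

```yaml
# kustomization.yaml in flux-system: patch only the sync-critical controllers.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            priorityClassName: system-cluster-critical
    target:
      kind: Deployment
      # Regex restricts the patch to the source/kustomize/helm controllers.
      name: "(source-controller|kustomize-controller|helm-controller)"
```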

@gladiatr72
Author

Wrt the helm controller, as of 0.41.1, it does not. Pretty sure that's why there is a current initiative to properly roll back to a reconcilable state when it fails.

I agree with you regarding using the system-cluster-critical class.

@stefanprodan
Member

Wrt the helm controller, as of 0.41.1, it does not.

What are you talking about? The SIGTERM handling was not reverted in 0.41.1; we shipped a cgroup fix in that release for the OOMWatcher, which is an opt-in feature.

@gladiatr72
Author

gladiatr72 commented Apr 15, 2023

I'll post links to the issue mentioned when not on mobile.

I do not think we're speaking of the same thing.

@stefanprodan
Member

In the case of pod eviction due to node pressure or downscaling, before Flux v0.41.0 the in-flight releases were left in a locked state. Once a release is locked, helm-controller can't perform any actions on it. Starting with v0.41, the controller reacts to SIGTERM (sent before eviction) and sets the in-flight releases as failed, thus removing the lock. When helm-controller starts on a different node, it will retry the failed releases, if configured to do so.
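The "if configured to do so" part refers to the remediation settings on the HelmRelease object; a sketch with illustrative names and retry counts:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
  # Retry failed installs/upgrades, e.g. releases marked failed
  # after the controller received SIGTERM mid-reconciliation.
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
```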

@gladiatr72
Author

gladiatr72 commented Apr 19, 2023

@stefanprodan thanks. I meant to get back to this but the week got away from me.

I must admit to you: I was running the 41 cli against a 40 installation. I'll edit the issue for historical clarity. Thanks for the PR.

@makkes
Member

makkes commented Apr 19, 2023

Re-opening as the PR implementing the actual feature hasn't been merged, yet.

@adarhef

adarhef commented May 11, 2023

Hello!
I tried Flux for the first time today. The deployments for the kustomize, helm, and source controllers are blocked on GKE (the notification controller deploys fine, it seems) with the error: insufficient quota to match these scopes: [{PriorityClass In [system-node-critical system-cluster-critical]}]. This appears to prevent bootstrapping from succeeding.
From my understanding, this is because GKE won't allow these priority classes to be used by deployments without explicit ResourceQuota resource definitions or something similar.
It seems the related PR was merged into the release from two days ago, so I assume this is what causes my issue.
Does anyone have any thoughts on this? I didn't want to open a new issue to bring this up if it's just due to a misunderstanding on my part.
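For anyone hitting the same quota error: Kubernetes documents a ResourceQuota that permits critical priority classes in a namespace, which could be applied to flux-system as a workaround. A sketch (name and pod limit illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: critical-pods
  namespace: flux-system
spec:
  hard:
    pods: "1000"
  # Quota applies only to pods requesting one of these priority classes,
  # which is what GKE requires before admitting them.
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
```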

@stefanprodan
Member

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3, run flux uninstall, and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and it works OK.

@adarhef

adarhef commented May 13, 2023

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3 run flux uninstall and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and works Ok.

Thanks! I didn't get the chance to test your earlier suggestion to apply the ResourceQuota manually, but I did upgrade to rc3 and that issue went away. Things appear to be working fine.

@ksemele

ksemele commented May 19, 2023

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3 run flux uninstall and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and works Ok.

When flux uninstall is not available (e.g. Flux v0.41 on a production env), will bootstrapping via flux install correctly overwrite all Flux resources?
