
FR: include priorityClassName: system-node-critical to controller deployments #3800

Closed
1 task done
gladiatr72 opened this issue Apr 14, 2023 · 13 comments · Fixed by #3802

Comments

@gladiatr72

gladiatr72 commented Apr 14, 2023

Describe the bug

flux/cli bootstrap deploys controller deployments without specifying a priorityClass. This puts them front-of-the-line for eviction if the resources of the underlying node come under even the slightest pressure and can result in flux objects (lookin' at you, helmRelease 😁) being left in an inconsistent state.

Steps to reproduce

flux deployments do not include deployment.spec.template.spec.priorityClassName. By the pod priority/eviction docs, this gives them a default priority of 0 or, in symbolic parlance, the priority of a sacrificial goat. In rolling-update scenarios for non-Flux workloads (managed by Flux), it is often the case that resource pressure leads to Flux controller eviction.

I would like to suggest bootstrap set controllers' priorityClass to system-node-critical since it is a long-standing kubernetes default priorityClass.
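For reference, the field in question lives at deployment.spec.template.spec; a minimal sketch of what the request amounts to (controller name illustrative, not the shipped manifest):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: helm-controller
  namespace: flux-system
spec:
  template:
    spec:
      # Absent this field, the pods get priority 0 and are first in
      # line for eviction under node pressure.
      priorityClassName: system-node-critical
```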

Expected behavior

When a source makes new data available, the responding fluxcd controllers should stay around long enough to see the defined changes applied.

Screenshots and recordings

No response

OS / Distro

n/a

Flux version

v0.40.1
edit: the specific concern was already addressed with the v0.41.1 release

Flux check

n/a

Git provider

n/a

Container Registry provider

n/a

Additional context

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@makkes
Member

makkes commented Apr 15, 2023

Please take a look at the documentation that explains how to customize the manifests to your needs.

@gladiatr72
Author

I don't need to be told how to do it.

I've been deploying with flux2 since 0.10. I am quite familiar with the docs and make use of Flux to configure Flux the way it should be configured.

I'm talking about defaults. I might mention that when gitops is written about, fluxcd is included in the list of gitops systems maybe a third of the time (exceptions include pieces written by outfits that are partnered in some way with fluxcd)

If you go to the issues for this and related Flux repos, you find many of them involve problems with controller stability. There are real code issues and design decisions that are still being worked out, but often the problem boils down to the controller dying in the middle of reconciliation.

Unfortunately, as the gutting of kube's built-in drivers continues and more of said drivers are pushed to the nodes as daemonsets, the likelihood of Flux controllers dying from goat-level prioritization rises. The defaults pretty much guarantee instability during deployment (actually using fluxcd for its defined purpose).

If the cluster operator must update a parameter on every default deployment, then it is, by definition, a default, which is what this issue concerns.

@stefanprodan
Member

if the resources of the underlying node come under even the slightest pressure and can result in flux objects (lookin' at you, helmRelease 😁) being left in an inconsistent state.

This is no longer the case: starting with Flux v0.41.0, helm-controller handles SIGTERM sent by the kubelet and aborts all in-flight Helm operations by setting the releases' status to failed. When it restarts, those failed releases are retried, as they are no longer left in an inconsistent state.

I would like to suggest bootstrap set controllers' priorityClass to system-node-critical

The Flux controllers are neither static pods nor a daemonset; IMO a more suitable class would be system-cluster-critical, which is second after system-node-critical. If things go wrong and Kubernetes must evict pods, I prefer to have Flux evicted before CoreDNS or the CNIs. Also, setting this priority class on all Flux controllers doesn't feel right to me; we could use it only for the source/kustomize/helm controllers, which are "critical" to cluster sync operations.
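Scoping the class to just those three controllers could be sketched today as a flux-system kustomization patch (an untested sketch; file names assume the standard bootstrap layout):

```yaml
# kustomization.yaml in flux-system: patch only the sync-critical controllers.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: all
      spec:
        template:
          spec:
            priorityClassName: system-cluster-critical
    target:
      kind: Deployment
      # Regex restricts the patch to the source/kustomize/helm controllers.
      name: "(source-controller|kustomize-controller|helm-controller)"
```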

@gladiatr72
Author

Wrt the helm controller, as of 0.41.1, it does not. Pretty sure that's why there is a current initiative to properly roll back to a reconcilable state when it fails.

I agree with you regarding using the system-cluster-critical class.

@stefanprodan
Member

Wrt the helm controller, as of 0.41.1, it does not.

What are you talking about? The SIGTERM handling was not reverted in 0.41.1; we shipped a cgroup fix in that release for the OOMWatcher, which is an opt-in feature.

@gladiatr72
Author

gladiatr72 commented Apr 15, 2023

I'll post links to the issue mentioned when not on mobile.

I do not think we're speaking of the same thing.

@stefanprodan
Member

In the case of pod eviction due to node pressure or downscaling, before Flux v0.41.0 the in-flight releases were left in a locked state. Once a release is locked, helm-controller can't perform any actions on it. Starting with v0.41, the controller reacts to SIGTERM (sent before eviction) and sets the in-flight releases as failed, thus removing the lock. When helm-controller starts on a different node, it will retry the failed releases, if configured to do so.
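The "if configured to do so" part refers to the remediation settings on the HelmRelease object; a sketch with illustrative names and retry counts:

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  interval: 5m
  chart:
    spec:
      chart: podinfo
      sourceRef:
        kind: HelmRepository
        name: podinfo
  # Retry failed installs/upgrades, e.g. releases marked failed
  # after the controller received SIGTERM mid-reconciliation.
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
```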

@gladiatr72
Author

gladiatr72 commented Apr 19, 2023

@stefanprodan thanks. I meant to get back to this but the week got away from me.

I must admit to you: I was running the 41 cli against a 40 installation. I'll edit the issue for historical clarity. Thanks for the PR.

@makkes
Member

makkes commented Apr 19, 2023

Re-opening as the PR implementing the actual feature hasn't been merged, yet.

@adarhef

adarhef commented May 11, 2023

Hello!
I tried Flux for the first time today. The deployments for the kustomize, helm, and source controllers are blocked on GKE (the notification controller deploys fine, it seems) with the error: insufficient quota to match these scopes: [{PriorityClass In [system-node-critical system-cluster-critical]}]. This appears to prevent bootstrapping from succeeding.
From my understanding, this is because GKE won't allow these priority classes to be used by deployments without explicit ResourceQuota resource definitions or something similar.
It seems the related PR was merged into the release from two days ago, so I assume this is what causes my issue.
Does anyone have any thoughts on this? I didn't want to open a new issue to bring this up if it's just due to a misunderstanding on my part.
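For anyone hitting the same quota error: Kubernetes documents a ResourceQuota that permits critical priority classes in a namespace, which could be applied to flux-system as a workaround. A sketch (name and pod limit illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: critical-pods
  namespace: flux-system
spec:
  hard:
    pods: "1000"
  # Quota applies only to pods requesting one of these priority classes,
  # which is what GKE requires before admitting them.
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values:
          - system-node-critical
          - system-cluster-critical
```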

@stefanprodan
Member

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3, run flux uninstall, and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and it works OK.

@adarhef

adarhef commented May 13, 2023

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3 run flux uninstall and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and works Ok.

Thanks! I didn't get the chance to test your earlier suggestion to apply the ResourceQuota manually, but I did upgrade to rc3 and that issue went away. Things appear to be working fine.

@ksemele

ksemele commented May 19, 2023

@adarhef this has been fixed, thanks for reporting it. To install Flux on GKE, download v2.0.0-rc.3 run flux uninstall and then flux bootstrap. I've tested it with RC.3 on GKE v1.26.2-gke.1000 and works Ok.

When flux uninstall is not available (e.g. Flux v0.41 on a production env), will bootstrapping via flux install correctly overwrite all Flux resources?
