helm: autoscaling migration procedure #7367
Comments
@beatkind tagging you because I don't think I can assign you without you participating in the issue. Can you assign yourself by any chance? From the GitHub docs:
@dimitarvdimitrov And I need to actively write here :) to be participating - nope, I am not able to assign myself because I do not have any permissions inside the repo
@dimitarvdimitrov thanks for reopening, this was simply a mistake :) - I will add some documentation with my next PR
Hey @beatkind @dimitarvdimitrov, I just tried to follow
How long after deploying the HPA did this happen? Was the HPA already working?
This looks like a problem with load balancing. The ring doesn't determine distributor load. Perhaps the reverse proxy in front of the distributors didn't get the update quickly enough (or DNS was cached, etc.). What reverse proxy are you using - is it the nginx from the chart? Did this resolve itself eventually?
I'd expect that doing it after the
Good point, will validate it again with a test env |
Just wanted to leave a note that we also encountered our replicas scaling down to 1 when we set
#7282 added autoscaling to Helm as an experimental feature. This issue is about adding support in the helm chart for a smooth migration and adding documentation for the migration.
Why do we need a migration?
Migrating to a Mimir cluster with autoscaling requires a few intermediate steps to ensure that there are no disruptions to traffic. The major risk is that enabling autoscaling also removes the replicas field from Deployments. If KEDA/HPA hasn't started autoscaling the Deployment yet, then Kubernetes interprets a missing replicas field as meaning 1 replica, which can cause an outage.
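To make the risk concrete, here is a minimal sketch (assumed names and labels, not copied from the chart) of a rendered distributor Deployment once the replicas field is dropped. Kubernetes defaults spec.replicas to 1 for such a manifest, so until the KEDA-created HPA takes over, a Deployment that previously ran many replicas is scaled down to 1 on the next apply.

```yaml
# Hypothetical rendered manifest after enabling autoscaling.
# The chart no longer emits spec.replicas, so Kubernetes treats the
# Deployment as having 1 replica until an HPA starts managing it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mimir-distributor            # example name, not necessarily what the chart renders
spec:
  # replicas: 10                     # previously rendered; now omitted -> defaults to 1
  selector:
    matchLabels:
      app.kubernetes.io/component: distributor   # example labels
  template:
    metadata:
      labels:
        app.kubernetes.io/component: distributor
    spec:
      containers:
        - name: distributor
          image: grafana/mimir       # tag omitted for brevity
```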
Migration in a nutshell
1. Add a distributor.kedaAutoscaling.preserveReplicas: true field in the helm chart which doesn't delete the replicas field from the rendered manifests (feat(helm): Adding preserveReplicas option for kedaAutoscaling to preserve replicas even if keda is enabled #7431).
2. With preserveReplicas: true, deploy the chart (a values sketch follows this list).
3. Remove preserveReplicas from values.yaml and deploy the chart.
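A minimal values.yaml sketch for the intermediate step, assuming the distributor component. Only preserveReplicas is taken from #7431; the surrounding kedaAutoscaling fields (enabled, minReplicaCount, maxReplicaCount) are assumptions and should be checked against the chart's values reference.

```yaml
# Step 2: enable autoscaling while keeping spec.replicas rendered.
# preserveReplicas comes from #7431; the other fields are assumed
# KEDA-style chart options and may differ in the actual chart.
distributor:
  kedaAutoscaling:
    enabled: true
    preserveReplicas: true   # keep the replicas field in the rendered Deployment
    minReplicaCount: 3       # illustrative values only
    maxReplicaCount: 10
```

Once KEDA/HPA is actively managing the Deployment, dropping preserveReplicas from values.yaml and deploying again (step 3) lets the chart stop rendering spec.replicas without the Deployment ever falling back to 1 replica.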
Internal docs
I'm also pasting Grafana Labs-internal documentation that's specific to our deployment tooling with FluxCD. Perhaps it can be used by folks running FluxCD or as a starting point for proper docs:
remove_managed_replicas.sh
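The internal document and the remove_managed_replicas.sh script are not reproduced here. Purely as a generic illustration for FluxCD users, the intermediate preserveReplicas setting could be carried in a HelmRelease like the one below; names, namespace, interval, and chart source are placeholders, not our internal setup.

```yaml
# Generic FluxCD illustration only - not the internal doc or script above.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mimir            # placeholder
  namespace: mimir       # placeholder
spec:
  interval: 10m
  chart:
    spec:
      chart: mimir-distributed
      sourceRef:
        kind: HelmRepository
        name: grafana    # placeholder; assumes the Grafana Helm repo is defined elsewhere
  values:
    distributor:
      kedaAutoscaling:
        enabled: true            # assumed field name, see note above
        preserveReplicas: true   # intermediate step; remove once the HPA manages replicas
```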