
helm: promote autoscaling to stable #7368

Open
3 of 9 tasks
dimitarvdimitrov opened this issue Feb 12, 2024 · 12 comments

Labels: helm, help wanted (Extra attention is needed)

@dimitarvdimitrov (Contributor) commented Feb 12, 2024

#7282 added autoscaling to Helm as an experimental feature. This issue tracks the remaining work needed to promote autoscaling to stable.

Bugs

Docs/Migration procedure

Remote URL UX

  • Support a remote different from the metamonitoring setup (feat: Adding global kedaAutoscaling section #7392)
  • Take the additional headers and auth from the metamonitoring setup. Currently, basic auth and extra headers are ignored.
  • Default to X-Scope-OrgID: metamonitoring if the config is already sending metrics to the same Mimir installation (the same way that metamonitoring computes it); see the sketch after this list.
  • Add validation that if the same Mimir cluster is used, then metamonitoring is also enabled.
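
A minimal sketch of how these defaults could look in values.yaml, reusing the prometheusAddress/customHeaders keys from the global kedaAutoscaling section proposed further down in this thread. The address is a placeholder and the note about inheriting auth describes desired rather than existing chart behavior:

kedaAutoscaling:
  # Placeholder query endpoint; not an existing default.
  prometheusAddress: http://mimir-nginx.mimir.svc/prometheus
  customHeaders:
    # Default tenant header when metrics go to the same Mimir installation.
    X-Scope-OrgID: metamonitoring
  # Basic auth and extra headers would ideally be inherited from the metamonitoring
  # setup rather than configured separately (assumption, not current behavior).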

Helm-jsonnet diffing

  • Add helm-jsonnet diffing so that the autoscaling configs don't get out of sync when we change one and forget to change the other. This is a matter of enabling autoscaling on select components in these two files and then making sure there are no differences between the rendered manifests. Minor differences can still be ignored via kustomizations like this one (see the sketch below).
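
To illustrate the "ignore minor differences" part, a kustomization along these lines could strip a known, acceptable difference from one rendered output before diffing; the resource name and patched label are made up for the example:

# kustomization.yaml (illustrative only; resource name and label path are placeholders)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - rendered-manifests.yaml
patches:
  - target:
      kind: ScaledObject
      name: mimir-distributor   # hypothetical KEDA object name
    patch: |-
      - op: remove
        path: /metadata/labels/app.kubernetes.io~1managed-by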

Make dashboards compatible with Helm-deployed KEDA objects

@beatkind (Contributor)

@dimitarvdimitrov if you want, you can assign me to both of these issues: #7368 & #7367

@dimitarvdimitrov (Contributor, Author)

thanks @beatkind 🙏 I'm not sure of the level of detail in these issues, so ask away if anything isn't 100% clear

@beatkind (Contributor) commented Feb 13, 2024

Some thoughts on "Support a remote different from the metamonitoring setup" (#7282 (comment)):

Basically it boils down to having a global kedaAutoscaling section:

kedaAutoscaling:
  prometheusAddress: http://...
  customHeaders: {}
  pollingInterval: 10

@QuentinBisson (Contributor)

@dimitarvdimitrov is there an issue for adding an HPA to the components that don't have it yet, like the ingester?

@dimitarvdimitrov (Contributor, Author) commented Apr 8, 2024

For ingesters I could only find an internal one, unfortunately: grafana/mimir-squad#1410. I see that @jhalterman was last working on that. Jonathan, is there a public issue for this work?

@jhalterman (Member)

The issue you cited is the only one. There's nothing public yet.

@ankense-cariad

@dimitarvdimitrov @jhalterman are there plans to add support for ingesters, or will that remain out of scope for KEDA autoscaling in the Helm chart? The last comment in April indicated that there might be internal information on how to configure an HPA for ingesters, but nothing has been made public. Can a public example be published?

@dimitarvdimitrov (Contributor, Author)

I think @pr00se has been working on ingester autoscaling. Patryk, do you have any plans for bringing this to the upstream jsonnet and Helm chart?

@ankense-cariad commented Jan 14, 2025

@dimitarvdimitrov @pr00se The ideal scenario would be for the Helm chart to support container lifecycle hooks, so that ingester pods terminated during scale events can leave the ring properly instead of getting stuck in an UNHEALTHY state. Something like:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "curl -X POST http://localhost:8080/shutdown"]

@dimitarvdimitrov (Contributor, Author)

The shutdown endpoint is not meant to be called on every pod lifecycle stop; it's meant to be called before the ingester is shut down for the last time. After calling POST /shutdown, the ingester isn't expected to come back up. That's the reason the rollout-operator was created in the first place.

@seanankenbruck commented Jan 24, 2025

Thanks @dimitarvdimitrov. We had actually implemented an HPA and modified the ingester StatefulSet with a custom script that calls the endpoints in the sequence described in the "Scaling down ingesters" section of the documentation. However, the lifecycle stop commands don't get executed and our ingesters end up in an unhealthy state.

I've read the rollout-operator documentation, and the section titled "Scaling based on reference resource" suggests that autoscaling can be achieved using a combination of an HPA and the rollout-operator. However, I can't find documentation or examples in either the mimir or rollout-operator repos that describe how to use the two components in tandem to achieve the desired behavior.

Is it possible (and recommended by the community) to manage ingester scaling using a combination of an HPA and the rollout-operator?
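
For orientation, a bare autoscaling/v2 HorizontalPodAutoscaler pointed at an ingester StatefulSet looks like the sketch below. This is only the plain-Kubernetes half of the picture; it omits the rollout-operator reference-resource wiring discussed above, and all names, replica counts, and thresholds are illustrative.

# Illustrative only: a plain HPA on the ingester StatefulSet, with no rollout-operator
# integration; the target name, replica bounds, and metric are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mimir-ingester
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: mimir-ingester
  minReplicas: 3
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70

The jsonnet linked in the following comment shows the actual upstream setup, which differs in detail from this plain HPA.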

@dimitarvdimitrov (Contributor, Author)

@seanankenbruck I can point you to the jsonnet that we use to set up the rollout-operator and HPA. This is the setup necessary for the rollout-operator (most of it should be present in the rollout-operator Helm chart). And this is the HPA autoscaling setup for the new Kafka-based ingest storage (where ingesters are deployed slightly differently than before).
