This repository has been archived by the owner on Nov 21, 2023. It is now read-only.

Fix/autoscaling #37

Open · sharkinsspatial wants to merge 2 commits into main

Conversation

sharkinsspatial
Contributor

What I am changing

Removed the cluster downscaling configuration, which was preventing AKS autoscaling. Downscaling should be managed directly via the dask-kubernetes adaptive implementation.
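
For reference, a minimal sketch of what managing downscaling through dask-kubernetes's adaptive mode looks like; the image, resource requests, and worker bounds below are illustrative placeholders, not this project's actual configuration:

```python
# Minimal sketch (illustrative values): let dask-kubernetes manage worker
# scaling adaptively instead of relying on cluster-level downscaling.
from dask_kubernetes import KubeCluster, make_pod_spec

# Placeholder worker pod template; image and resources are not this repo's
# actual settings.
pod_spec = make_pod_spec(
    image="daskdev/dask:latest",
    memory_limit="4G",
    memory_request="4G",
    cpu_limit=1,
    cpu_request=1,
)

cluster = KubeCluster(pod_spec)

# distributed's adaptive heuristics add and remove worker pods; the AKS node
# autoscaler then follows the pending/idle pods up and down.
cluster.adapt(minimum=0, maximum=20)
```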

How I did it

Removed the downscaling settings from the cluster's Terraform configuration.

How you can test it

@tracetechnical
Contributor

I don't believe that the dask-kubernetes and AKS autoscalers are linked. This configuration specifically scales down when there are no workloads and is unrelated to Dask.

@tracetechnical
Contributor

We should eventually still have this functionality at the K8s level to save costs; otherwise the cluster will keep nodes spun up until the default low-workload threshold time is hit.

@sharkinsspatial
Contributor Author

@tracetechnical There are several points to unpack here. The first is that configuring downscaling at the cluster level appears to cause a race/contention issue with worker pod creation initiated by https://github.com/dask/distributed/blob/main/distributed/deploy/adaptive_core.py. The KubeCluster scales worker pods using distributed's adaptive heuristics, but downscaling at the cluster level can result in new nodes being removed before worker pods are placed on them. Theoretically we could increase the downscaling time range, but given the variability in AKS's autoscaling node launch times that might still be problematic.

The non-Dask workloads running in our cluster (prefect-agent, flow-runner, loki, grafana) are less likely to require resource scaling. Since we are running multiple workloads in this cluster, does it make sense for us to use an annotation to prevent Dask resources from being autoscaled down? One issue with this may be the case where the scheduler pod is killed prematurely, leaving orphaned worker pods that are no longer subject to being autoscaled down. Maybe the best approach is to keep scale_down_unneeded with a very large interval?
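
For concreteness, the annotation in question would presumably be the cluster autoscaler's safe-to-evict setting on the Dask pods; a hedged sketch of attaching it to a worker pod template (the annotation choice, image, and bounds are assumptions, not this repo's configuration):

```python
# Sketch (assumed approach): mark Dask worker pods "safe-to-evict: false" so
# the Kubernetes cluster autoscaler will not drain their nodes out from under
# them. Image and worker bounds are illustrative.
from kubernetes import client
from dask_kubernetes import KubeCluster, make_pod_spec

pod_spec = make_pod_spec(image="daskdev/dask:latest")
pod_spec.metadata = pod_spec.metadata or client.V1ObjectMeta()
pod_spec.metadata.annotations = {
    "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
}

cluster = KubeCluster(pod_spec)
cluster.adapt(minimum=0, maximum=20)
```

The trade-off is the one noted above: if the scheduler pod dies prematurely, the orphaned workers carry the same annotation and the autoscaler will never reclaim their nodes.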

@tracetechnical
Contributor

tracetechnical commented Dec 9, 2021

@sharkinsspatial Scrub my previous comment re: the workings of the autoscaler (now deleted). I think your idea re: scale_down_unneeded may be covered by the default times in the autoscaler, but we would need to verify this against the standard autoscaler profile. And I imagine that the cost savings delivered by non-standard autoscaler profiles would be far less than the annoyance factor of mystery disappearances based on your examples above.

The above, coupled with the fact that these params seem to break the autoscaler, points toward the customisation removed in this PR being worthy of removal.

@rabernat

rabernat commented Dec 9, 2021

I thought I would chime in on this based on our experience running Dask clusters in Pangeo.

In the early days, we ran Dask Kubernetes on our Pangeo Cloud GKE cluster with autoscaling node pools. Dask's autoscaling requests for more pods triggered GKE to scale up and down accordingly. It seemed to work well. GKE's timescale was a lot slower than Dask's; if the timescales were comparable, I imagine you could get weird behavior (oscillations, for example).
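
If the two timescales ever did get too close, the Dask side can be slowed down explicitly; a sketch of the relevant knobs on distributed's adaptive loop (image and values are illustrative):

```python
# Sketch (illustrative values): keep Dask's adaptive loop slower and steadier
# than the node autoscaler so the two controllers don't oscillate.
from dask_kubernetes import KubeCluster, make_pod_spec

cluster = KubeCluster(make_pod_spec(image="daskdev/dask:latest"))
cluster.adapt(
    minimum=0,
    maximum=20,
    interval="10s",  # how often the adaptive heuristic re-evaluates
    wait_count=6,    # consecutive scale-down recommendations before acting
)
```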
