
Autoscaling controller should not calculate desired replicas including out-of-scaleTargetRef RunnerDeployment #330

Closed
tapih opened this issue Feb 18, 2021 · 2 comments · Fixed by #355

Comments

tapih (Contributor) commented Feb 18, 2021

What happened:

I tried using HorizontalRunnerAutoscaler with multiple RunnerDeployments existing in the same namespace,
and found that the autoscaling controller does not work properly.

  1. Install actions-runner-controller following the getting started guide
  2. Create two RunnerDeployments, named example-runner-deployment and example-runner-deployment-2, with 2 replicas each
  3. Create a HorizontalRunnerAutoscaler for example-runner-deployment with maxReplicas: 4, scaleUpThreshold: '0.6', and scaleDownFactor: '1'

Then, a reconciliation was triggered right after HorizontalRunnerAutoscaler was created and the autoscaling controller created 2 new Runners under example-runner-deployment, even though there was no Runner with the Busy status.

So, there were 4 Runners for example-runner-deployment and 2 Runners for example-runner-deployment-2 in the end.

What I expected to happen:

No Runners are created for example-runner-deployment.

I read this part of the code, and my guess is that
the autoscaling controller calculates the desired replicas for the scaleTargetRef from all of the Runners existing in the same namespace.
In my case, the desired replica count appears to be computed across both deployments, i.e. example-runner-deployment + example-runner-deployment-2 = 4,
and the controller log indeed reports computed_replicas_desired as 4.

2021-02-18T05:50:25.010Z        DEBUG   controllers.HorizontalRunnerAutoscaler  Calculated desired replicas     {"computed_replicas_desired": 4, "spec_replicas_min": 1, "spec_replicas_max": 4, "current_replicas": 4, "num_runners": 6, "num_runners_busy": 0}
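To make the suspicion concrete, here is a minimal, self-contained sketch (not the controller's actual code) of the difference between counting every Runner in the namespace and counting only the Runners owned by the scaleTargetRef. The runner struct, its fields, and the countRunners helper are purely illustrative:

package main

import "fmt"

type runner struct {
	ownerDeployment string // hypothetical: which RunnerDeployment owns this Runner
	busy            bool
}

// countRunners tallies total and busy Runners, optionally restricted to one
// RunnerDeployment. An empty owner means "count everything in the namespace",
// which is the behaviour this issue suspects the controller has today.
func countRunners(runners []runner, owner string) (total, busy int) {
	for _, r := range runners {
		if owner != "" && r.ownerDeployment != owner {
			continue
		}
		total++
		if r.busy {
			busy++
		}
	}
	return total, busy
}

func main() {
	// Both RunnerDeployments from this report, two idle Runners each.
	inNamespace := []runner{
		{"example-runner-deployment", false}, {"example-runner-deployment", false},
		{"example-runner-deployment-2", false}, {"example-runner-deployment-2", false},
	}

	// Unfiltered: 4 Runners counted even though the target only owns 2.
	t, b := countRunners(inNamespace, "")
	fmt.Printf("namespace-wide: total=%d busy=%d\n", t, b)

	// Filtered by the scaleTargetRef: only the target's own Runners.
	t, b = countRunners(inNamespace, "example-runner-deployment")
	fmt.Printf("target only:    total=%d busy=%d\n", t, b)
}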

The controller should scale out Runners by filtering the targets using labels, but currently it does not seem to manage Runners with label selectors the way a Kubernetes ReplicaSet does.
IMHO, we should first fix the runner replicaset controller to manage Runners via label selectors, and then fix the autoscaling controller to filter Runners by those labels, roughly as sketched below.
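A rough sketch of the filtering side of that proposal, assuming a controller-runtime client and the project's v1alpha1 RunnerList type; the label key "runner-deployment-name" is hypothetical and the project would need to define and stamp such a label on the Runners it creates:

package controllers

import (
	"context"

	"sigs.k8s.io/controller-runtime/pkg/client"

	v1alpha1 "github.com/summerwind/actions-runner-controller/api/v1alpha1"
)

// listTargetRunners lists only the Runners belonging to the scaleTargetRef
// RunnerDeployment, similar to how a ReplicaSet selects its Pods.
func listTargetRunners(ctx context.Context, c client.Client, namespace, rdName string) (*v1alpha1.RunnerList, error) {
	var runners v1alpha1.RunnerList
	err := c.List(ctx, &runners,
		client.InNamespace(namespace),
		// Hypothetical label stamped on every Runner owned by the RunnerDeployment.
		client.MatchingLabels{"runner-deployment-name": rdName},
	)
	return &runners, err
}

The autoscaler would then compute the busy percentage from this filtered list instead of from every Runner in the namespace.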

What do you think about it? If you think it's reasonable, I'm willing to fix it as I wrote.

Thank you!

Environment:

OS: ubuntu 20.04
k8s cluster: v1.19.7
k8s bootstrap: kind v0.10.0
actions-runner-controller: v0.16.1
actions-runner: v2.276.1

Manifests:

apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment
  namespace: actions-runner-system
spec:
  replicas: 2
  template:
    spec:
      organization: <ORGNAME>
      labels:
        - custom-runner1
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: example-runner-deployment-2
  namespace: actions-runner-system
spec:
  replicas: 2
  template:
    spec:
      organization: <ORGNAME>
      labels:
        - custom-runner2
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: example-runner-deployment-autoscaler
  namespace: actions-runner-system
spec:
  scaleTargetRef:
    name: example-runner-deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: PercentageRunnersBusy
    scaleUpThreshold: '0.6'
    scaleDownThreshold: '0.1'
    scaleUpFactor: '1.5'
    scaleDownFactor: '1'
    repositoryNames:
    - <ORGNAME>/<REPONAME>
mumoshu (Collaborator) commented Feb 18, 2021

@tapih Hey! Thanks for the report and the proposal. Yeah, that does make sense. Recently I had a chance to read https://github.com/summerwind/actions-runner-controller/blob/be133228166274c1b03716e511c1051199c760a8/controllers/autoscaling.go#L260-L262 as well and thought it did look problematic, but I had no confidence since no one had mentioned this issue so far!

If you think it's reasonable, I'm willing to fix it as I wrote.

That would be very much appreciated ☺️

tapih (Contributor, Author) commented Feb 18, 2021

@mumoshu
Sure. I'm gonna start working on it soon. Thank you.
