docs: explain the differences between activation and scaling values #780

Merged · 13 commits · Jun 27, 2022
31 changes: 31 additions & 0 deletions content/docs/2.8/concepts/scaling-deployments.md
@@ -226,6 +226,37 @@ metadata:

The presence of this annotation will pause autoscaling no matter what number of replicas is provided. The above annotation will scale your current workload to 0 replicas and pause autoscaling. You can set the replica count at which an object is paused to any arbitrary number. To enable autoscaling again, simply remove the annotation from the `ScaledObject` definition.
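
Restated for reference, a minimal sketch of how this might look on a `ScaledObject` (the names `my-scaled-object` and `my-deployment` are placeholders):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-scaled-object                      # placeholder name
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"  # pause the workload at 0 replicas
spec:
  scaleTargetRef:
    name: my-deployment                       # placeholder target workload
```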

### Activating and Scaling

There are use cases where the activation value (the value that governs scaling from 0 to 1 and from 1 to 0) is totally different from 0, such as workloads scaled with the Prometheus scaler where the metric values can go from -X to X. To give a consistent solution to this problem, KEDA has 2 different phases during the autoscaling process.

- **Activation phase**
- **Scaling phase**

#### **Activation phase**

The activation (or deactivation) phase is the moment when KEDA (the operator) decides whether the workload should be scaled from/to zero. KEDA is responsible for this action, based on the result of the scaler's `IsActive` function, and it only applies to scaling from/to 0.

#### **Scaling phase**

The scaling phase begins once KEDA has decided to scale out to 1 instance; from then on, it is the HPA controller that takes the scaling decisions, based on the configuration defined in the generated HPA (built from the `ScaledObject` data) and the metrics exposed by KEDA (the metrics server). This phase applies to scaling from 1 to N and from N to 1.
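
For context, during this phase the HPA controller applies the standard Kubernetes autoscaling algorithm to the metrics KEDA exposes, using the target value derived from the trigger configuration:

```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
```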

#### **Thresholds**

KEDA allows you to specify different values for each scenario:

- **Activation:** Defines whether the scaler is active or inactive; the workload is scaled from/to 0 based on it.
- **Scaling:** Defines the target value used to scale the workload from 1 to _n_ instances and vice versa. To achieve this, KEDA passes the target value to the Horizontal Pod Autoscaler (HPA), and the built-in HPA controller handles all the autoscaling.

> ⚠️ **NOTE:** If the minimum replica count is >= 1, the scaler is always active and the activation value will be ignored.

Each scaler defines the parameters for its use cases, but the activation parameter always has the same name as the scaling parameter, prefixed with `activation` (e.g. `threshold` for scaling and `activationThreshold` for activation).
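
As an illustration, a minimal sketch of a Prometheus trigger using both parameters (the server address, query, and values are placeholders):

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.svc.cluster.local:9090  # placeholder address
      query: sum(rate(http_requests_total[1m]))                # placeholder query
      threshold: "100"           # scaling value used by the HPA (1 <-> N)
      activationThreshold: "10"  # activation value checked by KEDA (0 <-> 1)
```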

There are some important points to take into account:

- Unlike the scaling value, the activation value is always optional, and its default value is 0.
- The activation value takes priority over the scaling value when the two would lead to different decisions. For example, with `threshold: 10` and `activationThreshold: 50`, if there are 40 messages the scaler is not active and the workload will be scaled to zero, even though the HPA would require 4 instances.

## Long-running executions

One important consideration is how this pattern works with long-running executions. Imagine a deployment that triggers on a RabbitMQ queue message, where each message takes 3 hours to process. If many queue messages arrive, KEDA will help drive scaling out to many replicas - let's say 4. Now the HPA makes a decision to scale down from 4 replicas to 2. There is no way to control which of the 2 replicas get terminated to scale down. That means the HPA may attempt to terminate a replica that is 2.9 hours into processing a 3-hour queue message.