
Apache Kafka Scaler: Implementation for Excluding Persistent Lag #3905

Closed
wants to merge 35 commits

Conversation

josephangbc
Contributor

@josephangbc commented Nov 23, 2022

Summary

Add implementation for excluding consumer lag from partitions with persistent lag.

Use Case

In situations where a consumer is unable to process or consume from a partition (due to errors, etc.), the committed offset will not change, and the consumer lag on that partition will keep increasing and never decrease. KEDA then triggers scaling towards maxReplicaCount.

If a partition's lag is deemed persistent, excluding its consumer lag allows KEDA to trigger scaling appropriately based on the consumer lag observed on the other topics and partitions, unaffected by lag that scaling cannot resolve.

Logic

Upon each polling cycle, check whether the current consumer offset is the same as the previous consumer offset (see the sketch after this list).

Different: return endOffset - consumerOffset (no different from the current implementation)
Same: return 0 (to exclude this partition's consumer lag from the total lag)
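
A minimal Go sketch of this check, assuming the scaler keeps a previousOffsets map keyed by topic and partition (all names below are illustrative, not necessarily the PR's exact identifiers):

```go
package main

// Sketch only: field and method names are assumptions, not the PR's code.
type kafkaScaler struct {
	excludePersistentLag bool
	// previousOffsets holds the committed offset observed for each
	// topic/partition during the previous polling cycle. It must be
	// initialized (e.g. to an empty map) when the scaler is created.
	previousOffsets map[string]map[int32]int64
}

// getLagForPartition returns the lag that should count towards scaling
// for a single partition.
func (s *kafkaScaler) getLagForPartition(topic string, partition int32, consumerOffset, endOffset int64) int64 {
	if !s.excludePersistentLag {
		return endOffset - consumerOffset
	}
	if s.previousOffsets[topic] == nil {
		s.previousOffsets[topic] = map[int32]int64{}
	}
	prev, seen := s.previousOffsets[topic][partition]
	s.previousOffsets[topic][partition] = consumerOffset
	if seen && prev == consumerOffset {
		// Offset has not moved since the last cycle: treat the lag as
		// persistent and exclude it from the total.
		return 0
	}
	return endOffset - consumerOffset
}
```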

Checklist

  • Commits are signed with Developer Certificate of Origin (DCO - learn more)
  • Tests have been added
  • A PR is opened to update the documentation on (repo) (if applicable)
  • Changelog has been updated and is aligned with our changelog requirements

Relates to #3904
Relates to kedacore/keda-docs#984

Member

@JorTurFer left a comment


I'm not really sure about this feature; in general we try to avoid maintaining information about the execution inside the scaler (previousOffsets). This is because the scaler can be recreated whenever needed, and behaviour based on information from previous cycles could behave unpredictably.
WDYT @zroubalik ?

CHANGELOG.md: review comment (outdated, resolved)
@zroubalik
Member

zroubalik commented Nov 28, 2022

/run-e2e kafka*
Update: You can check the progress here

@zroubalik
Member

This test is failing:

=== RUN   TestGetBrokers
    kafka_scaler_test.go:168: Expected success but got error error parsing excludePersistentLag: strconv.ParseBool: parsing "notvalid": invalid syntax
    kafka_scaler_test.go:192: Expected success but got error error parsing excludePersistentLag: strconv.ParseBool: parsing "notvalid": invalid syntax
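
The failure comes from strconv.ParseBool rejecting the "notvalid" value. A self-contained sketch of the kind of parsing involved (the metadata key and error text match the log above; the helper itself is illustrative, not the PR's exact code):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseExcludePersistentLag mirrors the parsing that produces the error
// in the test output; this helper is an assumption, not the PR's code.
func parseExcludePersistentLag(metadata map[string]string) (bool, error) {
	val, ok := metadata["excludePersistentLag"]
	if !ok {
		return false, nil // optional flag; defaults to off
	}
	b, err := strconv.ParseBool(val)
	if err != nil {
		return false, fmt.Errorf("error parsing excludePersistentLag: %w", err)
	}
	return b, nil
}

func main() {
	_, err := parseExcludePersistentLag(map[string]string{"excludePersistentLag": "notvalid"})
	fmt.Println(err)
	// error parsing excludePersistentLag: strconv.ParseBool: parsing "notvalid": invalid syntax
}
```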

@zroubalik
Member

I'm not really sure about this feature; in general we try to avoid maintaining information about the execution inside the scaler (previousOffsets). This is because the scaler can be recreated whenever needed, and behaviour based on information from previous cycles could behave unpredictably. WDYT @zroubalik ?

That's a very good point @JorTurFer, even though I think that it should be okay for this specific feature. @JosephABC as @JorTurFer mentioned, the scaler could be recreated during its lifetime; would it cause any problems? I think that it will reconcile.

@zroubalik
Member

I'd love to see an e2e test for this.

@tomkerkhove
Member

We are going to release KEDA v2.9 on Thursday. Do you think you can complete the open work by Wednesday @JosephABC?

@josephangbc
Contributor Author

@zroubalik If the scaler is recreated, I would expect the previousOffsets map to be recreated anew, so the recording of previous offsets starts from a blank state. The downside is that the scaler will not be able to identify the partitions with persistent lag in the next reconciliation cycle and may trigger a scale out of the scaling target. In the following cycles, the scaler will consult the previousOffsets map and scale accordingly (excluding persistent lag where deemed necessary), as illustrated below.

@tomkerkhove I will certainly try to complete by Wednesday. If not, I will target to complete before the next KEDA release.
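
To make that behaviour concrete, here is a toy main for the sketch in the Logic section above (same assumed names and an illustrative "orders" topic; the two snippets compile together as one package). The first cycle after a recreation still counts the stalled partition, and later cycles exclude it:

```go
package main

import "fmt"

// Assumes the kafkaScaler type and getLagForPartition method from the
// sketch in the Logic section; together the two snippets form one program.
func main() {
	s := &kafkaScaler{
		excludePersistentLag: true,
		// Fresh, empty map: the state right after the scaler is (re)created.
		previousOffsets: map[string]map[int32]int64{},
	}

	// Cycle 1: no history yet, so the stalled partition's lag still counts
	// and may briefly trigger a scale out.
	fmt.Println(s.getLagForPartition("orders", 0, 100, 500)) // 400

	// Cycle 2: committed offset unchanged since the last cycle, so the lag
	// is treated as persistent and excluded.
	fmt.Println(s.getLagForPartition("orders", 0, 100, 650)) // 0

	// A healthy partition whose offset advances keeps reporting real lag.
	fmt.Println(s.getLagForPartition("orders", 1, 200, 260)) // 60
}
```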

@josephangbc
Contributor Author

@zroubalik The e2e tests seem to have been queued for quite a long time already. Is this normal?

@zroubalik
Member

@zroubalik The e2e tests seem to have been queued for quite a long time already. Is this normal?

What do you mean? Only maintainers can trigger them. Do you mean this comment #3905 (comment)?

The e2e tests passed on this one (it is marked with emojis)

@JorTurFer
Member

The e2e tests passed on this one (it is marked with emojis)

And you can also see it in the latest commit checks when the execution was triggered

@josephangbc
Contributor Author

The checks for the latest commit all passed, except for the e2e test which shows as "queued". Does that need to be run and completed as well?

@JorTurFer
Member

The checks for the latest commit all passed, except for the e2e test which shows as "queued". Does that need to be run and completed as well?

We need to trigger them manually, like here:
[screenshot of the manual e2e trigger]

Could you add an e2e test to cover this new feature? We try to cover all the scalers with e2e tests; in this case, you could just add your feature as a test case here.

@JorTurFer
Member

JorTurFer commented Dec 4, 2022

/run-e2e kafka*
Update: You can check the progress here

@zroubalik
Member

@tobiaskrause would you mind looking at this as well, since you are doing a Kafka PR in parallel?

@zroubalik
Member

@JosephABC rebase please

@tobiaskrause
Contributor

@tobiaskrause would you mind looking at this as well, since you are doing a Kafka PR in parallel?

@zroubalik, I don't see any interference with my PR.

penghuazhou and others added 25 commits December 7, 2022 00:43
[commit list omitted: unrelated upstream commits pulled into the branch]
@josephangbc reopened this Dec 6, 2022
@zroubalik
Member

@JosephABC could you please rebase your PR to contain only relevant commits? Thanks!

@josephangbc closed this Dec 6, 2022