Apache Kafka Scaler: Implementation for Excluding Persistent Lag #3905
Conversation
Signed-off-by: Tobias Krause <[email protected]>
Signed-off-by: Tobias Krause <[email protected]>
Signed-off-by: Tobias Krause <[email protected]>
I'm not really sure about this feature. In general we try to avoid maintaining information about the execution inside the scaler (previousOffsets). This is because the scaler can be recreated whenever needed, and behaviours based on information from previous cycles could work randomly.
WDYT @zroubalik ?
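For context, the state in question is a per-scaler map of the consumer offsets observed in the previous polling cycle, roughly like the following minimal sketch in Go (field names and types are illustrative and may differ from the actual PR code):

type kafkaScaler struct {
	// ... existing fields such as the Kafka client and scaler metadata ...

	// previousOffsets stores the consumer offset seen for each topic and
	// partition during the previous polling cycle. Because it lives on the
	// scaler instance, this state is lost whenever the scaler is recreated.
	previousOffsets map[string]map[int32]int64
}

Losing this map on recreation is the crux of the concern: after a restart, the next cycle starts from a blank state and cannot yet tell persistent lag apart from normal lag.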
/run-e2e kafka*
This test is failing:
That's a very good point @JorTurFer, even though I think it should be okay for this specific feature. @JosephABC, as @JorTurFer mentioned, the scaler could be recreated during its lifetime; would that cause any problems? I think that it will reconcile.
I'd love to see an e2e test for this.
We are going to release KEDA v2.9 on Thursday. Do you think you can complete the open work by Wednesday @JosephABC?
@zroubalik If the scaler is recreated, I would expect the previousOffsets map to be recreated anew, so the recording of the previous offsets will start from a blank state. The downside is that the scaler will not be able to identify the partitions with persistent lag in the next reconciliation cycle and may trigger a scale-out of the scaling target. In the following cycles, the scaler will observe the previousOffsets map and scale accordingly (excluding persistent lag if deemed necessary).
@tomkerkhove I will certainly try to complete it by Wednesday. If not, I will target completing it before the next KEDA release.
@zroubalik The e2e tests seem to have been queued for quite a long time already. Is this normal?
What do you mean? Only maintainers can trigger them. Do you mean this comment #3905 (comment)? The e2e tests passed on that one (it is marked with emojis).
And you can also see it in the latest commit checks when the execution was triggered |
The checks for the latest commit all passed, except for the e2e test, which shows as "queued". Does that need to be run and completed as well?
We need to trigger them manually, like here. Could you add an e2e test to cover this new feature? We try to cover all the scalers with e2e tests; in this case, you could just add your feature as a test case here.
/run-e2e kafka*
@tobiaskrause would you mind looking at this as well, since you are doing a Kafka PR in parallel?
@JosephABC rebase please
@zroubalik, I don't see any interference with my PR
) * Disable response compression for k8s restAPI in client-go Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> * Updating metrics server with the same parameters Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> * Adding the change to changelog Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> * Set default value to true for disable compression Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> * Changing default value to true in adapter Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> Signed-off-by: Chaitanya Kuduvalli Ramachandra <[email protected]> Co-authored-by: Chaitanya Kuduvalli Ramachandra <[email protected]>
Signed-off-by: Zbynek Roubalik <[email protected]> Signed-off-by: Zbynek Roubalik <[email protected]>
* Metrics Server: use vendored OpenAPI definitions custom-metrics-apiserver serves OpenAPI spec by default since version [v1.25.0] (cf [PR 110]). [v1.25.0]: https://github.com/kubernetes-sigs/custom-metrics-apiserver/releases/tag/v1.25.0 [PR 110]: kubernetes-sigs/custom-metrics-apiserver#110 In Keda Metrics Server, remove generation of `adapter/generated/openapi/zz_generated.openapi.go` and use OpenAPI definitions from custom-metrics-apiserver instead. Signed-off-by: Olivier Lemasle <[email protected]> * Update CHANGELOG.md Co-authored-by: Zbynek Roubalik <[email protected]> Signed-off-by: Olivier Lemasle <[email protected]> Signed-off-by: Olivier Lemasle <[email protected]> Signed-off-by: Olivier Lemasle <[email protected]> Co-authored-by: Zbynek Roubalik <[email protected]>
…d of the Event Hub itself. (kedacore#3924) Signed-off-by: Vighnesh Shenoy <[email protected]>
* chore: add `stale-bot-cant-touch-this` to stale bot ignores Signed-off-by: Jorge Turrado Ferrero <[email protected]> * update label Signed-off-by: Jorge Turrado Ferrero <[email protected]> Signed-off-by: Jorge Turrado Ferrero <[email protected]>
Signed-off-by: Zbynek Roubalik <[email protected]> Signed-off-by: Zbynek Roubalik <[email protected]>
Signed-off-by: Jorge Turrado <[email protected]>
Signed-off-by: Laszlo Kishalmi <[email protected]> Signed-off-by: Laszlo Kishalmi <[email protected]>
Signed-off-by: Zbynek Roubalik <[email protected]>
Signed-off-by: dkv <[email protected]>
…when clustered (kedacore#3564) Signed-off-by: Ray <[email protected]>
as the scenario covered in this line is for the case in which the partition reached maxInt and started from zero. the calc is done wrongly. Signed-off-by: Yoav Dobrin <[email protected]> Signed-off-by: Yoav Dobrin <[email protected]>
…re#3788) * Update stackdriver client to handle metrics of value type double Signed-off-by: Eric Takemoto <[email protected]> * move change log note to below general Signed-off-by: Eric Takemoto <[email protected]> * parse activation value as float64 Signed-off-by: Eric Takemoto <[email protected]> * change target value to float64 for GCP pub/sub and stackdriver Signed-off-by: Eric Takemoto <[email protected]> Signed-off-by: Eric Takemoto <[email protected]>
* Split CodeQL into a specific static analysers workflow Signed-off-by: Jorge Turrado <[email protected]> * update workflow Signed-off-by: Jorge Turrado <[email protected]> * remove schedule trigger Signed-off-by: Jorge Turrado <[email protected]> * fix typo Signed-off-by: Jorge Turrado <[email protected]> Signed-off-by: Jorge Turrado <[email protected]>
* feat: add semgrep Signed-off-by: Jorge Turrado <[email protected]> * change trigger-type Signed-off-by: Jorge Turrado <[email protected]> * change trigger-type Signed-off-by: Jorge Turrado <[email protected]> * add new line Signed-off-by: Jorge Turrado <[email protected]> Signed-off-by: Jorge Turrado <[email protected]>
…e#3959) * chore: remove vendor folder from semgrep scan Signed-off-by: Jorge Turrado <[email protected]> * fix the checkout Signed-off-by: Jorge Turrado <[email protected]> * add ne wline Signed-off-by: Jorge Turrado <[email protected]> Signed-off-by: Jorge Turrado <[email protected]>
…ore#3960) * fix: add missing env var for gcp e2e Signed-off-by: Jorge Turrado <[email protected]> * add missing gh-cli for checking out Signed-off-by: Jorge Turrado <[email protected]> * update security page with semgrep Signed-off-by: Jorge Turrado <[email protected]> Signed-off-by: Jorge Turrado <[email protected]>
…edacore#3732) Co-authored-by: Tom Kerkhove <[email protected]> Co-authored-by: Zbynek Roubalik <[email protected]>
@JosephABC could you please rebase your PR to contain only relevant commits? Thanks!
Summary
Add an implementation for excluding the consumer lag of partitions with persistent lag from the total lag.
Use Case
In situations where a consumer is unable to process or consume from a partition due to errors etc., the committed offset will not change, and the consumer lag on that partition will keep increasing and never decrease. KEDA will then trigger scaling towards maxReplicaCount.
If a partition's lag is deemed persistent, excluding its consumer lag allows KEDA to trigger scaling appropriately based on the consumer lag observed on the other topics and partitions, instead of being driven by lag that cannot be resolved by scaling.
Logic
Upon each polling cycle, check whether the current consumer offset is the same as the previous consumer offset (see the sketch after this list):
Different: return endOffset - consumerOffset (no different from the current implementation)
Same: return 0 (to exclude this partition's consumer lag from the total lag)
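A minimal sketch of that per-partition decision, assuming the previousOffsets map sketched earlier and an excludePersistentLag option passed in as a plain parameter for simplicity (names are illustrative and may differ from the actual PR code):

// getLagForPartition returns the lag to report for a single partition.
// When excludePersistentLag is enabled, lag on a partition whose committed
// offset has not moved since the previous polling cycle is treated as
// persistent and reported as 0, so it cannot drive scaling on its own.
func (s *kafkaScaler) getLagForPartition(topic string, partition int32, consumerOffset, endOffset int64, excludePersistentLag bool) int64 {
	lag := endOffset - consumerOffset
	if !excludePersistentLag {
		return lag
	}

	previousPartitions, found := s.previousOffsets[topic]
	if !found {
		// First cycle for this topic: remember the offset and report normal lag.
		s.previousOffsets[topic] = map[int32]int64{partition: consumerOffset}
		return lag
	}

	previousOffset, found := previousPartitions[partition]
	if !found {
		// First cycle for this partition: remember the offset and report normal lag.
		previousPartitions[partition] = consumerOffset
		return lag
	}

	if previousOffset == consumerOffset {
		// The committed offset has not advanced since the previous cycle:
		// treat the lag as persistent and exclude it from the total.
		return 0
	}

	// The offset advanced: update the stored value and report the real lag.
	previousPartitions[partition] = consumerOffset
	return lag
}

Because previousOffsets starts empty when the scaler is (re)created, the first cycle after a restart reports normal lag for every partition; only from the second cycle onward can a stuck offset be excluded, which matches the behaviour described in the discussion above.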
Checklist
Relates to #3904
Relates to kedacore/keda-docs#984