Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TO-CC race condition issue happening during throttled rebalances #10241

Closed
wants to merge 2 commits into from

Conversation

fvaleri
Copy link
Contributor

@fvaleri fvaleri commented Jun 19, 2024

This change fixes the issue caused by the TO-CC race condition during throttled rebalances. The TO reverts dynamic throttled configs set by CC at the topic level because they don't match with the KT spec.

A private constant that contains the two throttle config properties is added to the BatchingTopiController, along with the following logic:

  • If Cruise Control is enabled WITHOUT .spec.config throttle props, we skip any throttle alter config operation without warnings
  • If Cruise Control is enabled WITH .spec.config throttle props, we do not skip the throttle alter config operations, but we raise a warning (can't be an error because there is no race when running rebalances without throttling, which is the default)
  • If Cruise Control is NOT enabled, we reconcile as usual without warnings

The interaction between ALTERABLE_TOPIC_CONFIG and THROTTLING_CONFIG is covered by the new unit tests.

Fixes #9972.

@fvaleri fvaleri requested a review from tombentley June 19, 2024 16:51
@fvaleri fvaleri added this to the 0.42.0 milestone Jun 19, 2024
@fvaleri fvaleri requested a review from ppatierno June 19, 2024 16:51
@scholzj
Copy link
Member

scholzj commented Jun 19, 2024

Isn't there another PR for this? #10182? Also, it would be great if you could bother and write some description about what the PR does and how!

@fvaleri
Copy link
Contributor Author

fvaleri commented Jun 19, 2024

Isn't there another PR for this?

The bug has been opened for 2 months, and it's 3 weeks I haven't heard from the contributor after multiple pings. Additionally, in the last community meeting we agreed to ping the contributor one more time and wait an additional week, which has passed.

it would be great if you could bother and write some description about what the PR does and how!

Description is here, but I can also add in the description if you prefer.

@scholzj
Copy link
Member

scholzj commented Jun 19, 2024

Isn't there another PR for this?

The bug has been opened for 2 months, and it's 3 weeks I haven't heard from the contributor after multiple pings. Additionally, in the last community meeting we agreed to ping the contributor one more time and wait an additinal week, which has passed.

Then maybe some maintainer should deal with the original PR first, before you open another one.

it would be great if you could bother and write some description about what the PR does and how!

Description is in the linked issue, but I can also add here if you prefer.

Yes, please update the description! Each PR should have a description explaining what it does and how. It should be something to do automatically, you have been asked again and again by different people to do it. And it would be great if you could start using the provided template. There is a reason why the template is there and it should be followed.

@ppatierno
Copy link
Member

Then maybe some maintainer should deal with the original PR first, before you open another one.

@scholzj that could have been my fault. As @fvaleri said we agreed to ping the contributor one more time during the last community call and it was something already done by Federico a couple of times. I just said him to open the PR but didn't think to ping (again!??) the contributor. I thought it was already enough because he didn't come back to us since 2-3 weeks.

This change fixes the issue caused by the TO-CC race condition happening during throttled rebalances. The TO reverts dynamic throttled configs set by CC at the topic level because they don't match with the KT spec.

A private constant that contains the two throttle config properties is added to the BatchingTopiController, along with the following logic:

- If Cruise Control is enabled WITHOUT .spec.config throttle props, we skip any throttle alter config operation without warnings
- If Cruise Control is enabled WITH .spec.config throttle props, we do not skip the throttle alter config operations, but we raise a warning (can't be an error because there is no race when running rebalances without throttling, which is the default)
- If Cruise Control is NOT enabled, we reconcile as usual without warnings

The interaction between ALTERABLE_TOPIC_CONFIG and THROTTLING_CONFIG is covered by the new unit tests.

Fixes strimzi#9972.

Signed-off-by: Federico Valeri <[email protected]>
@fvaleri
Copy link
Contributor Author

fvaleri commented Jun 25, 2024

@strimzi/maintainers can you start regression tests? Thanks

@scholzj
Copy link
Member

scholzj commented Jun 25, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@fvaleri
Copy link
Contributor Author

fvaleri commented Jun 26, 2024

@tombentley should be good now.

@scholzj scholzj modified the milestones: 0.42.0, 0.43.0 Jul 1, 2024
@fvaleri fvaleri marked this pull request as draft July 8, 2024 17:21
@fvaleri
Copy link
Contributor Author

fvaleri commented Jul 12, 2024

I will open a new PR based on some refactoring we are doing. The fix is currently hosted in #10248.

@fvaleri fvaleri closed this Jul 12, 2024
@fvaleri fvaleri deleted the to-cc-race-fix branch July 12, 2024 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: TopicOperator removes throttled.replicas topic configuration of throttled reassignment/rebalance
4 participants