Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FEATURE: add cli argument to modify controller workqueue ratelimiter #2186

Merged

Conversation

ImpSy
Copy link
Contributor

@ImpSy ImpSy commented Sep 23, 2024

Purpose of this PR

Add the capacity to customize the controller workqueue ratelimiter and remove a potential bug on high usage cluster

Proposed changes:

  • cli argument to modify the default controller-runtime workqueue ratelimiter
  • Keep the default value (from controller-runtime) in the code but set them in the chart

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

1- controler-runtime uses workqueue.DefaultControllerRateLimiter by default which has a rather low BucketRatelimiter

With only 10 QPS and 100 Bucket max size we easily reach those number, before migrating to controller-runtime the value was actually 50/500 (0dc641b#diff-915090b1cf1857dc448319dbbbc11699ba40e674744723a5772a4624a1c79282L59)

2- BucketRatelimiter does not have a max delay

We encounter a bug on a high usage cluster a few years ago where we discover that BucketRatelimiter rely on golang.org/x/time/rate Limiter.Reserve() function that is not upperBound:

  • If QPS > Bucket max size then the rateLimiter return rate.InfDuration

Being able to set a MaxDelay fix this

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

This changed has been implemented on our fork for almost 2 years -> spotinst#5

@ImpSy ImpSy force-pushed the customize-workqueue-ratelimiter branch from 08200aa to 251376f Compare September 23, 2024 14:48
@ImpSy ImpSy changed the title add cli argument to modify controller workqueue ratelimiter FEATURE: add cli argument to modify controller workqueue ratelimiter Sep 23, 2024
Copy link
Contributor

@vara-bonthu vara-bonthu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution! The PR looks good to me, except pending resolution of the comment.

We’re planning to run some load tests in v2 with this change to evaluate if it improves performance and whether we actually need a queue-per-app solution.

Thanks, @ImpSy!

@ImpSy ImpSy force-pushed the customize-workqueue-ratelimiter branch from 251376f to 39d4a03 Compare September 28, 2024 19:50
@ImpSy
Copy link
Contributor Author

ImpSy commented Sep 28, 2024

@vara-bonthu @ChenYi015 Added the documentation to the charts like you wanted
And thanks for the review 🙏

Copy link
Contributor

@vara-bonthu vara-bonthu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

wait for another approval

Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015, vara-bonthu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [ChenYi015,vara-bonthu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015
Copy link
Contributor

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Sep 29, 2024
@google-oss-prow google-oss-prow bot merged commit d37a0e9 into kubeflow:master Sep 29, 2024
7 checks passed
ChenYi015 pushed a commit to ChenYi015/spark-operator that referenced this pull request Oct 9, 2024
…ubeflow#2186)

* add cli argument to modify controller workqueue ratelimiter

Signed-off-by: ImpSy <[email protected]>

* add cli argument to modify controller workqueue ratelimiter support to helm chart

Signed-off-by: ImpSy <[email protected]>

---------

Signed-off-by: ImpSy <[email protected]>
@ChenYi015 ChenYi015 mentioned this pull request Oct 9, 2024
ChenYi015 pushed a commit to ChenYi015/spark-operator that referenced this pull request Oct 10, 2024
…ubeflow#2186)

* add cli argument to modify controller workqueue ratelimiter

Signed-off-by: ImpSy <[email protected]>

* add cli argument to modify controller workqueue ratelimiter support to helm chart

Signed-off-by: ImpSy <[email protected]>

---------

Signed-off-by: ImpSy <[email protected]>
(cherry picked from commit d37a0e9)
Signed-off-by: Yi Chen <[email protected]>
google-oss-prow bot pushed a commit that referenced this pull request Oct 10, 2024
* FEATURE: add cli argument to modify controller workqueue ratelimiter (#2186)

* add cli argument to modify controller workqueue ratelimiter

Signed-off-by: ImpSy <[email protected]>

* add cli argument to modify controller workqueue ratelimiter support to helm chart

Signed-off-by: ImpSy <[email protected]>

---------

Signed-off-by: ImpSy <[email protected]>
(cherry picked from commit d37a0e9)
Signed-off-by: Yi Chen <[email protected]>

* Fix ingress capability discovery (#2201)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit 56b4974)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/aws/aws-sdk-go-v2 from 1.30.5 to 1.31.0 (#2207)

Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) from 1.30.5 to 1.31.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@v1.30.5...v1.31.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit faa0822)
Signed-off-by: Yi Chen <[email protected]>

* Bump golang.org/x/net from 0.28.0 to 0.29.0 (#2205)

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.28.0 to 0.29.0.
- [Commits](golang/net@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 6106178)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/docker/docker from 27.0.3+incompatible to 27.1.1+incompatible (#2125)

Bumps [github.com/docker/docker](https://github.com/docker/docker) from 27.0.3+incompatible to 27.1.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](moby/moby@v27.0.3...v27.1.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 316536f)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.58.3 to 1.63.3 (#2206)

Bumps [github.com/aws/aws-sdk-go-v2/service/s3](https://github.com/aws/aws-sdk-go-v2) from 1.58.3 to 1.63.3.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@service/s3/v1.58.3...service/s3/v1.63.3)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/s3
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 1972fb7)
Signed-off-by: Yi Chen <[email protected]>

* Update integration test workflow and add golangci lint check (#2197)

* Update integration test workflow

Signed-off-by: Yi Chen <[email protected]>

* Update golangci lint config

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 143b16e)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/aws/aws-sdk-go-v2 from 1.31.0 to 1.32.0 (#2229)

Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) from 1.31.0 to 1.32.0.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@v1.31.0...v1.32.0)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit a4dcfcb)
Signed-off-by: Yi Chen <[email protected]>

* Bump cloud.google.com/go/storage from 1.43.0 to 1.44.0 (#2228)

Bumps [cloud.google.com/go/storage](https://github.com/googleapis/google-cloud-go) from 1.43.0 to 1.44.0.
- [Release notes](https://github.com/googleapis/google-cloud-go/releases)
- [Changelog](https://github.com/googleapis/google-cloud-go/blob/main/CHANGES.md)
- [Commits](googleapis/google-cloud-go@pubsub/v1.43.0...spanner/v1.44.0)

---
updated-dependencies:
- dependency-name: cloud.google.com/go/storage
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 2542009)
Signed-off-by: Yi Chen <[email protected]>

* Bump manusa/actions-setup-minikube from 2.11.0 to 2.12.0 (#2226)

Bumps [manusa/actions-setup-minikube](https://github.com/manusa/actions-setup-minikube) from 2.11.0 to 2.12.0.
- [Release notes](https://github.com/manusa/actions-setup-minikube/releases)
- [Commits](manusa/actions-setup-minikube@v2.11.0...v2.12.0)

---
updated-dependencies:
- dependency-name: manusa/actions-setup-minikube
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 4358fd4)
Signed-off-by: Yi Chen <[email protected]>

* Bump golang.org/x/time from 0.6.0 to 0.7.0 (#2227)

Bumps [golang.org/x/time](https://github.com/golang/time) from 0.6.0 to 0.7.0.
- [Commits](golang/time@v0.6.0...v0.7.0)

---
updated-dependencies:
- dependency-name: golang.org/x/time
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 29ba4e7)
Signed-off-by: Yi Chen <[email protected]>

* fix: imagePullPolicy was ignored (#2222)

Signed-off-by: xuqingtan <[email protected]>
(cherry picked from commit 7fb14e6)
Signed-off-by: Yi Chen <[email protected]>

* fix: spark-submission failed due to lack of permission by user `spark` (#2223)

error: Exception in thread "main" java.io.FileNotFoundException: /home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-511288aa-ce7c-4a38-9c8e-4869b71c68fa-1.0.xml (No such file or directory)

Signed-off-by: xuqingtan <[email protected]>
(cherry picked from commit d07821b)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/aws/aws-sdk-go-v2/config from 1.27.33 to 1.27.42 (#2231)

Bumps [github.com/aws/aws-sdk-go-v2/config](https://github.com/aws/aws-sdk-go-v2) from 1.27.33 to 1.27.42.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](aws/aws-sdk-go-v2@config/v1.27.33...config/v1.27.42)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/config
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit 9be8dce)
Signed-off-by: Yi Chen <[email protected]>

* Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.4 (#2204)

Bumps [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang) from 1.19.1 to 1.20.4.
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/main/CHANGELOG.md)
- [Commits](prometheus/client_golang@v1.19.1...v1.20.4)

---
updated-dependencies:
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
(cherry picked from commit fe833fa)
Signed-off-by: Yi Chen <[email protected]>

* Remove `cap_net_bind_service` from image (#2216)

Signed-off-by: Jacob Salway <[email protected]>
(cherry picked from commit ac761ef)
Signed-off-by: Yi Chen <[email protected]>

* fix: webhook panics due to logging (#2232)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit 247e834)
Signed-off-by: Yi Chen <[email protected]>

* Add check for generating manifests and code (#2234)

Signed-off-by: Yi Chen <[email protected]>
(cherry picked from commit c75d99f)
Signed-off-by: Yi Chen <[email protected]>

* Spark Operator Official Release v2.0.2

Signed-off-by: Yi Chen <[email protected]>

---------

Signed-off-by: ImpSy <[email protected]>
Signed-off-by: Yi Chen <[email protected]>
Signed-off-by: Jacob Salway <[email protected]>
Signed-off-by: dependabot[bot] <[email protected]>
Signed-off-by: xuqingtan <[email protected]>
Co-authored-by: Sébastien Maintrot <[email protected]>
Co-authored-by: Jacob Salway <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nick Tan <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants