When managing a cluster with over 50,000 SidecarSets, restarting the kruise-manager poses a significant challenge. With the default settings of 3 reconciliation workers and a rate limiter set at 10 QPS, it can take over 30 minutes to resync all SidecarSets during startup. During this time, all deployments are effectively stalled because the reconciler is blocked.
Is there a way to avoid this? For example, could we implement a CreateFunc in predicate.Funcs that filters out resources whose CreationTimestamp predates the kruise-manager startup, so that only resources with a non-empty partition are processed?
This change would significantly reduce the startup time of kruise-manager in large clusters.
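A minimal sketch of the filtering decision described above, written as plain Go so it runs without controller-runtime. The types `sidecarSetInfo` and `startupFilter` and the function `shouldEnqueueOnCreate` are hypothetical names for illustration, and modelling the partition as a string is an assumption; this is not the kruise-manager implementation.

```go
package main

import "time"

// sidecarSetInfo is a hypothetical, trimmed-down view of a SidecarSet:
// only the two fields the proposed filter would look at.
type sidecarSetInfo struct {
	CreationTimestamp time.Time
	Partition         string // assumption: empty string means no partition is set
}

// startupFilter remembers when the manager process started.
type startupFilter struct {
	startedAt time.Time
}

// newStartupFilter records the current time as the manager start time.
func newStartupFilter() startupFilter {
	return startupFilter{startedAt: time.Now()}
}

// shouldEnqueueOnCreate reports whether a create event should be reconciled.
// Objects created after startup are always enqueued. Objects that already
// existed at startup are enqueued only if they have a non-empty partition.
func (f startupFilter) shouldEnqueueOnCreate(s sidecarSetInfo) bool {
	if s.CreationTimestamp.After(f.startedAt) {
		return true
	}
	return s.Partition != ""
}
```

In a real controller, a check like this would presumably be called from the CreateFunc of a predicate.Funcs value, with the object's metadata and update strategy supplying the two fields. Whether skipping pre-existing objects is safe depends on what the reconciler does for them, which the thread does not settle.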
Hello, @MichaelRren. The configuration must be accurate to avoid confusion for our users.
I believe we should establish best practices to guide users in setting it properly.
For your scenario, I recommend increasing the number of workers and raising the rate limit. This adjustment should speed up the initial processing.
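To get a rough feel for how these two settings interact, here is a back-of-the-envelope sketch. It assumes a simplified model in which every SidecarSet costs one rate-limiter token and one worker slot during the initial resync; the function name `estimateResync` and the per-item reconcile time are assumptions, not measured values.

```go
package main

import "time"

// estimateResync returns a rough lower bound on the initial resync time.
// It takes the larger of two bounds:
//   - rate bound:   n items at qps items per second
//   - worker bound: n items, each taking perItem, spread over workers
func estimateResync(n int, qps float64, workers int, perItem time.Duration) time.Duration {
	rateBound := time.Duration(float64(n) / qps * float64(time.Second))
	workerBound := time.Duration(n) * perItem / time.Duration(workers)
	if workerBound > rateBound {
		return workerBound
	}
	return rateBound
}
```

Under this model, 50,000 SidecarSets at 10 QPS cannot finish in less than 5,000 seconds no matter how many workers run, so adding workers alone would not help until the rate limit is raised as well. With a higher limit, the worker bound takes over, and the per-item reconcile time becomes the number worth measuring.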
In 1.7, there is a patch: 'Optimizing Pod SidecarSet webhook and controller performance when lots of namespace scoped sidecarSet exists (#1547, @ls-2018)'. If your SidecarSets are namespaced, this will also help you.
Thanks for your reply, @ABNER-1. Unfortunately, the patch you suggested doesn’t suit our scenario because all the SidecarSets in our clusters are in the same namespace.
For now, we have to adjust the number of workers and the rate limiter to handle this situation.
Hi, @MichaelRren
Would you like to share the process and results before and after your parameter tuning?
I believe this would make an excellent blog post on best practices.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.