Unhandled Exception on Error 429 - Too Many Requests #1011
Comments
We have a similar issue that this solution would also assist with. We need our operator to manage a lot of pre-existing resources in the cluster. Since there's no limit to the number of requests made, the first run of the operator causes 429s because it tries to annotate many objects in a very short time.
@nolar Would it be possible to consider cutting a new release? Thanks in advance!
Was somebody able to fix this error? Our operator (kopf 1.36.2) also has to deal with several thousand resources on startup. From time to time, the following lethal error occurs. After it happens, the finalizers are stuck and the operator basically can't manage the resource type anymore.
@nolar What do you think of exposing #1011 (comment) as a variable? It seems quite simple and should be an easy fix for this issue.
To expose as a variable that is used where and how? Sorry, I didn't fully get the proposed solution. Can you please clarify?

Having a limit variable (e.g. in settings) would require a centralized synchronization mechanism somewhere in memory, some kind of semaphore with locks & timers. Kopf currently has nothing of that kind, and I am not sure whether it is a good idea to add that level of complexity (though it is doable — it is a relatively well-established pattern of a server-side rate limiter).

The original problem with many objects in the cluster causing too many requests can be indirectly addressed by using custom storages (there are 2 of them in settings: the diffbase & progress storages — see the docs). This can be anything from a key-value store (Redis) to a relational database (Postgres) or even a shared filesystem, as long as it is persistent enough to outlive the operator's pod. Out of the box, it is the objects' annotations — for simplicity — but annotations are not a requirement.

UPD: I am talking about the patching requests. The rate limiting on watchers cannot be solved with a storage, of course. But that is already fixed with backoff intervals in settings.
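For illustration, a minimal sketch of where such storages plug in. It uses the built-in storage classes (which still patch the object); a fully external storage (Redis, Postgres, a shared filesystem) would be a custom subclass of the storage base classes wired into the same settings. The prefix/field names are arbitrary examples, and the exact class names should be double-checked against the docs.

```python
import kopf

@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # Where the per-handler progress is persisted. A custom kopf.ProgressStorage
    # subclass could write to Redis/Postgres instead, so that no PATCH requests
    # are needed just for Kopf's own bookkeeping.
    settings.persistence.progress_storage = kopf.StatusProgressStorage(
        field='status.my-operator',
    )

    # Where the "last handled" essence for diffing is persisted.
    settings.persistence.diffbase_storage = kopf.AnnotationsDiffBaseStorage(
        prefix='my-operator.example.com',
        key='last-handled-configuration',
    )
```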
Thanks for the rapid answer and quick support!
Thanks for the pointer! If I understand the code correctly, then Kopf does the following:
That means upon startup we have
Fully agree. Maybe my understanding has a flaw, let me try to explain:
For me, that means:
Taking all things together, the solution to the above problem seems to be (again, please correct my limited understanding)
Kopf does 1 API request per resource kind: the "watching" request. Kopf can also make a few initial requests — also via streaming — to scan the available namespaces and CRDs, in order to resolve the "globs" (masks) of namespaces & resources (if the masks are uncertain).

All in all, the number of "reading" requests is very limited. The number of requests to the storage is indeed high, but in the case of the AnnotationsStorage(s), this is an in-memory request to get a value from a huge dict of the resource body; it does not go to the API.

The number of "writing" API requests usually explodes when those objects are stored back to Kubernetes — via PATCH requests, after their annotations were modified. This is why I propose an external storage: once Kopf has no need to patch the annotations, the "writing" requests will be gone too.
Oh, I see now. Thanks for clarifying. Yes, that can be made configurable. But keep in mind that it is a limit on the number of concurrent requests at any moment in time. It does not provide any timing limitations or priorities. As such, it can lead to 2 issues:
Maybe separating the APIContext into 2 contexts: one for watch-streams, another for patches, would be a good idea — this would solve the 1st problem. But the solution is nevertheless partial.

What can be done is exposing the APIContext subclassing to the developers somehow via settings, so that they can do whatever they want with the connection management. The only objection in the past was that the API client is a hidden internal of Kopf; there were already 2 major changes of the API client, and there could be more in the future (in theory). Exposing this class and binding to aiohttp would make such settings forward-incompatible. But since the active development of Kopf is not happening now (I have no time, and no ideas what to do), this might be an option.

Here are my thoughts on this ;-) What do you think? Given a custom APIContext sub-class, would this help to implement API throttling?
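To make the throttling idea concrete: since Kopf's API client sits on top of aiohttp, a custom APIContext subclass could, for example, funnel every outgoing call through a small token-bucket limiter. The class below is not a Kopf or aiohttp API, just a generic asyncio sketch of the client-side rate-limiter pattern mentioned above; the usage comment shows a hypothetical wrapper, not an existing hook.

```python
import asyncio
import time


class TokenBucket:
    """Client-side rate limiter: at most `rate` requests per second, with small bursts."""

    def __init__(self, rate: float, burst: int = 5) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()
        self.lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self.lock:
            # Refill the budget according to the time elapsed since the last call.
            now = time.monotonic()
            self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens < 1.0:
                # Not enough budget: sleep until exactly one token has accrued, then spend it.
                await asyncio.sleep((1.0 - self.tokens) / self.rate)
                self.updated = time.monotonic()
                self.tokens = 0.0
            else:
                self.tokens -= 1.0


# Hypothetical usage inside a custom request wrapper (not an existing Kopf hook):
#     await bucket.acquire()
#     async with session.patch(url, json=patch_body) as response:
#         ...
```

An `asyncio.Semaphore` would cap concurrency instead of the request rate, which is exactly the distinction between concurrent and timed limits raised above.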
Ah, I see. So once we create the watcher, we initially list all resources of the kind and then get all their data to fill the in-memory annotation storage.
I think that's fine. It will catch up eventually.
Yep, for that we (as in, the users of Kopf) should implement the backoff with jitter.
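A rough sketch of that idea, assuming the `settings.networking.error_backoffs` knob from the Kopf settings is the right place to put it (the concrete delays are arbitrary):

```python
import random

import kopf


@kopf.on.startup()
def configure(settings: kopf.OperatorSettings, **_):
    # An exponential-ish backoff schedule with random jitter on each step,
    # so that many watchers/operators do not retry against the API in lockstep.
    settings.networking.error_backoffs = [
        delay + random.uniform(0.0, delay / 2) for delay in (1, 2, 5, 10, 30, 60)
    ]
```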
That's even more than we need, but definitely the most powerful solution 👍🏻 The only ask I'd have: Can we extract the
Correct.
The rationale is this: Kopf is a Kubernetes-specialized tool, not an HTTP client tool. Adding too many HTTP-specific nuances is undesired. The use-case above is very specific and, I'd say, rare (opinionated). So, instead of adding this functionality to Kopf, some hook should be exposed, allowing the users of Kopf to do all the sophisticated trickery specific to their cases & environments. And this is somewhere around the APIContext, plus-minus. So (thinking out loud), a few changes are needed:
Maybe, the whole

It seems somewhat complex, but not complicated. I might take a look in the coming weeks, but no promises (as I said, no time).
Long story short
When a watch receives a 429 (Too Many Requests) error, the watch fails rather than retrying with a backoff.
Kopf version
1.36.0
Kubernetes version
1.23.12
Python version
No response
Code
# Unable to reliably reproduce.
Logs
Additional information
I believe a 429 Too Many Requests response should be handled with an error backoff, similar to an APIServerError?