You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
and watching the logs I see that resume event is keeping firing at regular intervals (roughly once per minute) even as there was no 'initial' resources in the first place:
Upon logging of of all incoming kwargs to handle_resume one can see that the only different kwargs are started and runtime, everything else is the same.
Restarting the operator (so that 'initial' resource listing is not empty) does not change the behaviour.
One is directly related to the suggested PR with proper usage of resourceVersion as long as possible (until "410 Gone").
Another one solves this problem completely by introducing an in-memory per-uid memos, so it prevents the resumes even when the resourceVersion is "410 Gone", and a re-listing is needed anyway.
This is ready for couple weeks already. I'll try to speed up the merges now.
The mentioned branches are merged and pre-released as kopf==0.23rc1. Will be normally released as kopf==0.23 in few days since now (after some live testing).
Beware: it also includes massive refactoring and internal changes, which can break things (though should not). See the Kopf 0.23rc1 Release Notes.
If the issue reappears with kopf>=0.23rc1, feel free to re-open this issue with a comment.
Commented by pshchelo at 2019-11-18 12:32:41+00:00
when I create 3 kopfexample resources, shutdown controller, edit 2nd, create 4th and delete 3rd resource and start controller again, I see that for the second resource first the 'update' handler is invoked, and then 'resume' handler with event==update.
Is it expected?
Commented by pshchelo at 2019-11-18 15:43:07+00:00
nolar, however, if I roll all (except delete) handlers into one (decorate the same handler function as create, update and resume) that second (spurious?) 'resume handles update event' is not happening (handler de-duplication at works?).
Yes. The handlers are deduplicated intentionally for the case of multiple decorators for one function. This is for the case of resume+create handlers: either an operator is created, or an object is created:
If you need them separately, then these should be separate functions, which internally call a regular function to do something.
Regarding the order:
Yes, the handlers are executed in the order they are registered, and this is intentional.
Though, actually, this is not fully true. The proper answer is: They are attempted to be executed in the order registered, but if there are retries, the 2nd and following attempts of the 1st handler will be after the 1st attempts of the 2nd and other handlers.
This is controlled by kopf.reactor.lifecycles internally. You can define your own lifecycle function and set it as a default if needed. They are exposed as a public interface of kopf (via import kopf; kopf.set_default_lifecycle(kopf.lifecycles.shuffled), etc).
Regarding reason==UPDATE:
Yes, it is intentional, though discussable.
As it is implemented now, the resuming handlers are mixed into the selection of handlers to be executed, no matter the reason of this execution. As such, the reason can be UPDATE, so both on-update & on-resume handlers are executed.
The RESUME reason is an artificial pseudo-reason for the cases when the regular reasons are not applicable (nothing is created, nothing is updated, nothing is deleted), but we still need to continue with the resuming handlers (because it is there).
The current naming is a bit confusing though. I thought about renaming the RESUME reason to NOTICE, as in "I do not see anything changed, but here it is, exists out there, and I see it for the first time" — this would semantically separate the "resuming" handlers from the "noticing"/"updating"/etc reasons of these handlers being triggered (unlike the case of creation/update/deletion reasons & handlers, where the mapping is obvious and straightforward).
It was originally done so to keep the logic of resuming separated from the logic of creation/updating — in the assumption that resuming usually implies a background task/thread spawning.
I would be happy to understand other use-cases of resuming except as for task/thread spawning. Can you please explain how do you intend to use the reason kwarg (ex-event)? What is an expected logic?
Commented by pshchelo at 2019-11-18 17:34:04+00:00
I don't think the naming is confusing, and as you describe 'notice' - is is 'create' semantically (seeing for the first time part).
I would expect the 'resume' handler to process only resources that have not changed in between controller shutdown and start. Everything else should be covered by other handlers - if something changed while controller was gone - on.update, if created new - on.create, if deleted - on.delete.
Our main use case for 'resume' is actually handling controller updates/upgrades - usually controller creates some other 'child' resources in k8s, and the way it creates those may (and will?) change from one version to the next, and we obviously want the result of controller actions to be the same regardless of if one deployed a new one for the first time, or upgraded from the older version.
For that we aim to use the 'resume' handler to catch CRs that were not changed while controller is restarting, and re-render templates for / patch all existing child resources of that CR (as created by previous controller version) to bring those to the state how new controller version would've created those from scratch.
To reduce any possible duplicate heavy-lifting (like external API calls) it would be nice if double-appearing events were not happening.
However at the moment we use single function for create/update/resume (kind of like kubectl apply), so we shouldn't be affected thanks to handler de-duplication (and we also can filter on reason if we have to).
Also currently we use single instance of controller, so it would be interesting to test if 'resume' fires up on controller unfreeze as part of rolling update of a controller deployed like kind: Deployment.
My first question is: why an on-resume handler is not enough, without mixing it with on-update/-on-create decorators? That would cover all the CRs, both changed and unmodified.
But if the goal is to catch all the "unmodified only" CRs, this will be a problem indeed.
This brings me to an idea that another handler should be introduced (@kopf.on.notice()?) for this purpose — to keep aligned with the previous semantics of the on-resume handlers as they were intended originally (but were broken all this time before 0.23rc1). Or maybe the resuming semantics should be narrowed only to the unmodified objects only before I do the 0.23 release with the fixes (which means: quickly).
On the other hand, both the docs, the examples, and even our own use-cases always rely on @kopf.on.create + @kopf.on.resume pairs for the same function — which implies that they are separate, non-overlapping reasons, as you explained in your use-case.
This may be also in relation with the @kopf.daemon() and @kopf.on.timer() decorators (#19), which are currently in progress: to have tasks/threads easily spawned without explicit orchestration (drafted and were functional for some time before I broke it), which should replace @kopf.on.resume as a mean of task spawning for all the CRs, and thus removing the need to have anything to react to the object existence in any state ("noticing" the existence), but only in the unmodified state.
I need some time to process this information and to simulate the use-cases. Especially those that could at all exist in the previous broken state in kopf<=0.22.
However, if the object was changed (e.g. its status) before the crash or restart, the UPDATE cause is detected, and the @kopf.on.update handlers are called. The @kopf.on.resume handlers are NOT called in case the object was changed — this breaks the intuitive expectations and semantics of a background thread starter (the original purpose of this handler).
Starting the threads from the UPDATE/DELETE handlers is not desired, as it means we have to keep track of the thread pool state per object — extra hassle.
The RESUME cause has to be handled even if the object has changed (but maybe not if deleted) — before or after the UPDATE handler in the same handling cycle.
Which means, the idea of mixing them in was not only intentional, but conscious — after previously trying an alternative solution with the "unmodified only" reason & handlers.
So, I will keep @kopf.on.resume() handlers as they are — mixed in for any regular reasons (CREATE/UPDATE/DELETE) on the initial listing — as this indeed was the original intention since kopf==0.16.
But I admit that one use-case is missing now and has no workarounds —the one described above— the "unmodified only" handlers.
I am going to release 0.23 without this feature. This feature will be added a bit later (maybe in 0.24). Naming suggestions are welcome. @kopf.on.notice()? @kopf.on.recall()? @kopf.on.existence()? Anything else?
I've created #241 for this type of handlers. It seems easy to implement, as all the moving parts are already there, but have no name tag on them.
The text was updated successfully, but these errors were encountered:
Expected Behavior
resume event is handled only once on actual operator start
Actual Behavior
resume event keeps firing at regular intervals
Steps to Reproduce the Problem
Using kopf/master and the following bare-minimal operator handling standard kopfexample resource:
and creating a bare-minimal kopfexample:
and watching the logs I see that resume event is keeping firing at regular intervals (roughly once per minute) even as there was no 'initial' resources in the first place:
Upon logging of of all incoming kwargs to handle_resume one can see that the only different kwargs are
started
andruntime
, everything else is the same.Restarting the operator (so that 'initial' resource listing is not empty) does not change the behaviour.
Source code for the repro (with logging of all kwargs) can be found at
https://github.com/pshchelo/kopf-debug/tree/893ff6312251c51cb8a39b9e3a62c2e10389989c
and running
tox -eresume-master
from cloned repo.Specifications
aiohttp==3.6.2
aiojobs==0.2.2
async-timeout==3.0.1
attrs==19.3.0
certifi==2019.9.11
chardet==3.0.4
Click==7.0
idna==2.8
iso8601==0.1.12
kopf==0.23.dev102+gbab7295
multidict==4.5.2
pip==19.3.1
pykube-ng==19.10.0
PyYAML==5.1.2
requests==2.22.0
setuptools==41.4.0
typing-extensions==3.7.4.1
urllib3==1.25.7
wheel==0.33.6
yarl==1.3.0
I believe this has to do with 'initial' resource listing being done in 'streaming_watch' method.
Will push a PR soon.
Thanks for reporting.
This issue is solved in two look-ahead branches (with tests):
One is directly related to the suggested PR with proper usage of resourceVersion as long as possible (until "410 Gone").
Another one solves this problem completely by introducing an in-memory per-uid memos, so it prevents the resumes even when the resourceVersion is "410 Gone", and a re-listing is needed anyway.
This is ready for couple weeks already. I'll try to speed up the merges now.
The mentioned branches are merged and pre-released as
kopf==0.23rc1
. Will be normally released askopf==0.23
in few days since now (after some live testing).Beware: it also includes massive refactoring and internal changes, which can break things (though should not). See the Kopf 0.23rc1 Release Notes.
If the issue reappears with
kopf>=0.23rc1
, feel free to re-open this issue with a comment.nolar testing with master currently, with repro controller as in
https://github.com/pshchelo/kopf-debug/blob/96fbdc20e38b91db214e0d174c0343a2a87aff81/resume.py
when I create 3
kopfexample
resources, shutdown controller, edit 2nd, create 4th and delete 3rd resource and start controller again, I see that for the second resource first the 'update' handler is invoked, and then 'resume' handler with event==update.Is it expected?
nolar, however, if I roll all (except delete) handlers into one (decorate the same handler function as create, update and resume) that second (spurious?) 'resume handles update event' is not happening (handler de-duplication at works?).
Regarding the deduplication:
Yes. The handlers are deduplicated intentionally for the case of multiple decorators for one function. This is for the case of resume+create handlers: either an operator is created, or an object is created:
If you need them separately, then these should be separate functions, which internally call a regular function to do something.
Regarding the order:
Yes, the handlers are executed in the order they are registered, and this is intentional.
Though, actually, this is not fully true. The proper answer is: They are attempted to be executed in the order registered, but if there are retries, the 2nd and following attempts of the 1st handler will be after the 1st attempts of the 2nd and other handlers.
This is controlled by
kopf.reactor.lifecycles
internally. You can define your own lifecycle function and set it as a default if needed. They are exposed as a public interface ofkopf
(viaimport kopf; kopf.set_default_lifecycle(kopf.lifecycles.shuffled)
, etc).Regarding
reason==UPDATE
:Yes, it is intentional, though discussable.
As it is implemented now, the resuming handlers are mixed into the selection of handlers to be executed, no matter the reason of this execution. As such, the reason can be
UPDATE
, so both on-update & on-resume handlers are executed.The
RESUME
reason is an artificial pseudo-reason for the cases when the regular reasons are not applicable (nothing is created, nothing is updated, nothing is deleted), but we still need to continue with the resuming handlers (because it is there).The current naming is a bit confusing though. I thought about renaming the
RESUME
reason toNOTICE
, as in "I do not see anything changed, but here it is, exists out there, and I see it for the first time" — this would semantically separate the "resuming" handlers from the "noticing"/"updating"/etc reasons of these handlers being triggered (unlike the case of creation/update/deletion reasons & handlers, where the mapping is obvious and straightforward).It was originally done so to keep the logic of resuming separated from the logic of creation/updating — in the assumption that resuming usually implies a background task/thread spawning.
I would be happy to understand other use-cases of resuming except as for task/thread spawning. Can you please explain how do you intend to use the
reason
kwarg (ex-event
)? What is an expected logic?I don't think the naming is confusing, and as you describe 'notice' - is is 'create' semantically (seeing for the first time part).
I would expect the 'resume' handler to process only resources that have not changed in between controller shutdown and start. Everything else should be covered by other handlers - if something changed while controller was gone - on.update, if created new - on.create, if deleted - on.delete.
Our main use case for 'resume' is actually handling controller updates/upgrades - usually controller creates some other 'child' resources in k8s, and the way it creates those may (and will?) change from one version to the next, and we obviously want the result of controller actions to be the same regardless of if one deployed a new one for the first time, or upgraded from the older version.
For that we aim to use the 'resume' handler to catch CRs that were not changed while controller is restarting, and re-render templates for / patch all existing child resources of that CR (as created by previous controller version) to bring those to the state how new controller version would've created those from scratch.
To reduce any possible duplicate heavy-lifting (like external API calls) it would be nice if double-appearing events were not happening.
However at the moment we use single function for create/update/resume (kind of like
kubectl apply
), so we shouldn't be affected thanks to handler de-duplication (and we also can filter on reason if we have to).Also currently we use single instance of controller, so it would be interesting to test if 'resume' fires up on controller unfreeze as part of rolling update of a controller deployed like
kind: Deployment
.Thanks. Just few thoughts to write down:
My first question is: why an on-resume handler is not enough, without mixing it with on-update/-on-create decorators? That would cover all the CRs, both changed and unmodified.
But if the goal is to catch all the "unmodified only" CRs, this will be a problem indeed.
This brings me to an idea that another handler should be introduced (
@kopf.on.notice()
?) for this purpose — to keep aligned with the previous semantics of the on-resume handlers as they were intended originally (but were broken all this time before 0.23rc1). Or maybe the resuming semantics should be narrowed only to the unmodified objects only before I do the 0.23 release with the fixes (which means: quickly).On the other hand, both the docs, the examples, and even our own use-cases always rely on
@kopf.on.create
+@kopf.on.resume
pairs for the same function — which implies that they are separate, non-overlapping reasons, as you explained in your use-case.This may be also in relation with the
@kopf.daemon()
and@kopf.on.timer()
decorators (#19), which are currently in progress: to have tasks/threads easily spawned without explicit orchestration (drafted and were functional for some time before I broke it), which should replace@kopf.on.resume
as a mean of task spawning for all the CRs, and thus removing the need to have anything to react to the object existence in any state ("noticing" the existence), but only in the unmodified state.I need some time to process this information and to simulate the use-cases. Especially those that could at all exist in the previous broken state in
kopf<=0.22
.I've re-read some PRs from the past — it seems, this dilemma of resuming handlers' semantic was already there when resuming was introduced:
Specifically, this case:
Which means, the idea of mixing them in was not only intentional, but conscious — after previously trying an alternative solution with the "unmodified only" reason & handlers.
So, I will keep
@kopf.on.resume()
handlers as they are — mixed in for any regular reasons (CREATE/UPDATE/DELETE) on the initial listing — as this indeed was the original intention sincekopf==0.16
.But I admit that one use-case is missing now and has no workarounds —the one described above— the "unmodified only" handlers.
I am going to release 0.23 without this feature. This feature will be added a bit later (maybe in 0.24). Naming suggestions are welcome.
@kopf.on.notice()
?@kopf.on.recall()
?@kopf.on.existence()
? Anything else?I've created #241 for this type of handlers. It seems easy to implement, as all the moving parts are already there, but have no name tag on them.
The text was updated successfully, but these errors were encountered: