Stop daemons & timers while the operator is on peering freeze #675

nolar · 2021-02-08T23:54:30Z

When the operator enters a peering freeze, all the watch-streams disconnect from the API and wait until the operator is resumed.

However, this mode was not propagated to the daemons & timers subsystem, so they kept running without receiving any updates on the resource's state from the cluster (because the watch-streams are frozen).

This fix synchronizes the daemons & timers to the global operator freeze:

Once the operator is frozen, all daemons & timers are stopped in almost the same way as when the operator exits (e.g. due to SIGTERM/SIGINT), with all the cancellation backoffs, timeouts, and other procedures of graceful & forced termination.
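As an illustration of the graceful-then-forced stopping described above, here is a minimal sketch in plain asyncio. It is not Kopf's actual implementation: `polite_daemon`, `stubborn_daemon`, `stop_all`, and the timeout value are hypothetical names and numbers chosen for the example.

```python
import asyncio
from typing import Dict, List


async def polite_daemon(name: str, stopped: asyncio.Event, log: List[str]) -> None:
    """Hypothetical daemon that checks the stop flag and exits on its own."""
    while not stopped.is_set():
        await asyncio.sleep(0.01)  # stand-in for periodic work
    log.append(f"{name}: exited gracefully")


async def stubborn_daemon(name: str, log: List[str]) -> None:
    """Hypothetical daemon that ignores the stop flag; only cancellation ends it."""
    try:
        await asyncio.Event().wait()  # blocks until cancelled
    except asyncio.CancelledError:
        log.append(f"{name}: cancelled")
        raise


async def stop_all(tasks: Dict[str, asyncio.Task], stopped: asyncio.Event, timeout: float) -> None:
    """Ask all daemons to stop, wait up to `timeout`, then cancel the rest."""
    stopped.set()  # graceful phase: signal the stop
    if tasks:
        _, pending = await asyncio.wait(list(tasks.values()), timeout=timeout)
        for task in pending:  # forced phase: cancel whatever is still running
            task.cancel()
        await asyncio.gather(*pending, return_exceptions=True)
    tasks.clear()  # nothing is kept around for a later resumption


async def demo_freeze() -> List[str]:
    log: List[str] = []
    stopped = asyncio.Event()
    tasks = {
        "polite": asyncio.create_task(polite_daemon("polite", stopped, log)),
        "stubborn": asyncio.create_task(stubborn_daemon("stubborn", log)),
    }
    await asyncio.sleep(0.05)  # let both daemons start
    await stop_all(tasks, stopped, timeout=0.2)  # the "freeze" happens here
    return log


if __name__ == "__main__":
    print(asyncio.run(demo_freeze()))
```

In this sketch the same `stop_all` routine could serve both a freeze and an operator exit, which mirrors the "almost the same way" wording above.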

Once the operator is resumed, nothing is done explicitly: the resumed watch-streams naturally spawn new daemons/timers for all resources that match at that moment, as if they had all started matching the filters at once. (This implicitly keeps the set of daemons/timers up to date: resources that did not exist before the freeze get theirs spawned too; those that are gone are not resumed.)
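The resumption behaviour can be sketched the same way. This is a toy model under stated assumptions, not Kopf's code: `on_listing` is a hypothetical stand-in for what a restarted watch-stream does with its initial listing, and the resource names are made up.

```python
import asyncio
from typing import Dict, Iterable, List, Tuple


async def daemon(name: str, started: List[str]) -> None:
    """Hypothetical daemon: records that it started, then runs until cancelled."""
    started.append(name)
    await asyncio.Event().wait()


def on_listing(names: Iterable[str], tasks: Dict[str, asyncio.Task], started: List[str]) -> None:
    """Spawn a daemon for every currently matching resource that has none."""
    for name in names:
        if name not in tasks:
            tasks[name] = asyncio.create_task(daemon(name, started))


async def stop_all(tasks: Dict[str, asyncio.Task]) -> None:
    """Cancel all daemons and forget them (simplified: no graceful phase)."""
    for task in tasks.values():
        task.cancel()
    await asyncio.gather(*tasks.values(), return_exceptions=True)
    tasks.clear()


async def demo_resume() -> Tuple[List[str], List[str]]:
    started: List[str] = []
    tasks: Dict[str, asyncio.Task] = {}

    on_listing(["a", "b"], tasks, started)  # before the freeze: a and b exist
    await asyncio.sleep(0)                  # let the daemons start

    await stop_all(tasks)                   # freeze: everything is stopped

    # Resume: "a" was deleted and "c" was created while frozen.
    # No explicit bookkeeping is needed; the fresh listing decides.
    on_listing(["b", "c"], tasks, started)
    await asyncio.sleep(0)

    running = sorted(tasks)
    await stop_all(tasks)                   # clean up before the loop closes
    return started, running


if __name__ == "__main__":
    print(asyncio.run(demo_resume()))
```

Because the task registry is emptied on freeze, the first listing after resumption is the only source of truth: "b" is re-spawned, "c" is spawned for the first time, and "a" is never resumed.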

TODOs left:

- Edge case: abort the stopping of daemons if the operator is resumed before all of them have stopped.
  - Uncertain: just unset the stopping reasons? But what if the daemon's code is already in the exiting clause? Finish the stopping? Then how is it re-spawned if no new events arrive for the resource after that?
- Tests.

Fixes #673.

Signed-off-by: Sergey Vasilyev <[email protected]>

nolar added the bug ("Something isn't working") label Feb 8, 2021

nolar force-pushed the pause-daemons-on-freeze branch 3 times, most recently from 8882d45 to 321679b February 11, 2021 21:11

nolar marked this pull request as ready for review February 11, 2021 21:12

nolar force-pushed the pause-daemons-on-freeze branch from 321679b to 1d17c01 February 11, 2021 21:20

Stop daemons/timers while the operator is on peering freeze

69a792e

Signed-off-by: Sergey Vasilyev <[email protected]>

nolar force-pushed the pause-daemons-on-freeze branch from 1d17c01 to 69a792e February 11, 2021 21:26

nolar merged commit e808d89 into master Feb 11, 2021

nolar deleted the pause-daemons-on-freeze branch February 11, 2021 21:32

nolar mentioned this pull request Feb 13, 2021

Change terminology from "freezing" to "pausing" #680

Merged

dependabot bot mentioned this pull request Mar 11, 2021

Bump kopf from 0.28.3 to 1.30.1 kawaja/oaat-operator#42

Merged
