Stop daemons & timers while the operator is on peering freeze #675

Merged
merged 1 commit into from
Feb 11, 2021

Conversation

nolar
Owner

@nolar nolar commented Feb 8, 2021

When the operator goes into a peering freeze, all the watch-streams disconnect from the API and wait until the operator is resumed.

However, this mode was not passed to the daemons & timers subsystem, so they continued running — while not getting any updates on the resource's state from the cluster (because the watch-streams are frozen).

This fix synchronizes the daemons & timers to the global operator freeze:

Once an operator is frozen, all daemons & timers are stopped almost the same way as when the operator exits (e.g. due to SIGTERM/SIGINT) — with all the cancellation backoffs, timeouts, and other procedures of graceful & forced termination.

Once the operator is resumed, nothing is done explicitly: the resumed watch-streams will naturally spawn new daemons/timers for all resources that match at that moment, as if they had all started to match the filters at once. (This implicitly keeps the set of daemons/timers up to date: resources that did not exist before the freeze get theirs spawned too; those that are gone are not resumed.)
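The two behaviours above can be illustrated with a small asyncio sketch. It is not the operator's actual code: `daemon`, `Daemons`, `spawn`, `stop_all`, `linger`, and the timeout values are all hypothetical names chosen for illustration. It assumes a stop flag per daemon for the graceful phase and plain task cancellation for the forced phase, and it simulates the resumed watch-stream by simply calling `spawn` for whatever resources match at that time.

```python
import asyncio
from typing import Dict, List


async def daemon(name: str, stopped: asyncio.Event, log: List[str], linger: float) -> None:
    """Hypothetical daemon: runs until the stop flag is set, then needs `linger` seconds to clean up."""
    log.append(f"{name} started")
    try:
        await stopped.wait()          # graceful phase: wait to be asked to stop
        await asyncio.sleep(linger)   # simulated cleanup time
        log.append(f"{name} exited gracefully")
    except asyncio.CancelledError:
        log.append(f"{name} cancelled by force")  # forced phase: the task was cancelled
        raise


class Daemons:
    """Hypothetical registry of running daemons, keyed by resource name."""

    def __init__(self, log: List[str]) -> None:
        self.log = log
        self.tasks: Dict[str, asyncio.Task] = {}
        self.flags: Dict[str, asyncio.Event] = {}

    def spawn(self, name: str, linger: float = 0.0) -> None:
        # Spawn only if this resource has no daemon yet.
        if name not in self.tasks:
            self.flags[name] = asyncio.Event()
            self.tasks[name] = asyncio.create_task(
                daemon(name, self.flags[name], self.log, linger))

    async def stop_all(self, timeout: float = 0.05) -> None:
        # Used both on freeze and on exit: ask nicely first, then cancel the stragglers.
        for flag in self.flags.values():
            flag.set()
        if self.tasks:
            _, pending = await asyncio.wait(list(self.tasks.values()), timeout=timeout)
            for task in pending:
                task.cancel()
            await asyncio.gather(*self.tasks.values(), return_exceptions=True)
        self.tasks.clear()
        self.flags.clear()


async def simulate() -> List[str]:
    log: List[str] = []
    daemons = Daemons(log)
    daemons.spawn("a")               # stops promptly when asked
    daemons.spawn("b", linger=10.0)  # too slow: will be cancelled by force
    await asyncio.sleep(0)           # let both daemons start
    await daemons.stop_all()         # the operator is frozen
    # The operator is resumed: the watch-stream reports what matches now.
    # "a" is gone in the meantime, "c" is new, "b" still exists.
    for name in ["b", "c"]:
        daemons.spawn(name)
    await asyncio.sleep(0)
    await daemons.stop_all()         # the operator exits
    return log


if __name__ == "__main__":
    print(asyncio.run(simulate()))
```

Nothing in the sketch remembers which daemons existed before the freeze: the registry is emptied on freeze, and the post-resume set is rebuilt purely from what the (simulated) watch-stream reports.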

TODOs left:

  • Edge case: abort the stopping of daemons early if the operator is resumed before all of them have stopped.
    • Uncertain: just unset the stopping reasons? But what if the daemon's code is already in its exiting clause? Finish the stopping? Then how is the daemon re-spawned if no new events arrive for the resource after that?
  • Tests.

Fixes #673.

@nolar nolar added the bug Something isn't working label Feb 8, 2021
@nolar nolar force-pushed the pause-daemons-on-freeze branch 3 times, most recently from 8882d45 to 321679b Compare February 11, 2021 21:11
@nolar nolar marked this pull request as ready for review February 11, 2021 21:12
@nolar nolar force-pushed the pause-daemons-on-freeze branch from 321679b to 1d17c01 Compare February 11, 2021 21:20
@nolar nolar force-pushed the pause-daemons-on-freeze branch from 1d17c01 to 69a792e Compare February 11, 2021 21:26
@nolar nolar merged commit e808d89 into master Feb 11, 2021
@nolar nolar deleted the pause-daemons-on-freeze branch February 11, 2021 21:32
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

timers continue when freezing operations
1 participant