Redesign operator freezes to stop/resume watching instead of ignoring events.
Background
Peering and freezing were introduced to prevent multiple operators from handling the same resources in the same cluster, even if those operators run outside of the cluster. One use case is developing and debugging an operator on a developer's machine. Another is normal operation with 2+ operators, which would otherwise conflict with each other's changes.
Description
Before this PR: when an operator is frozen due to the presence of another operator of the same or higher priority, it ignores the events while continuing to watch for resource changes normally.
As a consequence, in some cases (see the example below), it ignores the listing/watching events on startup and does not react to the objects' changes/creations/deletions that happened before or during the initial freeze (up to 60 seconds after startup). Only new changes that happen after the unfreeze are handled.
For example, when the previous operator's pod is SIGKILL'ed and restarted, the old operator's process does not remove itself from the peering resource (it gets no chance to), so the new operator's process can see a ghost of the old process for up to 60 seconds. Since the ghost has the same priority, the new operator stays in the frozen mode until then.
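The ghost scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: `PeerRecord`, `should_freeze`, and `PEER_LIFETIME` are hypothetical names, not the framework's actual API, and the 60-second lifetime is taken from the description.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical names throughout; only the 60-second figure comes from the text.
PEER_LIFETIME = timedelta(seconds=60)


@dataclass
class PeerRecord:
    identity: str
    priority: int
    last_seen: datetime

    def is_alive(self, now: datetime) -> bool:
        # A record counts as alive until its lifetime elapses, even if the
        # process that wrote it was SIGKILL'ed and could not remove it.
        return now - self.last_seen < PEER_LIFETIME


def should_freeze(own_priority: int, peers: list[PeerRecord], now: datetime) -> bool:
    # Freeze if any live peer has the same or a higher priority.
    return any(p.is_alive(now) and p.priority >= own_priority for p in peers)


killed_at = datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
ghost = PeerRecord(identity="old-pod", priority=0, last_seen=killed_at)

# The replacement pod starts 5 seconds later with the same priority: frozen.
print(should_freeze(0, [ghost], killed_at + timedelta(seconds=5)))   # True
# Once the ghost record has expired, the new operator is free to proceed.
print(should_freeze(0, [ghost], killed_at + timedelta(seconds=61)))  # False
```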
Such per-event ignoring also creates a lot of log noise, since every ignored event is logged.
With this PR, the freeze mode is redesigned:
Instead of ignoring the events while continuing to watch the resource, the operator closes the watch streams (and their underlying connections) at the moment the freeze mode is turned on.
When the freeze mode is turned off, the watch streams are reconnected by re-listing the resources, as if the operator had just started.
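The stop/resume behaviour can be sketched with plain `asyncio`. This is a minimal illustration, not the code of this PR: `FakeCluster`, `Toggle`, and `operator` are hypothetical names, and the in-memory cluster stands in for the real Kubernetes API.

```python
import asyncio


class FakeCluster:
    """Hypothetical in-memory stand-in for the cluster API."""

    def __init__(self):
        self.objects = {}
        self.streams = set()

    def change(self, name, spec):
        self.objects[name] = spec
        for stream in self.streams:
            stream.put_nowait((name, spec))

    def list(self):
        return list(self.objects.items())

    def open_stream(self):
        stream = asyncio.Queue()
        self.streams.add(stream)
        return stream

    def close_stream(self, stream):
        self.streams.discard(stream)


class Toggle:
    """Hypothetical on/off flag that can be awaited in both directions."""

    def __init__(self, on):
        self._on, self._off = asyncio.Event(), asyncio.Event()
        self.set(on)

    def set(self, on):
        if on:
            self._off.clear()
            self._on.set()
        else:
            self._on.clear()
            self._off.set()

    async def wait_on(self):
        await self._on.wait()

    async def wait_off(self):
        await self._off.wait()


async def operator(cluster, freeze, handled):
    while True:
        await freeze.wait_off()            # no watching at all while frozen
        handled.extend(cluster.list())     # re-list, as if just started
        stream = cluster.open_stream()
        frozen = asyncio.ensure_future(freeze.wait_on())
        event = None
        try:
            while True:
                event = asyncio.ensure_future(stream.get())
                done, _ = await asyncio.wait(
                    {event, frozen}, return_when=asyncio.FIRST_COMPLETED)
                if frozen in done:
                    break                  # freeze turned on: stop watching
                handled.append(event.result())
        finally:
            frozen.cancel()
            if event is not None:
                event.cancel()
            cluster.close_stream(stream)   # close the stream immediately


async def main():
    cluster = FakeCluster()
    freeze = Toggle(on=True)               # start frozen: a ghost peer is visible
    handled = []
    task = asyncio.ensure_future(operator(cluster, freeze, handled))
    steps = [
        lambda: cluster.change("a", 1),    # change during the initial freeze
        lambda: freeze.set(False),         # ghost expired: re-list picks up "a"
        lambda: cluster.change("b", 2),    # regular watch event
        lambda: freeze.set(True),          # stream is closed
        lambda: cluster.change("a", 3),    # change during the second freeze
        lambda: freeze.set(False),         # re-list again
    ]
    for step in steps:
        step()
        await asyncio.sleep(0.01)          # let the operator task react
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return handled


print(asyncio.run(main()))
```

In this sketch, the changes made to `a` during either freeze are not lost: they show up in the next re-listing. The unchanged object `b` is listed again as well, which a real handler would typically treat as a no-op.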
This ensures that no objects remain unnoticed for a long time just because their changes happened during the freeze mode for spurious technical reasons (ghost operators' records in the peering). In the worst case, the reactions are delayed until the ghost operators' records expire (up to 60 seconds), but they happen normally after that. In most cases, however, the re-listing results in a no-op.
This also removes the log noise.
Types of Changes
New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)
Review
List of tasks the reviewer must do to review the PR
Tests