control-service: update the K8S job monitoring logic (#563)
Previously, the logic that monitors the K8S jobs ignored the ADDED event, which is received when a K8S job is created but before the pod execution starts. The reason for this is unclear, but it affected the notification logic in the following manner:

* Assume we have a previous execution of a data job, which completed with a user error.
* When the next execution is started, a K8S job is created first. This emits the ADDED event, and a metric for this job is immediately exposed by kube-state-metrics.
* Since we ignore the ADDED event, our termination status still reflects the previous execution (i.e. user error).
* We remain in this state until the actual execution starts, i.e. a pod is up and running, at which point we receive a MODIFIED event for the K8S job and update the termination status in response.
* However, during the period between the creation of the K8S job and its actual start, kube-state-metrics exposes information about the new execution, while our termination metrics reflect the old one.

Because we join against the kube-state-metrics data (in order to get the execution id), for a period of time we have an active alert for the new execution (even though it is still running) with the termination status of the old one. Luckily, this alert rarely fires because we have a 1 minute `for` time before an active alert becomes firing. However, it can produce false positives if the pod is delayed for too long, and it is also ugly and misleading when looking at the Prometheus graphs.

This commit aims to fix this by responding to the ADDED event by creating a SUBMITTED execution with a start time equal to the current time.

Testing done: unit and integration tests pass

Signed-off-by: Tsvetomir Palashki <[email protected]>
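The sequence described in the commit message can be illustrated with a minimal Python sketch. It is not the control-service code: the `JobMonitor` class, the `on_event` signature, the job and execution names, and the `USER_ERROR` and `RUNNING` status labels are all hypothetical. Only the ADDED and MODIFIED event types and the SUBMITTED status come from the message itself.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Execution:
    execution_id: str
    status: str
    start_time: Optional[datetime] = None


class JobMonitor:
    """Hypothetical tracker of the latest known execution per data job."""

    def __init__(self, handle_added: bool = True) -> None:
        # handle_added=False imitates the old behaviour (ADDED is ignored).
        self.handle_added = handle_added
        self.latest: Dict[str, Execution] = {}

    def on_event(self, event_type: str, job_name: str,
                 execution_id: str, status: Optional[str] = None) -> None:
        if event_type == "ADDED":
            if not self.handle_added:
                return  # old behaviour: the stale status stays in place
            # New behaviour: record a SUBMITTED execution starting now.
            self.latest[job_name] = Execution(
                execution_id, "SUBMITTED", datetime.now(timezone.utc))
        elif event_type == "MODIFIED":
            previous = self.latest.get(job_name)
            # Keep the recorded start time if this is the same execution.
            start = (previous.start_time
                     if previous and previous.execution_id == execution_id
                     else None)
            self.latest[job_name] = Execution(
                execution_id, status or "RUNNING", start)


def replay(handle_added: bool) -> Execution:
    """Replay: exec-1 ends with a user error, then exec-2 is created."""
    monitor = JobMonitor(handle_added)
    monitor.on_event("MODIFIED", "my-job", "exec-1", "USER_ERROR")
    monitor.on_event("ADDED", "my-job", "exec-2")
    return monitor.latest["my-job"]
```

With `replay(False)` the tracked state still belongs to `exec-1` and carries the user error, which is the window in which the join against kube-state-metrics would pair the new execution with the old status. With `replay(True)` the state already describes `exec-2` as SUBMITTED.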
Showing 5 changed files with 88 additions and 30 deletions.