control-service: add counter to track data job watching task executions #692
Currently, we lack monitoring for our data job watching task: the task
that watches the K8s namespace for data job changes and updates the
execution and termination statuses of data jobs, as well as the metrics
exposed by the control service.
We have seen cases where this task stopped running. Given how important
this task is, we need an early alert when that happens. This commit
introduces a new metric (a counter) that exposes the number of executions
of this task. Dashboards can use this counter to raise an alert when the
task has not executed for a given period of time.
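The idea of counting executions and alerting on a stalled counter can be sketched as follows. This is a minimal illustration in Python, assuming a monotonically increasing counter that a monitoring system scrapes periodically as (timestamp, value) samples. The names `ExecutionCounter`, `run_watch_iteration` and `is_stalled` are hypothetical and are not taken from the control service. In practice the stall check would more likely live in the monitoring system's alert rules than in application code.

```python
from typing import Callable, List, Tuple


class ExecutionCounter:
    """Monotonically increasing count of task executions (never decremented)."""

    def __init__(self) -> None:
        self._count = 0

    def increment(self) -> None:
        self._count += 1

    @property
    def value(self) -> int:
        return self._count


def run_watch_iteration(task: Callable[[], None], counter: ExecutionCounter) -> None:
    """Run one iteration of the (hypothetical) watching task and record it.

    The increment sits in `finally`, so an iteration that raises still counts:
    the counter measures whether the task is being invoked at all.
    """
    try:
        task()
    finally:
        counter.increment()


def is_stalled(samples: List[Tuple[float, int]], now: float, window_seconds: float) -> bool:
    """Return True if the scraped counter did not grow during the last window.

    `samples` are (timestamp, counter value) pairs, e.g. from periodic scrapes.
    Assumes the counter never resets; a service restart would need extra handling.
    """
    in_window = sorted((ts, v) for ts, v in samples if now - window_seconds <= ts <= now)
    if len(in_window) < 2:
        # With fewer than two samples there is no evidence of progress.
        return True
    # Stalled if the newest value is not greater than the oldest one in the window.
    return in_window[-1][1] <= in_window[0][1]


if __name__ == "__main__":
    counter = ExecutionCounter()
    samples: List[Tuple[float, int]] = []
    for t in range(0, 300, 60):  # one task run and one scrape per minute
        run_watch_iteration(lambda: None, counter)
        samples.append((float(t), counter.value))
    print(is_stalled(samples, now=240.0, window_seconds=180.0))  # False
```

Counting in `finally` is a deliberate choice in this sketch: the alert is meant to fire when the task stops being scheduled, not when an individual run fails. Failed runs could be tracked with a separate counter if needed.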
Testing done: added new unit tests; started the service manually and
observed the new metric increasing gradually.
Signed-off-by: Tsvetomir Palashki