control-service: add MeterRegistry counters for DataJobsSynchronizer #2844

Merged
merged 5 commits into from
Oct 30, 2023

Conversation

mrMoZ1
Contributor

@mrMoZ1 mrMoZ1 commented Oct 27, 2023

what: added telemetry counters for the DataJobsSynchronizer

why: The counters can be used to monitor if the synchronizeDataJobs method is executing as expected.

testing: added unit test
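
For readers unfamiliar with the pattern, here is a minimal sketch of counting invocations around a synchronization pass. It is not the code merged in this PR: the class `SynchronizerTelemetry` and the method `recordInvocation` are hypothetical, and plain `AtomicLong` fields stand in for the Micrometer `MeterRegistry` counters so that the example runs without dependencies. Whether the real synchronizer rethrows on failure is also an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the synchronizer's telemetry; not the PR's actual code. */
final class SynchronizerTelemetry {
    // AtomicLongs stand in for Micrometer counters (assumption, to stay dependency-free).
    private final AtomicLong invocations = new AtomicLong();
    private final AtomicLong failedInvocations = new AtomicLong();

    /** Runs one synchronization pass and records whether it completed or threw. */
    void recordInvocation(Runnable synchronizeDataJobs) {
        try {
            synchronizeDataJobs.run();
            // Count only passes that finished without an exception.
            invocations.incrementAndGet();
        } catch (RuntimeException e) {
            failedInvocations.incrementAndGet();
            throw e; // Assumption: the caller decides how to handle the failure.
        }
    }

    long invocations() {
        return invocations.get();
    }

    long failedInvocations() {
        return failedInvocations.get();
    }
}
```

Keeping the two counters separate means a dashboard can tell "the synchronizer is not running at all" apart from "the synchronizer runs but fails".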

@antoniivanov
Collaborator

Can you please explain how this is going to be used?

Can you copy and paste the new metrics generated from /data-jobs/debug/prometheus?

Have you thought of adding the data job name as a tag, so it's easier to recognize which job's sync is failing?

mrMoZ1 and others added 2 commits October 27, 2023 12:31
@mrMoZ1
Contributor Author

mrMoZ1 commented Oct 27, 2023

> Can you please explain how this is going to be used?
>
> Can you copy and paste the new metrics generated from /data-jobs/debug/prometheus?
>
> Have you thought of adding the data job name as a tag, so it's easier to recognize which job's sync is failing?

This will be used in a similar fashion to the DataJobExecutionCleanupMonitor. We already have per-job metrics: they are in the DeploymentMonitor class and, afaik, work with the new synchronizer through the DeploymentProgress class. The purpose of this metric is to give visibility into successful synchronizeDataJobs() invocations, rather than to monitor individual data jobs' deployment statuses, which is already covered. A possible query might look like:
increase(datajobs_failed_synchronizer_invocations_counter[1m])
increase(datajobs_synchronizer_invocations_counter[1m])
An alerting panel could then be configured to trigger an alarm when the number of invocations within a time period crosses a certain threshold, or when no invocations are present at all.
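
To make the alerting idea concrete, here is a rough sketch of the decision such a panel might encode. Everything in it is an assumption for illustration: the class `SynchronizerAlertSketch`, its method names, and the `maxFailures` threshold are hypothetical. The `increase` helper only approximates the Prometheus function of the same name: it treats a drop in the counter as a reset, but does none of the extrapolation Prometheus applies at the window edges.

```java
/** Hypothetical alert logic over two counter samples taken at the edges of a time window. */
final class SynchronizerAlertSketch {
    private SynchronizerAlertSketch() {
    }

    /** Growth of a monotonic counter between two samples; a drop is treated as a counter reset. */
    static long increase(long firstSample, long lastSample) {
        return lastSample >= firstSample ? lastSample - firstSample : lastSample;
    }

    /**
     * Alerts when no successful invocation was seen in the window,
     * or when failures in the window exceed the allowed maximum.
     */
    static boolean shouldAlert(long successIncrease, long failedIncrease, long maxFailures) {
        return successIncrease == 0 || failedIncrease > maxFailures;
    }
}
```

For example, with samples 10 and 13 for the success counter and no failures, `shouldAlert(increase(10, 13), 0, 0)` evaluates to `false`, while a window with no successful invocations alerts regardless of the failure count.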

Signed-off-by: mrMoZ1 <[email protected]>
@mrMoZ1 mrMoZ1 merged commit 21da2df into main Oct 30, 2023
@mrMoZ1 mrMoZ1 deleted the person/mzhivkov/synchronizer-telemetry branch October 30, 2023 08:13
4 participants