control-service: add namespace label to Prometheus alerts #442
Conversation
Add the Kubernetes namespace where the data job execution happened as a label to the alerts generated by the Prometheus rules. This label was initially dropped from the alerts for simplicity, but it turned out that there are some use cases where it could be useful.

Testing done: tested the modified rules on a live Prometheus server where the rules are already deployed and verified that the Kubernetes namespace appears as a label.

Signed-off-by: Tsvetomir Palashki <[email protected]>
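For context on the mechanism this relies on: Prometheus attaches every label present on the series returned by an alerting rule's expression to the alert it fires, so keeping namespace on the expression result is all that is needed to label the alerts. A minimal sketch of such a rule, with a hypothetical alert name and threshold rather than the project's actual rule file:

groups:
  - name: data-jobs                # hypothetical group name
    rules:
      - alert: DataJobFailed       # hypothetical alert name
        # Every label on the series this expression returns
        # (data_job, namespace, ...) is attached to the fired alert.
        expr: taurus_datajob_termination_status == 1   # hypothetical threshold
        for: 5m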
(Branch force-pushed from 7a44fc4 to 42f6d11.)
It's not very clear to me why we need the namespace here.
@@ -494,7 +494,7 @@ alerting:
      (avg by(data_job) (taurus_datajob_termination_status)
      * on(data_job) group_left(email_notified_on_success)
      avg by(data_job, email_notified_on_success) (taurus_datajob_info{email_notified_on_success!=""}) == bool 0)
-     * on(data_job) group_left(job_name)
+     * on(data_job) group_left(job_name, namespace)
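For readers unfamiliar with the operator: group_left performs a many-to-one vector match and copies the listed labels from the right-hand side onto the result, which is how namespace ends up on the alert. A self-contained illustration of the same pattern, using hypothetical metric names rather than the actual rule:

# Hypothetical series:
#   job_status{data_job="ingest-daily"}  0
#   job_info{data_job="ingest-daily", job_name="ingest-daily", namespace="team-a"}  1
#
# The match joins on data_job and copies job_name and namespace
# from job_info onto each job_status sample:
job_status
  * on(data_job) group_left(job_name, namespace)
job_info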
Grouping alerts by target email and source namespace, correct?
Is this PR covering a use case of multiple deployments on the same k8s cluster?
> Grouping alerts by target email and source namespace, correct?

This particular grouping is by job name and namespace.
> Is this PR covering a use case of multiple deployments on the same k8s cluster?

Single vs. multiple clusters should not matter as long as the deployments are in different namespaces. If the deployments are in the same namespace and each contains a job with the same name, it might be hard to distinguish between the two jobs.
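To make that concrete, a hypothetical pair of firing alerts for two same-named jobs in different namespaces would now differ only in the new label (ALERTS is the built-in series Prometheus exposes for active alerts; the alert name is made up):

ALERTS{alertname="DataJobFailed", data_job="ingest-daily", namespace="team-a"}
ALERTS{alertname="DataJobFailed", data_job="ingest-daily", namespace="team-b"}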
I don't think we support multiple deployments in the same namespace, as the builder job names, the cron job names (job deployments), k8s secrets, and other resources are namespace-specific (not deployment-specific).
But supporting multiple deployments on the same cluster is certainly important, as it's normal for multiple tenants to share the same cluster.
This particular requirement comes from one of the internal adopters of the VDK Control Service. Regardless, having the namespace as a label on each alert could be useful in the case of multiple deployments of the Control Service (in different namespaces). It is common for an entire Kubernetes cluster to be governed by a single Alertmanager, and in that case it is useful to be able to determine which Control Service instance an alert came from.
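As a sketch of that setup (hypothetical receiver and namespace names; standard Alertmanager matchers syntax), a shared Alertmanager could route on the new label to keep each Control Service instance's alerts separate:

# alertmanager.yml (sketch)
route:
  receiver: default
  routes:
    - matchers:
        - namespace = "team-a"     # hypothetical namespace
      receiver: team-a-oncall
    - matchers:
        - namespace = "team-b"
      receiver: team-b-oncall
receivers:
  - name: default
  - name: team-a-oncall
  - name: team-b-oncall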
That makes sense to me.