
Alert severity not correctly set on PagerDuty Alert #1211

Closed
dbonatto opened this issue Jan 25, 2018 · 13 comments

Comments

@dbonatto (Contributor)

I'm trying to migrate to the PagerDuty API v2. I already changed the configuration to use routing_key so that AM uses the API v2. What I can't figure out is how to pass on to PagerDuty the severity that is declared as a label on the alert.

curl -XPOST -k https://somehost/api/v1/alerts -d '[{
        "status": "firing",
        "labels": {
                "alertname": "TestAlertWarning-13127",
                "service": "my-service",
                "team": "C Team",
                "severity":"warning",
                "instance": "TestAlertWarning-13127.example.net"
        },
        "annotations": {
                "summary": "High latency is high!"
        },
        "generatorURL": "http://prometheus.int.example.net/<generating_expression>"
}]'

What is happening now is that AM is sending an "error" severity to PagerDuty; looking at the code, I could see that this is the default behavior when no severity is set.

Not sure if this is a code or a documentation issue; I would appreciate some help.

  • Alertmanager version:

Branch: HEAD
BuildDate: 20180112-10:32:46
BuildUser: root@d83981af1d3d
GoVersion: go1.9.2
Revision: fb713f6
Version: 0.13.0

  • Prometheus version:

Version 1.8.2
Revision 5211b96d4d1291c3dd1a569f711d3b301b635ecb
Branch HEAD
BuildUser root@1412e937e4ad
BuildDate 20171104-16:09:14
GoVersion go1.9.2

  • Alertmanager configuration file:
  - receiver: CTeam-pagerduty-alert
    match:
      team: C Team
...
- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
@stuartnelson3 (Contributor)

Severity is set in pagerduty_configs:

- name: CTeam-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    severity: warning

@dbonatto (Contributor, Author) commented Jan 25, 2018

Sorry, this was one of the tests I've done.

If you remove it, you get the behavior I described. It's not passing on the severity, and I can't find how to do it in the documentation.

@stuartnelson3 (Contributor)

Ah, I misunderstood, sorry about that. As it is now, the severity can only be defined statically in the receiver.

To pass on the severity in the label, the code needs to be slightly updated here:

Severity: n.conf.Severity,

n.conf.Severity needs to become tmpl(n.conf.Severity). Then you can use the normal templating to access the common severity label.

I would recommend against this, because if an alert group has mixed severity labels (one severe, one warning), that common label isn't available for templating. I'm not sure how you have your grouping configured, though.

In your routes, I would recommend defining a match on severity=warning and a separate one on severity=critical (or whatever the levels are), and pointing them at two different receivers. It's more verbose, but perhaps clearer to others reading the config who don't know how grouping and common labels work.

That having been said, if you want to add templating to the severity in the link above, that would be fine by me (although I would still recommend being explicit).

@stuartnelson3 reopened this Jan 26, 2018
@dbonatto (Contributor, Author) commented Jan 26, 2018

Thanks for reopening.

So, in my use of AM, we have around 10 teams and many more services. Each team decides on the severity of an alert based on the type of problem and the environment in which it is happening. Also, PagerDuty uses the severity to determine the urgency of an alert and, based on that, to decide how to contact on-call engineers.

So, in my opinion, it would be very useful to have a way to pass the severity configured in the alert directly to PagerDuty, even if that is not the default behavior. That looks like the alternative you suggested, and I'm very much in favor of it.

Creating routing rules for all the teams I have, and for all four valid severities in PagerDuty (Info, Warning, Error and Critical), will be a little bit messy. Still, that looks like the alternative I have for now.

Also, I think offering templates for other fields provided by the PagerDuty API v2, such as Group and Component, might be useful, so we can have a more generic configuration file with information set dynamically from what is in the alert.

dbonatto added a commit to dbonatto/alertmanager that referenced this issue Jan 26, 2018
Add template to severity field for PagerDuty API v2.
dbonatto added a commit to dbonatto/alertmanager that referenced this issue Jan 26, 2018
@dbonatto (Contributor, Author)

@stuartnelson3 I went ahead and created the PR with the solution you mentioned in case you agree with integrating it.

Also, the routing solution works, but it creates a giant config file. Every team will need something like this:

  - match:
      team: C Team
    routes:
    - receiver: CTeam-pagerduty-critical
      match:
        severity: critical
    - receiver: CTeam-pagerduty-warning
      match:
        severity: warning
    - receiver: CTeam-pagerduty-info
      match:
        severity: info
    - receiver: CTeam-pagerduty-error
      match:
        severity: error

In the end, more templates would make the configuration simpler.

@dbonatto (Contributor, Author) commented Jan 27, 2018

@stuartnelson3 Please let me know if this is what you are looking for: prometheus/docs#956

stuartnelson3 pushed a commit that referenced this issue Feb 9, 2018
* Allow templating of Component and Group in PagerDuty v2

Related to #1211

* Add missing PD-CEF field Component
@swestcott
I'm trying to use the templated severity field, but the PD-CEF severity field isn't being passed to PD.

Here's my config; am I doing something wrong? I'm running AM 0.15.0-rc.1.

receivers:
- name: pagerduty-poc
  pagerduty_configs:
  - send_resolved: true
    service_key: <secret>
    severity: '{{ .Labels.severity }}'

I'm using the same curl request as above.

@swestcott
Oops, I've corrected service_key to routing_key to use the PD v2 API. AM is now getting "unexpected status code 400" from PD. Hard-coding the severity to severity: 'warning' works, so I'm guessing my template config isn't quite right yet.

@mjuarez commented May 16, 2018

I too am running into issues using labels for severity, on AM 0.14.0 with the Events v2 API. I tried both {{ .Labels.severity }} and {{$labels.severity}}.

Output:

level=error ts=2018-05-16T18:40:52.740921204Z caller=notify.go:303 component=dispatcher msg="Error on notify" err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"
level=error ts=2018-05-16T18:40:53.005475343Z caller=dispatch.go:266 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="cancelling notify retry for \"pagerduty\" due to unrecoverable error: unexpected status code 400"

Relevant am config as follows:

- name: nexus
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{$labels.severity}}'

alert rule from prometheus:

alert: GPU Acceleration Disabled On Host
expr: gpu_acceleration < 1
for: 2m
labels:
  service: httppainters
  severity: warning
annotations:
  summary: GPU Acceleration Disabled On Host

@dbonatto (Contributor, Author)

@swestcott @mjuarez

What worked for me was ".CommonLabels.severity".

We then added this:

{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}

This lowercases the severity label, which the PagerDuty API seems to prefer. We also force an alert to be critical when severity is not set, which I think is good practice.

Full config for reference, where we also added class, component and group, which are fields supported by the PagerDuty API:

- name: team-pagerduty-alert
  pagerduty_configs:
  - send_resolved: true
    routing_key: <secret>
    url: https://events.pagerduty.com/v2/enqueue
    client: '{{ template "pagerduty.default.client" . }}'
    client_url: '{{ template "pagerduty.default.clientURL" . }}'
    description: '{{ template "pagerduty.default.description" .}}'
    details:
      firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
      num_firing: '{{ .Alerts.Firing | len }}'
      num_resolved: '{{ .Alerts.Resolved | len }}'
      resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
    severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower
      }}{{ else }}critical{{ end }}'
    class: '{{ .CommonLabels.class }}'
    component: '{{ .CommonLabels.component }}'
    group: {{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{
      end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if
      .CommonLabels.service }}.{{ .CommonLabels.service }}{{ end }}

For reference on PagerDuty fields: https://v2.developer.pagerduty.com/docs/send-an-event-events-api-v2
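The severity fallback expression above can be exercised in isolation with Go's text/template, which Alertmanager's templating is built on. A minimal sketch — the data struct and the toLower registration here are simplified stand-ins for illustration, not Alertmanager's actual types:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// data is a simplified stand-in for the template data Alertmanager
// exposes; only CommonLabels is modeled here.
type data struct {
	CommonLabels map[string]string
}

// render parses and executes a template against d, registering a
// toLower function as a stand-in for the one Alertmanager provides.
func render(tmplText string, d data) string {
	funcs := template.FuncMap{"toLower": strings.ToLower}
	t := template.Must(template.New("sev").Funcs(funcs).Parse(tmplText))
	var buf bytes.Buffer
	if err := t.Execute(&buf, d); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	sevTmpl := `{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}`

	// severity label present on all alerts in the group: lowercased value is used
	fmt.Println(render(sevTmpl, data{CommonLabels: map[string]string{"severity": "Warning"}}))
	// severity label absent (e.g. mixed severities in the group): falls back to critical
	fmt.Println(render(sevTmpl, data{CommonLabels: map[string]string{}}))
}
```

This also shows why the mixed-severity caveat matters: when the group's alerts disagree on severity, the label drops out of CommonLabels and the else branch fires.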

@swestcott
@dbonatto Thanks! {{ .CommonLabels.severity }} worked for me.

@kchaitu4 commented Oct 14, 2018

I have the same issue: I am getting "line 35: did not find expected key".

This is what I have in my config file

23      - name: 'PagerDuty'
24        pagerduty_configs:
25        - send_resolved: true
26          routing_key: <my_secrete_key>
27          url: https://events.pagerduty.com/v2/enqueue
28          client: '{{ template "pagerduty.default.client" . }}'
29          client_url: '{{ template "pagerduty.default.clientURL" . }}'
30          description: '{{ template "pagerduty.default.description" .}}'
31          details:
32            firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
33            num_firing: '{{ .Alerts.Firing | len }}'
34            num_resolved: '{{ .Alerts.Resolved | len }}'
35            resolved: '{{ template "pagerduty.default.instances" .Alerts.Resolved }}'
36         severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower
37           }}{{ else }}critical{{ end }}'
38         class: '{{ .CommonLabels.class }}'
39         component: '{{ .CommonLabels.component }}'
40        group: {{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{
41          end }}{{ if .CommonLabels.region }}.{{ .CommonLabels.region }}{{ end }}{{ if
42          .CommonLabels.severity }}.{{ .CommonLabels.severity }}{{ end }}

Did I set it all up wrong?

@morganwu277

@kchaitu4 It's because of a YAML indentation issue.
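Concretely: in the posted config, severity, class, and component are indented one space less than the other keys of the list item, and group one space less again, so the parser fails right after the details block. A sketch of the fix, keeping only the relevant keys (the elided lines stay as posted); note that all keys must align with send_resolved, and a value starting with {{ should be quoted so YAML doesn't read it as a flow mapping:

```yaml
- name: 'PagerDuty'
  pagerduty_configs:
  - send_resolved: true
    routing_key: <my_secrete_key>
    # ... url, client, client_url, description, details as before ...
    severity: '{{ if .CommonLabels.severity }}{{ .CommonLabels.severity | toLower }}{{ else }}critical{{ end }}'
    class: '{{ .CommonLabels.class }}'
    component: '{{ .CommonLabels.component }}'
    group: '{{ if .CommonLabels.environment }}.{{ .CommonLabels.environment }}{{ end }}'
```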
