Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Helm chart #3

Merged
merged 3 commits into from
May 5, 2020
Merged

Helm chart #3

merged 3 commits into from
May 5, 2020

Conversation

patrickmslatteryvt
Copy link

Hi, this is an initial Helm chart for calert.
It implements a similar deployment to the one you have in the k8s folder but without Redis.

@mr-karan
Copy link
Owner

Thanks @patrickmslatteryvt I'll review this ASAP :)

@MPV
Copy link
Contributor

MPV commented Apr 20, 2020

This looks promising! What's next to help this moving forward? 😊

@mr-karan
Copy link
Owner

Can this be tested out locally? (without Tiller, ofcourse). If someone can help validating the chart and reviewing it, would be great! @MPV

@MPV
Copy link
Contributor

MPV commented Apr 21, 2020

I'm trying this out here now.

So far I've found 2 issues:

  1. Hardcoded name of configmap in deployment.
  2. Example prometheus-operator / alertmanager config creates errors.

1. Hardcoded configmap:

As I'm deploying this chart as a subchart as part of my "cluster-monitoring" chart, when the configmap is named after the release, I get cluster-monitoring-calert-config instead of just calert-config:

✅ The solution to this is just changing the referenced name of the configmap in the deployment:
https://github.com/patrickmslatteryvt/calert/blob/master/calert/templates/deployment.yaml#L58

-            name: calert-config
+            name: {{ template "calert.fullname" . }}-config

...as this is how the configmap is being named:
https://github.com/patrickmslatteryvt/calert/blob/master/calert/templates/configmap.yaml#L4

2. Broken alertmanager config

I get this error:
level=error ts=2020-04-21T09:15:12.828Z caller=coordinator.go:124 component=configuration msg="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="undefined receiver \"null\" used in route"

And here is the rendered alertmanager config, after using the supplied example:
$ kubectl -n cluster-monitoring get secret alertmanager-cluster-monitoring-prometh-alertmanager -o yaml | yq r - "data[alertmanager.yaml]" | base64 --decode

global:
  resolve_timeout: 5m
receivers:
- name: google-chat
  webhook_configs:
  - url: http://calert.cluster-monitoring.svc.cluster.local:6000/create?room_name=my-room-is-here
route:
  group_by:
  - job
  group_interval: 5m
  group_wait: 30s
  receiver: "null" # <--- could this be giving the issue?
  repeat_interval: 12h
  routes:
  - group_by:
    - alertname
    match:
      severity: critical
    receiver: google-chat

This is probably due to the fact that the prometheus-operator's default config for alertmanager includes a route with the "null" recevier (and also such a receiver):
https://github.com/helm/charts/blob/1e6a30c/stable/prometheus-operator/values.yaml#L131

❓ I'm not yet sure how to solve this. Could be that we should just add that "null" receiver to the receivers block, or specify google-chat as the default receiver for the route. I'll try it out.

@MPV
Copy link
Contributor

MPV commented Apr 21, 2020

2. Fixed example Alertmanager config

Here's how I got it working by adapting the example alertmanager config:
https://github.com/patrickmslatteryvt/calert/blob/master/calert/README.md#alertmanager-configuration

alertmanager:
  config:
    route:
+     receiver: "null"
      routes:
      - match:
          severity: critical
        receiver: google-chat
        group_by: [alertname]
    receivers:
+   - name: "null"
    - name: 'google-chat'
      webhook_configs:
      - url: "http://calert.clu-inf-all.svc.cluster.local:6000/create?room_name=<room>"

The second change adds back the receiver that was missing from the example, while the first change just makes it explicitly used (as it would still be set like this while merging with the prometheus-operator default values).

@MPV
Copy link
Contributor

MPV commented Apr 21, 2020

I can confirm that I got it working with my changes above. 🎉

Skärmavbild 2020-04-21 kl  12 13 56

@MPV
Copy link
Contributor

MPV commented Apr 21, 2020

I've opened PRs to @patrickmslatteryvt's fork with the fixes:
patrickmslatteryvt#1
patrickmslatteryvt#2

@MPV
Copy link
Contributor

MPV commented May 5, 2020

@mr-karan ...but my changes above aren't any blockers for using this, so I would consider this PR tested and okay to merge, unless you have any other thoughts? If @patrickmslatteryvt isn't available at the moment, I could re-contribute those changes into this repo after merging this PR.

@mr-karan mr-karan marked this pull request as ready for review May 5, 2020 13:12
@mr-karan mr-karan merged commit 19e30bd into mr-karan:master May 5, 2020
@mr-karan
Copy link
Owner

mr-karan commented May 5, 2020

Thanks for your contribution @MPV. Much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants