Commit f37114d

stubbing new files for later
1 parent 7bfaa9a commit f37114d

3 files changed: +125 -1 lines changed

docs/projects/work/interview_notes.md

+1-1
@@ -18,7 +18,7 @@ terraform

 # Question bank

-How would you setup terraform in a collaborative/team environment?
+How would you setup terraform in a collaborative/team environment? What are some considerations?

 See if hes familiar with monitoring/observability tools

docs/work/discovery_for_new_irm.md

+17
@@ -0,0 +1,17 @@
+---
+tags:
+- WIP
+date: "2025-03-11"
+title: discovery_for_new_irm
+---
+
+> [!faq]- Disclaimer:
+> This isn't a guide; this post just outlines my approach to achieving a solution.
+
+# Background
+
+# Media
+
+# References
+
+-

docs/work/managing_irm_as_code.md

+107
@@ -0,0 +1,107 @@
+---
+tags:
+- WIP
+- grafana oncall
+- irm
+date: 2025-01-22
+title: Managing IRM as Code
+---
+
+> [!faq]- Disclaimer:
+> This isn't a guide; this post just outlines my approach to achieving a solution.
+
+# Background
+
+We use Grafana OnCall to manage our IRE rotations. For DR and for convenience, we want to manage and configure what we can in code, which lets us formally review access changes before they are rolled out.
+
+# Setting up your team in Grafana OnCall
+
+We use Terraform with Grafana's provider to manage these resources. Currently, to set up a new team with an alerting/escalation chain, we add the following files:
+
+```
+access/grafana/oncall
+- team_teamA.tf
+- integrations_teamA.tf
+- escalation_teamA.tf
+```
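
The files above assume the Grafana provider is already configured somewhere in the root module. The post does not show that configuration, so the following is only a minimal sketch of what it might look like. The variable names (`grafana_url`, `grafana_auth`, `oncall_url`, `oncall_access_token`) are hypothetical, and the split between an aliased `main` provider and a default provider is an assumption inferred from `provider = grafana.main` appearing only on the `grafana_team` resource.

```hcl
# providers.tf (hypothetical file name; not part of the original commit)

terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

# Hypothetical input variables; supply them however your pipeline handles secrets.
variable "grafana_url" {
  type = string
}

variable "grafana_auth" {
  type      = string
  sensitive = true
}

variable "oncall_url" {
  type = string
}

variable "oncall_access_token" {
  type      = string
  sensitive = true
}

# Aliased provider, referenced as grafana.main by the grafana_team resource.
provider "grafana" {
  alias = "main"
  url   = var.grafana_url
  auth  = var.grafana_auth
}

# Default provider, assumed here to serve the OnCall resources and data sources.
provider "grafana" {
  url                 = var.grafana_url
  auth                = var.grafana_auth
  oncall_url          = var.oncall_url
  oncall_access_token = var.oncall_access_token
}
```

If your setup uses a single provider block for everything, the alias can be dropped and `provider = grafana.main` removed from the team resource.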
+
+First, create an empty schedule in Grafana OnCall. We leave schedule management manual to allow more flexibility in editing, but you can move the schedule into code if you prefer ([overrides are still supported](https://registry.terraform.io/providers/grafana/grafana/latest/docs/resources/oncall_schedule#enable_web_overrides-1)). The schedule must be created first because some of the resources we are about to create reference it.
+
+Next, populate the above files accordingly:
+
+```hcl
+# team_teamA.tf
+
+# Create the team in Grafana
+resource "grafana_team" "teamA" {
+  provider = grafana.main
+  name     = "teamA"
+  members = [
+
+    ...
+  ]
+}
+
+# OnCall needs a separate data source to reference the team
+data "grafana_oncall_team" "teamA" {
+  name       = "teamA"
+  depends_on = [grafana_team.teamA]
+}
+```
+
+```hcl
+# integrations_teamA.tf
+
+module "teamA" {
+  source              = "./modules/integration"
+  name                = "teamA"
+  escalation_chain_id = grafana_oncall_escalation_chain.teamA.id
+  oncall_team_id      = data.grafana_oncall_team.teamA.id
+
+  # Add routes based on severity. Ensure the OnCall user has been invited to these channels, otherwise messages won't go through.
+  routes = [
+    {
+      routing_regex      = "{{ \"alerts-sev-1\" in payload.commonLabels.slack_channel }}"
+      slack_channel_name = "alerts-sev-1"
+    },
+    ...
+  ]
+}
+
+```
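
The `./modules/integration` module is referenced above but its contents are not part of this commit. As a rough sketch of what such a module could contain, the following wires the inputs used in the call (`name`, `escalation_chain_id`, `oncall_team_id`, `routes`) to the provider's `grafana_oncall_integration` and `grafana_oncall_route` resources. The integration `type`, the `jinja2` routing type, and the module layout are assumptions, not the author's actual implementation.

```hcl
# modules/integration/main.tf (hypothetical sketch)

terraform {
  required_providers {
    grafana = {
      source = "grafana/grafana"
    }
  }
}

variable "name" {
  type = string
}

variable "escalation_chain_id" {
  type = string
}

variable "oncall_team_id" {
  type = string
}

variable "routes" {
  type = list(object({
    routing_regex      = string
    slack_channel_name = string
  }))
  default = []
}

# Look up each distinct Slack channel named in the routes.
data "grafana_oncall_slack_channel" "route" {
  for_each = toset([for r in var.routes : r.slack_channel_name])
  name     = each.key
}

resource "grafana_oncall_integration" "this" {
  name    = var.name
  type    = "alertmanager" # assumption: alerts arrive in Alertmanager format
  team_id = var.oncall_team_id

  # Alerts that match no route fall through to the team's escalation chain.
  default_route {
    escalation_chain_id = var.escalation_chain_id
  }
}

resource "grafana_oncall_route" "this" {
  count = length(var.routes)

  integration_id      = grafana_oncall_integration.this.id
  escalation_chain_id = var.escalation_chain_id
  routing_type        = "jinja2" # assumption: the routes above are Jinja2 templates
  routing_regex       = var.routes[count.index].routing_regex
  position            = count.index

  slack {
    channel_id = data.grafana_oncall_slack_channel.route[var.routes[count.index].slack_channel_name].slack_id
    enabled    = true
  }
}
```

Keeping the Slack channel lookup inside the module means callers only pass channel names, at the cost of a failed plan when a channel is not visible to OnCall.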
+
+```hcl
+# escalation_teamA.tf
+
+resource "grafana_oncall_escalation_chain" "teamA" {
+  name    = "teamA"
+  team_id = data.grafana_oncall_team.teamA.id
+}
+
+# Schedule is not in GitHub
+data "grafana_oncall_schedule" "schedule_teamA" {
+  name = "teamA"
+}
+
+# Notify users from the on-call schedule
+resource "grafana_oncall_escalation" "teamA_step_0" {
+  escalation_chain_id          = grafana_oncall_escalation_chain.teamA.id
+  type                         = "notify_on_call_from_schedule"
+  notify_on_call_from_schedule = data.grafana_oncall_schedule.schedule_teamA.id
+  position                     = 0
+}
+```
+
+# Creating alerts
+
+- Ideally we continue to manage alerts as code
+
+# Alert Routing
+
+- Primarily done via Slack for general alerts
+- Phone call routing for critical alerts \[TODO\]
+
+
+# References
+
+-
