@@ -37,3 +37,47 @@ sbcli cluster get-secret <CLUSTER_ID>
**Credentials** <br />
Username: **admin** <br />
Password: **<CLUSTER_SECRET>**
+
+ ## Grafana Dashboards
+
43
+ All dashboards are stored in per-cluster folders. Each cluster contains the following dashboards entries:
44
+
+ - Cluster
+ - Storage node
+ - Device
+ - Logical Volume
+ - Storage Pool
+
+ Dashboard widgets are designed to be self-explanatory.
+
+ By default, each of these dashboards contains data for all objects (e.g. all devices) in a cluster. It is, however,
+ possible to filter them by particular objects (e.g. devices, storage nodes, or logical volumes) and to change the
+ timescale and window.
+
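Filters and the time range can also be carried in the dashboard URL. The sketch below assumes Grafana's standard URL query parameters (`var-<name>` for template variables, `from` and `to` for the time range). The host, the dashboard UID and slug, and the variable names `cluster` and `device` are hypothetical and must be replaced with the values used by the deployed dashboards.

```python
from urllib.parse import urlencode


def dashboard_url(base_url, uid, slug, variables=None, time_from="now-6h", time_to="now"):
    """Build a Grafana dashboard URL with variable filters and a time range."""
    # Each template variable is passed as var-<name>=<value>; a list value
    # repeats the parameter, which selects several objects at once.
    params = []
    for name, value in (variables or {}).items():
        values = value if isinstance(value, (list, tuple)) else [value]
        params.extend((f"var-{name}", v) for v in values)
    params.append(("from", time_from))
    params.append(("to", time_to))
    return f"{base_url.rstrip('/')}/d/{uid}/{slug}?{urlencode(params)}"


# Hypothetical host, UID, slug, and variable names.
url = dashboard_url(
    "https://grafana.example.com",
    "abc123",
    "device",
    variables={"cluster": "<CLUSTER_ID>", "device": ["dev-1", "dev-2"]},
    time_from="now-24h",
)
print(url)
```

Relative time expressions such as `now-24h` keep the link valid over time, whereas absolute timestamps pin it to a fixed window.
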
+ Dashboards include physical and logical capacity utilization dynamics, IOPS, I/O throughput, and latency dynamics (each
+ reported separately for read, write, and unmap). Although all event log data is currently stored in Prometheus, it is
+ not used at the time of writing.
+
+ ## Alerting
+
+ By default, Grafana is configured to send alerts to Slack channels. Grafana can also send alerts as email
+ notifications, but this requires an authorized SMTP server to send the messages.
+
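Before switching Grafana to email alerts, it can help to confirm that the SMTP server accepts authenticated mail. The following is a minimal sketch using Python's standard `smtplib`. The host, port, credentials, and addresses are placeholders, and STARTTLS on port 587 is an assumption about the server, not a requirement of the management stack.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Create a small plain-text message for checking the SMTP relay."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Grafana alerting: SMTP test"
    msg.set_content("If this message arrives, the SMTP server accepts authenticated mail.")
    return msg


def send_test_message(host, port, user, password, msg, timeout=10):
    """Send msg through an authenticated SMTP server (assumes STARTTLS)."""
    with smtplib.SMTP(host, port, timeout=timeout) as server:
        server.starttls()              # upgrade the connection to TLS
        server.login(user, password)   # raises if the credentials are rejected
        server.send_message(msg)


# Placeholder addresses; replace before use.
message = build_test_message("grafana@example.com", "ops@example.com")
# Uncomment to send through a real server (placeholder host and credentials):
# send_test_message("smtp.example.com", 587, "grafana@example.com", "<PASSWORD>", message)
```
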
66
+ An SMTP server is currently not part of the management stack and must be deployed separately. Alerts can be triggered
67
+ based on on-time or interval-based thresholds of statistical data collected (IO statistics, capacity information) or
68
+ based on events from the cluster event log.
69
+
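The difference between a one-time and an interval-based threshold can be illustrated with a small sketch. This is not Grafana's evaluation logic; the function name, the sample format, and the numbers are hypothetical. A one-time threshold fires as soon as a single sample exceeds the limit, whereas an interval-based threshold fires only once the value has stayed above the limit for a given duration.

```python
def threshold_breached(samples, limit, hold_seconds=0):
    """Return True if the samples breach the limit.

    samples: list of (timestamp_seconds, value) pairs in ascending time order.
    hold_seconds=0 models a one-time threshold; a positive value requires the
    limit to be exceeded continuously for at least that long.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > limit:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= hold_seconds:
                return True
        else:
            breach_start = None  # the streak is broken, start over
    return False


# Hypothetical capacity utilization samples (percent), one per minute.
utilization = [(0, 85), (60, 92), (120, 88), (180, 93), (240, 95), (300, 96)]

one_time = threshold_breached(utilization, limit=90)
sustained = threshold_breached(utilization, limit=90, hold_seconds=120)
print(one_time, sustained)
```

With these samples the one-time check fires on the spike at 60 seconds, while the interval-based check ignores that spike and fires only after the value has stayed above the limit from 180 to 300 seconds.
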
+ ### Pre-Defined Alerts
+
+ The following pre-defined alerts are available:
+
+ | Alert              | Trigger                                                       |
+ | ------------------ | ------------------------------------------------------------- |
+ | device-unavailable | Device status changed from online to unavailable              |
+ | device-read-only   | Device status changed from online to read-only                |
+ | sn-offline         | Storage node status changed from online to offline            |
+ | crit-cap-reached   | Critical absolute capacity utilization in cluster was reached |
+ | crit-prov-reached  | Critical absolute provisioned capacity utilization in cluster was reached |
+
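The status-based rows of the table can be read as a lookup from a status transition to an alert name. A minimal sketch, with a hypothetical function name and object-type labels (`device`, `storage-node`) chosen for illustration; it does not reflect how the alerts are implemented in Grafana:

```python
# (object type, old status, new status) -> alert name, taken from the table above.
STATUS_ALERTS = {
    ("device", "online", "unavailable"): "device-unavailable",
    ("device", "online", "read-only"): "device-read-only",
    ("storage-node", "online", "offline"): "sn-offline",
}


def alert_for_transition(object_type, old_status, new_status):
    """Return the pre-defined alert name for a status change, or None."""
    return STATUS_ALERTS.get((object_type, old_status, new_status))


print(alert_for_transition("device", "online", "read-only"))
print(alert_for_transition("storage-node", "offline", "online"))
```

A transition that is not listed, such as a storage node coming back online, maps to no alert in this sketch.
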
+ The Slack webhook for alerting can be configured during cluster creation or modified later.
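
To check that a Slack webhook URL works before configuring it, a test message can be posted to it directly. The sketch below assumes a standard Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL is a placeholder and the message text is made up.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Prepare a POST request carrying a Slack incoming-webhook message."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder webhook URL; replace with the one configured for the cluster.
request = build_slack_request(
    "https://hooks.slack.com/services/T000/B000/XXXX",
    "Test alert: sn-offline",
)
# Uncomment to actually send the message:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```

Building the request separately from sending it makes it possible to inspect the payload without contacting Slack.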