This repository has been archived by the owner on Mar 27, 2024. It is now read-only.

feat(consul): collect prometheus key metrics (#1028)
ilyam8 authored Dec 19, 2022
1 parent 0b0ba23 commit 00fc083
Showing 20 changed files with 4,989 additions and 668 deletions.
57 changes: 48 additions & 9 deletions modules/consul/README.md
@@ -10,24 +10,61 @@ learn_rel_path: "References/Collectors references/Webapps"

# Consul monitoring with Netdata

-[`Consul`](https://www.consul.io/) is a service networking solution to connect and secure services across any runtime
+[Consul](https://www.consul.io/) is a service networking solution to connect and secure services across any runtime
platform and public or private cloud.

-This module monitors `Consul` health checks.
+This module collects the [Key Metrics](https://developer.hashicorp.com/consul/docs/agent/telemetry#key-metrics) of the
+Consul Agent.
+
+## Requirements
+
+- Consul
+with [enabled](https://developer.hashicorp.com/consul/docs/agent/config/config-files#telemetry-prometheus_retention_time)
+Prometheus telemetry.
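
As an illustration of the requirement added above, here is a minimal Go sketch of how a client might request an agent's metrics in Prometheus format once telemetry is enabled. It assumes Consul's HTTP API endpoint `/v1/agent/metrics` with `format=prometheus` and the `X-Consul-Token` header for ACL tokens; the function name `newMetricsRequest` is hypothetical, and this is not presented as the collector's actual implementation.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newMetricsRequest builds a GET request for a Consul agent's metrics in
// Prometheus text format. baseURL and aclToken correspond to the job options
// `url` and `acl_token`; the function itself is a hypothetical helper.
func newMetricsRequest(baseURL, aclToken string) (*http.Request, error) {
	u, err := url.Parse(baseURL)
	if err != nil {
		return nil, fmt.Errorf("parse base URL: %w", err)
	}
	u.Path = "/v1/agent/metrics"
	u.RawQuery = url.Values{"format": {"prometheus"}}.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	// Send the ACL token only when one is configured.
	if aclToken != "" {
		req.Header.Set("X-Consul-Token", aclToken)
	}
	return req, nil
}

func main() {
	req, err := newMetricsRequest("http://127.0.0.1:8500", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Passing the request to `http.DefaultClient.Do` against a running agent should return the Prometheus exposition text, provided `prometheus_retention_time` is set to a non-zero duration in the agent's telemetry configuration.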

## Metrics

+Depending on
+the [mode](https://developer.hashicorp.com/consul/docs/install/glossary#agent), the collector collects a different
+number of metrics.
+
All metrics have "consul." prefix.

Labels per scope:

-- service check: node, service.
-- unbound check: node.
-
-| Metric | Scope | Dimensions | Units |
-|-----------------------------|:-------------:|:---------------------------------------:|:------:|
-| service_health_check_status | service check | passing, maintenance, warning, critical | status |
-| unbound_health_check_status | unbound check | passing, maintenance, warning, critical | status |
+- node check: node, check_name.
+- service check: node, check_name, service_name.
+
+| Metric | Scope | Dimensions | Units | Server | Client |
+|-----------------------------------|:-------------:|:-----------------------------------------:|:-------------:|:------:|:------:|
+| node_health_check_status | node check | passing, maintenance, warning, critical | status | yes | yes |
+| service_health_check_status | service check | passing, maintenance, warning, critical | status | yes | yes |
+| client_rpc_requests_rate | global | rpc | requests/s | yes | yes |
+| client_rpc_requests_exceeded_rate | global | exceeded | requests/s | yes | yes |
+| client_rpc_requests_failed_rate | global | failed | requests/s | yes | yes |
+| memory_allocated | global | allocated | bytes | yes | yes |
+| memory_sys | global | sys | bytes | yes | yes |
+| gc_pause_time | global | gc_pause | seconds | yes | yes |
+| kvs_apply_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
+| kvs_apply_operations_rate | global | kvs_apply | ops/s | yes | no |
+| txn_apply_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
+| txn_apply_operations_rate | global | txn_apply | ops/s | yes | no |
+| raft_commit_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
+| raft_commits_rate | global | commits | commits/s | yes | no |
+| autopilot_health_status | global | healthy, unhealthy | status | yes | no |
+| autopilot_failure_tolerance | global | failure_tolerance | servers | yes | no |
+| raft_leader_last_contact_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
+| raft_leader_elections_rate | global | leader | elections/s | yes | no |
+| raft_leadership_transitions_rate | global | leadership | transitions/s | yes | no |
+| server_leadership_status | global | leader, not_leader | status | yes | no |
+| raft_thread_main_saturation_perc | global | saturation | percentage | yes | no |
+| raft_thread_fsm_saturation_perc | global | saturation | percentage | yes | no |
+| raft_fsm_last_restore_duration | global | last_restore_duration | ms | yes | no |
+| raft_leader_oldest_log_age | global | oldest_log_age | seconds | yes | no |
+| raft_rpc_install_snapshot_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
+| raft_boltdb_freelist_bytes | global | freelist | bytes | yes | no |
+| raft_boltdb_logs_per_batch_rate | global | written | logs/s | yes | no |
+| raft_boltdb_store_logs_time | global | quantile_0.5, quantile_0.9, quantile_0.99 | ms | yes | no |
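
To make the Server and Client columns of the new table concrete, the following Go sketch selects the metrics collected for a given agent mode and applies the "consul." prefix. The entries are a small subset copied from the table; the type `availability`, the map `metricModes`, and the function `chartIDs` are hypothetical names chosen for illustration and do not come from the collector's source.

```go
package main

import (
	"fmt"
	"sort"
)

// availability records whether a metric is collected from server agents,
// client agents, or both, as listed in the README table.
type availability struct{ server, client bool }

// metricModes holds a subset of the table; names and flags are copied from it.
var metricModes = map[string]availability{
	"node_health_check_status": {server: true, client: true},
	"client_rpc_requests_rate": {server: true, client: true},
	"memory_allocated":         {server: true, client: true},
	"kvs_apply_time":           {server: true, client: false},
	"raft_commit_time":         {server: true, client: false},
	"autopilot_health_status":  {server: true, client: false},
}

// chartIDs returns the sorted, "consul."-prefixed metric names collected in
// the given mode ("server" or "client"). Any other mode yields no metrics.
func chartIDs(mode string) []string {
	var ids []string
	for name, a := range metricModes {
		if (mode == "server" && a.server) || (mode == "client" && a.client) {
			ids = append(ids, "consul."+name)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	for _, mode := range []string{"server", "client"} {
		fmt.Println(mode, chartIDs(mode))
	}
}
```

With this subset, a client agent yields only the health check, RPC, and memory metrics, while a server agent additionally yields the KV store, Raft, and Autopilot metrics.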

## Configuration

@@ -45,9 +82,11 @@ Here is an example for 2 servers:
 jobs:
   - name: local
     url: http://127.0.0.1:8500
+    acl_token: "ec15675e-2999-d789-832e-8c4794daa8d7"

   - name: remote
     url: http://203.0.113.10:8500
+    acl_token: "ada7f751-f654-8872-7f93-498e799158b6"
```
For all available options please see