This is one of the features in GARM that I really love having. For one thing, it's community-contributed, and for another, it really adds value to the project. It allows us to create some pretty nice visualizations of what is happening with GARM.

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_health` | Gauge | `controller_id=<controller id>`<br>`callback_url=<callback url>`<br>`controller_webhook_url=<controller webhook url>`<br>`metadata_url=<metadata url>`<br>`webhook_url=<webhook url>`<br>`name=<hostname>` | This is a gauge that is set to 1 if GARM is healthy and 0 if it is not. This is useful for alerting. |
| `garm_webhooks_received` | Counter | `valid=<valid request>`<br>`reason=<reason for invalid requests>` | This is a counter that increments every time GARM receives a webhook from GitHub. |

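
To make the alerting use case concrete, here is a minimal Python sketch that parses the plain-text output of the metrics endpoint and checks `garm_health`. The sample scrape output, its label values and the help text are made up for illustration; only the metric and label names come from the table above. The parser is deliberately simplified and is not a full implementation of the Prometheus exposition format.

```python
import re

# Matches lines like: name{label="value",...} 1   or   name 1
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric name, labels dict, float value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def is_healthy(samples):
    """True only if garm_health is present and every series equals 1."""
    values = [v for name, _, v in samples if name == "garm_health"]
    return bool(values) and all(v == 1.0 for v in values)


# Hypothetical scrape output, shortened for illustration.
EXAMPLE = '''
# HELP garm_health Health of the controller (placeholder help text)
# TYPE garm_health gauge
garm_health{controller_id="abc",name="garm-host"} 1
garm_webhooks_received{reason="",valid="true"} 12
garm_webhooks_received{reason="signature mismatch",valid="false"} 3
'''

samples = parse_samples(EXAMPLE)
healthy = is_healthy(samples)
# Count webhooks that GARM rejected as invalid.
invalid_webhooks = sum(
    v for name, labels, v in samples
    if name == "garm_webhooks_received" and labels.get("valid") == "false"
)
```

In a real deployment you would most likely express the same check as a Prometheus alerting rule rather than in a script; the sketch only shows what the two metrics look like once parsed.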

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_enterprise_info` | Gauge | `id=<enterprise id>`<br>`name=<enterprise name>` | This is a gauge that is set to 1 and exposes enterprise information. |
| `garm_enterprise_pool_manager_status` | Gauge | `id=<enterprise id>`<br>`name=<enterprise name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the enterprise pool manager is running and to 0 if it is not. |


| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_organization_info` | Gauge | `id=<organization id>`<br>`name=<organization name>` | This is a gauge that is set to 1 and exposes organization information. |
| `garm_organization_pool_manager_status` | Gauge | `id=<organization id>`<br>`name=<organization name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the organization pool manager is running and to 0 if it is not. |


| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_repository_info` | Gauge | `id=<repository id>`<br>`name=<repository name>` | This is a gauge that is set to 1 and exposes repository information. |
| `garm_repository_pool_manager_status` | Gauge | `id=<repository id>`<br>`name=<repository name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the repository pool manager is running and to 0 if it is not. |


| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_provider_info` | Gauge | `description=<provider description>`<br>`name=<provider name>`<br>`type=<internal\|external>` | This is a gauge that is set to 1 and exposes provider information. |


| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_pool_info` | Gauge | `flavor=<flavor>`<br>`id=<pool id>`<br>`image=<image name>`<br>`os_arch=<defined OS arch>`<br>`os_type=<defined OS name>`<br>`pool_owner=<owner name>`<br>`pool_type=<repository\|organization\|enterprise>`<br>`prefix=<prefix>`<br>`provider=<provider name>`<br>`tags=<concatenated list of pool tags>` | This is a gauge that is set to 1 and exposes pool information. |
| `garm_pool_status` | Gauge | `enabled=<true\|false>`<br>`id=<pool id>` | This is a gauge that is set to 1 if the pool is enabled and to 0 if it is not. |
| `garm_pool_bootstrap_timeout` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's bootstrap timeout. |
| `garm_pool_max_runners` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's maximum number of runners. |
| `garm_pool_min_idle_runners` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's minimum number of idle runners. |


| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_runner_status` | Gauge | `name=<runner name>`<br>`pool_owner=<owner name>`<br>`pool_type=<repository\|organization\|enterprise>`<br>`provider=<provider name>`<br>`runner_status=<running\|stopped\|error\|pending_delete\|deleting\|pending_create\|creating\|unknown>`<br>`status=<idle\|pending\|terminated\|installing\|failed\|active>` | This is a gauge that gives us details about the runners GARM spawns. |
| `garm_runner_operations_total` | Counter | `provider=<provider name>`<br>`operation=<CreateInstance\|DeleteInstance\|GetInstance\|ListInstances\|RemoveAllInstances\|Start\|Stop>` | This is a counter that increments every time a runner operation is performed. |
| `garm_runner_errors_total` | Counter | `provider=<provider name>`<br>`operation=<CreateInstance\|DeleteInstance\|GetInstance\|ListInstances\|RemoveAllInstances\|Start\|Stop>` | This is a counter that increments every time a runner operation fails. |

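
The two runner counters are most useful when read together. The following Python sketch computes an error ratio per provider and operation from counter values you have already collected. It assumes that a failed operation is also counted in `garm_runner_operations_total`, which the table does not state explicitly. The provider name `lxd` and all numbers are placeholders.

```python
def error_ratios(operations, errors):
    """Map each (provider, operation) pair to errors / operations.

    Both arguments are dicts keyed by (provider, operation) that hold the
    current counter values. Pairs with zero operations are skipped.
    """
    ratios = {}
    for key, total in operations.items():
        if total <= 0:
            continue
        ratios[key] = errors.get(key, 0.0) / total
    return ratios


# Hypothetical counter values; "lxd" is a placeholder provider name.
operations = {("lxd", "CreateInstance"): 40.0, ("lxd", "DeleteInstance"): 38.0}
errors = {("lxd", "CreateInstance"): 4.0}

ratios = error_ratios(operations, errors)
```

Counters reset when the process restarts, so for dashboards it is usually better to compare rates over a time window than raw totals. The sketch only illustrates how the two series relate.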

| Metric name | Type | Labels | Description |
| --- | --- | --- | --- |
| `garm_github_operations_total` | Counter | `operation=<ListRunners\|CreateRegistrationToken\|...>`<br>`scope=<Organization\|Repository\|Enterprise>` | This is a counter that increments every time a GitHub operation is performed. |
| `garm_github_errors_total` | Counter | `operation=<ListRunners\|CreateRegistrationToken\|...>`<br>`scope=<Organization\|Repository\|Enterprise>` | This is a counter that increments every time a GitHub operation fails. |

Metrics are disabled by default. To enable them, add the following to your config file:
[metrics]
# Toggle to disable authentication (not recommended) on the metrics endpoint.
# If you do disable authentication, I encourage you to put a reverse proxy in front
# of garm and limit which systems can access that particular endpoint. Ideally, you
# would enable some kind of authentication using the reverse proxy, if the built-in auth
# is not sufficient for your needs.
#
# Default: false
disable_auth = true
# Toggle metrics. If set to false, the API endpoint for metrics collection will
# be disabled.
#
# Default: false
enable = true
# period is the interval at which the /metrics endpoint updates internal metrics about
# controller-specific objects (e.g. runners, pools, etc.)
#
# Default: "60s"
period = "30s"
You can disable authentication if you wish, but authentication is not terribly difficult to set up, so I generally advise against disabling it.
The following section assumes that your GARM instance is running at garm.example.com and has TLS enabled.
First, generate a new JWT token valid only for the metrics endpoint:
garm-cli metrics-token create
Note: The token validity is equal to the TTL you set in the JWT config section.
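
Because the token expires, it helps to know when it needs to be replaced. The following Python sketch reads the expiry time from a JWT without verifying its signature. It assumes the metrics token carries the standard `exp` claim, which this document does not confirm; the function name `token_expiry` is made up for illustration.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token):
    """Return the expiry time from a JWT's `exp` claim, or None if absent.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    exp = claims.get("exp")  # assumption: standard JWT expiry claim
    if exp is None:
        return None
    return datetime.fromtimestamp(exp, tz=timezone.utc)
```

Calling `token_expiry()` on the token you generated returns a timezone-aware `datetime` in UTC, or `None` if the token has no `exp` claim.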
Copy the token that garm-cli printed and add it to your Prometheus config file. The following is an example of how to add GARM as a target in your Prometheus config file:
scrape_configs:
  - job_name: "garm"
    # Connect over https. If you don't have TLS enabled, change this to http.
    scheme: https
    static_configs:
      - targets: ["garm.example.com"]
    authorization:
      credentials: "superSecretTokenYouGeneratedEarlier"
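
Before pointing Prometheus at GARM, it can be handy to check the endpoint by hand. The following Python sketch builds the same kind of request Prometheus would send with the config above: by default, Prometheus sends `authorization.credentials` as a Bearer token. The helper name `build_metrics_request` is made up for illustration, and the network call itself is left commented out.

```python
import urllib.request


def build_metrics_request(base_url, token):
    """Build (but do not send) a GET request for the metrics endpoint."""
    url = base_url.rstrip("/") + "/metrics"
    # Prometheus sends `authorization.credentials` as a Bearer token by default.
    return urllib.request.Request(url, headers={"Authorization": "Bearer " + token})


request = build_metrics_request(
    "https://garm.example.com", "superSecretTokenYouGeneratedEarlier"
)
# To actually fetch the metrics, uncomment:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.read().decode())
```

If the token is missing or expired, expect the request to be rejected rather than to return metrics.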