# The metrics section

This is one of the features in GARM that I really love having. For one thing, it's community-contributed, and for another, it adds real value to the project: it lets us build some pretty nice visualizations of what is happening inside GARM.

## Common metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_health` | Gauge | `controller_id=<controller id>`<br>`callback_url=<callback url>`<br>`controller_webhook_url=<controller webhook url>`<br>`metadata_url=<metadata url>`<br>`webhook_url=<webhook url>`<br>`name=<hostname>` | This is a gauge that is set to 1 if GARM is healthy and 0 if it is not. This is useful for alerting. |
| `garm_webhooks_received` | Counter | `valid=<valid request>`<br>`reason=<reason for invalid requests>` | This is a counter that increments every time GARM receives a webhook from GitHub. |
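
Because `garm_health` is meant for alerting, here is a minimal Python sketch of the check such an alert performs. It parses the Prometheus text exposition format served at `/metrics` and reports every controller whose `garm_health` sample is not 1. The parser (`parse_samples`) is a deliberately small, hypothetical helper that skips comments, ignores timestamps and does not unescape label values. In a real deployment you would more likely write a Prometheus alerting rule on an expression such as `garm_health == 0`, or use the `prometheus_client` parser, rather than roll your own.

```python
import re

# One sample line: metric name, optional {labels}, then the value.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)")
# One label pair inside the braces; quoted values may contain escaped quotes.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse Prometheus text exposition into (name, labels, value) tuples.

    Minimal sketch: skips comments, ignores timestamps and does not unescape
    label values.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def unhealthy_controllers(text):
    """Return the controller_id of every garm_health sample not equal to 1."""
    return [
        labels.get("controller_id", "<unknown>")
        for name, labels, value in parse_samples(text)
        if name == "garm_health" and value != 1
    ]


# Usage with made-up exposition text (label values are hypothetical):
scrape = """
# TYPE garm_health gauge
garm_health{controller_id="example-controller",name="garm01"} 0
"""
print(unhealthy_controllers(scrape))  # ['example-controller']
```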

## Enterprise metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_enterprise_info` | Gauge | `id=<enterprise id>`<br>`name=<enterprise name>` | This is a gauge that is set to 1 and exposes enterprise information. |
| `garm_enterprise_pool_manager_status` | Gauge | `id=<enterprise id>`<br>`name=<enterprise name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the enterprise pool manager is running and to 0 if not. |

## Organization metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_organization_info` | Gauge | `id=<organization id>`<br>`name=<organization name>` | This is a gauge that is set to 1 and exposes organization information. |
| `garm_organization_pool_manager_status` | Gauge | `id=<organization id>`<br>`name=<organization name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the organization pool manager is running and to 0 if not. |

## Repository metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_repository_info` | Gauge | `id=<repository id>`<br>`name=<repository name>` | This is a gauge that is set to 1 and exposes repository information. |
| `garm_repository_pool_manager_status` | Gauge | `id=<repository id>`<br>`name=<repository name>`<br>`running=<true\|false>` | This is a gauge that is set to 1 if the repository pool manager is running and to 0 if not. |
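
The `*_pool_manager_status` gauges have the same shape for enterprises, organizations and repositories, so a single check can cover all three. The Python sketch below lists every pool manager that reports 0. It assumes the samples have already been parsed into hypothetical `(metric_name, labels, value)` tuples. The entity names and IDs in the usage example are made up.

```python
# The three pool-manager status gauges, mapped to the entity type they cover.
POOL_MANAGER_METRICS = {
    "garm_enterprise_pool_manager_status": "enterprise",
    "garm_organization_pool_manager_status": "organization",
    "garm_repository_pool_manager_status": "repository",
}


def stopped_pool_managers(samples):
    """Return sorted (entity_type, name, id) tuples for pool managers reporting 0.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples.
    """
    stopped = []
    for metric, labels, value in samples:
        entity_type = POOL_MANAGER_METRICS.get(metric)
        if entity_type is not None and value == 0:
            stopped.append((entity_type, labels.get("name", ""), labels.get("id", "")))
    return sorted(stopped)


# Usage with hypothetical samples:
samples = [
    ("garm_organization_pool_manager_status",
     {"id": "org-1", "name": "example-org", "running": "false"}, 0.0),
    ("garm_repository_pool_manager_status",
     {"id": "repo-1", "name": "example-repo", "running": "true"}, 1.0),
    ("garm_organization_info", {"id": "org-1", "name": "example-org"}, 1.0),
]
print(stopped_pool_managers(samples))  # [('organization', 'example-org', 'org-1')]
```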

## Provider metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_provider_info` | Gauge | `description=<provider description>`<br>`name=<provider name>`<br>`type=<internal\|external>` | This is a gauge that is set to 1 and exposes provider information. |

## Pool metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_pool_info` | Gauge | `flavor=<flavor>`<br>`id=<pool id>`<br>`image=<image name>`<br>`os_arch=<defined OS arch>`<br>`os_type=<defined OS name>`<br>`pool_owner=<owner name>`<br>`pool_type=<repository\|organization\|enterprise>`<br>`prefix=<prefix>`<br>`provider=<provider name>`<br>`tags=<concatenated list of pool tags>` | This is a gauge that is set to 1 and exposes pool information. |
| `garm_pool_status` | Gauge | `enabled=<true\|false>`<br>`id=<pool id>` | This is a gauge that is set to 1 if the pool is enabled and to 0 if not. |
| `garm_pool_bootstrap_timeout` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's bootstrap timeout. |
| `garm_pool_max_runners` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's maximum number of runners. |
| `garm_pool_min_idle_runners` | Gauge | `id=<pool id>` | This is a gauge that is set to the pool's minimum number of idle runners. |
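
Pool settings are spread across several gauges that share the `id` label, and only `garm_pool_info` carries the descriptive labels. Joining the gauges on `id` therefore gives a per-pool summary. This is the usual "info metric" join; in PromQL it is roughly `garm_pool_max_runners * on(id) group_left(provider, pool_owner) garm_pool_info`. The Python sketch below does the same join over hypothetical `(metric_name, labels, value)` tuples. The provider and owner names in the usage example are made up.

```python
def pool_summary(samples):
    """Join the pool gauges on their shared `id` label.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples.
    Returns {pool_id: summary_dict} for every pool present in garm_pool_info.
    """
    info, enabled, max_runners, min_idle = {}, {}, {}, {}
    for metric, labels, value in samples:
        pool_id = labels.get("id")
        if metric == "garm_pool_info":
            info[pool_id] = labels
        elif metric == "garm_pool_status":
            enabled[pool_id] = value == 1
        elif metric == "garm_pool_max_runners":
            max_runners[pool_id] = int(value)
        elif metric == "garm_pool_min_idle_runners":
            min_idle[pool_id] = int(value)

    # garm_pool_info carries the descriptive labels; the other gauges only carry `id`.
    return {
        pool_id: {
            "provider": labels.get("provider"),
            "pool_owner": labels.get("pool_owner"),
            "enabled": enabled.get(pool_id),
            "max_runners": max_runners.get(pool_id),
            "min_idle_runners": min_idle.get(pool_id),
        }
        for pool_id, labels in info.items()
    }


# Usage with hypothetical samples for a single pool:
samples = [
    ("garm_pool_info", {"id": "pool-1", "provider": "lxd", "pool_owner": "example-org"}, 1.0),
    ("garm_pool_status", {"id": "pool-1", "enabled": "true"}, 1.0),
    ("garm_pool_max_runners", {"id": "pool-1"}, 10.0),
    ("garm_pool_min_idle_runners", {"id": "pool-1"}, 2.0),
]
print(pool_summary(samples)["pool-1"]["max_runners"])  # 10
```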

## Runner metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_runner_status` | Gauge | `name=<runner name>`<br>`pool_owner=<owner name>`<br>`pool_type=<repository\|organization\|enterprise>`<br>`provider=<provider name>`<br>`runner_status=<running\|stopped\|error\|pending_delete\|deleting\|pending_create\|creating\|unknown>`<br>`status=<idle\|pending\|terminated\|installing\|failed\|active>` | This is a gauge that gives details about the runners GARM spawns. |
| `garm_runner_operations_total` | Counter | `provider=<provider name>`<br>`operation=<CreateInstance\|DeleteInstance\|GetInstance\|ListInstances\|RemoveAllInstances\|Start\|Stop>` | This is a counter that increments every time a runner operation is performed. |
| `garm_runner_errors_total` | Counter | `provider=<provider name>`<br>`operation=<CreateInstance\|DeleteInstance\|GetInstance\|ListInstances\|RemoveAllInstances\|Start\|Stop>` | This is a counter that increments every time a runner operation returns an error. |
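
To see how many runners are in each state, you can count `garm_runner_status` samples per label combination. The Python sketch below assumes GARM emits one sample per runner. The table above does not say what the gauge value is, so the code counts samples instead of summing values. As before, the input is hypothetical `(metric_name, labels, value)` tuples, and the runner and owner names are made up.

```python
from collections import Counter


def runners_by_status(samples, group_by=("pool_owner", "status")):
    """Count garm_runner_status samples per combination of `group_by` labels.

    Assumes one garm_runner_status sample per runner, so the number of
    samples (not the sum of their values) is the number of runners.
    `samples` is an iterable of (metric_name, labels_dict, value) tuples.
    """
    counts = Counter()
    for metric, labels, _value in samples:
        if metric == "garm_runner_status":
            counts[tuple(labels.get(name, "") for name in group_by)] += 1
    return dict(counts)


# Usage with hypothetical runners:
samples = [
    ("garm_runner_status", {"name": "r1", "pool_owner": "example-org", "status": "idle"}, 1.0),
    ("garm_runner_status", {"name": "r2", "pool_owner": "example-org", "status": "idle"}, 1.0),
    ("garm_runner_status", {"name": "r3", "pool_owner": "example-org", "status": "active"}, 1.0),
]
print(runners_by_status(samples))
# {('example-org', 'idle'): 2, ('example-org', 'active'): 1}
```

Grouping by a different tuple of labels (for example `("provider", "runner_status")`) gives the same view from the provider side.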

## GitHub metrics

| Metric name | Type | Labels | Description |
|-------------|------|--------|-------------|
| `garm_github_operations_total` | Counter | `operation=<ListRunners\|CreateRegistrationToken\|...>`<br>`scope=<Organization\|Repository\|Enterprise>` | This is a counter that increments every time a GitHub operation is performed. |
| `garm_github_errors_total` | Counter | `operation=<ListRunners\|CreateRegistrationToken\|...>`<br>`scope=<Organization\|Repository\|Enterprise>` | This is a counter that increments every time a GitHub operation returns an error. |
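
Both GitHub counters are cumulative, so an error ratio is most meaningful over a time window. You take the increase of each counter between two scrapes and divide one by the other. Prometheus' `rate()` and `increase()` functions do this for you, including handling counter resets when GARM restarts. The Python sketch below shows the same arithmetic by hand over two hypothetical snapshots of `(metric_name, labels, value)` tuples. It uses a rough reset heuristic: any decrease counts as a reset.

```python
OPS = "garm_github_operations_total"
ERRS = "garm_github_errors_total"


def counter_totals(samples, metric):
    """Sum one counter's samples per (scope, operation) label pair."""
    totals = {}
    for name, labels, value in samples:
        if name == metric:
            key = (labels.get("scope", ""), labels.get("operation", ""))
            totals[key] = totals.get(key, 0.0) + value
    return totals


def increase(before, after):
    """Per-key increase between two snapshots; a drop is treated as a counter reset."""
    result = {}
    for key, value in after.items():
        previous = before.get(key, 0.0)
        result[key] = value - previous if value >= previous else value
    return result


def github_error_ratio(before_samples, after_samples):
    """Errored fraction of GitHub operations between two scrapes, per (scope, operation)."""
    ops = increase(counter_totals(before_samples, OPS), counter_totals(after_samples, OPS))
    errs = increase(counter_totals(before_samples, ERRS), counter_totals(after_samples, ERRS))
    return {key: errs.get(key, 0.0) / count for key, count in ops.items() if count > 0}


# Usage with two hypothetical scrapes of the same series:
labels = {"scope": "Organization", "operation": "ListRunners"}
before = [(OPS, labels, 10.0), (ERRS, labels, 1.0)]
after = [(OPS, labels, 20.0), (ERRS, labels, 3.0)]
print(github_error_ratio(before, after))  # {('Organization', 'ListRunners'): 0.2}
```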

## Enabling metrics

Metrics are disabled by default. To enable them, add the following to your config file:

```toml
[metrics]

# Disables authentication on the metrics endpoint (not recommended).
# If you do disable authentication, I encourage you to put a reverse proxy in front
# of GARM and limit which systems can access that particular endpoint. Ideally, you
# would enable some kind of authentication using the reverse proxy, if the built-in auth
# is not sufficient for your needs.
#
# Default: false
disable_auth = false

# Toggles metrics. If set to false, the API endpoint for metrics collection will
# be disabled.
#
# Default: false
enable = true

# period is the interval at which the /metrics endpoint updates internal metrics
# about controller-specific objects (e.g. runners, pools, etc.).
#
# Default: "60s"
period = "30s"
```

You can choose to disable authentication if you wish; however, it's not terribly difficult to set up, so I generally advise against disabling it.

## Configuring Prometheus

The following section assumes that your GARM instance is running at `garm.example.com` and has TLS enabled.

First, generate a new JWT token that is valid only for the metrics endpoint:

```shell
garm-cli metrics-token create
```

Note: The token validity is equal to the TTL you set in the JWT config section.

Copy the resulting token and add it to your Prometheus config file. The following is an example of how to add GARM as a scrape target in your Prometheus config file:

```yaml
scrape_configs:
  - job_name: "garm"
    # Connect over https. If you don't have TLS enabled, change this to http.
    scheme: https
    static_configs:
      - targets: ["garm.example.com"]
    # The token from `garm-cli metrics-token create`, sent as a Bearer token.
    authorization:
      credentials: "superSecretTokenYouGeneratedEarlier"
```
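
Before wiring the token into Prometheus, it can help to fetch the endpoint once by hand. The Python sketch below builds the same request the scrape config above would send. It is a GET to `/metrics`, which is Prometheus' default `metrics_path`, with the token in an `Authorization: Bearer` header, the default credential type for the `authorization` block. `garm.example.com` is the same placeholder host used above; the helper names are hypothetical.

```python
import urllib.request


def build_metrics_request(base_url, token):
    """Build a GET request for GARM's /metrics endpoint with a bearer token.

    Mirrors what the Prometheus scrape config sends: the default metrics_path
    (/metrics) and an "Authorization: Bearer <token>" header.
    """
    return urllib.request.Request(
        base_url.rstrip("/") + "/metrics",
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch_metrics(base_url, token, timeout=10.0):
    """Fetch the metrics page as text; raises urllib.error.HTTPError on e.g. 401."""
    request = build_metrics_request(base_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs network access to GARM and a token from `garm-cli metrics-token create`):
#   print(fetch_metrics("https://garm.example.com", "superSecretTokenYouGeneratedEarlier")[:500])
```

An HTTP 401 from this request usually means the token has expired or was copied incorrectly. That is easier to diagnose here than from a failing scrape target.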