Commit

Merge pull request #107 from fluent/add-more-metrics
add more metrics
kazegusuri authored Aug 10, 2019
2 parents abf147b + a2a7b80 commit b68abb5
Showing 2 changed files with 60 additions and 30 deletions.
46 changes: 29 additions & 17 deletions README.md
@@ -65,11 +65,16 @@ When using multiple workers, each worker binds to port + `fluent_worker_id`.

This plugin collects Fluentd's internal metrics. They are similar to, or a subset of, the metrics exposed by [monitor_agent](https://docs.fluentd.org/input/monitor_agent).

Current exposed metrics:

- `buffer_queue_length` of each BufferedOutput plugin
- `buffer_total_queued_size` of each BufferedOutput plugin
- `retry_count` of each BufferedOutput plugin
#### Exposed metrics

- `fluentd_status_buffer_queue_length`
- `fluentd_status_buffer_total_queued_size`
- `fluentd_status_retry_count`
- `fluentd_status_buffer_newest_timekey` from fluentd v1.4.2
- `fluentd_status_buffer_oldest_timekey` from fluentd v1.4.2

#### Configuration

With the following configuration, those metrics are collected.

@@ -86,28 +91,35 @@ More configuration parameters:

### prometheus_output_monitor input plugin

**experimental**

This plugin collects internal metrics for output plugins in Fluentd. It is similar to the `prometheus_monitor` plugin, but specialized for output plugins. It exposes many metrics that `prometheus_monitor` does not include, such as `num_errors` and `retry_wait`.

Current exposed metrics:
#### Exposed metrics

Metrics for output

- `fluentd_output_status_buffer_queue_length`
- `fluentd_output_status_buffer_total_bytes`
- `fluentd_output_status_retry_count`
- `fluentd_output_status_num_errors`
- `fluentd_output_status_emit_count`
- `fluentd_output_status_flush_time_count`
- `fluentd_output_status_slow_flush_count`
- `fluentd_output_status_retry_wait`
  - current retry_wait computed from last retry time and next retry time
- `fluentd_output_status_emit_records`
  - only for v0.14
- `fluentd_output_status_write_count`
  - only for v0.14
- `fluentd_output_status_rollback_count`
  - only for v0.14
- `fluentd_output_status_flush_time_count` from fluentd v1.6.0
- `fluentd_output_status_slow_flush_count` from fluentd v1.6.0

Metrics for buffer

- `fluentd_output_status_buffer_total_bytes`
- `fluentd_output_status_buffer_stage_length` from fluentd v1.6.0
- `fluentd_output_status_buffer_stage_byte_size` from fluentd v1.6.0
- `fluentd_output_status_buffer_queue_length`
- `fluentd_output_status_buffer_queue_byte_size` from fluentd v1.6.0
- `fluentd_output_status_buffer_newest_timekey` from fluentd v1.6.0
- `fluentd_output_status_buffer_oldest_timekey` from fluentd v1.6.0
- `fluentd_output_status_buffer_available_space_ratio` from fluentd v1.6.0
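
As a rough illustration of the `fluentd_output_status_retry_wait` entry in the list above, the wait can be read as the gap between two timestamps. The sketch below is an assumption made for illustration, not the plugin's code: `retry_wait_seconds` and its two arguments are hypothetical names, and the clamp to zero is a choice of this sketch.

```ruby
# Hypothetical helper (not part of fluent-plugin-prometheus): computes the
# current retry wait, in seconds, from the last and next retry times.
def retry_wait_seconds(last_retry_time, next_retry_time)
  # If either timestamp is missing, treat it as "no retry scheduled".
  return 0.0 unless last_retry_time && next_retry_time

  # Subtracting two Time objects yields a Float number of seconds.
  wait = next_retry_time - last_retry_time

  # Clamp at zero so an out-of-order pair never yields a negative gauge value.
  wait.negative? ? 0.0 : wait.to_f
end

# Sample timestamps, chosen only for the example.
last_retry = Time.at(1_565_400_000)
next_retry = last_retry + 8
puts retry_wait_seconds(last_retry, next_retry)
```

A gauge fits this value because the wait can go up or down between scrapes.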

#### Configuration

With the following configuration, those metrics are collected.

@@ -124,13 +136,11 @@ More configuration parameters:

### prometheus_tail_monitor input plugin

**experimental**

This plugin collects internal metrics for the in_tail plugin in Fluentd. The in_tail plugin holds internal state for the files it is watching. Monitoring that state is sometimes important for checking that the plugin works correctly.

This plugin uses an internal class of Fluentd, so it can break easily.

Current exposed metrics:
#### Exposed metrics

- `fluentd_tail_file_position`
  - Current number of bytes the plugin has read from the file
@@ -143,6 +153,8 @@ Default labels:
- `type`: plugin name. `in_tail` only for now.
- `path`: file path

#### Configuration

With the following configuration, those metrics are collected.

44 changes: 31 additions & 13 deletions lib/fluent/plugin/in_prometheus_output_monitor.rb
@@ -64,18 +64,33 @@ def start
       super

       @metrics = {
+        # Buffer metrics
+        buffer_total_queued_size: @registry.gauge(
+          :fluentd_output_status_buffer_total_bytes,
+          'Current total size of stage and queue buffers.'),
+        buffer_stage_length: @registry.gauge(
+          :fluentd_output_status_buffer_stage_length,
+          'Current length of stage buffers.'),
+        buffer_stage_byte_size: @registry.gauge(
+          :fluentd_output_status_buffer_stage_byte_size,
+          'Current total size of stage buffers.'),
+        buffer_queue_length: @registry.gauge(
+          :fluentd_output_status_buffer_queue_length,
+          'Current length of queue buffers.'),
+        buffer_queue_byte_size: @registry.gauge(
+          :fluentd_output_status_queue_byte_size,
+          'Current total size of queue buffers.'),
+        buffer_available_buffer_space_ratios: @registry.gauge(
+          :fluentd_output_status_buffer_available_space_ratio,
+          'Ratio of available space in buffer.'),
         buffer_newest_timekey: @registry.gauge(
           :fluentd_output_status_buffer_newest_timekey,
           'Newest timekey in buffer.'),
         buffer_oldest_timekey: @registry.gauge(
           :fluentd_output_status_buffer_oldest_timekey,
           'Oldest timekey in buffer.'),
-        buffer_queue_length: @registry.gauge(
-          :fluentd_output_status_buffer_queue_length,
-          'Current buffer queue length.'),
-        buffer_total_queued_size: @registry.gauge(
-          :fluentd_output_status_buffer_total_bytes,
-          'Current total size of queued buffers.'),
+
+        # Output metrics
         retry_counts: @registry.gauge(
           :fluentd_output_status_retry_count,
           'Current retry counts.'),
@@ -118,8 +133,17 @@ def update_monitor_info
       }

       monitor_info = {
-        'buffer_queue_length' => @metrics[:buffer_queue_length],
+        # buffer metrics
         'buffer_total_queued_size' => @metrics[:buffer_total_queued_size],
+        'buffer_stage_length' => @metrics[:buffer_stage_length],
+        'buffer_stage_byte_size' => @metrics[:buffer_stage_byte_size],
+        'buffer_queue_length' => @metrics[:buffer_queue_length],
+        'buffer_queue_byte_size' => @metrics[:buffer_queue_byte_size],
+        'buffer_available_buffer_space_ratios' => @metrics[:buffer_available_buffer_space_ratios],
+        'buffer_newest_timekey' => @metrics[:buffer_newest_timekey],
+        'buffer_oldest_timekey' => @metrics[:buffer_oldest_timekey],
+
+        # output metrics
         'retry_count' => @metrics[:retry_counts],
       }
       instance_vars_info = {
@@ -141,12 +165,6 @@ def update_monitor_info
           end
         end

-        timekeys = info["buffer_timekeys"]
-        if timekeys && !timekeys.empty?
-          @metrics[:buffer_newest_timekey].set(label, timekeys.max)
-          @metrics[:buffer_oldest_timekey].set(label, timekeys.min)
-        end
-
         if info['instance_variables']
           instance_vars_info.each do |name, metric|
             if info['instance_variables'][name]
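
The `monitor_info` hash in the diff above pairs each key reported in the monitor info with a gauge, so a single loop can update every metric. A minimal, self-contained sketch of that pattern follows. `FakeGauge`, `update_gauges`, and the sample `info` hash are hypothetical stand-ins, not the plugin's or the Prometheus client's API; the `set(label, value)` argument order only mirrors the calls visible in this diff.

```ruby
# Hypothetical stand-in for a Prometheus gauge; it only records the last
# value set for each label set.
class FakeGauge
  attr_reader :values

  def initialize
    @values = {}
  end

  def set(labels, value)
    @values[labels] = value
  end
end

# Copy every value present in `info` into the gauge registered under the
# same key; keys missing from `info` are skipped.
def update_gauges(monitor_info, info, labels)
  monitor_info.each do |name, gauge|
    value = info[name]
    gauge.set(labels, value) unless value.nil?
  end
end

metrics = {
  buffer_stage_length: FakeGauge.new,
  buffer_newest_timekey: FakeGauge.new,
}
monitor_info = {
  'buffer_stage_length'   => metrics[:buffer_stage_length],
  'buffer_newest_timekey' => metrics[:buffer_newest_timekey],
}

# Sample data only; this sketch assumes a buffer without timekeys simply
# omits the timekey entry.
info = { 'buffer_stage_length' => 3 }
labels = { plugin_id: 'out_example', type: 'example' }

update_gauges(monitor_info, info, labels)
p metrics[:buffer_stage_length].values
p metrics[:buffer_newest_timekey].values
```

Routing the timekey gauges through the same mapping appears to be what lets the commit drop the separate `timekeys.max` / `timekeys.min` branch.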