feature: Add text boxes and descriptions to reads and writes dashboards #324
Changes from 36 commits
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -6,12 +6,6 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |||||
.addClusterSelectorTemplates() | ||||||
.addRow( | ||||||
$.row('Summary') | ||||||
.addPanel( | ||||||
$.textPanel('', ||| | ||||||
- **Per-instance runs**: number of times a compactor instance triggers a compaction across all tenants its shard manage. | ||||||
- **Tenants compaction progress**: in a multi-tenant cluster it shows the progress of tenants compacted while compaction is running. Reset to 0 once the compaction run is completed for all tenants in the shard. | ||||||
|||), | ||||||
) | ||||||
.addPanel( | ||||||
$.startedCompletedFailedPanel( | ||||||
'Per-instance runs / sec', | ||||||
|
@@ -20,7 +14,13 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |||||
'sum(rate(cortex_compactor_runs_failed_total{%s}[$__rate_interval]))' % $.jobMatcher($._config.job_names.compactor) | ||||||
) + | ||||||
$.bars + | ||||||
{ yaxes: $.yaxes('ops') }, | ||||||
{ yaxes: $.yaxes('ops') } + | ||||||
$.panelDescription( | ||||||
'Per-instance runs', | ||||||
||| | ||||||
Number of times a compactor instance triggers a compaction across all tenants that it manages. | ||||||
||| | ||||||
), | ||||||
) | ||||||
.addPanel( | ||||||
$.panel('Tenants compaction progress') + | ||||||
|
@@ -31,42 +31,55 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |||||
cortex_compactor_tenants_skipped{%s} | ||||||
) / cortex_compactor_tenants_discovered{%s} | ||||||
||| % [$.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor), $.jobMatcher($._config.job_names.compactor)], '{{%s}}' % $._config.per_instance_label) + | ||||||
{ yaxes: $.yaxes({ format: 'percentunit', max: 1 }) }, | ||||||
{ yaxes: $.yaxes({ format: 'percentunit', max: 1 }) } + | ||||||
$.panelDescription( | ||||||
'Tenants compaction progress', | ||||||
||| | ||||||
In a multi-tenant cluster, display the progress of tenants that are compacted while compaction is running. | ||||||
Reset to `0` after the compaction run is completed for all tenants in the shard. | ||||||
Review comment: I can fix the
Review comment: Ok, thanks! Keep in mind that I commented here, but this may apply to other places/descriptions too (e.g. the heading color applies to any description). |
||||||
||| | ||||||
), | ||||||
) | ||||||
) | ||||||
.addRow( | ||||||
$.row('') | ||||||
.addPanel( | ||||||
$.textPanel('', ||| | ||||||
- **Compacted blocks**: number of blocks generated as a result of a compaction operation. | ||||||
- **Per-block compaction duration**: time taken to generate a single compacted block. | ||||||
|||), | ||||||
) | ||||||
.addPanel( | ||||||
$.panel('Compacted blocks / sec') + | ||||||
$.queryPanel('sum(rate(prometheus_tsdb_compactions_total{%s}[$__rate_interval]))' % $.jobMatcher($._config.job_names.compactor), 'blocks') + | ||||||
{ yaxes: $.yaxes('ops') }, | ||||||
{ yaxes: $.yaxes('ops') } + | ||||||
$.panelDescription( | ||||||
'Compacted blocks / sec', | ||||||
||| | ||||||
Display the amount of time that it’s taken to generate a single compacted block. | ||||||
Review comment: The description looks wrong here. |
||||||
||| | ||||||
), | ||||||
) | ||||||
.addPanel( | ||||||
$.panel('Per-block compaction duration') + | ||||||
$.latencyPanel('prometheus_tsdb_compaction_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)) | ||||||
$.latencyPanel('prometheus_tsdb_compaction_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)) + | ||||||
$.panelDescription( | ||||||
'Per-block compaction duration', | ||||||
||| | ||||||
Rate of blocks that are generated as a result of a compaction operation. | ||||||
Review comment: The description looks wrong here. |
||||||
||| | ||||||
), | ||||||
) | ||||||
) | ||||||
.addRow( | ||||||
$.row('') | ||||||
.addPanel( | ||||||
$.textPanel('', ||| | ||||||
- **Average blocks / tenant**: the average number of blocks per tenant. | ||||||
- **Tenants with largest number of blocks**: the 10 tenants with the largest number of blocks. | ||||||
|||), | ||||||
) | ||||||
.addPanel( | ||||||
$.panel('Average blocks / tenant') + | ||||||
$.queryPanel('avg(max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), 'avg'), | ||||||
) | ||||||
.addPanel( | ||||||
$.panel('Tenants with largest number of blocks') + | ||||||
$.queryPanel('topk(10, max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), '{{user}}'), | ||||||
$.queryPanel('topk(10, max by(user) (cortex_bucket_blocks_count{%s}))' % $.jobMatcher($._config.job_names.compactor), '{{user}}') + | ||||||
$.panelDescription( | ||||||
'Tenants with largest number of blocks', | ||||||
||| | ||||||
The 10 tenants with the largest number of blocks. | ||||||
||| | ||||||
), | ||||||
) | ||||||
) | ||||||
.addRow( | ||||||
|
@@ -103,6 +116,5 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |||||
$.latencyPanel('cortex_compactor_meta_sync_duration_seconds', '{%s}' % $.jobMatcher($._config.job_names.compactor)), | ||||||
) | ||||||
) | ||||||
.addRow($.objectStorePanels1('Object Store', 'compactor')) | ||||||
.addRow($.objectStorePanels2('', 'compactor')), | ||||||
.addRows($.getObjectStoreRows('Object Store', 'compactor')), | ||||||
} |
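The `$.panelDescription(title, description)` calls in the diff above are merged into each panel with `+`. The following is a minimal sketch of that composition, not the mixin's actual code path: `panelDescription` mirrors the helper this PR adds to the shared dashboard utilities, while `panel` is a hypothetical, heavily simplified stand-in for `$.panel(...)`.

```jsonnet
// Mirrors the helper added in this PR: the title becomes a Markdown
// heading, followed by the description text.
local panelDescription(title, description) = {
  description: |||
    ### %s
    %s
  ||| % [title, description],
};

// Hypothetical stand-in for $.panel(...); the real helper sets many more fields.
local panel(title) = { title: title, type: 'graph' };

// Object addition merges the `description` field into the panel.
panel('Tenants with largest number of blocks') +
panelDescription(
  'Tenants with largest number of blocks',
  |||
    The 10 tenants with the largest number of blocks.
  |||
)
```

Because the helper only returns an object with a `description` field, it can be appended to any panel expression without touching the panel's queries or axes.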
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,24 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |
then self.addRow(row) | ||
else self, | ||
|
||
addRowsIf(condition, rows):: | ||
if condition | ||
then | ||
local reduceRows(dashboard, remainingRows) = | ||
if (std.length(remainingRows) == 0) | ||
then dashboard | ||
else | ||
reduceRows( | ||
dashboard.addRow(remainingRows[0]), | ||
std.slice(remainingRows, 1, std.length(remainingRows), 1) | ||
) | ||
; | ||
reduceRows(self, rows) | ||
else self, | ||
|
||
addRows(rows):: | ||
self.addRowsIf(true, rows), | ||
|
||
addClusterSelectorTemplates(multi=true):: | ||
local d = self { | ||
tags: $._config.tags, | ||
|
@@ -43,7 +61,6 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |
else d | ||
.addTemplate('cluster', 'cortex_build_info', 'cluster') | ||
.addTemplate('namespace', 'cortex_build_info{cluster=~"$cluster"}', 'namespace'), | ||
|
||
}, | ||
|
||
// The mixin allow specialism of the job selector depending on if its a single binary | ||
|
@@ -274,7 +291,7 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |
type: 'text', | ||
} + options, | ||
|
||
objectStorePanels1(title, component):: | ||
getObjectStoreRows(title, component):: [ | ||
super.row(title) | ||
.addPanel( | ||
$.panel('Operations / sec') + | ||
|
@@ -288,62 +305,136 @@ local utils = import 'mixin-utils/utils.libsonnet'; | |
{ yaxes: $.yaxes('percentunit') }, | ||
) | ||
.addPanel( | ||
$.panel('Op: Attributes') + | ||
$.panel('Latency of Op: Attributes') + | ||
Review comment: Where does the user see this information?
Review comment: Attached screenshot, @osg-grafana |
||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="attributes"}' % [$.namespaceMatcher(), component]), | ||
) | ||
.addPanel( | ||
$.panel('Op: Exists') + | ||
$.panel('Latency of Op: Exists') + | ||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="exists"}' % [$.namespaceMatcher(), component]), | ||
), | ||
|
||
// Second row of Object Store stats | ||
objectStorePanels2(title, component):: | ||
super.row(title) | ||
$.row('') | ||
.addPanel( | ||
$.panel('Op: Get') + | ||
$.panel('Latency of Op: Get') + | ||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get"}' % [$.namespaceMatcher(), component]), | ||
) | ||
.addPanel( | ||
$.panel('Op: GetRange') + | ||
$.panel('Latency of Op: GetRange') + | ||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="get_range"}' % [$.namespaceMatcher(), component]), | ||
) | ||
.addPanel( | ||
$.panel('Op: Upload') + | ||
$.panel('Latency of Op: Upload') + | ||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="upload"}' % [$.namespaceMatcher(), component]), | ||
) | ||
.addPanel( | ||
$.panel('Op: Delete') + | ||
$.panel('Latency of Op: Delete') + | ||
$.latencyPanel('thanos_objstore_bucket_operation_duration_seconds', '{%s,component="%s",operation="delete"}' % [$.namespaceMatcher(), component]), | ||
), | ||
], | ||
|
||
thanosMemcachedCache(title, jobName, component, cacheName):: | ||
local config = { | ||
jobMatcher: $.jobMatcher(jobName), | ||
component: component, | ||
cacheName: cacheName, | ||
cacheNameReadable: std.strReplace(cacheName, '-', ' '), | ||
Review comment: This is unused. I think it can be removed. |
||
}; | ||
super.row(title) | ||
.addPanel( | ||
$.panel('QPS') + | ||
$.queryPanel('sum by(operation) (rate(thanos_memcached_operations_total{%s,component="%s",name="%s"}[$__rate_interval]))' % [$.jobMatcher(jobName), component, cacheName], '{{operation}}') + | ||
$.panel('Requests per second') + | ||
$.queryPanel( | ||
Review comment: Note to reviewers: just reformatted. |
||
||| | ||
sum by(operation) ( | ||
rate( | ||
thanos_memcached_operations_total{ | ||
%(jobMatcher)s, | ||
component="%(component)s", | ||
name="%(cacheName)s" | ||
}[$__rate_interval] | ||
) | ||
) | ||
||| % config, | ||
'{{operation}}' | ||
) + | ||
$.stack + | ||
{ yaxes: $.yaxes('ops') }, | ||
{ yaxes: $.yaxes('ops') } | ||
) | ||
.addPanel( | ||
$.panel('Latency (getmulti)') + | ||
$.latencyPanel('thanos_memcached_operation_duration_seconds', '{%s,operation="getmulti",component="%s",name="%s"}' % [$.jobMatcher(jobName), component, cacheName]) | ||
$.latencyPanel( | ||
Review comment: Note to reviewers: just reformatted. |
||
'thanos_memcached_operation_duration_seconds', | ||
||| | ||
{ | ||
%(jobMatcher)s, | ||
operation="getmulti", | ||
component="%(component)s", | ||
name="%(cacheName)s" | ||
|
||
} | ||
||| % config | ||
) | ||
) | ||
.addPanel( | ||
$.panel('Hit ratio') + | ||
$.queryPanel('sum(rate(thanos_cache_memcached_hits_total{%s,component="%s",name="%s"}[$__rate_interval])) / sum(rate(thanos_cache_memcached_requests_total{%s,component="%s",name="%s"}[$__rate_interval]))' % | ||
[ | ||
$.jobMatcher(jobName), | ||
component, | ||
cacheName, | ||
$.jobMatcher(jobName), | ||
component, | ||
cacheName, | ||
], 'items') + | ||
{ yaxes: $.yaxes('percentunit') }, | ||
$.queryPanel( | ||
Review comment: Note to reviewers: just reformatted. |
||
||| | ||
sum( | ||
rate( | ||
thanos_cache_memcached_hits_total{ | ||
%(jobMatcher)s, | ||
component="%(component)s", | ||
name="%(cacheName)s" | ||
}[$__rate_interval] | ||
) | ||
) | ||
/ | ||
sum( | ||
rate( | ||
thanos_cache_memcached_requests_total{ | ||
%(jobMatcher)s, | ||
component="%(component)s", | ||
name="%(cacheName)s" | ||
}[$__rate_interval] | ||
) | ||
) | ||
||| % config, | ||
'items' | ||
) + | ||
{ yaxes: $.yaxes('percentunit') } | ||
), | ||
|
||
filterNodeDiskContainer(containerName):: | ||
||| | ||
ignoring(%s) group_right() (label_replace(count by(%s, %s, device) (container_fs_writes_bytes_total{%s,container="%s",device!~".*sda.*"}), "device", "$1", "device", "/dev/(.*)") * 0) | ||
||| % [$._config.per_instance_label, $._config.per_node_label, $._config.per_instance_label, $.namespaceMatcher(), containerName], | ||
ignoring(%s) group_right() ( | ||
Review comment: Note to reviewers: just reformatted. |
||
label_replace( | ||
count by( | ||
%s, | ||
%s, | ||
device | ||
) | ||
( | ||
container_fs_writes_bytes_total{ | ||
%s, | ||
container="%s", | ||
device!~".*sda.*" | ||
} | ||
), | ||
"device", | ||
"$1", | ||
"device", | ||
"/dev/(.*)" | ||
) * 0 | ||
) | ||
||| % [ | ||
$._config.per_instance_label, | ||
$._config.per_node_label, | ||
$._config.per_instance_label, | ||
$.namespaceMatcher(), | ||
containerName, | ||
], | ||
|
||
panelDescription(title, description):: { | ||
description: ||| | ||
### %s | ||
%s | ||
||| % [title, description], | ||
}, | ||
} |
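The `addRowsIf` helper in the diff above walks the row list with a hand-written recursive function. As an illustration only, the same behavior can be sketched with `std.foldl` from the Jsonnet standard library. The `dashboard` object below is a hypothetical, simplified stand-in (the real dashboards build on the shared mixin helpers), and the rows are plain strings purely for demonstration.

```jsonnet
// Hypothetical, simplified dashboard object used only for this sketch.
local dashboard = {
  rows: [],

  // Append a single row and return the extended dashboard.
  addRow(row):: self { rows+: [row] },

  // Append every row in `rows` when `condition` is true.
  // std.foldl threads the dashboard through addRow, one row at a time.
  addRowsIf(condition, rows)::
    if condition
    then std.foldl(function(d, row) d.addRow(row), rows, self)
    else self,

  addRows(rows):: self.addRowsIf(true, rows),
};

dashboard
.addRows(['Object Store', ''])
.addRowsIf(false, ['skipped'])
```

With these assumptions, only the two rows passed to `addRows` end up in `rows`; the `addRowsIf(false, ...)` call returns the dashboard unchanged.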
Review comment: [nit] We usually link the related PR.
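The reformatted memcached queries in the diff switch from positional `%s` placeholders to named `%(name)s` placeholders filled from a `config` object. A minimal sketch of the difference follows; the values in `config` are hypothetical, since in the mixin they come from `$.jobMatcher(jobName)` and the function arguments.

```jsonnet
// Hypothetical values, for illustration only.
local config = {
  jobMatcher: 'job=~"(compactor)"',
  component: 'compactor',
  cacheName: 'chunks-cache',
};

{
  // Positional form: arguments must be listed in the order of the placeholders.
  positional: 'thanos_memcached_operations_total{%s,component="%s",name="%s"}' % [
    config.jobMatcher,
    config.component,
    config.cacheName,
  ],

  // Named form: each placeholder looks up a field of the object.
  named: 'thanos_memcached_operations_total{%(jobMatcher)s,component="%(component)s",name="%(cacheName)s"}' % config,

  // Both forms render the same selector.
  same: self.positional == self.named,
}
```

The named form avoids repeating the argument list when the same matcher appears several times in one query, as in the hit-ratio panel.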