Add Automated Dashboard for Kubernetes Node metrics #64
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
# Kubernetes automated dashboard | ||
|
||
There is currently one automated dashboard for AppSignal for Kubernetes: | ||
|
||
- [Nodes](#nodes) | ||
|
||
## Nodes | ||
|
||
The Nodes dashboard uses Kubernetes node metrics extracted from the `/api/v1/nodes/<NODE>/proxy/stats/summary` API endpoint via [AppSignal for Kubernetes](https://github.com/appsignal/appsignal-kubernetes). | ||
The following metrics are reported through this automated dashboard: | ||
|
||
- node_cpu_usage_nano_cores | ||
- node_memory_usage_bytes | ||
- node_memory_available_bytes | ||
- node_swap_available_bytes | ||
- node_swap_usage_bytes | ||
- node_fs_available_bytes | ||
- node_fs_used_bytes | ||
- node_network_rx_bytes | ||
- node_network_tx_bytes | ||
- node_fs_inodes | ||
- node_fs_inodes_free | ||
- node_fs_inodes_used | ||
- node_rlimit_maxpid | ||
- node_rlimit_curproc |
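To make the link between the stats summary response and the metric names listed above easier to follow, here is a minimal Python sketch. It assumes the camelCase field names of the kubelet stats summary (`usageNanoCores`, `availableBytes`, and so on). The `METRIC_PATHS` table, the `extract_node_metrics` helper, and the sample data are hypothetical and are not taken from the appsignal-kubernetes source, whose actual mapping may differ.

```python
# Hypothetical mapping from dashboard metric names to paths inside the
# "node" object of a kubelet stats summary response. The field names are
# assumptions based on the Kubernetes Summary API, not on AppSignal's code.
METRIC_PATHS = {
    "node_cpu_usage_nano_cores": ("cpu", "usageNanoCores"),
    "node_memory_usage_bytes": ("memory", "usageBytes"),
    "node_memory_available_bytes": ("memory", "availableBytes"),
    "node_swap_available_bytes": ("swap", "swapAvailableBytes"),
    "node_swap_usage_bytes": ("swap", "swapUsageBytes"),
    "node_fs_available_bytes": ("fs", "availableBytes"),
    "node_fs_used_bytes": ("fs", "usedBytes"),
    "node_network_rx_bytes": ("network", "rxBytes"),
    "node_network_tx_bytes": ("network", "txBytes"),
    "node_fs_inodes": ("fs", "inodes"),
    "node_fs_inodes_free": ("fs", "inodesFree"),
    "node_fs_inodes_used": ("fs", "inodesUsed"),
    "node_rlimit_maxpid": ("rlimit", "maxpid"),
    "node_rlimit_curproc": ("rlimit", "curproc"),
}


def extract_node_metrics(summary):
    """Return (node_name, {metric_name: value}) for one summary response.

    Metrics whose section or field is missing are skipped rather than
    reported as zero.
    """
    node = summary.get("node", {})
    metrics = {}
    for metric_name, (section, field) in METRIC_PATHS.items():
        value = node.get(section, {}).get(field)
        if value is not None:
            metrics[metric_name] = value
    return node.get("nodeName"), metrics


# Made-up sample data, trimmed to a few fields for illustration.
sample = {
    "node": {
        "nodeName": "node-1",
        "cpu": {"usageNanoCores": 250000000},
        "memory": {"usageBytes": 1024, "availableBytes": 2048},
        "rlimit": {"maxpid": 4096, "curproc": 128},
    }
}

name, metrics = extract_node_metrics(sample)
print(name, metrics)
```

Keeping the mapping in one table makes it easy to compare the reported metric names against the source fields, which is the point raised later in the review.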
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,300 @@ | ||||||
{ | ||||||
"metric_keys": [ | ||||||
"node_cpu_usage_nano_cores" | ||||||
], | ||||||
"dashboard": { | ||||||
"title": "Kubernetes Nodes", | ||||||
"description": "", | ||||||
"visuals": [ | ||||||
{ | ||||||
"title": "Node CPU Usage", | ||||||
"description": "node_cpu_usage_nano_cores", | ||||||
"line_label": "%name% %node%", | ||||||
We shouldn't include the metric name unless there are multiple metrics in a graph, and for those graphs we're better off using tags on one metric. See my other comment.
|
||||||
"display": "LINE", | ||||||
"format": "number", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_cpu_usage_nano_cores", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Memory Usage", | ||||||
"description": "node_memory_usage_bytes vs node_memory_available_bytes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "size", | ||||||
"format_input": "byte", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_memory_usage_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_memory_available_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Swap", | ||||||
"description": "node_swap_usage_bytes vs. node_swap_available_bytes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "size", | ||||||
"format_input": "byte", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_swap_available_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_swap_usage_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node File System Usage", | ||||||
"description": "node_fs_used_bytes vs node_fs_available_bytes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "size", | ||||||
"format_input": "byte", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_fs_available_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_fs_used_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Network Traffic Received", | ||||||
"description": "node_network_rx_bytes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "size", | ||||||
"format_input": "byte", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_network_rx_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Network Traffic Transmitted", | ||||||
"description": "node_network_tx_bytes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "size", | ||||||
"format_input": "byte", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_network_tx_bytes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Inodes Usage", | ||||||
"description": "node_fs_inodes_free & node_fs_inodes_used vs node_fs_inodes", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "number", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_fs_inodes", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_fs_inodes_free", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_fs_inodes_used", | ||||||
Comment on lines +215 to +243
Let's use tags for different states like free and used instead of reporting them as different metrics. We do this for other (host) metrics too. That way we don't have to show the full metric name for every line in the graph, which frees up valuable space in the hover box. For example:
I also see this in some other graphs in this dashboard. We should update those as well.
I'm trying to keep this dashboard as close to what's reported from Kubernetes as I can. I think this is a good idea for the future, when we know exactly what we'd like to report, but let's get this out the door and get users to try it first. |
||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
}, | ||||||
{ | ||||||
"title": "Node Resource Limits", | ||||||
"description": "node_rlimit_curproc vs node_rlimit_maxpid", | ||||||
"line_label": "%name% %node%", | ||||||
"display": "LINE", | ||||||
"format": "number", | ||||||
"draw_null_as_zero": true, | ||||||
"metrics": [ | ||||||
{ | ||||||
"name": "node_rlimit_maxpid", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
}, | ||||||
{ | ||||||
"name": "node_rlimit_curproc", | ||||||
"fields": [ | ||||||
{ | ||||||
"field": "GAUGE" | ||||||
} | ||||||
], | ||||||
"tags": [ | ||||||
{ | ||||||
"key": "node", | ||||||
"value": "*" | ||||||
} | ||||||
] | ||||||
} | ||||||
], | ||||||
"type": "timeseries" | ||||||
} | ||||||
] | ||||||
} | ||||||
} |
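The review comment on the `line_label` of the Node CPU Usage graph asks that the metric name be left out of graphs that plot a single metric. A minimal Python sketch of that idea follows. The helper name `drop_name_from_single_metric_labels` and the trimmed `config` are hypothetical, and the resulting `%node%` label is an assumption about what the change would look like, since the suggested change itself is not shown here.

```python
import copy


def drop_name_from_single_metric_labels(config):
    """Return a copy of a dashboard config with shorter line labels.

    For visuals that plot exactly one metric, the "%name%" placeholder is
    removed from "line_label". Visuals with several metrics are unchanged.
    The resulting "%node%" label is an assumption, not the reviewer's text.
    """
    result = copy.deepcopy(config)
    for visual in result["dashboard"]["visuals"]:
        if len(visual.get("metrics", [])) == 1:
            label = visual.get("line_label", "")
            # Remove the placeholder and collapse any leftover whitespace.
            visual["line_label"] = " ".join(label.replace("%name%", "").split())
    return result


# Trimmed stand-in for the dashboard JSON in this pull request.
config = {
    "dashboard": {
        "visuals": [
            {
                "title": "Node CPU Usage",
                "line_label": "%name% %node%",
                "metrics": [{"name": "node_cpu_usage_nano_cores"}],
            },
            {
                "title": "Node Swap",
                "line_label": "%name% %node%",
                "metrics": [
                    {"name": "node_swap_available_bytes"},
                    {"name": "node_swap_usage_bytes"},
                ],
            },
        ]
    }
}

updated = drop_name_from_single_metric_labels(config)
for visual in updated["dashboard"]["visuals"]:
    print(visual["title"], "->", visual["line_label"])
```

Graphs with more than one metric keep `%name%` in this sketch, because the lines would otherwise be indistinguishable until the metrics are merged using tags.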
These descriptions don't explain anything to me. Let's remove them if it's just for testing.
Or if we can do human-readable descriptions, let's do those instead!
These are the internal metric names, so they'll probably make sense to Kubernetes users.
I can't find the names of these metrics anywhere, so I wouldn't be so sure that it's clear. Likewise, I can't find where you're using the Kubernetes metric names, as you say here. In the library I see them being mapped from a JSON struct that doesn't use the same naming: https://github.com/appsignal/appsignal-kubernetes/blob/0b3f39d65ba99622ab3e647e8e4c012ee944baca/src/main.rs#L97-L167
Do you have any links to docs or source code that mentions these metric names?