monitoring: Configure KSM & cluster dashboard #4116
Closed
NOTE: These changes have been added to the new flux2-monitoring-example repository: fluxcd/flux2-monitoring-example#1.
The motivation behind this change is to move the responsibility for custom resource metrics from the individual controllers to kube-state-metrics (KSM). The controllers will continue to export metrics about reconciliation and other controller-specific metrics. All metrics about resources that are available on the CRD are exported using KSM. This allows users to configure custom metrics to suit their needs without any changes in the controllers. It also allows us to have static resources that have no reconciler to report resource readiness metrics, yet still have the same monitoring capabilities. Examples are the HelmRepository source resource in OCI mode, the Alert and Provider notification resources, and other upcoming API resources that may not be backed by reconcilers.
Update the kube-prometheus-stack Helm release values to configure kube-state-metrics, and use kube-state-metrics to collect gotk resource state metrics.
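As a rough illustration of that values change, the fragment below sketches how the kube-state-metrics subchart could be configured to emit a `gotk_resource_info` metric for one resource kind. It is a minimal sketch, not the exact values from this PR. The choice of Kustomization, the `v1` API version, the help text, and the `name`, `ready` and `suspended` labels are assumptions made for illustration. The `exported_namespace` label and the `gotk` prefix match the names used in this description.

```yaml
# Hypothetical excerpt of kube-prometheus-stack HelmRelease values.
kube-state-metrics:
  rbac:
    extraRules:
      # KSM needs list/watch access to every custom resource it exports.
      - apiGroups: ["kustomize.toolkit.fluxcd.io"]
        resources: ["kustomizations"]
        verbs: ["list", "watch"]
  customResourceState:
    enabled: true
    config:
      kind: CustomResourceStateMetrics
      spec:
        resources:
          - groupVersionKind:
              group: kustomize.toolkit.fluxcd.io
              version: v1          # assumed API version
              kind: Kustomization
            metricNamePrefix: gotk # yields gotk_resource_info
            metrics:
              - name: resource_info
                help: "The current state of a GitOps Toolkit resource."
                each:
                  type: Info       # Info metrics always have the value 1
                  info:
                    labelsFromPath:
                      name: [metadata, name]
                labelsFromPath:
                  exported_namespace: [metadata, namespace]
                  ready: [status, conditions, "[type=Ready]", status]
                  suspended: [spec, suspend]
```

Each additional resource kind would need its own entry under `resources` and a matching RBAC rule. The linked KSM custom-resource state docs describe the full set of supported fields.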
The dashboard is identical to the existing dashboard with slight differences:

- `gotk_resource_info`: KSM issues a warning if an Info type object doesn't have the `_info` suffix. These metrics always have the value 1. This works well for the CRD state metrics, as a zero value would mean that the resource doesn't exist, in which case the resource is deleted.
- `gotk_resource_info` is used in the queries.
- The `$namespace` variable has been updated to refer to `exported_namespace` from `gotk_resource_info`.

Sample resource metrics from KSM:

Kube-state-metrics custom-resource state metrics docs: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/customresourcestate-metrics.md