Mimir: Add attributed prefix to labels of cost attribution metrics #10509

Open
wants to merge 5 commits into
base: main
Changes from 4 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -5,6 +5,7 @@
### Grafana Mimir

* [FEATURE] Ingester/Distributor: Add support for exporting cost attribution metrics (`cortex_ingester_attributed_active_series`, `cortex_distributor_received_attributed_samples_total`, and `cortex_discarded_attributed_samples_total`) with labels specified by customers to a custom Prometheus registry. This feature enables more flexible billing data tracking. #10269
+* [CHANGE] Ingester/Distributor: Prefix cost attribution metric labels with `attributed_` to prevent collisions with system cluster and namespace labels. #10509
Contributor

This is not needed because this new feature isn't included in any release yet.

Contributor Author

Addressed in 97a5d77.

* [CHANGE] Querier: pass context to queryable `IsApplicable` hook. #10451
* [CHANGE] Distributor: OTLP and push handler replace all non-UTF8 characters with the unicode replacement character `\uFFFD` in error messages before propagating them. #10236
* [CHANGE] Querier: pass query matchers to queryable `IsApplicable` hook. #10256
2 changes: 1 addition & 1 deletion cmd/mimir/config-descriptor.json
@@ -4414,7 +4414,7 @@
"kind": "field",
"name": "cost_attribution_labels",
"required": false,
-"desc": "Defines labels for cost attribution. Applies to metrics like cortex_distributor_received_attributed_samples_total. To disable, set to an empty string. For example, 'team,service' produces metrics such as cortex_distributor_received_attributed_samples_total{team='frontend', service='api'}.",
+"desc": "Defines labels for cost attribution. Applies to metrics like cortex_distributor_received_attributed_samples_total. To disable, set to an empty string. For example, 'team,service' produces metrics such as cortex_distributor_received_attributed_samples_total{attributed_team='frontend', attributed_service='api'}.",
"fieldValue": null,
"fieldDefaultValue": "",
"fieldFlag": "validation.cost-attribution-labels",
2 changes: 1 addition & 1 deletion cmd/mimir/help-all.txt.tmpl
@@ -3332,7 +3332,7 @@ Usage of ./cmd/mimir/mimir:
-validation.cost-attribution-cooldown duration
[experimental] Defines how long cost attribution stays in overflow before attempting a reset, with received/discarded samples extending the cooldown if overflow persists, while active series reset and restart tracking after the cooldown.
-validation.cost-attribution-labels comma-separated-list-of-strings
-[experimental] Defines labels for cost attribution. Applies to metrics like cortex_distributor_received_attributed_samples_total. To disable, set to an empty string. For example, 'team,service' produces metrics such as cortex_distributor_received_attributed_samples_total{team='frontend', service='api'}.
+[experimental] Defines labels for cost attribution. Applies to metrics like cortex_distributor_received_attributed_samples_total. To disable, set to an empty string. For example, 'team,service' produces metrics such as cortex_distributor_received_attributed_samples_total{attributed_team='frontend', attributed_service='api'}.
-validation.create-grace-period duration
Controls how far into the future incoming samples and exemplars are accepted compared to the wall clock. Any sample or exemplar will be rejected if its timestamp is greater than '(now + creation_grace_period)'. This configuration is enforced in the distributor and ingester. (default 10m)
-validation.enforce-metadata-metric-name
@@ -3623,8 +3623,8 @@ The `limits` block configures default and per-tenant limits imposed by component
# (experimental) Defines labels for cost attribution. Applies to metrics like
# cortex_distributor_received_attributed_samples_total. To disable, set to an
# empty string. For example, 'team,service' produces metrics such as
-# cortex_distributor_received_attributed_samples_total{team='frontend',
-# service='api'}.
+# cortex_distributor_received_attributed_samples_total{attributed_team='frontend',
+# attributed_service='api'}.
# CLI flag: -validation.cost-attribution-labels
[cost_attribution_labels: <string> | default = ""]

15 changes: 11 additions & 4 deletions pkg/costattribution/active_tracker.go
@@ -34,6 +34,14 @@ type ActiveSeriesTracker struct {
overflowCounter atomic.Int64
}

+func addLabelsPrefix(labels []string) []string {
+out := make([]string, 0, len(labels))
+for _, l := range labels {
+out = append(out, strings.Join([]string{usagePrefix, l}, "_"))
+}
+return out
+}
+
func newActiveSeriesTracker(userID string, trackedLabels []string, limit int, cooldownDuration time.Duration, logger log.Logger) *ActiveSeriesTracker {
// Create a map for overflow labels to export when overflow happens
overflowLabels := make([]string, len(trackedLabels)+2)
@@ -54,11 +62,10 @@ func newActiveSeriesTracker(userID string, trackedLabels []string, limit int, co
cooldownDuration: cooldownDuration,
}

-variableLabels := slices.Clone(trackedLabels)
-variableLabels = append(variableLabels, tenantLabel, "reason")
-
+labelsWithPrefix := addLabelsPrefix(trackedLabels)
+labelsWithPrefix = append(labelsWithPrefix, tenantLabel)
ast.activeSeriesPerUserAttribution = prometheus.NewDesc("cortex_ingester_attributed_active_series",
-"The total number of active series per user and attribution.", variableLabels[:len(variableLabels)-1],
+"The total number of active series per user and attribution.", labelsWithPrefix,
prometheus.Labels{trackerLabel: defaultTrackerName})
prometheus.Labels{trackerLabel: defaultTrackerName})

return ast
2 changes: 1 addition & 1 deletion pkg/costattribution/active_tracker_test.go
@@ -70,7 +70,7 @@ func TestActiveTracker_Concurrency(t *testing.T) {
expectedMetrics := `
# HELP cortex_ingester_attributed_active_series The total number of active series per user and attribution.
# TYPE cortex_ingester_attributed_active_series gauge
-cortex_ingester_attributed_active_series{team="__overflow__",tenant="user1",tracker="cost-attribution"} 100
+cortex_ingester_attributed_active_series{attributed_team="__overflow__",tenant="user1",tracker="cost-attribution"} 100
`
assert.NoError(t, testutil.GatherAndCompare(m.reg, strings.NewReader(expectedMetrics), "cortex_ingester_attributed_active_series"))

1 change: 1 addition & 0 deletions pkg/costattribution/manager.go
@@ -21,6 +21,7 @@ const (
defaultTrackerName = "cost-attribution"
missingValue = "__missing__"
overflowValue = "__overflow__"
+usagePrefix = "attributed"
Contributor

Nit, but I would expect the prefix constant to be `attributed_`, including the trailing underscore.

Contributor Author

addressed in 97a5d77

)

type Manager struct {
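
One way to read the review note on `usagePrefix` is that the constant could carry the underscore itself, so the helper concatenates instead of calling `strings.Join`. The sketch below is a hypothetical variant written for illustration; it is not necessarily what commit 97a5d77 contains.

```go
package main

import "fmt"

// usagePrefix includes the trailing underscore, as the reviewer suggested.
const usagePrefix = "attributed_"

// addLabelsPrefix is a hypothetical rewrite that concatenates the prefix
// directly rather than joining two strings with "_".
func addLabelsPrefix(labels []string) []string {
	out := make([]string, 0, len(labels))
	for _, l := range labels {
		out = append(out, usagePrefix+l)
	}
	return out
}

func main() {
	fmt.Println(addLabelsPrefix([]string{"team", "department"}))
	// [attributed_team attributed_department]
}
```

The resulting label names are the same as with the `strings.Join` version shown in this revision; only the place where the underscore lives changes.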
24 changes: 12 additions & 12 deletions pkg/costattribution/manager_test.go
@@ -60,14 +60,14 @@ func TestManager_CreateDeleteTracker(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{reason="invalid-metrics-name",team="bar",tenant="user1",tracker="cost-attribution"} 1
-cortex_discarded_attributed_samples_total{reason="invalid-metrics-name",team="foo",tenant="user1",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_team="bar",reason="invalid-metrics-name",tenant="user1",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_team="foo",reason="invalid-metrics-name",tenant="user1",tracker="cost-attribution"} 1
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{department="foo",service="dodo",tenant="user3",tracker="cost-attribution"} 1
+cortex_distributor_received_attributed_samples_total{attributed_department="foo",attributed_service="dodo",tenant="user3",tracker="cost-attribution"} 1
# HELP cortex_ingester_attributed_active_series The total number of active series per user and attribution.
# TYPE cortex_ingester_attributed_active_series gauge
-cortex_ingester_attributed_active_series{team="bar",tenant="user1",tracker="cost-attribution"} 1
+cortex_ingester_attributed_active_series{attributed_team="bar",tenant="user1",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total", "cortex_distributor_received_attributed_samples_total", "cortex_ingester_attributed_active_series"))
})
@@ -78,10 +78,10 @@ func TestManager_CreateDeleteTracker(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{reason="invalid-metrics-name",team="foo",tenant="user1",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_team="foo",reason="invalid-metrics-name",tenant="user1",tracker="cost-attribution"} 1
# HELP cortex_ingester_attributed_active_series The total number of active series per user and attribution.
# TYPE cortex_ingester_attributed_active_series gauge
-cortex_ingester_attributed_active_series{team="bar",tenant="user1",tracker="cost-attribution"} 1
+cortex_ingester_attributed_active_series{attributed_team="bar",tenant="user1",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total", "cortex_ingester_attributed_active_series"))
})
@@ -96,7 +96,7 @@ func TestManager_CreateDeleteTracker(t *testing.T) {
expectedMetrics := `
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{department="foo",service="dodo",tenant="user3",tracker="cost-attribution"} 1
+cortex_distributor_received_attributed_samples_total{attributed_department="foo",attributed_service="dodo",tenant="user3",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total", "cortex_distributor_received_attributed_samples_total", "cortex_ingester_attributed_active_series"))
})
@@ -114,7 +114,7 @@ func TestManager_CreateDeleteTracker(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{feature="__missing__",reason="invalid-metrics-name",team="foo",tenant="user3",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_feature="__missing__",attributed_team="foo",reason="invalid-metrics-name",tenant="user3",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total"))
})
@@ -126,7 +126,7 @@ func TestManager_CreateDeleteTracker(t *testing.T) {
expectedMetrics := `
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{feature="__overflow__",team="__overflow__",tenant="user3",tracker="cost-attribution"} 2
+cortex_distributor_received_attributed_samples_total{attributed_feature="__overflow__",attributed_team="__overflow__",tenant="user3",tracker="cost-attribution"} 2
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_distributor_received_attributed_samples_total"))
})
@@ -146,8 +146,8 @@ func TestManager_PurgeInactiveAttributionsUntil(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{reason="invalid-metrics-name",team="foo",tenant="user1",tracker="cost-attribution"} 1
-cortex_discarded_attributed_samples_total{department="foo",reason="out-of-window",service="bar",tenant="user3",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_team="foo",reason="invalid-metrics-name",tenant="user1",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_department="foo",attributed_service="bar",reason="out-of-window",tenant="user3",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total"))
})
@@ -165,7 +165,7 @@ func TestManager_PurgeInactiveAttributionsUntil(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{department="foo",reason="out-of-window",service="bar",tenant="user3",tracker="cost-attribution"} 1
+cortex_discarded_attributed_samples_total{attributed_department="foo",attributed_service="bar",reason="out-of-window",tenant="user3",tracker="cost-attribution"} 1
`
assert.NoError(t, testutil.GatherAndCompare(manager.reg, strings.NewReader(expectedMetrics), "cortex_discarded_attributed_samples_total"))
})
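
The expected output in these tests changes in label order as well as label names: `reason` used to come before `team`, and `attributed_team` now comes before `reason`. That is consistent with label pairs being written in lexicographic order of their names. The sketch below reproduces that ordering with a hypothetical `renderLabels` helper; it only approximates the exposition output and does not use the Prometheus client library.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderLabels is a hypothetical helper that formats label pairs sorted by
// label name, approximating the ordering seen in the expected test output.
func renderLabels(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)

	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", n, labels[n]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

func main() {
	before := map[string]string{
		"team": "foo", "reason": "invalid-metrics-name",
		"tenant": "user1", "tracker": "cost-attribution",
	}
	after := map[string]string{
		"attributed_team": "foo", "reason": "invalid-metrics-name",
		"tenant": "user1", "tracker": "cost-attribution",
	}
	fmt.Println(renderLabels(before))
	fmt.Println(renderLabels(after))
}
```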
9 changes: 5 additions & 4 deletions pkg/costattribution/sample_tracker.go
@@ -66,16 +66,17 @@ func newSampleTracker(userID string, trackedLabels []string, limit int, cooldown
overflowCounter: observation{},
}

-variableLabels := slices.Clone(trackedLabels)
-variableLabels = append(variableLabels, tenantLabel, "reason")
+labelsWithPrefix := addLabelsPrefix(trackedLabels)
+labelsWithPrefix = append(labelsWithPrefix, tenantLabel, "reason")
+
tracker.discardedSampleAttribution = prometheus.NewDesc("cortex_discarded_attributed_samples_total",
"The total number of samples that were discarded per attribution.",
-variableLabels,
+labelsWithPrefix,
prometheus.Labels{trackerLabel: defaultTrackerName})

tracker.receivedSamplesAttribution = prometheus.NewDesc("cortex_distributor_received_attributed_samples_total",
"The total number of samples that were received per attribution.",
-variableLabels[:len(variableLabels)-1],
+labelsWithPrefix[:len(labelsWithPrefix)-1],
prometheus.Labels{trackerLabel: defaultTrackerName})
prometheus.Labels{trackerLabel: defaultTrackerName})
return tracker
}
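
The two descriptors in `newSampleTracker` share one label slice: the discarded-samples metric uses all of it, while the received-samples metric reslices off the trailing `reason` entry. The sketch below shows that pattern in isolation. The function name `sampleTrackerLabels` and the literal values `"attributed_"`, `"tenant"`, and `"reason"` are assumptions for illustration, chosen to match the label names in the test expectations.

```go
package main

import "fmt"

// sampleTrackerLabels is a hypothetical helper that mirrors the slicing in
// newSampleTracker: it returns the label names for the discarded-samples
// metric and, by dropping the last element, for the received-samples metric.
func sampleTrackerLabels(tracked []string) (discarded, received []string) {
	labels := make([]string, 0, len(tracked)+2)
	for _, l := range tracked {
		labels = append(labels, "attributed_"+l)
	}
	labels = append(labels, "tenant", "reason")

	// received shares the backing array with discarded; both must be
	// treated as read-only afterwards, since appending to received would
	// overwrite the "reason" entry seen through discarded.
	return labels, labels[:len(labels)-1]
}

func main() {
	discarded, received := sampleTrackerLabels([]string{"team"})
	fmt.Println(discarded) // [attributed_team tenant reason]
	fmt.Println(received)  // [attributed_team tenant]
}
```

The reslice only works because `reason` is appended last, so the order of the `append` arguments matters here.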
15 changes: 7 additions & 8 deletions pkg/costattribution/sample_tracker_test.go
@@ -30,7 +30,7 @@ func TestSampleTracker_IncrementReceviedSamples(t *testing.T) {
expectedMetrics := `
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{platform="foo",tenant="user4",tracker="cost-attribution"} 3
+cortex_distributor_received_attributed_samples_total{attributed_platform="foo",tenant="user4",tracker="cost-attribution"} 3
`
assert.NoError(t, testutil.GatherAndCompare(tManager.reg, strings.NewReader(expectedMetrics), "cortex_distributor_received_attributed_samples_total"))
})
@@ -43,8 +43,8 @@ func TestSampleTracker_IncrementReceviedSamples(t *testing.T) {
expectedMetrics := `
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{platform="foo",tenant="user4",tracker="cost-attribution"} 6
-cortex_distributor_received_attributed_samples_total{platform="bar",tenant="user4",tracker="cost-attribution"} 5
+cortex_distributor_received_attributed_samples_total{attributed_platform="foo",tenant="user4",tracker="cost-attribution"} 6
+cortex_distributor_received_attributed_samples_total{attributed_platform="bar",tenant="user4",tracker="cost-attribution"} 5
`
assert.NoError(t, testutil.GatherAndCompare(tManager.reg, strings.NewReader(expectedMetrics), "cortex_distributor_received_attributed_samples_total"))
})
@@ -58,8 +58,8 @@ func TestSampleTracker_IncrementReceviedSamples(t *testing.T) {
expectedMetrics := `
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{platform="foo",tenant="user4",tracker="cost-attribution"} 14
-cortex_distributor_received_attributed_samples_total{platform="bar",tenant="user4",tracker="cost-attribution"} 5
+cortex_distributor_received_attributed_samples_total{attributed_platform="foo",tenant="user4",tracker="cost-attribution"} 14
+cortex_distributor_received_attributed_samples_total{attributed_platform="bar",tenant="user4",tracker="cost-attribution"} 5
`
assert.NoError(t, testutil.GatherAndCompare(tManager.reg, strings.NewReader(expectedMetrics), "cortex_distributor_received_attributed_samples_total"))
})
@@ -148,11 +148,10 @@ func TestSampleTracker_Concurrency(t *testing.T) {
expectedMetrics := `
# HELP cortex_discarded_attributed_samples_total The total number of samples that were discarded per attribution.
# TYPE cortex_discarded_attributed_samples_total counter
-cortex_discarded_attributed_samples_total{reason="__overflow__",team="__overflow__",tenant="user1",tracker="cost-attribution"} 95
+cortex_discarded_attributed_samples_total{attributed_team="__overflow__",reason="__overflow__",tenant="user1",tracker="cost-attribution"} 95
# HELP cortex_distributor_received_attributed_samples_total The total number of samples that were received per attribution.
# TYPE cortex_distributor_received_attributed_samples_total counter
-cortex_distributor_received_attributed_samples_total{team="__overflow__",tenant="user1",tracker="cost-attribution"} 95
-
+cortex_distributor_received_attributed_samples_total{attributed_team="__overflow__",tenant="user1",tracker="cost-attribution"} 95
`
assert.NoError(t, testutil.GatherAndCompare(m.reg, strings.NewReader(expectedMetrics), "cortex_distributor_received_attributed_samples_total", "cortex_discarded_attributed_samples_total"))
}