Poc cost attribution #9392

ying-jeanne · 2024-09-24T14:51:06Z

What this PR does

A prototype that supports adding a custom label to track active series, received samples, and discarded samples. Related to ticket #5698.

The cost attribution metrics would be exported through a separate endpoint.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

Tests updated.
Documentation added.
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
about-versioning.md updated with experimental features.

pkg/distributor/distributor.go

pkg/ingester/activeseries/active_series.go

pkg/ingester/ingester.go

pkg/util/cost_attribution.go

pkg/util/validation/limits.go

pkg/util/validation/separate_metrics.go

pkg/ingester/activeseries/active_series.go

bboreham · 2024-09-25T09:46:48Z

Glad to see this make some progress. Ideally I would like to see this kind of implementation take over from usage groups, in simple cases. So it would be good to have the label names, etc., line up.

pkg/mimir/mimir.go

Logiraptor · 2024-09-25T21:08:32Z

pkg/ingester/metrics.go

@@ -303,7 +303,7 @@ func newIngesterMetrics(
 		activeSeriesPerUser: promauto.With(activeSeriesReg).NewGaugeVec(prometheus.GaugeOpts{
 			Name: "cortex_ingester_active_series",
 			Help: "Number of currently active series per user.",
-		}, []string{"user"}),
+		}, []string{"user", "attrib"}),


To follow the pattern set by GEL, we would create a new metric instead of adding to the existing one.

This new metric would be exposed on a separate endpoint, /usage_metrics, apart from /metrics.

The label name should be dynamic and match the configured name, instead of using the constant name attrib.

Thanks, this is fixed in the current PR.
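
For illustration, the two requests above (a separate endpoint and a label name taken from configuration) can be sketched with the standard library alone. This is a minimal sketch, not the PR's implementation: the metric name `example_attributed_active_series`, the helpers `usageHandler`, `newMux` and `scrape`, and the sample values are all assumptions, and the per-user label is left out for brevity.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sort"
)

// usageHandler serves a hypothetical cost attribution gauge in the Prometheus
// text exposition format. labelName comes from configuration instead of being
// a constant such as "attrib".
func usageHandler(labelName string, activeSeries map[string]int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		values := make([]string, 0, len(activeSeries))
		for v := range activeSeries {
			values = append(values, v)
		}
		sort.Strings(values) // keep the output order stable

		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		// Assumed metric name, kept distinct from cortex_ingester_active_series.
		fmt.Fprintln(w, "# TYPE example_attributed_active_series gauge")
		for _, v := range values {
			fmt.Fprintf(w, "example_attributed_active_series{%s=%q} %d\n", labelName, v, activeSeries[v])
		}
	})
}

// newMux registers the handler on /usage_metrics only, so the regular
// /metrics endpoint is untouched by cost attribution.
func newMux(labelName string, activeSeries map[string]int) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/usage_metrics", usageHandler(labelName, activeSeries))
	return mux
}

// scrape issues an in-memory GET request and returns the status code and body.
func scrape(mux *http.ServeMux, path string) (int, string) {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux("team", map[string]int{"payments": 3, "search": 7})
	_, body := scrape(mux, "/usage_metrics")
	fmt.Print(body)
}
```

With the label configured as `team`, the rendered series carry `team="..."` rather than `attrib="..."`, and nothing is added to the existing per-user gauge.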

pkg/util/cost_attribution.go

colega

Made a quick pass and left some comments. I didn't finish reviewing everything because I have two big concerns right now:

The current implementation will use quite a lot of memory in the active series tracker; I left comments on that.
I don't think the current implementation will correctly handle the cost attribution label being changed and then changed back, i.e.:
- Current label is a, a series is created.
- Label is changed to b, counters in the series stripes are removed.
- Label is changed back to a.
- Active series are purged. Since the label is the same as when the series was created, the counters, which are already zero, are decremented and go negative.
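
The sequence above can be reproduced with a small simulation. This is a simplified sketch, not the PR's code: `tracker`, `series`, `purgeNaive` and `purgeGuarded` are hypothetical names, and the generation counter is only one assumed way to guard against the problem, not something proposed in the review.

```go
package main

import "fmt"

// tracker is a hypothetical stand-in for the per-stripe attribution counters.
type tracker struct {
	label      string         // currently configured cost attribution label
	generation int            // bumped on every label change (assumed guard)
	counts     map[string]int // active series per attribution value
}

// series remembers what was configured when it was created.
type series struct {
	value      string
	label      string
	generation int
}

func newTracker(label string) *tracker {
	return &tracker{label: label, counts: map[string]int{}}
}

// setLabel drops all counters, as described for a label change.
func (t *tracker) setLabel(label string) {
	if label == t.label {
		return
	}
	t.label = label
	t.generation++
	t.counts = map[string]int{}
}

func (t *tracker) add(value string) series {
	t.counts[value]++
	return series{value: value, label: t.label, generation: t.generation}
}

// purgeNaive compares label names only, which reproduces the concern.
func (t *tracker) purgeNaive(s series) {
	if s.label == t.label {
		t.counts[s.value]--
	}
}

// purgeGuarded decrements only if the counters were never reset in between.
func (t *tracker) purgeGuarded(s series) {
	if s.generation == t.generation {
		t.counts[s.value]--
	}
}

// scenario runs a -> b -> a and then purges the series created under a.
func scenario(guarded bool) int {
	t := newTracker("a")
	s := t.add("team-a")
	t.setLabel("b")
	t.setLabel("a")
	if guarded {
		t.purgeGuarded(s)
	} else {
		t.purgeNaive(s)
	}
	return t.counts["team-a"]
}

func main() {
	fmt.Println("naive:", scenario(false))  // naive: -1
	fmt.Println("guarded:", scenario(true)) // guarded: 0
}
```

The sketch uses a signed `int` so the negative value is visible; with an unsigned counter the same decrement would wrap around instead.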

colega · 2024-10-09T11:18:55Z

pkg/costattribution/caimpl/managerImpl.go

@@ -0,0 +1,158 @@
+package caimpl


Why do we need this subpackage?

To isolate the interface from the implementations: every function that needs to be called from outside costattribution would be declared in the interface file manager.go, and callers would depend on that package only.

I personally find this approach strange and unnecessary. Why do we need to isolate that? We don't do that anywhere.

I would move everything into the costattribution package.

Some projects do this to avoid import cycles and to make mocking through the interface easy. 👍 Since we don't use this pattern in Mimir, I will move it into the costattribution package as requested.

I would suggest building the mocks in a different package, so production code just needs its production package, but test packages would import the mock packages. If you use mockery to generate them, you can use --outpkg and --output flags for that.

Otherwise you'd need to put your mocks in the production interface package, and import two packages from your prod package.

Do we have a preference in Mimir? I will follow our guidelines. In grafana/grafana people are against mockery and prefer manual fakes.

I don't think we've used mockery so far, but I'm not telling you to use mockery; the tool you use for the mock is orthogonal to where the mock is placed.
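
As a rough illustration of the placement being discussed, a hand-written fake could look like the sketch below. Everything here is hypothetical: the `Manager` interface is reduced to one invented method, `ingest` stands in for production code, and the example is collapsed into a single file so that it runs. In a real repository the fake would sit in its own test-only package and be imported only by tests.

```go
package main

import "fmt"

// Manager is a hypothetical, one-method slice of the production interface.
// It would live in the production package.
type Manager interface {
	IncrementReceivedSamples(userID, attributionValue string, n int)
}

// ingest stands in for production code that depends only on the interface.
func ingest(m Manager, userID, attributionValue string, samples int) {
	m.IncrementReceivedSamples(userID, attributionValue, samples)
}

// FakeManager is a hand-written fake that records what it was asked to do.
// It would live in a separate package imported only by tests.
type FakeManager struct {
	Received map[string]int // keyed by "userID/attributionValue"
}

func NewFakeManager() *FakeManager {
	return &FakeManager{Received: map[string]int{}}
}

func (f *FakeManager) IncrementReceivedSamples(userID, attributionValue string, n int) {
	f.Received[userID+"/"+attributionValue] += n
}

// Compile-time check that the fake keeps satisfying the interface.
var _ Manager = (*FakeManager)(nil)

func main() {
	fake := NewFakeManager()
	ingest(fake, "tenant-1", "payments", 5)
	ingest(fake, "tenant-1", "payments", 2)
	fmt.Println(fake.Received["tenant-1/payments"]) // 7
}
```

The same layout applies whether the fake is written by hand or generated; only the package it lands in matters for keeping production imports clean.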

pkg/distributor/distributor.go

pkg/costattribution/caimpl/managerImpl.go

colega · 2024-10-09T11:24:01Z

pkg/ingester/activeseries/active_series.go

@@ -73,23 +77,38 @@ type seriesStripe struct {
 	activeMatchingNativeHistograms       []uint32 // Number of active entries (only native histograms) in this stripe matching each matcher of the configured Matchers.
 	activeNativeHistogramBuckets         uint32   // Number of buckets in active native histogram entries in this stripe. Only decreased during purge or clear.
 	activeMatchingNativeHistogramBuckets []uint32 // Number of buckets in active native histogram entries in this stripe matching each matcher of the configured Matchers.
+	userID                               string


Can we avoid copying the user in all stripes?
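
One way to read this suggestion is to keep the user ID once on the parent and hand it to a stripe only when a stripe method needs it. The sketch below uses pared-down, hypothetical types (`stripe`, `activeSeries`, `describe`, `report`, and a made-up `numStripes`); it is not the actual `seriesStripe` code.

```go
package main

import "fmt"

// numStripes is an arbitrary value chosen for the example.
const numStripes = 4

// stripe is a pared-down stand-in for seriesStripe with no userID field.
type stripe struct {
	active uint32
}

// describe receives the user ID as an argument instead of storing a copy.
func (s *stripe) describe(userID string) string {
	return fmt.Sprintf("user=%s active=%d", userID, s.active)
}

// activeSeries owns the user ID once and shares it with its stripes on demand.
type activeSeries struct {
	userID  string
	stripes [numStripes]stripe
}

func (c *activeSeries) report() []string {
	out := make([]string, 0, numStripes)
	for i := range c.stripes {
		out = append(out, c.stripes[i].describe(c.userID))
	}
	return out
}

func main() {
	c := &activeSeries{userID: "tenant-1"}
	c.stripes[2].active = 5
	for _, line := range c.report() {
		fmt.Println(line)
	}
}
```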

pkg/ingester/activeseries/active_series.go

colega · 2024-10-09T11:29:16Z

pkg/ingester/activeseries/active_series.go

@@ -212,6 +231,20 @@ func (c *ActiveSeries) ActiveWithMatchers() (total int, totalMatching []int, tot
 	return
 }

+func (c *ActiveSeries) ActiveByAttributionValue(calb string) map[string]uint32 {


Suggested change

-func (c *ActiveSeries) ActiveByAttributionValue(calb string) map[string]uint32 {
+func (c *ActiveSeries) ActiveByAttributionValue(label string) map[string]uint32 {

pkg/ingester/activeseries/active_series.go

colega · 2024-10-09T11:35:39Z

pkg/ingester/activeseries/active_series.go

@@ -456,6 +514,8 @@ func (s *seriesStripe) purge(keepUntil time.Time) {
 				s.deleted.purge(ref)
 			}
 			delete(s.refs, ref)
+			// here need to find what is deleted and decrement counters


This sounds like a TODO, please use a TODO comment.

pkg/costattribution/caimpl/tracker.go

pkg/costattribution/caimpl/tracker_group.go