Enforce series and sample limits on streaming queries to ingester from querier #3873

Closed · wants to merge 3 commits

Conversation

@bboreham (Contributor) commented Feb 24, 2021:

What this PR does:
Count how many series and samples have been received by the querier, and stop processing if the limit is exceeded.

I check the sample limit only when samples are received (not chunks) and only after de-duplicating, which means the limit isn't very good protection against memory blow-up.
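To make the mechanism concrete, here is a minimal, self-contained sketch of a post-deduplication sample count check. The types and names (`sample`, `timeseries`, `checkSampleLimit`, `maxSamples`) are illustrative stand-ins, not the PR's actual code; the real diff hunks appear later in the thread.

```go
package main

import "fmt"

// sample is one (timestamp, value) pair in a series.
type sample struct {
	TimestampMs int64
	Value       float64
}

// timeseries is a deduplicated series as the querier sees it.
type timeseries struct {
	Samples []sample
}

// checkSampleLimit errors once the running total of samples across
// deduplicated series exceeds maxSamples (0 disables the check).
func checkSampleLimit(series []timeseries, maxSamples int) error {
	total := 0
	for _, ts := range series {
		total += len(ts.Samples)
		if maxSamples > 0 && total > maxSamples {
			return fmt.Errorf("exceeded maximum number of samples in a query (limit %d)", maxSamples)
		}
	}
	return nil
}

func main() {
	series := []timeseries{{Samples: make([]sample, 150)}}
	fmt.Println(checkSampleLimit(series, 100)) // limit exceeded
}
```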

I also refactored the way userID is received by the Query() and QueryStream() functions: I changed it from an indirect property of the Context to an explicit parameter. I needed userID to find the limits and couldn't bear adding another call to tenant.TenantID().
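For illustration, a self-contained sketch of the refactor's before/after shape. The `tenantID` helper below is a stand-in for Cortex's tenant.TenantID(); the toy context key and function names are assumptions made only to keep the example runnable.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type ctxKey struct{}

// tenantID mimics tenant.TenantID: it digs the user out of the context.
func tenantID(ctx context.Context) (string, error) {
	id, ok := ctx.Value(ctxKey{}).(string)
	if !ok {
		return "", errors.New("no tenant ID in context")
	}
	return id, nil
}

// Before: the callee re-derives the tenant from the context.
func queryBefore(ctx context.Context) error {
	userID, err := tenantID(ctx)
	if err != nil {
		return err
	}
	fmt.Println("querying for", userID)
	return nil
}

// After: the caller resolves the tenant once and passes it explicitly,
// so the callee can look up per-tenant limits without another context dig.
func queryAfter(ctx context.Context, userID string) error {
	fmt.Println("querying for", userID)
	return nil
}

func main() {
	ctx := context.WithValue(context.Background(), ctxKey{}, "team-a")
	_ = queryBefore(ctx)
	userID, _ := tenantID(ctx)
	_ = queryAfter(ctx, userID)
}
```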

I haven't renamed the CLI flags from ingester.max...; I thought I would post this as a draft and see what people think.

Which issue(s) this PR fixes:
Partial fix for #3669

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Commit messages:

Instead of extracting it from the context. This way is less surprising, and less code.
Signed-off-by: Bryan Boreham <[email protected]>

For streaming queries (which are the default).
Signed-off-by: Bryan Boreham <[email protected]>
@pracucci (Contributor) left a comment:

A couple of things:

  • Please update the MaxSeriesPerQuery and MaxSamplesPerQuery CLI flag descriptions (and then run make doc)
  • Add a CHANGELOG entry

> I also refactored the way userID is received by the Query() and QueryStream() functions

Makes sense to me.

```diff
@@ -19,15 +20,15 @@ import (
 )

 // Query multiple ingesters and returns a Matrix of samples.
-func (d *Distributor) Query(ctx context.Context, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
+func (d *Distributor) Query(ctx context.Context, userID string, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
```
@pracucci (Contributor):

Shouldn't we enforce the limits here as well?

@bboreham (Contributor, Author):

Yes; my focus was on things that blow up, which a range query is much more likely to do.

@pracucci (Contributor):

Query() is used for range queries too, no? It's used when gRPC streaming is disabled, while QueryStream() is called when it's enabled.
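A toy sketch of the dispatch being described, assuming a boolean wired from a flag like -querier.ingester-streaming (the exact flag plumbing is an assumption here). Both paths serve range queries; only the wire format differs.

```go
package main

import "fmt"

// runQuery picks between the distributor's two entry points based on
// whether gRPC streaming from ingesters is enabled.
func runQuery(ingesterStreaming bool) string {
	if ingesterStreaming {
		// QueryStream(): series/chunks arrive as a stream of gRPC messages.
		return "QueryStream()"
	}
	// Query(): all samples arrive in a single Matrix response.
	return "Query()"
}

func main() {
	fmt.Println(runQuery(true))  // streaming enabled (the default)
	fmt.Println(runQuery(false)) // streaming disabled
}
```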

@bboreham (Contributor, Author):

Oh, sorry, I got confused.

For non-streaming queries and chunks, ingester enforces the limits already.

@pracucci (Contributor):

> For non-streaming queries and chunks, ingester enforces the limits already.

Where is it done? I can't find it.

```diff
@@ -204,6 +203,10 @@ func (d *Distributor) queryIngesterStream(ctx context.Context, replicationSet ri

 		result.Chunkseries = append(result.Chunkseries, resp.Chunkseries...)
 		result.Timeseries = append(result.Timeseries, resp.Timeseries...)
+
+		if len(result.Chunkseries) > maxSeries || len(result.Timeseries) > maxSeries {
+			return nil, fmt.Errorf("exceeded maximum number of series in a query (limit %d)", maxSeries)
```
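Worth noting about this hunk: the check runs each time a streamed response is appended, so the query aborts as soon as either the chunk-series or time-series count crosses maxSeries, rather than waiting for every ingester to finish responding.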
@pracucci (Contributor):

What are the implications for the returned status code? I have a feeling this may be detected as a storage error, so we would return a 5xx (while it should be a 4xx), but I haven't checked it deeply.

@bboreham (Contributor, Author):

Do you think it would work better as an httpgrpc error?

@pracucci (Contributor) commented Mar 18, 2021:

@bboreham I don't remember how the error code propagation works in detail, but I would start by testing it. Do you have time/interest to work on it? Otherwise I can take over, because I'm interested in this limit too.

@pracucci (Contributor):

I checked it and we have to return a validation.LimitError.
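A self-contained sketch of that fix. The local LimitError type below mirrors what I understand Cortex's validation.LimitError to be (a string type implementing error, which the query path recognizes and maps to a 4xx response instead of a generic 5xx storage error); it is re-declared here only to keep the example runnable.

```go
package main

import "fmt"

// LimitError mirrors validation.LimitError: a plain string type
// implementing error, distinguishable from storage errors.
type LimitError string

func (e LimitError) Error() string { return string(e) }

// checkSeriesLimit returns a LimitError once either series count
// crosses the per-tenant maximum.
func checkSeriesLimit(chunkSeries, timeSeries, maxSeries int) error {
	if chunkSeries > maxSeries || timeSeries > maxSeries {
		return LimitError(fmt.Sprintf("exceeded maximum number of series in a query (limit %d)", maxSeries))
	}
	return nil
}

func main() {
	fmt.Println(checkSeriesLimit(120, 0, 100)) // limit exceeded -> 4xx
}
```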

stale bot commented Jul 28, 2021:

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jul 28, 2021
stale bot closed this on Aug 13, 2021
@bboreham (Contributor, Author):

The max-fetched-chunks-per-query limit from #4125 is close to a limit on samples (each chunk holds at most 120 samples), so this limit is not needed in practice.
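For example (illustrative numbers): with max-fetched-chunks-per-query set to 1,000,000, the samples fetched by a query are bounded by 1,000,000 × 120 = 120,000,000, which plays the same protective role a direct sample limit would.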
