Enforce series and sample limits on streaming queries to ingester from querier #3873

Closed · wants to merge 3 commits

Conversation

@bboreham (Contributor) commented Feb 24, 2021:

What this PR does:
Count how many series and samples have been received by the querier, and stop processing if the limit is exceeded.

I check the sample limit only when samples are received (not chunks) and only after de-duplicating, which means the limit isn't very good protection against memory blow-up.
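To make the mechanism concrete, here is a minimal, self-contained sketch of a post-deduplication sample count check. The types and names (`sample`, `timeseries`, `checkSampleLimit`, `maxSamples`) are illustrative stand-ins, not the PR's actual code; the real diff hunks appear later in the thread.

```go
package main

import "fmt"

// sample is one (timestamp, value) pair in a series.
type sample struct {
	TimestampMs int64
	Value       float64
}

// timeseries is a deduplicated series as the querier sees it.
type timeseries struct {
	Samples []sample
}

// checkSampleLimit errors once the running total of samples across
// deduplicated series exceeds maxSamples (0 disables the check).
func checkSampleLimit(series []timeseries, maxSamples int) error {
	total := 0
	for _, ts := range series {
		total += len(ts.Samples)
		if maxSamples > 0 && total > maxSamples {
			return fmt.Errorf("exceeded maximum number of samples in a query (limit %d)", maxSamples)
		}
	}
	return nil
}

func main() {
	series := []timeseries{{Samples: make([]sample, 150)}}
	fmt.Println(checkSampleLimit(series, 100)) // limit exceeded
}
```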

I also refactored the way userID is received by the Query() and QueryStream() functions: I changed it from an indirect property of the Context to an explicit parameter. I needed userID to find the limits and couldn't bear adding another call to tenant.TenantID().
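For illustration, a self-contained sketch of the refactor's before/after shape. The `tenantID` helper below is a stand-in for Cortex's tenant.TenantID(); the toy context key and function names are assumptions made only to keep the example runnable.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type ctxKey struct{}

// tenantID mimics tenant.TenantID: it digs the user out of the context.
func tenantID(ctx context.Context) (string, error) {
	id, ok := ctx.Value(ctxKey{}).(string)
	if !ok {
		return "", errors.New("no tenant ID in context")
	}
	return id, nil
}

// Before: the callee re-derives the tenant from the context.
func queryBefore(ctx context.Context) error {
	userID, err := tenantID(ctx)
	if err != nil {
		return err
	}
	fmt.Println("querying for", userID)
	return nil
}

// After: the caller resolves the tenant once and passes it explicitly,
// so the callee can look up per-tenant limits without another context dig.
func queryAfter(ctx context.Context, userID string) error {
	fmt.Println("querying for", userID)
	return nil
}

func main() {
	ctx := context.WithValue(context.Background(), ctxKey{}, "team-a")
	_ = queryBefore(ctx)
	userID, _ := tenantID(ctx)
	_ = queryAfter(ctx, userID)
}
```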

I haven't renamed the CLI flags from ingester.max...; I thought I would post this as a draft and see what people think.

Which issue(s) this PR fixes:
Partial fix for #3669

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Commit messages:

Instead of extracting it from the context. This way is less surprising, and less code.
Signed-off-by: Bryan Boreham <[email protected]>

For streaming queries (which are the default).
Signed-off-by: Bryan Boreham <[email protected]>
@pracucci (Contributor) left a comment:

A couple of things:

  • Please update the MaxSeriesPerQuery and MaxSamplesPerQuery CLI flag descriptions (and then run make doc)
  • Add a CHANGELOG entry

> I also refactored the way userID is received by the Query() and QueryStream() functions

Makes sense to me.

```diff
@@ -19,15 +20,15 @@ import (
 )

 // Query multiple ingesters and returns a Matrix of samples.
-func (d *Distributor) Query(ctx context.Context, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
+func (d *Distributor) Query(ctx context.Context, userID string, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
```
@pracucci (Contributor):

Shouldn't we enforce the limits here as well?

@bboreham (Contributor, Author):

Yes; my focus was on things that blow up, which a range query is much more likely to do.

@pracucci (Contributor):

Query() is used for range queries too, no? It's used when gRPC streaming is disabled, while QueryStream() is called when it's enabled.
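A toy sketch of the dispatch being described, assuming a boolean wired from a flag like -querier.ingester-streaming (the exact flag plumbing is an assumption here). Both paths serve range queries; only the wire format differs.

```go
package main

import "fmt"

// runQuery picks between the distributor's two entry points based on
// whether gRPC streaming from ingesters is enabled.
func runQuery(ingesterStreaming bool) string {
	if ingesterStreaming {
		// QueryStream(): series/chunks arrive as a stream of gRPC messages.
		return "QueryStream()"
	}
	// Query(): all samples arrive in a single Matrix response.
	return "Query()"
}

func main() {
	fmt.Println(runQuery(true))  // streaming enabled (the default)
	fmt.Println(runQuery(false)) // streaming disabled
}
```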

@bboreham (Contributor, Author):

Oh, sorry, I got confused.

For non-streaming queries and chunks, ingester enforces the limits already.

@pracucci (Contributor):

> For non-streaming queries and chunks, ingester enforces the limits already.

Where is it done? I can't find it.

```diff
@@ -204,6 +203,10 @@ func (d *Distributor) queryIngesterStream(ctx context.Context, replicationSet ri

 		result.Chunkseries = append(result.Chunkseries, resp.Chunkseries...)
 		result.Timeseries = append(result.Timeseries, resp.Timeseries...)
+
+		if len(result.Chunkseries) > maxSeries || len(result.Timeseries) > maxSeries {
+			return nil, fmt.Errorf("exceeded maximum number of series in a query (limit %d)", maxSeries)
```
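Worth noting about this hunk: the check runs each time a streamed response is appended, so the query aborts as soon as either the chunk-series or time-series count crosses maxSeries, rather than waiting for every ingester to finish responding.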
@pracucci (Contributor):

What are the implications for the returned status code? I have a feeling this may be detected as a storage error, so we would return a 5xx (while it should be a 4xx), but I haven't checked it deeply.

@bboreham (Contributor, Author):

Do you think it would work better as an httpgrpc error?

@pracucci (Contributor) commented Mar 18, 2021:

@bboreham I don't remember how the error code propagation works in detail, but I would start by testing it. Do you have time/interest to work on it? Otherwise I can take over, because I'm interested in this limit too.

@pracucci (Contributor):

I checked it and we have to return a validation.LimitError.
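A self-contained sketch of that fix. The local LimitError type below mirrors what I understand Cortex's validation.LimitError to be (a string type implementing error, which the query path recognizes and maps to a 4xx response instead of a generic 5xx storage error); it is re-declared here only to keep the example runnable.

```go
package main

import "fmt"

// LimitError mirrors validation.LimitError: a plain string type
// implementing error, distinguishable from storage errors.
type LimitError string

func (e LimitError) Error() string { return string(e) }

// checkSeriesLimit returns a LimitError once either series count
// crosses the per-tenant maximum.
func checkSeriesLimit(chunkSeries, timeSeries, maxSeries int) error {
	if chunkSeries > maxSeries || timeSeries > maxSeries {
		return LimitError(fmt.Sprintf("exceeded maximum number of series in a query (limit %d)", maxSeries))
	}
	return nil
}

func main() {
	fmt.Println(checkSeriesLimit(120, 0, 100)) // limit exceeded -> 4xx
}
```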

stale bot commented Jul 28, 2021:

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label on Jul 28, 2021
stale bot closed this on Aug 13, 2021
@bboreham (Contributor, Author):

The max-fetched-chunks-per-query limit from #4125 is close to a limit on samples (each chunk holds at most 120 samples), so this limit is not needed in practice.
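For example (illustrative numbers): with max-fetched-chunks-per-query set to 1,000,000, the samples fetched by a query are bounded by 1,000,000 × 120 = 120,000,000, which plays the same protective role a direct sample limit would.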
