Enforce series and sample limits on streaming queries to ingester from querier #3873
Conversation
Instead of extracting it from the context. This way is less surprising, and less code. Signed-off-by: Bryan Boreham <[email protected]>
For streaming queries (which are the default) Signed-off-by: Bryan Boreham <[email protected]>
Signed-off-by: Bryan Boreham <[email protected]>
512ee27 to 0a08d5a (Compare)
A couple of things:
- Please update the MaxSeriesPerQuery and MaxSamplesPerQuery CLI flags description (and then `make doc`)
- Add a CHANGELOG entry
> I also refactored the way `userID` is received by `Query()` and `QueryStream()` functions

Makes sense to me.
```diff
@@ -19,15 +20,15 @@ import (
 )

 // Query multiple ingesters and returns a Matrix of samples.
-func (d *Distributor) Query(ctx context.Context, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
+func (d *Distributor) Query(ctx context.Context, userID string, from, to model.Time, matchers ...*labels.Matcher) (model.Matrix, error) {
```
Shouldn't we enforce the limits here as well?
Yes; my focus was on things that blow up, which a range query is much more likely to do.
The `Query()` is used for range queries too, no? It's used when "gRPC streaming" is disabled, while `QueryStream()` is called when it's enabled.
Oh, sorry, I got confused.
For non-streaming queries and chunks, the ingester already enforces the limits.
> For non-streaming queries and chunks, the ingester already enforces the limits.

Where is it done? I can't find it.
```diff
@@ -204,6 +203,10 @@ func (d *Distributor) queryIngesterStream(ctx context.Context, replicationSet ri

 			result.Chunkseries = append(result.Chunkseries, resp.Chunkseries...)
 			result.Timeseries = append(result.Timeseries, resp.Timeseries...)

+			if len(result.Chunkseries) > maxSeries || len(result.Timeseries) > maxSeries {
+				return nil, fmt.Errorf("exceeded maximum number of series in a query (limit %d)", maxSeries)
```
What are the implications for the returned status code? I have the feeling this may be detected as a storage error and we'd return a 5xx error (while it should be a 4xx), but I haven't checked it deeply.
Do you think it would work better as an httpgrpc error?
@bboreham I don't remember how the error code propagation works in detail, but I would start by testing it. Do you have time/interest to work on it? Otherwise I can take over, since I'm interested in this limit too.
I checked it and we have to return a `validation.LimitError`.
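To make the failure mode concrete, here is a minimal, self-contained sketch of the idea; `LimitError`, `checkSeriesLimit`, and `statusFor` below are hypothetical stand-ins for Cortex's `validation.LimitError` and the front-end's status mapping, not the actual implementation:

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// LimitError stands in for validation.LimitError: a distinct error type the
// HTTP layer can recognise and translate into a 4xx status, instead of
// treating the failure as an internal storage error (5xx).
type LimitError string

func (e LimitError) Error() string { return string(e) }

// checkSeriesLimit mirrors the check added in queryIngesterStream, but returns
// the typed error rather than a plain fmt.Errorf.
func checkSeriesLimit(seriesCount, maxSeries int) error {
	if maxSeries > 0 && seriesCount > maxSeries {
		return LimitError(fmt.Sprintf("exceeded maximum number of series in a query (limit %d)", maxSeries))
	}
	return nil
}

// statusFor shows how a handler could choose the response code based on the
// error type.
func statusFor(err error) int {
	var limitErr LimitError
	if errors.As(err, &limitErr) {
		return http.StatusUnprocessableEntity // a per-tenant limit, not a server fault
	}
	return http.StatusInternalServerError
}

func main() {
	err := checkSeriesLimit(1200, 1000)
	fmt.Println(statusFor(err), err)
	// 422 exceeded maximum number of series in a query (limit 1000)
}
```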
This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.
What this PR does:
Count how many series and samples have been received by the querier, and stop processing if the limit is exceeded.
I check the sample limit only when samples are received (not chunks) and only after de-duplicating, which means the limit isn't very good protection against memory blow-up.
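As an illustration of that counting, here is a small, self-contained sketch; `queryLimiter` and its methods are hypothetical and only show the shape of the check, including the caveat that samples are only counted after de-duplication:

```go
package main

import "fmt"

// queryLimiter accumulates per-query totals as ingester responses arrive and
// fails the query as soon as either limit is exceeded. A limit of 0 disables
// the check.
type queryLimiter struct {
	maxSeries, maxSamples int
	series, samples       int
}

func (l *queryLimiter) addSeries(n int) error {
	l.series += n
	if l.maxSeries > 0 && l.series > l.maxSeries {
		return fmt.Errorf("exceeded maximum number of series in a query (limit %d)", l.maxSeries)
	}
	return nil
}

// addSamples would only be called for de-duplicated sample responses (not
// chunks), which is why the sample limit alone is weak protection against
// memory blow-up.
func (l *queryLimiter) addSamples(n int) error {
	l.samples += n
	if l.maxSamples > 0 && l.samples > l.maxSamples {
		return fmt.Errorf("exceeded maximum number of samples in a query (limit %d)", l.maxSamples)
	}
	return nil
}

func main() {
	l := &queryLimiter{maxSeries: 2, maxSamples: 100}
	fmt.Println(l.addSeries(2)) // <nil>
	fmt.Println(l.addSeries(1)) // exceeded maximum number of series in a query (limit 2)
}
```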
I also refactored the way `userID` is received by `Query()` and `QueryStream()` functions: I changed it from an indirect property of the `Context` to an explicit parameter. I needed `userID` to find the limits and couldn't bear adding another call to `tenant.TenantID()`.
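A rough, self-contained sketch of what that call-site change looks like; the context key, `tenantID` helper, and `query` function below are simplified stand-ins for the real Cortex code, not its actual API:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type tenantKey struct{}

// tenantID plays the role of tenant.TenantID: it extracts the tenant from the
// request context once, at the call site.
func tenantID(ctx context.Context) (string, error) {
	id, ok := ctx.Value(tenantKey{}).(string)
	if !ok || id == "" {
		return "", errors.New("no tenant ID in context")
	}
	return id, nil
}

// query stands in for Distributor.Query/QueryStream after the refactor: the
// tenant is an explicit parameter, available both for routing and for looking
// up per-tenant limits, instead of being re-derived from the context.
func query(ctx context.Context, userID string) error {
	fmt.Printf("querying on behalf of tenant %q\n", userID)
	return nil
}

func main() {
	ctx := context.WithValue(context.Background(), tenantKey{}, "team-a")
	userID, err := tenantID(ctx)
	if err != nil {
		panic(err)
	}
	_ = query(ctx, userID)
}
```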
I haven't renamed the CLI flags from `ingester.max...`; thought I would post this as a draft and see what people think.

Which issue(s) this PR fixes:
Partial fix for #3669
Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`