-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix killing queries #8158
Fix killing queries #8158
Conversation
The limit waited until all the iterators had been created which still allows problem queries to be planned. This allows the queries to be aborted much earlier in some cases.
The underlying iterators were not closed when a query was kill so although the client would receive an error, the query would continue on until completion.
If a bad query is run, kill query and limits would not kick in until after it started executing. Some bad queries that involve high cardinality can cause the server to OOM just from planning which defeats the purpose of the max-select-series limit. This change primarily fixes max-select-series limit so that the query is killed earlier and has the side effect that kill query now can kill a query while it's being planned.
@@ -767,7 +767,7 @@ func (itr *floatInterruptIterator) Next() (*FloatPoint, error) { | |||
if itr.count&0xFF == 0xFF { | |||
select { | |||
case <-itr.closing: | |||
return nil, nil | |||
return nil, itr.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the reason this returned nil was mostly because another part of the code is supposed to call Close()
and I didn't want to double close. Is the query engine not calling close correctly? It's supposed to call Next()
until it gets a nil
and then call Close()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the code path where a query is interrupted either via kill query
or a hitting a limit. The cause of #7811 is that itr.Close()
is not called in this case. I can change it to call itr.Close()
and return nil though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't think there are any iterators that can't accept a double close. If there are, we can add a check to the interrupt iterator so it will only perform a single close.
tsdb/engine/tsm1/engine.go
Outdated
@@ -1352,7 +1360,7 @@ func (e *Engine) createVarRefIterator(measurement string, opt influxql.IteratorO | |||
} | |||
|
|||
// Determine tagsets for this measurement based on dimensions and filters. | |||
tagSets, err := mm.TagSets(e.id, opt.Dimensions, opt.Condition) | |||
tagSets, err := mm.TagSets(e.id, opt.Dimensions, opt.Condition, opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just take a <-chan struct{}
instead? If we want to change it to take the iterator options, we should just remove the opt.Dimensions
and opt.Condition
and pass that as part of the IteratorOptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed these in 8177df2
While troubleshooting some high cardinality issues, I discovered that
max-select-series
did not actually kill a query until after it was planned. In this particular case, RAM blew up during planning. (#7457)This change fixes
max-select-series
so that it's killed as soon as we know the query will exceed the limit. It has the side effect that queries can now be killed during the planning phase even withkill query
.There was also a bug in the
InterruptIterators
where even when a query was killed, it would still continue to execute until completion (#7811). The client would get an error saying it was interrupted, but it actually wasn't completely.Required for all non-trivial PRs
Fixes #7457 #7811