Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

frontend: Add option to "spin off" subqueries as actual range queries #10460

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

julienduchesne
Copy link
Member

@julienduchesne julienduchesne commented Jan 16, 2025

Issue: #10023

Description:

The feature was developed by inspiring myself from the query sharding feature. The queries are mapped into either downstream queries or subqueries. Both types of queries are run and the results are fed back into prometheus' engine and the result is calculated in the frontend.

The way it optimizes subqueries, is that range queries are passed back to the frontend and all of the range queries optimizations are applied (query sharding, splitting + caching)

Configuration:

This is a new feature that is completely isolated within a new middleware so it shouldn't affect current functionality of the frontend.
For safety, it requires two configurations to be enabled:

  • --query-frontend.spin-off-instant-subqueries-to-url=<url> on the frontend. This should be set to the URL of the frontend for optimal performance. The range queries are load balanced across frontends
  • instant_queries_with_subquery_spin_off in tenant configs. These are regexp patterns that allow us to match individual queries (or all of them). This will allow us to opt-in queries to enable the feature gradually

Performance impact:

The AST mapper only selects queries that are susceptible to be improved, others are just passed on to the next middleware.
For the queries that are improved, results can be up to 50x faster. When a query is selected, the worst cases I've seen are ~equal or a bit better in performance to unmodified queries.
Further tests will be done and the mapper may be improved to detect cases that aren't optimal

PromQL results impact:

None detected from all the tests I've done. The results were always identical.

julienduchesne added a commit that referenced this pull request Jan 20, 2025
julienduchesne added a commit that referenced this pull request Jan 21, 2025
Extracted from #10460. Trying to make the PR more presentable
@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch 3 times, most recently from fc11f73 to 16cbd65 Compare January 21, 2025 14:22
Copy link
Contributor

github-actions bot commented Jan 21, 2025

@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch 2 times, most recently from 1045f45 to 42d7dbc Compare January 21, 2025 16:14
julienduchesne added a commit that referenced this pull request Jan 23, 2025
Extracted from #10460. Trying to make the PR more presentable
@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch 4 times, most recently from 11a1801 to 0f57ae1 Compare January 24, 2025 03:57
@julienduchesne julienduchesne changed the title Subquery Spin-off POC frontend: Add option to "spin off" subqueries as actual range queries Jan 24, 2025
@julienduchesne julienduchesne marked this pull request as ready for review January 24, 2025 04:12
@julienduchesne julienduchesne requested review from tacole02 and a team as code owners January 24, 2025 04:12
Copy link
Contributor

@chencs chencs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this makes sense. I think it is a first for Mimir to see Queryable implemented in the frontend, so it's worth getting another person to comment on that pattern.

@julienduchesne
Copy link
Member Author

julienduchesne commented Jan 27, 2025

Overall, I think this makes sense. I think it is a first for Mimir to see Queryable implemented in the frontend, so it's worth getting another person to comment on that pattern.

Query sharding does the same

@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch 2 times, most recently from 594e50b to 7e4b451 Compare January 27, 2025 15:34
@julienduchesne julienduchesne requested a review from a team January 27, 2025 15:35

// InstantQueriesWithSubquerySpinOff returns a list of regexp patterns of instant queries that can be optimized by spinning off range queries.
// If the list is empty, the feature is disabled.
InstantQueriesWithSubquerySpinOff(userID string) []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a list of expressions, what if we only allowed a single expression? Anyone who wants to allow multiple expressions can chain them together in the expression with something like (...|...|...).

Also, would it be possible to parse and compile the expression once, rather than doing this for every query request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to leave it as-is because it might otherwise get complicated. I want to gradually add more queries to it, without modifying the current inclusions, the slice works better for that.

For the regexp matcher, I use prometheus' FastRegexMatcher (https://github.com/prometheus/prometheus/blob/7bc11dcb0664/pkg/labels/regexp.go#L29), just like the blocked queries feature. It has a cache: grafana/mimir-prometheus#465

pkg/frontend/querymiddleware/roundtrip.go Show resolved Hide resolved
@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch 2 times, most recently from 510428d to bedc55e Compare January 28, 2025 21:30
pkg/frontend/querymiddleware/spin_off_subqueries_test.go Outdated Show resolved Hide resolved
pkg/frontend/querymiddleware/spin_off_subqueries.go Outdated Show resolved Hide resolved
pkg/frontend/querymiddleware/spin_off_subqueries.go Outdated Show resolved Hide resolved

streams := make([][]SampleStream, len(rangeQueries))
for idx, req := range rangeQueries {
resp, err := q.upstreamRangeHandler.Do(ctx, req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these requests run concurrently? If not, seems like it might be worthwhile to do so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason: I don't want to unexpectedly blow up the query path

The case where the steps will go over 11000 is when querying far into the past, since even with a 30s resolution, 11000 steps is a 92h range, so most queries are supported with a single range query

If we always have only one inflight query, there's no risk of killing the query path, because if someone does an aggregation for a whole year, then it could potentially spawn dozens of range queries. By doing one at a time, we'll eventually time out the query instead of killing Mimir.

Issue: #10023

This is a new feature that is completely isolated within a new middleware so it shouldn't affect current functionality of the frontend.
For safety, it requires two configurations to be enabled:
- `--query-frontend.spin-off-instant-subqueries-to-url=<url>` on the frontend. This should be set to the URL of the frontend for optimal performance. The range queries are load balanced across frontends
- `instant_queries_with_subquery_spin_off` in tenant configs. These are regexp patterns that allow us to match individual queries (or all of them). This will allow us to opt-in queries to enable the feature gradually

The feature was developed by basing myself upon the query sharding feature. The queries are mapped into either downstream queries or subqueries. Both types of queries are run and the results are fed back into prometheus' engine and the result is calculated in the frontend.

Performance impact:
The AST mapper only selects queries that are susceptible to be improved, others are just passed on to the next middleware.
For the queries that are improved, results can be up to 50x faster. When a query is selected, the worst cases I've seen are ~equal or a bit better in performance to unmodified queries.
Further tests will be done and the mapper may be improved to detect cases that aren't optimal

PromQL results impact:
None detected from all the tests I've done
- Support offsets
- Disable `@`
- Improve tests. Run each query with a different offset each time
- Add new test cases with `offset x`
- Add new test case with a long range (more than 11000 steps). It has to be split into multiple range queries
- Allow setting query path in frontend arg (instead of hardcoding `/prometheus/api/v1/query_range`)
- Use step align code from MQE
- Fix up comments and log messages
- Return error in case of a wrong
@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch from 5aa61c3 to 8d0e0ba Compare January 29, 2025 17:14
@julienduchesne julienduchesne requested a review from a team January 29, 2025 17:18
@julienduchesne julienduchesne force-pushed the julienduchesne/subquery-spinoff-poc branch from 6157817 to afa71ae Compare January 29, 2025 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants