Query Frontend: Job weights #4076

joe-elliott · 2024-09-12T16:23:56Z

What this PR does:
The query frontend treats all jobs as the same size when it farms them out to the queriers. This can cause querier instability because some jobs require considerably more resources to execute than others. By assigning weights to jobs, we can reduce the amount each querier is asked to do, which will hopefully:

reduce querier OOMs/timeouts/retries
reduce querier latency
increase total throughput

Other changes

Removed the roundtripper httpgrpc bridge and pushed the concept of pipeline.Request all the way down into the cortex frontend code. This can be a nice perf improvement because translating http -> httpgrpc is costly, and we now defer it to the last possible moment. Currently, for some queries, we translate thousands of jobs and then throw them away.
Removed the redundant parseQuery and createFetchSpansRequest functions, consolidating on the Compile function in pkg/traceql.
Check for a context error before entering the retry logic in retryWare. This makes retry metrics more accurate when many jobs are cancelled.

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <[email protected]>

zalegrala · 2024-09-18T16:52:08Z

modules/frontend/queue/queue.go

+			}
+			totalWeight += weight
+
+			if totalWeight >= requestedCount {


I think this makes sense. What we're saying here is that we ask each batch for a certain high-water mark of work we're willing to take, and the weight lets a single, more complex item count for more toward that threshold. Implicit here, I suppose, is that weight and requestedCount are in the same unit of measure.

joe-elliott · 2024-09-18

> Implicitly here I suppose is that weight and requestedCount are of the same unit of measure.

Yes! Currently all jobs fill a single "slot" in the batch. The "weight" basically just makes a job fill more slots.
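The "slots" idea from this exchange can be sketched as follows, assuming a plain slice as the queue. The `job` type and `nextBatch` function are hypothetical and only mirror the shape of the diff hunk above (`totalWeight += weight` followed by `totalWeight >= requestedCount`); they are not the actual modules/frontend/queue/queue.go code.

```go
package main

import "fmt"

// job is a hypothetical queued request; weight is how many batch "slots" it
// occupies. With every weight set to 1 this reduces to counting jobs.
type job struct {
	id     string
	weight int
}

// nextBatch pops jobs from the front of queue until their combined weight
// reaches requestedCount (the number of slots a querier asked for). It
// returns the batch and the remaining queue. A heavy job is still taken even
// if it overshoots the remaining slots, because the check happens after the
// weight is added.
func nextBatch(queue []job, requestedCount int) (batch, rest []job) {
	totalWeight := 0
	for i, j := range queue {
		batch = append(batch, j)
		totalWeight += j.weight
		if totalWeight >= requestedCount {
			return batch, queue[i+1:]
		}
	}
	return batch, nil
}

// ids is a small helper for printing batches.
func ids(jobs []job) []string {
	out := make([]string, 0, len(jobs))
	for _, j := range jobs {
		out = append(out, j.id)
	}
	return out
}

func main() {
	queue := []job{{"a", 1}, {"b", 1}, {"c", 3}, {"d", 1}, {"e", 1}}
	batch, rest := nextBatch(queue, 4)
	fmt.Println(ids(batch), ids(rest)) // [a b c] [d e]
	batch, rest = nextBatch(rest, 4)
	fmt.Println(ids(batch), ids(rest)) // [d e] []
}
```

Because weight and requestedCount share a unit, the heavy job `c` ends the first batch early, so that querier receives three jobs instead of four.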

zalegrala · 2024-09-18T16:55:53Z

modules/frontend/weights/weights.go

+		}
+	}
+
+	if conditions > 4 { // yay, magic!


A fine starting point. I was wondering if each condition could be weight++, and maybe each regex weight+2 or some such. It means for the queue logic that if any condition is present, we'll never consume the entire requested batch. 🤔
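A minimal sketch contrasting the two weighting ideas discussed here, assuming the condition and regex counts have already been extracted from the compiled TraceQL query. The `queryStats` type, the constants `defaultWeight` and `heavyWeight`, and both functions are hypothetical; only the `conditions > 4` threshold comes from the diff, and the incremental scheme is one reading of the reviewer's suggestion (a regex condition costs 2 instead of 1).

```go
package main

import "fmt"

// queryStats is a hypothetical summary of a compiled TraceQL query.
type queryStats struct {
	conditions int // total number of conditions in the query
	regexes    int // how many of those conditions are regex matches
}

const (
	defaultWeight = 1 // every job fills at least one slot
	heavyWeight   = 2 // hypothetical weight for "complex" queries
)

// thresholdWeight follows the shape of the diff under review: a query with
// more than 4 conditions is bumped to a heavier weight.
func thresholdWeight(s queryStats) int {
	if s.conditions > 4 {
		return heavyWeight
	}
	return defaultWeight
}

// incrementalWeight sketches the reviewer's alternative: +1 per plain
// condition and +2 per regex condition, on top of the default weight.
func incrementalWeight(s queryStats) int {
	plain := s.conditions - s.regexes
	return defaultWeight + plain + 2*s.regexes
}

func main() {
	for _, s := range []queryStats{
		{conditions: 2},
		{conditions: 5, regexes: 1},
	} {
		fmt.Println(s.conditions, s.regexes, thresholdWeight(s), incrementalWeight(s))
	}
	// Output:
	// 2 0 1 3
	// 5 1 2 7
}
```

The incremental scheme shows the reviewer's concern concretely: any query with at least one condition gets a weight above 1, so batches would routinely hold fewer jobs than requestedCount.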



joe-elliott added 5 commits September 12, 2024 09:28

push down pipelineResponse to the frontend

5d33f7d

Signed-off-by: Joe Elliott <[email protected]>

early exit on ctx.Err()

42e1d11

Signed-off-by: Joe Elliott <[email protected]>

added weights to request

791b6b8

Signed-off-by: Joe Elliott <[email protected]>

remove unused roundtripper

f062280

Signed-off-by: Joe Elliott <[email protected]>

notes cleanup

dde6193

Signed-off-by: Joe Elliott <[email protected]>

joe-elliott requested review from annanay25, mdisibio, mapno, yvrhdn, zalegrala, electron0zero, ie-pham and stoewer as code owners September 12, 2024 16:23

joe-elliott marked this pull request as draft September 12, 2024 16:24

zalegrala reviewed Sep 18, 2024

modules/frontend/weights/weights.go (outdated, resolved)

zalegrala reviewed Sep 18, 2024

modules/frontend/transport/roundtripper.go (resolved)

javiermolinar and others added 12 commits September 20, 2024 11:03

fix documentation

e1a8363

fix tests

b2c4e4c

added test for weights picking request batches

36d4725

fix ast tests

4b980ca

fix panic in test

d6afbbb

fix another panic

a114967

Add weight test

18fdf16

move weight functionality as a middleware

9535ef9

cleanup

f78838f

more cleanup

0b190b2

rollback some unneded changes

3188260

fix tests

5f294c1

fix traceql errors by propagating the start and end to the fetchspanrequest

dd78c7e

joe-elliott commented Sep 23, 2024

modules/frontend/pipeline/async_weight_middleware.go (outdated, resolved)

javiermolinar added 3 commits September 23, 2024 16:29

add config to disable the feature

2e611e7

propatage weights to sharded requests

f480a62

simplify logic

76570b1

javiermolinar marked this pull request as ready for review October 9, 2024 15:04

joe-elliott commented Oct 9, 2024


mdisibio mentioned this pull request Oct 9, 2024

TraceQL performance improvement: dynamic reordering of binop branches #4163

Merged


javiermolinar added 2 commits October 11, 2024 16:06

simplify query passthrough

bf64d78

improve weights configuration

727ec8e

javiermolinar self-requested a review October 11, 2024 14:47

javiermolinar approved these changes Oct 11, 2024


javiermolinar merged commit 5aef523 into grafana:main Oct 11, 2024
16 checks passed

electron0zero mentioned this pull request Nov 28, 2024

max query expr electron0zero/tempo#3

Closed

electron0zero mentioned this pull request Jan 29, 2025

contri svs electron0zero/tempo#4

Closed

joe-elliott commented Sep 12, 2024 (edited by javiermolinar)
