Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create flag to prioritize dequeuing from query components over tenant fairness #9016

Merged
merged 5 commits into from
Aug 17, 2024

Conversation

chencs
Copy link
Contributor

@chencs chencs commented Aug 15, 2024

What this PR does

This PR introduces a flag, query-scheduler.prioritize-query-components, which allows operators to configure how preferentially the query scheduler dequeues from query components (also known as queue dimensions) vs. from tenants.

In the current state, the query scheduler constructs a tree queue which looks something like this:

root
├─ tenant-1
│  ├─ unknown
│  ├─ ingester
│  ├─ store-gateway
│  └─ ingester-and-store-gateway
...
└─ tenant-n
   ├─ unknown
   ├─ ingester
   ├─ store-gateway
   └─ ingester-and-store-gateway

and when a query is dequeued from this tree, it walks the tree down to, e.g., tenant-1 and checks every subtree for a dequeuable item, only checking the next tenant if it finds nothing to dequeue. This means that it is possible for slow requests from one tenant to eventually back up the entire request queue if a query component is experiencing degradation.

Example: store-gateways are having a problem, or are working on expensive queries. Ingesters are fine, so we'd like to keep executing a steady stream of ingester-only queries. With this tree, though, once we dequeue to a tenant, each query component subtree will be checked until a dequeue-able query is found, regardless of whether it is an ingester-only query, which means we can't guarantee that steady stream of queries on healthy components.

With these changes and query-scheduler.prioritize-query-components enabled, the tree would instead be constructed as:

root
├─ unknown
│  ├─ tenant-1
│  ├─ ...
│  └─ tenant-n
├─ ingester
│  ├─ tenant-1
│  ├─ ...
│  └─ tenant-n
...
└─ store-gateway
   ├─ tenant-1
   ├─ ...
   └─ tenant-n

Thus, in the scenario of the same example above, once we get to the ingester-only subtree, we will dequeue a query for any tenant in the fairness order which has an ingester-only query queued.

It will still be possible to get into the scenario where all queriers are busy with degraded-component queries, since the root node is still only round-robining between components. @francoposa is working on a change to improve the component subtree selection.

Which issue(s) this PR fixes or relates to

Fixes #

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@chencs chencs force-pushed the casie/implement-flip branch 2 times, most recently from 8f15ec1 to 4a9d281 Compare August 15, 2024 23:44
@chencs chencs force-pushed the casie/implement-flip branch from 98ab724 to 926d7bf Compare August 16, 2024 00:01
@chencs chencs force-pushed the casie/implement-flip branch from 926d7bf to 06708d9 Compare August 16, 2024 00:02
CHANGELOG.md Outdated Show resolved Hide resolved
Copy link
Member

@francoposa francoposa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM save for some minor nits & I will leave more of the flag doc wording for docs team to approve.

I kept looking for where the flag was actually used in the production code in this PR, but it was already in makeQueuePath :)

Copy link
Contributor

@tacole02 tacole02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating this doc!

@chencs chencs marked this pull request as ready for review August 16, 2024 23:59
@chencs chencs requested a review from a team as a code owner August 16, 2024 23:59
@chencs chencs merged commit 3872ccb into main Aug 17, 2024
29 checks passed
@chencs chencs deleted the casie/implement-flip branch August 17, 2024 00:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants