admission: improvements to KV admission #65957
Comments
Should we do something about
Yes. Since this is happening through
Cool. For context, I'm raising this in the context of the recent issues involving interactions between the bespoke rate limiting in cockroach/pkg/kv/kvserver/store.go (lines 181 to 193 in 14bd70c).
Yep, I noticed the silliness with rate limiting after acquiring latches (when it got fixed). I don't think admission control would have helped -- since the CPU would not be saturated, it would allow low priority requests through. It is not aware that latches are a resource too, and that reducing latch contention due to low priority traffic will allow high priority traffic to get higher throughput. Modeling latches as a resource would be hard since we'd need to track contended spans.
67717: admission,server: scope store overload admission control to write operations on the overloaded store r=sumeerbhola a=sumeerbhola

Previously KVWork was all admitted through a single WorkQueue, and this WorkQueue was paired with a GrantCoordinator that took into account both CPU overload and storage overload. This meant a single overloaded store (in a multi-store setting) would slow down all KV operations, including reads and operations on stores that were fine.

We now have multiple WorkQueues, one per store, that KV writes go through before reaching the shared WorkQueue that all operations go through. The per-store WorkQueues are associated with their own GrantCoordinators that listen to health state for their respective stores. The per-store queues do not care about CPU and therefore do not do grant chaining. Since admission through these per-store queues happens before the shared queue, a store bottleneck will not cause KV slots in the shared queue to be taken by requests that are still waiting for admission elsewhere. The reverse can happen, and is considered acceptable: per-store tokens, when a store is overloaded, can be used by requests that are now waiting for admission in the shared WorkQueue because of a CPU bottleneck.

The code is significantly refactored for the above: NewGrantCoordinators returns a container called StoreGrantCoordinators, which lazily initializes the relevant per-store GrantCoordinators when it first fetches Pebble metrics, in addition to the shared GrantCoordinator. The server code now integrates with both types, and the code in Node.Batch will sometimes subject a request to two WorkQueues. The PebbleMetricsProvider now includes StoreIDs, and the periodic ticking that fetches these metrics at 1min intervals (with 1s ticks) is moved to StoreGrantCoordinators. This simplifies the ioLoadListener, which no longer does the periodic ticking, and eliminates some testing-only abstractions. The per-store WorkQueues share the same metrics, which represent an aggregate across these queues.

Informs #65957

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
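To make the ordering described above concrete (per-store write admission first, shared CPU-oriented admission second), here is a minimal sketch. The types and method names are hypothetical stand-ins, not the actual pkg/util/admission API; only the two-stage admit path and the read fast-path reflect the description.

```go
package admissionsketch

import (
	"context"
	"fmt"
)

// workQueue is a hypothetical stand-in for an admission WorkQueue:
// Admit blocks until the associated granter has capacity for the work.
type workQueue interface {
	Admit(ctx context.Context, priority int) error
}

// storeAdmitters pairs one write-admission queue per store with the
// shared queue that all KV work passes through.
type storeAdmitters struct {
	perStore map[int32]workQueue // keyed by StoreID; reacts to store (IO) overload
	shared   workQueue           // CPU-oriented slots shared by all KV work
}

// admitKV sketches the ordering: writes are admitted by the per-store
// queue first (so an overloaded store throttles only its own writes),
// and only then take a slot in the shared queue; reads skip the
// per-store stage entirely.
func (a *storeAdmitters) admitKV(ctx context.Context, storeID int32, isWrite bool, pri int) error {
	if isWrite {
		if q, ok := a.perStore[storeID]; ok {
			if err := q.Admit(ctx, pri); err != nil {
				return fmt.Errorf("per-store admission: %w", err)
			}
		}
	}
	// Shared (CPU) admission happens last, so a store bottleneck never
	// holds a shared KV slot while waiting elsewhere.
	return a.shared.Admit(ctx, pri)
}
```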
69050: sql: Set txn deadline to session expiry for mt pods. r=ajwerner a=rimadeodhar

For ephemeral SQL pods in the multi-tenant environment, the transaction deadline should be set to min(sqlliveness.Session expiration, leased descriptor duration). Since a SQL instance ID can be reclaimed by another SQL pod once the session for the old pod has expired, we need to ensure that all transactions for a SQL pod commit before the session for that SQL pod expires. In single-tenant environments, the session expiry will not be used.

Release note: None

Release justification: bug fix for new sqlinstance functionality

Resolves: #67719

69241: tests,ui: add a multi-store test with single overloaded store r=sumeerbhola a=sumeerbhola

Admission control will slow down writes to that store without affecting operations on the other store.

Also fixed some related graphs on the overload dashboard:
- the "IO tokens exhausted" graph should have been using a rate
- the kv-stores metrics were missing

Informs #65955, #65957

Release note: None

Co-authored-by: rimadeodhar <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
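The deadline rule in the #69050 description above amounts to taking the earlier of the two expirations, and only consulting the session expiry for ephemeral multi-tenant pods. A minimal sketch, with hypothetical names rather than the actual sql/sqlliveness types:

```go
package deadlinesketch

import "time"

// txnDeadline returns the deadline to apply to a transaction: the leased
// descriptor expiration, clamped to the sqlliveness session expiration
// when running as an ephemeral (multi-tenant) SQL pod. The names here are
// illustrative stand-ins, not the actual CockroachDB types.
func txnDeadline(leaseExpiration, sessionExpiration time.Time, isEphemeralSQLPod bool) time.Time {
	deadline := leaseExpiration
	if isEphemeralSQLPod && sessionExpiration.Before(deadline) {
		// Every transaction from this pod must commit before its session
		// expires, since the instance ID can then be reclaimed by another pod.
		deadline = sessionExpiration
	}
	return deadline
}
```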
They are now marked with AdmissionHeader_ROOT_KV, which stops them from bypassing admission control. The priority is set to admission.LowPri since these are background activities that should not be preferred over interactive user traffic.

Informs cockroachdb#65957

Release note: None
71109: kvserver,backupccl: make export, gc subject to admission control r=sumeerbhola a=sumeerbhola

They are now marked with AdmissionHeader_ROOT_KV, which stops them from bypassing admission control. The priority is set to admission.NormalPri since there are cases where they do need to complete in a very timely manner. There are TODOs in the integration code to make more sophisticated priority assignments and use LowPri when possible.

Informs #65957

Release note: None

Co-authored-by: sumeerbhola <[email protected]>
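The change described in #71109 boils down to attaching an admission header with an explicit source and priority to requests that previously carried none. The sketch below uses simplified, hypothetical types (the real definitions live elsewhere in CockroachDB and differ in detail) just to show the shape of that marking:

```go
package admissionheadersketch

// Simplified, hypothetical stand-ins for the admission header and
// priority constants used in this sketch.
type admissionSource int

const (
	sourceOther  admissionSource = iota // historically treated as "bypass admission control"
	sourceRootKV                        // work originating at the KV layer itself (e.g. export, GC)
)

const (
	lowPri    = -10 // background work that should yield to user traffic
	normalPri = 0   // default for work that must complete in a timely manner
)

type admissionHeader struct {
	Priority   int
	CreateTime int64 // nanoseconds; used for ordering within a priority
	Source     admissionSource
}

// markRootKV builds the kind of header that export/GC-style requests now
// carry: a ROOT_KV source so they no longer bypass admission control, and
// (per the PR above) NormalPri for now, with TODOs to use LowPri when possible.
func markRootKV(createTimeNanos int64) admissionHeader {
	return admissionHeader{
		Priority:   normalPri,
		CreateTime: createTimeNanos,
		Source:     sourceRootKV,
	}
}
```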
Partial backport of: tests,ui: add a multi-store test with single overloaded store

Admission control will slow down writes to that store without affecting operations on the other store.

Also fixed some related graphs on the overload dashboard:
- the "IO tokens exhausted" graph should have been using a rate
- the kv-stores metrics were missing

Informs cockroachdb#65955, cockroachdb#65957

Release note: None
Part of cockroachdb#65957.

Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In cockroachdb#89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translate to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in cockroachdb#86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism but now set a lower priority bit (BulkNormalPri) -- cockroachdb#88733. Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something cockroachdb#89589 can help with. We leave admission integration of parallel worker goroutines to future work (cockroachdb#90089).

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to only use 100ms of CPU time, to exercise better control over scheduling latencies. To that end, we introduce the following component, used within the rangefeed catchup iterator:

```go
// Pacer is used in tight loops (CPU-bound) for non-preemptible
// elastic work. Callers are expected to invoke Pace() every loop
// iteration and Close() once done. Internally this type integrates
// with the elastic CPU work queue, acquiring tokens for the CPU work
// being done, and blocking if tokens are unavailable. This allows
// for a form of cooperative scheduling with elastic CPU granters.
type Pacer struct

func (p *Pacer) Pace(ctx context.Context) error { ... }
func (p *Pacer) Close() { ... }
```

Release note: None
89709: kv,rangefeed: integrate catchup scans with elastic cpu r=irfansharif a=irfansharif

Part of #65957. First commit is from #89656.

Changefeed backfills, given their scan-heavy nature, can be fairly CPU-intensive. In #89656 we introduced a roachtest demonstrating the latency impact backfills can have on a moderately CPU-saturated cluster. Similar to what we saw for backups, this CPU-heavy nature can elevate Go scheduling latencies, which in turn translate to foreground latency impact.

This commit integrates the rangefeed catchup scan with the elastic CPU limiter we introduced in #86638; this is one of two optional halves of changefeed backfills. The second half is the initial scan -- scan requests issued over some keyspan as of some timestamp. For that we simply rely on the existing slots mechanism but now set a lower priority bit (BulkNormalPri). Experimentally we observed that during initial scans the encoding routines in changefeedccl are the most impactful CPU-wise, something #89589 can help with. We leave admission integration of parallel worker goroutines to future work.

Unlike export requests, rangefeed catchup scans are non-preemptible. The rangefeed RPC is a streaming one, and the catchup scan is done during stream setup, so we don't have resumption tokens to propagate up to the caller like we did for backups. We still want CPU-bound work going through admission control to use at most 100ms of CPU time (it makes for an ineffective controller otherwise), and to that end we introduce the following component, used within the rangefeed catchup iterator:

```go
// Pacer is used in tight loops (CPU-bound) for non-preemptible
// elastic work. Callers are expected to invoke Pace() every loop
// iteration and Close() once done. Internally this type integrates
// with the elastic CPU work queue, acquiring tokens for the CPU work
// being done, and blocking if tokens are unavailable. This allows
// for a form of cooperative scheduling with elastic CPU granters.
type Pacer struct

func (p *Pacer) Pace(ctx context.Context) error { ... }
func (p *Pacer) Close() { ... }
```

Release note: None

Co-authored-by: irfan sharif <[email protected]>
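As a usage illustration of the Pacer contract quoted above (Pace() on every iteration, Close() once done), here is a hedged sketch of a catchup-scan-style loop; the iterator and pacer types here are hypothetical stand-ins, and only the Pace/Close calling pattern follows the documented contract:

```go
// Package pacersketch illustrates the calling pattern for a Pacer-like
// type: Pace once per iteration of a CPU-bound loop, Close when done.
// The pacer and iterator types are stand-ins, not the actual
// rangefeed/admission implementations.
package pacersketch

import "context"

type pacer interface {
	Pace(ctx context.Context) error // may block until elastic CPU tokens are available
	Close()                         // returns any unused tokens
}

type catchupIter interface {
	Valid() bool
	Next()
	EmitEvent() error // stand-in for decoding the KV and sending it on the stream
}

// runCatchupScan is a sketch of a non-preemptible, CPU-bound loop that
// cooperatively yields to the elastic CPU granter via the pacer.
func runCatchupScan(ctx context.Context, it catchupIter, p pacer) error {
	defer p.Close()
	for ; it.Valid(); it.Next() {
		// Block here if no elastic CPU tokens are available, keeping each
		// uninterrupted CPU burst short and scheduling latencies in check.
		if err := p.Pace(ctx); err != nil {
			return err
		}
		if err := it.EmitEvent(); err != nil {
			return err
		}
	}
	return nil
}
```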
Closing this old issue since statements like "Most work initiated from within KV bypasses admission control. This is not appropriate especially for background work like GC, backups etc" are stale and have been addressed.
Misc TODOs/limitations in KV admission control:

- Most work initiated from within KV bypasses admission control, since it does not set an AdmissionHeader. (Export, GC are now subject to admission control -- kvserver,backupccl: make export, gc subject to admission control #71109 -- but there are TODOs regarding priority assignment.)
- There is no KVLowPriWork WorkKind, which means even low-priority KV work will get prioritized over user-facing SQL processing. (Edit: with the elastic classification, this is no longer true, since low-priority KV work goes through a different grant coordinator.)
- RangeFeedRequest is not subject to admission control.
- A WorkQueue that is hooked up to a Granter that only uses write overload as a resource signal.

Jira issue: CRDB-7813