kvserver: don't send snapshots to overloaded learners #85479

erikgrinaker · 2022-08-02T17:35:26Z

In #83851, we prevent sending Raft snapshots to followers that have an overloaded engine:

cockroach/pkg/kv/kvserver/raft_snapshot_queue.go

Lines 142 to 153 in acdf42a

    
           repl.mu.RLock() 
        
           _, destPaused := repl.mu.pausedFollowers[id] 
        
           repl.mu.RUnlock() 
        
           if ioThresh := repl.store.ioOverloadedStores.Load()[repDesc.StoreID]; ioThresh != nil && destPaused { 
        
           	// If the destination is paused, be more hesitant to send snapshots. The destination being 
        
           	// paused implies that we have recently checked that it's not required for quorum, and that 
        
           	// we wish to conserve I/O on that store, which sending a snapshot counteracts. So hold back on 
        
           	// the snapshot as well. 
        
           	err := errors.Errorf("skipping snapshot; %s is overloaded: %s", repDesc, ioThresh) 
        
           	repl.reportSnapshotStatus(ctx, repDesc.ReplicaID, err) 
        
           	return false, err 
        
           }

However, we don't do this when sending snapshots to new learners:

cockroach/pkg/kv/kvserver/replica_command.go

Lines 1801 to 1803 in acdf42a

    
           if err := r.sendSnapshot(ctx, rDesc, kvserverpb.SnapshotRequest_INITIAL, priority); err != nil { 
        
           	return nil, err 
        
           }

We probably should. Erroring out here may be sufficient.

Jira issue: CRDB-18266

Epic CRDB-15069

blathers-crl · 2022-08-02T17:35:28Z

cc @cockroachdb/replication

erikgrinaker · 2022-08-02T17:35:43Z

Hat tip to @nvanbenschoten for the hint.

74337: kv: pool rangeCacheKey objects r=arulajmani a=nvanbenschoten This commit introduces a `sync.Pool` for `rangeCacheKey` objects. This is used to avoid heap allocations when querying and updating the `RangeCache`. ``` name old time/op new time/op delta KV/Scan/Native/rows=1-10 14.8µs ± 2% 14.9µs ± 4% ~ (p=0.356 n=9+10) KV/Scan/SQL/rows=1-10 92.1µs ± 4% 94.3µs ± 5% ~ (p=0.113 n=9+10) name old alloc/op new alloc/op delta KV/Scan/Native/rows=1-10 6.87kB ± 0% 6.85kB ± 0% -0.35% (p=0.000 n=10+10) KV/Scan/SQL/rows=1-10 20.0kB ± 0% 20.0kB ± 0% -0.25% (p=0.012 n=10+10) name old allocs/op new allocs/op delta KV/Scan/Native/rows=1-10 52.0 ± 0% 51.0 ± 0% -1.92% (p=0.000 n=10+10) KV/Scan/SQL/rows=1-10 244 ± 0% 242 ± 0% -0.78% (p=0.000 n=10+10) ``` ---- This is part of a collection of assorted micro-optimizations: - #74336 - #74337 - #74338 - #74339 - #74340 - #74341 - #74342 - #74343 - #74344 - #74345 - #74346 - #74347 - #74348 Combined, these changes have the following effect on end-to-end SQL query performance: ``` name old time/op new time/op delta KV/Scan/SQL/rows=1-10 94.4µs ±10% 92.3µs ±11% -2.20% (p=0.000 n=93+93) KV/Scan/SQL/rows=10-10 102µs ±10% 99µs ±10% -2.16% (p=0.000 n=94+94) KV/Update/SQL/rows=10-10 378µs ±15% 370µs ±11% -2.04% (p=0.003 n=95+91) KV/Insert/SQL/rows=1-10 133µs ±14% 132µs ±12% ~ (p=0.738 n=95+93) KV/Insert/SQL/rows=10-10 197µs ±14% 196µs ±13% ~ (p=0.902 n=95+94) KV/Update/SQL/rows=1-10 186µs ±14% 185µs ±14% ~ (p=0.351 n=94+93) KV/Delete/SQL/rows=1-10 132µs ±13% 132µs ±14% ~ (p=0.473 n=94+94) KV/Delete/SQL/rows=10-10 254µs ±16% 250µs ±16% ~ (p=0.086 n=100+99) name old alloc/op new alloc/op delta KV/Scan/SQL/rows=1-10 20.1kB ± 0% 19.1kB ± 1% -4.91% (p=0.000 n=96+96) KV/Scan/SQL/rows=10-10 21.7kB ± 0% 20.7kB ± 1% -4.61% (p=0.000 n=96+97) KV/Delete/SQL/rows=10-10 64.0kB ± 3% 63.7kB ± 3% -0.55% (p=0.000 n=100+100) KV/Update/SQL/rows=1-10 45.8kB ± 1% 45.5kB ± 1% -0.55% (p=0.000 n=97+98) KV/Update/SQL/rows=10-10 105kB ± 1% 105kB ± 1% -0.10% (p=0.008 n=97+98) KV/Delete/SQL/rows=1-10 40.8kB ± 0% 40.7kB ± 0% -0.08% (p=0.001 n=95+96) KV/Insert/SQL/rows=1-10 37.4kB ± 1% 37.4kB ± 0% ~ (p=0.698 n=97+96) KV/Insert/SQL/rows=10-10 76.4kB ± 1% 76.4kB ± 0% ~ (p=0.822 n=99+98) name old allocs/op new allocs/op delta KV/Scan/SQL/rows=1-10 245 ± 0% 217 ± 0% -11.43% (p=0.000 n=95+92) KV/Scan/SQL/rows=10-10 280 ± 0% 252 ± 0% -10.11% (p=0.000 n=75+97) KV/Delete/SQL/rows=10-10 478 ± 0% 459 ± 0% -4.04% (p=0.000 n=94+97) KV/Delete/SQL/rows=1-10 297 ± 1% 287 ± 1% -3.34% (p=0.000 n=97+97) KV/Update/SQL/rows=1-10 459 ± 0% 444 ± 0% -3.27% (p=0.000 n=97+97) KV/Insert/SQL/rows=1-10 291 ± 0% 286 ± 0% -1.72% (p=0.000 n=82+86) KV/Update/SQL/rows=10-10 763 ± 1% 750 ± 1% -1.68% (p=0.000 n=96+98) KV/Insert/SQL/rows=10-10 489 ± 0% 484 ± 0% -1.03% (p=0.000 n=98+98) ``` 85730: kvserver: also block LEARNER snaps to paused followers r=erikgrinaker a=tbg We checked whether the snapshot recipient was paused only in the raft log queue path. By pushing the check down into `sendSnapshot`, it is now hit by any snapshot attempt, which includes the replicate queue and store rebalancer. For best results, both of these should avoid moving replicas to paused followers in the first place, which they already do, at least partially, so this change shouldn't have much of an impact in practice. Fixes #85479. Release note: None 85732: kvserver: only pause followers when holding active lease r=erikgrinaker a=tbg If the raft leader is not the leaseholder (which includes the case in which we just transferred the lease away), leave all followers unpaused. Otherwise, the leaseholder won't learn that the entries it submitted were committed which effectively causes range unavailability. Fixes #84884. Release note: None 85756: builtins: add strptime/strftime builtins without experimental prefix r=rafiss a=rafiss refs #52139 These are just an alias for the existing implementation. Release note (sql change): The strptime and strftime builtin functions were added as aliases for experimental_strptime and experimental_strftime. Co-authored-by: Nathan VanBenschoten <[email protected]> Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: Rafi Shamim <[email protected]>

erikgrinaker added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-kv-replication labels Aug 2, 2022

erikgrinaker assigned tbg Aug 2, 2022

blathers-crl bot added the A-kv-replication Relating to Raft, consensus, and coordination. label Aug 2, 2022

tbg mentioned this issue Aug 8, 2022

kvserver: also block LEARNER snaps to paused followers #85730

Merged

craig bot closed this as completed in 04481d8 Aug 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kvserver: don't send snapshots to overloaded learners #85479

kvserver: don't send snapshots to overloaded learners #85479

erikgrinaker commented Aug 2, 2022 •

edited by exalate-issue-sync bot

Loading

blathers-crl bot commented Aug 2, 2022

erikgrinaker commented Aug 2, 2022

kvserver: don't send snapshots to overloaded learners #85479

kvserver: don't send snapshots to overloaded learners #85479

Comments

erikgrinaker commented Aug 2, 2022 • edited by exalate-issue-sync bot Loading

blathers-crl bot commented Aug 2, 2022

erikgrinaker commented Aug 2, 2022

erikgrinaker commented Aug 2, 2022 •

edited by exalate-issue-sync bot

Loading