kvserver: don't send snapshots to overloaded learners #85479
Labels: A-kv-replication (Relating to Raft, consensus, and coordination), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception), T-kv-replication

erikgrinaker added the C-enhancement and T-kv-replication labels on Aug 2, 2022
cc @cockroachdb/replication
blathers-crl bot added the A-kv-replication label on Aug 2, 2022
Hat tip to @nvanbenschoten for the hint.
craig bot pushed a commit that referenced this issue on Aug 10, 2022:
74337: kv: pool rangeCacheKey objects r=arulajmani a=nvanbenschoten

This commit introduces a `sync.Pool` for `rangeCacheKey` objects. This is used to avoid heap allocations when querying and updating the `RangeCache`.

```
name                      old time/op    new time/op    delta
KV/Scan/Native/rows=1-10    14.8µs ± 2%    14.9µs ± 4%     ~      (p=0.356 n=9+10)
KV/Scan/SQL/rows=1-10       92.1µs ± 4%    94.3µs ± 5%     ~      (p=0.113 n=9+10)

name                      old alloc/op   new alloc/op   delta
KV/Scan/Native/rows=1-10    6.87kB ± 0%    6.85kB ± 0%   -0.35%   (p=0.000 n=10+10)
KV/Scan/SQL/rows=1-10       20.0kB ± 0%    20.0kB ± 0%   -0.25%   (p=0.012 n=10+10)

name                      old allocs/op  new allocs/op  delta
KV/Scan/Native/rows=1-10      52.0 ± 0%      51.0 ± 0%   -1.92%   (p=0.000 n=10+10)
KV/Scan/SQL/rows=1-10          244 ± 0%       242 ± 0%   -0.78%   (p=0.000 n=10+10)
```

----

This is part of a collection of assorted micro-optimizations:
- #74336
- #74337
- #74338
- #74339
- #74340
- #74341
- #74342
- #74343
- #74344
- #74345
- #74346
- #74347
- #74348

Combined, these changes have the following effect on end-to-end SQL query performance:

```
name                       old time/op    new time/op    delta
KV/Scan/SQL/rows=1-10        94.4µs ±10%    92.3µs ±11%   -2.20%   (p=0.000 n=93+93)
KV/Scan/SQL/rows=10-10        102µs ±10%      99µs ±10%   -2.16%   (p=0.000 n=94+94)
KV/Update/SQL/rows=10-10      378µs ±15%     370µs ±11%   -2.04%   (p=0.003 n=95+91)
KV/Insert/SQL/rows=1-10       133µs ±14%     132µs ±12%     ~      (p=0.738 n=95+93)
KV/Insert/SQL/rows=10-10      197µs ±14%     196µs ±13%     ~      (p=0.902 n=95+94)
KV/Update/SQL/rows=1-10       186µs ±14%     185µs ±14%     ~      (p=0.351 n=94+93)
KV/Delete/SQL/rows=1-10       132µs ±13%     132µs ±14%     ~      (p=0.473 n=94+94)
KV/Delete/SQL/rows=10-10      254µs ±16%     250µs ±16%     ~      (p=0.086 n=100+99)

name                       old alloc/op   new alloc/op   delta
KV/Scan/SQL/rows=1-10        20.1kB ± 0%    19.1kB ± 1%   -4.91%   (p=0.000 n=96+96)
KV/Scan/SQL/rows=10-10       21.7kB ± 0%    20.7kB ± 1%   -4.61%   (p=0.000 n=96+97)
KV/Delete/SQL/rows=10-10     64.0kB ± 3%    63.7kB ± 3%   -0.55%   (p=0.000 n=100+100)
KV/Update/SQL/rows=1-10      45.8kB ± 1%    45.5kB ± 1%   -0.55%   (p=0.000 n=97+98)
KV/Update/SQL/rows=10-10      105kB ± 1%     105kB ± 1%   -0.10%   (p=0.008 n=97+98)
KV/Delete/SQL/rows=1-10      40.8kB ± 0%    40.7kB ± 0%   -0.08%   (p=0.001 n=95+96)
KV/Insert/SQL/rows=1-10      37.4kB ± 1%    37.4kB ± 0%     ~      (p=0.698 n=97+96)
KV/Insert/SQL/rows=10-10     76.4kB ± 1%    76.4kB ± 0%     ~      (p=0.822 n=99+98)

name                       old allocs/op  new allocs/op  delta
KV/Scan/SQL/rows=1-10           245 ± 0%       217 ± 0%  -11.43%   (p=0.000 n=95+92)
KV/Scan/SQL/rows=10-10          280 ± 0%       252 ± 0%  -10.11%   (p=0.000 n=75+97)
KV/Delete/SQL/rows=10-10        478 ± 0%       459 ± 0%   -4.04%   (p=0.000 n=94+97)
KV/Delete/SQL/rows=1-10         297 ± 1%       287 ± 1%   -3.34%   (p=0.000 n=97+97)
KV/Update/SQL/rows=1-10         459 ± 0%       444 ± 0%   -3.27%   (p=0.000 n=97+97)
KV/Insert/SQL/rows=1-10         291 ± 0%       286 ± 0%   -1.72%   (p=0.000 n=82+86)
KV/Update/SQL/rows=10-10        763 ± 1%       750 ± 1%   -1.68%   (p=0.000 n=96+98)
KV/Insert/SQL/rows=10-10        489 ± 0%       484 ± 0%   -1.03%   (p=0.000 n=98+98)
```

85730: kvserver: also block LEARNER snaps to paused followers r=erikgrinaker a=tbg

We checked whether the snapshot recipient was paused only in the raft log queue path. By pushing the check down into `sendSnapshot`, it is now hit by any snapshot attempt, which includes the replicate queue and store rebalancer. For best results, both of these should avoid moving replicas to paused followers in the first place, which they already do, at least partially, so this change shouldn't have much of an impact in practice.

Fixes #85479.

Release note: None

85732: kvserver: only pause followers when holding active lease r=erikgrinaker a=tbg

If the raft leader is not the leaseholder (which includes the case in which we just transferred the lease away), leave all followers unpaused. Otherwise, the leaseholder won't learn that the entries it submitted were committed, which effectively causes range unavailability.

Fixes #84884.

Release note: None

85756: builtins: add strptime/strftime builtins without experimental prefix r=rafiss a=rafiss

refs #52139

These are just an alias for the existing implementation.

Release note (sql change): The strptime and strftime builtin functions were added as aliases for experimental_strptime and experimental_strftime.

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
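For readers unfamiliar with the pooling technique referenced in #74337 above, the pattern is the standard library's `sync.Pool`. Below is a minimal, self-contained sketch of the idea; the `rangeCacheKey` type and its field are illustrative stand-ins, not the actual CockroachDB definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeCacheKey is a stand-in for the cache key type being pooled; the
// real type lives in CockroachDB's RangeCache. Only the pooling pattern
// itself is illustrated here.
type rangeCacheKey struct {
	key []byte
}

// keyPool hands out reusable rangeCacheKey objects so that hot lookup
// paths do not heap-allocate a fresh key on every call.
var keyPool = sync.Pool{
	New: func() interface{} { return &rangeCacheKey{} },
}

// lookup borrows a key from the pool, uses it, and returns it for reuse.
func lookup(key []byte) {
	k := keyPool.Get().(*rangeCacheKey)
	k.key = append(k.key[:0], key...) // reuse the backing slice across calls
	// ... perform the cache query/update with k ...
	fmt.Printf("looked up %q\n", k.key)
	keyPool.Put(k)
}

func main() {
	lookup([]byte("/Table/52/1"))
	lookup([]byte("/Table/52/2"))
}
```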
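The #85730 change described above moves the paused-follower check into the common `sendSnapshot` path so that every snapshot attempt hits it. The following is only a rough sketch of that shape, with deliberately simplified, hypothetical types; the real `Replica` and pause-tracking machinery are far richer.

```go
package main

import (
	"errors"
	"fmt"
)

// StoreID and Replica are minimal stand-ins for this sketch; the real
// CockroachDB types carry much more state.
type StoreID int

type Replica struct {
	// pausedFollowers models whatever signal marks a follower's store as
	// paused (for example, due to IO overload).
	pausedFollowers map[StoreID]bool
}

// errFollowerPaused is a hypothetical sentinel error for this sketch.
var errFollowerPaused = errors.New("refusing snapshot: recipient follower is paused")

// sendSnapshot places the recipient check at the single entry point shared
// by the raft log queue, replicate queue, and store rebalancer, instead of
// checking only in the raft log queue path.
func (r *Replica) sendSnapshot(recipient StoreID) error {
	if r.pausedFollowers[recipient] {
		return errFollowerPaused
	}
	// ... build and stream the snapshot as before ...
	return nil
}

func main() {
	r := &Replica{pausedFollowers: map[StoreID]bool{2: true}}
	fmt.Println(r.sendSnapshot(1)) // <nil>
	fmt.Println(r.sendSnapshot(2)) // refusing snapshot: recipient follower is paused
}
```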
In #83851, we prevent sending Raft snapshots to followers that have an overloaded engine:

cockroach/pkg/kv/kvserver/raft_snapshot_queue.go, lines 142 to 153 at acdf42a

However, we don't do this when sending snapshots to new learners:

cockroach/pkg/kv/kvserver/replica_command.go, lines 1801 to 1803 at acdf42a

We probably should do the same there; erroring out at that call site may be sufficient.
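A minimal sketch of what such a guard could look like at the learner-snapshot call site, assuming the overload signal is exposed as a simple per-store check. The variable and function names below are hypothetical placeholders for illustration, not the actual APIs introduced in #83851.

```go
package main

import (
	"errors"
	"fmt"
)

// StoreID identifies the snapshot recipient's store in this sketch.
type StoreID int

// overloadedStores is a placeholder for whatever signal #83851 consults
// before sending raft snapshots (e.g. an IO-overload score gossiped by
// the store); the real check is richer than a boolean.
var overloadedStores = map[StoreID]bool{3: true}

var errRecipientOverloaded = errors.New("not sending learner snapshot: recipient store is overloaded")

// sendLearnerSnapshot errors out early when the recipient is overloaded,
// mirroring what the raft snapshot queue already does, instead of sending
// the initial snapshot to a new learner on an overloaded store.
func sendLearnerSnapshot(recipient StoreID) error {
	if overloadedStores[recipient] {
		return errRecipientOverloaded
	}
	// ... initiate the learner snapshot as before ...
	return nil
}

func main() {
	fmt.Println(sendLearnerSnapshot(1)) // <nil>
	fmt.Println(sendLearnerSnapshot(3)) // errors out instead of sending
}
```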
Jira issue: CRDB-18266
Epic: CRDB-15069