
Commit 4787e3c

storage: adopt new raft MaxCommittedSizePerReady config parameter
Before this change, the size of the committed log entries which a replica could apply at a time was bound to the same configuration as the total size of log entries which could be sent in a single message (MaxSizePerMsg), which is typically on the order of kilobytes (16 KB by default in CockroachDB). This limit restricted the write throughput of a replica, particularly when writing large amounts of data.

A new raft configuration option, MaxCommittedSizePerReady, was added to etcd/raft in etcd-io/etcd#10258, which allows these two size parameters to be decoupled. This change adopts the new option and sets it to a default of 64 MB.

On the workload below, which is set up so that the old configuration always returns exactly one entry per Ready, we see a large improvement in both throughput and latency.

```
./workload run kv {pgurl:1-3} \
    --init --splits=10 \
    --duration 60s \
    --read-percent=${READ_PERCENT} \
    --min-block-bytes=8193 --max-block-bytes=16385 \
    --concurrency=1024
```

```
name  old ops/s  new ops/s  delta
KV0   483 ± 3%   2025 ± 3%  +319.32%  (p=0.002 n=6+6)
```

Before:

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0          29570          492.8   1981.2   2281.7   5100.3   5637.1   6442.5  write
   60.0s        0          28405          473.4   2074.8   2281.7   5637.1   6710.9   7516.2  write
   60.0s        0          28615          476.9   2074.3   2550.1   5905.6   6442.5   8321.5  write
   60.0s        0          28718          478.6   2055.4   2550.1   5100.3   6442.5   7516.2  write
   60.0s        0          28567          476.1   2079.8   2684.4   4831.8   5368.7   6442.5  write
   60.0s        0          29981          499.7   1975.7   1811.9   5368.7   6174.0   6979.3  write
```

After:

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         119652         1994.0    510.9    486.5   1006.6   1409.3   4295.0  write
   60.0s        0         125321         2088.4    488.5    469.8    906.0   1275.1   4563.4  write
   60.0s        0         119644         1993.9    505.2    469.8   1006.6   1610.6   5637.1  write
   60.0s        0         119027         1983.6    511.4    469.8   1073.7   1946.2   4295.0  write
   60.0s        0         121723         2028.5    500.6    469.8   1040.2   1677.7   4160.7  write
   60.0s        0         123697         2061.4    494.1    469.8   1006.6   1610.6   4295.0  write
```

Fixes cockroachdb#31511

Release note: None
1 parent 9ef9532 commit 4787e3c

4 files changed, +21 -7 lines changed

Gopkg.lock

+2-2

pkg/base/config.go

+17-4
@@ -102,11 +102,17 @@ var (
 	defaultRaftLogTruncationThreshold = envutil.EnvOrDefaultInt64(
 		"COCKROACH_RAFT_LOG_TRUNCATION_THRESHOLD", 4<<20 /* 4 MB */)
 
-	// defaultRaftMaxSizePerMsg specifies the maximum number of Raft log entries
-	// that a leader will send to followers in a single MsgApp.
+	// defaultRaftMaxSizePerMsg specifies the maximum aggregate byte size of Raft
+	// log entries that a leader will send to followers in a single MsgApp.
 	defaultRaftMaxSizePerMsg = envutil.EnvOrDefaultInt(
 		"COCKROACH_RAFT_MAX_SIZE_PER_MSG", 16<<10 /* 16 KB */)
 
+	// defaultRaftMaxSizeCommittedSizePerReady specifies the maximum aggregate
+	// byte size of the committed log entries which a node will receive in a
+	// single Ready.
+	defaultRaftMaxCommittedSizePerReady = envutil.EnvOrDefaultInt(
+		"COCKROACH_RAFT_MAX_COMMITTED_SIZE_PER_READY", 64<<20 /* 64 MB */)
+
 	// defaultRaftMaxSizePerMsg specifies how many "inflight" messages a leader
 	// will send to a follower without hearing a response.
 	defaultRaftMaxInflightMsgs = envutil.EnvOrDefaultInt(
@@ -467,10 +473,14 @@ type RaftConfig struct {
 	// committed but continue to be proposed.
 	RaftMaxUncommittedEntriesSize uint64
 
-	// RaftMaxSizePerMsg controls how many Raft log entries the leader will send to
-	// followers in a single MsgApp.
+	// RaftMaxSizePerMsg controls the maximum aggregate byte size of Raft log
+	// entries the leader will send to followers in a single MsgApp.
 	RaftMaxSizePerMsg uint64
 
+	// RaftMaxCommittedSizePerReady controls the maximum aggregate byte size of
+	// committed Raft log entries a replica will receive in a single Ready.
+	RaftMaxCommittedSizePerReady uint64
+
 	// RaftMaxInflightMsgs controls how many "inflight" messages Raft will send
 	// to a follower without hearing a response. The total number of Raft log
 	// entries is a combination of this setting and RaftMaxSizePerMsg. The
@@ -512,6 +522,9 @@ func (cfg *RaftConfig) SetDefaults() {
 	if cfg.RaftMaxSizePerMsg == 0 {
 		cfg.RaftMaxSizePerMsg = uint64(defaultRaftMaxSizePerMsg)
 	}
+	if cfg.RaftMaxCommittedSizePerReady == 0 {
+		cfg.RaftMaxCommittedSizePerReady = uint64(defaultRaftMaxCommittedSizePerReady)
+	}
 	if cfg.RaftMaxInflightMsgs == 0 {
 		cfg.RaftMaxInflightMsgs = defaultRaftMaxInflightMsgs
 	}

pkg/storage/store.go

+1
@@ -171,6 +171,7 @@ func newRaftConfig(
 		ElectionTick:              storeCfg.RaftElectionTimeoutTicks,
 		HeartbeatTick:             storeCfg.RaftHeartbeatIntervalTicks,
 		MaxUncommittedEntriesSize: storeCfg.RaftMaxUncommittedEntriesSize,
+		MaxCommittedSizePerReady:  storeCfg.RaftMaxCommittedSizePerReady,
 		MaxSizePerMsg:             storeCfg.RaftMaxSizePerMsg,
 		MaxInflightMsgs:           storeCfg.RaftMaxInflightMsgs,
 		Storage:                   strg,
