
perf: investigate reenabling flush/compaction pacing #687

Open
petermattis opened this issue May 19, 2020 · 6 comments
petermattis commented May 19, 2020

Flush/compaction pacing was disabled in #431 because it appeared to hurt throughput. Some recent experiments with `pebble bench ycsb` show a throughput benefit to pacing (old is with pacing disabled, new is with pacing enabled):

name        old ops/sec  new ops/sec  delta
update_100    439k ± 2%    524k ± 2%  +19.46%  (p=0.008 n=5+5)

name        old p50      new p50      delta
update_100    0.30 ± 0%    0.30 ± 0%     ~     (all equal)

name        old p90      new p90      delta
update_100    1.66 ± 4%    1.40 ± 0%  -15.66%  (p=0.000 n=5+4)

name        old p99      new p99      delta
update_100    4.88 ± 4%    2.28 ± 5%  -53.28%  (p=0.008 n=5+5)

name        old pMax     new pMax     delta
update_100     160 ±62%      30 ± 7%  -81.01%  (p=0.016 n=5+4)

Interestingly, compaction pacing was very quickly disabled in these tests because compactions were not keeping up. The above tests ran for only 1m; longer runs need to be performed. We also need to look at scenarios involving mixed reads and writes.

A lot has changed since mid-December when #431 was merged, so perhaps the true culprit was not pacing. Or perhaps we've fixed other bottlenecks so that pacing is no longer a problem.

Jira issue: PEBBLE-175

@petermattis commented:

Here is a 10m run with pacing enabled:

____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100   600.0s      295399650       492331.3       492331.4      0.5      0.2      1.6      2.6     33.6

And a 10m run with pacing disabled:

____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100   600.0s      243737974       406229.5       406229.9      0.6      0.3      1.8      6.3    104.9

@petermattis commented:

Ah, pacing causes problems with a workload composed of reads and writes. Here is YCSB workload A, which is 50% reads and 50% updates. Old is with pacing disabled, new is with pacing enabled:

name     old ops/sec  new ops/sec  delta
ycsb/A     847k ± 1%    633k ± 1%  -25.27%  (p=0.000 n=10+9)

I have a strong suspicion the problem is that pacing increases read amplification. I added a bit of instrumentation to Iterator so that it can report read amplification. With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we're allowing the read amplification to increase which is generally good for write performance.
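Read amplification is, roughly, the number of sorted runs an iterator must consult: each L0 file (or sublevel) may overlap a key, while each non-empty lower level is fully sorted and counts once. A rough sketch of computing it from LSM shape (a hypothetical helper, not the Iterator instrumentation described above):

```go
package main

import "fmt"

// readAmp estimates LSM read amplification: each L0 file can overlap a
// given key and contributes one unit; each non-empty lower level is
// sorted and contributes one unit. Simplified illustration only.
func readAmp(l0Files int, lowerLevelFiles []int) int {
	amp := l0Files
	for _, n := range lowerLevelFiles {
		if n > 0 {
			amp++
		}
	}
	return amp
}

func main() {
	// e.g. 3 files in L0 plus four non-empty lower levels.
	fmt.Println(readAmp(3, []int{0, 0, 4, 10, 40, 100})) // 7
}
```

This makes the tradeoff visible: pacing that lets L0 accumulate more files directly raises this count, which reads pay for on every lookup.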

@sumeerbhola commented:

> With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we're allowing the read amplification to increase which is generally good for write performance.

This makes me more convinced that any pacing that does not take into account whether we have an actual CPU or IO resource bottleneck is probably making unnecessary tradeoffs (related to #1329). In this example:

  • If there are ample CPU/IO resources, there is no need to pace.
  • If there is a resource bottleneck, the situation is murkier: pacing saves immediate CPU and IO on the write path, though we are probably just shifting that cost into the future. And by making reads less efficient, due to higher read amplification, it may pay back that saving immediately, making things worse overall. This seems a tricky one to optimize.
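The bullets above suggest gating pacing on whether a resource is actually the bottleneck. A hypothetical sketch of such a gate, with illustrative thresholds (not a policy from Pebble or #1329):

```go
package main

import "fmt"

// shouldPace is a hypothetical policy: pace flushes/compactions only when
// the system has headroom, so pacing never worsens an existing bottleneck.
func shouldPace(diskUtil float64, compactionDebtBytes, maxDebtBytes int64) bool {
	if diskUtil > 0.8 {
		return false // I/O is already the bottleneck; don't slow background work
	}
	if compactionDebtBytes > maxDebtBytes {
		return false // compactions are falling behind; let them run full speed
	}
	return true
}

func main() {
	fmt.Println(shouldPace(0.3, 1<<20, 1<<30)) // true: ample headroom
	fmt.Println(shouldPace(0.95, 0, 1<<30))    // false: disk saturated
}
```

The debt check mirrors what happened in the experiments above, where compaction pacing disabled itself because compactions were not keeping up.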


jbowens commented Feb 2, 2022

Is this still worthwhile as an issue separate from #1329?


itsbilal commented Feb 2, 2022

@jbowens I would say yes - it makes sense as a sub-issue of that. This one is more general about testing the implications of enabling pacing on its own, while that one is about potentially using pacing in IO-constrained situations (assuming it is beneficial in prioritizing user-writes or other compactions). I can see #1329 wrapping up without much in the way of using pacing, in which case this could stand as a remaining bit of exploration/work.

jbowens added a commit to jbowens/pebble that referenced this issue Jul 14, 2022
Compaction and flush pacing have been disabled for several years. Remove the
code to avoid maintaining dead code. If/when we attempt to re-enable pacing, we
can revert this commit.

Informs cockroachdb#687.
jbowens added a commit that referenced this issue Jul 15, 2022
Compaction and flush pacing have been disabled for several years. Remove the
code to avoid maintaining dead code. If or when we attempt to re-enable pacing,
we can revert this commit.

Informs #687.
@github-actions commented:

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it
in 10 days to keep the issue queue tidy. Thank you for your
contribution to Pebble!
