
perf: investigate reenabling flush/compaction pacing #687

Open
petermattis opened this issue May 19, 2020 · 6 comments
petermattis commented May 19, 2020

Flush/compaction pacing was disabled in #431 because it appeared to hurt throughput. Some recent experiments with `pebble bench ycsb` show a throughput benefit to pacing (old is with pacing disabled, new is with pacing enabled):

name        old ops/sec  new ops/sec  delta
update_100    439k ± 2%    524k ± 2%  +19.46%  (p=0.008 n=5+5)

name        old p50      new p50      delta
update_100    0.30 ± 0%    0.30 ± 0%     ~     (all equal)

name        old p90      new p90      delta
update_100    1.66 ± 4%    1.40 ± 0%  -15.66%  (p=0.000 n=5+4)

name        old p99      new p99      delta
update_100    4.88 ± 4%    2.28 ± 5%  -53.28%  (p=0.008 n=5+5)

name        old pMax     new pMax     delta
update_100     160 ±62%      30 ± 7%  -81.01%  (p=0.016 n=5+4)

Interestingly, compaction pacing was very quickly disabled in these tests because compactions were not keeping up. The above tests ran for only 1m; longer runs need to be performed. We also need to look at scenarios involving mixed reads and writes.

A lot has changed since mid-December when #431 was merged, so perhaps the true culprit was not pacing. Or perhaps we've fixed other bottlenecks so that pacing is no longer a problem.

Jira issue: PEBBLE-175

@petermattis commented:

Here is a 10m run with pacing enabled:

____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100   600.0s      295399650       492331.3       492331.4      0.5      0.2      1.6      2.6     33.6

And a 10m run with pacing disabled:

____optype__elapsed_____ops(total)___ops/sec(cum)__keys/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
update_100   600.0s      243737974       406229.5       406229.9      0.6      0.3      1.8      6.3    104.9

@petermattis commented:

Ah, pacing causes problems with a workload composed of reads and writes. Here is YCSB workload A, which is 50% reads and 50% updates. Old is with pacing disabled, new is with pacing enabled:

name     old ops/sec  new ops/sec  delta
ycsb/A     847k ± 1%    633k ± 1%  -25.27%  (p=0.000 n=10+9)

I have a strong suspicion the problem is that pacing increases read amplification. I added a bit of instrumentation to Iterator so that it can report read amplification. With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we're allowing the read amplification to increase which is generally good for write performance.
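Read amplification is, roughly, the number of sorted runs an iterator must consult: each L0 file (or sublevel) may overlap a key, while each non-empty lower level is fully sorted and counts once. A rough sketch of computing it from LSM shape (a hypothetical helper, not the Iterator instrumentation described above):

```go
package main

import "fmt"

// readAmp estimates LSM read amplification: each L0 file can overlap a
// given key and contributes one unit; each non-empty lower level is
// sorted and contributes one unit. Simplified illustration only.
func readAmp(l0Files int, lowerLevelFiles []int) int {
	amp := l0Files
	for _, n := range lowerLevelFiles {
		if n > 0 {
			amp++
		}
	}
	return amp
}

func main() {
	// e.g. 3 files in L0 plus four non-empty lower levels.
	fmt.Println(readAmp(3, []int{0, 0, 4, 10, 40, 100})) // 7
}
```

This makes the tradeoff visible: pacing that lets L0 accumulate more files directly raises this count, which reads pay for on every lookup.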

@sumeerbhola commented:

> With pacing disabled the average read amplification across iterators was 5.8. With pacing enabled the average read amplification was 8.7. This would also explain why pacing speeds up write-only workloads: we're allowing the read amplification to increase which is generally good for write performance.

This makes me more convinced that any pacing that does not take into account whether we have an actual CPU or IO resource bottleneck is probably making unnecessary tradeoffs (related to #1329). In this example:

  • If there are ample CPU/IO resources, there is no need to pace.
  • If there is a resource bottleneck, the situation is murkier: pacing saves immediate CPU and IO on the write path, though we are probably just shifting that cost into the future. And by making reads less efficient, due to higher read amplification, it may pay back that saving immediately, making things worse overall. This seems a tricky one to optimize.
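The bullets above suggest gating pacing on whether a resource is actually the bottleneck. A hypothetical sketch of such a gate, with illustrative thresholds (not a policy from Pebble or #1329):

```go
package main

import "fmt"

// shouldPace is a hypothetical policy: pace flushes/compactions only when
// the system has headroom, so pacing never worsens an existing bottleneck.
func shouldPace(diskUtil float64, compactionDebtBytes, maxDebtBytes int64) bool {
	if diskUtil > 0.8 {
		return false // I/O is already the bottleneck; don't slow background work
	}
	if compactionDebtBytes > maxDebtBytes {
		return false // compactions are falling behind; let them run full speed
	}
	return true
}

func main() {
	fmt.Println(shouldPace(0.3, 1<<20, 1<<30)) // true: ample headroom
	fmt.Println(shouldPace(0.95, 0, 1<<30))    // false: disk saturated
}
```

The debt check mirrors what happened in the experiments above, where compaction pacing disabled itself because compactions were not keeping up.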


jbowens commented Feb 2, 2022

Is this still worthwhile as an issue separate from #1329?


itsbilal commented Feb 2, 2022

@jbowens I would say yes - it makes sense as a sub-issue of that. This one is more general about testing the implications of enabling pacing on its own, while that one is about potentially using pacing in IO-constrained situations (assuming it is beneficial in prioritizing user-writes or other compactions). I can see #1329 wrapping up without much in the way of using pacing, in which case this could stand as a remaining bit of exploration/work.

jbowens added a commit to jbowens/pebble that referenced this issue Jul 14, 2022
Compaction and flush pacing have been disabled for several years. Remove the
code to avoid maintaining dead code. If/when we attempt to re-enable pacing, we
can revert this commit.

Informs cockroachdb#687.
jbowens added a commit that referenced this issue Jul 15, 2022
Compaction and flush pacing have been disabled for several years. Remove the
code to avoid maintaining dead code. If or when we attempt to re-enable pacing,
we can revert this commit.

Informs #687.
@github-actions commented:

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it
in 10 days to keep the issue queue tidy. Thank you for your
contribution to Pebble!
