perf(bitmap): optimize bitmap for all zeros and all ones #11090

Merged
merged 8 commits into from
Jul 24, 2023

wangrunji0408
Contributor

@wangrunji0408 wangrunji0408 commented Jul 20, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

In our scenarios, a bitmap is very likely to have all bits set to 1, for example the visibility of a compacted chunk or the null bitmap of an array whose values are all non-null. In this case, the dynamically allocated buffer can be avoided to save both memory and time. A similar optimization is used in SmallVec.

We already have such an optimization for visibilities (Vis). But this enum is not a perfect abstraction over bitmap and leaks its internal structure (bitmap or len). This leads to our current mixed use of Vis, Bitmap and Option<Bitmap> for visibilities.

This PR makes this optimization a built-in feature of Bitmap. Since Bitmap already maintains the bit count, we can identify whether the bitmap is all-zeros or all-ones by checking only count_ones and num_bits.

  • if count_ones == 0, the bitmap should be all-zeros
  • if count_ones == num_bits, the bitmap should be all-ones

In both cases, the bits buffer can be None.
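
The rule above can be sketched as follows. This is a minimal illustration and not the code from this PR: the field names follow the description, while `from_bools`, `is_set` and `has_buffer` are hypothetical helpers added for the example.

```rust
const WORD_BITS: usize = usize::BITS as usize;

/// Hypothetical, simplified bitmap; not the actual RisingWave type.
#[derive(Debug, Clone, PartialEq)]
pub struct Bitmap {
    num_bits: usize,
    count_ones: usize,
    /// `None` when every bit is 0 or every bit is 1.
    bits: Option<Box<[usize]>>,
}

impl Bitmap {
    /// All-zeros bitmap: no heap allocation.
    pub fn zeros(num_bits: usize) -> Self {
        Self { num_bits, count_ones: 0, bits: None }
    }

    /// All-ones bitmap: no heap allocation.
    pub fn ones(num_bits: usize) -> Self {
        Self { num_bits, count_ones: num_bits, bits: None }
    }

    /// Builds a bitmap and drops the buffer when the bits are uniform.
    pub fn from_bools(values: &[bool]) -> Self {
        let num_bits = values.len();
        let count_ones = values.iter().filter(|&&b| b).count();
        if count_ones == 0 || count_ones == num_bits {
            return Self { num_bits, count_ones, bits: None };
        }
        let mut words = vec![0usize; (num_bits + WORD_BITS - 1) / WORD_BITS];
        for (i, &b) in values.iter().enumerate() {
            if b {
                words[i / WORD_BITS] |= 1usize << (i % WORD_BITS);
            }
        }
        Self { num_bits, count_ones, bits: Some(words.into_boxed_slice()) }
    }

    /// Reads one bit; without a buffer the answer comes from `count_ones`.
    pub fn is_set(&self, idx: usize) -> bool {
        assert!(idx < self.num_bits, "index out of range");
        match &self.bits {
            Some(words) => (words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1 == 1,
            None => self.count_ones != 0,
        }
    }

    /// True when a heap buffer is allocated.
    pub fn has_buffer(&self) -> bool {
        self.bits.is_some()
    }
}
```

In this sketch `Bitmap::ones(1024)` stores only two integers and `None`, which is the kind of saving the `zeros` and `ones` benchmark rows below measure.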

With this optimization, we are free to use Bitmap everywhere without worrying about the extra cost in all-ones cases. As a next step, we can remove Vis and use only Bitmap for visibilities.

Benchmark results on bitmap of size 1024:

| bench | before | after | change |
|---|---|---|---|
| zeros | 20.075 ns | 480.24 ps | -98% |
| ones | 20.506 ns | 480.21 ps | -98% |
| from_bytes | 26.216 ns | 28.430 ns | 8% |
| get_from_ones | 320.24 ps | 320.31 ps | 0% |
| get | 318.48 ps | 320.31 ps | 1% |
| and | 29.640 ns | 31.000 ns | 5% |
| or | 29.224 ns | 31.020 ns | 6% |
| not | 23.268 ns | 21.869 ns | -6% |
| eq | 317.18 ps | 318.11 ps | 0% |
| iter | 477.25 ns | 479.22 ns | 0% |
| iter_on_ones | 485.85 ns | 333.99 ns | -31% |
| iter_ones_on_zeros | 6.0423 ns | 317.61 ps | -95% |
| iter_ones_on_ones | 790.78 ns | 334.26 ns | -58% |
| iter_ones_on_sparse | 92.240 ns | 90.978 ns | -1% |
| iter_ones_on_dense | 696.99 ns | 698.80 ns | 0% |
| iter_range | 333.91 ns | 333.42 ns | -0% |
| zeros_iter | 5.0219 ms | 3.3781 ms | -33% |
| ones_iter | 5.0097 ms | 3.3936 ms | -32% |

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR contains user-facing changes.

Signed-off-by: Runji Wang <[email protected]>
Member

@BugenZhao BugenZhao left a comment

LGTM!

@codecov

codecov bot commented Jul 20, 2023

Codecov Report

Merging #11090 (3f406a5) into main (481f778) will decrease coverage by 0.01%.
The diff coverage is 89.76%.

@@            Coverage Diff             @@
##             main   #11090      +/-   ##
==========================================
- Coverage   69.78%   69.78%   -0.01%     
==========================================
  Files        1313     1313              
  Lines      223930   223958      +28     
==========================================
+ Hits       156279   156292      +13     
- Misses      67651    67666      +15     
| Flag | Coverage Δ |
|---|---|
| rust | 69.78% <89.76%> (-0.01%) ⬇️ |


| Impacted Files | Coverage Δ |
|---|---|
| src/common/src/buffer/bitmap.rs | 95.01% <89.60%> (-1.32%) ⬇️ |
| src/common/src/array/data_chunk.rs | 87.88% <100.00%> (ø) |

... and 5 files with indirect coverage changes


@kwannoel
Contributor

kwannoel commented Jul 20, 2023

I will take a look tmr. I suggest running q17 again btw. I remember it is performance sensitive with bitmap, and the last time some optimizations with bitmap had minimal benefit on e2e performance, although micro-bench shows improvement.

@xxchan
Member

xxchan commented Jul 20, 2023

> the last time some optimizations with bitmap actually caused regressions, although micro-bench shows improvement.

Just out of curiosity, which one is that?

@kwannoel
Contributor

kwannoel commented Jul 20, 2023

> > the last time some optimizations with bitmap actually caused regressions, although micro-bench shows improvement.
>
> Just out of curiosity, which one is that?

Sorry, it should be "potentially caused regressions"; I don't have such strong evidence.

I think it was #8848. It was likely just my own observation of a performance regression, and it could be just variance.

I have rephrased the above comment to be more accurate: #11090 (comment).

Member

@xxchan xxchan Jul 20, 2023

Although the benchmark shows the overhead is seemingly negligible, just in case I will ask: does it introduce overhead for other cases, since this introduces a match everywhere? And what's the worst case? 👀

Contributor Author

@wangrunji0408 wangrunji0408 Jul 21, 2023

Yes. You can see that the time for AND and OR increased by about 2 ns (5%) in normal cases. I think the overhead should be acceptable compared to the benefit.
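
To illustrate where that extra match comes from, here is a minimal sketch of an AND over an optional buffer. It is not the PR's implementation; the type and the `from_words` helper are hypothetical, and `from_words` assumes that bits beyond `num_bits` are zero.

```rust
/// Hypothetical, simplified bitmap; not the actual RisingWave type.
#[derive(Debug, Clone, PartialEq)]
pub struct Bitmap {
    num_bits: usize,
    count_ones: usize,
    /// `None` when every bit is 0 or every bit is 1.
    bits: Option<Box<[usize]>>,
}

impl Bitmap {
    pub fn zeros(num_bits: usize) -> Self {
        Self { num_bits, count_ones: 0, bits: None }
    }

    pub fn ones(num_bits: usize) -> Self {
        Self { num_bits, count_ones: num_bits, bits: None }
    }

    /// Assumes bits beyond `num_bits` are zero; drops the buffer if uniform.
    pub fn from_words(num_bits: usize, words: Vec<usize>) -> Self {
        let count_ones: usize = words.iter().map(|w| w.count_ones() as usize).sum();
        let bits = if count_ones == 0 || count_ones == num_bits {
            None
        } else {
            Some(words.into_boxed_slice())
        };
        Self { num_bits, count_ones, bits }
    }

    pub fn and(&self, other: &Self) -> Self {
        assert_eq!(self.num_bits, other.num_bits);
        match (&self.bits, &other.bits) {
            // Normal case: both sides have a buffer, plus the cost of the match.
            (Some(a), Some(b)) => {
                let words = a.iter().zip(b.iter()).map(|(x, y)| x & y).collect();
                Self::from_words(self.num_bits, words)
            }
            // `self` is uniform: zeros AND x = zeros, ones AND x = x.
            (None, _) if self.count_ones == 0 => self.clone(),
            (None, _) => other.clone(),
            // `other` is uniform: x AND zeros = zeros, x AND ones = x.
            (Some(_), None) if other.count_ones == 0 => other.clone(),
            (Some(_), None) => self.clone(),
        }
    }
}
```

In this sketch the uniform cases skip the word-by-word loop entirely, while the normal case pays for the match and for recomputing `count_ones` on the result.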

Contributor

Agreed, matching could be costly. We can try https://doc.rust-lang.org/std/intrinsics/fn.unlikely.html to see if it improves performance.

(Just a thought I had, not actually asking for it to be implemented here).
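
For reference, the linked intrinsic is only available on nightly Rust. A common stable alternative is to call an empty `#[cold]` function in the branch that is expected to be rare. The sketch below assumes, purely for illustration, that the buffer-less case is the rare one; `get_bit` and `cold_path` are hypothetical names, and whether the hint helps would have to be measured.

```rust
const WORD_BITS: usize = usize::BITS as usize;

/// Empty function marked cold: calling it tells the optimizer that the
/// enclosing branch is expected to be taken rarely.
#[cold]
#[inline(never)]
fn cold_path() {}

/// Hypothetical lookup over an optional buffer. `uniform_value` is the value
/// of every bit when there is no buffer.
pub fn get_bit(bits: Option<&[usize]>, uniform_value: bool, idx: usize) -> bool {
    match bits {
        Some(words) => (words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1 == 1,
        None => {
            cold_path();
            uniform_value
        }
    }
}
```

The hint only affects code layout and never changes the result, so picking the wrong branch costs performance rather than correctness.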

Member

Well, I think it’s not an unlikely path. Otherwise why do we need to optimize it? 🤣 idk

Contributor Author

Agree with @xxchan. I cannot decide which branch is more likely. 🤣

@wangrunji0408
Contributor Author

> I will take a look tmr. I suggest running q17 again btw. I remember it is performance sensitive with bitmap, and the last time some optimizations with bitmap had minimal benefit on e2e performance, although micro-bench shows improvement.

Okay. I'm running q17 now. It's expected that bitmap has little impact on e2e performance. The actual purpose of this PR is to encourage the use of bitmap and avoid workarounds out of performance concerns (such as Vis). 😄

@wangrunji0408
Contributor Author

wangrunji0408 commented Jul 21, 2023

The performance of Q17 does not seem to have changed significantly (compared to the last nightly version).

[Screenshot: WX20230721-143620@2x]

Contributor

@kwannoel kwannoel left a comment

LGTM, thanks for this!

-    bits: Box<[usize]>,
+    ///
+    /// Optimization: If all bits are set to 0 or 1, this field is `None`.
+    bits: Option<Box<[usize]>>,
Contributor

I think we can optimize further: we can just use count_ones == num_bits or count_ones == 0 to check for the all_ones and all_zeros cases.
Then it seems we can avoid branching?

Not needed for now I guess. Just my own thoughts in case we need to optimize in the future.

Contributor Author

Do you mean that we first check count_ones and then bits.unwrap_unchecked()?
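
One possible reading of this exchange, sketched under assumptions and not taken from the PR: check the counters first, then access the buffer without a second `None` check. The type and `from_bools` are hypothetical. The `unsafe` block is sound only because the constructor keeps the invariant that a buffer exists whenever the bits are not uniform.

```rust
const WORD_BITS: usize = usize::BITS as usize;

/// Hypothetical, simplified bitmap for illustration only.
pub struct Bitmap {
    num_bits: usize,
    count_ones: usize,
    // Invariant: `Some` if and only if 0 < count_ones < num_bits.
    bits: Option<Box<[usize]>>,
}

impl Bitmap {
    /// The only constructor, so the invariant above always holds.
    pub fn from_bools(values: &[bool]) -> Self {
        let num_bits = values.len();
        let count_ones = values.iter().filter(|&&b| b).count();
        let bits = if count_ones == 0 || count_ones == num_bits {
            None
        } else {
            let mut words = vec![0usize; (num_bits + WORD_BITS - 1) / WORD_BITS];
            for (i, &b) in values.iter().enumerate() {
                if b {
                    words[i / WORD_BITS] |= 1usize << (i % WORD_BITS);
                }
            }
            Some(words.into_boxed_slice())
        };
        Self { num_bits, count_ones, bits }
    }

    pub fn is_set(&self, idx: usize) -> bool {
        assert!(idx < self.num_bits, "index out of range");
        // Decide the uniform cases from the counters alone.
        if self.count_ones == 0 {
            return false;
        }
        if self.count_ones == self.num_bits {
            return true;
        }
        // SAFETY: 0 < count_ones < num_bits here, so by the struct invariant
        // the buffer is present.
        let words = unsafe { self.bits.as_deref().unwrap_unchecked() };
        (words[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1 == 1
    }
}
```

This version still branches on the counters, so it removes the check on the `Option` rather than branching as such.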

@wangrunji0408 wangrunji0408 added this pull request to the merge queue Jul 24, 2023
Merged via the queue into main with commit 57b7632 Jul 24, 2023
@wangrunji0408 wangrunji0408 deleted the wrj/bitmap branch July 24, 2023 16:30
@wangrunji0408 wangrunji0408 mentioned this pull request Sep 18, 2023
7 tasks