-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(bitmap): optimize bitmap for all zeros and all ones #11090
Conversation
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Codecov Report
@@ Coverage Diff @@
## main #11090 +/- ##
==========================================
- Coverage 69.78% 69.78% -0.01%
==========================================
Files 1313 1313
Lines 223930 223958 +28
==========================================
+ Hits 156279 156292 +13
- Misses 67651 67666 +15
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 5 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
I will take a look tmr. I suggest running q17 again btw. I remember it is performance sensitive with bitmap, and the last time some optimizations with bitmap |
Just out of curiosity, which one is that? |
Sorry it should be "potentially caused regressions", don't have such strong evidence. #8848 I think it was this one. It was likely just an observation of the performance regression from me, could be just variance. I have rephrased the above comment to be more accurate: #11090 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the benchmark shows it's seemingly neglectable, but just in case I will ask: does it introduce overheads for other cases, since this introduce match
everywhere? And what's the worst case? 👀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. You can see that the time for AND and OR increased 2ns (5%) in normal cases. I think the overhead should be acceptable compared to the benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree matching could be costly, can try https://doc.rust-lang.org/std/intrinsics/fn.unlikely.html, to see if it improves performance.
(Just a thought I had, not actually asking for it to be implemented here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think it’s not an unlikely path. Otherwise why do we need to optimize it? 🤣 idk
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @xxchan. I can not decide which branch is more likely. 🤣
Okay. I'm running q17 now. It's expected that bitmap has little impact on e2e performance. The actual purpose of this PR is to encourage the use of bitmap and avoid workarounds out of performance concerns (such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for this!
bits: Box<[usize]>, | ||
/// | ||
/// Optimization: If all bits are set to 0 or 1, this field is `None`. | ||
bits: Option<Box<[usize]>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can do further optimization, we can just use count_ones == num_bits
OR count_ones == 0
to check if for all_ones
and all_zeros
case?
Then it seems we can avoid branching?
Not needed for now I guess. Just my own thoughts if we need to optimize in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that we first check count_ones
and then bits.unwrap_unchecked()
?
Co-authored-by: Noel Kwan <[email protected]>
Signed-off-by: Runji Wang <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
In our scenarios, it is very likely to have a bitmap with all bits set to 1. For example, the visibility of a compacted chunk and the null bitmap of an array with all values non-null. In this case, the dynamic allocated buffer can be avoided to safe both memory and time. A similar optimization is used in
SmallVec
.We already have such an optimization for visibilities (
Vis
). But this enum is not a perfect abstraction over bitmap and leaks its internal structure (bitmap or len). This leads to our current mixed use ofVis
,Bitmap
andOption<Bitmap>
for visibilities.This PR includes this optimization as a built-in feature of
Bitmap
. Since we have maintained the bit count inBitmap
, we can identify whether the bitmap is all-zeros or all-ones by checkingcount_ones
andnum_bits
only.count_ones == 0
, the bitmap should be all-zeroscount_ones == num_bits
, the bitmap should be all-onesin both cases, the
bits
buffer can beNone
.With this optimization, we are free to use
Bitmap
everywhere without worrying about the extra cost in all-one cases. Next step, we can removeVis
and only useBitmap
for visibilities.Benchmark results on bitmap of size 1024:
Checklist
./risedev check
(or alias,./risedev c
)Documentation