-
Notifications
You must be signed in to change notification settings - Fork 613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(storage): block format epoch dictionary encoding #8605
Conversation
Any benchmark result? |
@@ -149,6 +166,8 @@ pub struct Block { | |||
|
|||
/// Restart points. | |||
restart_points: Vec<RestartPoint>, | |||
|
|||
epoch_dictionary: EpochDictionary, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need more documentation on what the structure of this dictionary is and how we use it.
@@ -445,17 +512,29 @@ impl BlockBuilder { | |||
#[cfg(debug_assertions)] | |||
self.debug_valid(); | |||
|
|||
let epoch = full_key.epoch; | |||
self.epoch_dictionary.insert(epoch); | |||
|
|||
let mut key: BytesMut = Default::default(); | |||
full_key.encode_into_without_table_id(&mut key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to encode epoch into key
. We can avoid slicing in L534 and L567
@@ -697,7 +817,7 @@ mod tests { | |||
builder.add(construct_full_key_struct(0, b"k3", 3), b"v03"); | |||
builder.add(construct_full_key_struct(0, b"k4", 4), b"v04"); | |||
let capacity = builder.uncompressed_block_size(); | |||
assert_eq!(capacity, builder.approximate_len() - 9); | |||
// assert_eq!(capacity, builder.approximate_len() - 9); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to fix
@@ -569,7 +660,40 @@ impl BlockBuilder { | |||
self.buf | |||
.put_u32_le(self.restart_points_type_index.len() as u32); | |||
|
|||
let mut epoch_index: usize = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update the doc of build
in L629 accordingly
// encode epoch_group | ||
for group in &self.epoch_group { | ||
let group_len = group.len(); | ||
self.buf.put_u16_le(group_len as u16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although highly unlikely, the possibility of group_len overflow still exists. How about focing a block swtich when the number of keys added to the block >= u16::MAX?
} | ||
|
||
// epoch_group == resatrt_points count , not need to record | ||
self.buf.put_u32_le(self.entry_count as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the comment above, entry_count can be u16?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for self.restart_points_type_index.len()
and self.restart_points.len()
|
||
// epoch_group == resatrt_points count , not need to record | ||
self.buf.put_u32_le(self.entry_count as u32); | ||
self.buf.put_u16_le(self.epoch_group.len() as u16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the comment in L684, why do we need to encode this?
last_key_len_type: LenType, | ||
last_value_len_type: LenType, | ||
|
||
key_index_in_restart_point: usize, | ||
|
||
epoch: HummockEpoch, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add docs for all the newly added fields.
…into li0k/storage_block_dictionary_encoding
…into li0k/storage_block_dictionary_encoding
TLDR:
@hzxa21 @wcy-fdu @Little-Wallace I suggest we can drop this feat, what's your opinion? I executed two scenarios for testing test 1
test 2
|
From result of test 1:
From result of test 2:
|
So you mean the impact of latency is greater than the benefits of block compression? It's a trade off🤔 Maybe we can hold this PR for a while, after #8584 is implemented, we can compare the benefits of different compression methods horizontally, and think about whether to let users choose whether to compress. |
Yes,its trade-off between space with perform, just that I don't think the benefits are proportional |
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Checklist For Contributors
./risedev check
(or alias,./risedev c
)Checklist For Reviewers
Documentation
Click here for Documentation
Types of user-facing changes
Please keep the types that apply to your changes, and remove the others.
Release note