feat(storage): block format epoch dictionary encoding #8605

Li0k · 2023-03-16T12:01:17Z

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Checklist For Contributors

I have written necessary rustdoc comments
I have added necessary unit tests and integration tests
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
I have demonstrated that backward compatibility is not broken by breaking changes and created issues to track deprecated features to be removed in the future. (Please refer to the issue)
All checks passed in ./risedev check (or alias, ./risedev c)

Checklist For Reviewers

I have requested macro/micro-benchmarks as this PR can affect performance substantially, and the results are shown.

Documentation

My PR DOES NOT contain user-facing changes.


Types of user-facing changes

Please keep the types that apply to your changes, and remove the others.

Installation and deployment
Connector (sources & sinks)
SQL commands, functions, and operators
RisingWave cluster configuration changes
Other (please specify in the release note below)

Release note

wcy-fdu · 2023-03-16T12:03:00Z

Any benchmark result?

hzxa21 · 2023-03-16T12:05:53Z

src/storage/src/hummock/sstable/block.rs

@@ -149,6 +166,8 @@ pub struct Block {

    /// Restart points.
    restart_points: Vec<RestartPoint>,
+
+    epoch_dictionary: EpochDictionary,


Need more documentation on what the structure of this dictionary is and how we use it.
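For illustration only, here is one way such a dictionary could be laid out and documented. This is a hypothetical sketch, not the PR's code: the EpochDictionary below is a stand-in, and the first-seen ordering and the "u16 count, then little-endian u64 epochs" layout are assumptions.

```rust
use std::collections::HashMap;

/// Hypothetical per-block epoch dictionary: each distinct epoch is stored
/// once, and entries refer to it by its index.
///
/// Assumed on-disk layout:
/// [count: u16 LE][epoch_0: u64 LE] ... [epoch_{count-1}: u64 LE]
#[derive(Debug, Default)]
pub struct EpochDictionary {
    /// Distinct epochs in first-seen order; the position is the index.
    epochs: Vec<u64>,
    /// Reverse lookup from epoch to its index, used while building.
    index_of: HashMap<u64, usize>,
}

impl EpochDictionary {
    /// Returns the index of `epoch`, appending it if it has not been seen.
    pub fn insert(&mut self, epoch: u64) -> usize {
        if let Some(&idx) = self.index_of.get(&epoch) {
            return idx;
        }
        let idx = self.epochs.len();
        self.epochs.push(epoch);
        self.index_of.insert(epoch, idx);
        idx
    }

    /// Looks up the epoch stored at `idx`.
    pub fn get(&self, idx: usize) -> Option<u64> {
        self.epochs.get(idx).copied()
    }

    /// Appends the encoded dictionary to `buf`.
    pub fn encode_into(&self, buf: &mut Vec<u8>) {
        let count = u16::try_from(self.epochs.len())
            .expect("more than u16::MAX distinct epochs in one block");
        buf.extend_from_slice(&count.to_le_bytes());
        for epoch in &self.epochs {
            buf.extend_from_slice(&epoch.to_le_bytes());
        }
    }

    /// Decodes a dictionary from the front of `buf`, returning it together
    /// with the number of bytes consumed. Returns `None` on truncated input.
    pub fn decode(buf: &[u8]) -> Option<(Self, usize)> {
        let count = u16::from_le_bytes(buf.get(0..2)?.try_into().ok()?) as usize;
        let mut dict = Self::default();
        for i in 0..count {
            let start = 2 + i * 8;
            let bytes: [u8; 8] = buf.get(start..start + 8)?.try_into().ok()?;
            dict.insert(u64::from_le_bytes(bytes));
        }
        Some((dict, 2 + count * 8))
    }
}
```

Storing each distinct epoch once pays off only when many keys in a block share few epochs; with mostly distinct epochs the dictionary adds bytes instead of saving them.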

hzxa21 · 2023-03-16T12:20:24Z

src/storage/src/hummock/sstable/block.rs

@@ -445,17 +512,29 @@ impl BlockBuilder {
        #[cfg(debug_assertions)]
        self.debug_valid();

+        let epoch = full_key.epoch;
+        self.epoch_dictionary.insert(epoch);
+
        let mut key: BytesMut = Default::default();
        full_key.encode_into_without_table_id(&mut key);


We don't need to encode the epoch into the key. That way we can avoid the slicing in L534 and L567.
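A minimal sketch of the reviewer's suggestion, assuming a simplified entry layout with no prefix compression: the user key is written without the 8-byte epoch suffix and the epoch is referenced by a u16 dictionary index, so the decoder can hand back the key slice directly instead of trimming the epoch off with something like key[..key.len() - 8]. The layout and the names encode_entry, decode_entry, and EntryRef are hypothetical.

```rust
/// Hypothetical entry layout:
/// [key_len: u16 LE][user_key][epoch_idx: u16 LE][value_len: u32 LE][value]
pub fn encode_entry(buf: &mut Vec<u8>, user_key: &[u8], epoch_idx: u16, value: &[u8]) {
    let key_len = u16::try_from(user_key.len()).expect("user key too long");
    let value_len = u32::try_from(value.len()).expect("value too long");
    buf.extend_from_slice(&key_len.to_le_bytes());
    // The epoch is NOT appended to the key; only its dictionary index is stored.
    buf.extend_from_slice(user_key);
    buf.extend_from_slice(&epoch_idx.to_le_bytes());
    buf.extend_from_slice(&value_len.to_le_bytes());
    buf.extend_from_slice(value);
}

/// Borrowed view of one decoded entry.
pub struct EntryRef<'a> {
    pub user_key: &'a [u8],
    pub epoch_idx: u16,
    pub value: &'a [u8],
}

/// Decodes one entry from the front of `buf`, returning it and the bytes consumed.
/// The user key is returned as-is: no epoch suffix has to be sliced off.
pub fn decode_entry(buf: &[u8]) -> Option<(EntryRef<'_>, usize)> {
    let mut pos = 0;
    let key_len = u16::from_le_bytes(buf.get(pos..pos + 2)?.try_into().ok()?) as usize;
    pos += 2;
    let user_key = buf.get(pos..pos + key_len)?;
    pos += key_len;
    let epoch_idx = u16::from_le_bytes(buf.get(pos..pos + 2)?.try_into().ok()?);
    pos += 2;
    let value_len = u32::from_le_bytes(buf.get(pos..pos + 4)?.try_into().ok()?) as usize;
    pos += 4;
    let value = buf.get(pos..pos + value_len)?;
    pos += value_len;
    Some((EntryRef { user_key, epoch_idx, value }, pos))
}
```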


hzxa21 · 2023-03-16T12:30:39Z

src/storage/src/hummock/sstable/block.rs

@@ -697,7 +817,7 @@ mod tests {
        builder.add(construct_full_key_struct(0, b"k3", 3), b"v03");
        builder.add(construct_full_key_struct(0, b"k4", 4), b"v04");
        let capacity = builder.uncompressed_block_size();
-        assert_eq!(capacity, builder.approximate_len() - 9);
+        // assert_eq!(capacity, builder.approximate_len() - 9);


need to fix

hzxa21 · 2023-03-16T12:32:03Z

src/storage/src/hummock/sstable/block.rs

@@ -569,7 +660,40 @@ impl BlockBuilder {
        self.buf
            .put_u32_le(self.restart_points_type_index.len() as u32);

+        let mut epoch_index: usize = 0;


Please update the doc comment of build in L629 accordingly.

hzxa21 · 2023-03-16T12:36:18Z

src/storage/src/hummock/sstable/block.rs

+        // encode epoch_group
+        for group in &self.epoch_group {
+            let group_len = group.len();
+            self.buf.put_u16_le(group_len as u16);


Although highly unlikely, group_len can still overflow. How about forcing a block switch when the number of keys added to the block reaches u16::MAX?
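One way to implement the suggested guard, sketched with a hypothetical builder that tracks only sizes and counts (CountingBlockBuilder and MAX_ENTRIES_PER_BLOCK are illustrative names, not RisingWave APIs): the block is sealed when it reaches the target size or when it already holds u16::MAX entries, so no per-block count can overflow a u16.

```rust
/// Hypothetical cap: a block never holds more than u16::MAX entries,
/// so every per-block count fits in a u16.
const MAX_ENTRIES_PER_BLOCK: usize = u16::MAX as usize;

/// Hypothetical builder that models only the block-switch decision.
pub struct CountingBlockBuilder {
    target_bytes: usize,
    current_bytes: usize,
    entry_count: usize,
    /// Entry count of every block sealed so far.
    pub sealed_blocks: Vec<usize>,
}

impl CountingBlockBuilder {
    pub fn new(target_bytes: usize) -> Self {
        Self { target_bytes, current_bytes: 0, entry_count: 0, sealed_blocks: Vec::new() }
    }

    /// True if the current block must be sealed before another entry is added.
    fn should_switch(&self) -> bool {
        self.entry_count >= MAX_ENTRIES_PER_BLOCK || self.current_bytes >= self.target_bytes
    }

    /// Adds an entry of `entry_len` bytes, switching to a new block first if needed.
    pub fn add(&mut self, entry_len: usize) {
        if self.should_switch() {
            self.seal();
        }
        self.entry_count += 1;
        self.current_bytes += entry_len;
    }

    /// Seals the current block if it contains any entries.
    pub fn seal(&mut self) {
        if self.entry_count > 0 {
            self.sealed_blocks.push(self.entry_count);
            self.entry_count = 0;
            self.current_bytes = 0;
        }
    }
}
```

With tiny entries and a huge size target, 70,000 adds produce one block of 65,535 entries followed by one of 4,465.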

hzxa21 · 2023-03-16T12:38:09Z

src/storage/src/hummock/sstable/block.rs

+        }
+
+        // epoch_group == resatrt_points count , not need to record
+        self.buf.put_u32_le(self.entry_count as u32);


Based on the comment above, can entry_count be u16?

Same for self.restart_points_type_index.len() and self.restart_points.len().
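If a block switch is forced at u16::MAX keys, these counters always fit in a u16, saving 2 bytes per field compared with put_u32_le. A hedged sketch of a footer writer that narrows with u16::try_from instead of an as cast, so an unexpected overflow fails loudly instead of silently truncating (write_block_footer and its field order are assumptions):

```rust
use std::num::TryFromIntError;

/// Hypothetical block footer: [entry_count: u16 LE][restart_point_count: u16 LE].
/// Both counts are validated before anything is written, so on error `buf`
/// is left untouched.
pub fn write_block_footer(
    buf: &mut Vec<u8>,
    entry_count: usize,
    restart_point_count: usize,
) -> Result<(), TryFromIntError> {
    let entry_count = u16::try_from(entry_count)?;
    let restart_point_count = u16::try_from(restart_point_count)?;
    buf.extend_from_slice(&entry_count.to_le_bytes());
    buf.extend_from_slice(&restart_point_count.to_le_bytes());
    Ok(())
}
```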

hzxa21 · 2023-03-16T12:39:56Z

src/storage/src/hummock/sstable/block.rs

+
+        // epoch_group == resatrt_points count , not need to record
+        self.buf.put_u32_le(self.entry_count as u32);
+        self.buf.put_u16_le(self.epoch_group.len() as u16);


Based on the comment in L684, why do we need to encode this?
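To illustrate why the group count is redundant, here is a hypothetical encoding in which each restart interval owns exactly one epoch group, written as a u16 length followed by u16 dictionary indices. The decoder takes the number of groups from the restart-point count, so nothing extra has to be stored. The group contents and both function names are assumptions for illustration.

```rust
/// Encodes one group per restart interval as [group_len: u16 LE][idx: u16 LE; group_len].
/// The number of groups is deliberately NOT written.
pub fn encode_epoch_groups(buf: &mut Vec<u8>, groups: &[Vec<u16>]) {
    for group in groups {
        let len = u16::try_from(group.len()).expect("epoch group longer than u16::MAX");
        buf.extend_from_slice(&len.to_le_bytes());
        for idx in group {
            buf.extend_from_slice(&idx.to_le_bytes());
        }
    }
}

fn read_u16(buf: &[u8], pos: usize) -> Option<u16> {
    Some(u16::from_le_bytes(buf.get(pos..pos + 2)?.try_into().ok()?))
}

/// Decodes exactly `restart_point_count` groups, returning them and the bytes consumed.
pub fn decode_epoch_groups(buf: &[u8], restart_point_count: usize) -> Option<(Vec<Vec<u16>>, usize)> {
    let mut pos = 0;
    let mut groups = Vec::with_capacity(restart_point_count);
    for _ in 0..restart_point_count {
        let len = read_u16(buf, pos)? as usize;
        pos += 2;
        let mut group = Vec::with_capacity(len);
        for _ in 0..len {
            group.push(read_u16(buf, pos)?);
            pos += 2;
        }
        groups.push(group);
    }
    Some((groups, pos))
}
```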

hzxa21 · 2023-03-16T12:51:54Z

src/storage/src/hummock/sstable/block_iterator.rs

    last_key_len_type: LenType,
    last_value_len_type: LenType,
+
+    key_index_in_restart_point: usize,
+
+    epoch: HummockEpoch,


Please add docs for all the newly added fields.
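As an example of the kind of field docs being requested, here is a hypothetical cursor that holds only the epoch-related iterator state. EpochCursor, epoch_groups, restart_index, and the HummockEpoch = u64 alias are assumptions; the real BlockIterator carries more fields and may resolve epochs differently.

```rust
/// Assumed alias; Hummock epochs are 64-bit.
pub type HummockEpoch = u64;

/// Hypothetical subset of block-iterator state used for epoch decoding.
pub struct EpochCursor<'a> {
    /// Distinct epochs of the block (the decoded epoch dictionary).
    dictionary: &'a [HummockEpoch],
    /// Per restart interval, the dictionary index of each entry's epoch.
    epoch_groups: &'a [Vec<u16>],
    /// Index of the restart interval the iterator is currently in.
    restart_index: usize,
    /// Offset of the current entry inside its restart interval;
    /// 0 means the entry stored at the restart point itself.
    key_index_in_restart_point: usize,
    /// Epoch of the current entry, cached so it is resolved only once per move.
    epoch: HummockEpoch,
}

impl<'a> EpochCursor<'a> {
    pub fn new(dictionary: &'a [HummockEpoch], epoch_groups: &'a [Vec<u16>]) -> Self {
        Self { dictionary, epoch_groups, restart_index: 0, key_index_in_restart_point: 0, epoch: 0 }
    }

    /// Moves to entry `key_index` of restart interval `restart_index` and caches
    /// its epoch. Returns `None` (leaving the state unchanged) if out of range.
    pub fn seek(&mut self, restart_index: usize, key_index: usize) -> Option<HummockEpoch> {
        let dict_idx = *self.epoch_groups.get(restart_index)?.get(key_index)?;
        let epoch = *self.dictionary.get(dict_idx as usize)?;
        self.restart_index = restart_index;
        self.key_index_in_restart_point = key_index;
        self.epoch = epoch;
        Some(epoch)
    }

    /// Moves to the next entry, crossing into the next restart interval when needed.
    pub fn advance(&mut self) -> Option<HummockEpoch> {
        let next_in_group = self.key_index_in_restart_point + 1;
        if next_in_group < self.epoch_groups.get(self.restart_index)?.len() {
            self.seek(self.restart_index, next_in_group)
        } else {
            self.seek(self.restart_index + 1, 0)
        }
    }

    /// Epoch of the current entry.
    pub fn epoch(&self) -> HummockEpoch {
        self.epoch
    }
}
```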


Li0k · 2023-03-20T06:04:16Z

TLDR:

Introduces complexity and some performance degradation, with insignificant space savings
Version compatibility issues

@hzxa21 @wcy-fdu @Little-Wallace I suggest we drop this feature. What's your opinion?

I ran two test scenarios:

test 1

key num 50,000 (5w) per block
k-v pair (200B, 400B)
restart_point_interval = 16

main cost

branch cost

test 2

key num 1000 per block
k-v pair 100B
restart_point_interval = 16

main cost

branch cost

wcy-fdu · 2023-03-20T06:18:59Z

From result of test 1:

block size reduced to 97% of previous
iter.next latency increased by 3.5%
iter.prev latency increased by 6%

From result of test 2:

block size reduced to 92% of previous
iter.next latency increased by 10%

wcy-fdu · 2023-03-20T06:24:31Z

So you mean the impact of latency is greater than the benefits of block compression? It's a trade off🤔

Maybe we can hold this PR for a while, after #8584 is implemented, we can compare the benefits of different compression methods horizontally, and think about whether to let users choose whether to compress.

Li0k · 2023-03-20T09:00:57Z

> So you mean the impact of latency is greater than the benefits of block compression? It's a trade off🤔
>
> Maybe we can hold this PR for a while, after #8584 is implemented, we can compare the benefits of different compression methods horizontally, and think about whether to let users choose whether to compress.

Yes, it's a trade-off between space and performance; I just don't think the benefits are proportional to the cost.

Li0k added 3 commits March 16, 2023 17:12

feat(storage): basic epoch dictionary encoding

7a7fb00

feat(storage): epoch dictionary encoding for Block

24f3d73

fix(storage): fix typo

688583f

Li0k requested review from hzxa21, Little-Wallace and wcy-fdu March 16, 2023 12:01

github-actions bot added the type/feature label Mar 16, 2023

Li0k marked this pull request as draft March 16, 2023 12:10

hzxa21 reviewed Mar 16, 2023


Li0k added 6 commits March 16, 2023 22:43

refactor(storage): refactor epoch_dictionary

c801898

Merge branch 'main' of https://github.com/singularity-data/risingwave into li0k/storage_block_dictionary_encoding

f8a6e8e

Merge branch 'main' of https://github.com/singularity-data/risingwave into li0k/storage_block_dictionary_encoding

fbb792a

refactor(storage): refactor epoch group

e3d45ca

refactor(storage): refactor epoch group to buf

e3a7423

fix(storage): fix typo

07efb27

Li0k mentioned this pull request Mar 20, 2023

Tracking: Hummock Block Format #8252

Closed

6 tasks

Li0k closed this Mar 20, 2023
