Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(storage): change FullKey from concatenated &[u8] to struct #6130

Merged
merged 55 commits into from
Nov 11, 2022

Conversation

Gun9niR
Copy link
Contributor

@Gun9niR Gun9niR commented Oct 31, 2022

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

This PR has introduced another concept: table_key as part of the FullKey, which is the storage input key from the state table (without table id prefix), to get rid of the assumption that StateStore must be accessed from KeySpace to ensure the input key always starts with table_id. To make the structure of FullKey clear and avoid the frequent splitting and assembly of the &[u8] slice, FullKey is now changed to a struct, with its table_id, table_key and epoch easily accessible from fields.

More specifically, this PR contains the following changes:

  • Change FullKey's definition.
pub struct TableKey<T: AsRef<[u8]>>(pub T);

pub struct UserKey<T: AsRef<[u8]>> {
    pub table_id: TableId,
    pub table_key: T,
}

pub struct FullKey<T: AsRef<[u8]>> {
    pub user_key: UserKey<T>,
    pub epoch: HummockEpoch,
}
  • Remove duplicated table id and epoch from SharedBufferBatch. In other words, SharedBufferBatch now only stores table_key.
  • Change HummockIterator to use FullKey as seek parameter and return FullKey from key(). What's different is that UserIterator::key returns the encoded UserKey previously, but now it will also return its epoch.
  • Change state store interface to pass table key instead of user key as the key for all read/write ops. The prefixing logic in Keyspace is removed, but Keyspace is not removed to limit the size of this PR. (Hopefully it will be removed in the next PR)
  • Functions like key_with_epoch and user_key are retained because sometimes we still need to manipulate encoded keys with these functions, mainly in tests and processing SST meta. But with more refactoring they can be removed in the future.

Review Tips

This PR seems very large, but most of the changes are in the test code. Here are the files that contain logic/interface change

Benchmark

Run nexmark Q5 for 20 mins, this pr (above) vs main (below). Note that in main there's a peak in the beginning, so the "max" value tend to be larger". In a nutshell, although ingest_batch is a lot quicker, there's no significant improvement on read/write throughput of state store. Memory consumption also stays the same, so I'm guessing shared buffer does not take up much memory in the first place. Compaction has become a bit slower, because iterator's FullKey is not very friendly with compaction, which manipulates encoded keys.

  • Hummock write throughput

image
image

  • Hummock read throughput

image
image

  • Ingest batch

image
image

  • Compaction
    image
    image

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • All checks passed in ./risedev check (or alias, ./risedev c)

Refer to a related PR or issue link (optional)

@Gun9niR Gun9niR marked this pull request as draft October 31, 2022 16:27
@codecov
Copy link

codecov bot commented Nov 1, 2022

Codecov Report

Merging #6130 (c733124) into main (daf3ea3) will increase coverage by 0.06%.
The diff coverage is 89.97%.

@@            Coverage Diff             @@
##             main    #6130      +/-   ##
==========================================
+ Coverage   74.34%   74.40%   +0.06%     
==========================================
  Files         953      952       -1     
  Lines      154348   154996     +648     
==========================================
+ Hits       114754   115331     +577     
- Misses      39594    39665      +71     
Flag Coverage Δ
rust 74.40% <89.97%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/src/types/mod.rs 66.39% <ø> (+0.35%) ⬆️
...c/connector/src/source/nexmark/source/generator.rs 0.00% <0.00%> (ø)
src/ctl/src/cmd_impl/hummock/list_kv.rs 0.00% <0.00%> (ø)
src/ctl/src/cmd_impl/hummock/sst_dump.rs 0.00% <0.00%> (ø)
src/rpc_client/src/meta_client.rs 0.00% <0.00%> (ø)
src/storage/hummock_sdk/src/lib.rs 70.73% <ø> (ø)
src/storage/hummock_sdk/src/prost_key_range.rs 100.00% <ø> (ø)
src/storage/hummock_test/src/test_utils.rs 94.88% <ø> (-0.14%) ⬇️
src/storage/src/hummock/validator.rs 0.00% <0.00%> (ø)
src/storage/src/monitor/monitored_store.rs 5.84% <0.00%> (ø)
... and 88 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Gun9niR
Copy link
Contributor Author

Gun9niR commented Nov 10, 2022

@Little-Wallace After this PR, the items in SharedBufferBatch will not contain table id or epoch. So this may have minor conflict with range delete.

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM. Thanks for so much work!!!

@Gun9niR Gun9niR requested a review from wenym1 November 11, 2022 04:47
Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants