refactor(storage): change `FullKey` from concatenated `&[u8]` to struct #6130
Codecov Report
@@ Coverage Diff @@
## main #6130 +/- ##
==========================================
+ Coverage 74.34% 74.40% +0.06%
==========================================
Files 953 952 -1
Lines 154348 154996 +648
==========================================
+ Hits 114754 115331 +577
- Misses 39594 39665 +71
@Little-Wallace After this PR, the items in |
Rest LGTM. Thanks for so much work!!!
Rest LGTM!
I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.
What's changed and what's your intention?
This PR introduces another concept, `table_key`, as part of the `FullKey`. It is the storage input key from the state table (without the table id prefix), and it removes the assumption that `StateStore` must be accessed from `KeySpace` to ensure the input `key` always starts with `table_id`. To make the structure of `FullKey` clear and avoid the frequent splitting and assembly of the `&[u8]` slice, `FullKey` is now a struct, with its `table_id`, `table_key` and `epoch` easily accessible as fields.

More specifically, this PR contains the following changes:
- … `FullKey`'s definition.
- … `SharedBufferBatch`. In other words, `SharedBufferBatch` now only stores `table_key`.
- … `HummockIterator` to use `FullKey` as the `seek` parameter and return `FullKey` from `key()`. What's different is that `UserIterator::key` previously returned the encoded `UserKey`, but now it also returns its epoch.
- … `Keyspace` is removed, but `Keyspace` is not removed to limit the size of this PR. (Hopefully it will be removed in the next PR.)
- `key_with_epoch` and `user_key` are retained because sometimes we still need to manipulate encoded keys with these functions, mainly in tests and when processing SST meta. With more refactoring they can be removed in the future.

Review Tips
This PR seems very large, but most of the changes are in the test code. Here are the files that contain logic/interface changes:
- `FullKey` and `UserKey` interface
- … `FullKey` as input
- `HummockIterator` interface change
- `LocalVersion` uses table key to do read filtering
- `SharedBuffer` uses table key as filter
- `SharedBufferBatch`
- `SstableBuilder` and `CapacitySplitTableBuilder` use `FullKey` as input
- `MemoryStateStore` impl
- `UserIterator` returns `FullKey` instead of `UserKey`, and `StripPrefixIterator` now becomes `ExtractTableKeyIterator`, which not only strips the table id prefix as usual but also strips the epoch postfix
- `KeySpaceWriteBatch`
Benchmark
Run nexmark Q5 for 20 mins, this PR (above) vs main (below). Note that in main there's a peak at the beginning, so the "max" value tends to be larger. In a nutshell, although `ingest_batch` is a lot quicker, there's no significant improvement in the read/write throughput of the state store. Memory consumption also stays the same, so I'm guessing the shared buffer does not take up much memory in the first place. Compaction has become a bit slower, because the iterator's `FullKey` is not very friendly to compaction, which manipulates encoded keys.

Checklist
- `./risedev check` (or alias, `./risedev c`)

Refer to a related PR or issue link (optional)