Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug(storage): StateStore::iter() interface doesn't filter rows correctly #6104

Closed
StrikeW opened this issue Oct 29, 2022 · 11 comments
Closed
Assignees
Labels
component/storage Storage good first issue Good for newcomers type/bug Something isn't working

Comments

@StrikeW
Copy link
Contributor

StrikeW commented Oct 29, 2022

Describe the bug

Although we specify the table_id field in ReadOptions, it may still read kv pairs belonging to other tables. Because the table_id isn't used as a prefix to filter the kv pairs in a SST file.

    /// Opens and returns an iterator for a given `key_range`.
    /// The returned iterator will iterate data based on a snapshot corresponding to
    /// the given `epoch`.
    fn iter(
        &self,
        key_range: (Bound<Vec<u8>>, Bound<Vec<u8>>),
        epoch: u64,
        read_options: ReadOptions,
    ) -> Self::IterFuture<'_>;

To Reproduce

If the SST file contains kv pairs of multiple tables, this iterator will scan all kvs in the file.

hummock.iter(
                None,
                (Bound::Unbounded, Bound::Unbounded),
                ReadOptions {
                    epoch,
                    table_id: TableId { table_id },
                    retention_seconds: None,
                },
            )
            .await?

Expected behavior

Since we have set the table_id field in ReadOptions, the iterator should only scan kv pairs of the specified table.

Additional context

No response

@jon-chuang
Copy link
Contributor

Hi, I'd like to work on this issue.

@jon-chuang jon-chuang self-assigned this Nov 1, 2022
@jon-chuang
Copy link
Contributor

jon-chuang commented Nov 1, 2022

Hi could I ask if we still need to encode table_id as a prefix even if table_id == 0? What does table_id == 0 correspond to?

Just seems that table_id == 0 is handled differently to other table_id.

Edit: Seems like that is reserved for unit tests? Seems the main function is not to have to include a table_id prefix in the keys so that unit tests are more readable.

@jon-chuang
Copy link
Contributor

jon-chuang commented Nov 1, 2022

Anw, seems like this issue is mostly an interface issue. Its not ambiguous for a get that you need to specify the table_id, as if you do not prefix the key correctly, you will not fetch the right data. But an unbounded range iter may give you a confusing result as it is ambiguous if the ReadOptions table_id acts as a filter or not. In our case, since we already filter irrelevant SSTs, it makes sense that it wil filter irrelevant keys as well.

So we are just doing some defensive coding by fixing this.

I will add some stronger guarantees about ReadOptions in the interface.

@StrikeW
Copy link
Contributor Author

StrikeW commented Nov 1, 2022

Anw, seems like this issue is mostly an interface issue. Its not ambiguous for a get that you need to specify the table_id, as if you do not prefix the key correctly, you will not fetch the right data. But an unbounded range iter may give you a confusing result as it is ambiguous if the ReadOptions table_id acts as a filter or not. In our case, since we already filter irrelevant SSTs, it makes sense that it wil filter irrelevant keys as well.

Yes. The problem is the semantic of the interface. And since our executors in the compute node always read with a Keyspace which will prepend the correct prefix (and the unit tests in storage don't have good coverage), so this problem doesn't reveal. But if we use the interface alone, the semantic is ambiguous.

let range = prefixed_range(range, &self.prefix);

@jon-chuang
Copy link
Contributor

jon-chuang commented Nov 1, 2022

Actually, I don't think the right behaviour should be to add the prefix to the range.

If the bound is Included/Excluded, these need to already be prefixed by the table_id.

We cannot accept Unbounded bound, this should have been converted to e.g. Included([table_id]).

Hence, I suggest instead to throw an error on Unbounded bound.

We can also do a sanity check that the table_id will serialize to the first 4 bytes of included/excluded bounds.

Not sure if you think we should put these checks only in debug mode as debug_assertions

@StrikeW
Copy link
Contributor Author

StrikeW commented Nov 1, 2022

Just prepending ReadOptions::table_id to the key_range may cause other problems if we still rely on Keyspace.
I agree that we can check for whether the first 4bytes of key_range equal to the table_id in the ReadOptions (only in debug mode is ok I think), then the problem will show up early for the case of unbounded scan mentioned in the issue description.

@wenym1
Copy link
Contributor

wenym1 commented Nov 1, 2022

#6130 is working on a full key refactor proposed in #5960. After this refactor, the key passed in will not be prefixed with table id, and all read will be enforced to be performed under the specified table id.

#6005 has proposed to remove all the table_id == 0 hack. Maybe we can work on resolving this issue.

@jon-chuang
Copy link
Contributor

Ok makes sense. Should we close this issue?

@StrikeW
Copy link
Contributor Author

StrikeW commented Nov 1, 2022

I think this issue can be closed only after #6130 has finished and add more unit tests. cc @Gun9niR

@Gun9niR
Copy link
Contributor

Gun9niR commented Nov 1, 2022

I will add tests related to multiple tables.

@Gun9niR
Copy link
Contributor

Gun9niR commented Nov 16, 2022

#6130 has removed prefix requirement for the state store, and added ut for multiple tables. I think #6130 closes this issue. #6005 will resolve the table_id == 0 hack.

@Gun9niR Gun9niR closed this as completed Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/storage Storage good first issue Good for newcomers type/bug Something isn't working
Projects
None yet
4 participants