fix & refactor(table): use dist_key_in_pk instead of distribution_key #8377

Closed
1 of 2 tasks
st1page opened this issue Mar 6, 2023 · 7 comments

Contributor

st1page commented Mar 6, 2023

In our system, a table's distribution key columns are always part of its primary key. What we care about more is the distribution key expressed in terms of the primary key, or a prefix of it, for partition pruning.

https://github.com/singularity-data/risingwave/blob/main/src/stream/src/common/table/state_table.rs
https://github.com/singularity-data/risingwave/blob/main/src/storage/src/table/batch_table/storage_table.rs

Also, the distribution key cannot express the actual indices in the read prefix, especially when the primary key contains duplicate columns. #7698
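To illustrate the difference, here is a minimal sketch, not the actual RisingWave code. It assumes that both the distribution key and the primary key are stored as lists of column indices into the table; the function names `dist_key_in_pk` and `positions_in_pk` are hypothetical helpers written for this example.

```rust
/// Hypothetical helper: map each distribution-key column index to the
/// first position at which that column occurs in the primary key.
/// Returns `None` if any distribution-key column is missing from the pk.
fn dist_key_in_pk(dist_key: &[usize], pk: &[usize]) -> Option<Vec<usize>> {
    dist_key
        .iter()
        .map(|d| pk.iter().position(|p| p == d))
        .collect()
}

/// Hypothetical helper: every position in the primary key at which the
/// given column index occurs. More than one entry means the column is
/// duplicated in the pk.
fn positions_in_pk(col: usize, pk: &[usize]) -> Vec<usize> {
    pk.iter()
        .enumerate()
        .filter(|(_, p)| **p == col)
        .map(|(i, _)| i)
        .collect()
}
```

With a primary key of column indices `[1, 0]` and a distribution key `[0]`, `dist_key_in_pk` gives `Some(vec![1])`: the distribution key is the second pk column, so it is not covered by a one-column read prefix. With a duplicated primary key `[0, 1, 0]`, `positions_in_pk(0, ..)` gives `vec![0, 2]`, so the column index alone cannot say which pk position is meant, while a position within the pk can.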

github-actions bot added this to the release-0.1.18 milestone Mar 6, 2023
Contributor Author

st1page commented Mar 6, 2023

cc @yuhao-su @BugenZhao @wcy-fdu

Contributor

yuhao-su commented Mar 6, 2023

LGTM

Contributor

lmatz commented Mar 6, 2023

Off topic: it seems the concept of distribution key never appears in the docs. Is it worth mentioning?

Contributor Author

st1page commented Mar 7, 2023

> Off topic: it seems the concept of distribution key never appears in the docs. Is it worth mentioning?

I think not.

Currently, the distribution key seems to be an internal concept. Users cannot specify the distribution key for any table, mview, or internal table; they are all generated by the optimizer. Users can use `create index` to influence the final distribution key, but that is just the physical implementation behind a logical concept such as Index, in other words "create a table which is optimized by the key xxx".

Contributor

yuhao-su commented Mar 7, 2023

> Currently, the distribution key seems to be an internal concept

Although it is an internal concept, the performance users see might be affected by the dist key when its cardinality is extremely low (e.g. the dist key is gender). Users may want to know the cause?
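A rough sketch of why low cardinality hurts, under stated assumptions: rows are assigned to a fixed number of virtual nodes by hashing the dist key, `DefaultHasher` stands in for whatever hash the system really uses, and 256 is only an assumed vnode count. `vnode_of` and `occupied_vnodes` are hypothetical names for this example.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for the real vnode mapping: hash the dist key
/// and take the result modulo the vnode count.
fn vnode_of<T: Hash>(key: &T, vnode_count: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % vnode_count
}

/// Number of distinct vnodes that receive at least one row.
fn occupied_vnodes<T: Hash>(keys: &[T], vnode_count: u64) -> usize {
    keys.iter()
        .map(|k| vnode_of(k, vnode_count))
        .collect::<HashSet<_>>()
        .len()
}
```

However many rows there are, a dist key with two distinct values can occupy at most two vnodes, so at most two parallel units do all the work. A key with many distinct values spreads rows over far more vnodes.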

Contributor

lmatz commented Mar 7, 2023

> Users cannot specify the distribution key

I thought the docs maintained on GitHub were intended to be developer-facing 🤔

Contributor Author

st1page commented Mar 7, 2023

> I thought the docs maintained on GitHub were intended to be developer-facing 🤔

Oh, sorry, I get it now.

st1page changed the title from "chore(table): use dist_key_in_pk instead of distribution_key" to "fix & refactor(table): use dist_key_in_pk instead of distribution_key" Mar 7, 2023
fuyufjh modified the milestones: release-0.18, release-0.19 Mar 22, 2023
yuhao-su modified the milestones: release-0.19, release-0.20 May 19, 2023
fuyufjh modified the milestones: release-1.2, release-1.3 Sep 11, 2023
st1page removed this from the release-1.3 milestone Oct 10, 2023
st1page closed this as completed Oct 10, 2023