Discussion: Should the default data_directory: hummock001 be removed? #9153

Closed
huangjw806 opened this issue Apr 13, 2023 · 6 comments · Fixed by #9170

@huangjw806
Contributor

Yesterday the cloud team found an issue where a RisingWave data directory got overwritten.

The kernel changed its configuration, but the cloud side was not updated in sync and did not provide data_directory. The kernel then fell back to hummock001 as the default value, which caused data to be overwritten.

This parameter is too important. Should the default value be removed to avoid this risk?
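
To illustrate the failure mode described above, here is a minimal sketch, not the actual kernel code. `StorageOpts`, `resolve_with_default`, and `resolve_required` are hypothetical names chosen for illustration; only `data_directory` and `hummock001` come from this issue. With a silent default, two deployments that both omit the option resolve to the same directory; with a required value, startup fails instead.

```rust
/// Hypothetical stand-in for the storage options discussed in this issue.
#[derive(Debug, Default)]
pub struct StorageOpts {
    /// `None` means the deployment did not set `data_directory`.
    pub data_directory: Option<String>,
}

/// Behaviour described in the issue: silently fall back to `hummock001`.
pub fn resolve_with_default(opts: &StorageOpts) -> String {
    opts.data_directory
        .clone()
        .unwrap_or_else(|| "hummock001".to_string())
}

/// Proposed behaviour (assumed shape): refuse a missing or blank value.
pub fn resolve_required(opts: &StorageOpts) -> Result<String, String> {
    match opts.data_directory.as_deref() {
        Some(dir) if !dir.trim().is_empty() => Ok(dir.to_string()),
        _ => Err("data_directory must be set explicitly".to_string()),
    }
}
```

Under the first function, every tenant that forgets the option writes to the same prefix in a shared bucket. Under the second, the misconfiguration surfaces as an error before any data is touched.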

github-actions bot added this to the release-0.19 milestone Apr 13, 2023
@fuyufjh
Member

fuyufjh commented Apr 13, 2023

Agree.

By the way, it sounds like multiple tenants share the same S3 bucket, right?

@huangjw806
Contributor Author

> multiple tenants share the same S3 bucket, right?

Yes, now the cloud is designed like this.

@fuyufjh
Member

fuyufjh commented Apr 13, 2023

> > multiple tenants share the same S3 bucket, right?
>
> Yes, now the cloud is designed like this.

Not sure whether this is a good design, especially considering:

  1. Resource isolation. The performance of different tenants should not interfere with each other.
  2. Permission control. We are considering letting users access the bucket directly in some cases, for example to run analytic queries via Spark/Presto. In that case, permission control over a shared bucket seems painful.
  3. Blast radius of extreme failures. For example, if a software bug or human error accidentally deletes this bucket, all tenants are affected.

Anyway, this is another topic. We can fix this issue first and discuss this later.

@huangjw806
Contributor Author

The bucket quota limit is probably why the cloud adopts this design.

@wjf3121

wjf3121 commented Apr 13, 2023

Regarding sharing the S3 bucket, there's a limit of 1000 buckets: https://docs.aws.amazon.com/AmazonS3/latest/userguide/BucketRestrictions.html

@Gun9niR
Contributor

Gun9niR commented Apr 13, 2023

We should reject an uninitialized state_store as well.
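
A rough sketch of what such a check could look like, assuming `state_store` arrives as an optional string. `validate_state_store` is a hypothetical helper, not an existing function in the codebase, and the URL in the usage note is only an illustrative value.

```rust
/// Hypothetical startup check: reject a missing or blank `state_store`
/// instead of continuing with an uninitialized value.
pub fn validate_state_store(state_store: Option<&str>) -> Result<&str, String> {
    match state_store.map(str::trim) {
        Some(url) if !url.is_empty() => Ok(url),
        _ => Err("state_store must be set explicitly".to_string()),
    }
}
```

For example, `validate_state_store(None)` returns an error, while `validate_state_store(Some("hummock+s3://bucket"))` returns the trimmed value unchanged.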
