fix: associate notification version with epoch #5446
Comments
Strong +1. Btw, do we have a case where the notification version increases without the epoch increasing? In other words, can we use the epoch as the notification version?
Yes, for those changes not relevant to materialized views, like …
I think we must find a way to update and read a snapshot of the combination of the Hummock snapshot (epoch) and the vnode mapping atomically. Currently they're totally independent, in both the frontend and the meta's barrier committing, and any case of mismatch will lead to problems.
So I guess some major refactoring might be necessary. 🤔 cc @hzxa21 @yezizp2012 @st1page
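For illustration, a minimal sketch of what updating and reading such a combined snapshot atomically could look like. All types and names here are hypothetical stand-ins, not RisingWave's actual definitions:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for illustration only.
type Epoch = u64;
type VnodeMapping = Arc<Vec<u32>>; // vnode index -> parallel unit id

/// Keep the Hummock epoch and the vnode mapping behind a single lock, so
/// readers always observe a consistent pair rather than a half-updated state.
struct ClusterSnapshot {
    epoch: Epoch,
    mapping: VnodeMapping,
}

struct SnapshotManager {
    inner: RwLock<ClusterSnapshot>,
}

impl SnapshotManager {
    /// Commit both fields in one critical section (e.g. on barrier commit).
    fn update(&self, epoch: Epoch, mapping: VnodeMapping) {
        let mut guard = self.inner.write().unwrap();
        guard.epoch = epoch;
        guard.mapping = mapping;
    }

    /// Acquire an atomic (epoch, mapping) pair for scheduling a batch query.
    fn acquire(&self) -> (Epoch, VnodeMapping) {
        let guard = self.inner.read().unwrap();
        (guard.epoch, guard.mapping.clone())
    }
}

fn main() {
    let mgr = SnapshotManager {
        inner: RwLock::new(ClusterSnapshot { epoch: 0, mapping: Arc::new(vec![]) }),
    };
    mgr.update(42, Arc::new(vec![0, 1, 1, 0]));
    let (epoch, mapping) = mgr.acquire();
    println!("epoch {epoch}, mapping {:?}", mapping);
}
```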
Is this closed by #5999?
…w fragment_mapping (#7042)

This PR addresses #5446 by ensuring the frontend is always notified of a new snapshot after the corresponding fragment_mapping. It also includes a minor refactor that extracts `collect_synced_ssts` from `complete_barrier` to make the latter cleaner. This PR doesn't (and need not) affect the relative order between snapshot and catalog notifications:
- [snapshot is notified first](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L453)
- [catalog is notified later](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L456)

Approved-By: hzxa21
Approved-By: BugenZhao
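A sketch of the ordering invariant that PR establishes; the struct and method names below are illustrative stubs rather than the actual meta-service code:

```rust
/// Illustrative stubs only; the real logic lives in the meta node's barrier
/// manager and notification manager.
struct BarrierManager;

impl BarrierManager {
    // Extracted from `complete_barrier` in the PR to keep the latter cleaner.
    fn collect_synced_ssts(&self, _epoch: u64) -> Vec<u64> {
        Vec::new() // stub
    }

    fn notify_fragment_mapping(&self) {
        println!("-> frontends: fragment mapping updated");
    }

    fn notify_snapshot(&self, epoch: u64) {
        println!("-> frontends: snapshot advanced to epoch {epoch}");
    }

    /// Invariant: the mapping notification happens before the snapshot
    /// notification, so any frontend that can pin `epoch` has already seen
    /// the mapping that produced the data visible at `epoch`.
    fn complete_barrier(&self, epoch: u64) {
        let _synced_ssts = self.collect_synced_ssts(epoch);
        self.notify_fragment_mapping();
        self.notify_snapshot(epoch);
    }
}

fn main() {
    BarrierManager.complete_barrier(100);
}
```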
Issue description

The notification mechanism is completely asynchronous for now, as it was initially introduced for catalog updates. However, we've found some cases where we must build an association between the notification version and the system epoch.
Batch query when scaling
We prefer to schedule the scan tasks to the corresponding worker (parallel unit) according to the vnode mapping of the `Materialize` executors. After we support barrier and checkpoint decoupling (#4966), this assumption must still hold; the in-memory state store e2e tests (#5322) also rely on it.

It's possible that a batch query is scheduled while scaling is in progress, where we get an epoch from after the scaling while the vnode mapping from the notification manager has not been updated yet. As uncheckpointed data is invisible to other compute nodes, we will fail to scan the data from some of the migrated/rebalanced partitions.
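As a toy reproduction of this race (all ids and numbers below are made up): scaling moves vnode 7 from worker 1 to worker 2, and the frontend pins a post-scaling epoch while still holding the pre-scaling mapping:

```rust
use std::collections::HashMap;

fn main() {
    // Mapping committed by the meta node together with the new epoch:
    // vnode 7 was moved to worker 2 during rescheduling.
    let mapping_at_meta = HashMap::from([(7u32, 2u32)]);

    // Stale mapping cached on the frontend; the notification carrying the
    // update is still in flight.
    let mapping_at_frontend = HashMap::from([(7u32, 1u32)]);

    let pinned_epoch = 100u64; // an epoch from *after* the scaling
    let target_worker = mapping_at_frontend[&7];

    if target_worker != mapping_at_meta[&7] {
        println!(
            "epoch {pinned_epoch}: scan for vnode 7 routed to worker \
             {target_worker}, but its uncheckpointed data lives only on \
             worker {} -- rows are silently missed",
            mapping_at_meta[&7]
        );
    }
}
```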
Batch query when dropping
Similar to the above, we may still scan a dropped table whose data is being cleaned up by the storage, because the catalog deletion propagates to the frontend asynchronously. This may cause wrong results or even break some assumptions in the batch executors.
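The drop case can be sketched the same way (the table id and flag below are made up): the frontend's catalog cache lags behind the deletion, so the planner still sees the table while storage reclaims its data:

```rust
use std::collections::HashSet;

fn main() {
    // Frontend catalog cache: table 1001 is still present because the
    // deletion notification hasn't arrived yet.
    let frontend_tables: HashSet<u32> = HashSet::from([1001]);

    // Meta/storage side: the table was dropped and cleanup has started.
    let dropped_and_vacuuming = true;

    if frontend_tables.contains(&1001) && dropped_and_vacuuming {
        println!(
            "table 1001 is still visible to the planner while its data is \
             being reclaimed: scans may return wrong or partial results"
        );
    }
}
```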
Solution
As the epoch maintains sequential consistency across DDLs and configuration changes, we can associate a "minimal notification version" with each epoch. After the frontend pins an epoch for a batch task, it must wait until its local notification version has caught up to that version before scheduling.
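A minimal sketch of this proposal, with made-up names like `min_notification_version`: the frontend's observer tracks the latest notification version it has applied, and batch scheduling blocks until that catches up with the version associated with the pinned epoch:

```rust
use std::sync::{Condvar, Mutex};

/// Tracks the latest notification version the frontend observer has applied.
struct ObservedVersion {
    current: Mutex<u64>,
    advanced: Condvar,
}

impl ObservedVersion {
    fn new() -> Self {
        Self { current: Mutex::new(0), advanced: Condvar::new() }
    }

    /// Called by the notification observer on every applied update.
    fn advance_to(&self, version: u64) {
        let mut cur = self.current.lock().unwrap();
        if version > *cur {
            *cur = version;
            self.advanced.notify_all();
        }
    }

    /// Block until the observer has caught up to `min_version`.
    fn wait_for(&self, min_version: u64) {
        let mut cur = self.current.lock().unwrap();
        while *cur < min_version {
            cur = self.advanced.wait(cur).unwrap();
        }
    }
}

struct PinnedSnapshot {
    epoch: u64,
    /// Attached by the meta node when committing the epoch: every catalog or
    /// vnode-mapping notification ordered before this epoch has a version
    /// no greater than this value.
    min_notification_version: u64,
}

fn schedule_batch_query(snapshot: &PinnedSnapshot, observer: &ObservedVersion) {
    // Ensure local catalog / vnode-mapping state is at least as new as the
    // pinned epoch before dispatching any scan task.
    observer.wait_for(snapshot.min_notification_version);
    println!("scheduling batch tasks at epoch {}", snapshot.epoch);
}

fn main() {
    let observer = ObservedVersion::new();
    observer.advance_to(7);
    let snapshot = PinnedSnapshot { epoch: 100, min_notification_version: 5 };
    schedule_batch_query(&snapshot, &observer);
}
```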
cc @liurenjie1024 @yezizp2012 @zwang28 @st1page 🤩