fix: associate notification version with epoch #5446
Comments
Strong +1. Btw, do we have a case where the notification version increases without the epoch increasing? In other words, can we use the epoch as the notification version?
Yes, for those changes not relevant to materialized views, like …
I think we must find a way to update and read a snapshot of the combination of the Hummock snapshot (epoch) and the vnode mapping atomically. Currently they're totally independent, in both the frontend and the meta's barrier committing, and any case of mismatch will lead to problems.
So I guess some major refactoring might be necessary. 🤔 cc @hzxa21 @yezizp2012 @st1page
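For illustration, a minimal sketch of what updating and reading such a combined snapshot atomically could look like. All types and names here are hypothetical stand-ins, not RisingWave's actual definitions:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-ins for illustration only.
type Epoch = u64;
type VnodeMapping = Arc<Vec<u32>>; // vnode index -> parallel unit id

/// Keep the Hummock epoch and the vnode mapping behind a single lock, so
/// readers always observe a consistent pair rather than a half-updated state.
struct ClusterSnapshot {
    epoch: Epoch,
    mapping: VnodeMapping,
}

struct SnapshotManager {
    inner: RwLock<ClusterSnapshot>,
}

impl SnapshotManager {
    /// Commit both fields in one critical section (e.g. on barrier commit).
    fn update(&self, epoch: Epoch, mapping: VnodeMapping) {
        let mut guard = self.inner.write().unwrap();
        guard.epoch = epoch;
        guard.mapping = mapping;
    }

    /// Acquire an atomic (epoch, mapping) pair for scheduling a batch query.
    fn acquire(&self) -> (Epoch, VnodeMapping) {
        let guard = self.inner.read().unwrap();
        (guard.epoch, guard.mapping.clone())
    }
}

fn main() {
    let mgr = SnapshotManager {
        inner: RwLock::new(ClusterSnapshot { epoch: 0, mapping: Arc::new(vec![]) }),
    };
    mgr.update(42, Arc::new(vec![0, 1, 1, 0]));
    let (epoch, mapping) = mgr.acquire();
    println!("epoch {epoch}, mapping {:?}", mapping);
}
```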
Is this closed by #5999?
…w fragment_mapping (#7042)

This PR addresses #5446 by ensuring the frontend is always notified of a new snapshot after the corresponding fragment_mapping. It also includes a minor refactor that extracts `collect_synced_ssts` from `complete_barrier` to make the latter cleaner. This PR doesn't (and need not) affect the relative order between snapshot and catalog notifications:
- [snapshot is notified first](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L453)
- [catalog is notified later](https://github.com/risingwavelabs/risingwave/blob/14d88919e29b11fe27ff3a0c6d921dacf66c157c/src/meta/src/rpc/service/ddl_service.rs#L456)

Approved-By: hzxa21
Approved-By: BugenZhao
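A sketch of the ordering invariant that PR establishes; the struct and method names below are illustrative stubs rather than the actual meta-service code:

```rust
/// Illustrative stubs only; the real logic lives in the meta node's barrier
/// manager and notification manager.
struct BarrierManager;

impl BarrierManager {
    // Extracted from `complete_barrier` in the PR to keep the latter cleaner.
    fn collect_synced_ssts(&self, _epoch: u64) -> Vec<u64> {
        Vec::new() // stub
    }

    fn notify_fragment_mapping(&self) {
        println!("-> frontends: fragment mapping updated");
    }

    fn notify_snapshot(&self, epoch: u64) {
        println!("-> frontends: snapshot advanced to epoch {epoch}");
    }

    /// Invariant: the mapping notification happens before the snapshot
    /// notification, so any frontend that can pin `epoch` has already seen
    /// the mapping that produced the data visible at `epoch`.
    fn complete_barrier(&self, epoch: u64) {
        let _synced_ssts = self.collect_synced_ssts(epoch);
        self.notify_fragment_mapping();
        self.notify_snapshot(epoch);
    }
}

fn main() {
    BarrierManager.complete_barrier(100);
}
```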
Issue description

The notification mechanism is completely asynchronous for now, as it was initially introduced for catalog updates. However, we've found some cases where we must build an association between the notification version and the system epoch.
Batch query when scaling
We prefer to schedule the scan tasks to the corresponding worker (parallel unit) according to the vnode mapping of the `Materialize` executors. After we support barrier and checkpoint decoupling (#4966), this assumption must still hold; the in-memory state store e2e tests (#5322) also rely on it.

It's possible that a batch query is scheduled while scaling is in progress, where we get an epoch from after the scaling while the vnode mapping from the notification manager has not been updated yet. As uncheckpointed data is invisible to other compute nodes, we will fail to scan the data from some of the migrated/rebalanced partitions.
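As a toy reproduction of this race (all ids and numbers below are made up): scaling moves vnode 7 from worker 1 to worker 2, and the frontend pins a post-scaling epoch while still holding the pre-scaling mapping:

```rust
use std::collections::HashMap;

fn main() {
    // Mapping committed by the meta node together with the new epoch:
    // vnode 7 was moved to worker 2 during rescheduling.
    let mapping_at_meta = HashMap::from([(7u32, 2u32)]);

    // Stale mapping cached on the frontend; the notification carrying the
    // update is still in flight.
    let mapping_at_frontend = HashMap::from([(7u32, 1u32)]);

    let pinned_epoch = 100u64; // an epoch from *after* the scaling
    let target_worker = mapping_at_frontend[&7];

    if target_worker != mapping_at_meta[&7] {
        println!(
            "epoch {pinned_epoch}: scan for vnode 7 routed to worker \
             {target_worker}, but its uncheckpointed data lives only on \
             worker {} -- rows are silently missed",
            mapping_at_meta[&7]
        );
    }
}
```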
Batch query when dropping
Similar to the above, we may still scan a dropped table whose data is being cleaned up by the storage, because the catalog deletion propagates to the frontend asynchronously. This may cause wrong results or even break some assumptions in the batch executors.
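The drop case can be sketched the same way (the table id and flag below are made up): the frontend's catalog cache lags behind the deletion, so the planner still sees the table while storage reclaims its data:

```rust
use std::collections::HashSet;

fn main() {
    // Frontend catalog cache: table 1001 is still present because the
    // deletion notification hasn't arrived yet.
    let frontend_tables: HashSet<u32> = HashSet::from([1001]);

    // Meta/storage side: the table was dropped and cleanup has started.
    let dropped_and_vacuuming = true;

    if frontend_tables.contains(&1001) && dropped_and_vacuuming {
        println!(
            "table 1001 is still visible to the planner while its data is \
             being reclaimed: scans may return wrong or partial results"
        );
    }
}
```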
Solution
As the epoch maintains sequential consistency across DDLs and configuration changes, we can associate a "minimal notification version" with each epoch. After the frontend pins an epoch for a batch task, it must wait until its local notification version has caught up to that version before scheduling.
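A minimal sketch of this proposal, with made-up names like `min_notification_version`: the frontend's observer tracks the latest notification version it has applied, and batch scheduling blocks until that catches up with the version associated with the pinned epoch:

```rust
use std::sync::{Condvar, Mutex};

/// Tracks the latest notification version the frontend observer has applied.
struct ObservedVersion {
    current: Mutex<u64>,
    advanced: Condvar,
}

impl ObservedVersion {
    fn new() -> Self {
        Self { current: Mutex::new(0), advanced: Condvar::new() }
    }

    /// Called by the notification observer on every applied update.
    fn advance_to(&self, version: u64) {
        let mut cur = self.current.lock().unwrap();
        if version > *cur {
            *cur = version;
            self.advanced.notify_all();
        }
    }

    /// Block until the observer has caught up to `min_version`.
    fn wait_for(&self, min_version: u64) {
        let mut cur = self.current.lock().unwrap();
        while *cur < min_version {
            cur = self.advanced.wait(cur).unwrap();
        }
    }
}

struct PinnedSnapshot {
    epoch: u64,
    /// Attached by the meta node when committing the epoch: every catalog or
    /// vnode-mapping notification ordered before this epoch has a version
    /// no greater than this value.
    min_notification_version: u64,
}

fn schedule_batch_query(snapshot: &PinnedSnapshot, observer: &ObservedVersion) {
    // Ensure local catalog / vnode-mapping state is at least as new as the
    // pinned epoch before dispatching any scan task.
    observer.wait_for(snapshot.min_notification_version);
    println!("scheduling batch tasks at epoch {}", snapshot.epoch);
}

fn main() {
    let observer = ObservedVersion::new();
    observer.advance_to(7);
    let snapshot = PinnedSnapshot { epoch: 100, min_notification_version: 5 };
    schedule_batch_query(&snapshot, &observer);
}
```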
cc @liurenjie1024 @yezizp2012 @zwang28 @st1page 🤩