Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(storage): update seal_epoch synchronously #7171

Merged
merged 3 commits into from
Jan 3, 2023
Merged

Conversation

zwang28
Copy link
Contributor

@zwang28 zwang28 commented Jan 3, 2023

I hereby agree to the terms of the Singularity Data, Inc. Contributor License Agreement.

What's changed and what's your intention?

Before this PR, seal_epoch updating is handled by HummockEventHandler asynchronously, which cannot ensure HummockEventHandler::seal_epoch is updated before notifying barrier finishing . It can lead to current epoch can't read, because the epoch in storage is not updated.

This PR fixes it by updating HummockStorage::seal_epoch synchronously, which shares the same underlying atomic var of HummockEventHandler::seal_epoch.

Checklist

  • I have written necessary rustdoc comments
    - [ ] I have added necessary unit tests and integration tests
    - [ ] All checks passed in ./risedev check (or alias, ./risedev c)

Documentation

If your pull request contains user-facing changes, please specify the types of the changes, and create a release note. Otherwise, please feel free to remove this section.

Types of user-facing changes

Please keep the types that apply to your changes, and remove those that do not apply.

  • Installation and deployment
  • Connector (sources & sinks)
  • SQL commands, functions, and operators
  • RisingWave cluster configuration changes
  • Other (please specify in the release note below)

Release note

Please create a release note for your changes. In the release note, focus on the impact on users, and mention the environment or conditions where the impact may occur.

Refer to a related PR or issue link (optional)

fix #7169

@github-actions github-actions bot added the type/fix Bug fix label Jan 3, 2023
@zwang28 zwang28 force-pushed the wangzheng/fix_seal_epoch branch from 884bb55 to db94868 Compare January 3, 2023 10:15
@zwang28 zwang28 requested review from wenym1 and hzxa21 January 3, 2023 10:15
@@ -268,6 +268,9 @@ impl StateStore for HummockStorage {
warn!("sealing invalid epoch");
return;
}
// Update `seal_epoch` synchronously,
// as `HummockEvent::SealEpoch` is handled asynchronously.
self.seal_epoch.store(epoch, MemOrdering::SeqCst);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, the seal epoch should be updated in somewhere else. Should we remove updating the seal epoch there?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, the seal epoch should be updated in somewhere else. Should we remove updating the seal epoch there?

+1. I think we don't need to update seal epoch when handling HummockEvent::SealEpoch in HummockEventHandler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@zwang28 zwang28 requested a review from wenym1 January 3, 2023 10:25
@wenym1
Copy link
Contributor

wenym1 commented Jan 3, 2023

If this PR is intended to fix the panic caused by try_wait_epoch on uncommitted epoch, I would suggest we modify the logic of try_wait_epoch to do a really wait for the sealed_epoch to bump up asynchronously. In the future there may not be such global seal_epoch method, but instead seal each local state store, and then we may need to wait on the sealed_epoch of each local state store instance to bump up.

@codecov
Copy link

codecov bot commented Jan 3, 2023

Codecov Report

Merging #7171 (d284a86) into main (0e90e6b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #7171   +/-   ##
=======================================
  Coverage   73.15%   73.15%           
=======================================
  Files        1053     1053           
  Lines      167994   167996    +2     
=======================================
+ Hits       122891   122893    +2     
  Misses      45103    45103           
Flag Coverage Δ
rust 73.15% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...src/hummock/event_handler/hummock_event_handler.rs 80.53% <ø> (-0.05%) ⬇️
src/storage/src/hummock/state_store.rs 70.37% <100.00%> (+0.47%) ⬆️
src/meta/src/hummock/mock_hummock_meta_client.rs 64.73% <0.00%> (-1.58%) ⬇️
...torage/src/hummock/local_version/pinned_version.rs 88.30% <0.00%> (-0.59%) ⬇️
src/object_store/src/object/mem.rs 87.12% <0.00%> (-0.38%) ⬇️
src/storage/src/hummock/compactor/mod.rs 83.10% <0.00%> (-0.31%) ⬇️
src/object_store/src/object/mod.rs 51.08% <0.00%> (-0.22%) ⬇️
src/storage/src/hummock/sstable_store.rs 63.58% <0.00%> (-0.20%) ⬇️
src/meta/src/hummock/manager/mod.rs 79.33% <0.00%> (-0.12%) ⬇️
src/batch/src/executor/group_top_n.rs 74.85% <0.00%> (+6.43%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Contributor

@wenym1 wenym1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, the current fix LGTM. We may introduce the real wait in future PR.

@zwang28
Copy link
Contributor Author

zwang28 commented Jan 3, 2023

I would suggest we modify the logic of try_wait_epoch to do a really wait for the sealed_epoch to bump up asynchronously

Agree.
I'll merge this quick fix first, to avoid flaky tests.

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mergify mergify bot merged commit 44b4c10 into main Jan 3, 2023
@mergify mergify bot deleted the wangzheng/fix_seal_epoch branch January 3, 2023 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/fix Bug fix
Projects
None yet
3 participants