fix(en): gracefully shutdown en waiting for reorg detector #270
Codecov Report
Attention:
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #270   +/-   ##
=======================================
  Coverage   35.61%   35.62%
=======================================
  Files         520      520
  Lines       28352    28348    -4
=======================================
  Hits        10098    10098
+ Misses      18254    18250    -4
Taking a step back: why do we need such (arguably) complicated logic? It's partly because we need to run the reorg detector in parallel. We have a similar case with the Circuit Breaker on the main node. Is there any potential to extract some abstraction?
@montekki I think it's a good solution for now. A proper task management entity is in the plans too.
🤖 I have created a release *beep* *boop*

---

## [16.2.0](core-v16.1.0...core-v16.2.0) (2023-10-26)

### Features

* **basic_witness_producer_input:** Add Basic Witness Producer Input component ([#156](#156)) ([3cd24c9](3cd24c9))
* **core:** adding pubdata to statekeeper and merkle tree ([#259](#259)) ([1659c84](1659c84))

### Bug Fixes

* **db:** Fix root cause of RocksDB misbehavior ([#301](#301)) ([d6c30ab](d6c30ab))
* **en:** gracefully shutdown en waiting for reorg detector ([#270](#270)) ([f048485](f048485))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This PR changes the shutdown order of the external node to a more graceful one.
Why?
Currently, shutting down the EN because of reorgs and inconsistency checks is racy: a consistency checker that is shutting down can bring the reorg detector down with it. That prevents the reorg detector from doing its job in the legitimate reorg cases where the node actually needs to be rolled back to a previous state.
What happens currently?
The concerns have focused on the logic of the consistency checker, which used to panic when an inconsistency was found. That is no longer the case (the `anyhow::bail!` used to be a `panic`): zksync-era/core/lib/zksync_core/src/consistency_checker/mod.rs, lines 181 to 183 in feb8a6c.
However, this error would still unconditionally bring the node down, because the handler for `ConsistencyChecker` is currently part of `wait_tasks`, which resolves as soon as any of the tasks has resolved: zksync-era/core/bin/external_node/src/main.rs, lines 401 to 404 in feb8a6c.
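To make the race concrete, here is a minimal std-only Rust sketch of the "resolve when any task resolves" behavior. It is not the node's actual code: threads, an `mpsc` channel, and an `AtomicBool` stop flag stand in for the node's async task handles, and `run_node`, the component stand-ins, and the error message are all hypothetical. The checker reports an error first, the wait ends, shutdown is requested, and the reorg detector is stopped before it can finish its own work.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Result each hypothetical component reports when it finishes on its own.
type TaskResult = Result<(), String>;

/// Runs two hypothetical components and stops the node as soon as ANY of
/// them reports, mirroring a `wait_tasks`-style "first task resolved" check.
/// Returns (name of the task that ended the wait, its result,
/// whether the reorg detector got to finish its own work).
fn run_node() -> (&'static str, TaskResult, bool) {
    let stop = Arc::new(AtomicBool::new(false));
    let (tx, rx) = mpsc::channel::<(&'static str, TaskResult)>();

    // Hypothetical consistency checker: finds an inconsistency and reports
    // an error instead of panicking (as after the switch to `anyhow::bail!`).
    let tx_checker = tx.clone();
    let checker = thread::spawn(move || {
        let err = Err("inconsistency found".to_string());
        tx_checker.send(("consistency_checker", err)).unwrap();
    });

    // Hypothetical reorg detector: still busy (e.g. looking for the last
    // correct batch) and only stops because shutdown was requested.
    let stop_detector = Arc::clone(&stop);
    let detector = thread::spawn(move || {
        while !stop_detector.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(1));
        }
        false // interrupted: the rollback it was working towards never happens
    });
    drop(tx); // only the checker's sender remains

    // First-completion semantics: whichever task reports first decides.
    let (name, result) = rx.recv().expect("the checker always reports");

    // Shutting everything down also takes the unfinished detector with it.
    stop.store(true, Ordering::SeqCst);
    checker.join().unwrap();
    let detector_finished = detector.join().unwrap();
    (name, result, detector_finished)
}

fn main() {
    let (name, result, detector_finished) = run_node();
    println!("stopped by {name}: {result:?}; reorg detector finished: {detector_finished}");
}
```

Because the detector in this sketch never reports on its own, the outcome is deterministic: the checker's error always ends the wait, and the detector always ends up interrupted.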
The proposed logic works as follows:
The handle to the reorg detector is fused, and on node shutdown, after all the components have been gracefully shut down, we manually check the result of the reorg detector. This logic handles both situations: