Feature/2595 l1 sync deadlock (#2616)
* changed that producer doesnt stop
1 parent
156cd76
commit bf82f69
Showing 13 changed files with 268 additions and 172 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,63 +1,63 @@ | ||
|
||
# L1 parallel synchronization | ||
This is a refactor of L1 synchronization to improve speed. | ||
- It ask data in parallel to L1 meanwhile another goroutine is execution the rollup info. | ||
- It makes that executor be ocupied 100% of time. | ||
|
||
## Pending to do | ||
|
||
- All the stuff related to updating the last block on L1 could be moved to another class | ||
- Check context usage: | ||
Does it need one context to cancel itself and another context to cancel the workers? | ||
- Emit metrics | ||
- If there is nothing to update, reduce the code to be executed (not sure, because of the functionality to keep updating beyond the last block on L1) | ||
- Improve the unit tests of all objects | ||
- Check all log.fatals to remove them or add a status before the panic | ||
- Missing **feature update beyond last block on L1**: The old syncBlocks method tries to ask for blocks beyond the last L1 block, I suppose to keep synchronizing when new blocks appear during the synchronization. This is not implemented here | ||
This is the behaviour of etherman in that situation: | ||
- GetRollupInfoByBlockRange returns no errors, zero blocks... | ||
- EthBlockByNumber returns error: "not found" | ||
- It requests data from L1 in parallel while another goroutine executes the rollup info. | ||
- It keeps the executor occupied 100% of the time. | ||
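The parallel requests described above can be pictured as splitting the pending L1 block range into chunks and letting several workers fetch them concurrently. This is a minimal sketch under stated assumptions: `blockRange`, `rollupInfo`, `splitRanges`, and `fetchParallel` are hypothetical names for illustration, not identifiers from the repository, and the chunk size is arbitrary.

```go
package main

import (
	"fmt"
	"sync"
)

// blockRange is a hypothetical inclusive range of L1 block numbers.
type blockRange struct{ from, to uint64 }

// rollupInfo stands in for whatever data is fetched from L1 for one range.
type rollupInfo struct{ r blockRange }

// splitRanges cuts [from, to] into inclusive chunks of at most size blocks.
func splitRanges(from, to, size uint64) []blockRange {
	var out []blockRange
	if size == 0 || from > to {
		return out
	}
	for start := from; ; start += size {
		end := start + size - 1
		if end > to || end < start { // clamp; also guards against overflow
			end = to
		}
		out = append(out, blockRange{start, end})
		if end == to {
			return out
		}
	}
}

// fetchParallel runs fetch over all ranges with a fixed number of workers.
// Results are stored by index, so they come back in range order.
func fetchParallel(ranges []blockRange, workers int, fetch func(blockRange) rollupInfo) []rollupInfo {
	if workers < 1 {
		workers = 1
	}
	results := make([]rollupInfo, len(ranges))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = fetch(ranges[i]) // each index is written by one worker only
			}
		}()
	}
	for i := range ranges {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	ranges := splitRanges(188064, 188464, 100)
	infos := fetchParallel(ranges, 2, func(r blockRange) rollupInfo {
		return rollupInfo{r: r} // a real fetch would query L1 here
	})
	for _, info := range infos {
		fmt.Printf("range:[%d, %d]\n", info.r.from, info.r.to)
	}
}
```

Writing results by index keeps them in block order, which matters if the executor must process ranges sequentially.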
|
||
## Pending to do | ||
- Some tests in `synchronizer/synchronizer_test.go` are based on this feature, so they are running against legacy code | ||
- Move some 'hardcoded' values to the configuration file | ||
|
||
## Configuration | ||
This feature is experimental for that reason you can configure to use old sequential one: | ||
You can choose between the new L1 parallel sync and the sequential one (legacy): | ||
``` | ||
[Synchronizer] | ||
UseParallelModeForL1Synchronization = false | ||
``` | ||
If you activate this feature you can configure: | ||
- `NumberOfParallelOfEthereumClients`: how many parallel request can be done. Currently this create multiples instances of etherman over same server, in the future maybe make sense to use differents servers | ||
- `CapacityOfBufferingRollupInfoFromL1`: buffer of data pending to be processed | ||
- `NumberOfParallelOfEthereumClients`: how many parallel requests can be made. Note that 1 client is used just for requesting the last block on L1, and the rest for rollup info | ||
- `CapacityOfBufferingRollupInfoFromL1`: buffer of data pending to be processed. This is the queue of data to be executed by the consumer. | ||
|
||
For a full description of the fields, please check the config-file documentation. | ||
|
||
Example: | ||
``` | ||
UseParallelModeForL1Synchronization = true | ||
[Synchronizer.L1ParallelSynchronization] | ||
NumberOfParallelOfEthereumClients = 2 | ||
CapacityOfBufferingRollupInfoFromL1 = 10 | ||
TimeForCheckLastBlockOnL1Time = "5s" | ||
TimeoutForRequestLastBlockOnL1 = "5s" | ||
MaxNumberOfRetriesForRequestLastBlockOnL1 = 3 | ||
TimeForShowUpStatisticsLog = "5m" | ||
TimeOutMainLoop = "5m" | ||
MinTimeBetweenRetriesForRollupInfo = "5s" | ||
[Synchronizer.L1ParallelSynchronization.PerformanceCheck] | ||
AcceptableTimeWaitingForNewRollupInfo = "5s" | ||
NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo = 10 | ||
``` | ||
## Remarkable logs | ||
### How to know the occupation of the executor | ||
To check that the executor is fully occupied you can check the following log: | ||
``` | ||
INFO synchronizer/l1_processor_consumer.go:110 consumer: processing rollupInfo #1291: range:[188064, 188164] num_blocks [0] wasted_time_waiting_for_data [74.17575ms] last_process_time [2.534115ms] block_per_second [0.000000] | ||
INFO synchronizer/l1_rollup_info_consumer.go:128 consumer: processing rollupInfo #1553: range:[8720385, 8720485] num_blocks [37] statistics:wasted_time_waiting_for_data [0s] last_process_time [6m2.635208117s] block_per_second [2.766837] | ||
``` | ||
The `wasted_time_waiting_for_data` show the waiting time between this call and the previous to executor. If this value (after 20 interations) are greater to 1 seconds a warning is show. | ||
The `wasted_time_waiting_for_data` field shows the waiting time between this call to the executor and the previous one. A warning can be shown by configuring `Synchronizer.L1ParallelSynchronization.PerformanceCheck` | ||
|
||
### Estimated time to be fully synchronized with L1 | ||
This log show the estimated time (**ETA**) to reach the block goal | ||
This log shows the estimated time (**ETA**) to reach the block goal. You can configure the frequency with the variable `TimeForShowUpStatisticsLog` | ||
``` | ||
INFO synchronizer/l1_data_retriever_producer.go:255 producer: Statistics:ETA: 3h40m1.311379085s percent:1.35 blocks_per_seconds:706.80 pending_block:127563/9458271 num_errors:0 | ||
INFO synchronizer/l1_rollup_info_producer.go:357 producer: Statistics:ETA: 54h7m47.594422312s percent:12.26 blocks_per_seconds:5.48 pending_block:149278/1217939 num_errors:8 | ||
``` | ||
|
||
## Flow of data | ||
![l1_sync_channels_flow_v2 drawio](https://github.com/0xPolygonHermez/zkevm-node/assets/129153821/430abeb3-13b2-4c13-8d5e-4996a134a353) | ||
![l1_sync_channels_flow_v2 drawio](l1_sync_channels_flow_v2.drawio.png) | ||
|
||
## Class diagram | ||
This is a class diagram of the principal classes and their relationships. | ||
The entry point is `synchronizer.go:276` function `syncBlocksParallel`. | ||
- It creates all the objects needed and launches `l1SyncOrchestration`, which waits until the job is done before returning | ||
|
||
### The main objects are: | ||
- `l1RollupInfoProducer`: is the object that send rollup data through the channel | ||
- `l1SyncOrchestration`: the entry point, responsible for launching the producer and the consumer | ||
- `l1RollupInfoProducer`: sends rollup data through the channel to the consumer | ||
- `l1RollupInfoConsumer`: receives the data and executes it | ||
|
||
![image](https://github.com/0xPolygonHermez/zkevm-node/assets/129153821/957a3e95-77c7-446b-a6ec-ef28cc44cb18) | ||
|
||
## Future changes | ||
- Configure multiple servers for L1 information: instead of calling the same server, it makes sense to configure each URL individually to allow multiple sources |