Feature/2595 l1 sync deadlock #2616

Merged: 6 commits, Oct 5, 2023
64 changes: 32 additions & 32 deletions docs/design/synchronizer/l1_synchronization.md

# L1 parallel synchronization
This is a refactor of L1 synchronization to improve speed.
- It asks L1 for data in parallel while another goroutine executes the rollup info.
- It keeps the executor occupied 100% of the time (see the sketch below).
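
Conceptually this is a producer/consumer split: requests to L1 are made from several goroutines and buffered, while a single goroutine drains the buffer into the executor. Below is a minimal, self-contained sketch of that idea; the names and the simulated round trip are illustrative only, and it glosses over details such as the dedicated client that polls the last L1 block.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Buffered queue of rollup info pending to be executed
	// (CapacityOfBufferingRollupInfoFromL1 in the configuration).
	rollupInfo := make(chan string, 10)

	// Several goroutines ask L1 for block ranges in parallel
	// (NumberOfParallelOfEthereumClients in the configuration).
	var producers sync.WaitGroup
	for client := 0; client < 2; client++ {
		producers.Add(1)
		go func(id int) {
			defer producers.Done()
			for r := 0; r < 3; r++ {
				time.Sleep(50 * time.Millisecond) // simulated L1 round trip
				rollupInfo <- fmt.Sprintf("range fetched by client %d (#%d)", id, r)
			}
		}(client)
	}
	go func() { producers.Wait(); close(rollupInfo) }()

	// Meanwhile one goroutine (here, main) keeps the executor busy with
	// whatever is already buffered, so it stays occupied close to 100% of the time.
	for info := range rollupInfo {
		fmt.Println("executing:", info)
	}
}
```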

## Pending to do

- All the stuff related to updating the last block on L1 could be moved to another class
- Check context usage:
  does it need a context to cancel itself and another context to cancel the workers?
- Emit metrics
- If there is nothing to update, reduce the code to be executed (not sure, because of the functionality to keep updating beyond the last block on L1)
- Improve the unit tests of all objects
- Check all log.Fatal calls to remove them or to add a status before the panic
- Missing **feature update beyond last block on L1**: the old syncBlocks method tries to ask for blocks beyond the last L1 block, I suppose to keep synchronizing while new blocks keep arriving on L1. This is not implemented here.
  This is the behaviour of etherman in that situation:
  - GetRollupInfoByBlockRange returns no errors, zero blocks...
  - EthBlockByNumber returns error: "not found"
  - Some tests in `synchronizer/synchronizer_test.go` are based on this feature, so they run against the legacy code
- Move some 'hardcoded' values to the configuration file

## Configuration
This feature is still experimental, so you can choose between the new L1 parallel sync and the sequential (legacy) one:
```
[Synchronizer]
UseParallelModeForL1Synchronization = false
```
If you activate this feature you can configure:
- `NumberOfParallelOfEthereumClients`: how many parallel requests can be done. Note that 1 client is dedicated to requesting the last block on L1 and the rest are used for rollup info. Currently this creates multiple instances of etherman against the same server; in the future it may make sense to use different servers.
- `CapacityOfBufferingRollupInfoFromL1`: buffer of data pending to be processed. This is the queue of data waiting to be executed by the consumer.

For a full description of fields please check config-file documentation.

Example:
```
[Synchronizer]
UseParallelModeForL1Synchronization = true
[Synchronizer.L1ParallelSynchronization]
NumberOfParallelOfEthereumClients = 2
CapacityOfBufferingRollupInfoFromL1 = 10
TimeForCheckLastBlockOnL1Time = "5s"
TimeoutForRequestLastBlockOnL1 = "5s"
MaxNumberOfRetriesForRequestLastBlockOnL1 = 3
TimeForShowUpStatisticsLog = "5m"
TimeOutMainLoop = "5m"
MinTimeBetweenRetriesForRollupInfo = "5s"
[Synchronizer.L1ParallelSynchronization.PerformanceCheck]
AcceptableTimeWaitingForNewRollupInfo = "5s"
NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo = 10

```
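
For illustration only, the options in this example could map onto Go structures like the sketch below; the field names are taken from the TOML above, but the struct layout and types are assumptions rather than the node's actual configuration types.

```go
package main

import (
	"fmt"
	"time"
)

// L1ParallelSynchronization mirrors the [Synchronizer.L1ParallelSynchronization]
// table of the example (illustrative layout, not the real zkevm-node types).
type L1ParallelSynchronization struct {
	NumberOfParallelOfEthereumClients         uint64
	CapacityOfBufferingRollupInfoFromL1       uint64
	TimeForCheckLastBlockOnL1Time             time.Duration
	TimeoutForRequestLastBlockOnL1            time.Duration
	MaxNumberOfRetriesForRequestLastBlockOnL1 int
	TimeForShowUpStatisticsLog                time.Duration
	TimeOutMainLoop                           time.Duration
	MinTimeBetweenRetriesForRollupInfo        time.Duration
	PerformanceCheck                          PerformanceCheck
}

// PerformanceCheck mirrors [Synchronizer.L1ParallelSynchronization.PerformanceCheck].
type PerformanceCheck struct {
	AcceptableTimeWaitingForNewRollupInfo                       time.Duration
	NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo int
}

func main() {
	cfg := L1ParallelSynchronization{
		// 1 client is reserved for polling the last L1 block; the other fetches rollup info.
		NumberOfParallelOfEthereumClients:   2,
		CapacityOfBufferingRollupInfoFromL1: 10,
		TimeForCheckLastBlockOnL1Time:       5 * time.Second,
		PerformanceCheck: PerformanceCheck{
			AcceptableTimeWaitingForNewRollupInfo:                       5 * time.Second,
			NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo: 10,
		},
	}
	fmt.Printf("%+v\n", cfg)
}
```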
## Remarkable logs
### How to know the occupation of the executor
To check that the executor is fully occupied you can check the following log:
```
INFO synchronizer/l1_rollup_info_consumer.go:128 consumer: processing rollupInfo #1553: range:[8720385, 8720485] num_blocks [37] statistics:wasted_time_waiting_for_data [0s] last_process_time [6m2.635208117s] block_per_second [2.766837]
```
The `wasted_time_waiting_for_data` field shows the waiting time between this call to the executor and the previous one. If it stays too high a warning is shown; the threshold and the number of warm-up iterations can be configured through `Synchronizer.L1ParallelSynchronization.PerformanceCheck`.
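
As a rough, hypothetical sketch of that check (not the actual consumer code): measure how long the consumer blocks waiting for the next rollup info and, once past a warm-up number of iterations, log a warning whenever the wait exceeds the acceptable time.

```go
package main

import (
	"log"
	"time"
)

// waitChecker mimics the idea behind PerformanceCheck: after a warm-up number
// of iterations, waits longer than the acceptable threshold produce a warning.
type waitChecker struct {
	acceptableWait time.Duration // AcceptableTimeWaitingForNewRollupInfo
	warmUp         int           // iterations before the check starts
	iterations     int
}

func (w *waitChecker) observe(wasted time.Duration) {
	w.iterations++
	if w.iterations > w.warmUp && wasted > w.acceptableWait {
		log.Printf("consumer: wasted_time_waiting_for_data [%s] above acceptable [%s]", wasted, w.acceptableWait)
	}
}

func main() {
	check := &waitChecker{acceptableWait: 5 * time.Second, warmUp: 10}
	rollupInfo := make(chan int, 10)
	go func() {
		for i := 0; i < 3; i++ {
			rollupInfo <- i
		}
		close(rollupInfo)
	}()
	for {
		start := time.Now()
		if _, ok := <-rollupInfo; !ok {
			break
		}
		check.observe(time.Since(start)) // time spent blocked on the channel
	}
}
```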

### Estimated time to be fully synchronized with L1
This log shows the estimated time (**ETA**) to reach the target block. You can configure its frequency with the `TimeForShowUpStatisticsLog` setting.
```
INFO synchronizer/l1_rollup_info_producer.go:357 producer: Statistics:ETA: 54h7m47.594422312s percent:12.26 blocks_per_seconds:5.48 pending_block:149278/1217939 num_errors:8
```
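
The ETA in that log is essentially the number of pending blocks divided by the observed synchronization rate. A minimal sketch of that computation (illustrative, not the producer's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// estimateETA returns how long it should take to process the remaining blocks
// at the observed rate, in blocks per second.
func estimateETA(processedBlocks, totalBlocks uint64, blocksPerSecond float64) time.Duration {
	if blocksPerSecond <= 0 || totalBlocks <= processedBlocks {
		return 0 // nothing pending or no rate observed yet
	}
	pending := float64(totalBlocks - processedBlocks)
	return time.Duration(pending / blocksPerSecond * float64(time.Second))
}

func main() {
	// Values from the sample log line (reading pending_block as processed/total):
	// 149278 of 1217939 blocks done at 5.48 blocks per second gives an ETA of
	// roughly 54 hours, in line with the logged value.
	fmt.Println(estimateETA(149278, 1217939, 5.48))
}
```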

## Flow of data
![l1_sync_channels_flow_v2 drawio](l1_sync_channels_flow_v2.drawio.png)

## Class diagram
This is a class diagram of the principal classes and their relationships.
The entry point is `synchronizer.go:276`, function `syncBlocksParallel`.
- It creates all the needed objects and launches `l1SyncOrchestration`, which waits until the job is done before returning

### The main objects are:
- `l1SyncOrchestration`: the entry point, responsible for launching the producer and the consumer
- `l1RollupInfoProducer`: the object that sends rollup data through the channel to the consumer
- `l1RollupInfoConsumer`: the object that receives the data and executes it

![image](https://github.com/0xPolygonHermez/zkevm-node/assets/129153821/957a3e95-77c7-446b-a6ec-ef28cc44cb18)
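
To make the relationships concrete, here is an illustrative skeleton of the three objects; the type names follow the document, but the fields and methods are assumptions rather than the real definitions.

```go
package main

import (
	"fmt"
	"sync"
)

// l1RollupInfoProducer sends rollup data through the channel to the consumer.
type l1RollupInfoProducer struct {
	out chan<- string
}

func (p *l1RollupInfoProducer) start() {
	defer close(p.out)
	// In the real object this data is retrieved from L1 in parallel.
	p.out <- "rollup info for blocks [0, 100]"
}

// l1RollupInfoConsumer receives the data and executes it.
type l1RollupInfoConsumer struct {
	in <-chan string
}

func (c *l1RollupInfoConsumer) start() {
	for data := range c.in {
		fmt.Println("executing:", data) // in the real object, sent to the executor
	}
}

// l1SyncOrchestration launches the producer and the consumer and waits
// until the job is done before returning, as syncBlocksParallel expects.
type l1SyncOrchestration struct {
	producer *l1RollupInfoProducer
	consumer *l1RollupInfoConsumer
}

func (o *l1SyncOrchestration) run() {
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); o.producer.start() }()
	go func() { defer wg.Done(); o.consumer.start() }()
	wg.Wait()
}

func main() {
	ch := make(chan string, 10)
	orchestration := l1SyncOrchestration{
		producer: &l1RollupInfoProducer{out: ch},
		consumer: &l1RollupInfoConsumer{in: ch},
	}
	orchestration.run()
}
```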

## Future changes
- Configure multiple servers for L1 information: instead of calling the same server, it makes sense to configure each URL individually to allow multiple sources (a hypothetical sketch follows below).
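
A hypothetical sketch of that change (none of these names exist in the node today): the endpoints come from configuration and each parallel client is assigned a URL round-robin instead of all of them hitting the same server.

```go
package main

import "fmt"

// endpointPool hands out L1 RPC URLs round-robin so that parallel clients
// can be spread over several servers (hypothetical future behaviour).
type endpointPool struct {
	urls []string
	next int
}

func (p *endpointPool) pick() string {
	url := p.urls[p.next%len(p.urls)]
	p.next++
	return url
}

func main() {
	pool := &endpointPool{urls: []string{
		"https://l1-node-a.example.com",
		"https://l1-node-b.example.com",
	}}
	// One URL per parallel Ethereum client instead of reusing the same server.
	for client := 0; client < 3; client++ {
		fmt.Printf("client %d -> %s\n", client, pool.pick())
	}
}
```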