Feature/2595 l1 sync deadlock (#2616)
* Changed the producer so that it doesn't stop
joanestebanr authored Oct 5, 2023
1 parent 156cd76 commit bf82f69
Showing 13 changed files with 268 additions and 172 deletions.
docs/design/synchronizer/l1_synchronization.md: 64 changes (32 additions, 32 deletions)

# L1 parallel synchronization
This is a refactor of L1 synchronization to improve speed.
- It requests data from L1 in parallel while another goroutine executes the rollup info.
- It keeps the executor occupied 100% of the time.

## Pending to do
- All the logic related to updating the last block on L1 could be moved to another class
- Check context usage: does it need one context to cancel itself and another context to cancel the workers?
- Emit metrics
- If there is nothing to update, reduce the code to be executed (not sure, because of the functionality that keeps updating beyond the last block on L1)
- Improve the unit tests of all objects
- Check all `log.Fatal` calls to remove them or add a status before the panic
- Missing **feature update beyond last block on L1**: the old `syncBlocks` method tries to request blocks beyond the last L1 block; I suppose that is to keep synchronizing while new blocks appear during the synchronization. This is not implemented here.
  This is the behaviour of etherman in that situation:
  - `GetRollupInfoByBlockRange` returns no errors and zero blocks
  - `EthBlockByNumber` returns the error "not found"
- Some tests in `synchronizer/synchronizer_test.go` are based on this feature, so they run against the legacy code
- Move some hardcoded values to the configuration file
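
The `range:[from, to]` values in the consumer logs further down, and the note about requesting blocks beyond the last L1 block, suggest that the pending interval is split into block ranges that workers request in parallel. The Go sketch below illustrates that idea under stated assumptions: the fixed chunk size and the `blockRange` type and `splitRanges` helper are hypothetical, not taken from the synchronizer code. It clamps the last range to the last L1 block, so no request asks for blocks beyond it.

```go
package main

import "fmt"

// blockRange is an inclusive range of L1 block numbers [From, To].
// Hypothetical type for illustration only.
type blockRange struct {
	From, To uint64
}

// splitRanges divides [from, lastL1Block] into consecutive inclusive ranges
// of at most `step` blocks each (the fixed step is an assumption). The last
// range is clamped so that no request goes beyond lastL1Block.
func splitRanges(from, lastL1Block, step uint64) []blockRange {
	var ranges []blockRange
	if step == 0 {
		return ranges
	}
	for start := from; start <= lastL1Block; start += step {
		end := start + step - 1
		if end > lastL1Block {
			end = lastL1Block
		}
		ranges = append(ranges, blockRange{From: start, To: end})
		if end == lastL1Block {
			break // also avoids overflowing start near the uint64 maximum
		}
	}
	return ranges
}

func main() {
	for _, r := range splitRanges(100, 350, 100) {
		fmt.Printf("range:[%d, %d]\n", r.From, r.To)
	}
}
```

Each range could then be handed to a different Ethereum client; when the start is already past the last L1 block, the sketch simply returns no ranges instead of issuing a request.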

## Configuration
This feature is experimental, so you can choose between the new L1 parallel synchronization and the sequential (legacy) one:
```toml
[Synchronizer]
UseParallelModeForL1Synchronization = false
```
If you enable this feature, you can configure:
- `NumberOfParallelOfEthereumClients`: how many parallel requests can be made. Keep in mind that 1 client is used only to request the last block on L1, and the rest request rollup info. Currently this creates multiple instances of etherman against the same server; in the future it may make sense to use different servers.
- `CapacityOfBufferingRollupInfoFromL1`: the buffer of data pending to be processed, i.e. the queue of data waiting to be executed by the consumer.

For a full description of the fields, please check the config-file documentation.
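
The two settings can be read as follows: one client polls the last L1 block, the remaining clients request rollup info, and results wait in a buffer of `CapacityOfBufferingRollupInfoFromL1` entries. The Go sketch below illustrates that reading only; `syncResources` and `rollupInfo` are hypothetical names, and rejecting fewer than two clients is an assumption of the sketch, not documented behaviour.

```go
package main

import "fmt"

// rollupInfo is a placeholder for the data returned by one L1 request.
// Hypothetical type for illustration only.
type rollupInfo struct {
	From, To uint64
}

// syncResources derives what the two settings control: one client is
// reserved for polling the last L1 block, the remaining clients request
// rollup info, and a buffered channel holds up to `capacity` results
// pending for the consumer. Rejecting fewer than 2 clients is an assumption.
func syncResources(numClients, capacity int) (int, chan rollupInfo, error) {
	if numClients < 2 {
		return 0, nil, fmt.Errorf("need at least 2 Ethereum clients (1 for the last block, 1+ for rollup info), got %d", numClients)
	}
	if capacity < 1 {
		return 0, nil, fmt.Errorf("buffer capacity must be positive, got %d", capacity)
	}
	return numClients - 1, make(chan rollupInfo, capacity), nil
}

func main() {
	// Values from the example configuration: 2 clients, buffer of 10.
	workers, queue, err := syncResources(2, 10)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("rollup info workers: %d, buffer capacity: %d\n", workers, cap(queue))
}
```

With a bounded buffer, the producer blocks once the consumer falls behind by `capacity` results, which keeps memory use predictable.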

Example:
```toml
[Synchronizer]
UseParallelModeForL1Synchronization = true
[Synchronizer.L1ParallelSynchronization]
NumberOfParallelOfEthereumClients = 2
CapacityOfBufferingRollupInfoFromL1 = 10
TimeForCheckLastBlockOnL1Time = "5s"
TimeoutForRequestLastBlockOnL1 = "5s"
MaxNumberOfRetriesForRequestLastBlockOnL1 = 3
TimeForShowUpStatisticsLog = "5m"
TimeOutMainLoop = "5m"
MinTimeBetweenRetriesForRollupInfo = "5s"
[Synchronizer.L1ParallelSynchronization.PerformanceCheck]
AcceptableTimeWaitingForNewRollupInfo = "5s"
NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo = 10
```
## Remarkable logs
### How to know the occupancy of the executor
To check that the executor is fully occupied, look at the following log:
```
INFO synchronizer/l1_rollup_info_consumer.go:128 consumer: processing rollupInfo #1553: range:[8720385, 8720485] num_blocks [37] statistics:wasted_time_waiting_for_data [0s] last_process_time [6m2.635208117s] block_per_second [2.766837]
```
`wasted_time_waiting_for_data` shows the time spent waiting for data between the previous call to the executor and this one. It can emit a warning, configured in `Synchronizer.L1ParallelSynchronization.PerformanceCheck`: the warning is shown when this value exceeds `AcceptableTimeWaitingForNewRollupInfo`, once `NumIterationsBeforeStartCheckingTimeWaitinfForNewRollupInfo` iterations have been processed.

### Estimated time to be fully synchronized with L1
This log shows the estimated time (**ETA**) to reach the block goal. You can configure its frequency with `TimeForShowUpStatisticsLog`.
```
INFO synchronizer/l1_rollup_info_producer.go:357 producer: Statistics:ETA: 54h7m47.594422312s percent:12.26 blocks_per_seconds:5.48 pending_block:149278/1217939 num_errors:8
```

## Flow of data
![l1_sync_channels_flow_v2 drawio](l1_sync_channels_flow_v2.drawio.png)

## Class diagram
This is a class diagram of the main classes and their relationships.
The entry point is the function `syncBlocksParallel` in `synchronizer.go:276`.
- It creates all the needed objects and launches `l1SyncOrchestration`, which waits until the job is done before returning.

### The main objects are:
- `l1SyncOrchestration`: the entry point, responsible for launching the producer and the consumer
- `l1RollupInfoProducer`: sends rollup data through the channel to the consumer
- `l1RollupInfoConsumer`: receives the data and executes it

![image](https://github.com/0xPolygonHermez/zkevm-node/assets/129153821/957a3e95-77c7-446b-a6ec-ef28cc44cb18)
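
The relationship between the three objects can be sketched with goroutines and a buffered channel: the producer sends rollup info through the channel, the consumer executes it, and the orchestrator waits until the channel is drained. This is a simplified sketch, not the synchronizer's code: `produce`, `consume`, `orchestrate` and `rollupInfo` are hypothetical names, the L1 requests and the execution are stubbed out, and results may arrive out of order, which a real consumer would have to handle.

```go
package main

import (
	"fmt"
	"sync"
)

// rollupInfo stands in for the data fetched from L1 for one block range.
// Hypothetical type for illustration only.
type rollupInfo struct {
	From, To uint64
}

// produce hands each range to a pool of workers that send results through
// out, and closes out once every worker has finished.
func produce(ranges []rollupInfo, workers int, out chan<- rollupInfo) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan rollupInfo)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for r := range jobs {
				out <- r // a real worker would request this range from L1 here
			}
		}()
	}
	for _, r := range ranges {
		jobs <- r
	}
	close(jobs)
	wg.Wait()
	close(out)
}

// consume executes rollup info as it arrives until the channel is closed.
func consume(in <-chan rollupInfo) int {
	processed := 0
	for range in {
		processed++ // a real consumer would execute the blocks here
	}
	return processed
}

// orchestrate launches the producer and the consumer and waits until the job is done.
func orchestrate(ranges []rollupInfo, workers, capacity int) int {
	ch := make(chan rollupInfo, capacity)
	go produce(ranges, workers, ch)
	return consume(ch)
}

func main() {
	ranges := []rollupInfo{{0, 99}, {100, 199}, {200, 299}}
	fmt.Println("processed", orchestrate(ranges, 2, 10), "ranges")
}
```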

## Future changes
- Configure multiple servers for L1 information: instead of calling the same server, it makes sense to configure each URL individually to allow multiple sources.