
Commit

docs(pruning): Improve pruning and snapshot recovery docs (matter-labs#2311)

## What ❔

Improves various pruning and snapshot recovery docs.

## Why ❔

Makes docs more thorough and clearer.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Code has been formatted via `zk fmt` and `zk lint`.
slowli authored Jun 24, 2024
1 parent 4e9f724 commit b4327b6
Showing 4 changed files with 183 additions and 30 deletions.
29 changes: 29 additions & 0 deletions core/lib/snapshots_applier/README.md
# `zksync_snapshots_applier`

Library responsible for recovering Postgres from a protocol-level snapshot.

## Recovery workflow

_(See [node docs](../../../docs/guides/external-node/07_snapshots_recovery.md) for a high-level snapshot recovery
overview and [snapshot creator docs](../../bin/snapshots_creator/README.md) for the snapshot format details)_

1. Recovery is started by querying the main node and determining the snapshot parameters. By default, recovery is
performed from the latest snapshot, but it is possible to provide a manual override (L1 batch number of the
snapshot).
2. Factory dependencies (= contract bytecodes) are downloaded from the object store and are atomically saved to Postgres
together with the snapshot metadata (L1 batch number / L2 block numbers and timestamps, L1 batch state root hash, L2
block hash etc.).
3. Storage log chunks are downloaded from the object store; each chunk is atomically saved to Postgres (`storage_logs`
and `initial_writes` tables). This step has a configurable degree of concurrency to control speed – I/O load
trade-off.
4. After all storage logs are restored, token information is fetched from the main node and saved in the corresponding
table. Tokens are double-checked against storage logs.

Recovery is resilient to stops / failures; if the recovery process is interrupted, it will restart from the same
snapshot and will skip saving data that is already present in Postgres.
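
As a rough illustration, the workflow above could be sketched as follows. This is a hypothetical, heavily simplified
outline: the stub functions stand in for main node, object store and Postgres interactions, and the actual types and
method names in `zksync_snapshots_applier` differ.

```rust
struct SnapshotParams {
    l1_batch: u64,
    storage_log_chunk_ids: Vec<u64>,
}

// Stubs standing in for main node / object store / Postgres interactions.
fn fetch_snapshot_params(l1_batch_override: Option<u64>) -> SnapshotParams {
    SnapshotParams {
        l1_batch: l1_batch_override.unwrap_or(42),
        storage_log_chunk_ids: (0..10).collect(),
    }
}
fn factory_deps_applied() -> bool { false }
fn apply_factory_deps_and_metadata(_params: &SnapshotParams) { /* one Postgres transaction */ }
fn chunk_applied(_chunk_id: u64) -> bool { false }
fn apply_storage_log_chunk(_chunk_id: u64) { /* one Postgres transaction */ }
fn apply_tokens(_params: &SnapshotParams) { /* fetched from the main node, checked vs storage logs */ }

fn recover_postgres(l1_batch_override: Option<u64>, max_concurrency: usize) {
    // 1. Determine snapshot parameters (latest snapshot by default, or a manual override).
    let params = fetch_snapshot_params(l1_batch_override);
    println!("recovering Postgres from snapshot for L1 batch #{}", params.l1_batch);

    // 2. Factory dependencies and snapshot metadata are saved atomically; skipping this step
    //    if it is already done is what makes interrupted recoveries restartable.
    if !factory_deps_applied() {
        apply_factory_deps_and_metadata(&params);
    }

    // 3. Storage log chunks are processed with a bounded degree of concurrency
    //    (sequential here for brevity); chunks already saved to Postgres are skipped.
    for chunk_ids in params.storage_log_chunk_ids.chunks(max_concurrency) {
        for &chunk_id in chunk_ids {
            if !chunk_applied(chunk_id) {
                apply_storage_log_chunk(chunk_id);
            }
        }
    }

    // 4. Finally, token info is fetched from the main node and cross-checked against storage logs.
    apply_tokens(&params);
}

fn main() {
    recover_postgres(None, 10);
}
```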

Recovery logic for node components (such as metadata calculator and state keeper) is intentionally isolated from
Postgres recovery. A component requiring recovery must organize it on its own. This is motivated by the fact that at
least some components requiring recovery may initialize after an arbitrary delay after Postgres recovery (or not run at
all) and/or may be instantiated multiple times for a single node. As an example, both of these requirements hold for
metadata calculator / Merkle tree.
21 changes: 13 additions & 8 deletions core/node/db_pruner/README.md
Database pruner is a component that regularly removes the oldest L1 batches from the database together with
corresponding L2 blocks, events, etc.

There are two types of objects that are not fully cleaned:

- **Transactions** only have `BYTEA` fields cleaned as some components rely on the existence of transactions.
- **Storage logs:** only storage logs that have been overwritten are removed.

## Pruning workflow

_(See [node docs](../../../docs/guides/external-node/08_pruning.md) for a high-level pruning overview)_

There are two phases of pruning an L1 batch, soft pruning and hard pruning. Every batch that would have its records
removed is first _soft-pruned_. Soft-pruned batches cannot safely be used. One minute (this is configurable) after soft
pruning, _hard pruning_ is performed, where hard means physically removing data from the database.

The reasoning behind this split is to allow node components such as the API server to become aware of planned data
pruning, and restrict access to the pruned data in advance. This ensures that data does not unexpectedly (from the
component perspective) disappear from Postgres in the middle of an operation (like serving a Web3 request). At least in
some cases, such as VM-related Web3 methods, we cannot rely on database transactions for this purpose.
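
To make the two-phase flow concrete, here is a minimal sketch of a single pruning iteration. It is a hypothetical,
heavily simplified outline; the real `zksync_node_db_pruner` implementation is asynchronous and configuration-driven,
and the function names below are illustrative only.

```rust
use std::{thread, time::Duration};

/// Marks L1 batches up to `last_batch` as soft-pruned, so that other node components
/// (e.g., the API server) stop serving data for them even though it is still in Postgres.
fn soft_prune(last_batch: u64) {
    println!("soft-pruned L1 batches up to #{last_batch}");
}

/// Physically deletes rows belonging to L1 batches up to `last_batch` from Postgres.
fn hard_prune(last_batch: u64) {
    println!("hard-pruned L1 batches up to #{last_batch}");
}

fn prune_iteration(last_batch_to_prune: u64, removal_delay: Duration) {
    // Phase 1: soft pruning. The data is still present, but is treated as unavailable.
    soft_prune(last_batch_to_prune);
    // Grace period (1 minute by default) so that in-flight operations that observed
    // the data before soft pruning can complete before it physically disappears.
    thread::sleep(removal_delay);
    // Phase 2: hard pruning, i.e., actually removing the data from the database.
    hard_prune(last_batch_to_prune);
}

fn main() {
    prune_iteration(8, Duration::from_secs(60));
}
```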
85 changes: 76 additions & 9 deletions docs/guides/external-node/07_snapshots_recovery.md
# Snapshots Recovery

Instead of initializing a node using a Postgres dump, it's possible to configure a node to recover from a protocol-level
snapshot. This process is much faster and requires much less storage. The Postgres database of a mainnet node recovered
from a snapshot is only about 300GB. Note that without [pruning](08_pruning.md) enabled, the node state will continuously
grow at a rate of about 15GB per day.

## How it works

A snapshot is effectively a point-in-time snapshot of the VM state at the end of a certain L1 batch. Snapshots are
created for the latest L1 batches periodically (roughly twice a day) and are stored in a public GCS bucket.

Recovery from a snapshot consists of several parts.

- **Postgres** recovery is the initial stage. The node API is not functioning during this stage. The stage is expected
to take about 1 hour on the mainnet.
- **Merkle tree** recovery starts once Postgres is fully recovered. Merkle tree recovery can take about 3 hours on the
mainnet. Ordinarily, Merkle tree recovery is a blocker for node synchronization; i.e., the node will not process
blocks newer than the snapshot block until the Merkle tree is recovered.
- Recovering RocksDB-based **VM state cache** is concurrent with Merkle tree recovery and also depends on Postgres
recovery. It takes about 1 hour on the mainnet. Unlike Merkle tree recovery, VM state cache is not necessary for node
operation (the node will get the state from Postgres if it is absent), although it considerably speeds up VM
execution.

After Postgres recovery is completed, the node becomes operational, providing Web3 API etc. It still needs some time to
catch up executing blocks after the snapshot (i.e., roughly several hours' worth of blocks / transactions). This may take
on the order of 1–2 hours on the mainnet. In total, the recovery process and catch-up should thus take roughly 5–6 hours.
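
The dependency structure between these stages could be sketched as follows. This is a hypothetical, simplified outline
(the real node orchestrates these tasks asynchronously); the function names are illustrative only.

```rust
use std::thread;

fn recover_postgres() { /* ~1 hour on mainnet; node API is down */ }
fn recover_merkle_tree() { /* ~3 hours on mainnet; blocks synchronization */ }
fn recover_vm_state_cache() { /* ~1 hour on mainnet; optional for node operation */ }
fn catch_up_blocks() { /* roughly 1–2 hours of post-snapshot blocks on mainnet */ }

fn main() {
    // Stage 1: Postgres recovery runs first.
    recover_postgres();

    // Stage 2: Merkle tree and VM state cache recovery proceed concurrently.
    let tree = thread::spawn(recover_merkle_tree);
    let cache = thread::spawn(recover_vm_state_cache);

    // Block processing waits only for the Merkle tree; the VM state cache is a
    // performance optimization and is not required for correctness.
    tree.join().unwrap();
    catch_up_blocks();

    cache.join().unwrap();
}
```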

## Current limitations

Nodes recovered from a snapshot don't have any historical data from before the recovery. There is currently no way to
back-fill this historical data. E.g., if a node has recovered from a snapshot for L1 batch 500,000, it will not have
data for L1 batches 499,999, 499,998, etc. The relevant Web3 methods, such as `eth_getBlockByNumber`, will return an
error mentioning the first locally retained block or L1 batch if queried for this missing data. The same error messages are
used for [pruning](08_pruning.md) because logically, recovering from a snapshot is equivalent to pruning node storage to
the snapshot L1 batch.

## Configuration

To enable snapshot recovery on mainnet, you need to set environment variables for a node before starting it for the
first time:

```yaml
EN_SNAPSHOTS_RECOVERY_ENABLED: 'true'
EN_SNAPSHOTS_OBJECT_STORE_BUCKET_BASE_URL: 'zksync-era-mainnet-external-node-snapshots'
EN_SNAPSHOTS_OBJECT_STORE_MODE: 'GCSAnonymousReadOnly'
```

For the Sepolia testnet, use:

```yaml
EN_SNAPSHOTS_RECOVERY_ENABLED: 'true'
EN_SNAPSHOTS_OBJECT_STORE_MODE: 'GCSAnonymousReadOnly'
```

For working examples of fully configured nodes recovering from snapshots, see the
[Docker Compose examples](docker-compose-examples) and [_Quick Start_](00_quick_start.md).

If a node is already recovered (no matter whether from a snapshot or from a Postgres dump), setting these env
variables will have no effect; the node will never reset its state.

## Monitoring recovery

Snapshot recovery information is logged with the following targets:

- **Recovery orchestration:** `zksync_external_node::init`
- **Postgres recovery:** `zksync_snapshots_applier`
- **Merkle tree recovery:** `zksync_metadata_calculator::recovery`, `zksync_merkle_tree::recovery`

An example of snapshot recovery logs during the first node start:

```text
2024-06-20T07:25:32.466926Z INFO zksync_external_node::init: Node has neither genesis L1 batch, nor snapshot recovery info
2024-06-20T07:25:32.466946Z INFO zksync_external_node::init: Chosen node initialization strategy: SnapshotRecovery
2024-06-20T07:25:32.466951Z WARN zksync_external_node::init: Proceeding with snapshot recovery. This is an experimental feature; use at your own risk
2024-06-20T07:25:32.475547Z INFO zksync_snapshots_applier: Found snapshot with data up to L1 batch #7, L2 block #27, version 0, storage logs are divided into 10 chunk(s)
2024-06-20T07:25:32.516142Z INFO zksync_snapshots_applier: Applied factory dependencies in 27.768291ms
2024-06-20T07:25:32.527363Z INFO zksync_snapshots_applier: Recovering storage log chunks with 10 max concurrency
2024-06-20T07:25:32.608539Z INFO zksync_snapshots_applier: Recovered 3007 storage logs in total; checking overall consistency...
2024-06-20T07:25:32.612967Z INFO zksync_snapshots_applier: Retrieved 2 tokens from main node
2024-06-20T07:25:32.616142Z INFO zksync_external_node::init: Recovered Postgres from snapshot in 148.523709ms
2024-06-20T07:25:32.645399Z INFO zksync_metadata_calculator::recovery: Recovering Merkle tree from Postgres snapshot in 1 chunks with max concurrency 10
2024-06-20T07:25:32.650478Z INFO zksync_metadata_calculator::recovery: Filtered recovered key chunks; 1 / 1 chunks remaining
2024-06-20T07:25:32.681327Z INFO zksync_metadata_calculator::recovery: Recovered 1/1 Merkle tree chunks, there are 0 left to process
2024-06-20T07:25:32.784597Z INFO zksync_metadata_calculator::recovery: Recovered Merkle tree from snapshot in 144.040125ms
```

(Obviously, timestamps and numbers in the logs will differ.)

Recovery logic also exports some metrics, the most important of which are as follows:

| Metric name | Type | Labels | Description |
| ------------------------------------------------------- | --------- | ------------ | --------------------------------------------------------------------- |
| `snapshots_applier_storage_logs_chunks_left_to_process` | Gauge | - | Number of storage log chunks left to process during Postgres recovery |
| `db_pruner_pruning_chunk_duration_seconds` | Histogram | `prune_type` | Latency of a single pruning iteration |
| `merkle_tree_pruning_deleted_stale_key_versions` | Gauge | `bound` | Versions (= L1 batches) pruned from the Merkle tree |
78 changes: 65 additions & 13 deletions docs/guides/external-node/08_pruning.md
# Pruning

It is possible to configure a ZKsync node to periodically prune all data from L1 batches older than a configurable
threshold. Data is pruned both from Postgres and from the tree (RocksDB). Pruning happens continuously (i.e., does not
require stopping the node) in the background during normal node operation. It is designed to not significantly impact
node performance.

Types of pruned data in Postgres include:

- Block and L1 batch headers
- Transactions
- EVM logs aka events
- Overwritten storage logs
- Transaction traces

Pruned data is no longer available via the node's Web3 API. The relevant Web3 methods, such as `eth_getBlockByNumber`,
will return an error mentioning the first retained block or L1 batch if queried for pruned data.

## Interaction with snapshot recovery

Pruning and [snapshot recovery](07_snapshots_recovery.md) are independent features. Pruning works both for archival
nodes restored from a Postgres dump, and nodes recovered from a snapshot. Conversely, a node recovered from a snapshot
may have pruning disabled; this would mean that it retains all data starting from the snapshot indefinitely (but not
earlier data, see [snapshot recovery limitations](07_snapshots_recovery.md#current-limitations)).

A rough guide on whether to choose the recovery option and/or pruning is as follows:

- If you need a node with data retention period of up to a few days, set up a node from a snapshot with pruning enabled
and wait for it to have enough data.
- If you need a node with the entire rollup history, using a Postgres dump is the only option, and pruning should be
disabled.
- If you need a node with significant data retention (on the order of months), the best option right now is using a
Postgres dump. You may enable pruning for such a node, but beware that full pruning may take a significant amount of
time (on the order of weeks or months). In the future, we intend to offer pre-pruned Postgres dumps with a few months
of data.

## Configuration

You can enable pruning by setting the environment variable
```yaml
EN_PRUNING_ENABLED: 'true'
```

By default, the node will keep L1 batch data for 7 days, determined by the batch timestamp (always equal to the timestamp
of the first block in the batch). You can configure the retention period using:

```yaml
EN_PRUNING_DATA_RETENTION_SEC: '259200' # 3 days
```

The retention period can be set to any value, but for mainnet, values under 21h will be ignored because a batch can only
be pruned after it has been executed on Ethereum.

Pruning can be disabled or enabled and the data retention period can be freely changed during the node lifetime.
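
As a rough illustration of how the retention period and the execution requirement combine, a prunability check could
look like the sketch below. This is hypothetical: `is_prunable` and its parameters are illustrative and do not mirror
the actual `zksync_node_db_pruner` implementation.

```rust
/// An L1 batch may be pruned only if (1) it is older than the retention period,
/// based on the timestamp of its first block, and (2) it has been executed on Ethereum.
fn is_prunable(batch_timestamp_sec: u64, now_sec: u64, retention_sec: u64, executed_on_l1: bool) -> bool {
    let old_enough = now_sec.saturating_sub(batch_timestamp_sec) > retention_sec;
    old_enough && executed_on_l1
}

fn main() {
    // EN_PRUNING_DATA_RETENTION_SEC: '259200' corresponds to a 3-day retention period.
    let retention_sec = 259_200;
    let now_sec = 1_700_000_000_u64;
    // A 4-day-old batch that has been executed on L1 is prunable...
    assert!(is_prunable(now_sec - 4 * 86_400, now_sec, retention_sec, true));
    // ...while an equally old batch that has not yet been executed on L1 is not.
    assert!(!is_prunable(now_sec - 4 * 86_400, now_sec, retention_sec, false));
}
```
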
## Storage requirements for pruned nodes
The storage requirements depend on how long you configure to retain the data, but …
> [!NOTE]
>
> When pruning an existing archival node, Postgres will be unable to reclaim disk space automatically. To reclaim disk
> space, you need to manually run `VACUUM FULL`, which requires an `ACCESS EXCLUSIVE` lock. You can read more about it
> in [Postgres docs](https://www.postgresql.org/docs/current/sql-vacuum.html).

## Monitoring pruning

Pruning information is logged with the following targets:

- **Postgres pruning:** `zksync_node_db_pruner`
- **Merkle tree pruning:** `zksync_metadata_calculator::pruning`, `zksync_merkle_tree::pruning`.

To check whether Postgres pruning works as intended, you should look for logs like this:

```text
2024-06-20T07:26:03.415382Z INFO zksync_node_db_pruner: Soft pruned db l1_batches up to 8 and L2 blocks up to 29, operation took 14.850042ms
2024-06-20T07:26:04.433574Z INFO zksync_node_db_pruner::metrics: Performed pruning of database, deleted 1 L1 batches, 2 L2 blocks, 68 storage logs, 383 events, 27 call traces, 12 L2-to-L1 logs
2024-06-20T07:26:04.436516Z INFO zksync_node_db_pruner: Hard pruned db l1_batches up to 8 and L2 blocks up to 29, operation took 18.653083ms
```

(Obviously, timestamps and numbers in the logs will differ.)

Pruning logic also exports some metrics, the most important of which are as follows:

| Metric name | Type | Labels | Description |
| ------------------------------------------------ | --------- | ------------ | --------------------------------------------------- |
| `db_pruner_not_pruned_l1_batches_count` | Gauge | - | Number of retained L1 batches |
| `db_pruner_pruning_chunk_duration_seconds` | Histogram | `prune_type` | Latency of a single pruning iteration |
| `merkle_tree_pruning_deleted_stale_key_versions` | Gauge | `bound` | Versions (= L1 batches) pruned from the Merkle tree |
