
Commit

docs(pruning): Improve pruning and snapshot recovery docs (matter-labs#2311)

## What ❔

Improves various pruning and snapshot recovery docs.

## Why ❔

Makes docs more thorough and clearer.

## Checklist

- [x] PR title corresponds to the body of PR (we generate changelog
entries from PRs).
- [x] Code has been formatted via `zk fmt` and `zk lint`.
slowli authored Jun 24, 2024
1 parent 4e9f724 commit b4327b6
Showing 4 changed files with 183 additions and 30 deletions.
29 changes: 29 additions & 0 deletions core/lib/snapshots_applier/README.md
# `zksync_snapshots_applier`

Library responsible for recovering Postgres from a protocol-level snapshot.

## Recovery workflow

_(See [node docs](../../../docs/guides/external-node/07_snapshots_recovery.md) for a high-level snapshot recovery
overview and [snapshot creator docs](../../bin/snapshots_creator/README.md) for the snapshot format details)_

1. Recovery is started by querying the main node and determining the snapshot parameters. By default, recovery is
performed from the latest snapshot, but it is possible to provide a manual override (L1 batch number of the
snapshot).
2. Factory dependencies (= contract bytecodes) are downloaded from the object store and are atomically saved to Postgres
together with the snapshot metadata (L1 batch number / L2 block numbers and timestamps, L1 batch state root hash, L2
block hash etc.).
3. Storage log chunks are downloaded from the object store; each chunk is atomically saved to Postgres (`storage_logs`
and `initial_writes` tables). This step has a configurable degree of concurrency to control speed – I/O load
trade-off.
4. After all storage logs are restored, token information is fetched from the main node and saved in the corresponding
table. Tokens are double-checked against storage logs.

Recovery is resilient to stops / failures; if the recovery process is interrupted, it will restart from the same
snapshot and will skip saving data that is already present in Postgres.
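
As a rough illustration, the workflow above could be sketched as follows. This is a hypothetical, heavily simplified
outline: the stub functions stand in for main node, object store and Postgres interactions, and the actual types and
method names in `zksync_snapshots_applier` differ.

```rust
struct SnapshotParams {
    l1_batch: u64,
    storage_log_chunk_ids: Vec<u64>,
}

// Stubs standing in for main node / object store / Postgres interactions.
fn fetch_snapshot_params(l1_batch_override: Option<u64>) -> SnapshotParams {
    SnapshotParams {
        l1_batch: l1_batch_override.unwrap_or(42),
        storage_log_chunk_ids: (0..10).collect(),
    }
}
fn factory_deps_applied() -> bool { false }
fn apply_factory_deps_and_metadata(_params: &SnapshotParams) { /* one Postgres transaction */ }
fn chunk_applied(_chunk_id: u64) -> bool { false }
fn apply_storage_log_chunk(_chunk_id: u64) { /* one Postgres transaction */ }
fn apply_tokens(_params: &SnapshotParams) { /* fetched from the main node, checked vs storage logs */ }

fn recover_postgres(l1_batch_override: Option<u64>, max_concurrency: usize) {
    // 1. Determine snapshot parameters (latest snapshot by default, or a manual override).
    let params = fetch_snapshot_params(l1_batch_override);
    println!("recovering Postgres from snapshot for L1 batch #{}", params.l1_batch);

    // 2. Factory dependencies and snapshot metadata are saved atomically; skipping this step
    //    if it is already done is what makes interrupted recoveries restartable.
    if !factory_deps_applied() {
        apply_factory_deps_and_metadata(&params);
    }

    // 3. Storage log chunks are processed with a bounded degree of concurrency
    //    (sequential here for brevity); chunks already saved to Postgres are skipped.
    for chunk_ids in params.storage_log_chunk_ids.chunks(max_concurrency) {
        for &chunk_id in chunk_ids {
            if !chunk_applied(chunk_id) {
                apply_storage_log_chunk(chunk_id);
            }
        }
    }

    // 4. Finally, token info is fetched from the main node and cross-checked against storage logs.
    apply_tokens(&params);
}

fn main() {
    recover_postgres(None, 10);
}
```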

Recovery logic for node components (such as metadata calculator and state keeper) is intentionally isolated from
Postgres recovery. A component requiring recovery must organize it on its own. This is motivated by the fact that at
least some components requiring recovery may initialize after an arbitrary delay after Postgres recovery (or not run at
all) and/or may be instantiated multiple times for a single node. As an example, both of these requirements hold for
metadata calculator / Merkle tree.
21 changes: 13 additions & 8 deletions core/node/db_pruner/README.md
Database pruner is a component that regularly removes the oldest L1 batches from the database together with
corresponding L2 blocks, events, etc.

There are two types of objects that are not fully cleaned:

- **Transactions** only have `BYTEA` fields cleaned as some components rely on the existence of transactions.
- **Storage logs:** only storage logs that have been overwritten are removed.

## Pruning workflow

_(See [node docs](../../../docs/guides/external-node/08_pruning.md) for a high-level pruning overview)_

There are two phases of pruning an L1 batch, soft pruning and hard pruning. Every batch that would have its records
removed is first _soft-pruned_. Soft-pruned batches cannot safely be used. One minute (this is configurable) after soft
pruning, _hard pruning_ is performed, where hard means physically removing data from the database.

The reasoning behind this split is to allow node components such as the API server to become aware of planned data
pruning, and restrict access to the pruned data in advance. This ensures that data does not unexpectedly (from the
component perspective) disappear from Postgres in the middle of an operation (like serving a Web3 request). At least in
some cases, such as VM-related Web3 methods, we cannot rely on database transactions for this purpose.
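
To make the two-phase flow concrete, here is a minimal sketch of a single pruning iteration. It is a hypothetical,
heavily simplified outline; the real `zksync_node_db_pruner` implementation is asynchronous and configuration-driven,
and the function names below are illustrative only.

```rust
use std::{thread, time::Duration};

/// Marks L1 batches up to `last_batch` as soft-pruned, so that other node components
/// (e.g., the API server) stop serving data for them even though it is still in Postgres.
fn soft_prune(last_batch: u64) {
    println!("soft-pruned L1 batches up to #{last_batch}");
}

/// Physically deletes rows belonging to L1 batches up to `last_batch` from Postgres.
fn hard_prune(last_batch: u64) {
    println!("hard-pruned L1 batches up to #{last_batch}");
}

fn prune_iteration(last_batch_to_prune: u64, removal_delay: Duration) {
    // Phase 1: soft pruning. The data is still present, but is treated as unavailable.
    soft_prune(last_batch_to_prune);
    // Grace period (1 minute by default) so that in-flight operations that observed
    // the data before soft pruning can complete before it physically disappears.
    thread::sleep(removal_delay);
    // Phase 2: hard pruning, i.e., actually removing the data from the database.
    hard_prune(last_batch_to_prune);
}

fn main() {
    prune_iteration(8, Duration::from_secs(60));
}
```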
85 changes: 76 additions & 9 deletions docs/guides/external-node/07_snapshots_recovery.md
# Snapshots Recovery

Instead of initializing a node using a Postgres dump, it's possible to configure a node to recover from a protocol-level
snapshot. This process is much faster and requires much less storage. The Postgres database of a mainnet node recovered
from a snapshot is only about 300GB. Note that without [pruning](08_pruning.md) enabled, the node state will continuously
grow at a rate of about 15GB per day.

## How it works

A snapshot is effectively a point-in-time snapshot of the VM state at the end of a certain L1 batch. Snapshots are
created for the latest L1 batches periodically (roughly twice a day) and are stored in a public GCS bucket.

Recovery from a snapshot consists of several parts.

- **Postgres** recovery is the initial stage. The node API is not functioning during this stage. The stage is expected
to take about 1 hour on the mainnet.
- **Merkle tree** recovery starts once Postgres is fully recovered. Merkle tree recovery can take about 3 hours on the
mainnet. Ordinarily, Merkle tree recovery is a blocker for node synchronization; i.e., the node will not process
blocks newer than the snapshot block until the Merkle tree is recovered.
- Recovering RocksDB-based **VM state cache** is concurrent with Merkle tree recovery and also depends on Postgres
recovery. It takes about 1 hour on the mainnet. Unlike Merkle tree recovery, VM state cache is not necessary for node
operation (the node will get the state from Postgres if it is absent), although it considerably speeds up VM
execution.

After Postgres recovery is completed, the node becomes operational, providing Web3 API etc. It still needs some time to
catch up executing blocks after the snapshot (i.e., roughly several hours' worth of blocks / transactions). This may take
on the order of 1–2 hours on the mainnet. In total, the recovery process and catch-up should thus take roughly 5–6 hours.
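
The dependency structure between these stages could be sketched as follows. This is a hypothetical, simplified outline
(the real node orchestrates these tasks asynchronously); the function names are illustrative only.

```rust
use std::thread;

fn recover_postgres() { /* ~1 hour on mainnet; node API is down */ }
fn recover_merkle_tree() { /* ~3 hours on mainnet; blocks synchronization */ }
fn recover_vm_state_cache() { /* ~1 hour on mainnet; optional for node operation */ }
fn catch_up_blocks() { /* roughly 1–2 hours of post-snapshot blocks on mainnet */ }

fn main() {
    // Stage 1: Postgres recovery runs first.
    recover_postgres();

    // Stage 2: Merkle tree and VM state cache recovery proceed concurrently.
    let tree = thread::spawn(recover_merkle_tree);
    let cache = thread::spawn(recover_vm_state_cache);

    // Block processing waits only for the Merkle tree; the VM state cache is a
    // performance optimization and is not required for correctness.
    tree.join().unwrap();
    catch_up_blocks();

    cache.join().unwrap();
}
```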

## Current limitations

Nodes recovered from a snapshot don't have any historical data from before the recovery. There is currently no way to
back-fill this historical data. E.g., if a node has recovered from a snapshot for L1 batch 500,000, it will not have
data for L1 batches 499,999, 499,998, etc. The relevant Web3 methods, such as `eth_getBlockByNumber`, will return an
error mentioning the first locally retained block or L1 batch if queried for this missing data. The same error messages are
used for [pruning](08_pruning.md) because logically, recovering from a snapshot is equivalent to pruning node storage to
the snapshot L1 batch.

## Configuration

To enable snapshot recovery on mainnet, you need to set environment variables for a node before starting it for the
first time:

```yaml
EN_SNAPSHOTS_RECOVERY_ENABLED: 'true'
EN_SNAPSHOTS_OBJECT_STORE_BUCKET_BASE_URL: 'zksync-era-mainnet-external-node-snapshots'
EN_SNAPSHOTS_OBJECT_STORE_MODE: 'GCSAnonymousReadOnly'
```

For the Sepolia testnet, use:

```yaml
EN_SNAPSHOTS_RECOVERY_ENABLED: 'true'
EN_SNAPSHOTS_OBJECT_STORE_MODE: 'GCSAnonymousReadOnly'
```

For working examples of fully configured nodes recovering from snapshots, see the
[Docker Compose examples](docker-compose-examples) and [_Quick Start_](00_quick_start.md).

If a node is already recovered (no matter whether from a snapshot or from a Postgres dump), setting these env
variables will have no effect; the node will never reset its state.

## Monitoring recovery

Snapshot recovery information is logged with the following targets:

- **Recovery orchestration:** `zksync_external_node::init`
- **Postgres recovery:** `zksync_snapshots_applier`
- **Merkle tree recovery:** `zksync_metadata_calculator::recovery`, `zksync_merkle_tree::recovery`

An example of snapshot recovery logs during the first node start:

```text
2024-06-20T07:25:32.466926Z INFO zksync_external_node::init: Node has neither genesis L1 batch, nor snapshot recovery info
2024-06-20T07:25:32.466946Z INFO zksync_external_node::init: Chosen node initialization strategy: SnapshotRecovery
2024-06-20T07:25:32.466951Z WARN zksync_external_node::init: Proceeding with snapshot recovery. This is an experimental feature; use at your own risk
2024-06-20T07:25:32.475547Z INFO zksync_snapshots_applier: Found snapshot with data up to L1 batch #7, L2 block #27, version 0, storage logs are divided into 10 chunk(s)
2024-06-20T07:25:32.516142Z INFO zksync_snapshots_applier: Applied factory dependencies in 27.768291ms
2024-06-20T07:25:32.527363Z INFO zksync_snapshots_applier: Recovering storage log chunks with 10 max concurrency
2024-06-20T07:25:32.608539Z INFO zksync_snapshots_applier: Recovered 3007 storage logs in total; checking overall consistency...
2024-06-20T07:25:32.612967Z INFO zksync_snapshots_applier: Retrieved 2 tokens from main node
2024-06-20T07:25:32.616142Z INFO zksync_external_node::init: Recovered Postgres from snapshot in 148.523709ms
2024-06-20T07:25:32.645399Z INFO zksync_metadata_calculator::recovery: Recovering Merkle tree from Postgres snapshot in 1 chunks with max concurrency 10
2024-06-20T07:25:32.650478Z INFO zksync_metadata_calculator::recovery: Filtered recovered key chunks; 1 / 1 chunks remaining
2024-06-20T07:25:32.681327Z INFO zksync_metadata_calculator::recovery: Recovered 1/1 Merkle tree chunks, there are 0 left to process
2024-06-20T07:25:32.784597Z INFO zksync_metadata_calculator::recovery: Recovered Merkle tree from snapshot in 144.040125ms
```

(Obviously, timestamps and numbers in the logs will differ.)

Recovery logic also exports some metrics, the most important of which are as follows:

| Metric name | Type | Labels | Description |
| ------------------------------------------------------- | --------- | ------------ | --------------------------------------------------------------------- |
| `snapshots_applier_storage_logs_chunks_left_to_process` | Gauge | - | Number of storage log chunks left to process during Postgres recovery |
| `db_pruner_pruning_chunk_duration_seconds` | Histogram | `prune_type` | Latency of a single pruning iteration |
| `merkle_tree_pruning_deleted_stale_key_versions` | Gauge | `bound` | Versions (= L1 batches) pruned from the Merkle tree |
78 changes: 65 additions & 13 deletions docs/guides/external-node/08_pruning.md
# Pruning

It is possible to configure a ZKsync node to periodically prune all data from L1 batches older than a configurable
threshold. Data is pruned both from Postgres and from the tree (RocksDB). Pruning happens continuously (i.e., does not
require stopping the node) in the background during normal node operation. It is designed to not significantly impact
node performance.

Types of pruned data in Postgres include:

- Block and L1 batch headers
- Transactions
- EVM logs aka events
- Overwritten storage logs
- Transaction traces

Pruned data is no longer available via the node's Web3 API. The relevant Web3 methods, such as `eth_getBlockByNumber`,
will return an error mentioning the first retained block or L1 batch if queried for pruned data.

## Interaction with snapshot recovery

Pruning and [snapshot recovery](07_snapshots_recovery.md) are independent features. Pruning works both for archival
nodes restored from a Postgres dump, and nodes recovered from a snapshot. Conversely, a node recovered from a snapshot
may have pruning disabled; this would mean that it retains all data starting from the snapshot indefinitely (but not
earlier data, see [snapshot recovery limitations](07_snapshots_recovery.md#current-limitations)).

A rough guide on whether to choose the recovery option and/or pruning is as follows:

- If you need a node with data retention period of up to a few days, set up a node from a snapshot with pruning enabled
and wait for it to have enough data.
- If you need a node with the entire rollup history, using a Postgres dump is the only option, and pruning should be
disabled.
- If you need a node with significant data retention (on the order of months), the best option right now is using a
Postgres dump. You may enable pruning for such a node, but beware that full pruning may take a significant amount of
time (on the order of weeks or months). In the future, we intend to offer pre-pruned Postgres dumps with a few months
of data.

## Configuration

You can enable pruning by setting the environment variable
```yaml
EN_PRUNING_ENABLED: 'true'
```

By default, the node will keep L1 batch data for 7 days, determined by the batch timestamp (always equal to the timestamp
of the first block in the batch). You can configure the retention period using:

```yaml
EN_PRUNING_DATA_RETENTION_SEC: '259200' # 3 days
```

The retention period can be set to any value, but for mainnet, values under 21h will be ignored because a batch can only
be pruned after it has been executed on Ethereum.

Pruning can be disabled or enabled and the data retention period can be freely changed during the node lifetime.
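
As a rough illustration of how the retention period and the execution requirement combine, a prunability check could
look like the sketch below. This is hypothetical: `is_prunable` and its parameters are illustrative and do not mirror
the actual `zksync_node_db_pruner` implementation.

```rust
/// An L1 batch may be pruned only if (1) it is older than the retention period,
/// based on the timestamp of its first block, and (2) it has been executed on Ethereum.
fn is_prunable(batch_timestamp_sec: u64, now_sec: u64, retention_sec: u64, executed_on_l1: bool) -> bool {
    let old_enough = now_sec.saturating_sub(batch_timestamp_sec) > retention_sec;
    old_enough && executed_on_l1
}

fn main() {
    // EN_PRUNING_DATA_RETENTION_SEC: '259200' corresponds to a 3-day retention period.
    let retention_sec = 259_200;
    let now_sec = 1_700_000_000_u64;
    // A 4-day-old batch that has been executed on L1 is prunable...
    assert!(is_prunable(now_sec - 4 * 86_400, now_sec, retention_sec, true));
    // ...while an equally old batch that has not yet been executed on L1 is not.
    assert!(!is_prunable(now_sec - 4 * 86_400, now_sec, retention_sec, false));
}
```
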
## Storage requirements for pruned nodes
The storage requirements depend on how long you configure to retain the data, but …
> [!NOTE]
>
> When pruning an existing archival node, Postgres will be unable to reclaim disk space automatically. To reclaim disk
> space, you need to manually run `VACUUM FULL`, which requires an `ACCESS EXCLUSIVE` lock. You can read more about it
> in [Postgres docs](https://www.postgresql.org/docs/current/sql-vacuum.html).

## Monitoring pruning

Pruning information is logged with the following targets:

- **Postgres pruning:** `zksync_node_db_pruner`
- **Merkle tree pruning:** `zksync_metadata_calculator::pruning`, `zksync_merkle_tree::pruning`.

To check whether Postgres pruning works as intended, you should look for logs like this:

```text
2024-06-20T07:26:03.415382Z INFO zksync_node_db_pruner: Soft pruned db l1_batches up to 8 and L2 blocks up to 29, operation took 14.850042ms
2024-06-20T07:26:04.433574Z INFO zksync_node_db_pruner::metrics: Performed pruning of database, deleted 1 L1 batches, 2 L2 blocks, 68 storage logs, 383 events, 27 call traces, 12 L2-to-L1 logs
2024-06-20T07:26:04.436516Z INFO zksync_node_db_pruner: Hard pruned db l1_batches up to 8 and L2 blocks up to 29, operation took 18.653083ms
```

(Obviously, timestamps and numbers in the logs will differ.)

Pruning logic also exports some metrics, the most important of which are as follows:

| Metric name | Type | Labels | Description |
| ------------------------------------------------ | --------- | ------------ | --------------------------------------------------- |
| `db_pruner_not_pruned_l1_batches_count` | Gauge | - | Number of retained L1 batches |
| `db_pruner_pruning_chunk_duration_seconds` | Histogram | `prune_type` | Latency of a single pruning iteration |
| `merkle_tree_pruning_deleted_stale_key_versions` | Gauge | `bound` | Versions (= L1 batches) pruned from the Merkle tree |
