RS: Database auto recovery #3087

Merged 1 commit on Jan 26, 2024
32 changes: 29 additions & 3 deletions content/rs/databases/recover.md
@@ -19,7 +19,7 @@ When a cluster fails or a database is corrupted, you must:
1. [Restore the cluster configuration]({{< relref "/rs/clusters/cluster-recovery.md" >}}) from the CCS files
1. Recover the databases with their previous configuration and data

-To restore the data that was in the databases to databases in the new cluster
+To restore data to databases in the new cluster,
you must restore the database persistence files (backup, AOF, or snapshot files) to the databases.
These files are stored in the [persistence storage location]({{< relref "/rs/installing-upgrading/install/plan-deployment/persistent-ephemeral-storage" >}}).

@@ -67,7 +67,7 @@ of the configuration and persistence files on each of the nodes.
The status for each database can be either ready for recovery or missing files.
An indication of missing files in any of the databases can result from:

-- The storage location is not found - Make sure that on all of the nodes in the cluster the recovery path is set correctly.
+- The storage location is not found - Make sure the recovery path is set correctly on all nodes in the cluster.
- Files are not found in the storage location - Move the files to the storage location.
- No permission to read the files - Change the file permissions so that redislabs:redislabs has 640 permissions.
- Files are corrupted - Locate copies of the files that are not corrupted.
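
The permission item above can be checked before retrying recovery. The following is a minimal sketch, assuming the persistence files are regular files on a POSIX system; the helper name `has_mode_640` is hypothetical and is not part of Redis Enterprise. It checks only the mode bits, so ownership by `redislabs:redislabs` still needs to be confirmed separately (for example with `ls -l`).

```python
import os
import stat


def has_mode_640(path):
    """Return True if the file's permission bits are exactly 0o640 (rw-r-----)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o640


if __name__ == "__main__":
    import sys

    # Pass one or more persistence file paths on the command line.
    for file_path in sys.argv[1:]:
        status = "ok" if has_mode_640(file_path) else "needs chmod 640"
        print(f"{file_path}: {status}")
```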
@@ -105,7 +105,7 @@ of the configuration and persistence files on each of the nodes.
{{< note >}}
- If persistence was not configured for the database, the database is restored empty.
- For Active-Active databases that still have live instances, we recommend that you recover the configuration for the failed instances and let the data update from the other instances.
-- For Active-Active databases that all instances need to be recovered, we recommend that you recover one instance with the data and only recover the configuration for the other instances.
+- For Active-Active databases where all instances need to be recovered, we recommend you recover one instance with the data and only recover the configuration for the other instances.
The empty instances then update from the recovered data.
- If the persistence files of the databases from the old cluster are not stored in the persistent storage location of the new node,
you must first map the recovery path of each node to the location of the old persistence files.
@@ -120,3 +120,29 @@ of the configuration and persistence files on each of the nodes.
```

After the databases are recovered, make sure your Redis clients can successfully connect to the databases.

## Configure automatic recovery

If you enable the automatic recovery cluster policy, Redis Enterprise tries to quickly recover as much data as possible from before the disaster.

To enable automatic recovery, [update the cluster policy]({{<relref "/rs/references/rest-api/requests/cluster/policy#put-cluster-policy">}}) using the REST API:

```sh
PUT /v1/cluster/policy
{
    "auto_recovery": true
}
```
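
As a rough illustration, the request above could be built with Python's standard library as follows. This is a sketch under stated assumptions: the address `https://cluster.example.com:9443`, the credentials, the use of HTTP Basic authentication, and the helper name `build_policy_request` are placeholders chosen for the example and do not come from this change.

```python
import base64
import json
import urllib.request


def build_policy_request(base_url, username, password, auto_recovery=True):
    """Build (but do not send) a PUT /v1/cluster/policy request."""
    body = json.dumps({"auto_recovery": auto_recovery}).encode("utf-8")
    # Basic authentication is an assumption of this sketch.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/cluster/policy",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    # Placeholder address and credentials (assumptions, replace with your own).
    req = build_policy_request(
        "https://cluster.example.com:9443", "admin@example.com", "secret"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it possible to inspect the method, URL, and body before anything reaches the cluster.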

For each database, you can set the `recovery_wait_time` to define how many seconds the database waits for a persistence file to become available before recovery. The default value is `-1`, which means to wait forever. Short wait times can increase the risk of potential data loss.

To change `recovery_wait_time` for an existing database using the REST API:

```sh
PUT /v1/bdbs/<bdb_uid>
{
    "recovery_wait_time": 3600
}
}
```

You can also set `recovery_wait_time` when you [create a database]({{<relref "/rs/references/rest-api/requests/bdbs#post-bdbs-v1">}}) using the REST API.
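
The `recovery_wait_time` update can be sketched in Python as a small helper that produces the method, path, and JSON body shown above. The function name `recovery_wait_payload` is hypothetical, and the rejection of values below `-1` is a local sanity check assumed for this sketch; the validation that the REST API itself performs is not described in this change.

```python
import json


def recovery_wait_payload(bdb_uid, wait_seconds=-1):
    """Return (method, path, JSON body) for updating recovery_wait_time."""
    if isinstance(wait_seconds, bool) or not isinstance(wait_seconds, int):
        raise TypeError("recovery_wait_time must be an integer number of seconds")
    if wait_seconds < -1:
        # Local sanity check only (an assumption of this sketch).
        raise ValueError("use -1 to wait forever, or a non-negative number of seconds")
    path = f"/v1/bdbs/{bdb_uid}"
    body = json.dumps({"recovery_wait_time": wait_seconds})
    return "PUT", path, body


if __name__ == "__main__":
    # Wait up to one hour for persistence files on database 1 (example values).
    print(recovery_wait_payload(1, 3600))
```

Leaving `wait_seconds` at its default of `-1` keeps the documented wait-forever behavior, which avoids trading data for a faster recovery.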
1 change: 1 addition & 0 deletions content/rs/references/rest-api/objects/bdb/_index.md
@@ -135,6 +135,7 @@ An API object that represents a managed database in the cluster.
| port | integer | TCP port on which the database is available. Generated automatically if omitted and returned as 0 |
| proxy_policy | 'single'<br />'all-master-shards'<br />'all-nodes' | The default policy used for proxy binding to endpoints |
| rack_aware | boolean (default:&nbsp;false) | Require the database to always replicate across multiple racks |
| recovery_wait_time | integer (default:&nbsp;-1) | Defines how many seconds to wait for the persistence file to become available during auto recovery. After the wait time expires, auto recovery completes with potential data loss. The default `-1` means to wait forever. |
| redis_version | string | Version of the redis-server processes: e.g. 6.0, 5.0-big |
| repl_backlog_size | string | Redis replication backlog size ('auto' or size in bytes) |
| replica_sources | array of [syncer_sources]({{<relref "/rs/references/rest-api/objects/bdb/syncer_sources">}}) objects | Remote endpoints of database to sync from. See the 'bdb -\> replica_sources' section |
1 change: 1 addition & 0 deletions content/rs/references/rest-api/objects/cluster_settings.md
@@ -12,6 +12,7 @@ Cluster resources management policy
| Name | Type/Value | Description |
|------|------------|-------------|
| acl_pubsub_default | `resetchannels`<br /> `allchannels` | Default pub/sub ACL rule for all databases in the cluster:<br />•`resetchannels` blocks access to all channels (restrictive)<br />•`allchannels` allows access to all channels (permissive) |
| auto_recovery | boolean (default:&nbsp;false) | Defines whether to use automatic recovery after shard failure |
| bigstore_migrate_node_threshold | integer | Minimum free memory (excluding reserved memory) allowed on a node before automatic migration of shards from it to free more memory |
| bigstore_migrate_node_threshold_p | integer | Minimum free memory (excluding reserved memory) allowed on a node before automatic migration of shards from it to free more memory |
| bigstore_provision_node_threshold | integer | Minimum free memory (excluding reserved memory) allowed on a node before new shards can no longer be added to it |