BR: Add a new doc about the batch create table #7983

2 changes: 2 additions & 0 deletions TOC.md
@@ -72,6 +72,7 @@
- [Back up and Restore Data on Azure Blob Storage](/br/backup-and-restore-azblob.md)
- BR Features
- [Auto Tune](/br/br-auto-tune.md)
- [Batch Create Table](/br/br-batch-create-table.md)
- [BR FAQ](/br/backup-and-restore-faq.md)
- [Configure Time Zone](/configure-time-zone.md)
- [Daily Checklist](/daily-check.md)
@@ -203,6 +204,7 @@
- [External Storages](/br/backup-and-restore-storages.md)
- BR Features
- [Auto Tune](/br/br-auto-tune.md)
- [Batch Create Table](/br/br-batch-create-table.md)
- [BR FAQ](/br/backup-and-restore-faq.md)
- TiDB Binlog
- [Overview](/tidb-binlog/tidb-binlog-overview.md)
6 changes: 6 additions & 0 deletions br/backup-and-restore-faq.md
@@ -164,6 +164,12 @@ You can use [`filter.rules`](https://github.com/pingcap/tiflow/blob/7c3c2336f981

Yes. BR backs up the [`SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS`](/sql-statements/sql-statement-split-region.md#pre_split_regions) information of a table. The data of the restored table is also split into multiple Regions.

## What should I do if the restore fails with the error message `the entry too large, the max entry size is 6291456, the size of data is 7690800`?

You can try to reduce the number of tables to be created in a batch by setting `--ddl-batch-size` to `128` or a smaller value.

When BR restores backup data with [`--ddl-batch-size`](/br/br-batch-create-table.md#use-the-batch-create-table-feature) set to a value greater than `1`, TiDB writes a table creation DDL job to the DDL job queue, which is maintained by TiKV. The total size of the table schemas that TiDB sends in one batch must not exceed 6 MB, because the maximum size of a job message is 6 MB (`6291456` bytes) by default. Modifying this limit is **not recommended**. For details, see [`txn-entry-size-limit`](/tidb-configuration-file.md#txn-entry-size-limit-new-in-v50) and [`raft-entry-max-size`](/tikv-configuration-file.md#raft-entry-max-size). Therefore, if you set `--ddl-batch-size` to an excessively large value, the total schema size of the tables in one batch exceeds this limit, and BR reports the `entry too large, the max entry size is 6291456, the size of data is 7690800` error.
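As a rough illustration of the arithmetic above, the following shell sketch estimates the largest `--ddl-batch-size` whose batch would fit under the limit. The function name `max_ddl_batch_size` is hypothetical. The estimate assumes that the "size of data" in the error is spread evenly across the tables of the failed batch, and that the failed batch contained the default 128 tables. Real schema sizes vary, so leave some margin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate the largest --ddl-batch-size that fits under
# the entry size limit, assuming the "size of data" in the error is spread
# evenly across the tables of the failed batch.
max_ddl_batch_size() {
  local max_entry_size=$1 data_size=$2 tables_in_batch=$3
  # Scale the failed batch down in proportion to the limit (integer division rounds down).
  local estimate=$(( max_entry_size * tables_in_batch / data_size ))
  # Never suggest a value below 1.
  if (( estimate < 1 )); then
    estimate=1
  fi
  echo "$estimate"
}

# Values from the error message; 128 is the assumed size of the failed batch.
max_ddl_batch_size 6291456 7690800 128
```

With the values from the error message, the sketch prints `104`, which suggests that batches of about 100 tables or fewer would fit if the table schemas are of similar size.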

## Why is the `region is unavailable` error reported for a SQL query after I use BR to restore the backup data?

If the cluster backed up using BR has TiFlash, `TableInfo` stores the TiFlash information when BR restores the backup data. If the cluster to be restored does not have TiFlash, the `region is unavailable` error is reported.
65 changes: 65 additions & 0 deletions br/br-batch-create-table.md
@@ -0,0 +1,65 @@
---
title: Batch Create Table
summary: Learn how to use the Batch Create Table feature. When restoring data, BR can create tables in batches to speed up the restore process.
---

# Batch Create Table

When restoring data, Backup & Restore (BR) creates databases and tables in the target TiDB cluster and then restores the backup data to the tables. In versions earlier than TiDB v6.0.0, BR creates tables during the restore process using the [serial execution](#implementation-principles) implementation. When BR restores data with a large number of tables (nearly 50000), this implementation spends a long time creating tables.

To speed up the table creation process and reduce the time for restoring data, the Batch Create Table feature is introduced in TiDB v6.0.0. This feature is enabled by default.

> **Note:**
>
> - To use the Batch Create Table feature, both TiDB and BR must be v6.0.0 or later. If either TiDB or BR is earlier than v6.0.0, BR uses the serial execution implementation.
> - If your TiDB and BR are v6.0.0 or later, including when you upgrade them from a version earlier than v6.0.0 using a cluster management tool (for example, TiUP), BR enables the Batch Create Table feature by default.
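The conditions in the note above can be summarized as a small decision function. This is a hypothetical shell sketch, not BR source code. It compares only the major version, which is enough because v6.0.0 is the first v6 release. It also assumes that a `--ddl-batch-size` of `1` or less (including `0`, the value used to disable the feature) means serial execution.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the version rule in the note; not BR source code.
major_version() {
  local v=${1#v}       # strip a leading "v": v6.1.0 -> 6.1.0
  echo "${v%%.*}"      # keep the part before the first dot: 6.1.0 -> 6
}

uses_batch_create() {
  local tidb_version=$1 br_version=$2 ddl_batch_size=$3
  # Assumption: a --ddl-batch-size of 1 or less means serial execution.
  if (( $(major_version "$tidb_version") >= 6 )) &&
     (( $(major_version "$br_version") >= 6 )) &&
     (( ddl_batch_size > 1 )); then
    echo "batch"
  else
    echo "serial"
  fi
}

uses_batch_create v6.0.0 v6.0.0 128   # prints "batch"
uses_batch_create v5.4.0 v6.0.0 128   # prints "serial"
uses_batch_create v6.0.0 v6.0.0 0     # prints "serial"
```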

## Usage scenario

If you need to restore data with a massive number of tables, for example, 50000 tables, you can use the Batch Create Table feature to speed up the restore process.

For the detailed effect, see [Test for the Batch Create Table Feature](#test-for-the-batch-create-table-feature).

## Use the Batch Create Table feature

In v6.0.0 or later, BR enables the Batch Create Table feature by default with `--ddl-batch-size=128` to speed up the restore process, so you do not need to configure this parameter. `--ddl-batch-size=128` means that BR creates tables in batches of 128 tables each.
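To make the meaning of `--ddl-batch-size` concrete, the following hypothetical shell sketch groups a list of table names into batches of a given size, the way BR conceptually groups tables with the default value of `128`. It only prints the groups and does not call BR or TiDB; the function name `batch_tables` is an illustration, not part of BR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print table names grouped into batches of a fixed size.
# It models what --ddl-batch-size means; it does not call BR or TiDB.
batch_tables() {
  local batch_size=$1
  shift
  # In this sketch, treat sizes below 1 as one table per batch (serial creation).
  if (( batch_size < 1 )); then
    batch_size=1
  fi
  local -a batch=()
  local table
  for table in "$@"; do
    batch+=("$table")
    # Emit the batch as soon as it is full.
    if (( ${#batch[@]} == batch_size )); then
      echo "${batch[*]}"
      batch=()
    fi
  done
  # Emit the last batch, which may hold fewer tables than batch_size.
  if (( ${#batch[@]} > 0 )); then
    echo "${batch[*]}"
  fi
}

batch_tables 3 t1 t2 t3 t4 t5 t6 t7
```

With a batch size of `3` and seven tables, the sketch prints three lines: `t1 t2 t3`, `t4 t5 t6`, and `t7`. The last batch can hold fewer tables than the batch size.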

To disable this feature, you can set `--ddl-batch-size` to `0`. See the following example command:

{{< copyable "shell-regular" >}}

```shell
br restore full -s local:///br_data/ --pd 172.16.5.198:2379 --log-file restore.log --ddl-batch-size=0
```

After this feature is disabled, BR uses the [serial execution implementation](#implementation-principles) instead.

## Implementation principles

- Serial execution implementation before v6.0.0:

When restoring data, BR creates databases and tables in the target TiDB cluster and then restores the backup data to the tables. To create tables, BR calls the TiDB internal API and then processes the table creation tasks, which works similarly to BR executing `CREATE TABLE` statements. The TiDB DDL owner creates tables sequentially. Each time the DDL owner creates a table, the DDL schema version changes, and each version change is synchronized to the other TiDB DDL workers (including BR). Therefore, when BR restores a large number of tables, the serial execution implementation is time-consuming.

- Batch create table implementation since v6.0.0:

By default, BR creates tables in multiple batches of 128 tables each. With this implementation, the TiDB schema version changes only once for each batch rather than once for each table, which significantly increases the speed of table creation.
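The difference between the two implementations can be estimated by counting schema version changes. The following shell sketch is a simplified model based only on the behavior described above (one schema version change per table for serial execution, one per batch for batch creation); it ignores all other costs, and the function name `schema_version_changes` is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model (hypothetical): count schema version changes when creating
# a number of tables serially or in batches of batch_size tables.
schema_version_changes() {
  local tables=$1 batch_size=$2
  if (( batch_size <= 1 )); then
    # Serial execution: the schema version changes once per table.
    echo "$tables"
  else
    # Batch creation: once per batch, rounding up for a final partial batch.
    echo $(( (tables + batch_size - 1) / batch_size ))
  fi
}

echo "serial:  $(schema_version_changes 50000 0)"
echo "batched: $(schema_version_changes 50000 128)"
```

For 50000 tables, the model gives 50000 schema version changes for serial execution and 391 for batches of 128 tables (390 full batches plus one batch of 80 tables).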

## Test for the Batch Create Table feature

This section describes a test of the Batch Create Table feature. The test environment is as follows:

- Cluster configurations:

- 15 TiKV instances. Each TiKV instance is equipped with 16 CPU cores, 80 GB memory, and 16 threads to process RPC requests ([`import.num-threads`](/tikv-configuration-file.md#num-threads) = 16).
- 3 TiDB instances. Each TiDB instance is equipped with 16 CPU cores and 32 GB memory.
- 3 PD instances. Each PD instance is equipped with 16 CPU cores and 32 GB memory.

- The size of data to be restored: 16.16 TB

The test result is as follows:

```
[2022/03/12 22:37:49.060 +08:00] [INFO] [collector.go:67] ["Full restore success summary"] [total-ranges=751760] [ranges-succeed=751760] [ranges-failed=0] [split-region=1h33m18.078448449s] [restore-ranges=542693] [total-take=1h41m35.471476438s] [restore-data-size(after-compressed)=8.337TB] [Size=8336694965072] [BackupTS=431773933856882690] [total-kv=148015861383] [total-kv-size=16.16TB] [average-speed=2.661GB/s]
```

The test result shows that the average restore speed of one TiKV instance is as high as 181.65 MB/s, which equals `average-speed` (2.661 GB/s) divided by the number of TiKV instances (15).
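The following shell sketch reproduces this per-instance calculation from the `average-speed` field. It assumes that 1 GB equals 1024 MB and truncates (rather than rounds) the result to two decimal places, which matches the 181.65 MB/s figure above; the function name `per_tikv_speed_mb` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: per-TiKV restore speed in MB/s from the log's
# average-speed (in GB/s) and the number of TiKV instances.
# Assumes 1 GB = 1024 MB and truncates to two decimal places.
per_tikv_speed_mb() {
  local average_speed_gb=$1 tikv_count=$2
  awk -v s="$average_speed_gb" -v n="$tikv_count" \
    'BEGIN { printf "%.2f\n", int(s * 1024 / n * 100) / 100 }'
}

per_tikv_speed_mb 2.661 15
```

For the test above, the sketch prints `181.65`.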