tidb-lightning,br: replaced black-white-list by table-filter (#3065) (#3139)

Signed-off-by: ti-srebot <[email protected]>
ti-srebot authored Jul 6, 2020
1 parent 0b5e512 commit df3d964
Showing 5 changed files with 294 additions and 11 deletions.
3 changes: 2 additions & 1 deletion TOC.md
@@ -343,6 +343,7 @@
- [Overview](/ecosystem-tool-user-guide.md)
- [Use Cases](/ecosystem-tool-user-case.md)
- [Download](/download-ecosystem-tools.md)
- [Table Filter](/table-filter.md)
+ Backup & Restore (BR)
- [Use BR](/br/backup-and-restore-tool.md)
- [BR Use Cases](/br/backup-and-restore-use-cases.md)
@@ -356,7 +357,7 @@
- [Deployment](/tidb-lightning/deploy-tidb-lightning.md)
- [Configuration](/tidb-lightning/tidb-lightning-configuration.md)
- [Checkpoints](/tidb-lightning/tidb-lightning-checkpoints.md)
- [Table Filter](/tidb-lightning/tidb-lightning-table-filter.md)
- [Table Filter](/table-filter.md)
- [CSV Support](/tidb-lightning/migrate-from-csv-using-tidb-lightning.md)
- [TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md)
- [Web Interface](/tidb-lightning/tidb-lightning-web-interface.md)
37 changes: 37 additions & 0 deletions br/backup-and-restore-tool.md
@@ -269,6 +269,25 @@ For descriptions of other options, see [Back up all cluster data](#back-up-all-t

A progress bar is displayed in the terminal during the backup operation. When the progress bar advances to 100%, the backup is complete. BR then also checks the backup data to ensure data safety.

### Back up with table filter

To back up multiple tables with more complex criteria, execute the `br backup full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

**Usage example:**

The following command backs up the data of all tables in the form `db*.tbl*` to the `/tmp/backup` path on each TiKV node and writes the `backupmeta` file to this path.

{{< copyable "shell-regular" >}}

```shell
br backup full \
--pd "${PDIP}:2379" \
--filter 'db*.tbl*' \
--storage "local:///tmp/backup" \
--ratelimit 120 \
--log-file backupfull.log
```

### Back up data to Amazon S3 backend

If you back up the data to the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3.
@@ -443,6 +462,24 @@ br restore table \

In the above command, `--table` specifies the name of the table to be restored. For descriptions of other options, see [Restore all backup data](#restore-all-the-backup-data) and [Restore a database](#restore-a-database).

### Restore with table filter

To restore multiple tables with more complex criteria, execute the `br restore full` command and specify the [table filters](/table-filter.md) with `--filter` or `-f`.

**Usage example:**

The following command restores a subset of tables backed up in the `/tmp/backup` path to the cluster.

{{< copyable "shell-regular" >}}

```shell
br restore full \
--pd "${PDIP}:2379" \
--filter 'db*.tbl*' \
--storage "local:///tmp/backup" \
--log-file restorefull.log
```

### Restore data from Amazon S3 backend

If you restore data from the Amazon S3 backend, instead of `local` storage, you need to specify the S3 storage path in the `storage` sub-command, and allow the BR node and the TiKV node to access Amazon S3.
241 changes: 241 additions & 0 deletions table-filter.md
@@ -0,0 +1,241 @@
---
title: Table Filter
summary: Usage of table filter feature in TiDB tools.
category: reference
aliases: ['/docs/v3.1/tidb-lightning/tidb-lightning-table-filter/','/docs/v3.1/reference/tools/tidb-lightning/table-filter/','/tidb/v3.1/tidb-lightning-table-filter/']
---

# Table Filter

The TiDB ecosystem tools operate on all the databases by default, but oftentimes only a subset is needed. For example, you only want to work with the schemas in the form of `foo*` and `bar*` and nothing else.

Several TiDB ecosystem tools share a common filter syntax to define subsets. This document describes how to use the table filter feature.

## Usage

### CLI

Table filters can be applied to the tools using multiple `-f` or `--filter` command line parameters. Each filter is in the form of `db.table`, where each part can be a wildcard (further explained in the [next section](#wildcards)). The following lists the example usage in each tool.

* [BR](/br/backup-and-restore-tool.md):

{{< copyable "shell-regular" >}}

```shell
./br backup full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'
# ^~~~~~~~~~~~~~~~~~~~~~~
./br restore full -f 'foo*.*' -f 'bar*.*' -s 'local:///tmp/backup'
# ^~~~~~~~~~~~~~~~~~~~~~~
```

* [Dumpling](/export-or-backup-using-dumpling.md):

{{< copyable "shell-regular" >}}

```shell
./dumpling -f 'foo*.*' -f 'bar*.*' -P 3306 -o /tmp/data/
# ^~~~~~~~~~~~~~~~~~~~~~~
```

* [Lightning](/tidb-lightning/tidb-lightning-overview.md):

{{< copyable "shell-regular" >}}

```shell
./tidb-lightning -f 'foo*.*' -f 'bar*.*' -d /tmp/data/ --backend tidb
# ^~~~~~~~~~~~~~~~~~~~~~~
```

### TOML configuration files

Table filters in TOML files are specified as [array of strings](https://toml.io/en/v1.0.0-rc.1#section-15). The following lists the example usage in each tool.

* Lightning:

```toml
[mydumper]
filter = ['foo*.*', 'bar*.*']
```

## Syntax

### Plain table names

Each table filter rule consists of a "schema pattern" and a "table pattern", separated by a dot (`.`). Tables whose fully-qualified name matches the rules are accepted.

```
db1.tbl1
db2.tbl2
db3.tbl3
```

A plain name must only consist of valid [identifier characters](/schema-object-names.md), such as:

* digits (`0` to `9`)
* letters (`a` to `z`, `A` to `Z`)
* `$`
* `_`
* non-ASCII characters (U+0080 to U+10FFFF)

All other ASCII characters are reserved. Some punctuation marks have special meanings, as described in the next section.

### Wildcards

Each part of the name can be a wildcard symbol described in [fnmatch(3)](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13):

* `*` — matches zero or more characters
* `?` — matches exactly one character
* `[a-z]` — matches one character between "a" and "z" inclusively
* `[!a-z]` — matches one character except "a" to "z"


```
db[0-9].tbl[0-9a-f][0-9a-f]
data.*
*.backup_*
```

"Character" here means a Unicode code point, such as:

* U+00E9 (é) is 1 character.
* U+0065 U+0301 (é) are 2 characters.
* U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters.
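
These wildcard rules follow fnmatch(3), which Python's standard `fnmatch` module also implements, so the matching behavior can be tried out independently of the tools. This is an illustration only; `matches` is a hypothetical helper that handles plain wildcard rules and ignores escaping, quoting, and regular expressions:

```python
from fnmatch import fnmatchcase

def matches(rule: str, schema: str, table: str) -> bool:
    """Check a plain `db.table` wildcard rule against one table name."""
    schema_pat, table_pat = rule.split(".", 1)
    # Both parts must match. fnmatchcase avoids the OS-dependent
    # case folding of fnmatch.fnmatch.
    return fnmatchcase(schema, schema_pat) and fnmatchcase(table, table_pat)

print(matches("db[0-9].tbl[0-9a-f][0-9a-f]", "db1", "tbl2f"))  # True
print(matches("data.*", "data", "users"))                      # True
print(matches("*.backup_*", "prod", "backup_2020"))            # True
print(matches("db[!0-9].*", "db1", "anything"))                # False
```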

### File import

To import a file as the filter rule, include an `@` at the beginning of the rule to specify the file name. The table filter parser treats each line of the imported file as additional filter rules.

For example, if a file `config/filter.txt` has the following content:

```
employees.*
*.WorkOrder
```

the following two invocations are equivalent:

```bash
./dumpling -f '@config/filter.txt'
./dumpling -f 'employees.*' -f '*.WorkOrder'
```

A filter file cannot further import another file.

### Comments and blank lines

Inside a filter file, leading and trailing whitespace on every line is trimmed. Furthermore, blank lines (empty strings) are ignored.

A leading `#` marks a comment and is ignored. A `#` that does not appear at the start of a line is considered a syntax error.

```
# this line is a comment
db.table # but this part is not comment and may cause error
```

### Exclusion

An `!` at the beginning of the rule means the pattern after it is used to exclude tables from being processed. This effectively turns the filter into a block list.

```
*.*
#^ note: must add the *.* to include all tables first
!*.Password
!employees.salaries
```

### Escape character

To turn a special character into an identifier character, precede it with a backslash `\`.

```
db\.with\.dots.*
```

For simplicity and future compatibility, the following sequences are prohibited:

* `\` at the end of the line after trimming whitespaces (use `[ ]` to match a literal whitespace at the end).
* `\` followed by any ASCII alphanumeric character (`[0-9a-zA-Z]`). In particular, C-like escape sequences such as `\0`, `\r`, `\n` and `\t` are currently meaningless.

### Quoted identifier

Besides `\`, special characters can also be suppressed by quoting using `"` or `` ` ``.

```
"db.with.dots"."tbl\1"
`db.with.dots`.`tbl\2`
```

A quotation mark can be included within a quoted identifier by doubling it.

```
"foo""bar".`foo``bar`
# equivalent to:
foo\"bar.foo\`bar
```
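
The doubling rule is easy to mimic: strip the surrounding quotes, then collapse each doubled quote mark. A minimal sketch with a hypothetical `unquote` helper, not the parser's actual code:

```python
def unquote(quoted: str, q: str = '"') -> str:
    """Undo quoting for one identifier quoted with `q` ('"' or '`')."""
    assert quoted.startswith(q) and quoted.endswith(q), "not a quoted identifier"
    # A doubled quotation mark inside the quotes stands for a single one.
    return quoted[1:-1].replace(q + q, q)

print(unquote('"foo""bar"'))       # foo"bar
print(unquote('`foo``bar`', '`'))  # foo`bar
```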

Quoted identifiers cannot span multiple lines.

It is invalid to partially quote an identifier:

```
"this is "invalid*.*
```

### Regular expression

When very complex rules are needed, each pattern can be written as a regular expression delimited with `/`:

```
/^db\d{2,}$/./^tbl\d{2,}$/
```

These regular expressions use the [Go dialect](https://pkg.go.dev/regexp/syntax?tab=doc). The pattern is matched if the identifier contains a substring matching the regular expression. For instance, `/b/` matches `db01`.
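
The substring semantics correspond to an unanchored search. The sketch below uses Python's `re` module, whose syntax is close to, but not identical with, the Go dialect used by the tools:

```python
import re

def part_matches(pattern: str, identifier: str) -> bool:
    # Unanchored search: the rule matches if ANY substring matches,
    # which is why patterns are usually anchored with ^ and $.
    return re.search(pattern, identifier) is not None

print(part_matches(r"b", "db01"))           # True: /b/ matches "db01"
print(part_matches(r"^db\d{2,}$", "db01"))  # True
print(part_matches(r"^db\d{2,}$", "db1"))   # False: fewer than 2 digits
```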

> **Note:**
>
> Every `/` in the regular expression must be escaped as `\/`, including inside `[…]`. You cannot place an unescaped `/` between `\Q…\E`.

## Multiple rules

When a table name matches none of the rules in the filter list, the default behavior is to ignore such unmatched tables.

To build a block list, an explicit `*.*` must be used as the first rule, otherwise all tables will be excluded.

```bash
# every table will be filtered out
./dumpling -f '!*.Password'

# only the "Password" table is filtered out, the rest are included.
./dumpling -f '*.*' -f '!*.Password'
```

In a filter list, if a table name matches multiple patterns, the last match decides the outcome. For instance:

```
# rule 1
employees.*
# rule 2
!*.dep*
# rule 3
*.departments
```

The filtered outcome is as follows:

| Table name | Rule 1 | Rule 2 | Rule 3 | Outcome |
|-----------------------|--------|--------|--------|------------------|
| irrelevant.table | | | | Default (reject) |
| employees.employees | ✓ | | | Rule 1 (accept) |
| employees.dept_emp | ✓ | ✓ | | Rule 2 (reject) |
| employees.departments | ✓ | ✓ | ✓ | Rule 3 (accept) |
| else.departments | | ✓ | ✓ | Rule 3 (accept) |
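
The last-match-wins evaluation behind this table can be sketched in a few lines. This handles plain wildcard rules only; `accepted` is a hypothetical helper, not the tools' actual implementation:

```python
from fnmatch import fnmatchcase

RULES = ["employees.*", "!*.dep*", "*.departments"]

def accepted(schema: str, table: str, rules=RULES) -> bool:
    verdict = False  # tables matching no rule are rejected by default
    for rule in rules:
        exclude = rule.startswith("!")
        pattern = rule[1:] if exclude else rule
        schema_pat, table_pat = pattern.split(".", 1)
        if fnmatchcase(schema, schema_pat) and fnmatchcase(table, table_pat):
            verdict = not exclude  # the last matching rule decides
    return verdict

print(accepted("irrelevant", "table"))       # False: default (reject)
print(accepted("employees", "employees"))    # True:  rule 1
print(accepted("employees", "dept_emp"))     # False: rule 2
print(accepted("employees", "departments"))  # True:  rule 3
print(accepted("else", "departments"))       # True:  rule 3
```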

> **Note:**
>
> In TiDB tools, the system schemas are always excluded regardless of the table filter settings. The system schemas are:
>
> * `INFORMATION_SCHEMA`
> * `PERFORMANCE_SCHEMA`
> * `METRICS_SCHEMA`
> * `INSPECTION_SCHEMA`
> * `mysql`
> * `sys`
8 changes: 4 additions & 4 deletions tidb-lightning/tidb-lightning-configuration.md
@@ -134,6 +134,9 @@ no-schema = false
# schema encoding.
character-set = "auto"

# Only import tables if these wildcard rules are matched. See the corresponding section for details.
filter = ['*.*']

# Configures how CSV files are parsed.
[mydumper.csv]
# Separator between fields, should be an ASCII character.
@@ -205,10 +208,6 @@ analyze = true
switch-mode = "5m"
# Duration between which an import progress is printed to the log.
log-progress = "5m"

# Table filter options. See the corresponding section for details.
# [black-white-list]
# ...
```

### TiKV Importer
@@ -289,6 +288,7 @@ min-available-ratio = 0.05
| -V | Prints program version | |
| -d *directory* | Directory of the data dump to read from | `mydumper.data-source-dir` |
| -L *level* | Log level: debug, info, warn, error, fatal (default = info) | `lightning.log-level` |
| -f *rule* | [Table filter rules](/table-filter.md) (can be specified multiple times) | `mydumper.filter` |
| --log-file *file* | Log file path | `lightning.log-file` |
| --status-addr *ip:port* | Listening address of the TiDB Lightning server | `lightning.status-port` |
| --importer *host:port* | Address of TiKV Importer | `tikv-importer.addr` |
16 changes: 10 additions & 6 deletions tidb-lightning/tidb-lightning-glossary.md
@@ -35,12 +35,6 @@ Back end is the destination where TiDB Lightning sends the parsed result. Also s

See [TiDB Lightning TiDB-backend](/tidb-lightning/tidb-lightning-tidb-backend.md) for details.

### Black-white list

A configuration list that specifies which tables are to be imported and which are to be excluded.

See [TiDB Lightning Table Filter](/tidb-lightning/tidb-lightning-table-filter.md) for details.

<!-- C -->

## C
@@ -101,6 +95,16 @@ Engines use TiKV Importer's `import-dir` as temporary storage, which are sometim

See also [data engine](/tidb-lightning/tidb-lightning-glossary.md#data-engine) and [index engine](/tidb-lightning/tidb-lightning-glossary.md#index-engine).

<!-- F -->

## F

### Filter

A configuration list that specifies which tables are to be imported or excluded.

See [Table Filter](/table-filter.md) for details.

<!-- I -->

## I
