table-filter: implement the new table filter #341

Merged
merged 3 commits into from
May 15, 2020
53 changes: 3 additions & 50 deletions pkg/filter/filter.go
@@ -14,12 +14,12 @@
package filter

import (
"fmt"
"regexp"
"strings"
"sync"

"github.com/pingcap/errors"
tfilter "github.com/pingcap/tidb-tools/pkg/table-filter"
selector "github.com/pingcap/tidb-tools/pkg/table-rule-selector"
)

@@ -33,26 +33,7 @@ const (
)

// Table represents a table.
type Table struct {
Schema string `toml:"db-name" json:"db-name" yaml:"db-name"`
Name string `toml:"tbl-name" json:"tbl-name" yaml:"tbl-name"`
}

// String implements the fmt.Stringer interface.
func (t *Table) String() string {
if len(t.Name) > 0 {
return fmt.Sprintf("`%s`.`%s`", t.Schema, t.Name)
}
return fmt.Sprintf("`%s`", t.Schema)
}

// Clone clones a new filter.Table
func (t *Table) Clone() *Table {
return &Table{
Schema: t.Schema,
Name: t.Name,
}
}
type Table = tfilter.Table

type cache struct {
sync.RWMutex
@@ -74,35 +55,7 @@ func (c *cache) set(key string, action ActionType) {
}

// Rules contains Filter rules.
type Rules struct {
DoTables []*Table `json:"do-tables" toml:"do-tables" yaml:"do-tables"`
DoDBs []string `json:"do-dbs" toml:"do-dbs" yaml:"do-dbs"`

IgnoreTables []*Table `json:"ignore-tables" toml:"ignore-tables" yaml:"ignore-tables"`
IgnoreDBs []string `json:"ignore-dbs" toml:"ignore-dbs" yaml:"ignore-dbs"`
}

// ToLower convert all entries to lowercase
func (r *Rules) ToLower() {
if r == nil {
return
}

for _, table := range r.DoTables {
table.Name = strings.ToLower(table.Name)
table.Schema = strings.ToLower(table.Schema)
}
for _, table := range r.IgnoreTables {
table.Name = strings.ToLower(table.Name)
table.Schema = strings.ToLower(table.Schema)
}
for i, db := range r.IgnoreDBs {
r.IgnoreDBs[i] = strings.ToLower(db)
}
for i, db := range r.DoDBs {
r.DoDBs[i] = strings.ToLower(db)
}
}
type Rules = tfilter.MySQLReplicationRules

// Filter implements whitelist and blacklist filters.
type Filter struct {
225 changes: 225 additions & 0 deletions pkg/table-filter/README.md
@@ -0,0 +1,225 @@
# Table Filter

A table filter is an interface that determines, given a table or schema name,
whether that table or schema should be accepted for some process.

This package defines the format allowing users to specify the filter criteria
via command line or config files. This package is used by all tools in the TiDB
ecosystem.

## Examples

```go
package main

import (
"fmt"

"github.com/pingcap/tidb-tools/pkg/table-filter"
"github.com/spf13/pflag"
)

func main() {
args := pflag.StringArrayP("filter", "f", []string{"*.*"}, "table filter")
pflag.Parse()

f, err := filter.Parse(*args)
if err != nil {
panic(err)
}
f = filter.CaseInsensitive(f)

tables := []filter.Table{
{Schema: "employees", Name: "employees"},
{Schema: "employees", Name: "departments"},
{Schema: "employees", Name: "dept_manager"},
{Schema: "employees", Name: "dept_emp"},
{Schema: "employees", Name: "titles"},
{Schema: "employees", Name: "salaries"},
{Schema: "AdventureWorks.Person", Name: "Person"},
{Schema: "AdventureWorks.Person", Name: "Password"},
{Schema: "AdventureWorks.Sales", Name: "SalesOrderDetail"},
{Schema: "AdventureWorks.Sales", Name: "SalesOrderHeader"},
{Schema: "AdventureWorks.Production", Name: "WorkOrder"},
{Schema: "AdventureWorks.Production", Name: "WorkOrderRouting"},
{Schema: "AdventureWorks.Production", Name: "ProductPhoto"},
{Schema: "AdventureWorks.Production", Name: "TransactionHistory"},
{Schema: "AdventureWorks.Production", Name: "TransactionHistoryArchive"},
}

for _, table := range tables {
fmt.Printf("%5v: %v\n", f.MatchTable(table.Schema, table.Name), table)
}
}
```

Try running it with `./main -f 'employees.*' -f '*.WorkOrder'` and see the result.

## Syntax

### Whitelist

The input to the `filter.Parse()` function is a list of table filter rules.
Each rule specifies the fully-qualified name of a table to be accepted.

```
db1.tbl1
db2.tbl2
db3.tbl3
```

A plain name must only consist of valid [identifier characters]
`[0-9a-zA-Z$_\U00000080-\U0010ffff]+`. All other ASCII characters are reserved.
Some punctuation characters have special meanings, as described below.

### Wildcards

Each part of the name can contain wildcard symbols, as in [fnmatch(3)]:
* `*` — matches zero or more characters
* `?` — matches one character
* `[a-z]` — matches one character between “a” and “z” inclusive
* `[!a-z]` — matches one character except “a” to “z”.

```
db[0-9].tbl[0-9][0-9]
data.*
*.backup_*
```

“Character” here means a Unicode code point, so e.g.
* U+00E9 (é) is 1 character.
* U+0065 U+0301 (é) are 2 characters.
* U+1F926 U+1F3FF U+200D U+2640 U+FE0F (🤦🏿‍♀️) are 5 characters.
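
The code-point counts above can be checked with Go's standard library. This is a small illustration only; `codePoints` is a hypothetical helper, not part of this package.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoints returns the number of Unicode code points in s.
// A single `?` wildcard consumes exactly one of these.
func codePoints(s string) int {
	return utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(codePoints("\u00e9"))                                 // precomposed é: 1
	fmt.Println(codePoints("e\u0301"))                                // e + combining acute accent: 2
	fmt.Println(codePoints("\U0001F926\U0001F3FF\u200D\u2640\uFE0F")) // emoji sequence: 5
}
```

So a pattern such as `?` matches the precomposed é but not the decomposed two-code-point form.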

### File import

Put an `@` at the beginning of the string to specify a file name;
`filter.Parse()` reads every line of that file as a filter rule.

For example, if a file `config/filter.txt` has content:

```
employees.*
*.WorkOrder
```

the following two invocations would be equivalent:

```sh
./main -f '@config/filter.txt'
./main -f 'employees.*' -f '*.WorkOrder'
```

A filter file cannot further import another file.

### Comments and blank lines

Leading and trailing white-spaces of every line are trimmed.

Blank lines (empty strings) are ignored.

A leading `#` marks a comment, and the line is ignored.
A `#` that is not at the start of a line may be treated as a syntax error.
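
The line handling described in this section can be sketched as follows. This is an illustration of the stated rules, not the package's actual parser; `cleanRuleLines` is a hypothetical name, and the sketch does not attempt to detect a `#` in the middle of a line.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanRuleLines trims each line, then drops blank lines and
// lines whose first character is '#'.
func cleanRuleLines(lines []string) []string {
	var rules []string
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		rules = append(rules, line)
	}
	return rules
}

func main() {
	input := []string{"  employees.*  ", "", "# skip passwords", "!*.Password"}
	fmt.Printf("%q\n", cleanRuleLines(input)) // ["employees.*" "!*.Password"]
}
```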

### Blacklist

An `!` at the beginning of the line means the pattern after it is used to
exclude tables from being processed. This effectively turns the filter into a
blacklist.

```ini
*.*
#^ note: must add the *.* to include all tables first
!*.Password
!employees.salaries
```

### Escape character

Precede any special character by a `\` to turn it into an identifier character.

```
AdventureWorks\.*.*
```

For simplicity and future compatibility, the following sequences are prohibited:
* `\` at the end of the line after trimming whitespace (use “`[ ]`” to match a literal whitespace at the end).
* `\` followed by any ASCII alphanumeric character (`[0-9a-zA-Z]`). In particular, C-like escape sequences such as `\0`, `\r`, `\n` and `\t` are currently meaningless.

### Quoted identifier

Besides `\`, special characters can also be escaped by quoting using `"` or `` ` ``.

```
"AdventureWorks.Person".Person
`AdventureWorks.Person`.Password
```

A quoted identifier cannot span multiple lines.

It is invalid to partially quote an identifier.

```
"this is "invalid*.*
```

### Regular expression

Use `/` to delimit regular expressions:

```
/^db\d{2,}$/./^tbl\d{2,}$/
```

These regular expressions use the [Go dialect]. The pattern is matched if the
identifier contains a substring matching the regular expression. For instance,
`/b/` matches `db01`.

(Note: every `/` in the regex must be escaped as `\/`, including inside `[`…`]`.
You cannot place an unescaped `/` between `\Q`…`\E`.)
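
The substring-matching behavior can be seen directly with Go's `regexp` package. The sketch below uses only the standard library; `matches` is a hypothetical helper and says nothing about how this package compiles or caches its patterns.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether ident contains a substring matching pattern,
// which is what regexp.MatchString does for an unanchored pattern.
func matches(pattern, ident string) bool {
	return regexp.MustCompile(pattern).MatchString(ident)
}

func main() {
	fmt.Println(matches(`b`, "db01"))             // true: unanchored, "b" occurs inside
	fmt.Println(matches(`^db\d{2,}$`, "db01"))    // true: whole identifier matches
	fmt.Println(matches(`^db\d{2,}$`, "db1"))     // false: only one digit
	fmt.Println(matches(`^db\d{2,}$`, "mydb01"))  // false: anchored at the start
}
```

Use `^` and `$` whenever the rule is meant to match the whole identifier rather than any part of it.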

[identifier characters]: https://dev.mysql.com/doc/refman/8.0/en/identifiers.html
[fnmatch(3)]: https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_13
[Go dialect]: https://pkg.go.dev/regexp/syntax?tab=doc

## Algorithm

### Default behavior

When a table name matches none of the rules in the filter list, the default
behavior is to ignore such unmatched tables.

To build a blacklist, an explicit `*.*` must be used as the first rule,
otherwise all tables will be excluded.

```sh
# every table will be filtered out
./main -f '!*.Password'

# only the "Password" table is filtered out, the rest are included.
./main -f '*.*' -f '!*.Password'
```

### Precedence

In a filter list, if a table name matches multiple patterns, the last match
decides the outcome. For instance, given

```ini
# rule 1
employees.*
# rule 2
!*.dep*
# rule 3
*.departments
```

We get:

| Table name | Rule 1 | Rule 2 | Rule 3 | Outcome |
|-----------------------|--------|--------|--------|------------------|
| irrelevant.table | | | | Default (reject) |
| employees.employees | ✓ | | | Rule 1 (accept) |
| employees.dept_emp | ✓ | ✓ | | Rule 2 (reject) |
| employees.departments | ✓ | ✓ | ✓ | Rule 3 (accept) |
| else.departments | | ✓ | ✓ | Rule 3 (accept) |
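
The default and precedence rules can be sketched as a single loop in which the last matching rule wins. This is a simplified illustration, not the package's implementation: `rule` and `matchTable` are hypothetical names, and the standard library's `path.Match` stands in for the package's own wildcard matcher (it differs in details, for example it negates character classes with `^` rather than `!`).

```go
package main

import (
	"fmt"
	"path"
)

// rule is one filter line split into its schema and table patterns.
// accept is false for rules written with a leading "!".
type rule struct {
	schema, table string
	accept        bool
}

// matchTable applies every rule in order; the last rule that matches
// decides the outcome. Tables matching no rule are rejected.
func matchTable(rules []rule, schema, table string) bool {
	accepted := false // default: reject unmatched tables
	for _, r := range rules {
		// Errors only signal malformed patterns; ignored in this sketch.
		sOK, _ := path.Match(r.schema, schema)
		tOK, _ := path.Match(r.table, table)
		if sOK && tOK {
			accepted = r.accept
		}
	}
	return accepted
}

func main() {
	rules := []rule{
		{"employees", "*", true},    // rule 1: employees.*
		{"*", "dep*", false},        // rule 2: !*.dep*
		{"*", "departments", true},  // rule 3: *.departments
	}
	names := [][2]string{
		{"irrelevant", "table"},
		{"employees", "employees"},
		{"employees", "dept_emp"},
		{"employees", "departments"},
		{"else", "departments"},
	}
	for _, n := range names {
		fmt.Printf("%s.%s: %v\n", n[0], n[1], matchTable(rules, n[0], n[1]))
	}
}
```

Under these assumptions the loop reproduces the outcomes in the table above, and a rule list containing only `!` rules rejects every table because nothing ever sets the result to accept.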