TiDB OOM during dumpling 2500 tables #30880

Closed
fubinzh opened this issue Dec 20, 2021 · 9 comments · Fixed by #31723 or #32554
Assignees
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects the 5.4.x(LTS) versions. found/automation Found by automation tests severity/critical type/bug The issue is confirmed as a bug.

Comments

@fubinzh

fubinzh commented Dec 20, 2021

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

  1. Use br to restore a backup with 2500+ tables
/br  restore  full "-s" "s3://bank/shema-full?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http://minio.pingcap.net:9000&force-path-style=true" "-u" "http://downstream-pd.brie-acceptance--tps-512568-1-547:2379"
...
mysql> SELECT count(*) AS TOTALNUMBEROFTABLES FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA like '%ens%';
+---------------------+
| TOTALNUMBEROFTABLES |
+---------------------+
|                2544 |
+---------------------+
  2. Use Dumpling to dump the TiDB cluster

2. What did you expect to see? (Required)

Dumpling should succeed

3. What did you see instead (Required)

Dumpling failed due to TiDB OOM
tidb_oom

TiDB log

[2021/12/20 05:46:14.375 +00:00] [WARN] [memory_usage_alarm.go:140] ["tidb-server has the risk of OOM. Running SQLs and heap profile will be recorded in record path"] ["is server-memory-quota set"=false] ["system memory total"=17179869184] ["system memory usage"=13784551424] ["tidb-server memory usage"=13084344168] [memory-usage-alarm-ratio=0.8] ["record path"="/tmp/0_tidb/MC4wLjAuMDo0MDAwLzAuMC4wLjA6MTAwODA=/tmp-storage/record"]
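
The numbers in this alarm line can be checked by hand. The sketch below assumes that, because `server-memory-quota` is not set, the alarm fires when system memory usage exceeds total system memory times `memory-usage-alarm-ratio`; that comparison is inferred from the log fields, and the function names are hypothetical, not TiDB's.

```python
# Values copied from the memory_usage_alarm.go log line above.
SYSTEM_MEMORY_TOTAL = 17_179_869_184  # "system memory total" (16 GiB)
SYSTEM_MEMORY_USAGE = 13_784_551_424  # "system memory usage"
TIDB_MEMORY_USAGE = 13_084_344_168    # "tidb-server memory usage"
ALARM_RATIO = 0.8                     # memory-usage-alarm-ratio


def alarm_threshold(total_bytes: int, ratio: float) -> float:
    """Bytes of system memory usage above which the alarm is assumed to fire."""
    return total_bytes * ratio


def should_alarm(usage_bytes: int, total_bytes: int, ratio: float) -> bool:
    """Assumed rule: compare system usage with total * ratio."""
    return usage_bytes > alarm_threshold(total_bytes, ratio)


# Fraction of the machine's memory in use when the alarm was logged.
usage_fraction = SYSTEM_MEMORY_USAGE / SYSTEM_MEMORY_TOTAL
# Share of the used memory that belongs to tidb-server itself.
tidb_share_of_usage = TIDB_MEMORY_USAGE / SYSTEM_MEMORY_USAGE

print(f"system usage fraction: {usage_fraction:.4f}")
print(f"tidb-server share of used memory: {tidb_share_of_usage:.4f}")
print("alarm:", should_alarm(SYSTEM_MEMORY_USAGE, SYSTEM_MEMORY_TOTAL, ALARM_RATIO))
```

Under that assumption, system usage is just over the 0.8 ratio, and tidb-server accounts for most of the memory in use.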

Dumpling log

[2021/12/20 05:46:35.152 +00:00] [DEBUG] [writer.go:168] ["trying to dump table chunk"] [retryTime=2] [db=ens_rb004] [table=rb_acct_attach] [chunkIndex=5] [lastError="sql: SELECT * FROM `ens_rb004`.`rb_acct_attach` WHERE `_tidb_rowid`>=3063676 and `_tidb_rowid`<4023676  ORDER BY `_tidb_rowid`: invalid connection"] [lastErrorVerbose="invalid connection\nsql: SELECT * FROM `ens_rb004`.`rb_acct_attach` WHERE `_tidb_rowid`>=3063676 and `_tidb_rowid`<4023676  ORDER BY `_tidb_rowid`\ngithub.com/pingcap/dumpling/v4/export.(*tableData).Start\n\tgithub.com/pingcap/dumpling/v4/export/ir_impl.go:210\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:178\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/[email protected]/br/pkg/utils/retry.go:58\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2021/12/20 05:46:35.152 +00:00] [DEBUG] [writer.go:168] ["trying to dump table chunk"] [retryTime=2] [db=ens_rb004] [table=rb_acct_attach] [chunkIndex=4] [lastError="sql: SELECT * FROM `ens_rb004`.`rb_acct_attach` WHERE `_tidb_rowid`>=2103676 and `_tidb_rowid`<3063676  ORDER BY `_tidb_rowid`: invalid connection"] [lastErrorVerbose="invalid connection\nsql: SELECT * FROM `ens_rb004`.`rb_acct_attach` WHERE `_tidb_rowid`>=2103676 and `_tidb_rowid`<3063676  ORDER BY `_tidb_rowid`\ngithub.com/pingcap/dumpling/v4/export.(*tableData).Start\n\tgithub.com/pingcap/dumpling/v4/export/ir_impl.go:210\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:178\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/[email protected]/br/pkg/utils/retry.go:58\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2021/12/20 05:46:35.242 +00:00] [DEBUG] [writer.go:168] ["trying to dump table chunk"] [retryTime=2] [db=ens_rb004] [table=rb_acct_attach] [chunkIndex=3] [lastError="invalid connection"] [lastErrorVerbose="invalid connection\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/dumpling/v4/export.(*rowIter).Error\n\tgithub.com/pingcap/dumpling/v4/export/ir_impl.go:42\ngithub.com/pingcap/dumpling/v4/export.WriteInsertInCsv\n\tgithub.com/pingcap/dumpling/v4/export/writer_util.go:392\ngithub.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\n\tgithub.com/pingcap/dumpling/v4/export/writer_util.go:628\ngithub.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:204\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:189\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/[email protected]/br/pkg/utils/retry.go:58\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
[2021/12/20 05:46:35.294 +00:00] [DEBUG] [writer.go:168] ["trying to dump table chunk"] [retryTime=2] [db=ens_rb004] [table=rb_acct_attach] [chunkIndex=2] [lastError="invalid connection"] [lastErrorVerbose="invalid connection\ngithub.com/pingcap/errors.AddStack\n\tgithub.com/pingcap/[email protected]/errors.go:174\ngithub.com/pingcap/errors.Trace\n\tgithub.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/dumpling/v4/export.(*rowIter).Error\n\tgithub.com/pingcap/dumpling/v4/export/ir_impl.go:42\ngithub.com/pingcap/dumpling/v4/export.WriteInsertInCsv\n\tgithub.com/pingcap/dumpling/v4/export/writer_util.go:392\ngithub.com/pingcap/dumpling/v4/export.FileFormat.WriteInsert\n\tgithub.com/pingcap/dumpling/v4/export/writer_util.go:628\ngithub.com/pingcap/dumpling/v4/export.(*Writer).tryToWriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:204\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData.func1\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:189\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\tgithub.com/pingcap/[email protected]/br/pkg/utils/retry.go:58\ngithub.com/pingcap/dumpling/v4/export.(*Writer).WriteTableData\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:160\ngithub.com/pingcap/dumpling/v4/export.(*Writer).handleTask\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:103\ngithub.com/pingcap/dumpling/v4/export.(*Writer).run\n\tgithub.com/pingcap/dumpling/v4/export/writer.go:85\ngithub.com/pingcap/dumpling/v4/export.(*Dumper).startWriters.func4\n\tgithub.com/pingcap/dumpling/v4/export/dump.go:282\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/[email protected]/errgroup/errgroup.go:57\nruntime.goexit\n\truntime/asm_amd64.s:1371"]
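
The failing statements in these log lines read one table in consecutive `_tidb_rowid` ranges. As an illustration only, the sketch below rebuilds the two logged queries from their range boundaries and shows that each range spans 960000 row IDs. `chunk_queries` is a hypothetical helper written for this note, not Dumpling's actual implementation.

```python
def chunk_queries(db, table, boundaries, key="_tidb_rowid"):
    """Build one SELECT per half-open range [lo, hi) of `key`."""
    queries = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        queries.append(
            f"SELECT * FROM `{db}`.`{table}` "
            f"WHERE `{key}`>={lo} and `{key}`<{hi}  ORDER BY `{key}`"
        )
    return queries


# Boundaries taken from the chunkIndex=4 and chunkIndex=5 log entries.
boundaries = [2103676, 3063676, 4023676]
queries = chunk_queries("ens_rb004", "rb_acct_attach", boundaries)
chunk_sizes = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]

for query in queries:
    print(query)
print("row-id span per chunk:", chunk_sizes)
```

The log shows several such ranges of the same table being retried within the same 150 ms, so multiple range reads were in flight against TiDB at once.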

4. What is your TiDB version? (Required)

TiDB Version
Release Version: v5.4.0-nightly
Edition: Community
Git Commit Hash: 24d970f
Git Branch: heads/refs/tags/v5.4.0-nightly
UTC Build Time: 2021-12-20 00:13:04
GoVersion: go1.16.4
Race Enabled: false
TiKV Min Version: v3.0.0-60965b006877ca7234adaced7890d7b029ed1306
Check Table Before Drop: false

Dumpling version
Release version: v5.4.0-nightly
Git commit hash: 05b0b48d711a95ae330c08d07ec09543254cea6b
Git branch: heads/refs/tags/v5.4.0-nightly
Build timestamp: 2021-12-19 04:08:28Z
Go version: go version go1.16.4 linux/amd64

tidb.log

dumpling_t4.log

@fubinzh fubinzh added type/bug The issue is confirmed as a bug. severity/major found/automation Found by automation tests labels Dec 20, 2021
@fubinzh
Author

fubinzh commented Dec 20, 2021

Tested with TiDB and Dumpling 5.3.0: Dumpling succeeds with --threads=4 but fails with --threads=20.
With TiDB and Dumpling 5.4.0, it fails in both cases.

@fubinzh fubinzh added the affects-5.3 This bug affects 5.3.x versions. label Dec 20, 2021
@wjhuang2016
Member

@fubinzh What's the configuration of the TiDB server?

@fubinzh
Author

fubinzh commented Dec 21, 2021

@wjhuang2016 The default configuration was used.

/ # ps -ef  |grep tidb
    1 root      0:21 /tidb-server --store=tikv --advertise-address=dst-tidb-tidb-0.dst-tidb-tidb-peer.ou-br-statistic55vwh.svc --host=0.0.0.0 --path=dst-tidb-pd:2379 --config=/etc/tidb/tidb.toml
   48 root      0:00 grep tidb
/ # cat /etc/tidb/tidb.toml
[log]
  [log.file]
    filename = "/var/lib/tidb/log/tidb.log"
    max-backups = 3

@hawkingrei
Member

profile001

@winoros
Member

winoros commented Dec 22, 2021

This is a duplicate; the cause is #29749.

@winoros
Member

winoros commented Dec 28, 2021

We re-investigated the issue.
It has nothing to do with analyze. It should be a bug in Dumpling itself.

@winoros winoros removed their assignment Dec 28, 2021
@jebter jebter added the component/dumpling This is related to Dumpling of TiDB. label Dec 29, 2021
@cyliu0 cyliu0 removed the component/dumpling This is related to Dumpling of TiDB. label Jan 10, 2022
@jebter jebter added the affects-5.4 This bug affects the 5.4.x(LTS) versions. label Jan 11, 2022
@tiancaiamao
Contributor

We re-investigated the issue. It has nothing to do with analyze. It should be a bug in Dumpling itself.

No, it has nothing to do with Dumpling ... @winoros
The root cause is the chunk RPC protocol.

If you set tikv-client.enable-chunk-rpc=false in the TiDB config file,
memory usage stays low while Dumpling runs, around 300M-900M most of the time.
By default, chunk RPC is used; compare the picture on the left with the one on the right:

image
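
For reference, the setting mentioned in this comment would look roughly like the fragment below in the TiDB config file. This is a sketch that assumes the dotted name `tikv-client.enable-chunk-rpc` maps to an `enable-chunk-rpc` key under a `[tikv-client]` table, and that it is added to the same file passed via `--config` (`/etc/tidb/tidb.toml` in the setup shown earlier).

```toml
# Workaround from this thread: turn off chunk RPC between TiDB and TiKV.
# Assumed placement: the [tikv-client] table of the tidb-server config file.
[tikv-client]
enable-chunk-rpc = false
```

tidb-server presumably needs a restart to pick up the change. As noted later in the thread, this is a workaround rather than a fix.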

@github-actions

Please check whether the issue should be labeled with 'affects-x.y' or 'fixes-x.y.z', and then remove 'needs-more-info' label.

@tiancaiamao
Contributor

Reopening this because it's not really fixed; it has only been worked around.
