*: track the memory usage of runtime info #44048

Closed
wants to merge 8 commits into from

Conversation

wshwsh12
Contributor

@wshwsh12 wshwsh12 commented May 22, 2023

What problem does this PR solve?

Issue Number: close #44047

Problem Summary: Track the memory usage of Runtime info.
Fix the memory leak in Runtime info.

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    Test the case in issue
# Run the SQL on master (memory usage cannot be tracked):
tidb> desc analyze select /*+ use_index(t,idx) */ count(b) from t;
+---------------------------------+-------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
| id                              | estRows     | actRows  | task      | access object         | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                   | memory   | disk |
+---------------------------------+-------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
| HashAgg_12                      | 1.00        | 1        | root      |                       | time:33.9s, loops:2, RU:726035.861466, partial_worker:{wall_time:33.949118855s, concurrency:5, task_num:13446, tot_wait:2m49.270174683s, tot_exec:473.369057ms, tot_time:2m49.745449529s, max:33.949104798s, p95:33.949104798s}, final_worker:{wall_time:0s, concurrency:5, task_num:5, tot_wait:2m49.745579104s, tot_exec:19.056µs, tot_time:2m49.745600176s, max:33.949126356s, p95:33.949126356s}                                                                                                                                                                                                                                                                                                         | funcs:count(Column#7)->Column#4 | 260.2 KB | N/A  |
| └─IndexLookUp_13                | 1.00        | 13768128 | root      |                       | time:33.9s, loops:13447, index_task: {total_time: 33.8s, fetch_handle: 513.8ms, build: 1.76ms, wait: 33.3s}, table_task: {total_time: 2m49.6s, num: 1437, concurrency: 5}, next: {wait_index: 4.25ms, wait_table_lookup_build: 1.19s, wait_table_lookup_resp: 32.5s}                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                 | 8.17 MB  | N/A  |
|   ├─IndexFullScan_10(Build)     | 14680064.00 | 14680064 | cop[tikv] | table:t, index:idx(a) | time:17.8ms, loops:14393, cop_task: {num: 428, max: 43.3ms, min: 206.2µs, avg: 11.3ms, p95: 24.5ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 3.71s, tot_wait: 67ms, rpc_num: 428, rpc_time: 4.81s, copr_cache_hit_ratio: 0.00, build_task_duration: 2.14ms, max_distsql_concurrency: 15}, tikv_task:{proc max:38ms, min:0s, avg: 8.82ms, p80:13ms, p95:19ms, iters:16038, tasks:428}, scan_detail: {total_process_keys: 14680064, total_process_keys_size: 675282944, total_keys: 14680492, get_snapshot_time: 8.27ms, rocksdb: {key_skipped_count: 14680064, block: {cache_hit_count: 19609}}}                                                                                                 | keep order:false                | N/A      | N/A  |
|   └─HashAgg_6(Probe)            | 1.00        | 13768128 | cop[tikv] |                       | time:1m59.1s, loops:15261, cop_task: {num: 13768128, max: 105ms, min: 0s, avg: 1.32ms, p95: 8.91ms, max_proc_keys: 11, p95_proc_keys: 2, tot_proc: 6m8s, tot_wait: 2h41m25.7s, rpc_num: 2755361, rpc_time: 4h59m35.8s, copr_cache_hit_ratio: 0.00, build_task_duration: 45.4s, max_distsql_concurrency: 1, max_extra_concurrency: 640, store_batch_num: 11012767}, tikv_task:{proc max:50ms, min:0s, avg: 26.1µs, p80:0s, p95:0s, iters:13768128, tasks:13768128}, scan_detail: {total_process_keys: 14680064, total_process_keys_size: 556021760, total_keys: 14680064, get_snapshot_time: 3m30.4s, rocksdb: {block: {cache_hit_count: 43036410, read_count: 2, read_byte: 4.21 KB, read_time: 146.4µs}}}   | funcs:count(test.t.b)->Column#7 | N/A      | N/A  |
|     └─TableRowIDScan_11         | 14680064.00 | 14680064 | cop[tikv] | table:t               | tikv_task:{proc max:50ms, min:0s, avg: 22.3µs, p80:0s, p95:0s, iters:13768128, tasks:13768128}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | keep order:false                | N/A      | N/A  |
+---------------------------------+-------------+----------+-----------+-----------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
5 rows in set (34.237 sec)

# Run the sql in this patch with mem_quota=1GB:
tidb> desc analyze select /*+ use_index(t,idx) */ count(b) from t;
ERROR 1105 (HY000): Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.[conn=6086076135693615509]

# Run the sql in this patch without mem_quota:
tidb> desc analyze select /*+ use_index(t,idx) */ count(b) from t;
+---------------------------------+-------------+----------+-----------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
| id                              | estRows     | actRows  | task      | access object         | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | operator info                   | memory   | disk |
+---------------------------------+-------------+----------+-----------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
| HashAgg_12                      | 1.00        | 1        | root      |                       | time:34.6s, loops:2, RU:727274.213325, partial_worker:{wall_time:34.607099961s, concurrency:5, task_num:13492, tot_wait:2m52.580460846s, tot_exec:452.76961ms, tot_time:2m53.035101947s, max:34.607028781s, p95:34.607028781s}, final_worker:{wall_time:0s, concurrency:5, task_num:5, tot_wait:2m53.035154278s, tot_exec:19.045µs, tot_time:2m53.035175897s, max:34.607042004s, p95:34.607042004s}                                                                                                                                                                                                                                                                                                            | funcs:count(Column#7)->Column#4 | 260.2 KB | N/A  |
| └─IndexLookUp_13                | 1.00        | 13814933 | root      |                       | time:34.5s, loops:13493, index_task: {total_time: 34.5s, fetch_handle: 510.4ms, build: 1.75ms, wait: 34s}, table_task: {total_time: 2m52.9s, num: 1437, concurrency: 5}, next: {wait_index: 3.56ms, wait_table_lookup_build: 106.8ms, wait_table_lookup_resp: 34.2s}                                                                                                                                                                                                                                                                                                                                                                                                                                           |                                 | 2.16 GB  | N/A  |
|   ├─IndexFullScan_10(Build)     | 14680064.00 | 14680064 | cop[tikv] | table:t, index:idx(a) | time:25.5ms, loops:14399, cop_task: {num: 428, max: 36.1ms, min: 335.9µs, avg: 11.9ms, p95: 23.6ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4.07s, tot_wait: 75.1ms, rpc_num: 428, rpc_time: 5.09s, copr_cache_hit_ratio: 0.04, build_task_duration: 83.2µs, max_distsql_concurrency: 15}, tikv_task:{proc max:32ms, min:0s, avg: 9.88ms, p80:14ms, p95:21ms, iters:16038, tasks:428}, scan_detail: {total_process_keys: 14676704, total_process_keys_size: 675128384, total_keys: 14677117, get_snapshot_time: 18.9ms, rocksdb: {key_skipped_count: 14676704, block: {cache_hit_count: 10663, read_count: 8897, read_byte: 6.27 MB, read_time: 169.7ms}}}                                       | keep order:false                | N/A      | N/A  |
|   └─HashAgg_6(Probe)            | 1.00        | 13814933 | cop[tikv] |                       | time:2m7.8s, loops:15301, cop_task: {num: 13814933, max: 186.7ms, min: 0s, avg: 1.1ms, p95: 7.64ms, max_proc_keys: 10, p95_proc_keys: 2, tot_proc: 5m47.4s, tot_wait: 2h0m33.8s, rpc_num: 2764679, rpc_time: 4h11m36.6s, copr_cache_hit_ratio: 0.00, build_task_duration: 40.1s, max_distsql_concurrency: 1, max_extra_concurrency: 640, store_batch_num: 11050254}, tikv_task:{proc max:48ms, min:0s, avg: 23.9µs, p80:0s, p95:0s, iters:13814933, tasks:13814933}, scan_detail: {total_process_keys: 14679859, total_process_keys_size: 556014020, total_keys: 14679859, get_snapshot_time: 6m49.6s, rocksdb: {block: {cache_hit_count: 43156949, read_count: 486, read_byte: 1.73 MB, read_time: 96.1ms}}}  | funcs:count(test.t.b)->Column#7 | N/A      | N/A  |
|     └─TableRowIDScan_11         | 14680064.00 | 14680064 | cop[tikv] | table:t               | tikv_task:{proc max:48ms, min:0s, avg: 20.2µs, p80:0s, p95:0s, iters:13814933, tasks:13814933}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | keep order:false                | N/A      | N/A  |
+---------------------------------+-------------+----------+-----------+-----------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------+----------+------+
5 rows in set (34.911 sec)

Grafana:
(From left to right: this patch with a 1GB mem quota, this patch without mem_quota, and master with a 1GB mem quota)
image

  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot

ti-chi-bot bot commented May 22, 2023

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot bot added do-not-merge/invalid-title release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 22, 2023
@wshwsh12 wshwsh12 changed the title Fix runtime info *: track the memory usage of runtime info May 22, 2023
@wshwsh12
Contributor Author

/cc @XuHuaiyu @guo-shaoge

@ti-chi-bot ti-chi-bot bot requested review from guo-shaoge and XuHuaiyu May 22, 2023 07:43
Comment on lines 54 to 61
// detailsNeedP90StructSize indicates the size of DetailsNeedP90Size, used for memory tracking.
const detailsNeedP90StructSize = int64(unsafe.Sizeof(DetailsNeedP90{}))

// backoffEmptyMapSzie indicates the size of BackoffSleep and BackoffTimes when the map is empty.
const backoffEmptyMapSize = 48 * 2

// AvgMemorySizeForDetailsNeedP90 is the avg size of DetailsNeedP90 need.
const AvgMemorySizeForDetailsNeedP90 = detailsNeedP90StructSize + backoffEmptyMapSize
Contributor

Suggested change
-// detailsNeedP90StructSize indicates the size of DetailsNeedP90Size, used for memory tracking.
-const detailsNeedP90StructSize = int64(unsafe.Sizeof(DetailsNeedP90{}))
-// backoffEmptyMapSzie indicates the size of BackoffSleep and BackoffTimes when the map is empty.
-const backoffEmptyMapSize = 48 * 2
-// AvgMemorySizeForDetailsNeedP90 is the avg size of DetailsNeedP90 need.
-const AvgMemorySizeForDetailsNeedP90 = detailsNeedP90StructSize + backoffEmptyMapSize
+// AvgMemorySizeForDetailsNeedP90 is the estimated size of DetailsNeedP90 struct.
+// "48" represents the base memory overhead of a Go map after initialization. Here, the memory overhead of the `BackoffSleep` and `BackoffTimes` member attributes is recorded.
+const AvgMemorySizeForDetailsNeedP90 = int64(unsafe.Sizeof(DetailsNeedP90{})) + 48*2

@wshwsh12
Contributor Author

/test unit-test

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 29, 2023
@wshwsh12 wshwsh12 force-pushed the fix-runtime-info branch from b962874 to 274a5ab Compare May 29, 2023 07:52
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 29, 2023
@ti-chi-bot

ti-chi-bot bot commented May 29, 2023

@wshwsh12: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| idc-jenkins-ci-tidb/check_dev | 274a5ab | link | true | /test check-dev |
| idc-jenkins-ci-tidb/unit-test | 274a5ab | link | true | /test unit-test |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wshwsh12 wshwsh12 closed this May 30, 2023
Labels
release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Track the memory usage of Runtime Info
2 participants