statistics: Remove the ineffective dirty IDs from the row count cache #56287

Merged
merged 6 commits on Mar 2, 2025

Conversation

Rustin170506
Member

@Rustin170506 Rustin170506 commented Sep 25, 2024

What problem does this PR solve?

Issue Number: close #55803
Problem Summary:

What changed and how does it work?

As I mentioned in issue #55803 (comment), the main problem is that UpdateByID accidentally updates the modify_time even when the dirty tables have not been updated.

But as @time-and-fate mentioned, the dirty table is only maintained on a best-effort basis, so it is better to remove the dirty IDs entirely.

Check List

Tests

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Fix the issue of inaccurate memory table information in `information_schema.tables`

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-triage-completed release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. sig/planner SIG: Planner and removed do-not-merge/needs-triage-completed labels Sep 25, 2024

codecov bot commented Sep 25, 2024

Codecov Report

Attention: Patch coverage is 70.00000% with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.2029%. Comparing base (ec18512) to head (f3b75ac).
Report is 3 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #56287        +/-   ##
================================================
+ Coverage   72.9851%   73.2029%   +0.2177%     
================================================
  Files          1699       1729        +30     
  Lines        469598     477589      +7991     
================================================
+ Hits         342737     349609      +6872     
- Misses       105774     106158       +384     
- Partials      21087      21822       +735     
Flag          Coverage Δ
integration   53.1788% <ø> (?)
unit          72.1618% <70.0000%> (-0.0145%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components    Coverage Δ
dumpling      52.6910% <ø> (ø)
parser        ∅ <ø> (∅)
br            44.9720% <ø> (+0.0196%) ⬆️

@Rustin170506 Rustin170506 marked this pull request as draft September 26, 2024 08:21
@ti-chi-bot ti-chi-bot bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. label Oct 28, 2024
@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. label Dec 11, 2024
@Rustin170506 Rustin170506 marked this pull request as ready for review February 11, 2025 09:35
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 11, 2025
@Rustin170506 Rustin170506 force-pushed the rustin-patch-modify-time branch 2 times, most recently from 7f74d6f to 69353ca Compare February 12, 2025 08:13
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Feb 12, 2025
@Rustin170506 Rustin170506 changed the title statistics: Do not refresh the modification time if only update for one table statistics: Remove the ineffective dirty IDs from the row count cache Feb 13, 2025
@Rustin170506
Member Author

Tested in the test-infra:

mysql> SHOW STATS_META where db_name='uds500k';
+---------+-------------+----------------+---------------------+--------------+-----------+---------------------+
| Db_name | Table_name  | Partition_name | Update_time         | Modify_count | Row_count | Last_analyze_time   |
+---------+-------------+----------------+---------------------+--------------+-----------+---------------------+
| uds500k | index_Data4 |                | 2025-02-13 19:33:14 |            0 |    500000 | 2025-02-13 19:33:14 |
| uds500k | index_Data3 |                | 2025-02-13 19:33:17 |            0 |    500000 | 2025-02-13 19:33:17 |
| uds500k | index_Data1 |                | 2025-02-13 19:33:20 |            0 |    500000 | 2025-02-13 19:33:20 |
| uds500k | index_Data2 |                | 2025-02-13 19:33:23 |            0 |    500000 | 2025-02-13 19:33:23 |
| uds500k | index_Data5 |                | 2025-02-13 19:33:26 |            0 |    500000 | 2025-02-13 19:33:26 |
| uds500k | Data4       |                | 2025-02-13 19:32:58 |            0 |    500000 | 2025-02-13 19:32:58 |
| uds500k | Data5       |                | 2025-02-13 19:33:01 |            0 |    500000 | 2025-02-13 19:33:01 |
| uds500k | Data2       |                | 2025-02-13 19:33:04 |            0 |    500000 | 2025-02-13 19:33:04 |
| uds500k | Data3       |                | 2025-02-13 19:33:07 |            0 |    500000 | 2025-02-13 19:33:07 |
| uds500k | Data1       |                | 2025-02-13 19:33:10 |            0 |    500000 | 2025-02-13 19:33:10 |
+---------+-------------+----------------+---------------------+--------------+-----------+---------------------+
10 rows in set (0.39 sec)

mysql> select table_name, avg_row_length, max_data_length, data_length, table_rows from information_schema.tables where table_schema = 'uds500k';
+-------------+----------------+-----------------+-------------+------------+
| table_name  | avg_row_length | max_data_length | data_length | table_rows |
+-------------+----------------+-----------------+-------------+------------+
| index_Data4 |            135 |               0 |    67991573 |     500000 |
| index_Data3 |            136 |               0 |    68001335 |     500000 |
| index_Data1 |            136 |               0 |    68002490 |     500000 |
| index_Data2 |            136 |               0 |    68003939 |     500000 |
| index_Data5 |            136 |               0 |    68004062 |     500000 |
| Data4       |             24 |               0 |    12000000 |     500000 |
| Data5       |             24 |               0 |    12000000 |     500000 |
| Data2       |             24 |               0 |    12000000 |     500000 |
| Data3       |             24 |               0 |    12000000 |     500000 |
| Data1       |             24 |               0 |    12000000 |     500000 |
+-------------+----------------+-----------------+-------------+------------+
10 rows in set (0.08 sec)
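
As a quick sanity check on the output above, the reported avg_row_length values are consistent with an integer division of data_length by table_rows. That relationship is only inferred from these numbers, not taken from the TiDB source, and `avgRowLength` is a hypothetical helper:

```go
package main

import "fmt"

// avgRowLength assumes avg_row_length = data_length / table_rows with
// integer (truncating) division. This is a hypothesis inferred from the
// query output, not a statement about how TiDB computes the column.
func avgRowLength(dataLength, tableRows uint64) uint64 {
	if tableRows == 0 {
		return 0
	}
	return dataLength / tableRows
}

func main() {
	// Values copied from the information_schema.tables output.
	samples := []struct {
		name       string
		dataLength uint64
		tableRows  uint64
		reported   uint64
	}{
		{"index_Data4", 67991573, 500000, 135},
		{"index_Data3", 68001335, 500000, 136},
		{"index_Data1", 68002490, 500000, 136},
		{"Data4", 12000000, 500000, 24},
	}
	for _, s := range samples {
		got := avgRowLength(s.dataLength, s.tableRows)
		fmt.Printf("%-12s computed=%d reported=%d match=%t\n",
			s.name, got, s.reported, got == s.reported)
	}
}
```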

Member Author

@Rustin170506 Rustin170506 left a comment

🔢 Self-check (PR reviewed by myself and ready for feedback.)

@Rustin170506 Rustin170506 requested review from qw4990 and tangenta and removed request for winoros and time-and-fate February 26, 2025 06:27
@Rustin170506
Member Author

Tested locally:

tiup playground v8.5.1 --db.host 127.0.0.1 --without-monitor --tiflash 0 --db 2

On tidb1:

use test;
create table t1(a int);create table t3(a int);create table t2(a int);
insert into t1 value(1);
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t2 select * from t1;
insert into t3 select * from t1;

Wait until the stats show up on tidb1:

mysql> show stats_meta;
+---------+------------+----------------+---------------------+--------------+-----------+-------------------+
| Db_name | Table_name | Partition_name | Update_time         | Modify_count | Row_count | Last_analyze_time |
+---------+------------+----------------+---------------------+--------------+-----------+-------------------+
| test    | t1         |                | 2025-02-26 14:39:12 |        16384 |     16384 | NULL              |
| test    | t3         |                | 2025-02-26 14:39:12 |        16384 |     16384 | NULL              |
| test    | t2         |                | 2025-02-26 14:39:12 |        16384 |     16384 | NULL              |
+---------+------------+----------------+---------------------+--------------+-----------+-------------------+
3 rows in set (0.02 sec)
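
The Row_count of 16384 matches the setup script: one row is inserted into t1 and the table is then doubled by 14 `insert into t1 select * from t1` statements. A small sketch of that arithmetic, where `rowsAfterDoubling` is a hypothetical helper:

```go
package main

import "fmt"

// rowsAfterDoubling returns the row count after starting with `initial`
// rows and running `insert into t select * from t` `doublings` times.
func rowsAfterDoubling(initial uint64, doublings uint) uint64 {
	return initial << doublings
}

func main() {
	// 1 initial row, 14 self-inserts, as in the script above.
	fmt.Println(rowsAfterDoubling(1, 14)) // 16384
}
```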

Then, on tidb2:

mysql> select table_name, avg_row_length, max_data_length, data_length, table_rows from information_schema.tables where table_schema = 'test';
+------------+----------------+-----------------+-------------+------------+
| table_name | avg_row_length | max_data_length | data_length | table_rows |
+------------+----------------+-----------------+-------------+------------+
| t1         |              8 |               0 |      131072 |      16384 |
| t3         |              8 |               0 |      131072 |      16384 |
| t2         |              8 |               0 |      131072 |      16384 |
+------------+----------------+-----------------+-------------+------------+
3 rows in set (0.02 sec)

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 26, 2025
return rows, err
}
} else {
// Even if the table is a partition table, we still need to update the stats cache for the table itself.

Please add the reason.

Member

@time-and-fate time-and-fate left a comment

Others LGTM.

@ti-chi-bot ti-chi-bot bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. approved labels Mar 2, 2025
@Rustin170506
Member Author

/reopen


ti-chi-bot bot commented Mar 2, 2025

@Rustin170506: Failed to re-open PR: state cannot be changed. There are no new commits on the Rustin170506:rustin-patch-modify-time branch.

In response to this:

/reopen

@Rustin170506 Rustin170506 reopened this Mar 2, 2025
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 2, 2025

ti-chi-bot bot commented Mar 2, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qw4990, tangenta, time-and-fate

@ti-chi-bot ti-chi-bot bot added the approved label Mar 2, 2025
@Rustin170506
Member Author

/test all

@Rustin170506
Member Author

/retest

@ti-chi-bot ti-chi-bot bot merged commit 2214bd0 into pingcap:master Mar 2, 2025
25 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 2, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #59854.
But this PR has conflicts, please resolve them!

@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.1: #59855.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Mar 2, 2025
@Rustin170506 Rustin170506 added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Mar 3, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #59862.

Labels
approved lgtm needs-cherry-pick-release-7.1 Should cherry pick this PR to release-7.1 branch. needs-cherry-pick-release-8.1 Should cherry pick this PR to release-8.1 branch. needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/planner SIG: Planner size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

query results of information_schema.tables not accurate
5 participants