
statistics: do not copy and paste the code for saving statistics #55046

Merged
16 commits merged into pingcap:master on Jul 31, 2024

Conversation


@Rustin170506 Rustin170506 commented Jul 30, 2024

What problem does this PR solve?

Issue Number: ref #55043

Problem Summary:

What changed and how does it work?

Reuse the concurrent code path to handle the single-threaded case as well, so the statistics-saving logic is no longer copied and pasted between the two paths.
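The idea can be sketched in Rust (the same language as the test scripts below). This is an illustration only, not TiDB's actual Go implementation; the names `save_all` and `save_result` are invented for the sketch. The point is that a worker-pool path with `sessions == 1` degenerates to the single-threaded behavior, so no separate copy of the saving code is needed:

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Illustrative stand-in for "save one partition's analyze result".
fn save_result(partition: &str) -> String {
    format!("saved {partition}")
}

// One code path for both cases: with sessions == 1 the pool is a
// single worker, i.e. the old single-threaded behavior for free.
fn save_all(partitions: Vec<String>, sessions: usize) -> Vec<String> {
    let sessions = sessions.max(1);
    let queue = Arc::new(Mutex::new(partitions));
    let (tx, rx) = mpsc::channel::<String>();
    let mut handles = Vec::new();
    for _ in 0..sessions {
        let tx = tx.clone();
        let queue = Arc::clone(&queue);
        handles.push(thread::spawn(move || loop {
            // Pop the next partition; the lock guard is dropped
            // at the end of the statement.
            let next = queue.lock().unwrap().pop();
            match next {
                Some(p) => tx.send(save_result(&p)).unwrap(),
                None => break,
            }
        }));
    }
    drop(tx); // close the original sender so `rx` terminates
    let mut saved: Vec<String> = rx.into_iter().collect();
    for h in handles {
        h.join().unwrap();
    }
    saved.sort();
    saved
}
```

Calling `save_all(parts, 1)` and `save_all(parts, 8)` exercise the same code; only the worker count differs.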

Check List

Tests

  • Unit test
  • Integration test
  • Manual test
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 30, 2024

Rustin170506 commented Jul 30, 2024

Tested locally:

  1. Create a TiDB cluster: tiup playground nightly --db.binpath /Volumes/t7/code/tidb/bin/tidb-server
  2. Create a partitioned table and insert some data:
#!/usr/bin/env -S cargo +nightly-2024-05-01 -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---
use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
    #[clap(short, long, default_value = "3", help = "Number of partitions")]
    num_partitions: u32,
}

#[derive(Debug)]
struct TableRow {
    partition_key: u32,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Create partitioned table
    let create_table_query = format!(
        "CREATE TABLE IF NOT EXISTS t (
            partition_key INT,
            column1 VARCHAR(255),
            column2 INT,
            column3 INT,
            column4 VARCHAR(255)
        ) PARTITION BY RANGE (partition_key) (
            {}
        )",
        (0..args.num_partitions)
            .map(|i| format!("PARTITION p{} VALUES LESS THAN ({})", i, (i + 1) * 3000))
            .collect::<Vec<_>>()
            .join(",\n")
    );

    sqlx::query(&create_table_query).execute(&pool).await?;

    for partition in 0..args.num_partitions {
        for _ in 0..3000 {
            let row = TableRow {
                partition_key: partition * 3000 + 1,
                column1: Faker.fake::<String>(),
                column2: Faker.fake::<i32>(),
                column3: Faker.fake::<i32>(),
                column4: Faker.fake::<String>(),
            };

            sqlx::query(
                "INSERT INTO t (partition_key, column1, column2, column3, column4) 
                VALUES (?, ?, ?, ?, ?)",
            )
            .bind(row.partition_key)
            .bind(&row.column1)
            .bind(row.column2)
            .bind(row.column3)
            .bind(&row.column4)
            .execute(&pool)
            .await?;
        }
        println!("Inserted 3000 rows into partition {}", partition);
    }

    println!("Successfully inserted {} rows into the 't' table across {} partitions.", 
             3000 * args.num_partitions, args.num_partitions);

    Ok(())
}
./sql2.rs --database-url="mysql://root@localhost:4000/test"
warning: `package.edition` is unspecified, defaulting to `2021`
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.28s
     Running `/Users/hi-rustin/.cargo/target/6a/821756d0d91706/debug/sql2 '--database-url=mysql://root@localhost:4000/test'`
Inserted 3000 rows into partition 0
Inserted 3000 rows into partition 1
Inserted 3000 rows into partition 2
Successfully inserted 9000 rows into the 't' table across 3 partitions.
  3. Check the TiDB log:
[2024/07/30 15:19:36.107 +08:00] [INFO] [refresher.go:102] ["Auto analyze triggered"] [category=stats] [job="DynamicPartitionedTableAnalysisJob:\n\tAnalyzeType: analyzeDynamicPartition\n\tPartitions: p1, p2, p0\n\tPartitionIndexes: map[]\n\tSchema: test\n\tGlobal Table: t\n\tGlobal TableID: 104\n\tTableStatsVer: 2\n\tChangePercentage: 1.000000\n\tTableSize: 15000.00\n\tLastAnalysisDuration: 30m0s\n\tWeight: 1.376307\n"]
[2024/07/30 15:19:36.138 +08:00] [INFO] [analyze.go:401] ["use multiple sessions to save analyze results"] [sessionCount=2]
  4. Manually analyze the table: analyze table t
  5. Check the logs:
[2024/07/30 15:21:03.612 +08:00] [INFO] [analyze.go:401] ["use multiple sessions to save analyze results"] [sessionCount=2]
  6. Set tidb_analyze_partition_concurrency to 1:
mysql> set global tidb_analyze_partition_concurrency = 1;
Query OK, 0 rows affected (0.01 sec)
mysql> select @@tidb_analyze_partition_concurrency;
+--------------------------------------+
| @@tidb_analyze_partition_concurrency |
+--------------------------------------+
| 1                                    |
+--------------------------------------+
1 row in set (0.00 sec)
  7. Analyze the table again: analyze table t
  8. Check the logs:
[2024/07/30 15:23:54.322 +08:00] [INFO] [analyze.go:408] ["use single session to save analyze results"]
  9. Insert more data into p1:
#!/usr/bin/env -S cargo +nightly-2024-05-01 -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---
use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
}

#[derive(Debug)]
struct TableRow {
    partition_key: u32,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();
    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Insert 3000 rows into partition p1
    for _ in 0..3000 {
        let row = TableRow {
            partition_key: 3001, // lands in partition p1 (3000 <= 3001 < 6000)
            column1: Faker.fake::<String>(),
            column2: Faker.fake::<i32>(),
            column3: Faker.fake::<i32>(),
            column4: Faker.fake::<String>(),
        };

        sqlx::query(
            "INSERT INTO t (partition_key, column1, column2, column3, column4) 
            VALUES (?, ?, ?, ?, ?)",
        )
        .bind(row.partition_key)
        .bind(&row.column1)
        .bind(row.column2)
        .bind(row.column3)
        .bind(&row.column4)
        .execute(&pool)
        .await?;
    }

    println!("Successfully inserted 3000 additional rows into partition p1 of the 't' table.");

    Ok(())
}
  10. Check the logs again:
[2024/07/30 15:32:09.103 +08:00] [INFO] [refresher.go:102] ["Auto analyze triggered"] [category=stats] [job="DynamicPartitionedTableAnalysisJob:\n\tAnalyzeType: analyzeDynamicPartition\n\tPartitions: p1\n\tPartitionIndexes: map[]\n\tSchema: test\n\tGlobal Table: t\n\tGlobal TableID: 104\n\tTableStatsVer: 2\n\tChangePercentage: 0.800000\n\tTableSize: 75000.00\n\tLastAnalysisDuration: 12m32.998s\n\tWeight: 1.193767\n"]
[2024/07/30 15:32:09.120 +08:00] [INFO] [analyze.go:408] ["use single session to save analyze results"]

@Rustin170506 (Member, Author) left a comment:

🔢 Self-check (PR reviewed by myself and ready for feedback.)


codecov bot commented Jul 30, 2024

Codecov Report

Attention: Patch coverage is 54.54545% with 10 lines in your changes missing coverage. Please review.

Project coverage is 74.0203%. Comparing base (199bf90) to head (c2d1c7a).
Report is 8 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #55046        +/-   ##
================================================
+ Coverage   72.6158%   74.0203%   +1.4044%     
================================================
  Files          1565       1565                
  Lines        440229     442538      +2309     
================================================
+ Hits         319676     327568      +7892     
+ Misses       100711      94898      -5813     
- Partials      19842      20072       +230     
Flag          Coverage Δ
integration   47.0636% <54.5454%> (?)

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
dumpling      52.9567% <ø> (ø)
parser        ∅ <ø> (∅)
br            46.3346% <ø> (+0.4856%) ⬆️

@Rustin170506 Rustin170506 requested a review from elsa0520 July 30, 2024 10:46
@Rustin170506 (Member, Author) left a comment:

🔢 Self-check (PR reviewed by myself and ready for feedback.)


Rustin170506 commented Jul 31, 2024

Tested locally:

  1. Create a TiDB cluster:
tiup playground nightly --db.binpath /Volumes/t7/code/tidb/bin/tidb-server
  2. Create a partitioned table with 21 partitions (p0 through p20):
#!/usr/bin/env -S cargo +nightly -Zscript
---cargo
[dependencies]
clap = { version = "4.2", features = ["derive"] }
sqlx = { version = "0.7", features = ["runtime-tokio-rustls", "mysql"] }
tokio = { version = "1", features = ["full"] }
fake = { version = "2.5", features = ["derive"] }
---

use clap::Parser;
use fake::{Fake, Faker};
use sqlx::mysql::MySqlPoolOptions;

#[derive(Parser, Debug)]
#[clap(version)]
struct Args {
    #[clap(short, long, help = "MySQL connection string")]
    database_url: String,
}

#[derive(Debug)]
struct TableRow {
    partition_key: u32,
    column1: String,
    column2: i32,
    column3: i32,
    column4: String,
}

#[tokio::main]
async fn main() -> Result<(), sqlx::Error> {
    let args = Args::parse();

    let pool = MySqlPoolOptions::new()
        .max_connections(5)
        .connect(&args.database_url)
        .await?;

    // Create partitioned table if not exists
    sqlx::query(
        "CREATE TABLE IF NOT EXISTS t (
            partition_key INT NOT NULL,
            column1 VARCHAR(255) NOT NULL,
            column2 INT NOT NULL,
            column3 INT NOT NULL,
            column4 VARCHAR(255) NOT NULL
        ) PARTITION BY RANGE (partition_key) (
            PARTITION p0 VALUES LESS THAN (3000),
            PARTITION p1 VALUES LESS THAN (6000),
            PARTITION p2 VALUES LESS THAN (9000),
            PARTITION p3 VALUES LESS THAN (12000),
            PARTITION p4 VALUES LESS THAN (15000),
            PARTITION p5 VALUES LESS THAN (18000),
            PARTITION p6 VALUES LESS THAN (21000),
            PARTITION p7 VALUES LESS THAN (24000),
            PARTITION p8 VALUES LESS THAN (27000),
            PARTITION p9 VALUES LESS THAN (30000),
            PARTITION p10 VALUES LESS THAN (33000),
            PARTITION p11 VALUES LESS THAN (36000),
            PARTITION p12 VALUES LESS THAN (39000),
            PARTITION p13 VALUES LESS THAN (42000),
            PARTITION p14 VALUES LESS THAN (45000),
            PARTITION p15 VALUES LESS THAN (48000),
            PARTITION p16 VALUES LESS THAN (51000),
            PARTITION p17 VALUES LESS THAN (54000),
            PARTITION p18 VALUES LESS THAN (57000),
            PARTITION p19 VALUES LESS THAN (60000),
            PARTITION p20 VALUES LESS THAN (63000)
        )"
    )
    .execute(&pool)
    .await?;

    // Insert 3000 rows into each of the 20 partitions
    for partition in 1..=20 {
        let partition_key = partition * 3000 + 1; // lands in partition p{partition} (e.g. 3001 -> p1)

        for _ in 0..3000 {
            let row = TableRow {
                partition_key, // Use the current partition key
                column1: Faker.fake::<String>(),
                column2: Faker.fake::<i32>(),
                column3: Faker.fake::<i32>(),
                column4: Faker.fake::<String>(),
            };

            sqlx::query(
                "INSERT INTO t (partition_key, column1, column2, column3, column4) 
                VALUES (?, ?, ?, ?, ?)"
            )
            .bind(row.partition_key)
            .bind(&row.column1)
            .bind(row.column2)
            .bind(row.column3)
            .bind(&row.column4)
            .execute(&pool)
            .await?;
        }

        println!("Successfully inserted 3000 rows into partition {} of the 't' table.", partition);
    }

    Ok(())
}
  3. Set tidb_analyze_partition_concurrency to 30.
  4. Analyze the table:
mysql> analyze table t;
Query OK, 0 rows affected, 23 warnings (0.44 sec)

mysql> show warnings;
+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                                                                                                                                  |
+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1105 | No predicate column has been collected yet for table test.t, so only indexes and the columns composing the indexes will be analyzed                                                                                                                      |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p0, reason to use this rate is "TiDB assumes that the table is empty, use sample-rate=1"                                                                                     |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p1, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p2, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p3, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p4, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p5, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p6, reason to use this rate is "use min(1, 110000/3020) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p7, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p8, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p9, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                                |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p10, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p11, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p12, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p13, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p14, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p15, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p16, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p17, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p18, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p19, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Note    | 1105 | Analyze use auto adjusted sample rate 1.000000 for table test.t's partition p20, reason to use this rate is "use min(1, 110000/3000) as the sample-rate=1"                                                                                               |
| Warning | 1105 | Insufficient sessions to save analyze results. Consider increasing the 'analyze-partition-concurrency-quota' configuration to improve analyze performance. This value should typically be greater than or equal to 'tidb_analyze_partition_concurrency'. |
+---------+------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
23 rows in set (0.01 sec)
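The final warning points at the analyze-partition-concurrency-quota item in the TiDB server configuration file. A sketch of raising it to match the session variable (section name follows TiDB's config layout; verify the exact location and default against the documentation for your version):

```toml
[performance]
# Should typically be >= tidb_analyze_partition_concurrency; otherwise
# analyze falls back to fewer save sessions and emits the warning above.
analyze-partition-concurrency-quota = 30
```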

@Rustin170506 (Member, Author) left a comment:

🔢 Self-check (PR reviewed by myself and ready for feedback.)

@ti-chi-bot ti-chi-bot bot added approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 31, 2024
@elsa0520 (Contributor) left a comment:

LGTM


ti-chi-bot bot commented Jul 31, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: elsa0520, time-and-fate

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [elsa0520,time-and-fate]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Jul 31, 2024

ti-chi-bot bot commented Jul 31, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-07-31 08:03:13.499793665 +0000 UTC m=+342909.779841727: ☑️ agreed by time-and-fate.
  • 2024-07-31 08:03:46.197744772 +0000 UTC m=+342942.477793038: ☑️ agreed by elsa0520.

@Rustin170506 Rustin170506 added the sig/planner SIG: Planner label Jul 31, 2024
@ti-chi-bot ti-chi-bot bot merged commit 73e3425 into pingcap:master Jul 31, 2024
23 checks passed
@Rustin170506 Rustin170506 deleted the rustin-patch-debt branch July 31, 2024 10:35