Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dxf: add tidb_max_dist_task_nodes to specify max node count #58937

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

tangenta
Copy link
Contributor

@tangenta tangenta commented Jan 15, 2025

What problem does this PR solve?

Issue Number: close #58944

Problem Summary: See #58944

What changed and how does it work?

  • Add system variable tidb_max_dist_task_nodes
  • Add new column max_node_count to mysql.tidb_global_task
  • Add simple tests for balance subtasks.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
    image
    mysql> admin show ddl jobs;
    +--------+---------+--------------------------+--------------+--------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+--------+-------------------------------+
    | JOB_ID | DB_NAME | TABLE_NAME               | JOB_TYPE     | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME                | START_TIME                 | END_TIME                   | STATE  | COMMENTS                      |
    +--------+---------+--------------------------+--------------+--------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+--------+-------------------------------+
    |    118 | test    | sbtest1                  | add index    | public       |         2 |      112 |  10000000 | 2025-02-10 03:09:51.852000 | 2025-02-10 03:09:51.852000 | 2025-02-10 03:10:06.053000 | synced | ingest, DXF, max_node_count=5 |
    |    117 | test    | sbtest1                  | drop index   | none         |         2 |      112 |         0 | 2025-02-10 03:09:48.003000 | 2025-02-10 03:09:48.052000 | 2025-02-10 03:09:48.052000 | synced |                               |
    |    116 | test    | sbtest1                  | add index    | public       |         2 |      112 |  10000000 | 2025-02-10 03:07:56.502000 | 2025-02-10 03:07:56.552000 | 2025-02-10 03:08:10.752000 | synced | ingest, DXF, max_node_count=3 |
    
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 15, 2025
@tangenta tangenta changed the title disttask: support tidb_max_dist_task_nodes to spcify node count for DXF disttask: support tidb_max_dist_task_nodes to spcify node count for DXF Jan 15, 2025
Copy link

tiprow bot commented Jan 15, 2025

Hi @tangenta. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Copy link

codecov bot commented Jan 15, 2025

Codecov Report

Attention: Patch coverage is 71.21212% with 19 lines in your changes missing coverage. Please review.

Project coverage is 74.7723%. Comparing base (d0d4876) to head (6762c1c).
Report is 53 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #58937        +/-   ##
================================================
+ Coverage   73.0457%   74.7723%   +1.7266%     
================================================
  Files          1689       1736        +47     
  Lines        467009     480976     +13967     
================================================
+ Hits         341130     359637     +18507     
+ Misses       104894      98766      -6128     
- Partials      20985      22573      +1588     
Flag Coverage Δ
integration 48.7782% <35.3846%> (?)
unit 72.2062% <71.2121%> (-0.0271%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.6910% <ø> (ø)
parser ∅ <ø> (∅)
br 60.3559% <ø> (+14.9248%) ⬆️

@tangenta tangenta changed the title disttask: support tidb_max_dist_task_nodes to spcify node count for DXF disttask: add tidb_max_dist_task_nodes to specify node count for DXF Jan 16, 2025
@D3Hunter D3Hunter changed the title disttask: add tidb_max_dist_task_nodes to specify node count for DXF dxf: add tidb_max_dist_task_nodes to specify max node count Jan 22, 2025
Copy link
Member

@CbcWestwolf CbcWestwolf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

return
}
doReentrantDDL(s, "ALTER TABLE mysql.tidb_global_task ADD COLUMN max_node_count INT AFTER `modify_params`;", infoschema.ErrColumnExists)
doReentrantDDL(s, "ALTER TABLE mysql.tidb_global_task_history ADD COLUMN max_node_count INT AFTER `modify_params`;", infoschema.ErrColumnExists)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specify the default value for this column?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default value should be zero or a large integer so that it does not affect the behavior of an existing task.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can specify the default value be 0, so there is no need to check nil for tasks before upgrade

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 22, 2025
Copy link

ti-chi-bot bot commented Jan 22, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-01-22 12:17:02.805148029 +0000 UTC m=+269550.136067433: ☑️ agreed by CbcWestwolf.

pkg/ddl/executor.go Outdated Show resolved Hide resolved
pkg/disttask/framework/proto/task.go Outdated Show resolved Hide resolved
return
}
doReentrantDDL(s, "ALTER TABLE mysql.tidb_global_task ADD COLUMN max_node_count INT AFTER `modify_params`;", infoschema.ErrColumnExists)
doReentrantDDL(s, "ALTER TABLE mysql.tidb_global_task_history ADD COLUMN max_node_count INT AFTER `modify_params`;", infoschema.ErrColumnExists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can specify the default value be 0, so there is no need to check nil for tasks before upgrade

pkg/disttask/framework/storage/converter.go Outdated Show resolved Hide resolved
@ti-chi-bot ti-chi-bot bot added component/dumpling This is related to Dumpling of TiDB. sig/planner SIG: Planner and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 7, 2025
@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 7, 2025
@tangenta tangenta force-pushed the disttask-max-node-cnt branch from cafcf08 to d47d494 Compare February 7, 2025 10:01
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 7, 2025
@tangenta tangenta removed component/dumpling This is related to Dumpling of TiDB. sig/planner SIG: Planner labels Feb 7, 2025
Copy link

ti-chi-bot bot commented Feb 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: CbcWestwolf
Once this PR has been reviewed and has the lgtm label, please assign bornchanger, gmhdbjd, yudongusa for approval, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 10, 2025
Copy link
Contributor

@D3Hunter D3Hunter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

29/35

pkg/session/bootstrap.go Outdated Show resolved Hide resolved
pkg/session/bootstrap.go Outdated Show resolved Hide resolved
pkg/ddl/executor.go Outdated Show resolved Hide resolved
pkg/disttask/framework/scheduler/balancer.go Outdated Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after change this might incorrectly takes some node as dead:

  • we have 4 nodes, in order of N1/N2.... task T have 2 subtasks, and they are scheduled to N3/N4 initially, and are running
  • when N1/N2 are available, this PR will choose to use them to schedule subtasks of T, and in L163(have the log dead node or not have enough slots, schedule subtasks away) of doBalanceSubtasks N3/N4 will be taken as dead node, and those running subtasks will be scheduled away

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support specifying DXF node count
3 participants