privilege, domain: reduce the memory jitter of privilege reload activity for 2M users (#59487) #59812

ti-chi-bot · 2025-02-27T06:54:56Z

This is an automated cherry-pick of #59487

What problem does this PR solve?

Issue Number: close #59403, ref #55563

Problem Summary:

I created 2M users and made a portion of them, for example 10% or 50%, active (in-memory).

I then observed that even after the workload is gone, the tidb-server memory usage jitters periodically.
For example, this one:

What changed and how does it work?

There are several changes.

Before this PR, loadAll() is used when the active user count is greater than 1024. That is the direct root cause of the jitter.

That's because loadSomeUsers() does not support too many filter conditions.
The SQL "select * from user where user = 'a' or user = 'b' or user = 'c' or ..." works poorly when there are too many or conditions. This is a known issue, #43885: the code is written in a recursive way and causes a stack overflow.

So the first change is to enhance loadSomeUsers() to support unlimited user count.

It works like this:

if user count <= 1024, use the 'or user = xx' filter condition to construct the SQL
otherwise, use the load-all SQL but apply the 'user = xx' filter condition in user space.

Use the streaming SQLExecutor.ExecuteInternal() API in place of the RestrictedSQLExec.ExecRestrictedSQL() API.

The problem with ExecRestrictedSQL is that its API design does not fit here.
Its drainRecordSet returns a []chunk.Row as the result, which here can be a huge array of 2M rows.
What we need is a streaming API that applies the filter condition while reading, rather than taking the whole data set and filtering it afterwards.
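
To illustrate the difference, here is a sketch of filtering while streaming. It does not use the real TiDB types: Row, RowSource, sliceSource and streamFilter are hypothetical stand-ins, and the batch-at-a-time Next method is only an assumption about what a streaming record set looks like.

```go
package main

// Row is a hypothetical stand-in for one decoded row of the user table.
type Row struct {
	User string
	Host string
}

// RowSource is a hypothetical streaming interface: Next fills buf with up to
// len(buf) rows and returns how many were written; 0 means end of data.
type RowSource interface {
	Next(buf []Row) (int, error)
}

// streamFilter reads one batch at a time and hands matching rows to keep.
// Only one batch is alive at any moment, so peak memory is bounded by
// batchSize plus whatever keep retains, not by the table size.
func streamFilter(src RowSource, wanted map[string]struct{}, batchSize int, keep func(Row)) error {
	buf := make([]Row, batchSize)
	for {
		n, err := src.Next(buf)
		if err != nil {
			return err
		}
		if n == 0 {
			return nil
		}
		for _, r := range buf[:n] {
			if _, ok := wanted[r.User]; ok {
				keep(r)
			}
		}
	}
}

// sliceSource is a tiny in-memory RowSource used only to exercise the sketch.
type sliceSource struct {
	rows []Row
	pos  int
}

func (s *sliceSource) Next(buf []Row) (int, error) {
	n := copy(buf, s.rows[s.pos:])
	s.pos += n
	return n, nil
}
```

With a drain-everything API the caller would hold all 2M rows before the first filter check; here rows that do not match are dropped as soon as their batch has been scanned.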

Use deep copy in the loadTable decode function.

I suspect there is a leak like #59403: the decode function may be using a shallow copy that references the chunk data,
so the whole chunk cannot be freed.
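
The suspected mechanism can be shown with plain Go slices. This is a sketch of the general aliasing behavior, not of the actual decode function: the chunk type and both accessors are hypothetical.

```go
package main

// chunk is a hypothetical stand-in for a batch buffer that stores the bytes
// of many rows back to back.
type chunk struct {
	data []byte
}

// shallowBytes returns a sub-slice of the chunk buffer. The result shares the
// backing array, so keeping it alive keeps the entire buffer reachable.
func (c *chunk) shallowBytes(off, n int) []byte {
	return c.data[off : off+n]
}

// deepBytes copies the requested bytes into a fresh allocation. The result
// holds no reference to the chunk, so the chunk can be collected once the
// reader is done with it.
func (c *chunk) deepBytes(off, n int) []byte {
	return append([]byte(nil), c.data[off:off+n]...)
}
```

Aliasing is observable directly: after writing to c.data, a value obtained from shallowBytes changes while one obtained from deepBytes does not. The same sharing is what would keep a whole chunk alive when only a few decoded fields are cached.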

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)

The memory usage now:

The privilege reload runs every 10 minutes, and you can see that the maximum memory usage is much lower than before:

No need to test
- I checked and no code files have been changed.

Side effects

Performance regression: Consumes more CPU
Performance regression: Consumes more Memory
Breaking backward compatibility

Documentation

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot · 2025-02-27T06:54:59Z

@tiancaiamao This PR has conflicts, so I have put it on hold.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

ti-chi-bot · 2025-02-27T06:55:05Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign wjhuang2016 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS
pkg/domain/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2025-02-27T07:03:55Z

@ti-chi-bot: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| idc-jenkins-ci-tidb/unit-test | `ad6455a` | link | true | `/test unit-test` |
| idc-jenkins-ci-tidb/check_dev_2 | `ad6455a` | link | true | `/test check-dev2` |
| idc-jenkins-ci-tidb/check_dev | `ad6455a` | link | true | `/test check-dev` |
| idc-jenkins-ci-tidb/mysql-test | `ad6455a` | link | true | `/test mysql-test` |
| idc-jenkins-ci-tidb/build | `ad6455a` | link | true | `/test build` |
| pull-br-integration-test | `ad6455a` | link | true | `/test pull-br-integration-test` |


tiancaiamao · 2025-02-27T11:17:40Z

Should not cherry-pick 8.5

This is an automated cherry-pick of pingcap#59487

ad6455a

Signed-off-by: ti-chi-bot <[email protected]>

ti-chi-bot mentioned this pull request Feb 27, 2025

privilege, domain: reduce the memory jitter of privilege reload activity for 2M users #59487

Merged

13 tasks

ti-chi-bot assigned tiancaiamao Feb 27, 2025

ti-chi-bot bot added do-not-merge/cherry-pick-not-approved cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Feb 27, 2025

tiancaiamao closed this Feb 27, 2025

ti-chi-bot bot added do-not-merge/cherry-pick-not-approved cherry-pick-approved Cherry pick PR approved by release team. and removed cherry-pick-approved Cherry pick PR approved by release team. do-not-merge/cherry-pick-not-approved labels Mar 4, 2025
