privilege, domain: reduce the memory jitter of privilege reload activity for 2M users #59487

Merged: 36 commits, Feb 27, 2025

Contributor

@tiancaiamao tiancaiamao commented Feb 12, 2025

What problem does this PR solve?

Issue Number: close #59403, ref #55563

Problem Summary:

I created 2M users and made, for example, 10% or 50% of them active (loaded in memory).

I then observed that even after the workload is gone, the tidb-server memory usage jitters periodically.
For example:

image

What changed and how does it work?

There are several changes.

  1. Before this PR, loadAll() is used when the active user count > 1024 ... that's the direct root cause of the jitter.

That's because loadSomeUsers() does not support too many filter conditions.
The SQL "select * from user where user = 'a' or user = 'b' or user = 'c' or ..." performs poorly when there are too many OR conditions. This is a known issue (#43885): we wrote the code recursively, which causes a stack overflow.

So the first change is to enhance loadSomeUsers() to support an unlimited user count.

It works like this:

  • if user count <= 1024, use the 'or user = xx' filter condition to construct the SQL
  • otherwise, use the load-all SQL but apply the 'user = xx' filter condition in user space.
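
The two branches can be sketched roughly as follows. This is a minimal illustration and not the PR's code: it assumes the OR filter is used for small lists (at most 1024 names) and load-all-then-filter for larger ones. `buildUserQuery`, `quote`, `baseQuery`, and `maxOrConditions` are hypothetical names introduced for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// maxOrConditions mirrors the 1024 threshold from the description (assumed boundary).
const maxOrConditions = 1024

// baseQuery is a placeholder; the real loader's SQL may differ.
const baseQuery = "SELECT * FROM mysql.user"

// quote is a toy escaper for this sketch only; real code should rely on the
// SQL layer's own escaping or parameter binding.
func quote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// buildUserQuery (hypothetical) returns the SQL to run plus an optional set of
// wanted names. A nil set means the SQL already filters; a non-nil set means
// the caller must filter rows in user space while reading them.
func buildUserQuery(users []string) (string, map[string]struct{}) {
	if len(users) == 0 {
		return "", nil // nothing to load
	}
	if len(users) <= maxOrConditions {
		conds := make([]string, 0, len(users))
		for _, u := range users {
			conds = append(conds, "user = "+quote(u))
		}
		return baseQuery + " WHERE " + strings.Join(conds, " OR "), nil
	}
	wanted := make(map[string]struct{}, len(users))
	for _, u := range users {
		wanted[u] = struct{}{}
	}
	return baseQuery, wanted
}

func main() {
	sql, wanted := buildUserQuery([]string{"a", "b"})
	fmt.Println(sql, wanted == nil)
}
```

Returning the set alongside the SQL keeps the decision in one place, so the row-reading loop only needs to check whether a set was provided.
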
  2. Use the SQLExecutor.ExecuteInternal() streaming API instead of the RestrictedSQLExec.ExecRestrictedSQL() API

The problem with ExecRestrictedSQL is that its API design does not fit here.
Its drainRecordSet returns []chunk.Row as the result, which here can be a huge array of 2M rows.
What we need is a streaming API that applies the filter condition while reading, rather than materializing the whole data set and filtering it afterwards.
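
To illustrate the difference, here is a minimal sketch of filtering while streaming, so that only one reusable batch is held in memory at a time. It does not use TiDB's real types: `row`, `batchReader`, `sliceReader`, and `streamFilter` are hypothetical stand-ins for a record set that is read batch by batch.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one decoded record.
type row struct{ user, host string }

// batchReader is a hypothetical stand-in for a streaming result set.
// NextBatch reuses buf's storage and returns the rows read; an empty
// result means the stream is exhausted.
type batchReader interface {
	NextBatch(buf []row) []row
}

// sliceReader serves rows from memory so the sketch can run on its own.
type sliceReader struct {
	rows []row
	pos  int
}

func (r *sliceReader) NextBatch(buf []row) []row {
	buf = buf[:0]
	for r.pos < len(r.rows) && len(buf) < cap(buf) {
		buf = append(buf, r.rows[r.pos])
		r.pos++
	}
	return buf
}

// streamFilter reads one batch at a time and passes only matching rows to
// emit, so the full result set is never held as one large slice.
func streamFilter(src batchReader, batchSize int, keep func(row) bool, emit func(row)) {
	buf := make([]row, 0, batchSize)
	for {
		batch := src.NextBatch(buf)
		if len(batch) == 0 {
			return
		}
		for _, r := range batch {
			if keep(r) {
				emit(r)
			}
		}
	}
}

func main() {
	src := &sliceReader{rows: []row{{"a", "%"}, {"b", "%"}, {"c", "%"}}}
	wanted := map[string]struct{}{"a": {}, "c": {}}
	streamFilter(src, 2,
		func(r row) bool { _, ok := wanted[r.user]; return ok },
		func(r row) { fmt.Println(r.user, r.host) })
}
```

With this shape, peak memory is bounded by the batch size plus the rows that pass the filter, rather than by the size of the whole table.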

  3. Use deep copy in the loadTable decode function.

I suspect there is a leak like #59403: the decode function may be using a shallow copy that still references the chunk data,
so the whole chunk cannot be freed.
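
The general Go behaviour behind this suspicion can be shown with a small sketch. It is not the loadTable code: `chunkBuf`, `decodeShallow`, and `decodeDeep` are hypothetical names, and the byte buffer stands in for a chunk's column data. A sub-slice shares the backing array of the buffer it was cut from, so holding even a small field keeps the whole buffer reachable, while a copy does not.

```go
package main

import "fmt"

// chunkBuf is a hypothetical stand-in for a chunk's underlying column buffer.
type chunkBuf struct{ data []byte }

// decodeShallow returns a sub-slice that shares the chunk's backing array.
// As long as the result is referenced, the whole buffer stays reachable.
func decodeShallow(c *chunkBuf, off, n int) []byte {
	return c.data[off : off+n]
}

// decodeDeep copies the bytes into fresh storage, so the chunk's buffer can
// be collected once the chunk itself is no longer referenced.
func decodeDeep(c *chunkBuf, off, n int) []byte {
	return append([]byte(nil), c.data[off:off+n]...)
}

func main() {
	c := &chunkBuf{data: []byte("alice-bob")}
	shallow := decodeShallow(c, 0, 5)
	deep := decodeDeep(c, 0, 5)

	// Mutating the buffer shows which result still points into it.
	c.data[0] = 'A'
	fmt.Println(string(shallow), string(deep)) // prints "Alice alice"
}
```

The deep copy costs one extra allocation per field, which is the trade-off for letting large buffers be released.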

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)

The memory usage now:

image

The privilege reload runs every 10 minutes, and you can see that the peak memory usage is much lower than before:

image

  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


ti-chi-bot bot commented Feb 12, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/invalid-title do-not-merge/needs-linked-issue do-not-merge/needs-tests-checked do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Feb 12, 2025

tiprow bot commented Feb 12, 2025

Hi @tiancaiamao. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot bot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 12, 2025
@tiancaiamao
Contributor Author

/test pull-br-integration-test


tiprow bot commented Feb 17, 2025

@tiancaiamao: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test pull-br-integration-test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@tiancaiamao
Contributor Author

/retest

@tiancaiamao
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 17, 2025
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 25, 2025

ti-chi-bot bot commented Feb 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-02-15 09:25:24.867288698 +0000 UTC m=+694167.263510785: ☑️ agreed by CbcWestwolf.
  • 2025-02-25 05:54:27.451768251 +0000 UTC m=+335215.404926517: ☑️ agreed by lcwangchao.

@tiancaiamao
Contributor Author

/test check-dev2

@tiancaiamao
Contributor Author

/test check-dev2

@tiancaiamao
Contributor Author

/test check-dev2

@ti-chi-bot ti-chi-bot bot added the needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. label Feb 27, 2025
Contributor

@lance6716 lance6716 left a comment

/approve

for domain part


ti-chi-bot bot commented Feb 27, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CbcWestwolf, lance6716, lcwangchao

@ti-chi-bot ti-chi-bot bot added the approved label Feb 27, 2025
@ti-chi-bot ti-chi-bot bot merged commit b61e0e1 into pingcap:master Feb 27, 2025
24 checks passed
ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Feb 27, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #59812.
But this PR has conflicts, please resolve them!

Labels
approved lgtm needs-cherry-pick-release-8.5 Should cherry pick this PR to release-8.5 branch. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Memory leak of LoadPrivilegeLoop?
5 participants