privilege, domain: reduce the memory jitter of privilege reload activity for 2M users (#59487) #59812
This is an automated cherry-pick of #59487
What problem does this PR solve?
Issue Number: close #59403, ref #55563
Problem Summary:
I created 2M users and made, for example, 10% or 50% of them active (in-memory). Then I observed that even when the workload was gone, the tidb-server memory usage jittered periodically. For example:
(memory usage graph)
What changed and how does it work?
There are several changes.

`loadAll()` is used when the active user count is greater than 1024 ... and that is the direct root cause of the jitter. It is used because `loadSomeUsers()` does not support too many filter conditions: the SQL `select * from user where user = 'a' or user = 'b' or user = 'c' or ...` performs poorly when there are too many OR conditions. This is a known issue (#43885): we build the expression recursively, which can cause a stack overflow.
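For illustration, this is roughly the shape of the query builder that stops scaling. This is a hypothetical sketch, not the actual `loadSomeUsers()` code; the function name and escaping-free string formatting are made up for brevity:

```go
package sketch

import (
	"fmt"
	"strings"
)

// buildFilterSQL illustrates the old approach: one OR branch per user.
// With 2M users the statement is enormous, and TiDB's recursive
// expression handling (#43885) can overflow the stack on it.
func buildFilterSQL(users []string) string {
	var b strings.Builder
	b.WriteString("SELECT * FROM mysql.user WHERE ")
	for i, u := range users {
		if i > 0 {
			b.WriteString(" OR ")
		}
		fmt.Fprintf(&b, "user = '%s'", u) // 2M branches => unusable
	}
	return b.String()
}
```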
So the first change is to enhance `loadSomeUsers()` to support an unlimited user count. It works like this: use the `SQLExecutor.ExecuteInternal()` streaming API to replace the `RestrictedSQLExec.ExecRestrictedSQL()` API. The problem with `ExecRestrictedSQL` is that its API design does not fit here: its `drainRecordSet` returns a `[]chunk.Row` as the result, and here that can be a huge 2M-entry array. What we need is a streaming API that applies the filter condition while iterating, rather than materializing the whole data set and filtering it afterwards.
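Here is a minimal sketch of the streaming pattern. The types below are simplified stand-ins whose shape mirrors TiDB's `sqlexec.RecordSet` and `chunk.Chunk`; all names in this snippet are illustrative, not the actual code in the privilege package:

```go
package sketch

import (
	"context"
	"strings"
)

// Chunk is a simplified stand-in for chunk.Chunk: one batch of decoded rows.
type Chunk struct{ users []string }

func (c *Chunk) NumRows() int         { return len(c.users) }
func (c *Chunk) GetUser(i int) string { return c.users[i] }
func (c *Chunk) Reset()               { c.users = c.users[:0] }

// RecordSet is a simplified stand-in for the record set returned by
// SQLExecutor.ExecuteInternal. Next fills the chunk in place; a chunk
// with zero rows signals end of data.
type RecordSet interface {
	Next(ctx context.Context, chk *Chunk) error
	Close() error
}

// streamMatchingUsers drains the record set one chunk at a time, keeping
// only the rows whose user name is in wanted. Peak memory is one chunk
// plus the matches, instead of a []chunk.Row holding all 2M rows at once.
func streamMatchingUsers(ctx context.Context, rs RecordSet, wanted map[string]struct{}) ([]string, error) {
	defer rs.Close()
	var matched []string
	chk := &Chunk{}
	for {
		chk.Reset()
		if err := rs.Next(ctx, chk); err != nil {
			return nil, err
		}
		if chk.NumRows() == 0 {
			return matched, nil // fully drained
		}
		for i := 0; i < chk.NumRows(); i++ {
			u := chk.GetUser(i)
			if _, ok := wanted[u]; !ok {
				continue // filter while iterating, not afterwards
			}
			// Deep-copy the value out of the chunk so keeping the match
			// does not pin the whole chunk buffer (see the leak note below).
			matched = append(matched, strings.Clone(u))
		}
	}
}
```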
I also suspect there is a leak like #59403: the decode function may be using a shallow copy that references the chunk data, so the whole chunk cannot be freed.
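As a generic Go illustration of that retention hazard (not the actual decode path), a small sub-slice keeps the whole backing buffer reachable, while an explicit copy lets it be collected:

```go
package sketch

// retentionDemo shows how a small, long-lived value can pin a large buffer.
func retentionDemo() {
	chunkBuf := make([]byte, 64<<20) // pretend this is one decoded chunk

	// Shallow reference: shares the backing array, so as long as
	// `shallow` is reachable, the GC must keep all 64 MiB alive.
	shallow := chunkBuf[:16]

	// Deep copy: retains only 16 bytes; chunkBuf can be collected
	// once it goes out of scope.
	deep := append([]byte(nil), chunkBuf[:16]...)

	_, _ = shallow, deep
}
```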
Check List
Tests
The memory usage now: the privilege reload activity runs every 10 minutes, and you can see that the peak memory usage is much lower than before:
(memory usage graph)
Side effects
Documentation
Release note
Please refer to the Release Notes Language Style Guide to write a quality release note.