*: refactor cost model formulas and constants #10581
/rebuild
Codecov Report
@@ Coverage Diff @@
## master #10581 +/- ##
================================================
- Coverage 81.4101% 81.2307% -0.1795%
================================================
Files 426 426
Lines 92513 92028 -485
================================================
- Hits 75315 74755 -560
- Misses 11826 11904 +78
+ Partials 5372 5369 -3
/bench
5c4dfa9 to b97ea8e
/rebuild
/run-all-tests
/run-common-test tidb-test=pr/840
/run-all-tests tidb-test=pr/840
LGTM.
f949384 to 0617428
colHist, ok := coll.Columns[col.UniqueID]
// Normally this would not happen, it is for compatibility with old version stats which
// does not include TotColSize.
if !ok || (colHist.TotColSize == 0 && (colHist.NullCount != coll.Count)) {
We can calculate `(colHist.TotColSize == 0 && (colHist.NullCount != coll.Count))` once, outside the `for` loop.
We need a valid `colHist` to perform this check. If we move the check outside the `for` loop, the code becomes pretty ugly.
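To illustrate why the check depends on each column's `colHist`, here is a minimal sketch of such a loop. The types `columnStats` and `tableStats`, the function `avgRowSize`, the fallback width `pseudoColSize`, and the empty-table guard are hypothetical simplifications for illustration, not the actual TiDB definitions.

```go
package main

// columnStats and tableStats are simplified, hypothetical stand-ins for the
// statistics structures referenced in the diff.
type columnStats struct {
	TotColSize int64 // total size in bytes of all values in the column
	NullCount  int64 // number of NULL values in the column
}

type tableStats struct {
	Count   int64 // number of rows in the table
	Columns map[int64]*columnStats
}

// pseudoColSize is an assumed fallback width (in bytes) used when a column
// has no usable size statistics.
const pseudoColSize = 8.0

// avgRowSize sums the average width of the requested columns. The fallback
// check needs the per-column colHist, so it has to run inside the loop.
func avgRowSize(coll *tableStats, colIDs []int64) float64 {
	if coll.Count == 0 {
		// Assumption: with no rows, fall back to the pseudo width everywhere.
		return pseudoColSize * float64(len(colIDs))
	}
	size := 0.0
	for _, id := range colIDs {
		colHist, ok := coll.Columns[id]
		// Missing stats, or old-version stats without TotColSize: a zero
		// TotColSize is only trustworthy when every value is NULL.
		if !ok || (colHist.TotColSize == 0 && colHist.NullCount != coll.Count) {
			size += pseudoColSize
			continue
		}
		size += float64(colHist.TotColSize) / float64(coll.Count)
	}
	return size
}
```

In this sketch, a column whose values are all NULL legitimately contributes zero bytes, while a column with a zero `TotColSize` and some non-NULL values is treated as lacking size statistics.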
copTask := &copTask{
	tablePlan: ts,
	indexPlanFinished: true,
	cst: scanFactor * rowSize * 1.0,
How about replacing `1.0` with `ts.stats.RowCount`? That would be much clearer.
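The suggestion can be sketched as follows, assuming the literal `1.0` stands for the expected number of rows scanned. The helper `tableScanCost` and its parameter names are hypothetical and only serve to name the third operand of the product.

```go
package main

// tableScanCost is a hypothetical helper: it spells out the product from the
// diff with the row count as a named operand instead of the literal 1.0.
func tableScanCost(scanFactor, rowSize, rowCount float64) float64 {
	return scanFactor * rowSize * rowCount
}
```

With `rowCount` equal to 1, the result is identical to `scanFactor * rowSize * 1.0`; the named operand only documents what the literal means.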
rCount := rTask.count()
if len(p.RightConditions) > 0 {
	cpuCost += lCount * rCount * cpuFactor
	rCount *= selectionFactor
Maybe `rCount` is incorrect when we can use an index scan on the inner-side table, in which case the scan range is decided by the correlated outer-side join key.
But we cannot know the selectivity of the outer key until execution.
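For reference, the right-condition step quoted in the diff can be isolated as a small function. This is only a sketch: `applyRightConditions` is a hypothetical name, and `cpuFactor` and `selectionFactor` are passed as parameters because their actual values are not shown in this conversation.

```go
package main

// applyRightConditions is a hypothetical helper mirroring the quoted hunk.
// If the join has conditions on the right (inner) side, every outer row pays
// a CPU cost for filtering rCount inner rows, and the inner row count that
// survives the filter is then scaled down by a fixed selectionFactor.
func applyRightConditions(cpuCost, lCount, rCount float64, numRightConds int,
	cpuFactor, selectionFactor float64) (newCPUCost, newRCount float64) {
	if numRightConds > 0 {
		cpuCost += lCount * rCount * cpuFactor
		rCount *= selectionFactor
	}
	return cpuCost, rCount
}
```

The fixed `selectionFactor` reflects the point made in the reply: the selectivity induced by the outer join key is unknown at planning time, so the estimate does not depend on it.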
cpuCost += probeCost + (innerConcurrency+1.0)*concurrencyFactor
// Memory cost of hash tables for inner rows. The computed result is the upper bound,
// since the executor is pipelined and not all workers are always in full load.
memoryCost := innerConcurrency * (batchSize * distinctFactor) * innerCnt * memoryFactor
Should we consider the average row size for each inner row?
A row in memory may have a different size from its representation on disk or on the network. Currently, we use a very small default `memoryFactor` so that the planner chooses the fastest plan, one that makes full use of resources. To make the cost model friendly to memory management, we do need to consider row size here. Can we leave this to a separate PR?
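As a sketch of the two options discussed here, the first function below reproduces the formula from the diff, and the second shows one possible way to weight it by row size. `indexJoinMemoryCostWithRowSize` and its `avgRowSize` parameter are hypothetical and not part of this PR; they only illustrate what the deferred follow-up might look like.

```go
package main

// indexJoinMemoryCost mirrors the formula quoted in the diff. It is an upper
// bound: it assumes every inner worker holds a full hash table at once.
func indexJoinMemoryCost(innerConcurrency, batchSize, distinctFactor,
	innerCnt, memoryFactor float64) float64 {
	return innerConcurrency * (batchSize * distinctFactor) * innerCnt * memoryFactor
}

// indexJoinMemoryCostWithRowSize is a hypothetical variant that also scales
// the cost by an assumed in-memory average row size (in bytes).
func indexJoinMemoryCostWithRowSize(innerConcurrency, batchSize, distinctFactor,
	innerCnt, memoryFactor, avgRowSize float64) float64 {
	return indexJoinMemoryCost(innerConcurrency, batchSize, distinctFactor,
		innerCnt, memoryFactor) * avgRowSize
}
```

Because the in-memory row size differs from the on-disk and network sizes, `avgRowSize` here would need its own estimate rather than reusing the scan-side row size.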
/rebuild
LGTM
/run-all-tests
@eurekaka merge failed.
/run-all-tests tidb-test=pr/840
What problem does this PR solve?
Our current cost model is too naive to pick out the physical plans we prefer in some scenarios, for example:
Besides, cost computation is not uniform across operators: some operators consider memory cost and others do not; some consider operator parallelism and others do not.
What is changed and how it works?
This PR tries to
Check List
Tests
Code changes
Side effects
Related changes