-
-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge joins populate join stats #2247
Conversation
335d4c7
to
2a4324a
Compare
Joins with a merge join option and table statistics for both merge join indexes will use the histogram merging logic for estimating join cardinality.
211186b
to
8775f82
Compare
if grp == nil { | ||
return m.NewExprGroup(rel) | ||
grp = m.NewExprGroup(rel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be simplified as:
rel.Injective = isInjectiveMerge(rel, leftCompareExprs, rightCompareExprs)
if grp == nil {
return m.NewExprGroup(rel)
}
rel.CmpCnt = len(leftCompareExprs)
...
func floatMax(i, j float64) float64 { | ||
if i > j { | ||
return i | ||
func InterpolateNewCounts(from, to sql.Statistic) sql.Statistic { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a docstring for this funciton?
var mergeStats sql.Statistic | ||
var n RelExpr = rel | ||
var done bool | ||
for n != nil && !done { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this end up being O(N^2) for a group of N expressions? (Like, if it gets called for each expression as it gets added to the group, then the first will have 1 iteration, the second 2, and so on...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CardMemoGroups
is called once per join planning I think, so at most this should only traverse the group once. It could be shorter even, if we find an injective lookup there's no more info to check.
if lIdx, ok := n.Left.First.(*IndexScan); ok { | ||
leftChildStats = lIdx.Stats | ||
} | ||
if rIdx, ok := n.Right.First.(*IndexScan); ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these aren't index scans, we just don't use their stats?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge join children have index scans when we convert a filter into a range scan. If there are costed range scans, we either use the filtered index scan (if it's the same as the merge join), or apply the selectivity estimate from the range scan to the merge index buckets.
sql/memo/rel_props.go
Outdated
rightChildStats = rIdx.Stats | ||
} | ||
|
||
// single prefix always safe for merge join |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this comment elaborate? It's not clear how it relates to the following line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More of a todo maybe. I had a hard time finding how many index columns were equality prefix comparable between the two indexes, and recording MergeJoin.CmpCnt
had some issues when I was testing. One is always safe so I went with that.
switch n := n.(type) { | ||
case *LookupJoin: | ||
injective = injective || n.Injective | ||
case *MergeJoin: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this section could use some more comments explaining why certain checks / assignments are done.
When merge joins are a join operator for a memo group, use the two indexes in the merge to estimate the join cardinality. Small updates so that join cardinality estimates work in the coster. A few tests that make use of join statistics. The tests are affected both by stat estimates and costing methodology. It's a bit hard to separate the two, since more accurate stat estimates so often identify issues with costing. The join statistic tests are subject to shifting based on whether the smallest table is estimated to be smaller than the smallest join cardinality estimate. Better tests would be less subject to noise. Tests for avoiding anti-patterns for specific join operators would also be useful.