-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: Prefer index with zone constraints that most closely match locality #35306
Conversation
Awesome! People are going to be really excited about this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)
pkg/ccl/logictestccl/testdata/logic_test/zone, line 8 at r2 (raw file):
statement ok CREATE TABLE t ( k INT PRIMARY KEY,
[nit] avoid tabs
pkg/sql/opt/cat/zone.go, line 67 at r1 (raw file):
// IsRequired is true if this is a required constraint, or false if this is // a prohibited constraint (signified by initial + or - character). IsRequired() bool
[nit] This method is not ideal because "not required" doesn't in general imply "prohibited"; but I don't have a good suggestion (an enum is probably overkill).
pkg/sql/opt/cat/zone.go, line 76 at r1 (raw file):
} // FormatZone nicely formats a catalog table using a treeprinter for
[nit] a zone
pkg/sql/opt/testutils/opttester/opt_tester.go, line 237 at r2 (raw file):
// // - locality: used to set the locality of the node that plans the query. This // can effect costing when there are multiple possible indexes to choose
[nit] affect
pkg/sql/opt/testutils/testcat/set_zone_config.go, line 29 at r1 (raw file):
// // Supported commands: // - INJECT STATISTICS: imports table statistics from a JSON object.
[nit] copy-pasta
pkg/sql/opt/xform/coster.go, line 105 at r2 (raw file):
randIOCostFactor = 4 // latencyCostFactor represents the throughput impact of doing scans on index
[nit] on indexes / on an index
pkg/sql/opt/xform/coster.go, line 491 at r2 (raw file):
// indicating 0% and 1.0 indicating 100%. In order to match, each successive // locality tier must match at least one REQUIRED constraint and not match any // PROHIBITED constraints. If a locality tier does not match, then tiers after
[nit] I'd add "Locality tiers are hierarchical, so if a tier does not match ..". Perhaps mention that the constraints themselves are not assumed to be in any particular order.
pkg/sql/opt/xform/coster.go, line 547 at r2 (raw file):
} return 1.0 * float64(matchCount) / float64(len(c.locality.Tiers))
[nit] 1.0 *
does nothing
pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):
// because that is the amount of data that we could potentially transfer over // the network. return memo.Cost(numCols+numScannedCols) * costFactor
It's a little strange that the latency cost scales with the number of columns.
pkg/sql/opt/xform/testdata/coster/zone, line 3 at r2 (raw file):
exec-ddl CREATE TABLE abc ( a INT PRIMARY KEY,
[nit] tabs
pkg/sql/opt/xform/testdata/coster/zone, line 27 at r2 (raw file):
exec-ddl CREATE TABLE xy ( x INT PRIMARY KEY,
[nit] tabs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/cat/zone.go, line 67 at r1 (raw file):
Previously, RaduBerinde wrote…
[nit] This method is not ideal because "not required" doesn't in general imply "prohibited"; but I don't have a good suggestion (an enum is probably overkill).
Yeah, I went back and forth on that, including just using the raw structs from the config
package (in which case we'd use the existing enum). I don't really have a strong preference, but did dislike creating an identical enum...
pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):
Previously, RaduBerinde wrote…
It's a little strange that the latency cost scales with the number of columns.
I'd argue that it may not even scale with the number of rows, in terms of impact on throughput. I figured I'd keep it simple for now, and just let it no more than double the CPU cost of a scan. I think we need more thinking and research to figure out a better approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
I'd argue that it may not even scale with the number of rows, in terms of impact on throughput. I figured I'd keep it simple for now, and just let it no more than double the CPU cost of a scan. I think we need more thinking and research to figure out a better approach.
My other thought is that amount of data sent over longer stretches of the network may indeed impact overall system throughput. In that case, fatter rows would matter.
bors r+ |
Build failed |
Add a Zone method to the cat.Index interface that associates it with the zone it belongs to. Add new zone-related interfaces to the cat package. Implement the new interfaces and methods in both opt_catalog.go and test_catalog.go. Release note: None
Customers can create multiple indexes that are identical, except that they have different locality constraints. This commit teaches the optimizer to prefer the index that most closely matches the locality of the gateway node that is planning the query. This enables scenarios where reference data like a zip code table can be replicated to different regions, and queries will use the copy in the same region. Release note: None
bors r+ |
35306: opt: Prefer index with zone constraints that most closely match locality r=andy-kimball a=andy-kimball Customers can create multiple indexes that are identical, except that they have different locality constraints. This commit teaches the optimizer to prefer the index that most closely matches the locality of the gateway node that is planning the query. This enables scenarios where reference data like a zip code table can be replicated to different regions, and queries will use the copy in the same region. Co-authored-by: Andrew Kimball <[email protected]>
Build succeeded |
Customers can create multiple indexes that are identical, except that they
have different locality constraints. This commit teaches the optimizer to
prefer the index that most closely matches the locality of the gateway node
that is planning the query. This enables scenarios where reference data like
a zip code table can be replicated to different regions, and queries will
use the copy in the same region.