Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: Prefer index with zone constraints that most closely match locality #35306

Merged
merged 2 commits into from
Mar 5, 2019

Conversation

andy-kimball
Copy link
Contributor

Customers can create multiple indexes that are identical, except that they
have different locality constraints. This commit teaches the optimizer to
prefer the index that most closely matches the locality of the gateway node
that is planning the query. This enables scenarios where reference data like
a zip code table can be replicated to different regions, and queries will
use the copy in the same region.

@andy-kimball andy-kimball requested review from justinj, rytaft, RaduBerinde and a team March 1, 2019 07:27
@andy-kimball andy-kimball requested a review from a team as a code owner March 1, 2019 07:27
@andy-kimball andy-kimball requested review from a team March 1, 2019 07:27
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@danhhz
Copy link
Contributor

danhhz commented Mar 1, 2019

Awesome! People are going to be really excited about this

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Cool work!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @justinj, @RaduBerinde, and @rytaft)


pkg/ccl/logictestccl/testdata/logic_test/zone, line 8 at r2 (raw file):

statement ok
CREATE TABLE t (
	k INT PRIMARY KEY,

[nit] avoid tabs


pkg/sql/opt/cat/zone.go, line 67 at r1 (raw file):

	// IsRequired is true if this is a required constraint, or false if this is
	// a prohibited constraint (signified by initial + or - character).
	IsRequired() bool

[nit] This method is not ideal because "not required" doesn't in general imply "prohibited"; but I don't have a good suggestion (an enum is probably overkill).


pkg/sql/opt/cat/zone.go, line 76 at r1 (raw file):

}

// FormatZone nicely formats a catalog table using a treeprinter for

[nit] a zone


pkg/sql/opt/testutils/opttester/opt_tester.go, line 237 at r2 (raw file):

//
//  - locality: used to set the locality of the node that plans the query. This
//    can effect costing when there are multiple possible indexes to choose

[nit] affect


pkg/sql/opt/testutils/testcat/set_zone_config.go, line 29 at r1 (raw file):

//
// Supported commands:
//  - INJECT STATISTICS: imports table statistics from a JSON object.

[nit] copy-pasta


pkg/sql/opt/xform/coster.go, line 105 at r2 (raw file):

	randIOCostFactor = 4

	// latencyCostFactor represents the throughput impact of doing scans on index

[nit] on indexes / on an index


pkg/sql/opt/xform/coster.go, line 491 at r2 (raw file):

// indicating 0% and 1.0 indicating 100%. In order to match, each successive
// locality tier must match at least one REQUIRED constraint and not match any
// PROHIBITED constraints. If a locality tier does not match, then tiers after

[nit] I'd add "Locality tiers are hierarchical, so if a tier does not match ..". Perhaps mention that the constraints themselves are not assumed to be in any particular order.


pkg/sql/opt/xform/coster.go, line 547 at r2 (raw file):

	}

	return 1.0 * float64(matchCount) / float64(len(c.locality.Tiers))

[nit] 1.0 * does nothing


pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):

	// because that is the amount of data that we could potentially transfer over
	// the network.
	return memo.Cost(numCols+numScannedCols) * costFactor

It's a little strange that the latency cost scales with the number of columns.


pkg/sql/opt/xform/testdata/coster/zone, line 3 at r2 (raw file):

exec-ddl
CREATE TABLE abc (
	a INT PRIMARY KEY,

[nit] tabs


pkg/sql/opt/xform/testdata/coster/zone, line 27 at r2 (raw file):

exec-ddl
CREATE TABLE xy (
	x INT PRIMARY KEY,

[nit] tabs

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/cat/zone.go, line 67 at r1 (raw file):

Previously, RaduBerinde wrote…

[nit] This method is not ideal because "not required" doesn't in general imply "prohibited"; but I don't have a good suggestion (an enum is probably overkill).

Yeah, I went back and forth on that, including just using the raw structs from the config package (in which case we'd use the existing enum). I don't really have a strong preference, but did dislike creating an identical enum...


pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):

Previously, RaduBerinde wrote…

It's a little strange that the latency cost scales with the number of columns.

I'd argue that it may not even scale with the number of rows, in terms of impact on throughput. I figured I'd keep it simple for now, and just let it no more than double the CPU cost of a scan. I think we need more thinking and research to figure out a better approach.

Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)


pkg/sql/opt/xform/coster.go, line 574 at r2 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I'd argue that it may not even scale with the number of rows, in terms of impact on throughput. I figured I'd keep it simple for now, and just let it no more than double the CPU cost of a scan. I think we need more thinking and research to figure out a better approach.

My other thought is that amount of data sent over longer stretches of the network may indeed impact overall system throughput. In that case, fatter rows would matter.

@andy-kimball
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 4, 2019

Build failed

Add a Zone method to the cat.Index interface that associates it with the
zone it belongs to. Add new zone-related interfaces to the cat package.
Implement the new interfaces and methods in both opt_catalog.go and
test_catalog.go.

Release note: None
Customers can create multiple indexes that are identical, except that they
have different locality constraints. This commit teaches the optimizer to
prefer the index that most closely matches the locality of the gateway node
that is planning the query. This enables scenarios where reference data like
a zip code table can be replicated to different regions, and queries will
use the copy in the same region.

Release note: None
@andy-kimball
Copy link
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Mar 5, 2019
35306: opt: Prefer index with zone constraints that most closely match locality r=andy-kimball a=andy-kimball

Customers can create multiple indexes that are identical, except that they
have different locality constraints. This commit teaches the optimizer to
prefer the index that most closely matches the locality of the gateway node
that is planning the query. This enables scenarios where reference data like
a zip code table can be replicated to different regions, and queries will
use the copy in the same region.

Co-authored-by: Andrew Kimball <[email protected]>
@craig
Copy link
Contributor

craig bot commented Mar 5, 2019

Build succeeded

@craig craig bot merged commit 8ec4ee5 into cockroachdb:master Mar 5, 2019
@andy-kimball andy-kimball deleted the latency branch March 5, 2019 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants