Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: add SHOW RANGES statements to sql_split_syntax #14366

Merged
merged 1 commit into from
Apr 13, 2017

Conversation

RaduBerinde
Copy link
Member

@RaduBerinde RaduBerinde commented Mar 24, 2017

While writing tests for TESTING_RELOCATE, I found myself implementing a
test-only function that returns information for all ranges in a table or index.
It seems much more useful to provide it through a statement, and use logic test
infrastructure for writing tests.


This change is Reviewable

@knz
Copy link
Contributor

knz commented Mar 26, 2017

Reviewed 1 of 1 files at r1.
Review status: all files reviewed at latest revision, 1 unresolved discussion, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

```sql
SHOW RANGES TABLE t
  1. nit: it's customary in SQL to add a "FROM" keyword in there to read more naturally: "SHOW RANGES FROM TABLE".

  2. add a comment here (and perhaps earlier in this RFC) about what you expect to happen for interleaved tables.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

Previously, knz (kena) wrote…
  1. nit: it's customary in SQL to add a "FROM" keyword in there to read more naturally: "SHOW RANGES FROM TABLE".

  2. add a comment here (and perhaps earlier in this RFC) about what you expect to happen for interleaved tables.

Added FROM. Though perhaps SHOW RANGES FOR TABLE would be more natural?

Added notes on interleaved tables (the semantics are fairly straightforward). I'll make sure to add some tests with those in the implementation.


Comments from Reviewable

@maddyblue
Copy link
Contributor

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

```sql
SHOW RANGES FROM TABLE t

How do we feel about SELECT * FROM crdb_internal.ranges? This would allow for multiple ranges to be selected at once. It could include database, table, and index columns for more filtering. This is really useful to the dump command, which could switch from a page-based fetch to a range-based fetch, and get all the ranges in a single request instead of one per table.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, mjibson (Matt Jibson) wrote…

How do we feel about SELECT * FROM crdb_internal.ranges? This would allow for multiple ranges to be selected at once. It could include database, table, and index columns for more filtering. This is really useful to the dump command, which could switch from a page-based fetch to a range-based fetch, and get all the ranges in a single request instead of one per table.

I think there is a lot more complexity to doing something like that. For example, there will need to be a bunch of custom code to avoid scanning ALL the ranges when we're only interested in a table.

I'm not against it - it might be what we want in the long term - but it is more involved. The proposed SHOW RANGES is pretty easy to implement (I already have an implementation); I would rather go ahead with that for now so I can finish implementing the new primitives that we need for adding better distsql tests.

If you want to work out the details of your proposal (either as a PR to this RFC or in a new RFC), I would have no problem retiring SHOW RANGES in favor of this.


Comments from Reviewable

@petermattis
Copy link
Collaborator

:lgtm: modulo bike shedding the specific syntax.


Review status: 0 of 1 files reviewed at latest revision, 2 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

Previously, RaduBerinde wrote…

Added FROM. Though perhaps SHOW RANGES FOR TABLE would be more natural?

Added notes on interleaved tables (the semantics are fairly straightforward). I'll make sure to add some tests with those in the implementation.

Why are the {TABLE,INDEX} keywords necessary? We have SHOW TABLES FROM <database>, not SHOW TABLES FROM DATABASE <database>. Similarly, SHOW INDEXES FROM <table>.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Why are the {TABLE,INDEX} keywords necessary? We have SHOW TABLES FROM <database>, not SHOW TABLES FROM DATABASE <database>. Similarly, SHOW INDEXES FROM <table>.

Hm, good question. The one case I can think of where it helps is when you refer to an index by name (without the table). That works today with ALTER INDEX.


Comments from Reviewable

@bdarnell
Copy link
Contributor

:lgtm:


Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

Previously, RaduBerinde wrote…

Hm, good question. The one case I can think of where it helps is when you refer to an index by name (without the table). That works today with ALTER INDEX.

I could also imagine SHOW RANGES FROM DATABASE (per @mjibson's comment about dump), so the qualifier might help with namespaces in other ways too.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, RaduBerinde wrote…

I think there is a lot more complexity to doing something like that. For example, there will need to be a bunch of custom code to avoid scanning ALL the ranges when we're only interested in a table.

I'm not against it - it might be what we want in the long term - but it is more involved. The proposed SHOW RANGES is pretty easy to implement (I already have an implementation); I would rather go ahead with that for now so I can finish implementing the new primitives that we need for adding better distsql tests.

If you want to work out the details of your proposal (either as a PR to this RFC or in a new RFC), I would have no problem retiring SHOW RANGES in favor of this.

If it's easy to do this as a virtual table instead of a custom SHOW command, I think it would be better to do it that way (especially since this is mainly for our own usage so the more verbose syntax doesn't matter as much). I'm fine with an inefficient implementation that scans all ranges and then filters them.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Each output row contains:
 - pretty printed range start key (skipping the `/Table/<tableID>/<indexID>`

Any chance these could be a tuple of the decoded values instead of a pretty-printed string?


Comments from Reviewable

@petermattis
Copy link
Collaborator

Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 188 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I could also imagine SHOW RANGES FROM DATABASE (per @mjibson's comment about dump), so the qualifier might help with namespaces in other ways too.

Does ALTER INDEX work with an unqualified index name? How does that work? Oh, we scan through all of the tables! Ouch!


Comments from Reviewable

@danhhz
Copy link
Contributor

danhhz commented Mar 27, 2017

Review status: 0 of 1 files reviewed at latest revision, 3 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

If it's easy to do this as a virtual table instead of a custom SHOW command, I think it would be better to do it that way (especially since this is mainly for our own usage so the more verbose syntax doesn't matter as much). I'm fine with an inefficient implementation that scans all ranges and then filters them.

+1 more for virtual table over SHOW


Comments from Reviewable

@andreimatei
Copy link
Contributor

:lgtm:


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Any chance these could be a tuple of the decoded values instead of a pretty-printed string?

if we move to a virtual table, then I guess these would need to be a SQL array (instead of a tuple)?


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Each output row contains:
 - pretty printed range start key (skipping the `/Table/<tableID>/<indexID>`

if I remember correctly, the pretty printing is best effort (it's done without a table descriptor and so not all types can be decoded correctly)


docs/RFCS/sql_split_syntax.md, line 206 at r2 (raw file):

 - pretty printed range end key (NULL for the last range of the table/index);
 - list of replica store IDs (as an int array);
 - current lease holder (NULL if none).

NULL if none and/or unknown? I think we should discuss the implementation here too - is the output produced using caches or not? We were discussing have two flavors (with/without caches). What do you think?


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Looks like most people agree that we want a system table. I think that's fine, but will require fleshing out more details and more implementation work which I fear may be a distraction at this point from #13665 and #13666 (which was the motivating factor behind this). My preference would be to go ahead with the SHOW RANGES implementation temporarily so I can progress on the distsql issues, and keep this PR open while we figure out the details of the system table.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, danhhz (Daniel Harrison) wrote…

+1 more for virtual table over SHOW

@bdarnell note that in general, to show the Lease Holder (which is important for distsql planning), we have to issue a LeaseInfoRequest for each range! So in general, this could be very slow.. On the other hand, with a table we can select out only the columns we need so we can skip these requests if the column isn't needed.

We can also add special code that detects a table filter and restrict the range to that. But that brings another question around table name qualification. With SHOW RANGES, we can qualify a bare table name with the current databaase. Having it be part of a filter expression (where presumably the table is just a string) wouldn't allow that, so we would have to always specify the database and the table if we want to avoid the full range scan.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Any chance these could be a tuple of the decoded values instead of a pretty-printed string?

Can there be cases where the split key doesn't correspond to a complete value? Technically it could be some key prefix which doesn't really map to a valid value, though I don't know if that ever happens in practice.

The other question - what would the resulting schema be? (assuming we switch to a system table). Would these values be part of a single column as a tuple?


docs/RFCS/sql_split_syntax.md, line 206 at r2 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

NULL if none and/or unknown? I think we should discuss the implementation here too - is the output produced using caches or not? We were discussing have two flavors (with/without caches). What do you think?

The version I am proposing would be without caches. The only reason to have the "with cache" flavor is to debug the cache itself (right?) so including in the syntax will be subject to bikeshedding. One thing to consider is how would this work with a system table? Would we have two system tables? (SHOW RANGES was nicer in that we could just add a "(CACHE)" option or something)


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Previously, RaduBerinde wrote…

Can there be cases where the split key doesn't correspond to a complete value? Technically it could be some key prefix which doesn't really map to a valid value, though I don't know if that ever happens in practice.

The other question - what would the resulting schema be? (assuming we switch to a system table). Would these values be part of a single column as a tuple?

@andreimatei do we support ARRAY table columns? (I thought we didn't?) Which would also be a problem for the replicas list. On the other hand, maybe virtual tables don't need to have the same restrictions as real tables.


Comments from Reviewable

@bdarnell
Copy link
Contributor

Maybe we should call this SHOW TESTING_RANGES to discourage people from using it if it's going to be temporary.


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, RaduBerinde wrote…

@bdarnell note that in general, to show the Lease Holder (which is important for distsql planning), we have to issue a LeaseInfoRequest for each range! So in general, this could be very slow.. On the other hand, with a table we can select out only the columns we need so we can skip these requests if the column isn't needed.

We can also add special code that detects a table filter and restrict the range to that. But that brings another question around table name qualification. With SHOW RANGES, we can qualify a bare table name with the current databaase. Having it be part of a filter expression (where presumably the table is just a string) wouldn't allow that, so we would have to always specify the database and the table if we want to avoid the full range scan.

Ah, I didn't realize there was an RPC per range to get the lease; I thought it was just a scan of the meta ranges. Maybe there should be two tables, one for stuff in the range descriptors and one for lease info, and you could join them. But that's probably too much for now.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Can there be cases where the split key doesn't correspond to a complete value?

Yeah, good point. It's fine for this to be a pretty-printed string; I was just thinking that if it's easy it might be nice to have the real values.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Ok - so I will plan to go ahead with a temporary SHOW TESTING_RANGES. In the meantime, I will keep this PR open, give the details of the new system table(s) some thought and update this PR with an updated proposal.


Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Ah, I didn't realize there was an RPC per range to get the lease; I thought it was just a scan of the meta ranges. Maybe there should be two tables, one for stuff in the range descriptors and one for lease info, and you could join them. But that's probably too much for now.

The join thing is a nice idea. It will require improving the infrastructure for system tables to only populate rows needed by filters though.


Comments from Reviewable

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 27, 2017
This syntax is temporary and it will be replaced with a more versatile system
table (see discussion in cockroachdb#14366). This is useful for implementing/testing
`TESTING_RELOCATE` before we are able to flesh out all the details of the system
table proposals.
@RaduBerinde
Copy link
Member Author

#14390 is out for the temporary TESTING_RANGES.

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 28, 2017
This syntax is temporary and it will be replaced with a more versatile system
table (see discussion in cockroachdb#14366). This is useful for implementing/testing
`TESTING_RELOCATE` before we are able to flesh out all the details of the system
table proposals.
@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 197 at r2 (raw file):

Previously, RaduBerinde wrote…

The join thing is a nice idea. It will require improving the infrastructure for system tables to only populate rows needed by filters though.

I've been thinking about this more. We don't yet support joins by point-lookups (other than index joins) so the entire table with leases would need to be populated before the join. I will go with a single table for now, but I'll make sure that we omit the lease requests if if we don't select for that column.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Updated the RFC with a proposed system table.


Review status: 0 of 1 files reviewed at latest revision, 6 unresolved discussions, some commit checks pending.


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

`replicas`     | ARRAY(INT) | Replica store IDs
`lease_holder` | INT        | Lease holder store ID
`range_cache`  | STRING     | Range cache information

@andreimatei - let me know what you are thinking for the range cache information schema (is a debug string sufficient?)


Comments from Reviewable

RaduBerinde added a commit to RaduBerinde/cockroach that referenced this pull request Mar 29, 2017
This syntax is temporary and it will be replaced with a more versatile system
table (see discussion in cockroachdb#14366). This is useful for implementing/testing
`TESTING_RELOCATE` before we are able to flesh out all the details of the system
table proposals.
@andreimatei
Copy link
Contributor

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 206 at r2 (raw file):

Previously, RaduBerinde wrote…

The version I am proposing would be without caches. The only reason to have the "with cache" flavor is to debug the cache itself (right?) so including in the syntax will be subject to bikeshedding. One thing to consider is how would this work with a system table? Would we have two system tables? (SHOW RANGES was nicer in that we could just add a "(CACHE)" option or something)

the reason to have the version using the cache is to debug the cache itself, but also to be able to predict (or perhaps retroactively understand) how a scan is planned by a particular node.
Perhaps we can give the cache information in a SELECT as an index hint?


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

@andreimatei - let me know what you are thinking for the range cache information schema (is a debug string sufficient?)

There's 2 caches to speak of - the leaseholder cache and the range descriptor cache. Ideally we could ask to use none, either, or both.
The range desc cache might completely disagree with the reality and contain completely different ranges. So I don't think that we simply tack on a string field to results coming from ranges that were discovered in the meta keys. I was thinking that if the query asks for that cache to be used, there's be no indication in that output that it was used.

For the leaseholder cache, I think a string field annotating a range coming from wherever it might have come (meta or range desc cache) is enough.


docs/RFCS/sql_split_syntax.md, line 212 at r3 (raw file):

`range_cache`  | STRING     | Range cache information

The last two columns could be hidden (so they are only available if `SELECT`ed

I think it might be saner to just return these cols as NULL rather than magically showing or hiding them. This magic might have consequences in how we present the schema of this table and such.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

There's 2 caches to speak of - the leaseholder cache and the range descriptor cache. Ideally we could ask to use none, either, or both.
The range desc cache might completely disagree with the reality and contain completely different ranges. So I don't think that we simply tack on a string field to results coming from ranges that were discovered in the meta keys. I was thinking that if the query asks for that cache to be used, there's be no indication in that output that it was used.

For the leaseholder cache, I think a string field annotating a range coming from wherever it might have come (meta or range desc cache) is enough.

I see. I am hesitant to introduce a mechanism that "tweaks" a SELECT statement into returning different results for (what looks like) the same table.. I think it would be cleaner to just have a separate table (crdb_internal.range_cache). It would be a much smaller table anyway since it would only contain what's in the cache.


docs/RFCS/sql_split_syntax.md, line 212 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I think it might be saner to just return these cols as NULL rather than magically showing or hiding them. This magic might have consequences in how we present the schema of this table and such.

The "magic" is the same thing we use for implicit primary keys. There's not much to it - it mostly restricts the set of columns when resolving *; everything else works just like with other columns.

I don't get the "return as NULL" suggestion; if values for them are required, we would populate them.

Anyway, I don't feel strongly on making them hidden; it's fine if we require explicit SELECT renders if we want to avoid the overhead.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

I see. I am hesitant to introduce a mechanism that "tweaks" a SELECT statement into returning different results for (what looks like) the same table.. I think it would be cleaner to just have a separate table (crdb_internal.range_cache). It would be a much smaller table anyway since it would only contain what's in the cache.

What's the usecase for using just one of the caches? Don't we always use both in the span resolver?


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

What's the usecase for using just one of the caches? Don't we always use both in the span resolver?

Would having a range_cache table that shows the leaseholder from the leaseholder cache and the range descriptor from the range descriptor be sufficient? Do these caches break up the spans differently? Even if they do, we can "merge" the split keys to show all the data..


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions.


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

Would having a range_cache table that shows the leaseholder from the leaseholder cache and the range descriptor from the range descriptor be sufficient? Do these caches break up the spans differently? Even if they do, we can "merge" the split keys to show all the data..

I added info about a second range_cache table which merges the info from the two caches, like we discussed. PTAL. Thanks!


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

I added info about a second range_cache table which merges the info from the two caches, like we discussed. PTAL. Thanks!

@andreimatei weekly ping :)


Comments from Reviewable

@andreimatei
Copy link
Contributor

:lgtm:


Review status: 0 of 1 files reviewed at latest revision, 8 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 202 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can there be cases where the split key doesn't correspond to a complete value?

Yeah, good point. It's fine for this to be a pretty-printed string; I was just thinking that if it's easy it might be nice to have the real values.

we could also maybe add a function that takes a rangeid and returns the split key as an array


docs/RFCS/sql_split_syntax.md, line 210 at r3 (raw file):

Previously, RaduBerinde wrote…

@andreimatei weekly ping :)

LGTM!


docs/RFCS/sql_split_syntax.md, line 212 at r3 (raw file):

Previously, RaduBerinde wrote…

The "magic" is the same thing we use for implicit primary keys. There's not much to it - it mostly restricts the set of columns when resolving *; everything else works just like with other columns.

I don't get the "return as NULL" suggestion; if values for them are required, we would populate them.

Anyway, I don't feel strongly on making them hidden; it's fine if we require explicit SELECT renders if we want to avoid the overhead.

whatever you think


docs/RFCS/sql_split_syntax.md, line 227 at r4 (raw file):

spans that are in one of the caches show up in the resulting table. If a span
(or part of a span) is in one cache but not the other, the corresponding field
is NULL. A single span from a cache may be split across multiple rows if the

what do you mean by "split across multiple rows" exactly? We're going to "left inner-join" the range cache with the lease cache by range id, not do anything fancier, right?
The lease cache currently unfortunately doesn't have any information other than the range id and a replica.


Comments from Reviewable

@RaduBerinde
Copy link
Member Author

Review status: 0 of 1 files reviewed at latest revision, 7 unresolved discussions, all commit checks successful.


docs/RFCS/sql_split_syntax.md, line 227 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

what do you mean by "split across multiple rows" exactly? We're going to "left inner-join" the range cache with the lease cache by range id, not do anything fancier, right?
The lease cache currently unfortunately doesn't have any information other than the range id and a replica.

Ah, I thought both have ranges; the idea was that if one cache thinks there's a range from A and C and the other thinks there are two ranges from A to B and from B to C, there will be two rows (A->B, B->C) which both show the same info from the first cache.

If it only has the range id, then yes you are correct, it's effectively a join by rangeID. I will update the wording.


Comments from Reviewable

While writing tests for `TESTING_RELOCATE`, I found myself implementing a
test-only function that returns information for all ranges in a table or index.
It seems much more useful to provide it through a statement, and use logic test
infrastructure for writing tests.
@RaduBerinde RaduBerinde merged commit db66100 into cockroachdb:master Apr 13, 2017
@RaduBerinde RaduBerinde deleted the rfc-show-ranges branch April 13, 2017 16:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants