-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
workload/schemachange: add SET LOCALITY REGIONAL BY ROW [AS] #62909
Conversation
hmm, i don't understand the failure:
nothing looks dead o_o schema changes seem like they're doing ok. |
ah nvm it's higher up
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get an honest experience report? How is this going?
I think you need a rebase for the test failure.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @otan)
pkg/workload/schemachange/error_screening.go, line 808 at r2 (raw file):
tx, ` SELECT EXISTS (SELECT table_id FROM crdb_internal.schema_changes
I'm torn here. One the one hand, ideally we'll use the catalog tables and they will work great and yay. On the other hand, many (almost all) catalog tables lack virtual indexes which means that the more of these checks which use those tables we have, the more likely we are to scan the entire descriptor table and thus conflict with all other schema change transactions. On the other, other hand, we already do have a lot of checks which result in said scans.
I'm filing an issue (#62930) to create virtual indexes for every virtual table which we can index by table. That will help a lot. In the meantime, you could write this like:
SELECT json_array_length(
crdb_internal.pb_to_json(
'cockroach.sql.sqlbase.Descriptor',
descriptor
)->'table'->'mutations'
)
= 0
FROM system.descriptor
WHERE id = $1::REGCLASS;
Take it or leave it, I really am fine if you leave it.
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file):
tx *pgx.Tx, tableName *tree.TableName, ) (bool, error) { if len(og.opsInTxn) > 0 {
🤔 why? The changes are likely for a different table.
pkg/workload/schemachange/schemachange.go, line 358 at r2 (raw file):
Quoted 8 lines of code…
// Some ops can only be done executed if they are standalone (e.g. // anything using ALTER PRIMARY KEY). Early exit if so. if len(w.opGen.opsInTxn) > 0 { switch w.opGen.opsInTxn[len(w.opGen.opsInTxn)-1] { case alterTableLocality: break } }
isn't this just saying that it's the last operation, not that it's standalone. I also don't really understand the condition or why to special case this. Is it just that you'd like it to not fail most of the time and it is only likely to succeed if there's just one operation? I've been thinking that probably this uniform distribution of the number of commands isn't wise. Perhaps we should do something more exponential.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I get an honest experience report? How is this going?
it's.... ok. it's a little annoying having to think of all the error cases upfront, and some data is hard to access (e.g. existing schema change). the UDT stuff seems rare to hit as well. but rewarding for sure :)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file):
Previously, ajwerner wrote…
🤔 why? The changes are likely for a different table.
hmm, it needs to be the only mutation on the table and might not be committed yet?
but the schema_change / scan you mention above takes care of this i think, so removing it!
pkg/workload/schemachange/schemachange.go, line 358 at r2 (raw file):
Previously, ajwerner wrote…
// Some ops can only be done executed if they are standalone (e.g. // anything using ALTER PRIMARY KEY). Early exit if so. if len(w.opGen.opsInTxn) > 0 { switch w.opGen.opsInTxn[len(w.opGen.opsInTxn)-1] { case alterTableLocality: break } }
isn't this just saying that it's the last operation, not that it's standalone. I also don't really understand the condition or why to special case this. Is it just that you'd like it to not fail most of the time and it is only likely to succeed if there's just one operation? I've been thinking that probably this uniform distribution of the number of commands isn't wise. Perhaps we should do something more exponential.
hmm, i removed it. might be because earlier on i had a bad in progress schema_changes check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @otan)
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
hmm, it needs to be the only mutation on the table and might not be committed yet?
but the schema_change / scan you mention above takes care of this i think, so removing it!
seems like it is still present
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file):
Previously, ajwerner wrote…
seems like it is still present
Deleting this gives me:
W210406 12:20:16.933591 175236 kv/kvserver/raft_transport.go:633 [n1] 65343 while processing outgoing Raft queue to node 2: rpc error: code = Canceled desc = grpc: the client connection is closing:
--- FAIL: TestWorkload (9.18s)
schema_change_external_test.go:111:
Error Trace: schema_change_external_test.go:111
Error: Received unexpected error:
***UNEXPECTED ERROR; Received an unexpected execution error: ERROR: unimplemented: table table342 is currently undergoing a schema change (SQLSTATE 0A000)
(1) forced error mark
| "fatal error when running txn"
| github.com/cockroachdb/errors/withstack/*withstack.withStack::
Wraps: (2) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/workload/schemachange.(*schemaChangeWorker).runInTxn
| /Users/otan/go/src/github.com/cockroachdb/cockroach/pkg/workload/schemachange/schemachange.go:392
| github.com/cockroachdb/cockroach/pkg/workload/schemachange.(*schemaChangeWorker).run
| /Users/otan/go/src/github.com/cockroachdb/cockroach/pkg/workload/schemachange/schemachange.go:426
| github.com/cockroachdb/cockroach/pkg/ccl/testccl/workload/schemachange_test.TestWorkload.func3.1
| /Users/otan/go/src/github.com/cockroachdb/cockroach/pkg/ccl/testccl/workload/schemachange/schema_change_external_test.go:100
| golang.org/x/sync/errgroup.(*Group).Go.func1
| /Users/otan/go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
| runtime.goexit
| /usr/local/opt/[email protected]/libexec/src/runtime/asm_amd64.s:1374
Wraps: (3) ***UNEXPECTED ERROR; Received an unexpected execution error
Wraps: (4) ERROR: unimplemented: table table342 is currently undergoing a schema change (SQLSTATE 0A000)
Error types: (1) *markers.withMark (2) *withstack.withStack (3) *errutil.withPrefix (4) pgx.PgError
Test: TestWorkload
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file): Previously, otan (Oliver Tan) wrote…
or this :\ |
pkg/workload/schemachange/schemachange.go, line 358 at r2 (raw file):
on re-running the test, it seems like removing this gives:
|
pkg/workload/schemachange/schemachange.go, line 358 at r2 (raw file): Previously, otan (Oliver Tan) wrote…
(might be wrong on this one, but hitting the hay now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file):
Previously, otan (Oliver Tan) wrote…
{ "workerId": 0, "clientTimestamp": "12:21:46.345019", "ops": [ "BEGIN", "ALTER TABLE schema271.table279 ADD COLUMN col279_286 INT4", "SELECT 'validating all objects'", "ALTER TABLE schema271.table279 SET LOCALITY REGIONAL BY ROW" ], "expectedExecErrors": "", "expectedCommitErrors": "", "message": "***UNEXPECTED ERROR; Received an unexpected execution error: ERROR: unimplemented: cannot perform a locality change on table279 with other schema changes on table279 in the same transaction (SQLSTATE 0A000)", "txStatus": "TxStatusRollbackSuccess" }
or this :\
cockroach/pkg/sql/alter_primary_key.go
Lines 94 to 106 in af0e2cb
if mut.MutationID < currentMutationID { | |
// We can handle indexes being deleted concurrently. We do this | |
// in order to not be blocked on index drops created by a previous | |
// primary key change. If we errored out when seeing a previous | |
// index drop, then users would see a confusing message that a | |
// schema change is in progress when it doesn't seem like one is. | |
// TODO (rohany): This feels like such a hack until (#45510) is fixed. | |
if mut.GetIndex() != nil && mut.Direction == descpb.DescriptorMutation_DROP { | |
continue | |
} | |
return unimplemented.NewWithIssuef( | |
45510, "table %s is currently undergoing a schema change", tableDesc.Name) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I accept if you add some commentary around that odd rule.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/workload/schemachange/operation_generator.go, line 462 at r2 (raw file): Previously, ajwerner wrote…
realised the latter one was a typo -- should be > 0 not = 0 for the length check. |
i've had successful runs barring other issues in the schemachange workload (SET DATA TYPE runs into other issues?) so i'm merging this one! bors r=ajwerner |
62909: workload/schemachange: add SET LOCALITY REGIONAL BY ROW [AS] r=ajwerner a=otan See individual commits for details. Co-authored-by: Oliver Tan <[email protected]>
bors r- ooh, one more thing |
Canceled. |
Just noticed that the case here has gone cold. Was that intentional? Should this PR be closed? |
Still needs to be done? With our blocks on concurrent changes it'll take a bit more massaging... |
This previously did not take into account implicit primary keys. Release note: None
Certain operations cannot be performed during ALTER PK. Release note: None
REGIONAL BY ROW tables do not allow index changes during an ADD/DROP REGION. Release note: None
Release note: None
Release note: None
works after reverting #63725 so merging this one bors r=ajwerner |
Build succeeded: |
See individual commits for details.