Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
95904: roachtest: normalize versions in multitenant-upgrade roachtest r=ajstorm a=healthy-pod

If the multitenant-upgrade roachtest uses a mix
of release/non-release binaries, it may be using versions
that are technically the same but fail to confirm that because
version in test binaries are incremented by 1M.

This code change fixes the issue by normalizing versions before
comparing them.

Closes #95648

Epic: none
Release note: None

95911: kv: perform PushTxn(PUSH_TIMESTAMP) without Raft consensus r=arulajmani a=nvanbenschoten

This PR contains a sequence of three commits that combine to resolve #94728.

### check txn push marker on commit, not txn record creation

The first commit moves the point when a transaction checks the timestamp cache for its minimum commit timestamp from transaction record creation time back to commit time. This allows us to use the timestamp cache to communicate successful `PushTxn(TIMESTAMP)` to a pushee with an existing record without rewriting its transaction record.

For details, see the changes to the state machine diagram attached to `Replica.CanCreateTxnRecord` for a visual depiction of this change.

### always promote PUSH_TIMESTAMP to PUSH_ABORT on failed staging record 

The second commit simplifies logic in PushTxnRequest that promoted a `PUSH_TIMESTAMP` to a `PUSH_ABORT` when it 
 found a STAGING transaction record that it knew to be part of a failed parallel commit attempt.

The logic tried to be smart and minimize the cases where it needed to promote a `PUSH_TIMESTAMP` to a `PUSH_ABORT`. It was avoiding doing so if it had previously found an intent with a higher epoch. In practice, this optimization doesn't seem to matter. It was also making logic in a following commit harder to write because it was preserving cases where a `PUSH_TIMESTAMP` would succeed against a STAGING transaction record. We don't want to support such state transitions, so eliminate them.

### don't rewrite txn record on PushTxn(TIMESTAMP)

With the previous two commits, transactions will check the timestamp cache before committing to determine whether they have had their commit timestamp pushed. The final commit exploits this to avoid ever rewriting a transaction's record on a timestamp push. Instead, the timestamp cache is used, regardless of whether the record already existed or not. Doing so avoids consensus.

Release note: None

96115: schemachanger: Implement `DROP CONSTRAINT` in declarative schema changer r=Xiang-Gu a=Xiang-Gu

This PR implements `ALTER TABLE t DROP CONSTRAINT cons_name` in declarative schema changer.

Supported constraints include Checks, FK, and UniqueWithoutIndex.

Dropping PK or Unique constraints will fall back to legacy schema changer, which in turn spits out an "not supported yet" error.

Epic: None

96241: sql: use RLock in connExecutor.CancelQuery and connExecutor.CancelActiveQueries r=rafiss,yuzefovich a=ecwall

Fixes #95994

`connExecutor.CancelQuery` and `connExecutor.CancelActiveQueries` do not modify `mu.ActiveQueries` or the `*queryMetas` inside so they can safely use `RLock` instead of `Lock`.

Release note: None

96273: sql/schemachanger: forward fit compatibility changes for 22.2 rules r=fqazi a=fqazi

Informs: #95849

Previously, some constraint-related rules in the 22.2 set incorrectly used logic for 23.X. This patch addresses those to get compatibility back. Additionally, some minor clean-up in rules-related helpers to ensure proper compatibility.

With this change, a manual diff shows both branches are now equal in terms of rules (outside of renames). A roachtest will be coming soon to assert this.

Epic: none
Release note: None

96302: backupccl: add missing context cancel checks to restore r=stevendanna a=adityamaru

In #95257 we saw a restore grind to a halt 2 hours into a 5 hour roachtest. The stacks indicated that we may have seen a context cancellation that was not being respected by the goroutine running `generateAndSendImportSpans`. This resulted in the `generative_split_and_scatter_processor` getting stuck writing to a channel nobody was reading from
(https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/backupccl/restore_span_covering.go#L516) since the other goroutines
in the processor had seen the ctx cancellation and exited. A side effect of the generative processor not shutting down was that the downstream restore data processors would also hang on their call to `input.Next()` as they would not receive a row or a meta from the generative processor signalling them to shutdown. This fix adds a ctx cancellation check to the goroutine described above, thereby allowing a
graceful teardown of the flow.

Informs: #95257

Release note (bug fix): fixes a bug where a restore flow could hang indefinitely in the face of a context cancellation, manifesting as a stuck restore job.

Co-authored-by: healthy-pod <[email protected]>
Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Xiang Gu <[email protected]>
Co-authored-by: Evan Wall <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: adityamaru <[email protected]>
  • Loading branch information
7 people committed Feb 2, 2023
7 parents 6321c86 + e046e46 + 3dbb321 + 2428a43 + 38299ef + 372b2f4 + d7a6974 commit 22244a7
Show file tree
Hide file tree
Showing 84 changed files with 2,960 additions and 1,152 deletions.
6 changes: 3 additions & 3 deletions pkg/bench/rttanalysis/testdata/benchmark_expectations
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,9 @@ exp,benchmark
15,AlterTableDropColumn/alter_table_drop_1_column
15,AlterTableDropColumn/alter_table_drop_2_columns
15,AlterTableDropColumn/alter_table_drop_3_columns
9,AlterTableDropConstraint/alter_table_drop_1_check_constraint
9,AlterTableDropConstraint/alter_table_drop_2_check_constraints
9,AlterTableDropConstraint/alter_table_drop_3_check_constraints
11,AlterTableDropConstraint/alter_table_drop_1_check_constraint
11,AlterTableDropConstraint/alter_table_drop_2_check_constraints
11,AlterTableDropConstraint/alter_table_drop_3_check_constraints
9,AlterTableSplit/alter_table_split_at_1_value
13,AlterTableSplit/alter_table_split_at_2_values
17,AlterTableSplit/alter_table_split_at_3_values
Expand Down
2 changes: 1 addition & 1 deletion pkg/ccl/backupccl/restore_data_processor.go
Original file line number Diff line number Diff line change
Expand Up @@ -172,7 +172,7 @@ func (rd *restoreDataProcessor) Start(ctx context.Context) {
_ = rd.phaseGroup.Wait()
}
rd.phaseGroup = ctxgroup.WithContext(ctx)
log.Infof(ctx, "starting restore data")
log.Infof(ctx, "starting restore data processor")

entries := make(chan execinfrapb.RestoreSpanEntry, rd.numWorkers)
rd.sstCh = make(chan mergedSST, rd.numWorkers)
Expand Down
14 changes: 10 additions & 4 deletions pkg/ccl/backupccl/restore_job.go
Original file line number Diff line number Diff line change
Expand Up @@ -318,7 +318,7 @@ func restore(
genSpan := func(ctx context.Context, spanCh chan execinfrapb.RestoreSpanEntry) error {
defer close(spanCh)
return generateAndSendImportSpans(
restoreCtx,
ctx,
dataToRestore.getSpans(),
backupManifests,
layerToBackupManifestFileIterFactory,
Expand All @@ -334,7 +334,6 @@ func restore(
// Count number of import spans.
var numImportSpans int
var countTasks []func(ctx context.Context) error
log.Infof(restoreCtx, "rh_debug: starting count task")
spanCountTask := func(ctx context.Context) error {
for range countSpansCh {
numImportSpans++
Expand Down Expand Up @@ -397,7 +396,12 @@ func restore(

if idx >= mu.ceiling {
for i := mu.ceiling; i <= idx; i++ {
importSpan := <-importSpanCh
importSpan, ok := <-importSpanCh
if !ok {
// The channel has been closed, there is nothing left to do.
log.Infof(ctx, "exiting restore checkpoint loop as the import span channel has been closed")
return nil
}
mu.inFlightImportSpans[i] = importSpan.Span
}
mu.ceiling = idx + 1
Expand All @@ -416,7 +420,6 @@ func restore(
for j := mu.highWaterMark + 1; j < mu.ceiling && mu.requestsCompleted[j]; j++ {
mu.highWaterMark = j
}

for j := prevHighWater; j < mu.highWaterMark; j++ {
delete(mu.requestsCompleted, j)
delete(mu.inFlightImportSpans, j)
Expand Down Expand Up @@ -1714,6 +1717,7 @@ func (r *restoreResumer) doResume(ctx context.Context, execCtx interface{}) erro
return err
}
}
log.Infof(ctx, "finished restoring the pre-data bundle")
}

if !preValidateData.isEmpty() {
Expand All @@ -1734,6 +1738,7 @@ func (r *restoreResumer) doResume(ctx context.Context, execCtx interface{}) erro
}

resTotal.Add(res)
log.Infof(ctx, "finished restoring the validate data bundle")
}
{
// Restore the main data bundle. We notably only restore the system tables
Expand All @@ -1755,6 +1760,7 @@ func (r *restoreResumer) doResume(ctx context.Context, execCtx interface{}) erro
}

resTotal.Add(res)
log.Infof(ctx, "finished restoring the main data bundle")
}

if err := insertStats(ctx, r.job, p.ExecCfg(), remappedStats); err != nil {
Expand Down
1 change: 1 addition & 0 deletions pkg/ccl/backupccl/restore_processor_planning.go
Original file line number Diff line number Diff line change
Expand Up @@ -188,6 +188,7 @@ func distRestore(
NumEntries: int64(numImportSpans),
NumNodes: int64(numNodes),
UseSimpleImportSpans: useSimpleImportSpans,
JobID: jobID,
}

proc := physicalplan.Processor{
Expand Down
17 changes: 12 additions & 5 deletions pkg/ccl/backupccl/restore_span_covering.go
Original file line number Diff line number Diff line change
Expand Up @@ -497,7 +497,7 @@ func generateAndSendImportSpans(
var covFilesByLayer [][]backuppb.BackupManifest_File
var firstInSpan bool

flush := func() {
flush := func(ctx context.Context) error {
entry := execinfrapb.RestoreSpanEntry{
Span: lastCovSpan,
}
Expand All @@ -513,8 +513,14 @@ func generateAndSendImportSpans(
}

if len(entry.Files) > 0 {
spanCh <- entry
select {
case <-ctx.Done():
return ctx.Err()
case spanCh <- entry:
}
}

return nil
}

for _, span := range requiredSpans {
Expand Down Expand Up @@ -630,7 +636,9 @@ func generateAndSendImportSpans(
lastCovSpan.EndKey = coverSpan.EndKey
lastCovSpanSize = lastCovSpanSize + newCovFilesSize
} else {
flush()
if err := flush(ctx); err != nil {
return err
}
lastCovSpan = coverSpan
covFilesByLayer = filesByLayer
lastCovSpanSize = covSize
Expand All @@ -646,8 +654,7 @@ func generateAndSendImportSpans(
}
}

flush()
return nil
return flush(ctx)
}

// fileSpanStartAndEndKeyIterator yields (almost) all of the start and end keys
Expand Down
15 changes: 15 additions & 0 deletions pkg/ccl/schemachangerccl/backup_base_generated_test.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -65,21 +65,21 @@ Schema change plan for DROP DATABASE ‹multi_region_test_db› CASCADE;
│ │ ├── PUBLIC → VALIDATED PrimaryIndex:{DescID: 108, IndexID: 1, ConstraintID: 1}
│ │ └── PUBLIC → ABSENT IndexName:{DescID: 108, Name: table_regional_by_table_pkey, IndexID: 1}
│ └── 15 Mutation operations
│ ├── MarkDescriptorAsDropped {"DescriptorID":104}
│ ├── MarkDescriptorAsDropped {"DescriptorID":105}
│ ├── RemoveSchemaParent {"Parent":{"ParentDatabaseID":104,"SchemaID":105}}
│ ├── MarkDescriptorAsDropped {"DescriptorID":106}
│ ├── RemoveObjectParent {"ObjectID":106,"ParentSchemaID":105}
│ ├── MarkDescriptorAsDropped {"DescriptorID":107}
│ ├── RemoveObjectParent {"ObjectID":107,"ParentSchemaID":105}
│ ├── MarkDescriptorAsDropped {"DescriptorID":108}
│ ├── RemoveObjectParent {"ObjectID":108,"ParentSchemaID":105}
│ ├── RemoveBackReferenceInTypes {"BackReferencedDescriptorID":108}
│ ├── DrainDescriptorName {"Namespace":{"DescriptorID":104,"Name":"multi_region_tes..."}}
│ ├── MarkDescriptorAsDropped {"DescriptorID":105}
│ ├── RemoveSchemaParent {"Parent":{"ParentDatabaseID":104,"SchemaID":105}}
│ ├── MarkDescriptorAsDropped {"DescriptorID":104}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":105,"Name":"public"}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":106,"Name":"crdb_internal_re...","SchemaID":105}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":107,"Name":"_crdb_internal_r...","SchemaID":105}}
│ └── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":108,"Name":"table_regional_b...","SchemaID":105}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":108,"Name":"table_regional_b...","SchemaID":105}}
│ └── DrainDescriptorName {"Namespace":{"DescriptorID":104,"Name":"multi_region_tes..."}}
├── PreCommitPhase
│ ├── Stage 1 of 2 in PreCommitPhase
│ │ ├── 53 elements transitioning toward ABSENT
Expand Down Expand Up @@ -194,25 +194,25 @@ Schema change plan for DROP DATABASE ‹multi_region_test_db› CASCADE;
│ │ ├── PUBLIC → ABSENT PrimaryIndex:{DescID: 108, IndexID: 1, ConstraintID: 1}
│ │ └── PUBLIC → ABSENT IndexName:{DescID: 108, Name: table_regional_by_table_pkey, IndexID: 1}
│ └── 26 Mutation operations
│ ├── MarkDescriptorAsDropped {"DescriptorID":104}
│ ├── RemoveDatabaseRoleSettings {"DatabaseID":104}
│ ├── MarkDescriptorAsDropped {"DescriptorID":105}
│ ├── RemoveSchemaParent {"Parent":{"ParentDatabaseID":104,"SchemaID":105}}
│ ├── MarkDescriptorAsDropped {"DescriptorID":106}
│ ├── RemoveObjectParent {"ObjectID":106,"ParentSchemaID":105}
│ ├── MarkDescriptorAsDropped {"DescriptorID":107}
│ ├── RemoveObjectParent {"ObjectID":107,"ParentSchemaID":105}
│ ├── MarkDescriptorAsDropped {"DescriptorID":108}
│ ├── RemoveObjectParent {"ObjectID":108,"ParentSchemaID":105}
│ ├── RemoveBackReferenceInTypes {"BackReferencedDescriptorID":108}
│ ├── DrainDescriptorName {"Namespace":{"DescriptorID":104,"Name":"multi_region_tes..."}}
│ ├── MarkDescriptorAsDropped {"DescriptorID":105}
│ ├── RemoveSchemaParent {"Parent":{"ParentDatabaseID":104,"SchemaID":105}}
│ ├── RemoveColumnNotNull {"ColumnID":1,"TableID":108}
│ ├── MarkDescriptorAsDropped {"DescriptorID":104}
│ ├── RemoveDatabaseRoleSettings {"DatabaseID":104}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":105,"Name":"public"}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":106,"Name":"crdb_internal_re...","SchemaID":105}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":107,"Name":"_crdb_internal_r...","SchemaID":105}}
│ ├── DrainDescriptorName {"Namespace":{"DatabaseID":104,"DescriptorID":108,"Name":"table_regional_b...","SchemaID":105}}
│ ├── RemoveColumnNotNull {"ColumnID":1,"TableID":108}
│ ├── MakeDeleteOnlyColumnAbsent {"ColumnID":4294967295,"TableID":108}
│ ├── MakeDeleteOnlyColumnAbsent {"ColumnID":4294967294,"TableID":108}
│ ├── DrainDescriptorName {"Namespace":{"DescriptorID":104,"Name":"multi_region_tes..."}}
│ ├── MakeDeleteOnlyColumnAbsent {"ColumnID":1,"TableID":108}
│ ├── SetJobStateOnDescriptor {"DescriptorID":104,"Initialize":true}
│ ├── SetJobStateOnDescriptor {"DescriptorID":105,"Initialize":true}
Expand Down
Loading

0 comments on commit 22244a7

Please sign in to comment.