61737: backupccl: prevent restoring a newer backup r=dt a=dt

This adds a check during RESTORE that the backup(s) being restored were produced by a version
no newer than the current version. The build info for the version that produced a
backup is persisted in the manifest and is used for this check.

Note: this is checking the _build_ version, not the cluster version. A follow-up
change should likely be added to persist cluster versions and check them as well,
so that post-migration data cannot be restored to a pre-migration cluster even if
both are on the same binary version.

Release note (ops change): RESTORE now refuses to restore BACKUPs made by newer versions.

62046: docs: update import-into diagram r=gemma-shay a=gemma-shay

Release note: None

62192: stmtdiagnostics: fix rare stress flake r=ajwerner a=ajwerner

For reasons I do not want to think about, the server startup took more than
10s.

```
[07:46:17][Step 2/2] I210317 07:46:07.208310 3685 sql/temporary_schema.go:610  [n1] 209  temporary object cleaner next scheduled to run at 2021-03-17 08:16:07.205507825 +0000 UTC
[07:46:17][Step 2/2] I210317 07:46:07.210153 3695 sql/sqlliveness/slstorage/slstorage.go:352  [n1] 210  inserted sqlliveness session e8bd7965952f4a739b406f76c84bd0dd
[07:46:17][Step 2/2] I210317 07:46:07.210209 3695 sql/sqlliveness/slinstance/slinstance.go:144  [n1] 211  created new SQL liveness session e8bd7965952f4a739b406f76c84bd0dd
[07:46:17][Step 2/2] I210317 07:46:07.210499 3683 1@util/log/event_log.go:32  [n1] 212 ={"Timestamp":1615967167203026366,"EventType":"node_join","NodeID":1,"StartedAt":1615967167137956829,"LastUp":1615967167137956829}
[07:46:17][Step 2/2] I210317 07:46:08.981572 3392 1@gossip/gossip.go:1505  [n1] 213  node has connected to cluster via gossip
[07:46:17][Step 2/2] I210317 07:46:08.981683 3392 kv/kvserver/stores.go:269  [n1] 214  wrote 0 node addresses to persistent storage
[07:46:17][Step 2/2] I210317 07:46:12.148614 3997 kv/kvserver/replica_consistency.go:255  [n1,consistencyChecker,s1,r4/1:/System{/tsd-tse}] 215  triggering stats recomputation to resolve delta of {ContainsEstimates:1672 LastUpdateNanos:1615967167145269126 IntentAge:0 GCBytesAge:0 LiveBytes:-55936 LiveCount:-836 KeyBytes:-42560 KeyCount:-836 ValBytes:-13376 ValCount:-836 IntentBytes:0 IntentCount:0 SeparatedIntentCount:0 SysBytes:0 SysCount:0 AbortSpanBytes:0}
[07:46:17][Step 2/2] I210317 07:46:17.143751 3623 2@server/status/runtime.go:553  [n1] 216  runtime stats: 152 MiB RSS, 202 goroutines (stacks: 3.9 MiB), 33 MiB/67 MiB Go alloc/total (heap fragmentation: 6.8 MiB, heap reserved: 13 MiB, heap released: 7.0 MiB), 2.0 MiB/6.6 MiB CGO alloc/total (0.0 CGO/sec), 0.0/0.0 %(u/s)time, 0.0 %gc (0x), 14 MiB/14 MiB (r/w)net
[07:46:17][Step 2/2] I210317 07:46:17.161245 3603 3@vendor/github.com/cockroachdb/pebble/db.go:1449  [n1,pebble] 217  [JOB 4] WAL created 000005
[07:46:17][Step 2/2] W210317 07:46:17.291363 3716 sql/set_cluster_setting.go:320  [n1,intExec=optInToDiagnosticsStatReporting] 218  SET CLUSTER SETTING "diagnostics.reporting.enabled" timed out waiting for value "true", observed "false"
[07:46:17][Step 2/2] W210317 07:46:17.291930 3152 sqlmigrations/migrations.go:436  [n1] 219  failed to run SET CLUSTER SETTING diagnostics.reporting.enabled = true: optInToDiagnosticsStatReporting: setting updated but timed out waiting to read new value
[07:46:17][Step 2/2] I210317 07:46:17.342969 4046 util/log/event_log.go:32  [n1,intExec=optInToDiagnosticsStatReporting] 220 ={"Timestamp":1615967177339253721,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"diagnostics.reporting.enabled\" = true","User":"root","ApplicationName":"$ internal-optInToDiagnosticsStatReporting","SettingName":"diagnostics.reporting.enabled","Value":"true"}
[07:46:17][Step 2/2] I210317 07:46:17.347724 4091 migration/migrationmanager/manager.go:102  [n1,intExec=set-setting,migration-mgr] 221  no need to migrate, cluster already at newest version
[07:46:17][Step 2/2] I210317 07:46:17.353450 4091 util/log/event_log.go:32  [n1,intExec=set-setting] 222 ={"Timestamp":1615967177345048785,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING version = $1","User":"root","ApplicationName":"$ internal-set-setting","PlaceholderValues":["'20.2-48'"],"SettingName":"version","Value":"20.2-48"}
[07:46:17][Step 2/2] I210317 07:46:17.365911 4566 util/log/event_log.go:32  [n1,intExec=initializeClusterSecret] 223 ={"Timestamp":1615967177363067087,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING \"cluster.secret\" = gen_random_uuid()::STRING","User":"root","ApplicationName":"$ internal-initializeClusterSecret","SettingName":"cluster.secret","Value":"3dfbfa15-a35d-415c-a7d7-eccb33839f0b"}
[07:46:17][Step 2/2] I210317 07:46:17.369968 4166 5@util/log/event_log.go:32  [n1,intExec=create-default-DB] 224 ={"Timestamp":1615967177367703431,"EventType":"create_database","Statement":"CREATE DATABASE IF NOT EXISTS defaultdb","User":"root","DescriptorID":50,"ApplicationName":"$ internal-create-default-DB","DatabaseName":"defaultdb"}
[07:46:17][Step 2/2] I210317 07:46:17.372657 4576 5@util/log/event_log.go:32  [n1,intExec=create-default-DB] 225 ={"Timestamp":1615967177370176064,"EventType":"create_database","Statement":"CREATE DATABASE IF NOT EXISTS postgres","User":"root","DescriptorID":51,"ApplicationName":"$ internal-create-default-DB","DatabaseName":"postgres"}
[07:46:17][Step 2/2] I210317 07:46:17.389150 3152 server/server_sql.go:840  [n1] 226  done ensuring all necessary startup migrations have run
[07:46:17][Step 2/2] I210317 07:46:17.389222 3152 1@server/server.go:2061  [n1] 227  serving sql connections
[07:46:17][Step 2/2] I210317 07:46:17.389252 4840 jobs/job_scheduler.go:360  [n1] 228  waiting 2m0s before scheduled jobs daemon start
[07:46:17][Step 2/2]     statement_diagnostics_test.go:209:
[07:46:17][Step 2/2]         	Error Trace:	statement_diagnostics_test.go:209
[07:46:17][Step 2/2]         	Error:      	Not equal:
[07:46:17][Step 2/2]         	            	expected: 1
[07:46:17][Step 2/2]         	            	actual  : 2
[07:46:17][Step 2/2]         	Test:       	TestChangePollInterval
```

Fixes #62144.

Release note: None

Co-authored-by: David Taylor <[email protected]>
Co-authored-by: Gemma Shay <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
4 people committed Mar 18, 2021
4 parents fc2f2c3 + 2c4257d + 05f0970 + 61f1c9b commit ba8f03e
Showing 10 changed files with 273 additions and 138 deletions.
8 changes: 4 additions & 4 deletions docs/generated/sql/bnf/import_into.bnf
@@ -1,5 +1,5 @@
import_stmt ::=
- 'IMPORT' 'INTO' table_name '(' column_name ( ( ',' column_name ) )* ')' 'CSV' 'DATA' '(' file_location ( ( ',' file_location ) )* ')' 'WITH' option '=' value ( ( ',' option '=' value ) )*
- | 'IMPORT' 'INTO' table_name '(' column_name ( ( ',' column_name ) )* ')' 'CSV' 'DATA' '(' file_location ( ( ',' file_location ) )* ')'
- | 'IMPORT' 'INTO' table_name 'CSV' 'DATA' '(' file_location ( ( ',' file_location ) )* ')' 'WITH' option '=' value ( ( ',' option '=' value ) )*
- | 'IMPORT' 'INTO' table_name 'CSV' 'DATA' '(' file_location ( ( ',' file_location ) )* ')'
+ 'IMPORT' 'INTO' table_name '(' column_name ( ( ',' column_name ) )* ')' ( 'CSV' | 'AVRO' | 'DELIMITED' ) 'DATA' '(' file_location ( ( ',' file_location ) )* ')' 'WITH' option '=' value ( ( ',' option '=' value ) )*
+ | 'IMPORT' 'INTO' table_name '(' column_name ( ( ',' column_name ) )* ')' ( 'CSV' | 'AVRO' | 'DELIMITED' ) 'DATA' '(' file_location ( ( ',' file_location ) )* ')'
+ | 'IMPORT' 'INTO' table_name ( 'CSV' | 'AVRO' | 'DELIMITED' ) 'DATA' '(' file_location ( ( ',' file_location ) )* ')' 'WITH' option '=' value ( ( ',' option '=' value ) )*
+ | 'IMPORT' 'INTO' table_name ( 'CSV' | 'AVRO' | 'DELIMITED' ) 'DATA' '(' file_location ( ( ',' file_location ) )* ')'
309 changes: 177 additions & 132 deletions pkg/ccl/backupccl/backup.pb.go

Large diffs are not rendered by default.

4 changes: 3 additions & 1 deletion pkg/ccl/backupccl/backup.proto
@@ -13,6 +13,7 @@ option go_package = "backupccl";
import "build/info.proto";
import "roachpb/api.proto";
import "roachpb/data.proto";
import "roachpb/metadata.proto";
import "sql/stats/table_statistic.proto";
import "sql/catalog/descpb/structured.proto";
import "sql/catalog/descpb/tenant.proto";
@@ -110,6 +111,7 @@ message BackupManifest {
int32 node_id = 10 [(gogoproto.customname) = "NodeID",
(gogoproto.casttype) = "github.com/cockroachdb/cockroach/pkg/roachpb.NodeID"];
build.Info build_info = 11 [(gogoproto.nullable) = false];
roachpb.Version cluster_version = 25 [(gogoproto.nullable) = false];

bytes id = 18 [(gogoproto.nullable) = false,
(gogoproto.customname) = "ID",
@@ -126,7 +128,7 @@
int32 descriptor_coverage = 22 [
(gogoproto.casttype) = "github.com/cockroachdb/cockroach/pkg/sql/sem/tree.DescriptorCoverage"];

- // NEXT ID: 25
+ // NEXT ID: 26
}

message BackupPartitionDescriptor{
1 change: 1 addition & 0 deletions pkg/ccl/backupccl/backup_planning.go
@@ -1078,6 +1078,7 @@ func backupPlanHook(
IntroducedSpans: newSpans,
FormatVersion: BackupFormatDescriptorTrackingVersion,
BuildInfo: build.GetInfo(),
ClusterVersion: p.ExecCfg().Settings.Version.ActiveVersion(ctx).Version,
ClusterID: p.ExecCfg().ClusterID(),
StatisticsFilenames: statsFiles,
DescriptorCoverage: backupStmt.Coverage(),
63 changes: 63 additions & 0 deletions pkg/ccl/backupccl/backup_test.go
@@ -7326,6 +7326,69 @@ func TestCleanupDoesNotDeleteParentsWithChildObjects(t *testing.T) {
})
}

func TestManifestTooNew(t *testing.T) {
defer leaktest.AfterTest(t)()
defer log.Scope(t).Close(t)
_, _, sqlDB, rawDir, cleanupFn := BackupRestoreTestSetup(t, singleNode, 1, InitManualReplication)
defer cleanupFn()
sqlDB.Exec(t, `CREATE DATABASE r1`)
sqlDB.Exec(t, `BACKUP DATABASE r1 TO 'nodelocal://0/too_new'`)
sqlDB.Exec(t, `DROP DATABASE r1`)
// Prove we can restore.
sqlDB.Exec(t, `RESTORE DATABASE r1 FROM 'nodelocal://0/too_new'`)
sqlDB.Exec(t, `DROP DATABASE r1`)

// Load/deserialize the manifest so we can mess with it.
manifestPath := filepath.Join(rawDir, "too_new", backupManifestName)
manifestData, err := ioutil.ReadFile(manifestPath)
require.NoError(t, err)
manifestData, err = decompressData(manifestData)
require.NoError(t, err)
var backupManifest BackupManifest
require.NoError(t, protoutil.Unmarshal(manifestData, &backupManifest))

// Bump the version and write it back out to make it look newer.
backupManifest.ClusterVersion = roachpb.Version{Major: 99, Minor: 1}
manifestData, err = protoutil.Marshal(&backupManifest)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath, manifestData, 0644 /* perm */))
// Also write the checksum file to match the new manifest.
checksum, err := getChecksum(manifestData)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath+backupManifestChecksumSuffix, checksum, 0644 /* perm */))

// Verify we reject it.
sqlDB.ExpectErr(t, "backup from version 99.1 is newer than current version", `RESTORE DATABASE r1 FROM 'nodelocal://0/too_new'`)

// Bump the version down and write it back out to make it look older.
backupManifest.ClusterVersion = roachpb.Version{Major: 20, Minor: 2, Internal: 2}
manifestData, err = protoutil.Marshal(&backupManifest)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath, manifestData, 0644 /* perm */))
// Also write the checksum file to match the new manifest.
checksum, err = getChecksum(manifestData)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath+backupManifestChecksumSuffix, checksum, 0644 /* perm */))

// Prove we can restore again.
sqlDB.Exec(t, `RESTORE DATABASE r1 FROM 'nodelocal://0/too_new'`)
sqlDB.Exec(t, `DROP DATABASE r1`)

// Nil out the version to match an old backup that lacked it.
backupManifest.ClusterVersion = roachpb.Version{}
manifestData, err = protoutil.Marshal(&backupManifest)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath, manifestData, 0644 /* perm */))
// Also write the checksum file to match the new manifest.
checksum, err = getChecksum(manifestData)
require.NoError(t, err)
require.NoError(t, ioutil.WriteFile(manifestPath+backupManifestChecksumSuffix, checksum, 0644 /* perm */))
// Prove we can restore again.
sqlDB.Exec(t, `RESTORE DATABASE r1 FROM 'nodelocal://0/too_new'`)
sqlDB.Exec(t, `DROP DATABASE r1`)
}

// TestManifestBitFlip tests that we can detect a corrupt manifest when a bit
// was flipped on disk for both an unencrypted and an encrypted manifest.
func TestManifestBitFlip(t *testing.T) {
12 changes: 12 additions & 0 deletions pkg/ccl/backupccl/restore_planning.go
@@ -1589,6 +1589,18 @@ func doRestorePlan(
return err
}

currentVersion := p.ExecCfg().Settings.Version.ActiveVersion(ctx)
for i := range mainBackupManifests {
if v := mainBackupManifests[i].ClusterVersion; v.Major != 0 {
// This is the "cluster" version that does not change between patches but
// rather just tracks migrations run. If the backup is more migrated than
// this cluster, then this cluster isn't ready to restore this backup.
if currentVersion.Less(v) {
return errors.Errorf("backup from version %s is newer than current version %s", v, currentVersion)
}
}
}

// Validate that the table coverage of the backup matches that of the restore.
// This prevents FULL CLUSTER backups to be restored as anything but full
// cluster restores and vice-versa.
2 changes: 1 addition & 1 deletion pkg/cmd/docgen/diagrams.go
@@ -896,7 +896,7 @@ var specs = []stmtSpec{
replace: map[string]string{
"table_option": "table_name",
"insert_column_item": "column_name",
"import_format": "'CSV'",
"import_format": "( 'CSV' | 'AVRO' | 'DELIMITED' )",
"string_or_placeholder": "file_location",
"kv_option": "option '=' value"},
unlink: []string{"table_name", "column_name", "file_location", "option", "value"},
1 change: 1 addition & 0 deletions pkg/sql/stmtdiagnostics/BUILD.bazel
@@ -42,6 +42,7 @@ go_test(
"//pkg/security",
"//pkg/security/securitytest",
"//pkg/server",
"//pkg/settings/cluster",
"//pkg/sql",
"//pkg/testutils",
"//pkg/testutils/serverutils",
3 changes: 3 additions & 0 deletions pkg/sql/stmtdiagnostics/stament_diagnostics_helpers_test.go
@@ -18,3 +18,6 @@ func (r *Registry) InsertRequestInternal(ctx context.Context, fprint string) (in
id, err := r.insertRequestInternal(ctx, fprint)
return int64(id), err
}

// PollingInterval is exposed to override in tests.
var PollingInterval = pollingInterval
8 changes: 8 additions & 0 deletions pkg/sql/stmtdiagnostics/statement_diagnostics_test.go
@@ -21,7 +21,9 @@ import (
"github.com/cockroachdb/cockroach/pkg/keys"
"github.com/cockroachdb/cockroach/pkg/kv/kvserver"
"github.com/cockroachdb/cockroach/pkg/roachpb"
"github.com/cockroachdb/cockroach/pkg/settings/cluster"
"github.com/cockroachdb/cockroach/pkg/sql"
"github.com/cockroachdb/cockroach/pkg/sql/stmtdiagnostics"
"github.com/cockroachdb/cockroach/pkg/testutils"
"github.com/cockroachdb/cockroach/pkg/testutils/serverutils"
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
@@ -184,7 +186,13 @@ func TestChangePollInterval(t *testing.T) {
})
return seen
}
settings := cluster.MakeTestingClusterSettings()

// Set an extremely long initial polling interval to not hit flakes due to
// server startup taking more than 10s.
stmtdiagnostics.PollingInterval.Override(&settings.SV, time.Hour)
args := base.TestServerArgs{
Settings: settings,
Knobs: base.TestingKnobs{
Store: &kvserver.StoreTestingKnobs{
TestingRequestFilter: func(ctx context.Context, request roachpb.BatchRequest) *roachpb.Error {
