backupccl: add RESTORE with schema_only #85231
Wow, this really ended up being a tiny diff in the actual restore code -- which is great w.r.t coverage. Great stuff!
}
// The only table data restored during a schemaOnly restore are from system tables,
// which only get restored during a cluster restore.
if table.GetParentID() != keys.SystemDatabaseID && schemaOnly {
super-minor nit: could flip these, both for readability (i.e. you can stop reading if not schemaOnly) and I suppose to avoid a function call in the false case
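A minimal sketch of the flipped condition suggested above, relying on Go's short-circuit evaluation of `&&`. The `table` type, its call counter, and the `systemDatabaseID` value are hypothetical stand-ins for the real descriptor and `keys.SystemDatabaseID`; only the operand order mirrors the diff.

```go
package main

import "fmt"

// systemDatabaseID is a hypothetical stand-in for keys.SystemDatabaseID;
// the value is illustrative only.
const systemDatabaseID = 1

// table is a hypothetical stand-in for the table descriptor in the diff.
// It counts how often GetParentID is called so the short-circuit is visible.
type table struct {
	parentID int
	calls    *int
}

func (t table) GetParentID() int {
	*t.calls++
	return t.parentID
}

// skipTableData reports whether a table's data should be skipped.
// schemaOnly is checked first, so GetParentID is never called for a
// regular (non schema_only) restore.
func skipTableData(t table, schemaOnly bool) bool {
	return schemaOnly && t.GetParentID() != systemDatabaseID
}

func main() {
	calls := 0
	userTable := table{parentID: 52, calls: &calls}

	skipped := skipTableData(userTable, false)
	fmt.Println(skipped, calls) // false 0: GetParentID was not called

	skipped = skipTableData(userTable, true)
	fmt.Println(skipped, calls) // true 1: user table data is skipped
}
```

The result is the same as with the original operand order; only the evaluation order (and the number of calls in the false case) changes.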
TFTR! bors r=dt
Build succeeded:
Fixes #83470
Release note (sql change): This PR adds the schema_only flag to RESTORE,
allowing a user to run a normal RESTORE without restoring any user table data.
This can be used to quickly validate that a given backup is restorable. The
runtime of a schema_only restore is O(# of descriptors), which is a fraction of
a regular restore's runtime of O(# of table rows).
Note that during a cluster-level schema_only restore, the system tables are
read from S3 and written to disk, as this provides important validation
coverage without much runtime cost (system tables should not be large).
After running a successful schema_only RESTORE, the user can revert the cluster
to its pre-restore state by simply dropping the descriptors the schema_only
restore added (e.g. if the user restored a database, they can drop the
database after the restore completes). Note that in the cluster-level case, the
restored system data cannot be reverted; this shouldn't matter, as the cluster
was empty beforehand.
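As a rough illustration of the validate-then-revert workflow described above, here is a small Go sketch that builds the two statements a client might send for a database-level restore. The helper names, the `FROM LATEST IN` form, and the `CASCADE` on the drop are assumptions chosen for illustration, not something this PR specifies.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent double-quotes a SQL identifier (hypothetical helper).
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// quoteLit single-quotes a SQL string literal (hypothetical helper).
func quoteLit(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// schemaOnlyRestoreStmt builds a database-level RESTORE that uses the
// schema_only flag. The FROM LATEST IN form is an assumption.
func schemaOnlyRestoreStmt(db, collectionURI string) string {
	return fmt.Sprintf("RESTORE DATABASE %s FROM LATEST IN %s WITH schema_only",
		quoteIdent(db), quoteLit(collectionURI))
}

// revertStmt drops the database the schema_only restore created,
// returning the cluster to its pre-restore state.
func revertStmt(db string) string {
	return fmt.Sprintf("DROP DATABASE %s CASCADE", quoteIdent(db))
}

func main() {
	// Hypothetical database name and backup location.
	fmt.Println(schemaOnlyRestoreStmt("bank", "s3://bucket/path"))
	fmt.Println(revertStmt("bank"))
}
```

This covers only the database-level case; as noted above, a cluster-level schema_only restore also writes system table data, which cannot be reverted this way.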
For the backup validation use case, RESTORE with schema_only provides near
total validation coverage. In other words, if a user's schema_only RESTORE
works, they can be quite confident that a real RESTORE will work. There's one
notable place schema_only RESTORE lacks coverage:
it doesn't read (or write) any of the SSTs that store backed up user table
data. To ensure a backup's SSTs are where the RESTORE cmd would expect them
to be, a user should run SHOW BACKUP ... with check_files. Further, in an
upcoming patch, another flag for RESTORE validation will be introduced --
the verify_backup_table_data flag -- which extends schema_only functionality
to read the table data from S3 and verify its checksums. As with the
schema_only flag, no table data will be ingested into the cluster.