backupccl: add RESTORE with schema_only #85231

Merged
merged 1 commit into from
Aug 9, 2022

Conversation

msbutler
Collaborator

@msbutler msbutler commented Jul 28, 2022

Fixes #83470

Release note (sql change): This PR adds the schema_only flag to RESTORE,
allowing a user to run a normal RESTORE without restoring any user table data.
This can be used to quickly validate that a given backup is restorable. A
schema_only restore's runtime is O(# of descriptors), which is a fraction of a
regular restore's O(# of table rows) runtime.

Note that during a cluster-level schema_only restore, the system tables are
read from S3 and written to disk, as this provides important validation
coverage without much runtime cost (system tables should not be large).

After running a successful schema_only RESTORE, the user can revert the cluster
to its pre-restore state by simply dropping the descriptors the schema_only
restore added (e.g. if the user restored a database, they can drop the
database after the restore completes). Note that in the cluster-level case, the
restored system data cannot be reverted, but this shouldn't matter, as the
cluster was empty beforehand.

For the backup validation use case, RESTORE with schema_only provides
near-total validation coverage. In other words, if a user's schema_only RESTORE
works, they can be quite confident that a real RESTORE will work. There's one
notable place where schema_only RESTORE lacks coverage:

It doesn't read from (or write) any of the SSTs that store backed-up user table
data. To ensure a backup's SSTs are where the RESTORE command would expect them
to be, a user should run SHOW BACKUP ... with check_files. Further, in an
upcoming patch, another flag for RESTORE validation will be introduced --
the verify_backup_table_data flag -- which extends schema_only functionality
to read the table data from S3 and checksum it. As with the schema_only flag,
no table data will be ingested into the cluster.
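
The validation workflow described above could be sketched roughly as follows. This is only an illustration that builds the SQL text: the helper names (quoteString, quoteIdent, validationStatements), the bucket URI, and the database name are hypothetical, and the FROM LATEST IN form assumes the backup lives in a backup collection.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteString renders s as a single-quoted SQL string literal.
func quoteString(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// quoteIdent renders s as a double-quoted SQL identifier.
func quoteIdent(s string) string {
	return `"` + strings.ReplaceAll(s, `"`, `""`) + `"`
}

// validationStatements is a hypothetical helper that returns, in order:
//  1. a check_files pass to confirm the backup's SSTs are where RESTORE expects them,
//  2. a schema_only restore of one database (no user table data is ingested),
//  3. a DROP that removes the descriptors the restore added.
// The statement shapes are assumptions about the syntax, not taken from this PR.
func validationStatements(collectionURI, db string) []string {
	uri := quoteString(collectionURI)
	name := quoteIdent(db)
	return []string{
		fmt.Sprintf("SHOW BACKUP FROM LATEST IN %s WITH check_files", uri),
		fmt.Sprintf("RESTORE DATABASE %s FROM LATEST IN %s WITH schema_only", name, uri),
		fmt.Sprintf("DROP DATABASE %s CASCADE", name),
	}
}

func main() {
	// The URI and database name below are placeholders.
	for _, stmt := range validationStatements("s3://bucket/backups?AUTH=implicit", "bank") {
		fmt.Println(stmt + ";")
	}
}
```

The DROP step only applies to database- or table-level restores; as noted above, a cluster-level schema_only restore cannot be reverted this way.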

@msbutler msbutler self-assigned this Jul 28, 2022
@cockroach-teamcity
Member

This change is Reviewable

@msbutler msbutler force-pushed the butler-restore-schema-only branch 2 times, most recently from 210c7ce to 509396c Compare July 28, 2022 18:08
@msbutler msbutler marked this pull request as ready for review July 28, 2022 18:10
@msbutler msbutler requested a review from a team as a code owner July 28, 2022 18:10
@msbutler msbutler requested review from a team July 28, 2022 18:10
@msbutler msbutler requested review from a team as code owners July 28, 2022 18:10
@msbutler msbutler requested review from rhu713 and dt and removed request for a team July 28, 2022 18:10
@msbutler msbutler force-pushed the butler-restore-schema-only branch 3 times, most recently from 93bca7e to 968da3a Compare August 3, 2022 13:43
@msbutler
Collaborator Author

msbutler commented Aug 3, 2022

@rhu713 @dt this is ready for a look! Right now, it can merge into master.

@dt would you prefer to wait to review until after #85492 lands?

Member

@dt dt left a comment

Wow, this really ended up being a tiny diff in the actual restore code -- which is great w.r.t coverage. Great stuff!

}
// The only table data restored during a schemaOnly restore are from system tables,
// which only get restored during a cluster restore.
if table.GetParentID() != keys.SystemDatabaseID && schemaOnly {
Member

super-minor nit: could flip these, both for readability (i.e. can stop reading if not schemaOnly) and I suppose to avoid a function call in the false case
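
A minimal sketch of the flipped ordering, relying on Go's short-circuit evaluation of `&&`. The table type, its call counter, the systemDatabaseID constant, and the schemaOnlyUserTable helper are hypothetical stand-ins for illustration, not the actual restore code:

```go
package main

import "fmt"

// systemDatabaseID stands in for keys.SystemDatabaseID in this sketch.
const systemDatabaseID = 1

// table is a hypothetical stand-in for a table descriptor. It counts calls
// to GetParentID so the effect of short-circuiting is observable.
type table struct {
	parentID int
	calls    int
}

func (t *table) GetParentID() int {
	t.calls++
	return t.parentID
}

// schemaOnlyUserTable reports whether the condition from the diff holds, with
// the operands flipped: schemaOnly is checked first, so GetParentID is never
// called when schemaOnly is false.
func schemaOnlyUserTable(t *table, schemaOnly bool) bool {
	return schemaOnly && t.GetParentID() != systemDatabaseID
}

func main() {
	user := &table{parentID: 52}
	fmt.Println(schemaOnlyUserTable(user, false), user.calls) // false 0
	fmt.Println(schemaOnlyUserTable(user, true), user.calls)  // true 1

	sys := &table{parentID: systemDatabaseID}
	fmt.Println(schemaOnlyUserTable(sys, true), sys.calls) // false 1
}
```

Both orderings evaluate to the same boolean; the flip only changes which operand is read first and whether the method call happens at all.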

@msbutler msbutler force-pushed the butler-restore-schema-only branch from 968da3a to b811739 Compare August 9, 2022 13:46
@msbutler
Collaborator Author

msbutler commented Aug 9, 2022

TFTR!

bors r=dt

@craig
Contributor

craig bot commented Aug 9, 2022

Build succeeded:

@craig craig bot merged commit 7c18668 into cockroachdb:master Aug 9, 2022
@msbutler msbutler deleted the butler-restore-schema-only branch August 11, 2022 16:55
Development

Successfully merging this pull request may close these issues.

backupccl: add schema_only option to RESTORE
3 participants