
backupccl: handle range keys in BACKUP #84875

Merged: 1 commit, Jul 27, 2022

Conversation

@msbutler (Collaborator) commented Jul 21, 2022

Previously, BACKUP would not back up range tombstones. With this patch, BACKUPs
with revision_history will back up range tombstones. Non-revision-history backups
are unaffected by this diff because MVCCExportToSST already filters all
tombstones out of the backup.

Specifically, this patch replaces the iterators used in the backup_processor
with the pebbleIterator, which has built-in range key support. This refactor
introduces a 5% regression in backup runtime, even when the backup has no range
keys; #83051 aims to close this gap. See details below on the benchmark
experiment.

At this point a backup with range keys is restorable, thanks to #84214. Note
that the restore codebase still touches iterators that are not range key aware.
This is not a problem because restored data does not have range keys, nor do
the empty ranges that restore dumps data into. These iterators (e.g. in
SSTBatcher and in CheckSSTConflicts) will be updated when #70428 gets fixed.

Fixes #71155

Release note: none

To benchmark this diff, the following commands were run at sha a5ccdc3, with
and without this commit, over three trials:

```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach

roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary averaged 478 seconds (stddev 13 seconds),
while the backup on the treatment binary averaged 499 seconds (stddev 8
seconds).
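For orientation, here is a minimal sketch of the kind of iterator configuration this patch moves to: a pebble-backed MVCC iterator opened with range-key support. The helper function and its plumbing are illustrative assumptions, not the literal backup_processor code.

```go
import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage"
)

// scanWithRangeKeys is a hypothetical helper showing the shape of the
// change: the iterator is opened with a KeyTypes option so that MVCC range
// tombstones are surfaced alongside point keys.
func scanWithRangeKeys(r storage.Reader, start, end roachpb.Key) error {
	iter := r.NewMVCCIterator(storage.MVCCKeyIterKind, storage.IterOptions{
		KeyTypes:   storage.IterKeyTypePointsAndRanges,
		LowerBound: start,
		UpperBound: end,
	})
	defer iter.Close()
	for iter.SeekGE(storage.MVCCKey{Key: start}); ; iter.Next() {
		if ok, err := iter.Valid(); err != nil {
			return err
		} else if !ok {
			return nil
		}
		// hasPoint/hasRange report which key types exist at the current
		// position; a revision_history backup emits both into the SST.
		hasPoint, hasRange := iter.HasPointAndRange()
		_, _ = hasPoint, hasRange
		// ... write keys into the export SST here ...
	}
}
```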

@msbutler requested a review from a team as a code owner on July 21, 2022 20:33
@msbutler requested a review from a team on July 21, 2022 20:33
@cockroach-teamcity (Member)

This change is Reviewable

@msbutler (Collaborator, Author) commented Jul 21, 2022

Two questions:

  • should we merge before benchmarking?
  • should this diff hide behind some version gate? Is there any chance of a mixed version backup finding any range tombstones?

@msbutler (Collaborator, Author) commented Jul 22, 2022

Hmm, I need to think a bit harder about potentially changing kvclient.GetAllRevisions. If some other system is going to be issuing range tombstones on the descriptor table, then kvclient.GetAllRevisions needs to change.

@erikgrinaker (Contributor) commented Jul 22, 2022

> should this diff hide behind some version gate? Is there any chance of a mixed version backup finding any range tombstones?

We have an MVCCRangeTombstones version gate, which should prevent callers from writing MVCC range tombstones until the cluster is upgraded, so I don't think we need any additional version gates.

As with all version gates, though, it's left to client discipline to check the gate. We do have some additional protection in that the Pebble database also needs to be upgraded to support range keys (in the EnablePebbleFormatVersionRangeKeys migration), so any premature range key writes are likely to fail anyway.
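A hedged sketch of what that caller-side discipline looks like; the helper is hypothetical, while Settings.Version.IsActive and the MVCCRangeTombstones key are the mechanism named above:

```go
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/clusterversion"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
)

// canWriteRangeTombstones is a hypothetical helper: a writer consults the
// version gate before issuing MVCC range tombstones, so that nodes in an
// un-finalized mixed-version cluster never encounter them.
func canWriteRangeTombstones(ctx context.Context, st *cluster.Settings) bool {
	return st.Version.IsActive(ctx, clusterversion.MVCCRangeTombstones)
}
```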

Also, on the restore path I suppose we might want to check this, to prevent anyone from taking a backup in a 22.2 cluster and then restoring into an unfinalized mixed-version cluster. We should have a test for this, to make sure we don't just silently ignore the range keys in that case.

What about restoring into a previous version, do we even support that? I hope not. :/

@msbutler force-pushed the butler-mvcc-backup branch from 1026431 to 9579303 on July 22, 2022 17:22
@msbutler (Collaborator, Author) commented Jul 22, 2022

The restore will immediately fail if the restoring cluster is at a lower mixed/finalized version than the backup! (#61737). So, given that range keys will not be written until the cluster has finalized to 22.2, I don't think further version gates are necessary in BACKUP/RESTORE.

@erikgrinaker (Contributor) left a comment

Looks good at a high level. I'll defer to bulk on the overall review.

```go
if err != nil {
// To speed up SST reading, surface all the point keys first, flush,
// then surface all the range keys and flush.
if err := s.copyPointKeys(resp.dataSST); err != nil {
```
Member:

It makes me a little sad we have to open up an iterator on this twice, and read the footer, the index, etc. again just to change the iter opts.

I wonder if it'd make sense to include "OnlyPointKeys" in the ExportResponse or something the client could use as an optimization?

Contributor:

We have the infrastructure in place to reconfigure an iterator, just need to expose the API for it. Want me to submit a PR?

Collaborator (Author):

Wait, is the idea to add KeyTypes: storage.IterNonInterleavedPointAndRangeKeys, i.e. a single iterator will surface all point keys and then all range keys in the given span? That seems great to me.

I'm not sure if ExportRequest needs to change, but I'll leave that decision to Erik.

@msbutler (Collaborator, Author) commented Jul 26, 2022

I'm unsure how big a perf improvement we would get, though. I'm going to benchmark the current PR as is, to understand how big a regression there is from introducing these two iterators here.

@erikgrinaker (Contributor) commented Jul 26, 2022

Nah, I think David's idea is for the ExportResponse to contain information about whether the SST contains point and/or range keys, and my suggestion is to export MVCCIterator.SetOptions() so you can do iter.SetOptions(IterOptions{KeyTypes: IterKeyTypeRangesOnly}).
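A sketch of that combination, assuming SetOptions gets exported with roughly this shape (it had not been at the time of this thread); flushPoints and flushRanges are stand-ins for the existing copy-and-flush logic:

```go
import (
	"github.com/cockroachdb/cockroach/pkg/roachpb"
	"github.com/cockroachdb/cockroach/pkg/storage"
)

// copySSTKeys does the point pass and then the range pass over a single
// iterator, reconfiguring it in between rather than reopening the SST.
// SetOptions is the proposed API, not an existing one.
func copySSTKeys(
	iter storage.MVCCIterator,
	end roachpb.Key,
	flushPoints, flushRanges func(storage.MVCCIterator) error,
) error {
	iter.SetOptions(storage.IterOptions{
		KeyTypes:   storage.IterKeyTypePointsOnly,
		UpperBound: end,
	})
	if err := flushPoints(iter); err != nil {
		return err
	}
	iter.SetOptions(storage.IterOptions{
		KeyTypes:   storage.IterKeyTypeRangesOnly,
		UpperBound: end,
	})
	return flushRanges(iter)
}
```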

Collaborator (Author):

Aha, makes sense! That seems like a worthwhile quick win -- especially for backups without revision history, which will never export range keys.

Member:

Yeah, exactly, if we somehow know we didn't export a range key, we could just skip re-opening/reconfiguring the iterator entirely.

Contributor:

It's possible that we're already tracking that metadata in the SST itself though. Can we get that from the SST reader @jbowens?

@jbowens (Collaborator) commented Jul 26, 2022

Whether or not the table has a range key? Yeah, there's an sstable property recording the count. We can test reader.Properties.NumRangeKeys() > 0.
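For concreteness, a sketch of that check using pebble's sstable package; the helper and its caller are assumptions, while NewMemReader and Properties.NumRangeKeys are the pieces named above:

```go
import "github.com/cockroachdb/pebble/sstable"

// hasRangeKeys reports whether the raw SST bytes contain any range keys,
// per the count recorded in the sstable's properties block.
func hasRangeKeys(data []byte) (bool, error) {
	reader, err := sstable.NewMemReader(data, sstable.ReaderOptions{})
	if err != nil {
		return false, err
	}
	defer reader.Close()
	return reader.Properties.NumRangeKeys() > 0, nil
}
```

A caller could then skip the second, range-key flush pass entirely whenever this returns false, which covers every backup without range keys.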

pkg/kv/kvclient/revision_reader.go (resolved)
@msbutler force-pushed the butler-mvcc-backup branch from 34bc85d to 189fca7 on July 26, 2022 11:15
@msbutler force-pushed the butler-mvcc-backup branch from 189fca7 to d5de88d on July 26, 2022 20:41
@msbutler (Collaborator, Author):

Just updated the commit message with details on the benchmarking I did for this commit. For a backup without range keys, this diff results in a 5% regression in backup runtime performance. Unless anyone objects, I'll merge this and will rerun the benchmarks once the pebbleIterator speeds up in the next few weeks.

@msbutler (Collaborator, Author):

TFTRS!

bors r=erikgrinaker

@craig (bot) commented Jul 27, 2022

Build failed (retrying...):

@craig (bot) commented Jul 27, 2022

Build succeeded:

@craig (bot) merged commit 811ca1f into cockroachdb:master on Jul 27, 2022
@msbutler deleted the butler-mvcc-backup branch on July 30, 2022 00:40

Successfully merging this pull request may close these issues.

backup: MVCC Bulk Operations Are Missing from Backups