backupccl: handle range keys in BACKUP #84875
Conversation
2 questions:

hmmm I need to think a bit harder about potentially changing
We have a version gate for this. As with all version gates, though, it's left to client discipline to check the gate, but we have some additional protection in that the Pebble database also needs to be upgraded to support range keys (via its format major version). Also, on the restore path I suppose we might want to check this, to prevent anyone from taking a backup in a 22.2 cluster and then restoring into an unfinalized mixed-version cluster. We should have a test for this, to make sure we don't just silently ignore the range keys in that case. What about restoring into a previous version? Do we even support that? I hope not. :/
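For concreteness, the usual shape of such a gate check is sketched below. This is a minimal sketch assuming a hypothetical clusterversion key named RangeKeysEnabled; the real key guarding range-key writes may be named differently.

```
// Minimal sketch of a client-side version-gate check. The key name
// RangeKeysEnabled is hypothetical and stands in for whichever clusterversion
// key ends up guarding range-key writes.
import (
	"context"

	"github.com/cockroachdb/cockroach/pkg/clusterversion"
	"github.com/cockroachdb/cockroach/pkg/settings/cluster"
)

func canWriteRangeKeys(ctx context.Context, st *cluster.Settings) bool {
	// IsActive returns true only once every node has been upgraded and the
	// cluster version has been finalized past the gate.
	return st.Version.IsActive(ctx, clusterversion.RangeKeysEnabled)
}
```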
Force-pushed from 1026431 to 9579303.
The restore will immediately fail if the restoring cluster is at a lower mixed or finalized version than the backup (#61737). So, given that range keys will not be written until the cluster has finalized to 22.1, I don't think further version gates are necessary in BACKUP/RESTORE.
Force-pushed from 9579303 to 34bc85d.
Looks good at a high level. I'll defer to bulk on the overall review.
```
// To speed up SST reading, surface all the point keys first, flush,
// then surface all the range keys and flush.
if err := s.copyPointKeys(resp.dataSST); err != nil {
```
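For readers outside the diff view, the two-pass shape being reviewed is roughly the sketch below. The sstCopier interface and copySST wrapper are illustrative; copyPointKeys and copyRangeKeys stand in for the helpers in the actual diff.

```
// A sketch of the two-pass copy: each helper opens its own iterator over the
// returned SST, once configured for point keys and once for range keys.
type sstCopier interface {
	copyPointKeys(dataSST []byte) error
	copyRangeKeys(dataSST []byte) error
}

func copySST(c sstCopier, dataSST []byte) error {
	// Pass 1: surface all point keys, then flush.
	if err := c.copyPointKeys(dataSST); err != nil {
		return err
	}
	// Pass 2: surface all range keys, then flush. This second pass is the
	// extra iterator (and footer/index re-read) discussed below.
	return c.copyRangeKeys(dataSST)
}
```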
It makes me a little sad we have to open up an iterator on this twice, and read the footer, the index, etc. again just to change the iter opts.
I wonder if it'd make sense to include "OnlyPointKeys" in the ExportResponse, or something the client could use as an optimization?
We have the infrastructure in place to reconfigure an iterator, just need to expose the API for it. Want me to submit a PR?
Wait, is the idea to add KeyTypes: storage.IterNonInterleavedPointAndRangeKeys, i.e. a single iterator that will surface all point keys and then all range keys in the given span? That seems great to me.
I'm not sure if ExportRequest needs to change, but I'll leave that decision to Erik.
I'm unsure how big a perf improvement we would get, though. I'm going to benchmark the current PR as is, to understand how big a regression introducing these two iterators here causes.
Nah, I think David's idea is for the ExportResponse to contain information about whether the SST contains point and/or range keys, and my suggestion is to export MVCCIterator.SetOptions() so you can do iter.SetOptions(IterOptions{KeyTypes: IterKeyTypeRangesOnly}).
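A rough sketch of what that reuse might look like, assuming SetOptions does get exported on the MVCC iterator. newIterOverSST, surfacePoints, and surfaceRanges are placeholders; only the IterOptions/KeyTypes plumbing follows the suggestion above.

```
// Hypothetical single-iterator flow once an exported SetOptions exists.
func copyWithOneIterator(dataSST []byte) error {
	// Open the iterator once, configured for point keys.
	iter := newIterOverSST(dataSST, storage.IterOptions{
		KeyTypes: storage.IterKeyTypePointsOnly,
	})
	defer iter.Close()

	if err := surfacePoints(iter); err != nil {
		return err
	}

	// Reconfigure the same iterator for range keys instead of opening a second
	// one, skipping another read of the SST footer and index blocks.
	iter.SetOptions(storage.IterOptions{KeyTypes: storage.IterKeyTypeRangesOnly})
	return surfaceRanges(iter)
}
```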
aha, makes sense! That seems like a worthwhile quick win -- especially for backups without revision history, which will never export range keys.
Yeah, exactly, if we somehow know we didn't export a range key, we could just skip re-opening/reconfiguring the iterator entirely.
It's possible that we're already tracking that metadata in the SST itself though. Can we get that from the SST reader @jbowens?
Whether or not the table has a range key? Yeah, there's an sstable property recording the count. We can test reader.Properties.NumRangeKeys() > 0.
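Putting the thread together, the fast path could look roughly like this. The reader variable is assumed to be a pebble sstable.Reader already opened on the returned SST; copyPointKeys and copyRangeKeys refer to the helpers in the diff, and only the NumRangeKeys() check comes from the comment above.

```
// Always copy point keys, but skip the range-key pass (and its extra iterator)
// when the SST's properties report zero range keys.
if err := s.copyPointKeys(resp.dataSST); err != nil {
	return err
}
if reader.Properties.NumRangeKeys() > 0 {
	if err := s.copyRangeKeys(resp.dataSST); err != nil {
		return err
	}
}
```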
Force-pushed from 34bc85d to 189fca7.
Previously BACKUP would not back up range tombstones. With this patch, BACKUPs with revision_history will back up range tombstones. Non-revision-history backups are not affected by this diff because MVCCExportToSST already filters all tombstones out of the backup.

Specifically, this patch replaces the iterators used in the backup_processor with the pebbleIterator, which has baked-in range key support. This refactor introduces a 5% regression in backup runtime, even when the backup has no range keys, though cockroachdb#83051 hopes to address this gap. See details below on the benchmark experiment.

At this point a backup with range keys is restorable, thanks to cockroachdb#84214. Note that the restore codebase still touches iterators that are not range key aware. This is not a problem because restored data does not have range keys, nor do the empty ranges restore dumps data into. These iterators (e.g. in SSTBatcher and in CheckSSTConflicts) will be updated when cockroachdb#70428 gets fixed.

Fixes cockroachdb#71155

Release note: none

To benchmark this diff, the following commands were used at sha a5ccdc3, with and without this commit, over three trials:

```
roachprod create -n 5 --gce-machine-type=n2-standard-16 $CLUSTER
roachprod put $CLUSTER [build] cockroach
roachprod wipe $CLUSTER; roachprod start $CLUSTER;
roachprod run $CLUSTER:1 -- "./cockroach workload init bank --rows 1000000000"
roachprod sql $CLUSTER:1 -- -e "BACKUP INTO 'gs://somebucket/michael-rangkey?AUTH=implicit'"
```

The backup on the control binary took on average 478 seconds with a stddev of 13 seconds, while the backup with the treatment binary took on average 499 seconds with a stddev of 8 seconds.
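For reference, those means work out to a (499 - 478) / 478 ≈ 4.4% slowdown, consistent with the roughly 5% regression figure quoted in the commit message.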
Force-pushed from 189fca7 to d5de88d.
Just updated the commit message with details on the benchmarking I did for this commit. For a backup without range keys, this diff results in a 5% regression in backup runtime performance. Unless anyone objects, I'll merge this and will rerun the benchmarks once the pebbleIterator speeds up in the next few weeks.
TFTRS! bors r=erikgrinaker
Build failed (retrying...)
Build succeeded