Skip to content

Commit

Permalink
storageccl: disable time-bound iteration optimization in BACKUP
Browse files Browse the repository at this point in the history
Time-bound iterators are a performance optimization that allows us to
entirely skip over sstables in RocksDB that don't have data relevant to
the time bounds in a request.

This can have a dramatic impact on performance, but we've seen a number
of extremely subtle and hard to detect correctness issues with this (see
issues cockroachdb#28358 cockroachdb#34819). As a result, we've decided to skip the
optimization everywhere that it isn't absolutely necessary for the
feature to work (leaving one place: poller-based changefeeds, which are
being phased out anyway). This will both give increased confidance in
correctness as well as eliminate any need to consider and investigate
time-bound iterators when/if someone hits a correctness bug.

This commit introduces the plumbing necessary for an individual
ExportRequest to control whether time-bound iterators are allowed or
disallowed.

A bool is introduced to the ExportRequest proto that explictly allows
time-bound iterators. This means that in a rolling upgrade, it's
possible for an old changefeed-poller node to send a request without the
field set to a new node, which sees the unset field as false, disabling
the optimization. An alternative is to invert the semantics of the bool
(zero-value = enable, true = disable the optimization), but in case any
new uses of ExportRequest are introduced, I'd much prefer the zero-value
of this field be the safer default of disabled.

As part of the investigation for whether we could turn them off for
incremental BACKUP (cockroachdb#35671), I reran some of the initial measurements of
time-bound iterator impact on incremental backup. I installed tpcc-1000
on a 3 node n1-standard-16 cluster, then ran a full backup, then ran
load for 1 hour, noting the time T. With load running, I ran 6
incremental backups from T alternating between tbi and no-tbi:

    tbi incremental backup runtimes: 3m45s,3m56s,4m6s
    no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s

Any impact on normal traffic (latencies etc) seemed to be in the noise.

Closes cockroachdb#35671.

Release note (enterprise change): In exchange for increased correctness
confidance, `BACKUP`s using `INCREMENTAL FROM` now take slightly longer.
  • Loading branch information
danhhz committed Mar 28, 2019
1 parent ac26449 commit 59d3635
Show file tree
Hide file tree
Showing 6 changed files with 459 additions and 375 deletions.
11 changes: 6 additions & 5 deletions pkg/ccl/changefeedccl/poller.go
Original file line number Diff line number Diff line change
Expand Up @@ -471,11 +471,12 @@ func (p *poller) exportSpan(

header := roachpb.Header{Timestamp: end}
req := &roachpb.ExportRequest{
RequestHeader: roachpb.RequestHeaderFromSpan(span),
StartTime: start,
MVCCFilter: roachpb.MVCCFilter_All,
ReturnSST: true,
OmitChecksum: true,
RequestHeader: roachpb.RequestHeaderFromSpan(span),
StartTime: start,
MVCCFilter: roachpb.MVCCFilter_All,
ReturnSST: true,
OmitChecksum: true,
EnableTimeBoundIteratorOptimization: true,
}
if isFullScan {
req.MVCCFilter = roachpb.MVCCFilter_Latest
Expand Down
3 changes: 3 additions & 0 deletions pkg/ccl/changefeedccl/table_history.go
Original file line number Diff line number Diff line change
Expand Up @@ -236,6 +236,9 @@ func fetchTableDescriptorVersions(
MVCCFilter: roachpb.MVCCFilter_All,
ReturnSST: true,
OmitChecksum: true,
// TODO(dan): Remove this in a PR separate from the one that disables
// time-bound iterators for BACKUP.
EnableTimeBoundIteratorOptimization: true,
}
res, pErr := client.SendWrappedWith(ctx, db.NonTransactionalSender(), header, req)
if log.V(2) {
Expand Down
67 changes: 40 additions & 27 deletions pkg/ccl/storageccl/engineccl/mvcc.go
Original file line number Diff line number Diff line change
Expand Up @@ -69,41 +69,49 @@ var _ engine.SimpleIterator = &MVCCIncrementalIterator{}

// IterOptions bundles options for NewMVCCIncrementalIterator.
type IterOptions struct {
StartTime hlc.Timestamp
EndTime hlc.Timestamp
UpperBound roachpb.Key
WithStats bool
StartTime hlc.Timestamp
EndTime hlc.Timestamp
UpperBound roachpb.Key
WithStats bool
EnableTimeBoundIteratorOptimization bool
}

// NewMVCCIncrementalIterator creates an MVCCIncrementalIterator with the
// specified engine and options.
func NewMVCCIncrementalIterator(e engine.Reader, opts IterOptions) *MVCCIncrementalIterator {
io := engine.IterOptions{
UpperBound: opts.UpperBound,
WithStats: opts.WithStats,
}

// Time-bound iterators only make sense to use if the start time is set.
var sanityIter engine.Iterator
if opts.EnableTimeBoundIteratorOptimization && !opts.StartTime.IsEmpty() {
// The call to startTime.Next() converts our exclusive start bound into the
// inclusive start bound that MinTimestampHint expects. This is strictly a
// performance optimization; omitting the call would still return correct
// results.
io.MinTimestampHint = opts.StartTime.Next()
io.MaxTimestampHint = opts.EndTime
// It is necessary for correctness that sanityIter be created before iter.
// This is because the provided Reader may not be a consistent snapshot, so
// the two could end up observing different information. The hack around
// sanityCheckMetadataKey only works properly if all possible discrepancies
// between the two iterators lead to intents and values falling outside of
// the timestamp range **from iter's perspective**. This allows us to simply
// ignore discrepancies that we notice in advance(). See #34819.
sanityIter = e.NewIterator(engine.IterOptions{
UpperBound: opts.UpperBound,
})
}

return &MVCCIncrementalIterator{
e: e,
upperBound: opts.UpperBound,
// It is necessary for correctness that sanityIter be created before
// iter. This is because the provided Reader may not be a consistent
// snapshot, so the two could end up observing different information.
// The hack around sanityCheckMetadataKey only works properly if all
// possible discrepancies between the two iterators lead to intents
// and values falling outside of the timestamp range **from iter's
// perspective**. This allows us to simply ignore discrepancies that
// we notice in advance(). See #34819.
sanityIter: e.NewIterator(engine.IterOptions{
UpperBound: opts.UpperBound,
}),
iter: e.NewIterator(engine.IterOptions{
// The call to startTime.Next() converts our exclusive start bound into
// the inclusive start bound that MinTimestampHint expects. This is
// strictly a performance optimization; omitting the call would still
// return correct results.
MinTimestampHint: opts.StartTime.Next(),
MaxTimestampHint: opts.EndTime,
UpperBound: opts.UpperBound,
WithStats: opts.WithStats,
}),
startTime: opts.StartTime,
endTime: opts.EndTime,
iter: e.NewIterator(io),
startTime: opts.StartTime,
endTime: opts.EndTime,
sanityIter: sanityIter,
}
}

Expand Down Expand Up @@ -225,6 +233,11 @@ func (i *MVCCIncrementalIterator) advance() {
// sees that exact key. Otherwise, it returns false. It's used in the workaround
// in `advance` for a time-bound iterator bug.
func (i *MVCCIncrementalIterator) sanityCheckMetadataKey() ([]byte, bool, error) {
if i.sanityIter == nil {
// If sanityIter is not set, it's because we're not using time-bound
// iterator and we don't need the sanity check.
return i.iter.UnsafeValue(), true, nil
}
unsafeKey := i.iter.UnsafeKey()
i.sanityIter.Seek(unsafeKey)
if ok, err := i.sanityIter.Valid(); err != nil {
Expand Down
7 changes: 4 additions & 3 deletions pkg/ccl/storageccl/export.go
Original file line number Diff line number Diff line change
Expand Up @@ -116,9 +116,10 @@ func evalExport(
// TODO(dan): Move all this iteration into cpp to avoid the cgo calls.
// TODO(dan): Consider checking ctx periodically during the MVCCIterate call.
iter := engineccl.NewMVCCIncrementalIterator(batch, engineccl.IterOptions{
StartTime: args.StartTime,
EndTime: h.Timestamp,
UpperBound: args.EndKey,
StartTime: args.StartTime,
EndTime: h.Timestamp,
UpperBound: args.EndKey,
EnableTimeBoundIteratorOptimization: args.EnableTimeBoundIteratorOptimization,
})
defer iter.Close()
for iter.Seek(engine.MakeMVCCMetadataKey(args.Key)); ; iterFn(iter) {
Expand Down
Loading

0 comments on commit 59d3635

Please sign in to comment.