-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storageccl: disable time-bound iteration optimization in BACKUP #36191
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 6 of 6 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @dt)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could also see having a default-false cluster setting populate the flag in backup's requests, with a name like kv.backup.experimental_incremental_iterator.enabled
-- that way if we encounter a situation where we go "huh, I wonder if TBI would behave significantly differently here?", it is cheaper/easier to find out than building a new binary (and potentially asking someone to deploy it).
Reviewable status:
complete! 2 of 0 LGTMs obtained (waiting on @dt)
Time-bound iterators are a performance optimization that allows us to entirely skip over sstables in RocksDB that don't have data relevant to the time bounds in a request. This can have a dramatic impact on performance, but we've seen a number of extremely subtle and hard to detect correctness issues with this (see issues cockroachdb#28358 cockroachdb#34819). As a result, we've decided to skip the optimization everywhere that it isn't absolutely necessary for the feature to work (leaving one place: poller-based changefeeds, which are being phased out anyway). This will both give increased confidance in correctness as well as eliminate any need to consider and investigate time-bound iterators when/if someone hits a correctness bug. This commit introduces the plumbing necessary for an individual ExportRequest to control whether time-bound iterators are allowed or disallowed. A bool is introduced to the ExportRequest proto that explictly allows time-bound iterators. This means that in a rolling upgrade, it's possible for an old changefeed-poller node to send a request without the field set to a new node, which sees the unset field as false, disabling the optimization. An alternative is to invert the semantics of the bool (zero-value = enable, true = disable the optimization), but in case any new uses of ExportRequest are introduced, I'd much prefer the zero-value of this field be the safer default of disabled. As part of the investigation for whether we could turn them off for incremental BACKUP (cockroachdb#35671), I reran some of the initial measurements of time-bound iterator impact on incremental backup. I installed tpcc-1000 on a 3 node n1-standard-16 cluster, then ran a full backup, then ran load for 1 hour, noting the time T. With load running, I ran 6 incremental backups from T alternating between tbi and no-tbi: tbi incremental backup runtimes: 3m45s,3m56s,4m6s no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s Any impact on normal traffic (latencies etc) seemed to be in the noise. Closes cockroachdb#35671. Release note (enterprise change): In exchange for increased correctness confidance, `BACKUP`s using `INCREMENTAL FROM` now take slightly longer.
misspell is a jerk bors r=dt,tbg
I like this idea but I'm hesitant. If we don't trust TBIs, then I'm not sure we should offer such an easy escape hatch to turn them back on. Part of the motivation for doing this is so we don't have to waste already scarce engineering resources debugging whether they could have been at fault in some given issue. |
36191: storageccl: disable time-bound iteration optimization in BACKUP r=dt,tbg a=danhhz Time-bound iterators are a performance optimization that allows us to entirely skip over sstables in RocksDB that don't have data relevant to the time bounds in a request. This can have a dramatic impact on performance, but we've seen a number of extremely subtle and hard to detect correctness issues with this (see issues #28358 #34819). As a result, we've decided to skip the optimization everywhere that it isn't absolutely necessary for the feature to work (leaving one place: poller-based changefeeds, which are being phased out anyway). This will both give increased confidance in correctness as well as eliminate any need to consider and investigate time-bound iterators when/if someone hits a correctness bug. This commit introduces the plumbing necessary for an individual ExportRequest to control whether time-bound iterators are allowed or disallowed. A bool is introduced to the ExportRequest proto that explictly allows time-bound iterators. This means that in a rolling upgrade, it's possible for an old changefeed-poller node to send a request without the field set to a new node, which sees the unset field as false, disabling the optimization. An alternative is to invert the semantics of the bool (zero-value = enable, true = disable the optimization), but in case any new uses of ExportRequest are introduced, I'd much prefer the zero-value of this field be the safer default of disabled. As part of the investigation for whether we could turn them off for incremental BACKUP (#35671), I reran some of the initial measurements of time-bound iterator impact on incremental backup. I installed tpcc-1000 on a 3 node n1-standard-16 cluster, then ran a full backup, then ran load for 1 hour, noting the time T. With load running, I ran 6 incremental backups from T alternating between tbi and no-tbi: tbi incremental backup runtimes: 3m45s,3m56s,4m6s no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s Any impact on normal traffic (latencies etc) seemed to be in the noise. Closes #35671. Release note (enterprise change): In exchange for increased correctness confidance, `BACKUP`s using `INCREMENTAL FROM` now take slightly longer. Co-authored-by: Daniel Harrison <[email protected]>
Build succeeded |
Time-bound iterators are a performance optimization that allows us to
entirely skip over sstables in RocksDB that don't have data relevant to
the time bounds in a request.
This can have a dramatic impact on performance, but we've seen a number
of extremely subtle and hard to detect correctness issues with this (see
issues #28358 #34819). As a result, we've decided to skip the
optimization everywhere that it isn't absolutely necessary for the
feature to work (leaving one place: poller-based changefeeds, which are
being phased out anyway). This will both give increased confidance in
correctness as well as eliminate any need to consider and investigate
time-bound iterators when/if someone hits a correctness bug.
This commit introduces the plumbing necessary for an individual
ExportRequest to control whether time-bound iterators are allowed or
disallowed.
A bool is introduced to the ExportRequest proto that explictly allows
time-bound iterators. This means that in a rolling upgrade, it's
possible for an old changefeed-poller node to send a request without the
field set to a new node, which sees the unset field as false, disabling
the optimization. An alternative is to invert the semantics of the bool
(zero-value = enable, true = disable the optimization), but in case any
new uses of ExportRequest are introduced, I'd much prefer the zero-value
of this field be the safer default of disabled.
As part of the investigation for whether we could turn them off for
incremental BACKUP (#35671), I reran some of the initial measurements of
time-bound iterator impact on incremental backup. I installed tpcc-1000
on a 3 node n1-standard-16 cluster, then ran a full backup, then ran
load for 1 hour, noting the time T. With load running, I ran 6
incremental backups from T alternating between tbi and no-tbi:
Any impact on normal traffic (latencies etc) seemed to be in the noise.
Closes #35671.
Release note (enterprise change): In exchange for increased correctness
confidance,
BACKUP
s usingINCREMENTAL FROM
now take slightly longer.