Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storageccl: disable time-bound iteration optimization in BACKUP #36191

Merged
merged 1 commit into from
Mar 28, 2019

Conversation

danhhz
Copy link
Contributor

@danhhz danhhz commented Mar 26, 2019

Time-bound iterators are a performance optimization that allows us to
entirely skip over sstables in RocksDB that don't have data relevant to
the time bounds in a request.

This can have a dramatic impact on performance, but we've seen a number
of extremely subtle and hard to detect correctness issues with this (see
issues #28358 #34819). As a result, we've decided to skip the
optimization everywhere that it isn't absolutely necessary for the
feature to work (leaving one place: poller-based changefeeds, which are
being phased out anyway). This will both give increased confidance in
correctness as well as eliminate any need to consider and investigate
time-bound iterators when/if someone hits a correctness bug.

This commit introduces the plumbing necessary for an individual
ExportRequest to control whether time-bound iterators are allowed or
disallowed.

A bool is introduced to the ExportRequest proto that explictly allows
time-bound iterators. This means that in a rolling upgrade, it's
possible for an old changefeed-poller node to send a request without the
field set to a new node, which sees the unset field as false, disabling
the optimization. An alternative is to invert the semantics of the bool
(zero-value = enable, true = disable the optimization), but in case any
new uses of ExportRequest are introduced, I'd much prefer the zero-value
of this field be the safer default of disabled.

As part of the investigation for whether we could turn them off for
incremental BACKUP (#35671), I reran some of the initial measurements of
time-bound iterator impact on incremental backup. I installed tpcc-1000
on a 3 node n1-standard-16 cluster, then ran a full backup, then ran
load for 1 hour, noting the time T. With load running, I ran 6
incremental backups from T alternating between tbi and no-tbi:

tbi incremental backup runtimes: 3m45s,3m56s,4m6s
no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s

Any impact on normal traffic (latencies etc) seemed to be in the noise.

Closes #35671.

Release note (enterprise change): In exchange for increased correctness
confidance, BACKUPs using INCREMENTAL FROM now take slightly longer.

@danhhz danhhz requested review from dt, tbg and a team March 26, 2019 22:17
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

`missspell` had a complaint or two.

Reviewed 6 of 6 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt)

Copy link
Member

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I could also see having a default-false cluster setting populate the flag in backup's requests, with a name like kv.backup.experimental_incremental_iterator.enabled -- that way if we encounter a situation where we go "huh, I wonder if TBI would behave significantly differently here?", it is cheaper/easier to find out than building a new binary (and potentially asking someone to deploy it).

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @dt)

Time-bound iterators are a performance optimization that allows us to
entirely skip over sstables in RocksDB that don't have data relevant to
the time bounds in a request.

This can have a dramatic impact on performance, but we've seen a number
of extremely subtle and hard to detect correctness issues with this (see
issues cockroachdb#28358 cockroachdb#34819). As a result, we've decided to skip the
optimization everywhere that it isn't absolutely necessary for the
feature to work (leaving one place: poller-based changefeeds, which are
being phased out anyway). This will both give increased confidance in
correctness as well as eliminate any need to consider and investigate
time-bound iterators when/if someone hits a correctness bug.

This commit introduces the plumbing necessary for an individual
ExportRequest to control whether time-bound iterators are allowed or
disallowed.

A bool is introduced to the ExportRequest proto that explictly allows
time-bound iterators. This means that in a rolling upgrade, it's
possible for an old changefeed-poller node to send a request without the
field set to a new node, which sees the unset field as false, disabling
the optimization. An alternative is to invert the semantics of the bool
(zero-value = enable, true = disable the optimization), but in case any
new uses of ExportRequest are introduced, I'd much prefer the zero-value
of this field be the safer default of disabled.

As part of the investigation for whether we could turn them off for
incremental BACKUP (cockroachdb#35671), I reran some of the initial measurements of
time-bound iterator impact on incremental backup. I installed tpcc-1000
on a 3 node n1-standard-16 cluster, then ran a full backup, then ran
load for 1 hour, noting the time T. With load running, I ran 6
incremental backups from T alternating between tbi and no-tbi:

    tbi incremental backup runtimes: 3m45s,3m56s,4m6s
    no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s

Any impact on normal traffic (latencies etc) seemed to be in the noise.

Closes cockroachdb#35671.

Release note (enterprise change): In exchange for increased correctness
confidance, `BACKUP`s using `INCREMENTAL FROM` now take slightly longer.
@danhhz
Copy link
Contributor Author

danhhz commented Mar 28, 2019

misspell is a jerk

bors r=dt,tbg

I could also see having a default-false cluster setting populate the flag in backup's requests, with a name like kv.backup.experimental_incremental_iterator.enabled -- that way if we encounter a situation where we go "huh, I wonder if TBI would behave significantly differently here?", it is cheaper/easier to find out than building a new binary (and potentially asking someone to deploy it).

I like this idea but I'm hesitant. If we don't trust TBIs, then I'm not sure we should offer such an easy escape hatch to turn them back on. Part of the motivation for doing this is so we don't have to waste already scarce engineering resources debugging whether they could have been at fault in some given issue.

craig bot pushed a commit that referenced this pull request Mar 28, 2019
36191: storageccl: disable time-bound iteration optimization in BACKUP r=dt,tbg a=danhhz

Time-bound iterators are a performance optimization that allows us to
entirely skip over sstables in RocksDB that don't have data relevant to
the time bounds in a request.

This can have a dramatic impact on performance, but we've seen a number
of extremely subtle and hard to detect correctness issues with this (see
issues #28358 #34819). As a result, we've decided to skip the
optimization everywhere that it isn't absolutely necessary for the
feature to work (leaving one place: poller-based changefeeds, which are
being phased out anyway). This will both give increased confidance in
correctness as well as eliminate any need to consider and investigate
time-bound iterators when/if someone hits a correctness bug.

This commit introduces the plumbing necessary for an individual
ExportRequest to control whether time-bound iterators are allowed or
disallowed.

A bool is introduced to the ExportRequest proto that explictly allows
time-bound iterators. This means that in a rolling upgrade, it's
possible for an old changefeed-poller node to send a request without the
field set to a new node, which sees the unset field as false, disabling
the optimization. An alternative is to invert the semantics of the bool
(zero-value = enable, true = disable the optimization), but in case any
new uses of ExportRequest are introduced, I'd much prefer the zero-value
of this field be the safer default of disabled.

As part of the investigation for whether we could turn them off for
incremental BACKUP (#35671), I reran some of the initial measurements of
time-bound iterator impact on incremental backup. I installed tpcc-1000
on a 3 node n1-standard-16 cluster, then ran a full backup, then ran
load for 1 hour, noting the time T. With load running, I ran 6
incremental backups from T alternating between tbi and no-tbi:

    tbi incremental backup runtimes: 3m45s,3m56s,4m6s
    no-tbi incremental backup runtimes: 6m45s,6m27s,6m48s

Any impact on normal traffic (latencies etc) seemed to be in the noise.

Closes #35671.

Release note (enterprise change): In exchange for increased correctness
confidance, `BACKUP`s using `INCREMENTAL FROM` now take slightly longer.

Co-authored-by: Daniel Harrison <[email protected]>
@craig
Copy link
Contributor

craig bot commented Mar 28, 2019

Build succeeded

@craig craig bot merged commit 6b834b6 into cockroachdb:master Mar 28, 2019
@danhhz danhhz deleted the backup_notbi branch March 28, 2019 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants