Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kvserver: don't apply ExcludeDataFromBackup outside config bounds #120188

Merged
merged 3 commits into from
Mar 20, 2024

Conversation

stevendanna
Copy link
Collaborator

@stevendanna stevendanna commented Mar 11, 2024

Some span configuration updates result in a replica split. However, between the new span configuration being set on the replica and the split happening, the span configuration stored on the replica may be incorrect for a portion of the replica's keyspace.

This is a general problem with asyncronous span configuration application. But, here we introduce a specific fix for the ExcludeDataFromBackup span configuration item.

When setting the span configuration, we now also store the bounds of that span configuration. ExcludeDataFromBackup only returns true if the data being considered for exclusion is contained by those bounds.

Further, we only set DataExcludedFromBackup to true in
BatchTimestampBeforeGCError if the span of the batch that generated
the error is contained by the span configuration bound. This field is
used to ignore BatchTimestampBeforeGCErrors. As a result, not setting
it to true could result in BACKUP failures until the range
splits. However, the alternative is silently ignoring a possible real
error.

Note that this doesn't solve all problems related to DataExcludedFromBackup. For instance, span configuration propagation is asyncronous and thus data for a given table may continue to be excluded from backups for some period after ALTER TABLE <TABLE> SET (exclude_data_from_backup = false) has been set.

Fixes #120122

Release note (bug fix): Fix a rare condition in which a BACKUP taken shortly after ALTER TABLE

SET (exclude_data_from_backup = true) could result in data from an unrelated table being excluded from a backup.

@stevendanna stevendanna requested review from a team as code owners March 11, 2024 01:58
@stevendanna stevendanna requested a review from a team March 11, 2024 01:58
@stevendanna stevendanna requested a review from a team as a code owner March 11, 2024 01:58
@stevendanna stevendanna requested review from dt and removed request for a team March 11, 2024 01:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@stevendanna stevendanna force-pushed the span-config-bounds-check branch from fd25b00 to 02ea6bc Compare March 11, 2024 10:20
@stevendanna stevendanna requested a review from arulajmani March 11, 2024 16:41
Comment on lines 1248 to 1249
// GetSpanConfigForKeyWithBounds is added for the spanconfig.StoreReader interface, required for
// SpanConfigConformanceReport.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// GetSpanConfigForKeyWithBounds is added for the spanconfig.StoreReader interface, required for
// SpanConfigConformanceReport.
// GetSpanConfigForKeyWithBounds is added for the spanconfig.StoreReader interface.

Comment on lines 346 to 348
// GetSpanConfigForKeyWithBounds looks of the span config for the given
// key. It's part of spanconfig.StoreReader interface. Note that it is only
// usable for the system tenant config.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// GetSpanConfigForKeyWithBounds looks of the span config for the given
// key. It's part of spanconfig.StoreReader interface. Note that it is only
// usable for the system tenant config.
// GetSpanConfigForKeyWithBounds returns the span config for the given
// key and the bounds that span the configuration applies to. It's part of spanconfig.StoreReader interface. Note that it is only
// usable for the system tenant config.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo nits

Reviewed 25 of 25 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @dt and @stevendanna)


pkg/config/system.go line 348 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…
// GetSpanConfigForKeyWithBounds returns the span config for the given
// key and the bounds that span the configuration applies to. It's part of spanconfig.StoreReader interface. Note that it is only
// usable for the system tenant config.

Updated comment looks good to me


pkg/kv/kvserver/replica.go line 1230 at r1 (raw file):

func (r *Replica) excludeReplicaFromBackupRLocked(rspan roachpb.RSpan) bool {
	if r.mu.conf.ExcludeDataFromBackup {
		// If ExcludeDataFromBackup is set, we also want to ensure that

nit: consider de-duplicating some of the logic shared here and above.


pkg/spanconfig/spanconfig.go line 292 at r1 (raw file):

	ComputeSplitKey(ctx context.Context, start, end roachpb.RKey) (roachpb.RKey, error)
	GetSpanConfigForKey(ctx context.Context, key roachpb.RKey) (roachpb.SpanConfig, error)
	GetSpanConfigForKeyWithBounds(ctx context.Context, key roachpb.RKey) (roachpb.SpanConfig, roachpb.Span, error)

nit: instead of having these two methods, should we instead have GetSpanConfigForKey return the span it applies to as well? Callers that don't care about the span are free to ignore it.

Separately, consider adding a comment here and/or explicitly naming the second return value to something like "appliesTo".

@stevendanna stevendanna force-pushed the span-config-bounds-check branch from 02ea6bc to cb34c24 Compare March 19, 2024 10:52
Copy link
Collaborator Author

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani and @dt)


pkg/kv/kvserver/replica.go line 1230 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: consider de-duplicating some of the logic shared here and above.

Done.


pkg/spanconfig/spanconfig.go line 292 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: instead of having these two methods, should we instead have GetSpanConfigForKey return the span it applies to as well? Callers that don't care about the span are free to ignore it.

Separately, consider adding a comment here and/or explicitly naming the second return value to something like "appliesTo".

Done and done.

@stevendanna stevendanna force-pushed the span-config-bounds-check branch from cb34c24 to 2811280 Compare March 19, 2024 10:54
Some span configuration updates result in a replica split. However,
between the new span configuration being set on the replica and the
split happening, the span configuration stored on the replica may be
incorrect for a portion of the replica's keyspace.

This is a general problem with asyncronous span configuration
application. But, here we introduce a specific fix for the
ExcludeDataFromBackup span configuration item.

When setting the span configuration, we now also store the bounds of
that span configuration. ExcludeDataFromBackup only returns true if
the data being considered for exclusion is contained by those bounds.

Further, we only set DataExcludedFromBackup to true in
BatchTimestampBeforeGCError if the span of the batch that generated
the error is contained by the span configuration bound. This field is
used to ignore BatchTimestampBeforeGCErrors. As a result, not setting
it to true could result in BACKUP failures until the range
splits. However, the alternaative is silently ignoring a possible real
error.

Note that this doesn't solve all problems related to
DataExcludedFromBackup. For instance, span configuration propagation
is asyncronous and thus data for a given table may continue to be
excluded from backups for some period after `ALTER TABLE <TABLE> SET
(exclude_data_from_backup = false)` has been set.

Fixes cockroachdb#120122

Release note (bug fix): Fix a rare condition in which a BACKUP taken
shortly after ALTER TABLE <TABLE> SET (exclude_data_from_backup =
true) could result in data from an unrelated table being excluded from
a backup.
@stevendanna stevendanna force-pushed the span-config-bounds-check branch from 2811280 to e798a48 Compare March 19, 2024 17:33
@stevendanna
Copy link
Collaborator Author

bors r=arulajmani

@craig
Copy link
Contributor

craig bot commented Mar 20, 2024

@craig craig bot merged commit c4f5b9b into cockroachdb:master Mar 20, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

exclude_data_from_backup may erroneously be applied to newly-created table
3 participants