storage: err on snapshot of large ranges #7788
Conversation
Review status: 0 of 4 files reviewed at latest revision, 1 unresolved discussion, some commit checks failed.

storage/replica_raftstorage.go, line 260 [r2] (raw file):
I was imagining we'd only do this if the snapshot size was excessive (i.e., 3-4x the target range size), but perhaps it is always better to delay a snapshot if splitting needs to be done. @bdarnell?
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 1 of 1 files at r3.

storage/replica_raftstorage.go, line 258 [r2] (raw file):
can be below the size check

storage/replica_raftstorage_test.go, line 34 [r2] (raw file):
isn't

storage/replica_raftstorage_test.go, line 70 [r2] (raw file):
small nit: check that this is the right error, not just non-nil
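The nit above is about asserting the specific error rather than any non-nil error. A minimal, self-contained sketch of that pattern in Go; `errRangeTooLarge` and `doSnapshot` are hypothetical stand-ins, not the actual test code:

```go
package main

import (
	"errors"
	"fmt"
	"testing"
)

// errRangeTooLarge is a hypothetical sentinel; the real test would use the
// error returned by the snapshot code under review.
var errRangeTooLarge = errors.New("range too large for snapshot")

// doSnapshot is a stand-in for the call under test.
func doSnapshot(tooLarge bool) error {
	if tooLarge {
		return fmt.Errorf("cannot snapshot: %w", errRangeTooLarge)
	}
	return nil
}

// TestSnapshotError illustrates the nit: assert the specific error rather
// than only checking err != nil, so an unrelated failure cannot pass the test.
func TestSnapshotError(t *testing.T) {
	err := doSnapshot(true)
	if !errors.Is(err, errRangeTooLarge) {
		t.Fatalf("expected errRangeTooLarge, got %v", err)
	}
}
```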
Reviewed 2 of 2 files at r4.

storage/replica_raftstorage_test.go, line 73 [r4] (raw file):
log the unexpected error
Review status: all files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

storage/replica_raftstorage_test.go, line 85 [r4] (raw file):
remove?
Reviewed 1 of 1 files at r1, 1 of 3 files at r2, 1 of 2 files at r4, 1 of 1 files at r5.
Reviewed 1 of 1 files at r5.
Force-pushed from 457d172 to a5e1af3.
Force-pushed from 4659d9f to f13a7d3.
If a range needs to be split, return an error rather than attempting to generate a snapshot. This avoids generating excessively large snapshots. Suggested in cockroachdb#7581.
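A minimal sketch of the behavior this commit message describes, assuming a simple byte-size threshold; `snapshot`, `maxRangeBytes`, and `errSplitRequired` are illustrative names rather than the actual `replica_raftstorage.go` implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// errSplitRequired is a hypothetical sentinel returned instead of building a
// snapshot when the range has grown past its split threshold.
var errSplitRequired = errors.New("range needs to be split before snapshotting")

// snapshot sketches the check: if the range is above the configured maximum
// size it should split first, so refuse to generate a (potentially huge)
// snapshot and surface an error instead.
func snapshot(rangeSizeBytes, maxRangeBytes int64) ([]byte, error) {
	if rangeSizeBytes > maxRangeBytes {
		return nil, errSplitRequired
	}
	// The real code would iterate the range's data to build the snapshot;
	// here we just return a placeholder.
	return []byte("snapshot-data"), nil
}

func main() {
	if _, err := snapshot(128<<20, 64<<20); errors.Is(err, errSplitRequired) {
		fmt.Println("refused:", err) // oversized range: error, no snapshot generated
	}
	if data, err := snapshot(32<<20, 64<<20); err == nil {
		fmt.Println("generated", len(data), "bytes") // normal-sized range: snapshot proceeds
	}
}
```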
In a privately reported user issue, we've seen that [our attempts](cockroachdb#7788) at [preventing large snapshots](cockroachdb#7581) can result in replica unavailability. Our current approach to limiting large snapshots assumes that it's okay to block snapshots indefinitely while waiting for a range to first split. Unfortunately, this can create a dependency cycle where a range requires a snapshot to split (because it can't achieve an up-to-date quorum without it) but isn't allowed to perform a snapshot until its size is reduced below the threshold. This can result in unavailability even when a majority of replicas remain live.

Currently, we still need this snapshot size limit because unbounded snapshots can result in OOM errors that crash entire nodes. However, once snapshots are streamed from disk to disk, never needing to buffer in-memory on the sending or receiving side, we should be able to remove any snapshot size limit (see cockroachdb#16954).

As a holdover, this change introduces a `permitLargeSnapshots` flag on a replica, which is set when the replica is too large to snapshot but observes splits failing. When set, the flag allows snapshots to ignore the size limit until the snapshot goes through and splits are able to succeed again.

Release note (bug fix): Fixed a scenario where a range that is too big to snapshot can lose availability even with a majority of nodes alive.
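A rough sketch of the `permitLargeSnapshots` escape hatch described above, under the assumption of a plain in-memory flag; the `replica` struct and its methods here are hypothetical stand-ins for the real Replica code:

```go
package main

import (
	"errors"
	"fmt"
)

var errSplitRequired = errors.New("range needs to be split before snapshotting")

// replica is a stripped-down stand-in for the real Replica type.
type replica struct {
	sizeBytes            int64
	maxBytes             int64
	permitLargeSnapshots bool // set once splits are observed failing
}

// onSplitFailed records that the range could not split; from then on the size
// limit no longer blocks snapshots, breaking the split<->snapshot cycle.
func (r *replica) onSplitFailed() {
	if r.sizeBytes > r.maxBytes {
		r.permitLargeSnapshots = true
	}
}

// canSnapshot applies the size limit unless large snapshots are permitted.
func (r *replica) canSnapshot() error {
	if r.sizeBytes > r.maxBytes && !r.permitLargeSnapshots {
		return errSplitRequired
	}
	return nil
}

func main() {
	r := &replica{sizeBytes: 256 << 20, maxBytes: 64 << 20}
	fmt.Println(r.canSnapshot()) // refused: too large, splits not yet known to fail
	r.onSplitFailed()            // a split attempt failed (e.g. no up-to-date quorum)
	fmt.Println(r.canSnapshot()) // now permitted despite the size, so quorum can recover
}
```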