Improving rollback effectiveness #1776
Conversation
protected epochs
- fixing the bounds calculation logic for fetching the epochs.
- including and using the rollbackSamplingInterval as a scorch option
index/scorch/persister.go
Outdated
persistedSnapshots []*snapshotMetaData) map[uint64]struct{} {

    // make a map of epochs to protect from deletion
    protectedEpochs := make(map[uint64]struct{}, s.numSnapshotsToKeep)
I don't think providing a length parameter while "make"ing a map does anything, does it? Just this would suffice:
protectedEpochs := make(map[uint64]struct{})
Done.
index/scorch/persister.go
Outdated
    protectedEpochs[persistedSnapshots[0].epoch] = struct{}{}
    nextSnapshotToProtect :=
        persistedSnapshots[0].timeStamp.Add(s.rollbackSamplingInterval)
    protectedSnapshots := 1
No need for this, you can do len(protectedEpochs) to determine the number of protectedSnapshots.
protectedSnapshots over here tracks how many snapshots have been protected so far.
index/scorch/persister.go
Outdated
    timeStamp time.Time
}

func (s *Scorch) rootBoltSnapshotEpochTimeStamps() ([]*snapshotMetaData, error) {
I'd suggest a better name, perhaps rootBoltSnapshotMetaData(), owing to its return values.
Done.
index/scorch/persister.go
Outdated
        }
        rv = append(rv, &snapshotMetaData{
            epoch:     snapshotEpoch,
            timeStamp: timeStamp})
nit:
rv = append(rv, &snapshotMetaData{
epoch: snapshotEpoch,
timeStamp: timeStamp,
})
Done.
index/scorch/persister.go
Outdated
        // index is bound to be the closest possible snapshot to the
        // required timestamp of nextSnapshotToProtect
        if persistedSnapshots[i].timeStamp.After(nextSnapshotToProtect) {
            protectedEpochs[persistedSnapshots[i-1].epoch] = struct{}{}
Hmm, this doesn't feel right; shouldn't you be adding this to the map instead:
protectedEpochs[persistedSnapshots[i].epoch] = struct{}{}
nextSnapshotToProtect essentially indicates the timestamp of the next snapshot we want to protect. There can be a situation where we don't have a snapshot in our list of persisted snapshots with the exact timestamp we're trying to find, in which case we just protect the snapshot closest to the required timestamp, but newer than it. So we are trying to find the first timestamp (at index i) that's older than nextSnapshotToProtect, and we can say for sure that the snapshot at i-1 has a timestamp greater (so it's newer) than nextSnapshotToProtect, which is bound to be the closest one.
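A minimal sketch of that selection intent, assuming a newest-first slice; closestNewerSnapshot is a hypothetical helper written for illustration, not the code in this PR:

package main

import (
    "fmt"
    "time"
)

// snapshotMetaData mirrors the struct discussed in the diff: a snapshot epoch
// plus the wall-clock time at which that snapshot was persisted.
type snapshotMetaData struct {
    epoch     uint64
    timeStamp time.Time
}

// closestNewerSnapshot returns the snapshot whose timestamp is the nearest one
// that is not older than the target. It assumes snapshots is sorted
// newest-first, as in the [t5 t4 t3 t2 t1 t0] example later in this PR.
func closestNewerSnapshot(snapshots []*snapshotMetaData, target time.Time) *snapshotMetaData {
    var best *snapshotMetaData
    for _, s := range snapshots {
        if s.timeStamp.Before(target) {
            // Everything from here on is older than the target; the previously
            // seen entry (if any) is the closest one that is still newer.
            break
        }
        best = s // still newer than (or equal to) the target; keep the latest candidate
    }
    return best
}

func main() {
    now := time.Now()
    snaps := []*snapshotMetaData{
        {epoch: 5, timeStamp: now},
        {epoch: 4, timeStamp: now.Add(-4 * time.Minute)},
        {epoch: 3, timeStamp: now.Add(-9 * time.Minute)},
        {epoch: 2, timeStamp: now.Add(-12 * time.Minute)},
    }
    // Want a rollback point roughly 10 minutes old: epoch 3 (9 minutes old)
    // is the closest snapshot that is still newer than the target.
    target := now.Add(-10 * time.Minute)
    if s := closestNewerSnapshot(snaps, target); s != nil {
        fmt.Println("protect epoch", s.epoch)
    }
}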
index/scorch/persister.go
Outdated
    start := lastProtectedSnapshot + 1
    end := start + s.numSnapshotsToKeep - protectedSnapshots

    // If we don't have enough snapshots, just take all of them.
Not a good idea, this'd violate the numSnapshotsToKeep contract. Also, I feel the math here has been over-engineered and can be simplified.
Over here, I've favoured having enough (numSnapshotsToKeep) snapshots over the concept of picking specific rollback points that are separated by a rollbackSamplingInterval duration. So, if we don't have "enough" snapshots in the persisted snapshots list to guarantee that we can get numSnapshotsToKeep protected snapshots, then we just protect the next numSnapshotsToKeep - protectedSnapshots snapshots.
However, I'm still looking out for more edge cases that need to be handled, and I'll add comments for the same.
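A compact sketch of that fallback bounds math; fallbackRange and its parameters are illustrative names, not the PR's code:

package main

import "fmt"

// fallbackRange computes the index range of snapshots to protect once the
// interval-separated points run out: the next numSnapshotsToKeep-protectedSnapshots
// entries after lastProtectedSnapshot, clamped to the number of persisted snapshots.
func fallbackRange(lastProtectedSnapshot, protectedSnapshots, numSnapshotsToKeep, total int) (int, int) {
    start := lastProtectedSnapshot + 1
    end := start + numSnapshotsToKeep - protectedSnapshots
    if end > total {
        end = total // not enough snapshots left: take whatever remains
    }
    return start, end
}

func main() {
    // 1 snapshot already protected, 3 to keep overall, only 3 persisted:
    // the range clamps to the end of the list.
    fmt.Println(fallbackRange(0, 1, 3, 3)) // 1 3
}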
Force-pushed from 7cbe686 to 91cdf84
enough "time series" type protected snapshots - protecting contiguous latest snapshots (reduces the amount of index build in case of a partial rollback point hit). - handling the sparse mutation scenario. basically, when the index receives heavy mutations for a time period and then after a low traffic time, receives small amount of mutations.
Force-pushed from 54c3509 to fe3434a
    rootBolt                *bolt.DB
    asyncTasks              sync.WaitGroup
    numSnapshotsToKeep      int
    rollbackRetentionFactor float64
Let's chat on how you're using this.
I've updated the description with more explanation about the latest patch.
rollbackRetentionFactor refers to how much of the rollback checkpoints to conserve when we run rootBoltSnapshotsMetaData after a long period of no mutations. After such a long period of no mutations, since we delete the snapshots that are older than numSnapshotsToKeep * rollbackSamplingInterval from the boltDB, we could stand to lose a lot of the rollback checkpoints, and hence it's necessary to retain a certain portion of them so that we maintain the effectiveness of these checkpoints (at least to a certain degree).
Hmm, so this is the situation when there hasn't been a lot of mutations for a long period - in this situation I feel we will simply not hit a rollback situation at all in realistic situations. That considered, do we really need to preserve older snapshots in the event of no recent snapshots - whose epochs fall within numSnapshotsToKeep * rollbackSamplingInterval?
    asyncTasks              sync.WaitGroup
    numSnapshotsToKeep      int
    rollbackRetentionFactor float64
    checkPoints             []*snapshotMetaData
Do the references to the older snapshot epochs need to be held here?
Yeah I've updated the reasoning behind this in the description.
index/scorch/persister.go
Outdated
// the very latest snapshot (ie the current one),
// the snapshot that was persisted 10 minutes before the current one,
// the snapshot that was persisted 20 minutes before the current one
var RollbackSamplingInterval = 2 * time.Minute
Thought we'd default it to 0? So today's behavior will remain unchanged with this change. The user will optionally be able to configure this. For couchbase we'll set the config within cbft.
Yep, my bad, I've included this in the recent patch
// Controls what portion of the earlier rollback points to retain during
// an infrequent/sparse mutation scenario
var RollbackRetentionFactor = float64(0.5)
Not sure I understand the purpose of this.
My bad, I've updated the description which clarifies this part
Force-pushed from ca33fdc to 8ac3743
@Thejas-bhat One way to handle sparse/infrequent mutations is to use a hybrid approach that combines snapshot-based and log-based techniques. Snapshots are taken at regular intervals, and incremental logs are stored between snapshots. During a rollback, the system uses the latest snapshot and replays the mutations from the logs to restore the system to the desired state. This approach reduces data storage and provides recovery from failures/rollbacks. The retention policy can be dynamically adjusted based on mutation frequency and traffic patterns, rather than using a fixed policy.
@iamrajiv, thanks for pointing it out. The thing is that what you described as "replaying the snapshots from the logs" is something that a datastore type of component would possibly do. The replaying part from the logs basically means indexing back the corresponding set of documents by calling the indexing APIs. However, the way in which the "replay" happens could vary and be made tunable as per the application, such as the number of batches you're using in this replay, equal sized vs variable sized batches etc., before passing on to bleve to index and recover back to the desired state (which need not be the same as the saved snapshots).
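A rough sketch of that application-side replay idea, assuming a hypothetical mutation log format; it only uses bleve's public batch API, and the batchSize knob stands in for the tunables mentioned above:

package main

import (
    "log"

    "github.com/blevesearch/bleve/v2"
)

// mutation is a stand-in for an application-side log entry; the log itself and
// its format live outside bleve, as noted in the comment above.
type mutation struct {
    ID  string
    Doc map[string]interface{}
}

// replayMutations re-indexes the logged mutations through bleve's normal batch
// API after rolling back to a protected snapshot. Equal-sized batches are just
// one possible policy; the application could use any batching scheme here.
func replayMutations(idx bleve.Index, muts []mutation, batchSize int) error {
    batch := idx.NewBatch()
    for _, m := range muts {
        if err := batch.Index(m.ID, m.Doc); err != nil {
            return err
        }
        if batch.Size() >= batchSize {
            if err := idx.Batch(batch); err != nil {
                return err
            }
            batch.Reset()
        }
    }
    if batch.Size() > 0 {
        return idx.Batch(batch)
    }
    return nil
}

func main() {
    idx, err := bleve.NewMemOnly(bleve.NewIndexMapping())
    if err != nil {
        log.Fatal(err)
    }
    defer idx.Close()

    muts := []mutation{
        {ID: "doc1", Doc: map[string]interface{}{"body": "hello"}},
        {ID: "doc2", Doc: map[string]interface{}{"body": "world"}},
    }
    if err := replayMutations(idx, muts, 100); err != nil {
        log.Fatal(err)
    }
}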
@Thejas-bhat a couple more questions around the retention scheme you've used. Perhaps we can chat on this tomorrow.
index/scorch/persister.go
Outdated
@@ -113,6 +114,7 @@ OUTER:
    select {
    case <-s.closeCh:
        break OUTER

Drop this (unrelated) new line.
Ah my bad
index/scorch/persister.go
Outdated
        // (comparison in terms of minutes), which is the interval of our time
        // series. In this case, add the epoch rv
        if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) >
            int(interval.Minutes()) {
Why cast both the Minutes() above into int - we can simply leave them in the float64 format as it is just a greater than comparison.
Done
index/scorch/persister.go
Outdated
            ptr = i + 1
            numSnapshotsProtected++
        } else if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) ==
            int(interval.Minutes()) {
Ditto.
Done
index/scorch/persister.go
Outdated
        // by retention factor), so that we don't start protected
        // contiguous snapshots (in which case we would not be protected
        // snapshots that are far apart for the rollback to be effective
        // enough)
Couldn't follow this line, could use some re-wording here perhaps?
... so that we don't start protected
// contiguous snapshots (in which case we would not be protected
// snapshots that are far apart for the rollback to be effective
// enough)
Done
    rootBolt                *bolt.DB
    asyncTasks              sync.WaitGroup
    numSnapshotsToKeep      int
    rollbackRetentionFactor float64
Hmm, so this is the situation when there hasn't been a lot of mutations for a long period - in this situation I feel we will simply not hit a rollback situation at all in realistic situations. That considered, do we really need to preserve older snapshots in the event of no recent snapshots - whose epochs fall within numSnapshotsToKeep * rollbackSamplingInterval?
Force-pushed from 8ac3743 to 91086ab
I think that the probability of hitting a rollback state that's older than numSnapshotsToKeep * rollbackSamplingInterval is quite relative and application dependent. So, given this relative/application-dependent situation, I think it's better to retain some of those older snapshots, because if we have negligible indexing traffic for longer than numSnapshotsToKeep * rollbackSamplingInterval, we could stand to lose a lot of the rollback checkpoints.
By older snapshots - if you mean those with epochs older than numSnapshotsToKeep * rollbackSamplingInterval.
The main concept involved here is to build a time series out of the persistedSnapshots that's there in boltdb, with an interval = rollbackSamplingInterval, i.e. the datapoints (the protected snapshots) in the series are separated by a rollbackSamplingInterval amount of duration from each other. This ensures that we hit those partial rollback paths MUCH better.
Let's say when removeOldBoltSnapshots is invoked for the first time, the list of persister epochs is
[t5 t4 t3 t2 t1 t0]
Step 1: delete the snapshots that are older than numSnapshotsToKeep * rollbackSamplingInterval with respect to the current timestamp. This ensures that the data space over which we are finding these rollback points is a bounded one, and that too numSnapshotsToKeep * rollbackSamplingInterval = 3 * 10 = the last 30 mins worth of persistedSnapshots. Since this is the first time it's invoked, the list would stay the same:
[t5 t4 t3 t2 t1 t0]
Step 2: take the last element in the list as the first datapoint in our time series. Each iteration of the loop would then try to find the next datapoint for our time series by calculating the time difference between the previous datapoint and the current element's timestamp, which should be equal to rollbackSamplingInterval (the interval of our time series). If we find such an element, we add it to our list of protected snapshots.
Let's say even after parsing the list we haven't protected enough snapshots (< numSnapshotsToKeep). In this case, we just protect contiguous elements in our list, starting from the index corresponding to the last datapoint of our time series.
So, the very first time, and let's assume that this is a heavy mutating scenario, all the timestamps from t0 to t5 could be very close to each other, and we most likely hit the else case, where we protect the contiguous elements. So the list of protected snapshots would look something like this:
[t2 t1 t0]
Now after some time, we reach a point where the list looks something like
[t14 t13 t12 t11 t10 t9 t8 t7 t2 t1 t0]
where t0-t2 are the timestamps from the previous iteration, and let's say that timestamps t7 and t14 are the next two datapoints that are needed in our time series. In this case, as per step 2, we are to protect t7 and t14. So the list of protected snapshots looks like
[t14 t7 t0]
Now, in the case when time.Now() is more than numSnapshotsToKeep * interval ahead of the oldest snapshots, in step 1 we delete those older snapshots and the list ends up getting converted from
[t20 t19 t18 t17 t16 t15 t14 t7 t0]
to:
[t20 t19 t18 t17 t16 t15 t14 t7]
And we start from step 2, in which case we won't find the t21 which we want to protect, so the list of protected snapshots looks like
[t15 t14 t7]
And the system goes on and keeps trying to do the above steps.
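A condensed Go sketch of the scheme described above (a simplification, not the PR's exact code; the snapshotMetaData type and newest-first ordering follow the examples in this description, everything else is illustrative):

package main

import (
    "fmt"
    "time"
)

// snapshotMetaData is the per-snapshot bookkeeping discussed in this PR:
// the snapshot epoch and the time at which it was persisted.
type snapshotMetaData struct {
    epoch     uint64
    timeStamp time.Time
}

// protectedEpochs picks datapoints separated by at least interval, starting
// from the oldest snapshot (step 2); if that yields fewer than
// numSnapshotsToKeep points, it falls back to protecting contiguous snapshots
// next to the last datapoint. snapshots is assumed sorted newest-first.
func protectedEpochs(snapshots []*snapshotMetaData, interval time.Duration,
    numSnapshotsToKeep int) map[uint64]struct{} {

    protected := make(map[uint64]struct{})
    if len(snapshots) == 0 {
        return protected
    }

    // Walk from the oldest snapshot toward the newest, keeping entries that
    // are at least `interval` newer than the last kept one.
    last := len(snapshots) - 1
    protected[snapshots[last].epoch] = struct{}{}
    lastKept := last
    for i := last - 1; i >= 0 && len(protected) < numSnapshotsToKeep; i-- {
        if snapshots[i].timeStamp.Sub(snapshots[lastKept].timeStamp) >= interval {
            protected[snapshots[i].epoch] = struct{}{}
            lastKept = i
        }
    }

    // Fallback: not enough interval-separated points, so protect contiguous
    // snapshots adjacent to the last datapoint until we hit the quota.
    for i := lastKept - 1; i >= 0 && len(protected) < numSnapshotsToKeep; i-- {
        protected[snapshots[i].epoch] = struct{}{}
    }
    return protected
}

func main() {
    now := time.Now()
    // Six snapshots persisted over ~25 minutes, newest first.
    var snaps []*snapshotMetaData
    for i := 5; i >= 0; i-- {
        snaps = append(snaps, &snapshotMetaData{
            epoch:     uint64(i),
            timeStamp: now.Add(-time.Duration(5-i) * 5 * time.Minute),
        })
    }
    // With a 10 minute interval and 3 snapshots to keep, epochs 0, 2 and 4
    // (roughly 10 minutes apart) end up protected.
    fmt.Println(protectedEpochs(snaps, 10*time.Minute, 3))
}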
In the case of a sparse/infrequent mutation scenario, i.e. mutations coming in after a long stretch of little or no traffic, with the above approach we would not store time-separated timestamps. We could potentially start storing contiguous snapshots just like before, and if these sparse scenarios occur a lot in an index's lifetime, the stored snapshots would essentially be contiguous ones for most of the time and the effectiveness of the rollback points would be very low.
In order to handle this scenario, the concept is to keep track of the rollback points from the previous iteration of protecting snapshots, and after the long duration of zero traffic, decide how many of these rollback points to preserve.
Basically, when we fetch the snapshots from boltdb, while deleting the very first snapshot that is older than numSnapshotsToKeep * rollbackSamplingInterval (which is the expirationDuration), we check where exactly this snapshot falls in the sorted list of rollback points.
If it lies at an index (relative to retentionFactor * len(s.checkPoints)) such that it would essentially lead to deleting "too many" rollback points, we modify the expirationDuration (by getting a boundaryCheckPoint) such that in the next iteration of rootBoltSnapshotMetaData we preserve snapshots until we actually retain a "retentionFactor" portion of the rollback points in the boltdb.
Now, the remaining snapshots that are older than the retained ones are deleted from boltdb. The rest of the steps remain the same, and when we get the new set of rollback checkpoints, we update them in s.checkPoints.
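A rough sketch of that retention adjustment (illustrative only; adjustedExpiration, the checkpoint type, and the newest-first ordering are assumptions, not the PR's code):

package main

import (
    "fmt"
    "time"
)

// checkpoint stands in for the snapshotMetaData entries kept in s.checkPoints.
type checkpoint struct {
    epoch     uint64
    timeStamp time.Time
}

// adjustedExpiration illustrates the boundaryCheckPoint idea: if the normal
// expiration cut-off would keep fewer than retentionFactor*len(checkPoints)
// rollback points, push the cut-off back to the timestamp of the oldest
// checkpoint we still want to retain. checkPoints is assumed newest-first.
func adjustedExpiration(checkPoints []checkpoint, expiration time.Time,
    retentionFactor float64) time.Time {

    if len(checkPoints) == 0 {
        return expiration
    }
    // Count how many checkpoints the current expiration cut-off would keep.
    kept := 0
    for _, cp := range checkPoints {
        if cp.timeStamp.After(expiration) {
            kept++
        }
    }
    minKeep := int(retentionFactor * float64(len(checkPoints)))
    if kept >= minKeep || minKeep == 0 {
        return expiration // enough rollback points survive; no adjustment needed
    }
    // Use the oldest checkpoint we still want to retain as the boundary, and
    // move the cut-off just past it so that checkpoint is kept.
    boundary := checkPoints[minKeep-1]
    return boundary.timeStamp.Add(-time.Nanosecond)
}

func main() {
    now := time.Now()
    cps := []checkpoint{
        {epoch: 9, timeStamp: now.Add(-40 * time.Minute)},
        {epoch: 7, timeStamp: now.Add(-50 * time.Minute)},
        {epoch: 4, timeStamp: now.Add(-60 * time.Minute)},
        {epoch: 2, timeStamp: now.Add(-70 * time.Minute)},
    }
    // After a long idle period a 30 minute window would drop every checkpoint;
    // with retentionFactor 0.5 the cut-off moves back so epochs 9 and 7 survive.
    exp := adjustedExpiration(cps, now.Add(-30*time.Minute), 0.5)
    fmt.Println("epoch 7 survives:", cps[1].timeStamp.After(exp))
}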