
Improving rollback effectiveness #1776

Merged
merged 11 commits into blevesearch:master on Mar 2, 2023

Conversation

Member

@Thejas-bhat Thejas-bhat commented Dec 28, 2022

  • This PR improves rollback effectiveness by protecting rollback points that are spread across a configurable time interval.
  • Update, Jan 25: the approach has changed completely, although tagging each index snapshot with a timestamp remains.
  • Approach:

The main idea is to build a time series out of the persisted snapshots in boltdb, with interval = rollbackSamplingInterval, i.e. the datapoints (the protected snapshots) in the series are separated from each other by rollbackSamplingInterval. This makes it far more likely that a rollback can hit a useful partial rollback point.

Suppose removeOldBoltSnapshots is invoked for the first time and the list of persister epochs is
[t5 t4 t3 t2 t1 t0]

  1. First, remove from boltdb all snapshots older than numSnapshotsToKeep * rollbackSamplingInterval relative to
    the current timestamp. This bounds the data space over which we look for rollback points,
    e.g. numSnapshotsToKeep * rollbackSamplingInterval = 3 * 10 min = the last 30 minutes' worth of persisted snapshots.

Since this is the first time it's invoked, the list stays the same.
[t5 t4 t3 t2 t1 t0]

  2. Next, parse the list of persisted snapshots in reverse order (because time increases in that direction),
    starting with the last element of the list as the first datapoint in our time series. Each iteration of the loop
    tries to find the next datapoint by checking whether the time difference between the previous datapoint and the
    current element's timestamp equals rollbackSamplingInterval (the interval of our time series). If we find such an
    element, we add it to the series.

If, even after parsing the whole list, we haven't protected enough snapshots (< numSnapshotsToKeep), we fall back to
protecting contiguous elements of the list, starting from the index of the last datapoint in our time series.

On the very first invocation, assuming a heavy-mutation scenario, the timestamps t0 through t5 could be very close to
each other, so we would most likely hit the fallback case and protect contiguous elements. The list would then look like:
[t2 t1 t0]

After some time, the list might look like:
[t14 t13 t12 t11 t10 t9 t8 t7 t2 t1 t0]

where t0-t2 are the timestamps from the previous iteration, and suppose t7 and t14 are the next two datapoints needed
in our time series. Per step 2, we protect t7 and t14, so the list becomes:
[t14 t7 t0]

Later, once snapshots age past numSnapshotsToKeep * interval relative to time.Now(), step 1 deletes them, converting
the list from
[t20 t19 t18 t17 t16 t15 t14 t7 t0]
to:
[t20 t19 t18 t17 t16 t15 t14 t7]

Then step 2 runs again; since t21, the next datapoint we'd want to protect, doesn't exist yet, the list becomes:
[t15 t14 t7]

The system keeps repeating these steps over time. A sketch of the selection loop described above follows.
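
A minimal Go sketch of this selection loop (hypothetical and simplified; the actual PR code differs in details such as list orientation and bounds handling, and step 1's pruning is assumed to have already run):

    package sketch

    import "time"

    type snapshotMetaData struct {
        epoch     uint64
        timeStamp time.Time
    }

    // protectEpochs returns the epochs to protect: one snapshot per
    // samplingInterval, with a fallback to contiguous snapshots when there
    // aren't enough time-separated ones. Assumes snaps is sorted oldest ->
    // newest.
    func protectEpochs(snaps []*snapshotMetaData, samplingInterval time.Duration,
        numToKeep int) map[uint64]struct{} {
        protected := make(map[uint64]struct{})
        if len(snaps) == 0 || numToKeep <= 0 {
            return protected
        }

        // Step 2: the oldest snapshot is the first datapoint of the series.
        ptr := 0
        protected[snaps[ptr].epoch] = struct{}{}
        for i := 1; i < len(snaps) && len(protected) < numToKeep; i++ {
            // Next datapoint: the first snapshot at least samplingInterval
            // newer than the previous datapoint.
            if snaps[i].timeStamp.Sub(snaps[ptr].timeStamp) >= samplingInterval {
                protected[snaps[i].epoch] = struct{}{}
                ptr = i
            }
        }

        // Fallback: not enough time-separated datapoints were found, so
        // protect contiguous snapshots following the last datapoint.
        for i := ptr + 1; i < len(snaps) && len(protected) < numToKeep; i++ {
            protected[snaps[i].epoch] = struct{}{}
        }
        return protected
    }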

In a sparse/infrequent-mutation scenario, i.e. mutations arriving after a long quiet period, the above approach would not store time-separated timestamps. We could end up storing contiguous snapshots just as before, and if such sparse periods are common over an index's lifetime, the stored snapshots would be contiguous most of the time, making the rollback points far less effective.

To handle this scenario, we keep track of the rollback points from the previous protection iteration, and after a long stretch of zero traffic we decide how many of those rollback points to preserve.

Basically, when we fetch the snapshots from boltdb and are about to delete the very first snapshot older than numSnapshotsToKeep * rollbackSamplingInterval (the expirationDuration), we check where exactly this snapshot falls in the sorted list of rollback points.
If it lies at an index (around retentionFactor * len(s.checkPoints)) where deleting it would drop "too many" rollback points, we adjust the expirationDuration (via a boundaryCheckPoint) so that the next iteration of rootBoltSnapshotMetaData preserves snapshots until a "retentionFactor" portion of the rollback points is actually retained in boltdb.

The remaining snapshots older than the retained ones are then deleted from boltdb. The rest of the steps remain the same, and when we get the new set of rollback checkpoints, we update s.checkPoints.
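
A hedged sketch of this expiration adjustment (illustrative names such as adjustedExpiration; reuses snapshotMetaData from the earlier sketch and assumes the checkpoint list is sorted oldest to newest):

    // adjustedExpiration widens the expiration duration when expiring
    // snapshots at the current cutoff would drop "too many" of the previous
    // iteration's rollback checkpoints.
    func adjustedExpiration(checkPoints []*snapshotMetaData,
        expiration time.Duration, retentionFactor float64,
        now time.Time) time.Duration {
        cutoff := now.Add(-expiration)

        // Count previous checkpoints that would expire at this cutoff.
        expiring := 0
        for _, cp := range checkPoints {
            if cp.timeStamp.After(cutoff) {
                break
            }
            expiring++
        }

        minRetain := int(retentionFactor * float64(len(checkPoints)))
        if minRetain == 0 || len(checkPoints)-expiring >= minRetain {
            return expiration // enough checkpoints survive as-is
        }

        // Boundary checkpoint: the oldest one that must survive so that a
        // retentionFactor portion of the checkpoints is retained.
        boundary := checkPoints[len(checkPoints)-minRetain]
        return now.Sub(boundary.timeStamp)
    }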

…protected epochs
- fixing the bounds calculation logic for fetching the epochs.
- including and using the rollbackSamplingInterval as a scorch option
persistedSnapshots []*snapshotMetaData) map[uint64]struct{} {

// make a map of epochs to protect from deletion
protectedEpochs := make(map[uint64]struct{}, s.numSnapshotsToKeep)
Member

I don't think providing a length parameter while "make"ing over a map does anything does it? Just this would suffice ..

protectedEpochs := make(map[uint64]struct{})

Member Author

Done.

protectedEpochs[persistedSnapshots[0].epoch] = struct{}{}
nextSnapshotToProtect :=
persistedSnapshots[0].timeStamp.Add(s.rollbackSamplingInterval)
protectedSnapshots := 1
Member

No need for this, you can do len(protectedEpochs) to determine the number of protectedSnapshots.

Member Author

protectedSnapshots here tracks how many snapshots have been protected so far.

timeStamp time.Time
}

func (s *Scorch) rootBoltSnapshotEpochTimeStamps() ([]*snapshotMetaData, error) {
Member

I'd suggest a better name, perhaps rootBoltSnapshotMetaData(), owing to its return values.

Member Author

Done.

}
rv = append(rv, &snapshotMetaData{
epoch: snapshotEpoch,
timeStamp: timeStamp})
Member

nit:

                    rv = append(rv, &snapshotMetaData{
				epoch:     snapshotEpoch,
				timeStamp: timeStamp,
                    })

Member Author

Done.

// index is bound to be the closest possible snapshot to the
// required timestamp of nextSnapshotToProtect
if persistedSnapshots[i].timeStamp.After(nextSnapshotToProtect) {
protectedEpochs[persistedSnapshots[i-1].epoch] = struct{}{}
Member

Hmm this doesn't feel right, shouldn't you be adding to the map ..

protectedEpochs[persistedSnapshots[i].epoch] = struct{}{}

Member Author @Thejas-bhat (Jan 13, 2023)

nextSnapshotToProtect indicates the timestamp of the next snapshot we want to protect. There may be no snapshot in our list of persisted snapshots with exactly that timestamp, in which case we protect the snapshot closest to the required timestamp but newer than it. So we look for the first timestamp (at index i) that's older than nextSnapshotToProtect; the snapshot at i-1 then has a greater timestamp than nextSnapshotToProtect (so it's newer) and is bound to be the closest one.
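
As a generic illustration of that intent (hypothetical helper, not the PR's code; assumes an oldest-to-newest ordering and reuses snapshotMetaData from the sketch above):

    // closestAtOrAfter returns the index of the snapshot closest to target
    // without being older than it, or -1 if every snapshot is older.
    func closestAtOrAfter(snaps []*snapshotMetaData, target time.Time) int {
        for i, s := range snaps {
            if !s.timeStamp.Before(target) {
                return i
            }
        }
        return -1
    }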

start := lastProtectedSnapshot + 1
end := start + s.numSnapshotsToKeep - protectedSnapshots

// If we don't have enough snapshots, just take all of them.
Member

Not a good idea, this'd violate the numSnapshotsToKeep contract. Also I feel the math here has been over engineered, and can be simplified.

Member Author

Over here, I've favoured having enough (numSnapshotsToKeep) snapshots over the concept of picking specific rollback points separated by rollbackSamplingInterval. So, if we don't have "enough" snapshots in the persisted-snapshots list to guarantee numSnapshotsToKeep protected snapshots, we just protect the next numSnapshotsToKeep - protectedSnapshots snapshots.

However, I'm still looking out for more edge cases that need to be handled, and I'll add comments for the same.
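
Under that reasoning, the fallback bounds might be clamped roughly like this (a sketch using the names from the diff above):

    start := lastProtectedSnapshot + 1
    end := start + s.numSnapshotsToKeep - protectedSnapshots
    if end > len(persistedSnapshots) {
        end = len(persistedSnapshots) // not enough snapshots: take the rest
    }
    for i := start; i < end; i++ {
        protectedEpochs[persistedSnapshots[i].epoch] = struct{}{}
    }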

@abhinavdangeti abhinavdangeti added this to the v2.3.7 milestone Jan 16, 2023
  enough "time series" type protected snapshots - protecting contiguous
  latest snapshots (reduces the amount of index build in case of a partial
  rollback point hit).
- handling the sparse mutation scenario. basically, when the index receives
  heavy mutations for a time period and then after a low traffic time,
  receives small amount of mutations.
@abhinavdangeti abhinavdangeti marked this pull request as ready for review February 6, 2023 20:42
rootBolt *bolt.DB
asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
Member

Let's chat on how you're using this.

Member Author

I've updated the description with more explanation about the latest patch.

rollbackRetentionFactor controls how many of the rollback checkpoints to conserve when rootBoltSnapshotMetaData runs after a long period of no mutations. After such a period, deleting the snapshots older than numSnapshotsToKeep * rollbackSamplingInterval from boltdb could wipe out a lot of the rollback checkpoints, so it's necessary to retain a certain portion of them to maintain the effectiveness of these checkpoints (at least to a degree).
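
For example (illustrative numbers): with rollbackRetentionFactor = 0.5 and 6 checkpoints carried over in s.checkPoints, the expirationDuration would be widened so that at least 3 of those checkpoints survive the cleanup, rather than letting the time-based cutoff delete nearly all of them.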


asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
checkPoints []*snapshotMetaData
Member

Do the references to the older snapshot epochs need to be held here?

Member Author

Yeah I've updated the reasoning behind this in the description.

// the very latest snapshot(ie the current one),
// the snapshot that was persisted 10 minutes before the current one,
// the snapshot that was persisted 20 minutes before the current one
var RollbackSamplingInterval = 2 * time.Minute
Member

Thought we'd default it to 0? So today's behavior will remain unchanged with this change. The user will optionally be able to configure this. For couchbase we'll set the config within cbft.

Member Author

Yep, my bad, I've included this in the recent patch


// Controls what portion of the earlier rollback points to retain during
// an infrequent/sparse mutation scenario
var RollbackRetentionFactor = float64(0.5)
Member

Not sure I understand the purpose of this.

Member Author

My bad, I've updated the description which clarifies this part

Contributor @iamrajiv left a comment

@Thejas-bhat One way to handle sparse/infrequent mutations is to use a hybrid approach that combines snapshot-based and log-based techniques. Snapshots are taken at regular intervals, and incremental logs are stored between snapshots. During a rollback, the system uses the latest snapshot and replays the mutations from the logs to restore the system to the desired state. This approach reduces data storage and provides recovery from failures/rollbacks. The retention policy can be dynamically adjusted based on mutation frequency and traffic patterns, rather than using a fixed policy.

Member Author @Thejas-bhat

@iamrajiv, thanks for pointing it out. The thing is, what you described as "replaying the mutations from the logs" is something a datastore-type component would typically do.

Replaying from the logs basically means indexing the corresponding set of documents again by calling the indexing APIs. However, the way the "replay" happens can vary and be made tunable per application, e.g. the number of batches used in the replay, equal-sized vs. variable-sized batches, etc., before handing off to bleve to index and recover to the desired state (which need not be the same as a saved snapshot).
This scenario is something a datastore that can stream documents to be indexed would be apt for.

Member @abhinavdangeti left a comment

@Thejas-bhat a couple more questions around the retention scheme you've used. Perhaps we can chat on this tomorrow.

@@ -113,6 +114,7 @@ OUTER:
select {
case <-s.closeCh:
break OUTER

Member

Drop this (unrelated) new line.

Member Author

Ah my bad

// (comparison in terms of minutes), which is the interval of our time
// series. In this case, add the epoch to rv
if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) >
int(interval.Minutes()) {
Member

Why cast both the Minutes() above into int - we can simply leave them in the float64 format as it is just a greater than comparison.
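
That is, the comparison could simply read (sketch of the suggestion):

    if snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes() >
        interval.Minutes() {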

Member Author

Done

ptr = i + 1
numSnapshotsProtected++
} else if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) ==
int(interval.Minutes()) {
Member

Ditto.

Member Author

Done

// by retention factor), so that we don't start protected
// contiguous snapshots (in which case we would not be protected
// snapshots that are far apart for the rollback to be effective
// enough)
Member

Couldn't follow this line, could use some re-wording here perhaps?

... so that we don't start protected
// contiguous snapshots (in which case we would not be protected
// snapshots that are far apart for the rollback to be effective
// enough)

Member Author

Done

rootBolt *bolt.DB
asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
Member

Hmm, so this is the situation when there hasn't been a lot of mutations for a long period - in this situation I feel we will simply not hit a rollback situation at all in realistic situations. So that considered do we really need to preserve older snapshots in the event of no recent snapshots - whose epochs fall within numSnapshotsToKeep * rollbackSamplingInterval.

Member Author @Thejas-bhat

I think the probability of hitting a rollback state older than numSnapshotsToKeep * rollbackSamplingInterval is low when the sampling interval is large. Say the interval is 10 mins and numSnapshotsToKeep is 3: the chance of needing a rollback state older than 30 mins is very low in realistic situations. But if the sampling interval is around 2 mins, the chance of hitting a state older than 6 mins need not be that low. So it mainly depends on the sampling interval and the number of snapshots to protect, which are meant to be configurable by the application using bleve.

So, given this relative, application-dependent situation, I think it's better to retain some of those older snapshots: if we have negligible indexing traffic for longer than numSnapshotsToKeep * rollbackSamplingInterval, and the application sees this kind of traffic often over its index's lifetime, then we would end up storing contiguous snapshots (the old way) more often than with the proposed method (which is supposed to be better). And since a rollback can happen at any point in time, storing contiguous snapshots most of the time would likely give poor rollback behaviour.

Member @abhinavdangeti

So, given this relative, application-dependent situation, I think it's better to retain some of those older snapshots: if we have negligible indexing traffic for longer than numSnapshotsToKeep * rollbackSamplingInterval, and the application sees this kind of traffic often over its index's lifetime, then we would end up storing contiguous snapshots (the old way) more often than with the proposed method (which is supposed to be better). And since a rollback can happen at any point in time, storing contiguous snapshots most of the time would likely give poor rollback behaviour.

By older snapshots, if you mean those with epochs older than numSnapshotsToKeep * rollbackSamplingInterval, then I just worry this could cause unpredictable behavior. My earlier argument (at least for Couchbase) is: if there hasn't been a lot of traffic, the chance that we'll end up in a rollback situation is quite low. So it should simply be up to the application to decide what to set numSnapshotsToKeep and rollbackSamplingInterval to per their usage pattern, and our job is only to retain snapshots whose epochs fall within those constraints.

@Thejas-bhat Thejas-bhat merged commit 51444f0 into blevesearch:master Mar 2, 2023