
Improving rollback effectiveness #1776

Merged
merged 11 commits into blevesearch:master on Mar 2, 2023

Conversation

Member

@Thejas-bhat Thejas-bhat commented Dec 28, 2022

  • This PR improves rollback effectiveness by protecting rollback points that are spread across a configurable time interval.
  • Update, Jan 25: the approach has changed completely, although tagging each index snapshot with a timestamp remains.
  • Approach:

The main idea is to build a time series out of the persisted snapshots in boltdb, with interval = rollbackSamplingInterval, i.e. the datapoints (the protected snapshots) in the series are separated from each other by rollbackSamplingInterval. This makes it far more likely that a rollback can hit a useful partial rollback point.

Suppose removeOldBoltSnapshots is invoked for the first time and the list of persister epochs is
[t5 t4 t3 t2 t1 t0]

  1. First, remove from boltdb all snapshots older than numSnapshotsToKeep * rollbackSamplingInterval relative to
    the current timestamp. This bounds the data space over which we look for rollback points,
    e.g. numSnapshotsToKeep * rollbackSamplingInterval = 3 * 10 min = the last 30 minutes' worth of persisted snapshots.

Since this is the first time it's invoked, the list stays the same.
[t5 t4 t3 t2 t1 t0]

  2. Next, parse the list of persisted snapshots in reverse order (because time increases in that direction),
    starting with the last element of the list as the first datapoint in our time series. Each iteration of the loop
    tries to find the next datapoint by checking whether the time difference between the previous datapoint and the
    current element's timestamp equals rollbackSamplingInterval (the interval of our time series). If we find such an
    element, we add it to the series.

If, even after parsing the whole list, we haven't protected enough snapshots (< numSnapshotsToKeep), we fall back to
protecting contiguous elements of the list, starting from the index of the last datapoint in our time series.

On the very first invocation, assuming a heavy-mutation scenario, the timestamps t0 through t5 could be very close to
each other, so we would most likely hit the fallback case and protect contiguous elements. The list would then look like:
[t2 t1 t0]

After some time, the list might look like:
[t14 t13 t12 t11 t10 t9 t8 t7 t2 t1 t0]

where t0-t2 are the timestamps from the previous iteration, and suppose t7 and t14 are the next two datapoints needed
in our time series. Per step 2, we protect t7 and t14, so the list becomes:
[t14 t7 t0]

Later, once snapshots age past numSnapshotsToKeep * interval relative to time.Now(), step 1 deletes them, converting
the list from
[t20 t19 t18 t17 t16 t15 t14 t7 t0]
to:
[t20 t19 t18 t17 t16 t15 t14 t7]

Then step 2 runs again; since t21, the next datapoint we'd want to protect, doesn't exist yet, the list becomes:
[t15 t14 t7]

The system keeps repeating these steps over time. A sketch of the selection loop described above follows.
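
A minimal Go sketch of this selection loop (hypothetical and simplified; the actual PR code differs in details such as list orientation and bounds handling, and step 1's pruning is assumed to have already run):

    package sketch

    import "time"

    type snapshotMetaData struct {
        epoch     uint64
        timeStamp time.Time
    }

    // protectEpochs returns the epochs to protect: one snapshot per
    // samplingInterval, with a fallback to contiguous snapshots when there
    // aren't enough time-separated ones. Assumes snaps is sorted oldest ->
    // newest.
    func protectEpochs(snaps []*snapshotMetaData, samplingInterval time.Duration,
        numToKeep int) map[uint64]struct{} {
        protected := make(map[uint64]struct{})
        if len(snaps) == 0 || numToKeep <= 0 {
            return protected
        }

        // Step 2: the oldest snapshot is the first datapoint of the series.
        ptr := 0
        protected[snaps[ptr].epoch] = struct{}{}
        for i := 1; i < len(snaps) && len(protected) < numToKeep; i++ {
            // Next datapoint: the first snapshot at least samplingInterval
            // newer than the previous datapoint.
            if snaps[i].timeStamp.Sub(snaps[ptr].timeStamp) >= samplingInterval {
                protected[snaps[i].epoch] = struct{}{}
                ptr = i
            }
        }

        // Fallback: not enough time-separated datapoints were found, so
        // protect contiguous snapshots following the last datapoint.
        for i := ptr + 1; i < len(snaps) && len(protected) < numToKeep; i++ {
            protected[snaps[i].epoch] = struct{}{}
        }
        return protected
    }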

In a sparse/infrequent-mutation scenario, i.e. mutations arriving after a long quiet period, the above approach would not store time-separated timestamps. We could end up storing contiguous snapshots just as before, and if such sparse periods are common over an index's lifetime, the stored snapshots would be contiguous most of the time, making the rollback points far less effective.

To handle this scenario, we keep track of the rollback points from the previous protection iteration, and after a long stretch of zero traffic we decide how many of those rollback points to preserve.

Basically, when we fetch the snapshots from boltdb and are about to delete the very first snapshot older than numSnapshotsToKeep * rollbackSamplingInterval (the expirationDuration), we check where exactly this snapshot falls in the sorted list of rollback points.
If it lies at an index (around retentionFactor * len(s.checkPoints)) where deleting it would drop "too many" rollback points, we adjust the expirationDuration (via a boundaryCheckPoint) so that the next iteration of rootBoltSnapshotMetaData preserves snapshots until a "retentionFactor" portion of the rollback points is actually retained in boltdb.

The remaining snapshots older than the retained ones are then deleted from boltdb. The rest of the steps remain the same, and when we get the new set of rollback checkpoints, we update s.checkPoints.
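
A hedged sketch of this expiration adjustment (illustrative names such as adjustedExpiration; reuses snapshotMetaData from the earlier sketch and assumes the checkpoint list is sorted oldest to newest):

    // adjustedExpiration widens the expiration duration when expiring
    // snapshots at the current cutoff would drop "too many" of the previous
    // iteration's rollback checkpoints.
    func adjustedExpiration(checkPoints []*snapshotMetaData,
        expiration time.Duration, retentionFactor float64,
        now time.Time) time.Duration {
        cutoff := now.Add(-expiration)

        // Count previous checkpoints that would expire at this cutoff.
        expiring := 0
        for _, cp := range checkPoints {
            if cp.timeStamp.After(cutoff) {
                break
            }
            expiring++
        }

        minRetain := int(retentionFactor * float64(len(checkPoints)))
        if minRetain == 0 || len(checkPoints)-expiring >= minRetain {
            return expiration // enough checkpoints survive as-is
        }

        // Boundary checkpoint: the oldest one that must survive so that a
        // retentionFactor portion of the checkpoints is retained.
        boundary := checkPoints[len(checkPoints)-minRetain]
        return now.Sub(boundary.timeStamp)
    }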

…protected epochs
- fixing the bounds calculation logic for fetching the epochs.
- including and using the rollbackSamplingInterval as a scorch option
persistedSnapshots []*snapshotMetaData) map[uint64]struct{} {

// make a map of epochs to protect from deletion
protectedEpochs := make(map[uint64]struct{}, s.numSnapshotsToKeep)
Member

I don't think providing a length parameter while "make"ing over a map does anything does it? Just this would suffice ..

protectedEpochs := make(map[uint64]struct{})

Member Author

Done.

protectedEpochs[persistedSnapshots[0].epoch] = struct{}{}
nextSnapshotToProtect :=
persistedSnapshots[0].timeStamp.Add(s.rollbackSamplingInterval)
protectedSnapshots := 1
Member

No need for this, you can do len(protectedEpochs) to determine the number of protectedSnapshots.

Member Author

protectedSnapshots here tracks how many snapshots have been protected so far.

timeStamp time.Time
}

func (s *Scorch) rootBoltSnapshotEpochTimeStamps() ([]*snapshotMetaData, error) {
Member

I'd suggest a better name, perhaps rootBoltSnapshotMetaData(), owing to its return values.

Member Author

Done.

}
rv = append(rv, &snapshotMetaData{
epoch: snapshotEpoch,
timeStamp: timeStamp})
Member

nit:

                    rv = append(rv, &snapshotMetaData{
				epoch:     snapshotEpoch,
				timeStamp: timeStamp,
                    })

Member Author

Done.

// index is bound to be the closest possible snapshot to the
// required timestamp of nextSnapshotToProtect
if persistedSnapshots[i].timeStamp.After(nextSnapshotToProtect) {
protectedEpochs[persistedSnapshots[i-1].epoch] = struct{}{}
Member

Hmm this doesn't feel right, shouldn't you be adding to the map ..

protectedEpochs[persistedSnapshots[i].epoch] = struct{}{}

Member Author @Thejas-bhat (Jan 13, 2023)

nextSnapshotToProtect indicates the timestamp of the next snapshot we want to protect. There may be no snapshot in our list of persisted snapshots with exactly that timestamp, in which case we protect the snapshot closest to the required timestamp but newer than it. So we look for the first timestamp (at index i) that's older than nextSnapshotToProtect; the snapshot at i-1 then has a greater timestamp than nextSnapshotToProtect (so it's newer) and is bound to be the closest one.
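
As a generic illustration of that intent (hypothetical helper, not the PR's code; assumes an oldest-to-newest ordering and reuses snapshotMetaData from the sketch above):

    // closestAtOrAfter returns the index of the snapshot closest to target
    // without being older than it, or -1 if every snapshot is older.
    func closestAtOrAfter(snaps []*snapshotMetaData, target time.Time) int {
        for i, s := range snaps {
            if !s.timeStamp.Before(target) {
                return i
            }
        }
        return -1
    }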

start := lastProtectedSnapshot + 1
end := start + s.numSnapshotsToKeep - protectedSnapshots

// If we don't have enough snapshots, just take all of them.
Member

Not a good idea, this'd violate the numSnapshotsToKeep contract. Also I feel the math here has been over engineered, and can be simplified.

Member Author

Over here, I've favoured having enough (numSnapshotsToKeep) snapshots over the concept of picking specific rollback points separated by rollbackSamplingInterval. So, if we don't have "enough" snapshots in the persisted-snapshots list to guarantee numSnapshotsToKeep protected snapshots, we just protect the next numSnapshotsToKeep - protectedSnapshots snapshots.

However, I'm still looking out for more edge cases that need to be handled, and I'll add comments for the same.
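
Under that reasoning, the fallback bounds might be clamped roughly like this (a sketch using the names from the diff above):

    start := lastProtectedSnapshot + 1
    end := start + s.numSnapshotsToKeep - protectedSnapshots
    if end > len(persistedSnapshots) {
        end = len(persistedSnapshots) // not enough snapshots: take the rest
    }
    for i := start; i < end; i++ {
        protectedEpochs[persistedSnapshots[i].epoch] = struct{}{}
    }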

@abhinavdangeti abhinavdangeti added this to the v2.3.7 milestone Jan 16, 2023
  enough "time series" type protected snapshots - protecting contiguous
  latest snapshots (reduces the amount of index build in case of a partial
  rollback point hit).
- handling the sparse mutation scenario. basically, when the index receives
  heavy mutations for a time period and then after a low traffic time,
  receives small amount of mutations.
@abhinavdangeti abhinavdangeti marked this pull request as ready for review February 6, 2023 20:42
rootBolt *bolt.DB
asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
Member

Let's chat on how you're using this.

Member Author

I've updated the description with more explanation about the latest patch.

rollbackRetentionFactor controls how many of the rollback checkpoints to conserve when rootBoltSnapshotMetaData runs after a long period of no mutations. After such a period, deleting the snapshots older than numSnapshotsToKeep * rollbackSamplingInterval from boltdb could wipe out a lot of the rollback checkpoints, so it's necessary to retain a certain portion of them to maintain the effectiveness of these checkpoints (at least to a degree).
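
For example (illustrative numbers): with rollbackRetentionFactor = 0.5 and 6 checkpoints carried over in s.checkPoints, the expirationDuration would be widened so that at least 3 of those checkpoints survive the cleanup, rather than letting the time-based cutoff delete nearly all of them.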


asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
checkPoints []*snapshotMetaData
Member

Do the references to the older snapshot epochs need to be held here?

Member Author

Yeah I've updated the reasoning behind this in the description.

// the very latest snapshot(ie the current one),
// the snapshot that was persisted 10 minutes before the current one,
// the snapshot that was persisted 20 minutes before the current one
var RollbackSamplingInterval = 2 * time.Minute
Member

Thought we'd default it to 0? So today's behavior will remain unchanged with this change. The user will optionally be able to configure this. For couchbase we'll set the config within cbft.

Member Author

Yep, my bad, I've included this in the recent patch


// Controls what portion of the earlier rollback points to retain during
// an infrequent/sparse mutation scenario
var RollbackRetentionFactor = float64(0.5)
Member

Not sure I understand the purpose of this.

Member Author

My bad, I've updated the description which clarifies this part

Contributor @iamrajiv left a comment

@Thejas-bhat One way to handle sparse/infrequent mutations is to use a hybrid approach that combines snapshot-based and log-based techniques. Snapshots are taken at regular intervals, and incremental logs are stored between snapshots. During a rollback, the system uses the latest snapshot and replays the mutations from the logs to restore the system to the desired state. This approach reduces data storage and provides recovery from failures/rollbacks. The retention policy can be dynamically adjusted based on mutation frequency and traffic patterns, rather than using a fixed policy.

Member Author @Thejas-bhat

@iamrajiv, thanks for pointing it out. The thing is, what you described as "replaying the mutations from the logs" is something a datastore-type component would typically do.

Replaying from the logs basically means indexing the corresponding set of documents again by calling the indexing APIs. However, the way the "replay" happens can vary and be made tunable per application, e.g. the number of batches used in the replay, equal-sized vs. variable-sized batches, etc., before handing off to bleve to index and recover to the desired state (which need not be the same as a saved snapshot).
This scenario is something a datastore that can stream documents to be indexed would be apt for.

Member @abhinavdangeti left a comment

@Thejas-bhat a couple more questions around the retention scheme you've used. Perhaps we can chat on this tomorrow.

@@ -113,6 +114,7 @@ OUTER:
select {
case <-s.closeCh:
break OUTER

Member

Drop this (unrelated) new line.

Member Author

Ah my bad

// (comparison in terms of minutes), which is the interval of our time
// series. In this case, add the epoch to rv
if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) >
int(interval.Minutes()) {
Member

Why cast both the Minutes() above into int - we can simply leave them in the float64 format as it is just a greater than comparison.
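
That is, the comparison could simply read (sketch of the suggestion):

    if snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes() >
        interval.Minutes() {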

Member Author

Done

ptr = i + 1
numSnapshotsProtected++
} else if int(snapshots[i].timeStamp.Sub(snapshots[ptr].timeStamp).Minutes()) ==
int(interval.Minutes()) {
Member

Ditto.

Member Author

Done

// by retention factor), so that we don't start protected
// contiguous snapshots (in which case we would not be protected
// snapshots that are far apart for the rollback to be effective
// enough)
Member

Couldn't follow this line, could use some re-wording here perhaps?

... so that we don't start protected
// contiguous snapshots (in which case we would not be protected
// snapshots that are far apart for the rollback to be effective
// enough)

Member Author

Done

rootBolt *bolt.DB
asyncTasks sync.WaitGroup
numSnapshotsToKeep int
rollbackRetentionFactor float64
Member

Hmm, so this is the situation when there hasn't been a lot of mutations for a long period - in this situation I feel we will simply not hit a rollback situation at all in realistic situations. So that considered do we really need to preserve older snapshots in the event of no recent snapshots - whose epochs fall within numSnapshotsToKeep * rollbackSamplingInterval.

Member Author @Thejas-bhat

I think the probability of hitting a rollback state older than numSnapshotsToKeep * rollbackSamplingInterval is low when the sampling interval is large. Say the interval is 10 mins and numSnapshotsToKeep is 3: the chance of needing a rollback state older than 30 mins is very low in realistic situations. But if the sampling interval is around 2 mins, the chance of hitting a state older than 6 mins need not be that low. So it mainly depends on the sampling interval and the number of snapshots to protect, which are meant to be configurable by the application using bleve.

So, given this relative, application-dependent situation, I think it's better to retain some of those older snapshots: if we have negligible indexing traffic for longer than numSnapshotsToKeep * rollbackSamplingInterval, and the application sees this kind of traffic often over its index's lifetime, then we would end up storing contiguous snapshots (the old way) more often than with the proposed method (which is supposed to be better). And since a rollback can happen at any point in time, storing contiguous snapshots most of the time would likely give poor rollback behaviour.

Member @abhinavdangeti

So, given this relative, application-dependent situation, I think it's better to retain some of those older snapshots: if we have negligible indexing traffic for longer than numSnapshotsToKeep * rollbackSamplingInterval, and the application sees this kind of traffic often over its index's lifetime, then we would end up storing contiguous snapshots (the old way) more often than with the proposed method (which is supposed to be better). And since a rollback can happen at any point in time, storing contiguous snapshots most of the time would likely give poor rollback behaviour.

By older snapshots, if you mean those with epochs older than numSnapshotsToKeep * rollbackSamplingInterval, then I just worry this could cause unpredictable behavior. My earlier argument (at least for Couchbase) is: if there hasn't been a lot of traffic, the chance that we'll end up in a rollback situation is quite low. So it should simply be up to the application to decide what to set numSnapshotsToKeep and rollbackSamplingInterval to per their usage pattern, and our job is only to retain snapshots whose epochs fall within those constraints.

@Thejas-bhat Thejas-bhat merged commit 51444f0 into blevesearch:master Mar 2, 2023