Splitstore Enhancements #6474

Merged
merged 197 commits into from
Jul 13, 2021

vyzo (Contributor) commented Jun 14, 2021

This PR enhances the logic in splitstore:

  • Dead experimental code related to full compaction with gc is removed.
  • We change the compaction range model to be right to left, so that multiple compactions that would happen after a lengthy sync are coalesced to one.
  • We add logic to implement a noop coldstore so that we can run with fixed hardware requirements (modulo chain headers).
  • We introduce our own state walking code to replace WalkSnapshot so as to improve performance, be scalable regardless of height, and also include all reachable state objects to account for potential misses.
  • We walk the entire range from current ts to compaction boundary during compaction, so as to correctly react to sync gaps; previously we only walked at the boundary, which could result in premature purging in the case of sync gaps.
  • We kill the tracking store. The base utility of the tracking store was to protect new writes during compaction, by tracking the writeEpoch. However, this breaks down (very) badly in reality, as vm flushing (see vm.Copy) intelligently tries to avoid duplicate writes and performs occurrence checks with Has.
  • We fix a race between purge and objects being considered live by higher layers that would lead to a fatal error with a noop coldstore and result in live state objects being prematurely moved to cold storage otherwise. The issue is that between marking and purge an object might be recreated and checked for existence in the blockstore by the vm with Has. If the object was not marked as reachable and the access happened before purge, then it would result in a miss. We fix this by making the compaction transactional, in that during compaction accesses to objects are recorded so as to not purge live objects.
  • We treat Has as an implicit (recursive) write to account for vm behaviour on Copy.
  • We keep all headers in the hotstore as it is not currently safe to discard them.
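
The transactional compaction and the "Has as an implicit write" rule described above can be illustrated with a minimal sketch. This is not the splitstore implementation: `TxnStore`, `Key`, `BeginCompaction` and `Purge` are hypothetical names, the store is a plain in-memory map, and the recursive protection of linked objects is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Key stands in for a CID. Every name in this sketch is hypothetical.
type Key string

// TxnStore is a toy hot store that records accesses while a compaction runs.
type TxnStore struct {
	mu        sync.Mutex
	blocks    map[Key][]byte
	txnActive bool
	txnRefs   map[Key]struct{} // keys touched during the active compaction
}

func NewTxnStore() *TxnStore {
	return &TxnStore{blocks: make(map[Key][]byte)}
}

// track records k as live for the current compaction. Caller must hold s.mu.
func (s *TxnStore) track(k Key) {
	if s.txnActive {
		s.txnRefs[k] = struct{}{}
	}
}

func (s *TxnStore) Put(k Key, v []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocks[k] = v
	s.track(k)
}

// Has is treated as an implicit write: a hit during compaction protects k,
// because the caller may skip the write it would otherwise have issued.
func (s *TxnStore) Has(k Key) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.blocks[k]
	if ok {
		s.track(k)
	}
	return ok
}

// BeginCompaction starts recording accesses.
func (s *TxnStore) BeginCompaction() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.txnActive = true
	s.txnRefs = make(map[Key]struct{})
}

// Purge removes every block that is neither in the mark set nor touched
// during the transaction, then ends the transaction.
func (s *TxnStore) Purge(marked map[Key]struct{}) (purged int) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k := range s.blocks {
		_, live := marked[k]
		_, touched := s.txnRefs[k]
		if !live && !touched {
			delete(s.blocks, k)
			purged++
		}
	}
	s.txnActive = false
	s.txnRefs = nil
	return purged
}

func main() {
	s := NewTxnStore()
	s.Put("a", []byte("1"))
	s.Put("b", []byte("2"))
	s.Put("c", []byte("3"))

	s.BeginCompaction()
	marked := map[Key]struct{}{"a": {}} // the walk only reached "a"
	s.Has("b")                          // the vm checks "b" between marking and purge
	fmt.Println("purged:", s.Purge(marked))
	fmt.Println("b kept:", s.Has("b"), "c kept:", s.Has("c"))
}
```

In this toy run, "b" survives the purge only because the Has call landed inside the transaction; without the recorded access it would have been deleted along with "c".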

Follow up:

  • We need a way to mark network initiated blockstore accesses, so that these objects are not considered live for compaction purposes and do not blow the cache. This will need a blockstore interface change to accept options.
  • We need a way for the vm to explicitly tell us whether to treat a Has request as an implicit write. We currently do it for every invocation (which is safe for now, given our usage patterns), but this is brittle; this will similarly require a blockstore interface change to accept options.
  • We need to identify how many headers we need to keep in the case of fixed hardware requirements (e.g. boosters), so as not to slowly grow the hotstore with chain headers.
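
One possible shape for the "accept options" change mentioned in the first two follow-up items is a functional-options parameter. The sketch below is purely hypothetical: `AccessOption`, `FromNetwork`, `ImplicitWrite` and `shouldProtect` are invented for illustration and are not part of the lotus blockstore interface.

```go
package main

import "fmt"

// accessOpts collects per-call hints. All names here are hypothetical.
type accessOpts struct {
	network       bool // the access was initiated by a network peer
	implicitWrite bool // the caller will skip a write if the object exists
}

// AccessOption mutates the per-call hints.
type AccessOption func(*accessOpts)

// FromNetwork marks an access as network initiated.
func FromNetwork() AccessOption {
	return func(o *accessOpts) { o.network = true }
}

// ImplicitWrite marks a Has call that the vm uses in place of a write.
func ImplicitWrite() AccessOption {
	return func(o *accessOpts) { o.implicitWrite = true }
}

// shouldProtect decides whether a Has hit should mark the object as live
// for the running compaction: only explicit implicit writes count, and
// network initiated accesses never do.
func shouldProtect(opts ...AccessOption) bool {
	var o accessOpts
	for _, opt := range opts {
		opt(&o)
	}
	if o.network {
		return false
	}
	return o.implicitWrite
}

func main() {
	fmt.Println(shouldProtect(ImplicitWrite()))
	fmt.Println(shouldProtect(FromNetwork()))
	fmt.Println(shouldProtect())
}
```

With hints like these, the store would no longer need to treat every Has invocation as a write, which is the brittleness the second item describes.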

@vyzo vyzo requested a review from raulk June 14, 2021 17:54
This was referenced Jun 14, 2021
@jacobheun jacobheun added the team/ignite Issues and PRs being tracked by Team Ignite at Protocol Labs label Jun 15, 2021
@jacobheun jacobheun added this to the v1.11.x milestone Jun 17, 2021
@vyzo vyzo force-pushed the feat/splitstore-redux branch from 923eaeb to 98c6530 Compare June 28, 2021 12:22
vyzo (Contributor, Author) commented Jun 28, 2021

rebased on master.

vyzo and others added 18 commits July 4, 2021 18:38
this is necessary to avoid wearing clown shoes when the node stays
offline for an extended period of time (more than 1 finality).

Basically it gets quite slow if we do the full 2 finality walk, so we
try to avoid it unless necessary.
A full walk is necessary when there is a sync gap (most likely because
the node was offline), during which the tracking of writes is inaccurate
because we have not yet delivered the HeadChange notification.  In this
case, blocks that are actually hot may be tracked before the boundary,
and we would fail to mark them accordingly.  So when we detect a sync
gap, we do the full walk; if there is no sync gap, we can just use the
much faster boundary epoch walk.
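
The choice between the two walks in this commit message can be sketched as follows. The helper names, the gap test based on the head timestamp, and the numbers in `main` are assumptions made for illustration; the commit message does not say how the gap is detected.

```go
package main

import "fmt"

// Epoch is a chain height. All names in this sketch are hypothetical.
type Epoch int64

// hasSyncGap reports whether the head is older than the threshold, which
// this sketch takes as a sign that the node was offline. Arguments are
// unix seconds; the detection rule itself is an assumption.
func hasSyncGap(nowUnix, headUnix, thresholdSecs int64) bool {
	return nowUnix-headUnix > thresholdSecs
}

// walkRange returns the epochs to walk, from newest to oldest.
// With a sync gap the whole range from the current epoch down to the
// boundary is walked; otherwise only the boundary epoch is walked.
func walkRange(cur, boundary Epoch, syncGap bool) (from, to Epoch) {
	if syncGap {
		return cur, boundary
	}
	return boundary, boundary
}

func main() {
	cur, boundary := Epoch(5000), Epoch(3200)

	gap := hasSyncGap(1_000_000, 990_000, 60) // head is far behind the clock
	from, to := walkRange(cur, boundary, gap)
	fmt.Printf("gap=%v walk %d..%d\n", gap, from, to)

	gap = hasSyncGap(1_000_000, 999_990, 60) // head is recent
	from, to = walkRange(cur, boundary, gap)
	fmt.Printf("gap=%v walk %d..%d\n", gap, from, to)
}
```

The trade-off is the one the message states: the full walk is slower but cannot miss blocks written while notifications were not being delivered.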
for maximal safety.
@vyzo vyzo mentioned this pull request Jul 10, 2021
vyzo (Contributor, Author) commented Jul 10, 2021

Follow up issue for testing: #6725

vyzo and others added 3 commits July 13, 2021 03:11
so that we preclude the following scenario:
    Start compaction.
    Start view.
    Finish compaction.
    Start compaction.

which would not wait for the view to complete.
Because stebalien has allergies.
Stebalien (Member) left a comment

Please address my final final review in #6718 before merging. But I'm going to give this a 👍 anyways because the change shouldn't be too difficult.

vyzo added 2 commits July 13, 2021 09:01
    
We can call Add after Wait has been called, which is problematic with WaitGroups.
This instead uses a mx/cond combo and waits while the count is > 0.
The only downside is that we might needlessly wait for a bunch of views
that started while the txn is active, but we can live with that.
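
The mx/cond combination described in this commit message can be sketched with the standard library alone. `viewBarrier` and its methods are hypothetical names, not the identifiers used in splitstore.go. The motivation is that `sync.WaitGroup` requires Add calls that start from a zero counter to happen before Wait, a guarantee that cannot be given when views start at arbitrary times.

```go
package main

import (
	"fmt"
	"sync"
)

// viewBarrier counts active views and lets a compaction wait until
// none remain. All names in this sketch are hypothetical.
type viewBarrier struct {
	mu    sync.Mutex
	cond  *sync.Cond
	count int
}

func newViewBarrier() *viewBarrier {
	b := &viewBarrier{}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// viewStart registers a view. It is safe to call at any time, including
// while another goroutine is blocked in viewWait.
func (b *viewBarrier) viewStart() {
	b.mu.Lock()
	b.count++
	b.mu.Unlock()
}

// viewDone unregisters a view and wakes waiters when the count hits zero.
func (b *viewBarrier) viewDone() {
	b.mu.Lock()
	b.count--
	if b.count == 0 {
		b.cond.Broadcast()
	}
	b.mu.Unlock()
}

// viewWait blocks while any view is active. Views that start during the
// wait are also waited for, which is the downside noted above.
func (b *viewBarrier) viewWait() {
	b.mu.Lock()
	for b.count > 0 {
		b.cond.Wait()
	}
	b.mu.Unlock()
}

func main() {
	b := newViewBarrier()
	b.viewStart()

	done := make(chan struct{})
	go func() {
		b.viewWait() // compaction side: wait for in-flight views
		close(done)
	}()

	b.viewDone() // view side: finished reading
	<-done
	fmt.Println("all views finished")
}
```

Because the waiter rechecks the count in a loop under the mutex, it does not matter whether `viewDone` runs before or after the waiter has started waiting.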
vyzo (Contributor, Author) commented Jul 13, 2021

Addressed the final final review issue in 257423e, and fine-tuned in af39952.

@vyzo vyzo requested a review from magik6k July 13, 2021 06:14
vyzo (Contributor, Author) commented Jul 13, 2021

Summoning @magik6k -- this is ready for you!

magik6k (Contributor) left a comment

Can't spot anything to be nitpicky about, I guess this means we can see if it floats.

@magik6k magik6k merged commit c37401a into master Jul 13, 2021
@magik6k magik6k deleted the feat/splitstore-redux branch July 13, 2021 10:43
blockstore/splitstore/splitstore.go
quiet = true
log.Warnf("error checking markset: %s", err)
}
continue
Member

This is still inconsistent with trackTxnRef.

Contributor Author

in what way?
Are you concerned about an error not causing the transaction to abort?
Currently the only way the markset errors is if it has been closed, ie the transaction has been aborted.

Member

In trackTxnRef, we track the ref even when we error. Here, we skip when we error.

Contributor Author

ah, yes you are right; let me fix it.

Contributor Author

fixed in 7785467

}
s.txnViewsWaiting = false
Member

So, we're waiting for all the views to end, but then we're not doing anything atomically. That can't be right.

Should we be waiting for all the non-tracked views to end? I.e., should we have a return at https://github.com/filecoin-project/lotus/pull/6474/files#diff-eac9e730a0594047de6e81aa421dcd33aac194e505a18945ca45e72db789687cR642?

Contributor Author

wait, what? we have already started the transaction.
All the views are tracked now as they always increment the rlock.
The barrier just ensures that there is no view that started before the transaction that hasn't ended.

Member

Ok. This should be fine.

Ideally, we'd track views started after the transaction starts separately, but that's not strictly necessary.

Contributor Author

Yeah.

To summarize our sync conversation: a long running view is a bug and i'd very much rather hang the compaction (which is something we'll see in the logs) than do a potentially catastrophic delete.

6 participants