Warp Sync: Alternative snapshot formats #8565
Comments
@ngotchac and I had a conversation about more predictable state-chunk starting points and came to this proposal: we get rid of the duplicate-code optimization, where contract code shared between accounts is only stored once (this may not affect size significantly, for a few reasons). We follow the same chunk-splitting algorithm as before, but additionally enforce that groups of chunks only cover fixed ranges of the address space. For example, if we want sets of state chunks each covering 1/256th of the accounts, each group would start with the first account whose address hash falls within its 1/256th range.
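A minimal sketch of what such predictable boundaries could look like if ranges are aligned to one-byte prefixes of the account address hash; the names and the fixed 256-way split are illustrative, not existing code:

```rust
/// Number of fixed address-space ranges (one per possible first byte of
/// the account hash). Illustrative constant, not an existing parameter.
const NUM_RANGES: usize = 256;

/// Range an account belongs to: with 256 ranges this is just the first
/// byte of its address hash, so group boundaries are predictable without
/// scanning any earlier chunks.
fn range_index(account_hash: &[u8; 32]) -> usize {
    account_hash[0] as usize
}

/// Inclusive lower bound of a range: the smallest account hash it covers.
fn range_start(index: u8) -> [u8; 32] {
    let mut start = [0u8; 32];
    start[0] = index;
    start
}
```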
Then, assuming each node can process K accounts before its pruning history runs out, we can set the range size based on the total number of accounts and our assumed K. Nodes will produce as many "sub-snapshots" of K accounts as they can, randomly selecting ranges, before the state of the snapshot block is pruned. (Notes on the account-counting heuristic are in the replies below; a sketch of the range sizing follows.)
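A rough sketch of how the number of ranges could be derived from the account estimate and K, and how a node might pick ranges at random. `num_ranges`, `shuffled_ranges`, and the power-of-two rounding are assumptions of this sketch, not part of the proposal text:

```rust
use rand::seq::SliceRandom;

/// Pick the number of ranges so each range is expected to hold at most
/// `k` accounts, rounded up to a power of two so ranges stay aligned to
/// address-hash bit prefixes.
fn num_ranges(estimated_accounts: u64, k: u64) -> u64 {
    let needed = (estimated_accounts + k - 1) / k; // ceiling division
    needed.next_power_of_two()
}

/// A node shuffles the range indices and works through them in order,
/// producing one sub-snapshot per range until the snapshot state is pruned.
fn shuffled_ranges(n_ranges: u64, rng: &mut impl rand::Rng) -> Vec<u64> {
    let mut ranges: Vec<u64> = (0..n_ranges).collect();
    ranges.shuffle(rng);
    ranges
}
```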
|
That will only work given a uniform distribution of accounts; it seems it could be easily attacked by creating a bunch of addresses within one particular chunk range. |
Another algorithm for account-counting: do N_SAMPLES random walks down the account trie and calculate the approximate number of accounts as the average, over all walks, of the product of the branching factors encountered along each walk: count ≈ (1 / N_SAMPLES) * Σ_walks Π (branching factor at each branch node on the walk).
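A sketch of that estimator, assuming a simplified trie where each branch node exposes only its non-empty children; the `Node` type is hypothetical (the real account trie is hexary, but the estimator is the same):

```rust
/// Simplified trie node: a leaf is one account; a branch holds only its
/// non-empty children.
enum Node {
    Leaf,
    Branch(Vec<Node>),
}

/// Monte Carlo estimate of the number of leaves: each random root-to-leaf
/// walk contributes the product of the branching factors along its path,
/// and the estimates are averaged over `n_samples` walks.
fn estimate_account_count(root: &Node, n_samples: u32, rng: &mut impl rand::Rng) -> f64 {
    let mut total = 0.0;
    for _ in 0..n_samples {
        let mut estimate = 1.0;
        let mut node = root;
        while let Node::Branch(children) = node {
            estimate *= children.len() as f64;
            node = &children[rng.gen_range(0..children.len())];
        }
        total += estimate;
    }
    total / n_samples as f64
}
```
|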
It's definitely an attack vector to some degree, but it won't cause oversized chunks, just more chunks covering that range (and perhaps the nodes attempting to produce that range won't finish in time). But if the number of accounts per range is an underestimate, it's very likely that some nodes will manage to complete the chunks covering that range regardless, so it's still an improvement over the current system. At the very least, they can still propagate the chunks they did manage to produce, and it will be just the ones at the very end of the range that are harder to find on the network. |
@rphmeier @Tbaut For fast lookup of state_root, how about adding an index table of state_root and block height every 1000 or 10000 blocks? I've modified a simple offset-based export function; it only gives a small time decrease, but if an index were provided for the search, it would be much faster and the work could be distributed. reference commit It also applies to making the snapshot DB.
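A sketch of what such an index could look like; `INTERVAL`, `StateRootIndex`, and the byte-array roots are assumed types for illustration, not existing Parity structures:

```rust
use std::collections::BTreeMap;

/// Record a state root every INTERVAL blocks, so a lookup only has to
/// scan at most INTERVAL blocks instead of walking the whole chain.
const INTERVAL: u64 = 1000;

struct StateRootIndex {
    /// block height -> state root at that height
    by_height: BTreeMap<u64, [u8; 32]>,
}

impl StateRootIndex {
    fn record(&mut self, height: u64, state_root: [u8; 32]) {
        if height % INTERVAL == 0 {
            self.by_height.insert(height, state_root);
        }
    }

    /// Nearest indexed checkpoint at or below `height`; the caller scans
    /// forward from here rather than from genesis.
    fn checkpoint_at_or_below(&self, height: u64) -> Option<(u64, [u8; 32])> {
        self.by_height.range(..=height).next_back().map(|(h, r)| (*h, *r))
    }
}
```
|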
Let's move this to the forum. |
@5chdn Where's the forum for this? |
forum.parity.io. Are you interested in following this topic? |
I might be, but I can't sign up with any of my emails. Is it a closed forum? |
@5chdn There's an error on signup at forum.parity.io. |
Let me discuss this. |
We want to investigate new snapshot formats that are better along the following lines: chunks that can be reused across snapshots, snapshots that more nodes can produce before the relevant state is pruned, and snapshot data that restoring nodes can verify more robustly.
Snapshot chunks are currently divided into two categories: state chunks, which encode the account state at the snapshot block, and "security" chunks, which prove the validity of the snapshot block under the chain's consensus rules.
W.r.t. different consensus engines, the "security" chunks will look completely different but will usually contain reusable data. For example, validator-set-based consensus systems prove security with a series of bootstrapped handoffs (as long as we relax the weak subjectivity model to assume that old validator sets remain uncorrupted). All finalized handoffs can be reused, although usually the proofs are small enough that all the handoffs fit in a single chunk. Depending on state churn and snapshot distance, we may also be able to reuse some state chunks.
One possibility is a keystone/delta model: intermittent "full" state snapshots every N*K blocks, with the snapshots in between (every K blocks) storing only deltas over the last full state.
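A sketch of that schedule under assumed parameters `n` and `k`; the names are illustrative, not existing constants:

```rust
/// What, if anything, to snapshot at a given block height under a
/// keystone/delta schedule: a full snapshot every n*k blocks, a delta
/// over the last keystone every k blocks in between.
enum SnapshotKind {
    Keystone,
    Delta,
    None,
}

fn snapshot_kind(height: u64, k: u64, n: u64) -> SnapshotKind {
    if height % (n * k) == 0 {
        SnapshotKind::Keystone
    } else if height % k == 0 {
        SnapshotKind::Delta
    } else {
        SnapshotKind::None
    }
}
```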
One major problem with the current snapshot system is that it is too heavy for most nodes: they prune the state a snapshot encodes from their database before they can finish producing it. State chunks are currently packed so tightly that it is impossible to determine which account a chunk starts at, or the exact data of an account entry, without having produced all the chunks before it. One possibility is to design a predictable scheme for chunk boundaries that would allow nodes to produce some of the state chunks without producing all of them.
We can augment this scheme with random sampling: nodes that don't produce full snapshots will randomly sample some accounts, produce account entries for them, and keep those entries on disk. They will refuse to propagate any snapshot whose data doesn't match their random sample. Assuming all repropagating nodes keep their own random sample and the network is sufficiently large, this makes it very unlikely for bad snapshot data to make its way through to unsynced nodes.
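A sketch of the sampling check under hypothetical types; the point is only that a repropagating node compares each locally sampled account against the snapshot's claimed entry and refuses to propagate on any mismatch:

```rust
use std::collections::HashMap;

/// Locally held random sample: account address hash -> the account entry
/// (encoded bytes) the node recorded for it.
struct AccountSample {
    entries: HashMap<[u8; 32], Vec<u8>>,
}

impl AccountSample {
    /// Returns false if the snapshot disagrees with any sampled account,
    /// in which case the node should refuse to propagate the snapshot.
    /// `snapshot_entry` looks an account up in the snapshot being checked.
    fn matches_snapshot(&self, snapshot_entry: impl Fn(&[u8; 32]) -> Option<Vec<u8>>) -> bool {
        self.entries
            .iter()
            .all(|(hash, local)| snapshot_entry(hash).as_deref() == Some(local.as_slice()))
    }
}
```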
cc @ngotchac, @Vurich, @ordian