Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sync Performance: Support requesting kernel MMRs separate from the rest of TxHashSet #1968

Open
DavidBurkett opened this issue Nov 13, 2018 · 10 comments
Assignees

Comments

@DavidBurkett
Copy link
Contributor

Creating this issue to track progress on a subset of the proposals in #952. The current plan is to:

  1. Support requesting Kernel MMRs in one request, and outputs + rangeproofs in a separate request. A new (temporary) capability will be added to advertise support for these new requests.
  2. Return raw data for the kernel MMR instead of using BZip2, which is unnecessarily slow, and doesn't provide any real compression due to the randomness of our data.
  3. Return only the leaves for the kernel MMR, which is all that's necessary for the consumer to rebuild it (since kernels are never pruned).

Related:
#1219
#952
#804

@ignopeverell
Copy link
Contributor

It would also be interesting to have the rangeproofs be requested separately for a given set of output positions (possibly through a bloom filter). The idea would be to have a client request a random subset of range proofs of its choosing for validation. This would be useful for light clients that could validate range proofs statistically by getting say only 20% of them. We could even have some light clients not request any range proof at all and trust the miners have done that validation for them.

@DavidBurkett
Copy link
Contributor Author

So you're thinking an additional message for that? We would still be required to have a separate message that gets ALL of the non-pruned leaves for the outputs and rangeproofs, correct?

@DavidBurkett
Copy link
Contributor Author

DavidBurkett commented Nov 14, 2018

I'm sure something similar has already been proposed, but I wanted to make a moderately concrete proposal for how to efficiently request and receive kernels, so I can get everyone's thoughts on it.

PROPOSAL:

Have 2 requests and 2 responses for kernel retrieval. We can worry about the names later, but for now:

GetKernelRoots {
Hash block_hash,
u64 block_height
}

KernelRoots {
Hash block_hash,
u64 block_height,
Vec kernel_mmr_roots
}

Kernel_mmr_roots could hold a merklish root for each, say, 512 TxKernels (leaves) in the kernel_mmr. So, for the kernels from leaf index 0 to leaf index 511, find the merklish root of those, and that would be the first hash in kernel_mmr_roots. Continue to do the same until you reach the final TxKernel/leaf in the kernel_mmr.

GetKernels {
Hash block_hash,
u64 block_height,
u64 start_leaf_index (ie. 0, 512, 1024, etc.)? (or maybe Hash kernel_mmr_root?)
}

Kernels {
Hash kernel_mmr_root
Vec kernels /// The (up to) 512 kernels requested, in order they appear in leafset
}

After first calling GetKernelRoots for 1 peer, you will have all of the merklish roots for each batch of kernels that are requested using the GetKernels message. This allows you to easily parallelize the calls to GetKernels, since you could request kernels 0-511 from Peer A, 512-1023 from Peer B, and so on. Also, you can validate each batch as it comes in (as part of a Kernels message), since you know what the merklish root of it should be.

@antiochp
Copy link
Member

Just reading through this now.

Also, you can validate each batch as it comes in (as part of a Kernels message), since you know what the merklish root of it should be.

You can validate that what you received is what the sender says they sent, but this won't actually validate that the kernels are "correct". To do this you must validate the kernel MMR root against the block headers.

There are a couple of existing things we can leverage here -

  • each header commits to the following -
    • kernel_mmr_size
    • kernel_root

We can potentially download in parallel from different peers, but to validate the kernels we can only do this sequentially, by appending the kernels to the "one true" kernel MMR and verifying the root at various MMR sizes (corresponding to various headers).

Say header 1 committed to the kernel MMR root for a kernel MMR of size 7.
We could retrieve kernels, maybe all of them, maybe some chunk of them.
And then appending them incrementally to the kernel MMR, stop at size 7 and verify the root matches the kernel_root in the header.
Then apply more kernels until we hit the MMR size for the next header and verify the root again.
This way we verify the root at each header height and confirm the MMR is growing exactly as expected.

In fact we currently do this, but in reverse order as part of the txhashset.zip validation. We "rewind" back through the header chain, verifying the kernel MMR root for each header.
But we only do this currently based on the fully built kernel MMR received from a peer (we don't rebuild it locally).

If we're going to rebuild locally from just the leaves then we can do both simultaneously - build it incrementally and verify it as we go.

Make sense?

@DavidBurkett
Copy link
Contributor Author

You can validate that what you received is what the sender says they sent, but this won't actually validate that the kernels are "correct". To do this you must validate the kernel MMR root against the block headers.
I might not have been clear in my original message. You should be able to validate it against the values you received during the first call to GetKernelRoots. The flow would be as follows:

  1. Send GetKernelRoots message to single peer of choice
  2. Validate response in KernelRoots message to ensure that merklish root of all of the 'batch' roots that you received is the same as the root you have stored in the header's kernel_mmr_root field.
  3. Make parallel calls to different peers requesting the kernel 'batches' using the GetKernels message.
  4. As each 'batch' comes in via a Kernels message, verify the merklish root of the 'batch' using the kernels in the batch.
  5. Once all 'batches' are received and verified, build the entire kernel MMR.

As an example, assume there are 1024 kernels. The call to GetKernelRoots will produce a response in a KernelRoots message with something like Vec(kernel_mmr_roots) = ['123456...', 'abcde...']. You can then call GetKernels to request the first 512 kernels from Peer A, and request the next 512 kernels from Peer B. Peer A should respond with Kernels{'kernel_mmr_root'='123456...', kernels=['kern0...', 'kern1...', ..., 'kern511...']}. Peer B should respond with Kernels{'kernel_mmr_root'='abcde...', kernels=['kern0...', 'kern1...', ..., 'kern511...']}. Each of those should be verifiable independently. Does that make sense?

@DavidBurkett
Copy link
Contributor Author

@antiochp Just re-read the rest of your message. Why would we need to 'rewind' and check each header against the kernel_root individually? Shouldn't verifying against the most recent one be enough to know that you have the right kernels, since it's a cryptographically secure hash? Or am I missing something?

@tromp
Copy link
Contributor

tromp commented Nov 14, 2018

the most recent kernel_root can be set to the mmr root of an arbitrary kernel mmr, not necessarily one consistent with historical kernels.

@ignopeverell
Copy link
Contributor

I agree with this:

If we're going to rebuild locally from just the leaves then we can do both simultaneously - build it incrementally and verify it as we go.

From the header chain, receiving those kernels by chunks, we can validate against the root hashes in the header chain as kernels come. We could even have some lookahead for download to introduce some parallelization as we also know exact kernel indices from the blocks headers.

@DavidBurkett
Copy link
Contributor Author

I agree with the suggestions, and am currently going with the following 2 messages for kernel sync:

/// Request to get the kernels(leaves) from the kernel MMR up to the given block.
/// This will return the kernels from the specified leaf index onward, up to MAX_KERNELS at a time.
pub struct GetKernels {
/// Hash of the last block for which the kernels should be provided.
/// Also used to identify the chain/fork the sender is requesting kernels for.
pub last_hash: Hash,
/// Height of the corresponding block.
pub last_height: u64,
/// The (leaf) index of the first kernel being requested.
pub first_kernel_index: u64,
}

/// Response to a Get kernels request containing the requested kernels.
pub struct Kernels {
/// Hash of the block identifying the chain/fork the kernels belong to.
pub last_hash: Hash,
/// Height of the corresponding block.
pub last_height: u64,
/// The (leaf) index of the first kernel returned.
pub first_kernel_index: u64,
/// The requested kernels in the order they appear in the Kernel MMR leafset.
pub kernels: Vec,
}

@Anynomouss
Copy link

@yeastplume This issue can be closed since PIBD separates the kernel MMRs and the TxHashSet and downloads them from multiple peers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants