Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/mempool sync #2884

Merged
merged 53 commits into from
Jan 6, 2022
Merged

Feat/mempool sync #2884

merged 53 commits into from
Jan 6, 2022

Conversation

jcnelson
Copy link
Member

@jcnelson jcnelson commented Oct 16, 2021

This PR implements an anti-entropy protocol for querying transactions from other nodes' mempools, based on https://docs.google.com/document/d/1uHLUZEkzJJA8HtKfKVZmZIn5n7sY0X9j6gg90IyixYM/edit?usp=sharing.

It does the following:

  • Adds a bloom filter and counting bloom filter implementation. Importantly, the counting bloom filter is (a) disk-backed -- specifically, backed by a Sqlite blob -- and (b) can be converted into a bloom filter for issuing queries to other nodes.

  • Adds POST /v2/mempool/query, which takes a MemPoolSyncData structure as input and streams back zero or more recently-arrived transactions from the mempool (as determined by block height).

  • Adds a state machine to the main P2P state machine, which will cause a node to periodically ask a randomly-chosen outbound neighbor for any transactions in its mempool that it does not have locally.

The protocol in this PR is a simple set-reconciliation protocol. A client node will periodically ask a peer node for the list of recent transactions that it does not have in its own mempool. It does this by sending the remote peer a sketch of its mempool, which the remote peer uses to iterate through its set of recent transactions and stream back transactions it has that are not in the sketch.

A MemPoolSyncData structure encodes this sketch. Specifically, it encodes either a list of 8-byte transaction ID prefixes ("tags"), or a bloom filter with a protocol-defined error rate and size. The /v2/mempool/query handler will handle either variant; the client uses the variant that is cheaper for it to generate with an acceptable error rate.

The MemPoolSyncData represents a sketch calculated from node-specific transaction tags. These are calculated as siphash(node_seed, txid), which is an 8-byte value. The node_seed is a node-specific random 32-byte value. The reason for doing this is to ensure that a transaction has a different identifier for each node, so that even if a transaction is "masked" in one node's sketch via a false positive, it will almost certainly not be masked in other nodes' sketches. It also ensures that no one can create a "malicious" transaction whose txid masks another txid in all nodes' mempools. Note that this is an important anti-censorship tactic, because if the remote peer sees a positive match for a transaction in the client peer's MemPoolSyncData, it will not serve that transaction (even if it's a false positive).

Nodes can individually decide how many transactions they'll reply in a mempool query (the maximum is 8192, which is a protocol constant). To make sure that different nodes serve different transactions even though they can serve different numbers of transactions, a node iterates through transactions in the sketch in a deterministic random order by using its own sketch seed to calculate a random permutation of transactions to consider. Specifically, each txid in the mempool has a randomized_txid (i.e. derived as siphash(node_seed, txid)), and transactions are iterated through in randomized_txid order. This way, a heavily-loaded node can get away with serving fewer than the maximum transactions, and downstream client peers are still just as likely to receive a transaction as any other.

Given that everyone's attention is currently on SIP-012, I'm leaving this in draft status for now. I'd like to leave it running in production for a while to verify that a set of nodes reliably keep their mempools in sync.

…ementation, for use in mempool synchronization. Importantly, the disk-backed counting bloom filter lives in a sqlite3 table and maintains its buckets efficiently (i.e. in a blob), and can be converted into a bloom filter.
…h functions in the bloom filter. It used to be in std::hash, and still is, but is deprecated. So, just require it directly.
…last few blocks' worth of transactions. Whenever a transaction is accepted, it's added to the bloom filter. Whenever a transaction arrives for a never-before-seen chain height, it removes the last-recent transactions from the counting bloom filter. In addition, this patch extends the mempool to paginate through the mempool's recent transactions in a deterministic but random order (that is node-specific), as part of the ability to stream transactions out of the mempool in response to a mempool query. Due to the size of a bloom filter query, this code also handles direct requests for transactions as a list of 8-byte prefixes. In both cases, transactions are queried by taking the siphash over the node-specific seed and the txid, so that different nodes will put the same transaction into different buckets (so someone who wanted to induce false positives in one node's bloom filter would be unable to do so for other nodes)
…a list of transaction tags (8-byte prefixes), or a bloom filter with a node-specific initial hash state
…ynchronize the node's mempool, and how long it's allowed to last
…ol-downloaded transactions are propagated to the relayer through the NetworkResult
@jcnelson
Copy link
Member Author

@kantai Thanks for all your comments! I think I've addressed them all now.

…pool state transition loop for advancing the state machine and recording data obtained by state transition functions; refactor state-transition functions to simply return new data instead of trying to store it in the PeerNetwork themselves; put all mempool_sync_reset() calls into the top-level mempool state machine loop; run mempool state machine in parallel to the main state machine so we don't stall the latter.
@jcnelson jcnelson requested a review from kantai December 17, 2021 16:26
Copy link
Contributor

@gregorycoppola gregorycoppola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, empirical data looks good.

Are there any integration tests being checked in for this?


#[test]
#[ignore]
fn test_rpc_mempool_query_bloom() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this testing?

let req_md = http_request.metadata().clone();
println!("{:?}", http_response);
match http_response {
HttpResponseType::MemPoolTxs(_, txs) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this the answer?

@@ -0,0 +1,1016 @@
// Copyright (C) 2013-2020 Blockstack PBC, a public benefit corporation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://app.codecov.io/gh/blockstack/stacks-blockchain/compare/2884/tree/src/util/bloom.rs

Code coverage is pretty good for this file.

BitField::clear isn't tested and some error conditions aren't tested. Might as well add a test for clear while you're in the area, up to you about error conditions.

Copy link
Member

@kantai kantai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, my only remaining comment is (possibly bikeshedding?) about the name of stream_transactions. I do feel pretty strongly that it needs to have more details in its rustdoc at a minimum, and really should be called something other than stream_transactions.

Copy link
Contributor

@pavitthrap pavitthrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just a few minor comments

@@ -3489,6 +3543,63 @@ mod test {
peer_1.sortdb = Some(sortdb1);
peer_2.sortdb = Some(sortdb2);

// stuff some transactions into peer_2's mempool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add a comment that this relates to the test test_rpc_mempool_query_txtags and test_rpc_mempool_query_bloom

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/net/p2p.rs Outdated
@@ -2109,6 +2142,42 @@ impl PeerNetwork {
Ok(done)
}

/// Do a mempool sync. Return any transactions we might receive.
/// Return true if we finish the sync.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update comment here perhaps - function does not return true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

src/net/p2p.rs Outdated
(false, Some(url)) => {
// success! can advance
self.mempool_sync_data_url = Some(url);
self.mempool_state = MempoolSyncState::ResolveURL;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to store the url with the resolve URL state itself? Like MempoolSyncState::ResolveURL(url). That way we can also remove the logic in the ResolveURL state for the case were mempool_sync_data_url is None.
Similar suggestion for the SendQuery and RecvResponse state.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea. Done.

/// Make a mempool sync request.
/// If sufficiently sparse, use a MemPoolSyncData::TxTags variant
/// Otherwise, use a MemPoolSyncData::BloomFilter variant
/// If force_bloom_filter is true, then always make a bloom filter. The reason for doin this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

force_bloom_filter is not used in this function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +1759 to +1764
let mut tags_table = HashSet::new();
if let MemPoolSyncData::TxTags(_, ref tags) = data {
for tag in tags.iter() {
tags_table.insert(tag.clone());
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we precompute this set since we may call this function multiple times in a row?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of tags is small enough in practice that it shouldn't matter.

Comment on lines +1334 to +1343
for txid in txids.iter() {
if !recent_set.contains(&txid) && bf.contains_raw(&txid.0) {
fp_count += 1;
}
if bf.contains_raw(&txid.0) {
present_count += 1;
} else {
absent_count += 1;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this logic not in the branch for TxTags?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code path is verifying that the bloom filter's false positive rate is sufficiently small. This does not apply to TxTags, since the TxTags query variant does not use a bloom filter.

let recent_txids = mempool.get_bloom_txids().unwrap();
if recent_txids.len() < (present_count + absent_count) as usize {
nonrecent_fp_rate = (fp_count as f64)
/ ((present_count + absent_count - (recent_txids.len() as u32)) as f64);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be easier to parse this statement if present_count + absent_count was replaced with txids.len()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is deliberate -- we don't care to test the code in the if-block if recent_txids.len() < (present_count + absent_count) as usize { /* ... */ } if the query is a TxTags variant.

@blockstack-devops
Copy link
Contributor

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@stacks-network stacks-network locked as resolved and limited conversation to collaborators Nov 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants