
blockchain sync: reduce disk writes from 2 to 1 per tx #9135

Open: wants to merge 5 commits into master

Conversation

@jeffro256 (Contributor) commented Jan 24, 2024

Summary

Pros:

  • During sync, instead of performing one write, then one read, then another write for each tx in the chain, we write each tx only once. This increases the lifespan of the disk and speeds up badly buffered / unbuffered I/O. On a newer NVMe drive with a Ryzen 9 3900X, blockchain sync was around 3-4% faster. The difference will be more pronounced on systems bottlenecked by disk speed.
  • This PR remains backwards compatible with receiving NOTIFY_NEW_BLOCK commands, but the code paths between handle_notify_new_block and handle_notify_new_fluffy_block are merged for a smaller code surface and less review time.

Cons:

  • Complicated review

Hopefully this will move monerod towards being slightly more workable for hard drives in the future.

Design

New: cryptonote::ver_non_input_consensus()

I have created a function cryptonote::ver_non_input_consensus() in tx_verification_utils that checks all consensus rules for a group of transactions besides the checks in Blockchain::check_tx_inputs(). For Blockchain::handle_block_to_main_chain, this is the condition that txs must satisfy before their inputs are checked and they are added to blocks. This function is the most important component and MUST be correct, or otherwise chain splits / inflation could occur. To audit the correctness of this function, start at cryptonote::core::handle_incoming_txs() in the old code and step through all of the rules checked until the end of cryptonote::tx_memory_pool::add_tx(). cryptonote::ver_non_input_consensus() should cover all of those rules.
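
For a rough sense of its shape, here is an illustrative sketch only (the signature and the per-tx helper are assumptions, not the literal code in this PR):

  // Illustrative sketch; assumes cryptonote::transaction and
  // tx_verification_context from the existing codebase.
  bool ver_non_input_consensus(const std::vector<transaction>& txs,
                               tx_verification_context& tvc,
                               const std::uint8_t hf_version)
  {
    for (const transaction& tx : txs)
    {
      // Every consensus rule EXCEPT Blockchain::check_tx_inputs():
      // size/weight limits, allowed tx version and output types for
      // hf_version, amount overflow, and the semantic rules formerly run in
      // core::handle_incoming_txs() and tx_memory_pool::add_tx().
      if (!ver_non_input_consensus_single(tx, tvc, hf_version)) // assumed helper
        return false;
    }
    return true;
  }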

Modified: core::handle_incoming_tx[s]()

Before, cryptonote::core::handle_incoming_txs() was responsible for parsing all txs (inside blocks and for the pool), checking their semantics, passing those txs to the mempool, and notifying ZMQ. Now, cryptonote::core::handle_incoming_txs() is deleted and there is only cryptonote::core::handle_incoming_tx(), which is basically just a wrapper around tx_memory_pool::add_tx() that additionally triggers ZMQ events. It is only called for new transaction notifications from the protocol handler (not block downloads).
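
In other words, the post-PR flow is roughly the following (a hedged sketch; the parameter list and the notification hook name are assumptions):

  // Illustrative only: handle_incoming_tx() as a thin wrapper around add_tx().
  bool core::handle_incoming_tx(const blobdata& tx_blob, tx_verification_context& tvc,
                                relay_method tx_relay, bool relayed)
  {
    // add_tx() now runs all non-input consensus checks internally via
    // ver_non_input_consensus(), plus the usual relay checks.
    if (!m_mempool.add_tx(tx_blob, tvc, tx_relay, relayed))
      return false;
    // ZMQ is only notified here, i.e. for txs arriving from the protocol
    // handler, not for txs carried inside downloaded blocks.
    if (tvc.m_added_to_pool)
      notify_txpool_event_for(tx_blob); // assumed notification hook
    return true;
  }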

Modified: tx_memory_pool::add_tx()

All of the consensus checks besides Blockchain::check_tx_inputs() inside of add_tx() were removed and replaced with a call to cryptonote::ver_non_input_consensus(). The relay checks remain the same.

Modified: Blockchain::add_block()

add_block() now takes a structure called a "pool supplement", which is simply a map of TXIDs to their corresponding cryptonote::transaction and transaction blob. When handle_block_to_main_chain attempts to take a block's transactions from the transaction pool and that fails, it falls back on taking txs from the pool supplement. The pool supplement has all the non-input consensus rules checked after the PoW check is done. If the block ends up getting handled in Blockchain::handle_alternative_block, then the pool supplement transactions are added to the tx_memory_pool after their respective alt PoW checks.
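
Conceptually, the supplement and the fallback look something like this (type and helper names are illustrative assumptions):

  // Illustrative sketch of the "pool supplement" idea.
  using pool_supplement = std::unordered_map<crypto::hash,
      std::pair<cryptonote::transaction, cryptonote::blobdata>>;

  // Inside handle_block_to_main_chain, per tx hash in the block:
  bool take_block_tx(const crypto::hash& txid, const pool_supplement& supplement,
                     cryptonote::transaction& tx_out, cryptonote::blobdata& blob_out)
  {
    if (take_tx_from_pool(txid, tx_out, blob_out)) // assumed existing pool lookup
      return true;
    const auto it = supplement.find(txid);         // fallback: supplement shipped
    if (it == supplement.end())                    // alongside the block
      return false;                                // tx genuinely unavailable
    tx_out = it->second.first;
    blob_out = it->second.second;
    return true;
  }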

Modified: t_cryptonote_protocol_handler::handle_notify_new_fluffy_block()

The main difference with this function now is that we construct a pool supplement and pass that to core::handle_incoming_block() instead of calling core::handle_incoming_txs() to add everything to the mempool first.

Modified: t_cryptonote_protocol_handler::try_add_next_blocks()

The changes are very similar to the changes made to handle_notify_new_fluffy_block.

Modified: t_cryptonote_protocol_handler::handle_notify_new_block()

Before, this function had separate handling logic, but now we just convert the NOTIFY_NEW_BLOCK request into a NOTIFY_NEW_FLUFFY_BLOCK request and call handle_notify_new_fluffy_block with it. This saves us from having to make the same changes to both code paths.
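
The shim itself is conceptually tiny (a sketch; the exact request fields are assumptions based on the existing NOTIFY structures):

  // Illustrative: forward the legacy command through the fluffy-block path.
  int handle_notify_new_block(int command, NOTIFY_NEW_BLOCK::request& arg,
                              cryptonote_connection_context& context)
  {
    NOTIFY_NEW_FLUFFY_BLOCK::request fluffy_arg{};
    fluffy_arg.current_blockchain_height = arg.current_blockchain_height;
    fluffy_arg.b = std::move(arg.b); // the full tx blobs simply ride along
    return handle_notify_new_fluffy_block(command, fluffy_arg, context);
  }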

@jeffro256 (Contributor Author) commented Jan 25, 2024

I'm thinking about having core::handle_incoming_txs basically do nothing except pass the tx to tx_memory_pool::add_tx, which then passes the transaction through the verify_pool_supplement tests. This would make the code so much more robust against future discrepancy between changes to the pool rules and the verify_pool_supplement rules, but it would require some more refactoring.

@vtnerd (Contributor) left a comment

I stopped my review because I think I found a breaking change to ZMQ - this no longer broadcasts a transaction first seen in a new block. This case is explicitly mentioned in the docs. It looks like txes are still broadcast while chain sync is occurring, so this breaking change makes things really inconsistent.

I think you'll have to carry around a ZMQ object until handle_main_chain to get this working. This could arguably improve how alternate block txes are handled (by not broadcasting them), but then there is the reorg case where txes are seen for the first time on the reorg.

I'm not certain how hard this is to hack together, and I hope we don't have to revert the docs (and thereby make it hell on downstream projects).

fee = tx.rct_signatures.txnFee;
}

const uint64_t fee = get_tx_fee(tx);
Contributor:

This now throws on error instead of returning false. Worth a try/catch (or is this verified elsewhere before this) ?

Contributor Author:

It should be verified inside ver_non_input_consensus(), but it's worth double checking

Contributor:

I didn't see another check for the fee, just a check for overflow on inputs and outputs, done separately for each.

I'm not certain how an exception escaping this function alters the behavior of the code (you'd probably have a better idea than me at this point).

Contributor Author:

In core::check_tx_semantic, we check input overflow, output overflow, but also that inputs sum > outputs sum for v1 transactions.
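
For reference, the v1 rule being referred to is roughly the following (simplified; the amount-summing helpers are assumptions standing in for the codebase's own overflow-checked versions):

  // Simplified illustration of the v1 in/out balance rule mentioned above.
  uint64_t amount_in = 0, amount_out = 0;
  if (!sum_input_amounts(tx, amount_in))    // assumed helper; fails on overflow
    return false;
  if (!sum_output_amounts(tx, amount_out))  // assumed helper; fails on overflow
    return false;
  if (tx.version == 1 && amount_in <= amount_out)
    return false; // v1 txs must satisfy inputs sum > outputs sum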

@jeffro256 force-pushed the bc_sync_skip_mempool branch 2 times, most recently from 10b0c2b to fe370e9 on January 26, 2024 23:47
@vtnerd (Contributor) left a comment

A little more confident that this will work. But I will probably do a third pass after your responses to these questions.

fullConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
}
LOG_DEBUG_CC(context, "PEER SUPPORTS FLUFFY BLOCKS - RELAYING THIN/COMPACT WHATEVER BLOCK");
fluffyConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
@vtnerd (Contributor) commented Jan 28, 2024:

This is forcing fluffy blocks on someone that explicitly requested no fluffy blocks. But losing chain sync until they disable the flag is basically the same thing with more steps.

@0xFFFC0000 (Collaborator) left a comment

@jeffro256 Putting the benchmark results I sent in DM here, until we find which operations are actually causing the slow-down: results-2500blocks-5iter.txt

@0xFFFC0000 (Collaborator):

New performance results: the performance problem with pop_blocks specifically related to this PR has been fixed in the new push. The only remaining issue is a small performance drop for the sync operation. I am attaching the file in case anyone wants to check it.

results-2500blocks-5iter-v2.txt

@jeffro256 (Contributor Author):

I think I've found a reason why the sync time of this PR looks slower than the sync time of master in that test script: between the call to pop_blocks and flush_txpool, which is several seconds in some cases, the master node can use the popped txs already inside the mempool to skip most of the checks (especially Blockchain::check_tx_inputs) before validating a block, which gives it a significant boost. This state won't happen organically during normal sync, so the test script doesn't quite capture the normal behavior during sync when you don't already have the txs in the mempool.

To fix the script, instead of doing:

  1. pop_blocks
  2. flush_txpool
  3. Wait for sync

You could do:

  1. Start monerod offline
  2. pop_blocks
  3. flush_txpool
  4. stop_daemon
  5. Start monerod online
  6. Wait for sync

This does have the downside of including the start-up time in the sync time, and the choice of peers on the new instance may affect the speed at which it syncs, but you could minimize these effects by increasing the number of popped blocks.

@vtnerd (Contributor) left a comment

This is looking pretty good. Mainly curious about your response to one more ZMQ related thing - I think we'll have to accept it as a new "feature" of the design.

@@ -1196,7 +1198,7 @@ bool Blockchain::switch_to_alternative_blockchain(std::list<block_extended_info>
block_verification_context bvc = {};

// add block to main chain
bool r = handle_block_to_main_chain(bei.bl, bvc, false);
Contributor:

Marker for me to remember to investigate further. The false bool was to prevent notifications of a failed reorg - hopefully the new code retains the same consideration.

Contributor:

Actually I have an open question about this - see #6347 . But it looks like notify was being ignored, and this is just cleanup on that?

Contributor Author:

notify was being ignored in the newest versions of master; this PR reflects that behavior, but I don't know if that was the original intended behavior... it looks like it wasn't.

@@ -390,20 +329,29 @@ namespace cryptonote
++m_cookie;

MINFO("Transaction added to pool: txid " << id << " weight: " << tx_weight << " fee/byte: " << (fee / (double)(tx_weight ? tx_weight : 1)) << ", count: " << m_added_txs_by_id.size());
if (tvc.m_added_to_pool && meta.matches(relay_category::legacy))
m_blockchain.notify_txpool_event({txpool_event{
Contributor:

I think this now notifies during a "return to txpool" call, where it wasn't being notified in that situation previously. The documentation doesn't list anything about this particular case, so we may have to accept this change. It's a rather rare edge case anyway.

Contributor Author:

I can make this conditional on !kept_by_block which would prevent notifications on return to txpool and reorgs.

Contributor Author:

Never mind that comment; this would cause alt block handling to not notify.


const crypto::hash &tx_hash = new_block.tx_hashes[tx_idx];

blobdata tx_blob;
std::vector<blobdata> tx_blobs;
Contributor:

Small nitpick on performance, you can move tx_blobs and missed_txs before the loop, and .clear() right here.
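
That is, roughly:

  // Sketch of the suggested hoisting: declare once, clear per iteration so the
  // vectors keep their allocated capacity across iterations.
  std::vector<cryptonote::blobdata> tx_blobs;
  std::vector<crypto::hash> missed_txs;
  for (size_t tx_idx = 0; tx_idx < new_block.tx_hashes.size(); ++tx_idx)
  {
    tx_blobs.clear();
    missed_txs.clear();
    const crypto::hash &tx_hash = new_block.tx_hashes[tx_idx];
    // ... existing per-tx lookup logic ...
  }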

Contributor Author:

This is a readability thing for me, but I personally don't like making the scope of variables any wider than it needs to be, especially for such an already complex function. If there's evidence that it measurably impacts performance, however, I would definitely be okay with changing it to what you're suggesting.

relevant link

@jeffro256 (Contributor Author):

Okay @vtnerd, the last commits should hopefully handle ZMQ tx notifications better. We only notify when A) an incoming relayed transaction is new and added to the pool, B) a tx from a pool supplement was used to add a block, or C) an alt block contained a new tx that was added to the pool.

@j-berman (Collaborator) left a comment

This is looking solid -- I have mostly nits + 1 comment on the latest zmq change


// Cache the hard fork version on success
if (verified)
ps.nic_verified_hf_version = hf_version;
Collaborator:

The ps is const?

Contributor Author:

nic_verified_hf_version is marked mutable
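
So writing through the const reference is legal. Roughly (the struct shape below is an assumption for illustration, not the exact struct in the PR):

  // Why the const& works: the cache member is mutable by design.
  struct pool_supplement_t
  {
    // ... txid -> (transaction, blob) map, etc. ...
    mutable std::uint8_t nic_verified_hf_version = 0; // 0 = not yet verified
  };

  void cache_verified_hf(const pool_supplement_t& ps, const std::uint8_t hf_version)
  {
    ps.nic_verified_hf_version = hf_version; // allowed on a const object: mutable member
  }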


const std::unordered_set<crypto::hash> blk_tx_hashes(blk.tx_hashes.cbegin(), blk.tx_hashes.cend());

Collaborator:

I think it would be nice to have a check here that blk_entry.txs.size() == blk_tx_hashes.size() && blk_tx_hashes.size() == blk.tx_hashes.size()

This guarantees there aren't duplicates and that all blk_tx_hashes will map 1-to-1 with tx_entries. I can't find if this exact check is done somewhere else (probably is), but figure this would be a good early place for it anyway (either here or immediately after make_pool_supplement_from_block_entry inside try_add_next_blocks).

Contributor Author:

There's a check in make_pool_supplement_from_block_entry that all deserialized transactions belong to that block.

Collaborator:

!blk_tx_hashes.count(tx_hash) in the make_pool_supplement_from_block_entry above this one checks that for all tx_entries, there's at least 1 matching block hash. Strictly going off that check (and ignoring all other code), it appears there could still be duplicates in this section's blk_entry.txs and blk.tx_hashes, and separately blk.tx_hashes could also have more hashes than are present in blk_entry.txs (which is the expected case when the make_pool_supplement_from_block_entry above handles tx_entries from a new fluffy block, not when syncing a block). In combination with the check you mentioned above, making sure all the container sizes are equal after constructing the set in this function immediately makes sure that when syncing a block, there aren't duplicate blk_entry.txs and that blk.tx_hashes captures all blk_entry.txs 1-to-1.

I don't see anything wrong with not doing the size check here, but it's a bit of a pain to verify there aren't issues surrounding this, and it seems an easy thing to check here.

Contributor Author:

Okay, yeah, you were right; I mistakenly thought that making sure each tx is bound to the block would prevent dups. Technically, that also doesn't check for dups. See the latest commit for an update: we now check that for all pool supplements the number of tx entries is less than or equal to the number of hashes. For full blocks, we check that they are equal.

Collaborator:

One edge case: if there's a dup in blk.tx_hashes, the equivalent dup in blk_entry.txs, and an extra hash in blk.tx_hashes, then the function would still return true with the dup included in the pool_supplement

Also checking blk_tx_hashes.size() == blk.tx_hashes.size() should prevent that
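
Putting the two checks together, the guard being discussed is roughly the following (illustrative; is_full_block is an assumed flag distinguishing the sync path from a fluffy block):

  // Illustrative duplicate/mismatch guard while building a pool supplement.
  const std::unordered_set<crypto::hash> blk_tx_hashes(blk.tx_hashes.cbegin(),
                                                       blk.tx_hashes.cend());
  if (blk_tx_hashes.size() != blk.tx_hashes.size())
    return false; // the block itself lists duplicate tx hashes
  if (blk_entry.txs.size() > blk_tx_hashes.size())
    return false; // more tx blobs supplied than the block references
  if (is_full_block && blk_entry.txs.size() != blk_tx_hashes.size())
    return false; // sync path: tx blobs must map 1-to-1 onto block hashes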

fullConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
}
LOG_DEBUG_CC(context, "PEER SUPPORTS FLUFFY BLOCKS - RELAYING THIN/COMPACT WHATEVER BLOCK");
fluffyConnections.push_back({context.m_remote_address.get_zone(), context.m_connection_id});
Collaborator:

--no-fluffy-blocks meant a node wouldn't send fluffy blocks to its peers, not that a node wouldn't be relayed fluffy blocks from its peers (can quickly sanity check with monerod --no-fluffy-blocks --log-level 1 and see that new blocks are still received as fluffy blocks). Someone would have to manually build a v18 monerod that sets the final bit of m_support_flags to 0 in order to ask peers not to relay fluffy blocks to their node.

So to be clear, this PR as is shouldn't prevent current nodes on the network using any v18 release version of monerod from syncing/relaying/processing blocks, even nodes with the --no-fluffy-blocks flag (which still receive fluffy blocks today anyway).

Maybe the log could say "RELAYING FLUFFY BLOCK TO PEER" instead of "PEER SUPPORTS FLUFFY BLOCKS" because it's no longer checking if the peer supports fluffy blocks via the support_flags.

@j-berman (Collaborator) left a comment

I've been running the latest for weeks now, running smooth on my end.

I've also combed through these changes many times now -- thanks for your work on this.

Minor comments in this latest review round; I'm ready to approve after this.

res = daemon2.get_transactions([txid])
assert len(res.txs) == 1
tx_details = res.txs[0]
assert not tx_details.in_pool
@j-berman (Collaborator) commented Mar 26, 2024:

This test fails sporadically on this line on my local machine. I investigated; it looks like an existing bug unrelated to this PR.

If the transfer includes an output in its ring that unlocked in block N, after popping blocks to N-2, that tx is no longer a valid tx because that output isn't unlocked yet (it fails here). You'd expect that once the chain advances in daemon2.generateblocks above, then the tx becomes valid again and should therefore be included in a later block upon advancing, but it looks like this if statement is incorrect:

//if we already failed on this height and id, skip actual ring signature check
if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
return false;

And it should instead be:

//if we already failed on this height and id, skip actual ring signature check
if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height >= m_blockchain.get_current_blockchain_height())
  return false;

The ring sigs can become valid again if we're at a higher height than when the tx originally failed, so it should pass that if statement and continue on to the check_tx_inputs step again if so.

EDIT: slight edit to support a reorg making a tx valid

Contributor Author:

You're right about that check being wrong. However, even your proposed changes aren't conservative enough if you want to handle popped blocks: if the chain purely goes backwards in time (which only normally happens when pop_blocks is called), a transaction output with a custom unlock_time might actually UNLOCK. This is because Blockchain::get_adjusted_time() is not monotonic, so an output that is unlocked now may become locked again in a future block.

Collaborator:

Interesting, so this should be good:

if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height == m_blockchain.get_current_blockchain_height()-1)

@jeffro256 (Contributor Author) commented Apr 10, 2024:

Yes that should be good. If we wanted to get incredibly pedantic, we would also have to check that the hard fork version is greater than or equal to HF_VERSION_DETERMINISTIC_UNLOCK_TIME, since your system's wall-time might also not be monotonic, and consensus validation of a tx with a ring containing an output with a UNIX-interpreted unlock_time isn't necessarily deterministic. But I don't think we should worry about that case.

Collaborator:

Your call

Contributor Author:

We should go with if(txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height) && txd.last_failed_height == m_blockchain.get_current_blockchain_height()-1) IMO, since that wall-time thing won't affect any future or past transactions, it's only a technicality.

@j-berman (Collaborator) left a comment

Approving changes that include this commit: jeffro256@c388e12

Seems to be a GitHub bug that the PR doesn't include that commit.

@selsta (Collaborator) commented Apr 18, 2024

I applied this pull request locally and the 5th commit mentioned in the comments is missing... not sure what's going on.

@vtnerd (Contributor) left a comment

This is really close, but I had a few questions, in tx_pool.cpp in particular.

@@ -5349,6 +5433,12 @@ void Blockchain::set_user_options(uint64_t maxthreads, bool sync_on_blocks, uint
m_max_prepare_blocks_threads = maxthreads;
}

void Blockchain::set_txpool_notify(TxpoolNotifyCallback&& notify)
{
std::lock_guard<decltype(m_txpool_notifier_mutex)> lg(m_txpool_notifier_mutex);
Contributor:

We might want to use boost::lock_guard instead, as we typically use boost for thread related things. I don't think it matters in this case; the suggestion is mostly for aesthetics/consistency.

Contributor Author:

I used std::lock_guard to not further cement Boost dependencies, and since std::mutex and std::lock_guard are already used within the codebase, I think it shouldn't affect binary size. However, I'm not incredibly opinionated either way.

Contributor:

We're already using it all over the place.

@@ -5367,6 +5457,22 @@ void Blockchain::add_miner_notify(MinerNotifyCallback&& notify)
}
}

void Blockchain::notify_txpool_event(std::vector<txpool_event>&& event)
{
std::lock_guard<decltype(m_txpool_notifier_mutex)> lg(m_txpool_notifier_mutex);
Contributor:

Same here.


if(txd.last_failed_id != null_hash && m_blockchain.get_current_blockchain_height() > txd.last_failed_height && txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
return false;//we already sure that this tx is broken for this height
if (txd.last_failed_id == top_block_hash)
Contributor:

I think this change is incorrect. You need something like:

  if (txd.last_failed_height && txd.last_failed_id == m_blockchain.get_block_id_by_height(txd.last_failed_height))
    return false;

The first check is needed because null_hash is technically a valid value (but exceptionally rare). I think the original code should've included this.

The m_blockchain.get_current_blockchain_height() > txd.last_failed_height check can probably be removed.

However, the last check is the most important - this appears to be tracking/caching whether a tx's inputs are invalid after a certain height. Your change here will force a check_tx_inputs check every new block, instead of only after a reorg.

@jeffro256 (Contributor Author) commented Aug 13, 2024:

Your change here will force a check_tx_inputs check every new block, instead of only after a reorg.

Yes, this was the intended goal. The act of checking tx unlock times against a non-monotonic moving value of get_adjusted_time() makes it so that a transaction can pass check_tx_inputs at block X, but fail at block Y>X, and then re-pass at block Z>Y. This is discussed further at monero-project/meta#966 (comment).

Collaborator:

Nit: I think this section in this PR is good, and am for speeding it up in the general case in a future PR.

Behavior of the current code (excluding this PR):

For txs that are ready to go, it currently re-calls check_tx_inputs every time. There is no circumstance where it will short-circuit and return true for txs that should be ready to go.

For txs that are not ready to go, which should be an edge case minority of txs passed into this function, it makes an incorrect attempt at short-circuiting false. I say this is an edge case minority of txs because it would be a tx that was valid at one time that later became invalid, which should be rare (a reorg deep enough it would invalidate the ring signature, or unlock time reverts to locked).

Your change here will force a check_tx_inputs check every new block, instead of only after a reorg.

I agree that the check done in this PR could correctly short-circuit false in more circumstances, however, considering this should be a rare edge case, it's reasonable to argue this would be unnecessary error-prone complexity for this function. As such I'm good with this PR's approach as is.

I think it's also worth noting that we shouldn't have to run check_tx_inputs for txs that at one point were ready to go prior, so long as they were deemed ready to go on the same chain and don't reference any outputs with time-based unlock times. Aka there is a circumstance where we can short-circuit true that I think would significantly benefit this function in the general case. Considering this function impacts mining (see #8381), I think it's probably worth pursuing such a change in a future PR. It would be easiest to do with FCMP++ because there would be no need to deal with unlock time complexity.

@@ -1139,9 +1069,6 @@ namespace cryptonote

time_t start_time;

std::unordered_set<crypto::hash> bad_semantics_txes[2];
Contributor:

This has been around since 2017, and its removal isn't strictly needed for this PR. I would keep it, and analyze later in a separate PR.

The locations of the inserts/finds will have to move slightly, but it's still possible.

Contributor Author:

Would it be harder to review that it is removed or that the new updates to the code are correct?

Contributor:

My initial thought would be that it is harder to review with it removed. I'd have to dig into why it was added to make sure that its removal isn't going to affect anything.

Contributor Author:

bad_semantics_txes acts as an optimization for the common failure case where a bad transaction is being floated around but not modified. I think that bad_semantics_txes maybe makes sense for handling individual mempool transactions, but not for transactions passed as part of a block, for two reasons: 1) we now do PoW verification for blocks before transactions, which makes the cache largely worthless, and 2) calling ver_non_input_consensus() on a pool_supplement_t verifies that a group of transactions are all valid, so in order to restore the functionality of bad_semantics_txes for transactions passed in a block, we'd have to rewrite/review ver_non_input_consensus() to be able to return exactly which transactions failed (which isn't always possible with batch verification).

@jeffro256 force-pushed the bc_sync_skip_mempool branch from 2e88523 to eeb5a06 on August 14, 2024 20:01
@selsta selsta added the daemon label Jan 28, 2025
@jeffro256 force-pushed the bc_sync_skip_mempool branch from be29fed to c6f2ccd on January 29, 2025 06:09
@jeffro256 (Contributor Author):

Oops sorry for that latest push, I rolled back to c6f2ccd.

@jeffro256 (Contributor Author):

This PR is ready for re-review

j-berman added a commit to j-berman/monero that referenced this pull request Feb 6, 2025
CRITICAL FIXME's:
- sum of inputs == sum of outputs (requires using an updated
rerandomized output and blinded C Blind at tx construction time)
- serialize tree root for rct_ver_cache_t (planning to use the
compressed tree root and then de-compress at verification time)

Planning to rebase onto monero-project#9135
@Gingeropolous (Collaborator) commented Feb 8, 2025

Welp, the good news is that I got a node to sync with git pull origin pull/9135/head in it.

I've deleted my other notes, because as noted by others, I need to improve my testing setup to actually compare. So as of now, all I can confidently say is that I got the node to sync with 9135 pulled in.

@nahuhh (Contributor) commented Feb 8, 2025

If you add dynamic spans, make sure to also add dynamic block sync size (and manually increase speed limits)

@iamamyth commented Feb 8, 2025

There are tons of bottlenecks in the synchronization process. If you want to see what effect this PR has on real disk activity (which still matters for many reasons, the most obvious being that writes wear out the disk), there are many options:

  1. Dump the contents of /proc/<pid>/io for the daemon pre and post sync, and compare each version. You should hopefully see fewer bytes written in this branch.
  2. Use atop (must be installed first) with suitable flags to limit the output to your monerod process.
  3. Use iotop (must be installed first) in batch + accumulated mode.
  4. Use auditd to perform a detailed audit of IO activity, including operations such as fsync (this is a bit tricky because you need to configure auditd properly to limit the scope of audit activity, otherwise it might audit itself and flush each entry, effectively an infinite loop).

One caveat: For the test to be fair, you'd want the "end" for both to be a particular block/height. A simple option would be to feed the daemon under test from an exclusive node you control which has a snapshot of the blockchain up to a fixed point; it'll speed up the test and get at the relevant info.
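
As a concrete example of option 1, a small standalone helper can sample write_bytes from /proc/<pid>/io before and after the sync window and report the difference (a sketch; Linux only, and the pid must be supplied by the tester):

  // iodiff.cpp: compare /proc/<pid>/io write_bytes before and after a sync run.
  #include <fstream>
  #include <iostream>
  #include <string>

  static unsigned long long read_write_bytes(const std::string& pid)
  {
    std::ifstream io("/proc/" + pid + "/io");
    std::string key;
    unsigned long long value = 0;
    while (io >> key >> value)          // lines look like "write_bytes: 12345"
      if (key == "write_bytes:")
        return value;
    return 0;
  }

  int main(int argc, char** argv)
  {
    if (argc < 2) { std::cerr << "usage: iodiff <monerod pid>\n"; return 1; }
    const unsigned long long before = read_write_bytes(argv[1]);
    std::cout << "Press enter once the daemon reaches the target height... " << std::flush;
    std::cin.get();
    const unsigned long long after = read_write_bytes(argv[1]);
    std::cout << "bytes written during the window: " << (after - before) << '\n';
    return 0;
  }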

@iamamyth commented Feb 8, 2025

One other note: From the config you posted, I couldn't tell if your new daemon (labeled "syncing node") has both up and down equal, or just increases the default download rate; I would suggest making them symmetric.

@Gingeropolous (Collaborator) commented Feb 13, 2025

current master with 9135 pulled in is now running on xmrchain.net

(edited to add: I didn't perform an initial sync using 9135 on the explorer node, just recompiled and started wherever the node was. More of a stability test than actually testing the effects on IBD).

It had an uptime of 21 hours. Now running 9135 + 9765 on master on xmrchain.net.

@vtnerd (Contributor) left a comment

Went through yet again. Sorry, two more questions! This is about ready either way.

reg_arg.b = b;
relay_block(reg_arg, context);
// Relay an empty block
arg.b.txs.clear();
Contributor:

Why this change? It seems to differ from what the comments above say (that we only relay unknown txes, whereas this never relays txes).

@jeffro256 (Contributor Author) commented Feb 17, 2025:

Oh I see. I think I wrote that comment before changing this part. There's a few valid schemes here with different trade-offs:

  1. Be optimistic and relay an empty block to peers, expecting that they will have the necessary transactions in their pool or already confirmed the block
  2. Be slightly cautious but mostly optimistic and put the transaction that you didn't know about alongside the block. If all nodes adopt this behavior, then in the happy case, it automatically converges on the behavior of scheme 1
  3. Be pessimistic and put all transactions in the block

Option 1 is fastest in the best-case scenario. Option 3 is fastest in the worst-case scenario. Option 2 is somewhere in the middle, and might be the best in the average case. I think I intended for scheme 2 to be the implemented one at first, but I opted out because it leaks information about the structure of the network, and I haven't done the research to know whether or not that's a reasonable privacy risk. At the end of the day, it's an opinionated decision, but they are all "correct"; all will eventually relay the block. Option 3 is decidedly the worst from an average-case performance view, and is basically what we had before fluffy blocks. The fundamental assumption behind fluffy blocks in general is that your peer probably knows about the transactions in the block you're about to relay by the time you relay the block. All in all, I think defaulting to relaying empty blocks (Option 1) will be fine in most cases. It does give a disadvantage to miners who mine blocks with transactions that break majority-held relay rules, which can be a good thing or a bad thing depending on your view.

But yeah, if we stick with Option 1, I will need to amend that comment.

Contributor:

If you didn't know about transaction(s), there's a decent chance your peer didn't either. And I'm not sure the privacy leak argument is very strong - you still have to ask a peer for missing transactions anyway.

Contributor Author:

If you didn't know about transaction(s), there's a decent chance your peer didn't either.

I agree

And I'm not sure the privacy leak argument is very strong - you still have to ask a peer for missing transactions anyway.

But that would be a privacy leak for the node that you don't control, whereas choosing what you relay leaks something about your node to someone else's. As for whether or not this is okay, what about the scenario where a node in the stem phase of transaction propagation (or stem adjacent) mines that transaction and propagates the block before the blackhole period is over? I think the reference code doesn't include such transactions in the block template, but I could be wrong. And if there is an alternative buggy implementation that does include such transactions in blocks, couldn't Scheme 2 propagate the result of this bug farther to spy nodes than if reference nodes went with Scheme 1?

At the end of the day, I'm not certain that this isn't a privacy leak, so personally I'd rather err on the side of caution, but I'm definitely open to changing my mind.

Contributor Author:

This issue is probably tied to #9334

Contributor:

I think this is fine for now. We can always tweak shortly anyway.

Contributor:

nah has tx a,b,c,d,e
vtnerd has tx a,b,c,d
jeffro has tx a,b,c

nah mines the block (with tx a,b,c,d,e), and sends to vtnerd, who sends to jeffro

How would each scenario play out? Whats the worst case situation for each?

  1. do tx d+e get relayed from nah to vtnerd? what does vtnerd send to jeffro?
  2. vtnerd leaks to jeffro that he didnt have e? But what about jeffro who is missing d+e?
  3. abcde are all sent from nah to vtnerd to jeffro, even if jeffro already has them

@vtnerd (Contributor) commented Feb 19, 2025:

  1. vtnerd doesn't send the fluffy block to jeffro until e is received (probably from nah). In current code, vtnerd sends e to jeffro as an additional tx, but in this PR nothing gets sent. The current code leaks to jeffro that vtnerd didn't know about e.
  2. Correct (until this patch, which changes that behavior). jeffro has to ask vtnerd about d+e in either scenario.
  3. This never happens.

This PR arguably has fewer leaks. There's also the case where a node pretends not to know about a tx, based on settings and get_tx parameters.


I looked into how often NOTIFY_REQUEST_FLUFFY_MISSING_TX will appear in node logs when the log level is net.p2p.msg:INFO. I re-analyzed some node log data that was collected last year and described on page 20 of version 0.3 of "March 2024 Suspected Black Marble Flooding Against Monero: Privacy, User Experience, and Countermeasures".

Using logs from nodes that accepted incoming connections, I found that for a given p2p connection, the NOTIFY_REQUEST_FLUFFY_MISSING_TX message occurs in 0.7 percent (i.e. less than one percent) of blocks. The message is correlated with time: if one connection emits the message for a particular block, then other connections are more likely to emit the message. This makes sense because the message would tend to be emitted when a transaction was confirmed in a mined block before it could be propagated throughout the network. In that event, multiple nodes would need to emit the message for the same block.

Probably, this event is rare because transactions propagate throughout the network sooner than they are added to mining pools' block template. When I last measured transaction propagation times two years ago, the median time to propagate throughout the network was two seconds during the Dandelion++ fluff phase. On the other hand, most mining pools add new transactions to their block templates once every 10-30 seconds.

Those were the statistics under current network conditions. If transaction volume is so high that the txpool becomes congested, we would probably expect that the message is emitted even less frequently. The default behavior of monerod's block template construction is to first order the transactions by fee (there are 4 standard tiers) and then order them by first-seen. Therefore, conditional on fee (which is usually set automatically to the same tier for everyone), transactions are first-in/first-out. The transactions that have been broadcasted first and have been waiting a while are the transactions that would be confirmed in the next block, so it is unlikely that a node would be missing them.

return 1;
}
}
MERROR("sent bad block entry: there are duplicate tx hashes in parsed block: "
Contributor:

I think you still need:

if (context.m_requested_objects.size() != arg.b.txs.size())
{ 
  // error
}

along with a check to ensure that every requested txid was provided by the peer?

Contributor Author:

I'm like 80% sure that m_requested_objects is only relevant for the NOTIFY_REQUEST_GET_OBJECTS/NOTIFY_RESPONSE_GET_OBJECTS commands. When a node is missing txs in a fluffy block, it sends a NOTIFY_REQUEST_FLUFFY_MISSING_TX command, which is responded to immediately with a new fluffy block containing the missing transactions. But these requests/responses don't touch m_requested_objects, which is for persistent, cross-command state of block hashes.

@vtnerd (Contributor) commented Feb 18, 2025:

I looked at our existing code + the history of FLUFFY_BLOCKS. This code you removed appears to be dead code. However, we may want to consider adding the constraints intended ... another patch? This one has enough already.

Contributor Author:

Yeah, this comment has the right idea:

// hijacking m_requested objects in connection context to patch up

But I don't think it actually did anything, since we didn't set the size of m_requested_objects anywhere in this flow. If anything, it would just break honest syncing if that connection also sent us a fluffy block. Well, maybe not, since we return early from handling fluffy blocks if we're in the synchronizing state, so yeah, it's probably just dead code that doesn't do anything.

@jeffro256 (Contributor Author):

Currently on c6f2ccd. I will amend the erroneous comment, then squash.

@jeffro256 force-pushed the bc_sync_skip_mempool branch from c6f2ccd to bbe0dd6 on February 18, 2025 21:04
@Gingeropolous (Collaborator):

So I was running 9135 (and the http max thing) on master and it "aborted", and I got this: "corrupted size vs. prev_size". This is on the same box where I had the issue above; however, it's on a different HDD on that box, so the database is now being stored on that secondary hard drive.

@iamamyth commented Feb 20, 2025

The failing functional test is a problem with the commit (it replicates old, broken patterns in new places), rather than an old merge base, see comments here: #9740.

@jeffro256 (Contributor Author):

So I was running 9135 (and the http max thing) on master and it "aborted" and i got this: "corrupted size vs. prev_size". This is on the same box that I had the issue above, however its on a different HDD on that box. So the database is now being stored on that secondary hard drive.

Would it be possible to run a memtest on that machine? That error can be caused by a programming bug in monerod or it could be caused by heap corruption due to bad physical memory. I know that you've had other corruption issues recently, and IIRC @Rucknium has said before that one of the MRL research machines has already replaced bad RAM sticks in the past. So if you would check to see if it's a hardware issue, that would be greatly appreciated.

* Speed up propagation polling
* Remove duplicated "reorg" testing which doesn't give enough time for txs to propagate
* Test two different types of block propagations: shared tx, and new tx
@Gingeropolous (Collaborator):

@jeffro256 , this is being run on my seed node, which is a remote box unrelated to the research cluster. To your point though, I don't have confidence in the seed node hardware. I want to setup a new box with a HDD in the lab to test this patch, because this current experience hasn't been clear. I just need to squeeze some more time from the ol' time fruit.

@j-berman (Collaborator) left a comment

Looking good. Some comments worth implementing imo, and some nits. Feel free to ignore the comments prefaced with "nit"

std::vector<cryptonote::blobdata> tx_blobs;
std::vector<crypto::hash> missed_txs;

bool need_tx = !m_core.pool_has_tx(tx_hash);
Collaborator:

Nit: we can theoretically not have a tx in the pool when handle_single_incoming_block executes, then once it returns, receive a tx from another connection, then get to this point and already have the tx. Thus need_tx_indices can end up empty.

It's not an issue because handle_request_fluffy_missing_tx will still return a fluffy block even for an empty request for txs.

Might be cleaner to return the missing txs in handle_single_incoming_block instead. Not a blocker for this PR.

Contributor Author:

It's not an issue because handle_request_fluffy_missing_tx will still return a fluffy block even for an empty request for txs.

This might be the "better" option in the sense that this automatically makes ourselves re-encounter the fluffy block, even if another connection doesn't pass it to us in the future.

A more "optimal" solution would be caching blocks which pass PoW verification but are missing txs, and then triggering a re-verify of these blocks when the mempool is updated.
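
A purely hypothetical sketch of that caching idea (not part of this PR; the names are invented for illustration):

  // Blocks that passed PoW but still wait on txs, keyed by block hash.
  struct pending_block
  {
    cryptonote::block blk;
    std::unordered_set<crypto::hash> missing_txids;
  };
  std::unordered_map<crypto::hash, pending_block> m_pow_ok_pending;

  // Assumed hook called whenever a tx lands in the mempool.
  void on_tx_added_to_pool(const crypto::hash& txid)
  {
    for (auto it = m_pow_ok_pending.begin(); it != m_pow_ok_pending.end(); )
    {
      it->second.missing_txids.erase(txid);
      if (it->second.missing_txids.empty())
      {
        retry_handle_block(it->second.blk); // assumed re-entry into block handling
        it = m_pow_ok_pending.erase(it);
      }
      else
        ++it;
    }
  }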

std::vector<crypto::hash> missed_txs;

bool need_tx = !m_core.pool_has_tx(tx_hash);
need_tx = need_tx && (!m_core.get_transactions({tx_hash}, tx_blobs, missed_txs, /*pruned=*/true)
Collaborator:

Nit: m_core.get_transactions could be replaced by m_core.get_blockchain_storage().have_tx(tx_hash) here.

Contributor Author:

Yes, it could, but I personally don't like that this would be the first time in the cryptonote protocol handler that the blockchain storage is exposed. We could add a core endpoint, though...

*
* @return false if any outputs do not conform, otherwise true
*/
bool check_tx_outputs(const transaction& tx, tx_verification_context &tvc) const;
static bool check_tx_outputs(const transaction& tx,
Collaborator:

Nit: for a future PR, I would move this function into tx_verification_utils. Makes sense not to do it here to keep the diff smaller.

Contributor Author:

Agreed, yeah, I was just trying to minimize the diff as it's already huge. This pure function is ripe for relocation afterwards, though.

const std::uint8_t hf_version)
{
// We already verified the pool supplement for this hard fork version! Yippee!
if (ps.nic_verified_hf_version == hf_version)
Collaborator:

Nit: I'm not seeing how to trigger this if statement on re-review. Looks like the pool supplement is only used once and then discarded in all cases. Am I missing something there?

Doesn't look like an issue to me being here, just a little confusing

Contributor Author:

Nope not missing anything AFAIK, could be worth removing


