Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC server: thread safety (+ small fix to on_getblockhash) #7936

Conversation

j-berman
Copy link
Collaborator

@j-berman j-berman commented Sep 12, 2021

Overview

From @moneromooo-monero in IRC:

The daemon RPC is also not really thread safe.
Stuff like getinfo can get you inconsistent data, technically.
Like, read curent height, block gets added, read current difficulty.

From @hyc :

eh, getinfo should be using a single read txn around all of its steps
so yeah, there ought to be a block_rtxn_start() call near the beginning and an rtxn_stop() call near the end of that
I guess that function has now been wrapped in db_rtxn_guard

RPC Functions made thread safe

I used an rtxnguard to make the following functions thread safe:

  • on_get_info
  • on_get_blocks
  • on_get_blocks_by_height
  • on_get_transaction_pool_hashes_bin
  • on_get_transaction_pool_hashes
  • on_getblockhash
  • on_getblocktemplate
  • on_get_last_block_header
  • on_get_block_header_by_hash
  • on_get_block_headers_range
  • on_get_block_header_by_height
  • on_get_block
  • on_get_coinbase_tx_sum
  • on_get_alternate_chains
  • on_get_txpool_backlog
  • fixed a small bug where on_getblockhash would return a 0 hash when the height requested exceeds blockchain height.

Will make the following thread safe in separate PR(s)

  • on_flush_txpool
  • on_get_transactions
  • on_get_transaction_pool
  • on_is_key_image_spent
  • on_send_raw_tx
  • on_submitblock
  • on_generateblocks
  • on_pop_blocks
  • on_relay_tx
  • on_get_output_distribution
  • on_get_output_distribution_bin
  • on_prune_blockchain
  • on_flush_cache
  • on_hard_fork_info
  • on_sync_info
  • on_rpc_access_info
  • on_rpc_access_submit_nonce
  • found a small bug in on_get_transactions, will patch separately as well
  • the daemon handler

EDIT: I updated this PR based on @hyc's comments below. When I first submitted the PR, I only included the db_rtxn_guard in get_info; now the PR includes it in more functions so that there are no visible in-between conditions on db reads. I will wait to squash commits until the PR looks good to go, so changes from my prior commit before @hyc's comments are accessible.

I originally included the following comment in the PR description (which is what @hyc was responding to below):

[Aside from get_info] Other functions seem to behave correctly regardless AFAICT, so didn't really do much here beyond [update get_info]. For example, on_is_key_image_spent first checks the chain for spent key images, then checks the pool. If a key gets added to the pool after reading the chain but before reading the pool, seems like you'd still want the function to return that the key image has been spent.

I understand how this logic was not ideal for a number of reasons.

I didn't include on_is_key_image_spent in this PR since I realized spent key images are read from memory when checking the pool, so seems it'll need a bigger change to make thread safe.

- grab an lmdb read transaction guard to ensure the thread executing on_get_info reads consistent data from the db
+ other low hanging fruit
@hyc
Copy link
Collaborator

hyc commented Sep 12, 2021

If a key gets added to the pool after reading the chain but before reading the pool,

If a readtxn is being held, then that's not possible.

@j-berman
Copy link
Collaborator Author

j-berman commented Sep 12, 2021

Right and I was thinking it seems like you would want it to be possible, which is why I didn't use a readtxn there, or in other places that behaved similarly. Edit: sorry, poorly worded on my part.

@hyc
Copy link
Collaborator

hyc commented Sep 12, 2021

No, you wouldn't want that. You should want all state changes to be all-or-nothing, with no visible in-between conditions.

Meanwhile, for the theoretical case of a key image appearing after the client checked - that could happen anyway, just depending on luck and the exact times a txn arrives vs a client querying.

@j-berman j-berman changed the title RPC server: make get_info thread safe (+ other tiny concurrency wins) [WIP] RPC server: make get_info thread safe (+ other tiny concurrency wins) Sep 13, 2021
- fixed on_getblockhash error resp return when requested height >= blockchain height
@j-berman j-berman force-pushed the rpc-thread-safety-release branch from 12b885f to 38e1ec1 Compare September 14, 2021 04:25
@j-berman j-berman changed the title [WIP] RPC server: make get_info thread safe (+ other tiny concurrency wins) RPC server: thread safety (+ small fix to on_getblockhash) Sep 14, 2021
@rbrunner7
Copy link
Contributor

I have two question regarding this PR:

First, is this still really waiting for a review, is there still a goal to get this (or its "master" branch equivalent) merged eventually, or did some reason surface for not pursuing this further?

If it's still waiting for review I may try my luck and hopefully help towards a goal of getting it merged.

Second, out of curiosity: Does this aim to work towards a state where several clients can share a single RPC daemon instance without problems? If yes, does this already come close, or is still a lot of work waiting for later to achieve that?

@selsta
Copy link
Collaborator

selsta commented Jan 6, 2022

It's simply missing a review :)

Copy link
Contributor

@rbrunner7 rbrunner7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I checked whether there are "on_" procedures that set a lock now thanks to these changes and call other "on_" procedures that in turn try to set their own lock which then would probably result in a deadlock, but did not find any.

I approve already because I did not find any really important changes to propose.

@@ -2117,6 +2127,7 @@ namespace cryptonote
return r;

CHECK_PAYMENT_MIN1(req, res, COST_PER_BLOCK_HEADER, false);
db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be moved a bit further down, about 8 lines, to after the test about restriction.

@@ -2185,6 +2196,7 @@ namespace cryptonote
if (use_bootstrap_daemon_if_necessary<COMMAND_RPC_GET_BLOCK_HEADERS_RANGE>(invoke_http_mode::JON_RPC, "getblockheadersrange", req, res, r))
return r;

db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course not terribly important, but if we are at it anyway already, the test about restriction could come first, then the lock, and only then the check against current blockchain height.

@@ -2606,6 +2622,7 @@ namespace cryptonote
bool core_rpc_server::on_get_coinbase_tx_sum(const COMMAND_RPC_GET_COINBASE_TX_SUM::request& req, COMMAND_RPC_GET_COINBASE_TX_SUM::response& res, epee::json_rpc::error& error_resp, const connection_context *ctx)
{
RPC_TRACKER(get_coinbase_tx_sum);
db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could get the blockchain db locked for an awfully long time, which might hold up all other things quite a bit, but at least the sum would be correct to the piconero :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's using a read-only LMDB transaction to ensure consistent reads of the db, which doesn't lock out writers. From the LMDB docs:

as long as a transaction is open, a consistent view of the database is kept alive, which requires storage. A read-only transaction that no longer requires this consistent view should be terminated (committed or aborted) when the view is no longer needed (but see below for an optimization).

There can be multiple simultaneously active read-only transactions but only one that can write. Once a single read-write transaction is opened, all further attempts to begin one will block until the first one is committed or aborted. This has no effect on read-only transactions, however, and they may continue to be opened at any time.

I also tested behavior in this PR generally by using usleep + a local test env so I can mine blocks quickly. Can test behavior like this for example:

bool core_rpc_server::on_get_coinbase_tx_sum(const COMMAND_RPC_GET_COINBASE_TX_SUM::request& req, COMMAND_RPC_GET_COINBASE_TX_SUM::response& res, epee::json_rpc::error& error_resp, const connection_context *ctx)
  {
    RPC_TRACKER(get_coinbase_tx_sum);
    db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());
    const uint64_t bc_height = m_core.get_current_blockchain_height();

    printf("Height before sleep: %lu\n", bc_height);
    // sleep, then mine some blocks while asleep, then make sure height still reads the same after waking up
    usleep(1000 * 1000 * 30);

    // can even call status (or other RPC calls) from the daemon while asleep and see *those* would return latest height before this next line prints
    printf("Height after sleep: %lu\n", m_core.get_current_blockchain_height());

I could even make an RPC call to on_get_coinbase_sum again and grab another read txn guard and the daemon proceeds to fall asleep, while the first on_get_coinbase_sum was still asleep.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. So experience from other databases does not carry over 1-to-1 to LMDB, as it can do more in parallel as databases can usually do.

And yeah, your test here really settles the issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, LMDB is rather unique in its capabilities.

@j-berman
Copy link
Collaborator Author

j-berman commented Jan 9, 2022

Thanks @rbrunner7 :)

Does this aim to work towards a state where several clients can share a single RPC daemon instance without problems? If yes, does this already come close, or is still a lot of work waiting for later to achieve that?

Hmmmm, sort of. This ensures that a single client's request will have a consistent view of the db. For example in get_info it returns fields height and difficulty. To return these values, the endpoint make separate reads to the db. Assume 100 blocks get mined between the first db read for height and second db read for difficulty (extreme scenario for illustrative purposes), then the difficulty would correspond to a block 100 blocks later, but the height would still be the original height. This PR ensures the difficulty corresponds to the height read and returned in the response. If you have multiple clients sharing an RPC daemon instance, and all make a request at the same height, then they would all be guaranteed to the see the same height and difficulty. Whereas without this PR, it's technically possible for some clients to see the same height but a different difficulty.

@j-berman
Copy link
Collaborator Author

j-berman commented Apr 6, 2022

Closing in favor of #7937. That PR is to master, this PR is to the release branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants