RPC server: thread safety (+ small fix to on_getblockhash) #7936

j-berman · 2021-09-12T09:20:09Z

Overview

The daemon RPC is also not really thread safe.
Stuff like getinfo can get you inconsistent data, technically.
Like, read curent height, block gets added, read current difficulty.

From @hyc :

eh, getinfo should be using a single read txn around all of its steps
so yeah, there ought to be a block_rtxn_start() call near the beginning and an rtxn_stop() call near the end of that
I guess that function has now been wrapped in db_rtxn_guard

RPC Functions made thread safe

I used an rtxnguard to make the following functions thread safe:

on_get_info
on_get_blocks
on_get_blocks_by_height
on_get_transaction_pool_hashes_bin
on_get_transaction_pool_hashes
on_getblockhash
on_getblocktemplate
on_get_last_block_header
on_get_block_header_by_hash
on_get_block_headers_range
on_get_block_header_by_height
on_get_block
on_get_coinbase_tx_sum
on_get_alternate_chains
on_get_txpool_backlog

fixed a small bug where on_getblockhash would return a 0 hash when the height requested exceeds blockchain height.

Will make the following thread safe in separate PR(s)

on_flush_txpool
on_get_transactions
on_get_transaction_pool
on_is_key_image_spent
on_send_raw_tx
on_submitblock
on_generateblocks
on_pop_blocks
on_relay_tx
on_get_output_distribution
on_get_output_distribution_bin
on_prune_blockchain
on_flush_cache
on_hard_fork_info
on_sync_info
on_rpc_access_info
on_rpc_access_submit_nonce

found a small bug in on_get_transactions, will patch separately as well
the daemon handler

EDIT: I updated this PR based on @hyc's comments below. When I first submitted the PR, I only included the db_rtxn_guard in get_info; now the PR includes it in more functions so that there are no visible in-between conditions on db reads. I will wait to squash commits until the PR looks good to go, so changes from my prior commit before @hyc's comments are accessible.

I originally included the following comment in the PR description (which is what @hyc was responding to below):

[Aside from get_info] Other functions seem to behave correctly regardless AFAICT, so didn't really do much here beyond [update get_info]. For example, on_is_key_image_spent first checks the chain for spent key images, then checks the pool. If a key gets added to the pool after reading the chain but before reading the pool, seems like you'd still want the function to return that the key image has been spent.

I understand how this logic was not ideal for a number of reasons.

I didn't include on_is_key_image_spent in this PR since I realized spent key images are read from memory when checking the pool, so seems it'll need a bigger change to make thread safe.

- grab an lmdb read transaction guard to ensure the thread executing on_get_info reads consistent data from the db + other low hanging fruit

hyc · 2021-09-12T13:06:13Z

If a key gets added to the pool after reading the chain but before reading the pool,

If a readtxn is being held, then that's not possible.

j-berman · 2021-09-12T18:57:42Z

Right and I was thinking it seems like you would want it to be possible, which is why I didn't use a readtxn there, or in other places that behaved similarly. Edit: sorry, poorly worded on my part.

hyc · 2021-09-12T19:24:47Z

No, you wouldn't want that. You should want all state changes to be all-or-nothing, with no visible in-between conditions.

Meanwhile, for the theoretical case of a key image appearing after the client checked - that could happen anyway, just depending on luck and the exact times a txn arrives vs a client querying.

- fixed on_getblockhash error resp return when requested height >= blockchain height

rbrunner7 · 2022-01-06T18:16:08Z

I have two question regarding this PR:

First, is this still really waiting for a review, is there still a goal to get this (or its "master" branch equivalent) merged eventually, or did some reason surface for not pursuing this further?

If it's still waiting for review I may try my luck and hopefully help towards a goal of getting it merged.

Second, out of curiosity: Does this aim to work towards a state where several clients can share a single RPC daemon instance without problems? If yes, does this already come close, or is still a lot of work waiting for later to achieve that?

selsta · 2022-01-06T18:17:39Z

It's simply missing a review :)

rbrunner7

Looks good.

I checked whether there are "on_" procedures that set a lock now thanks to these changes and call other "on_" procedures that in turn try to set their own lock which then would probably result in a deadlock, but did not find any.

I approve already because I did not find any really important changes to propose.

rbrunner7 · 2022-01-07T13:34:34Z

src/rpc/core_rpc_server.cpp

@@ -2117,6 +2127,7 @@ namespace cryptonote
      return r;

    CHECK_PAYMENT_MIN1(req, res, COST_PER_BLOCK_HEADER, false);
+    db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());


This could be moved a bit further down, about 8 lines, to after the test about restriction.

rbrunner7 · 2022-01-07T13:39:55Z

src/rpc/core_rpc_server.cpp

@@ -2185,6 +2196,7 @@ namespace cryptonote
    if (use_bootstrap_daemon_if_necessary<COMMAND_RPC_GET_BLOCK_HEADERS_RANGE>(invoke_http_mode::JON_RPC, "getblockheadersrange", req, res, r))
      return r;

+    db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());


Of course not terribly important, but if we are at it anyway already, the test about restriction could come first, then the lock, and only then the check against current blockchain height.

rbrunner7 · 2022-01-07T13:44:18Z

src/rpc/core_rpc_server.cpp

@@ -2606,6 +2622,7 @@ namespace cryptonote
  bool core_rpc_server::on_get_coinbase_tx_sum(const COMMAND_RPC_GET_COINBASE_TX_SUM::request& req, COMMAND_RPC_GET_COINBASE_TX_SUM::response& res, epee::json_rpc::error& error_resp, const connection_context *ctx)
  {
    RPC_TRACKER(get_coinbase_tx_sum);
+    db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db());


This could get the blockchain db locked for an awfully long time, which might hold up all other things quite a bit, but at least the sum would be correct to the piconero :)

It's using a read-only LMDB transaction to ensure consistent reads of the db, which doesn't lock out writers. From the LMDB docs:

as long as a transaction is open, a consistent view of the database is kept alive, which requires storage. A read-only transaction that no longer requires this consistent view should be terminated (committed or aborted) when the view is no longer needed (but see below for an optimization).

There can be multiple simultaneously active read-only transactions but only one that can write. Once a single read-write transaction is opened, all further attempts to begin one will block until the first one is committed or aborted. This has no effect on read-only transactions, however, and they may continue to be opened at any time.

I also tested behavior in this PR generally by using usleep + a local test env so I can mine blocks quickly. Can test behavior like this for example:

bool core_rpc_server::on_get_coinbase_tx_sum(const COMMAND_RPC_GET_COINBASE_TX_SUM::request& req, COMMAND_RPC_GET_COINBASE_TX_SUM::response& res, epee::json_rpc::error& error_resp, const connection_context *ctx) { RPC_TRACKER(get_coinbase_tx_sum); db_rtxn_guard rtxn_guard(&m_core.get_blockchain_storage().get_db()); const uint64_t bc_height = m_core.get_current_blockchain_height(); printf("Height before sleep: %lu\n", bc_height); // sleep, then mine some blocks while asleep, then make sure height still reads the same after waking up usleep(1000 * 1000 * 30); // can even call status (or other RPC calls) from the daemon while asleep and see *those* would return latest height before this next line prints printf("Height after sleep: %lu\n", m_core.get_current_blockchain_height());

I could even make an RPC call to on_get_coinbase_sum again and grab another read txn guard and the daemon proceeds to fall asleep, while the first on_get_coinbase_sum was still asleep.

Interesting. So experience from other databases does not carry over 1-to-1 to LMDB, as it can do more in parallel as databases can usually do.

And yeah, your test here really settles the issue.

Yes, LMDB is rather unique in its capabilities.

j-berman · 2022-01-09T02:37:12Z

Thanks @rbrunner7 :)

Does this aim to work towards a state where several clients can share a single RPC daemon instance without problems? If yes, does this already come close, or is still a lot of work waiting for later to achieve that?

Hmmmm, sort of. This ensures that a single client's request will have a consistent view of the db. For example in get_info it returns fields height and difficulty. To return these values, the endpoint make separate reads to the db. Assume 100 blocks get mined between the first db read for height and second db read for difficulty (extreme scenario for illustrative purposes), then the difficulty would correspond to a block 100 blocks later, but the height would still be the original height. This PR ensures the difficulty corresponds to the height read and returned in the response. If you have multiple clients sharing an RPC daemon instance, and all make a request at the same height, then they would all be guaranteed to the see the same height and difficulty. Whereas without this PR, it's technically possible for some clients to see the same height but a different difficulty.

j-berman · 2022-04-06T03:36:40Z

Closing in favor of #7937. That PR is to master, this PR is to the release branch

Make core_rpc_server::on_get_info thread safe + ez concurrency wins

d26c121

- grab an lmdb read transaction guard to ensure the thread executing on_get_info reads consistent data from the db + other low hanging fruit

j-berman mentioned this pull request Sep 12, 2021

RPC server: thread safety (+ small fix to on_getblockhash) #7937

Merged

j-berman changed the title ~~RPC server: make get_info thread safe (+ other tiny concurrency wins)~~ [WIP] RPC server: make get_info thread safe (+ other tiny concurrency wins) Sep 13, 2021

Make more functions in RPC server thread safe via db_rtxn_guard

38e1ec1

- fixed on_getblockhash error resp return when requested height >= blockchain height

j-berman force-pushed the rpc-thread-safety-release branch from 12b885f to 38e1ec1 Compare September 14, 2021 04:25

j-berman changed the title ~~[WIP] RPC server: make get_info thread safe (+ other tiny concurrency wins)~~ RPC server: thread safety (+ small fix to on_getblockhash) Sep 14, 2021

rbrunner7 approved these changes Jan 7, 2022

View reviewed changes

rbrunner comments

942f7c2

rbrunner7 mentioned this pull request Mar 27, 2022

Monero Dev Meeting: v15 network upgrade - Sat 2 April 2022 @ 17:00 UTC monero-project/meta#680

Closed

tobtoht approved these changes Mar 29, 2022

View reviewed changes

j-berman closed this Apr 6, 2022

LeoNero mentioned this pull request Jul 9, 2022

functional_daemon_test does not pass on first test run on local setup monero-rs/monero-rpc-rs#56

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RPC server: thread safety (+ small fix to on_getblockhash) #7936

RPC server: thread safety (+ small fix to on_getblockhash) #7936

j-berman commented Sep 12, 2021 •

edited

Loading

hyc commented Sep 12, 2021

j-berman commented Sep 12, 2021 •

edited

Loading

hyc commented Sep 12, 2021

rbrunner7 commented Jan 6, 2022

selsta commented Jan 6, 2022

rbrunner7 left a comment

rbrunner7 Jan 7, 2022

rbrunner7 Jan 7, 2022

rbrunner7 Jan 7, 2022

j-berman Jan 9, 2022

rbrunner7 Jan 9, 2022

hyc Jan 9, 2022

j-berman commented Jan 9, 2022

j-berman commented Apr 6, 2022

RPC server: thread safety (+ small fix to on_getblockhash) #7936

RPC server: thread safety (+ small fix to on_getblockhash) #7936

Conversation

j-berman commented Sep 12, 2021 • edited Loading

Overview

RPC Functions made thread safe

Will make the following thread safe in separate PR(s)

hyc commented Sep 12, 2021

j-berman commented Sep 12, 2021 • edited Loading

hyc commented Sep 12, 2021

rbrunner7 commented Jan 6, 2022

selsta commented Jan 6, 2022

rbrunner7 left a comment

Choose a reason for hiding this comment

rbrunner7 Jan 7, 2022

Choose a reason for hiding this comment

rbrunner7 Jan 7, 2022

Choose a reason for hiding this comment

rbrunner7 Jan 7, 2022

Choose a reason for hiding this comment

j-berman Jan 9, 2022

Choose a reason for hiding this comment

rbrunner7 Jan 9, 2022

Choose a reason for hiding this comment

hyc Jan 9, 2022

Choose a reason for hiding this comment

j-berman commented Jan 9, 2022

j-berman commented Apr 6, 2022

j-berman commented Sep 12, 2021 •

edited

Loading

j-berman commented Sep 12, 2021 •

edited

Loading