cryptonote_core: introducing rwlock mechanism #9181

Open · wants to merge 2 commits into master from dev/0xfffc/rw-lock-blockchain

Conversation

0xFFFC0000 (Collaborator)

Draft: #9173

This is a low-level explanation of the design decisions. For everyday use, just use the RWLOCK and RLOCK macros, which take care of all the details.

RecursiveReadWriteLock Design

Since blockchain code calls read/write operations recursively, we needed a new locking mechanism that supports that. The design decisions are the same as for canonical read-write locks, with a few extra decisions to fit our needs. Here are the general ideas behind this lock:

  1. Writes and reads are exclusive: if any read(s) are in progress, acquiring the write lock will block.
  2. You can have up to UINT32_MAX concurrent reads.
  3. Once you hold the write lock, you can start read operation(s) inside it too: a write transaction can contain read transaction(s). But a read transaction cannot start a write until all reads have released the lock.
  4. RWLOCK and RLOCK macros are provided which abstract all the details. Generally, you only need these macros and should not have to touch the low-level locking API.
  5. RWLOCK and RLOCK hold the lock for the enclosing scope.

API Design

As mentioned, a write transaction can include a read transaction too. Here is a low-level example of acquiring and releasing the lock:

// Next line will acquire the write lock
// and will return true, meaning you are required to call end_write.
bool rw_release_required = m_blockchain.start_write();
.
.
    // Next line will acquire the read lock
    // and will return false, meaning you are not required to call end_read.
    bool r_release_required = m_blockchain.start_read();
    .
    .
    .
    if (r_release_required) m_blockchain.end_read();
.
.
if (rw_release_required) m_blockchain.end_write();

Or if we use macros to take care of this:

{
    // Write lock will be acquired next line.
    // and released when the scope ends.
    RWLOCK(m_blockchain);
    .
    .
    {
        // Read lock will be acquired next line,
        // and released when the scope ends.
        RLOCK(m_blockchain);
        .
        .
    }

}

Another example of recursive reads:

{
    // Read lock will be acquired next line.
    // and released when the scope ends.
    RLOCK(m_blockchain);
    .
    .
    {
        // Read lock will be acquired next line,
        // and released when the scope ends.
        RLOCK(m_blockchain);
        .
        .
    }

}

Another example of recursive writes:

{
    // Write lock will be acquired next line.
    // and released when the scope ends.
    RWLOCK(m_blockchain);
    .
    .
    {
        // Write lock will be acquired next line,
        // and released when the scope ends.
        RWLOCK(m_blockchain);
        .
        .
    }

}

Testing

This PR contains a test suite for the lock. The test runs 4 different cases with 10, 50, 100 and 1000 threads, each with two iterations. In each test a random number of writers and readers start, and each reader and writer runs a random number of cycles. Each cycle waits, then randomly (~20% chance) decides whether to recurse.
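
For readers unfamiliar with the shape of such a test, here is a minimal hedged sketch (the reader_writer_lock type and the RLOCK/RWLOCK macros are the PR's; the harness itself is illustrative and is not the actual suite):

#include <random>
#include <thread>
#include <vector>

reader_writer_lock g_lock; // the lock under test (from this PR)

void read_cycle(std::mt19937 &rng, int depth = 0)
{
    RLOCK(g_lock);                      // scoped read lock
    if (depth < 3 && rng() % 100 < 20)  // ~20% chance to recurse
        read_cycle(rng, depth + 1);
}

void write_cycle(std::mt19937 &rng, int depth = 0)
{
    RWLOCK(g_lock);                     // scoped write lock; reads may nest inside
    if (depth < 3 && rng() % 100 < 20)
        write_cycle(rng, depth + 1);
}

int main()
{
    for (const int threads : {10, 50, 100, 1000}) // the four cases
    {
        std::vector<std::thread> pool;
        for (int t = 0; t < threads; ++t)
            pool.emplace_back([t] {
                std::mt19937 rng(t);
                const int cycles = 1 + static_cast<int>(rng() % 50); // random cycle count
                for (int c = 0; c < cycles; ++c)
                    (rng() % 2) ? read_cycle(rng) : write_cycle(rng);
            });
        for (auto &th : pool)
            th.join();
    }
}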

@vtnerd (Contributor) left a comment

I think you answered in IRC already, but why is this needed? The LMDB class already has proper locking, so what is this doing? And if not, maybe the LMDB class needs to be updated instead of adding yet another layer of locking on top of the existing locks.

@0xFFFC0000 (Collaborator, Author)

I think you answered in IRC already, but why is this needed? The LMDB class already has proper locking, so what is this doing?

#9193 (comment)

And if not, maybe the LMDB class needs to be updated instead of adding yet another layer of locking on top of the existing locks.

I believe if you look at the code carefully, you will see that it does not add another layer. It replaces an old locking mechanism, which does not support read-write semantics, with a newer one.

@vtnerd (Contributor) commented Mar 1, 2024

I believe if you look at the code carefully, you will see that it does not add another layer. It replaces an old locking mechanism, which does not support read-write semantics, with a newer one.

Yes, I skimmed the code way too quickly. But still, same question, any idea what these mutexes are protecting?

@0xFFFC0000 (Collaborator, Author)

I believe if you look at the code carefully, you will see that it does not add another layer. It replaces an old locking mechanism, which does not support read-write semantics, with a newer one.

Yes, I skimmed the code way too quickly. But still, same question, any idea what these mutexes are protecting?

Provides reader-writer locking for the Blockchain object itself in blockchain.cpp.

@vtnerd (Contributor) commented Mar 1, 2024

Provides reader-writer locking for the Blockchain object itself in blockchain.cpp.

That object shouldn't need protection, unless the pointer itself changes (like with init). It has locking internally where needed.

It looks like a reader lock isn't needed at all for these other functions; I don't think this is helping anything and is just further cluttering the code.

@0xFFFC0000 (Collaborator, Author) commented Mar 1, 2024

Provides reader-writer locking for the Blockchain object itself in blockchain.cpp.

That object shouldn't need protection, unless the pointer itself changes (like with init). It has locking internally where needed.

It looks like a reader lock isn't needed at all for these other functions; I don't think this is helping anything and is just further cluttering the code.

No, we still need locking. For example, take a value like this [1].

moneromoooo: If there is no locking but the db txn, monerod could start adding two blocks at once, and this would get written by two threads.
moneromoooo: The main good a rw lock gives us is allowing various RPC requests to come and be served in parallel.
moneromoooo: Currently, they're serialized.
moneromoooo: So the lock is (still) needed, and the conversion to rw is an optimization.

Thanks to @moneromooo-monero for his detailed explanation.

Update: the link was fixed; it should point to m_timestamps_and_difficulties_height.

  1. uint64_t m_timestamps_and_difficulties_height;

@0xFFFC0000 (Collaborator, Author) commented Mar 1, 2024

As an extra optimization: there are a few methods where we can remove locking, which I still have to do. For example, bool Blockchain::have_tx_keyimg_as_spent(const crypto::key_image &key_im), where the caller itself should take care of locking if it wants consistency. I am going to update those 5-6 functions. But overall, the extra locking in those functions has no impact on correctness, so it is not directly related to our discussion.

Update: Done. Details at [1].

  1. cryptonote_core: introducing rwlock mechanism #9181 (comment)

@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch from 70ce45a to 21b68cd on March 2, 2024 08:09
@0xFFFC0000 (Collaborator, Author) commented Mar 2, 2024

I returned those 8 functions to unlocked status, as @moneromooo-monero suggested in our private discussion.

Explanation: these functions make no consistency guarantees; if the caller wants consistency between calls, it has to do its own locking.

I left the original warning message:

  // WARNING: this function does not take m_blockchain_lock, and thus should only call read only
  // m_db functions which do not depend on one another (ie, no getheight + gethash(height-1), as
  // well as not accessing class members, even read only (ie, m_invalid_blocks). The caller must
  // lock if it is otherwise needed.
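
For illustration, a hedged sketch of such caller-side locking (the call pair is hypothetical and mirrors the getheight + gethash(height-1) pattern the warning describes):

// The two calls below are individually unlocked; without the RLOCK, a block
// could be added between them, making `tail` disagree with `height`.
{
    RLOCK(m_blockchain);
    const uint64_t height = m_blockchain.get_current_blockchain_height();
    const crypto::hash tail = m_blockchain.get_block_id_by_height(height - 1);
    // ... use height and tail together under the read lock ...
}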

List

  1. bool Blockchain::have_tx(const crypto::hash &id) const [1]
  2. bool Blockchain::have_tx_keyimg_as_spent(const crypto::key_image &key_im) const [2]
  3. uint64_t Blockchain::get_current_blockchain_height() const [3]
  4. crypto::hash Blockchain::get_tail_id() const [4]
  5. crypto::hash Blockchain::get_block_id_by_height(uint64_t height) const [5]
  6. difficulty_type Blockchain::block_difficulty(uint64_t i) const [6]
  7. bool Blockchain::have_block_unlocked(const crypto::hash& id, int *where) const [7]
  8. size_t Blockchain::get_total_transactions() const [8]

Ref:

  1. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R114
  2. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R124
  3. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R270
  4. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R707
  5. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R771
  6. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R2466
  7. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R2815
  8. https://github.com/monero-project/monero/compare/70ce45ae01f9ae68b98a21869782e3e200c2e790..21b68cd92740eccc41848b918d51df535c3b5925#diff-3dcea545b20bc1145f2766e64ca91d4e547d6072462970fd67830e1be6cdf793R2859

@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch 2 times, most recently from 3f86cfb to 494d6be on March 2, 2024 11:06
@rbrunner7 (Contributor)

After carefully reading through your discussion with @vtnerd, I unfortunately still could not form something like a "high level" understanding of the purpose and aim of this PR. I feel that such an understanding is necessary to judge the "value" of this PR, and for going into any review with correct expectations.

I suspect the following:

With the current code, there are no critical problems with locking. Things are always locked when they should be locked. There are no known races and/or deadlocks; basically, nothing known that you could call a "bug". But because all locks are currently exclusive, even for operations that could run in parallel without any problems, the performance of DB access is lower than it could be, as all access gets strictly serialized. By introducing a distinction between non-exclusive read locks and exclusive read/write locks, performance gains can be achieved. For introducing this distinction, no new layers or other drastic complications of the code are needed.

Is my understanding correct?

@0xFFFC0000 (Collaborator, Author) commented Mar 3, 2024

With the current code, there are no critical problems with locking. Things are always locked when they should be locked. There are no known races and/or deadlocks; basically, nothing known that you could call a "bug". But because all locks are currently exclusive, even for operations that could run in parallel without any problems, the performance of DB access is lower than it could be, as all access gets strictly serialized. By introducing a distinction between non-exclusive read locks and exclusive read/write locks, performance gains can be achieved. For introducing this distinction, no new layers or other drastic complications of the code are needed.

Is my understanding correct?

Important: please keep in mind that in the following comment, by serialization I am talking in the context of synchronization [1].

@rbrunner7 Yes, you are correct.

  1. The purpose of a ReadWrite lock is to allow fine-grained control over locking [2]. Multiple concurrent readers do not pose any serialization issue, but a writer running alongside readers, or multiple writers running together, does pose a serious threat to serialization.
  2. The PR replaces the old locking mechanism we had. The old mechanism is an ordinary recursive_mutex, with no support for read-write operations. This means we lock the entire Blockchain even when it is not required.
  3. Yes, you are correct that there is no specific bug, which is why I didn't label/mention "bug" for this PR. But the design is extremely inefficient.
  4. And no new layers or other drastic complications of the code are added.
  5. Keep in mind this is an improvement to the locking mechanism. In the long term, we want a much more fine-grained (read-write) locking mechanism. Even inside Blockchain we sometimes lock an entire method when that is not required. There are suggestions to introduce a single locking mechanism to take care of all of this in a single layer, with arguments for doing it in BlockchainLMDB or Blockchain or elsewhere. But we are nowhere close to that point right now, so that is a long-term goal.

Let me go through a complete example; keep in mind this is for illustration purposes and might omit some details. Imagine 2 threads are listening for RPC:

  1. Thread 1 receives an on_mining_status request. It calls get_difficulty_for_next_block. Now look at the code for get_difficulty_for_next_block: it writes to m_timestamps_and_difficulties_height, so you need a lock to protect that value. Right now we are using this, which is basically boost::recursive_mutex, a simple recursive lock. Now take a look at src/cryptonote_core/blockchain.cpp:884 in the PR (a sketch follows after the references).

Ref:

  1. https://en.wikipedia.org/wiki/Synchronization_(computer_science)
  2. https://en.wikipedia.org/wiki/Readers%E2%80%93writer_lock
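
A hedged, simplified sketch of that hazard (not the real function body, which does more; the member name m_blockchain_lock follows the old lock, and the PR's macro argument may differ):

// Simplified: an RPC path that looks read-only actually refreshes cached
// members, so it needs the exclusive lock.
difficulty_type Blockchain::get_difficulty_for_next_block()
{
    RWLOCK(m_blockchain_lock); // exclusive: the cache update below is a write
    const uint64_t height = m_db->height();
    if (m_timestamps_and_difficulties_height != height)
    {
        // ... refresh the cached timestamp/difficulty window for this height ...
        m_timestamps_and_difficulties_height = height; // two unsynchronized RPC
                                                       // threads would race here
    }
    return next_difficulty(m_timestamps, m_difficulties, get_difficulty_target());
}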

@vtnerd (Contributor) commented Mar 3, 2024

@0xFFFC0000 is correct in that a reader lock is necessary to access the db, since another function could modify the pointer, etc. Although that doesn't seem to happen often, as this would've needed changing a long time ago.

@0xFFFC0000 (Collaborator, Author) commented Mar 3, 2024

@0xFFFC0000 is correct in that a reader lock is necessary to access the db, since another function could modify the pointer, etc. Although that doesn't seem to happen often, as this would've needed changing a long time ago.

It is not about the pointer. Take a look at the example I provided to rbrunner, or look at the code.

write_mode(false),
read_mode(0) {};

bool start_read() noexcept {
@vtnerd (Contributor) Mar 1, 2024

Why not use boost::shared_mutex with boost::shared_lock and boost::unique_lock directly? Why the extra logic? It already handles all the waiting logic, etc. Is it so the thread can re-enter the write lock? This doesn't appear possible in the two current use cases.

And if you want the extra logic, why not use the same naming scheme so that boost::unique_lock and boost::shared_lock can be used instead of the custom RAII object/macro (that allocates memory, whereas the boost lock templates do not).

@0xFFFC0000 (Collaborator, Author) Mar 3, 2024

That is the core of what makes this complex: Blockchain APIs in blockchain.cpp can call each other recursively.

Now, you could redesign all of them and remove recursive locking, but that would be much more work than this and needs a lot of logical/semantic changes.

For example, A->B->C.

A acquires write-lock.
B acquires read-lock.
C acquires read-lock.

Contributor

Then follow the existing naming scheme for boost::scoped_lock and boost::shared_lock.

@0xFFFC0000 (Collaborator, Author) Mar 3, 2024

What naming scheme?

And keep in mind that these four methods should not be used directly:

  1. start_read
  2. end_read
  3. start_write
  4. end_write

The main interface to this lock is RLOCK and RWLOCK. The only reason we need those 4 methods is for cases where the user wants to do something custom with the lock. For example, in this specific case [1] we have to use these methods instead of RLOCK and RWLOCK, since we lock and unlock inside the scope [2], keeping the logic exactly like the original implementation [3].

  1. https://github.com/0xFFFC0000/monero/blob/494d6be27e9de576f291428d42fc401d4196327f/src/cryptonote_core/blockchain.cpp#L5045
  2. https://github.com/0xFFFC0000/monero/blob/494d6be27e9de576f291428d42fc401d4196327f/src/cryptonote_core/blockchain.cpp#L5065
  3. m_blockchain_lock.unlock();

Contributor

What naming scheme?

If you used the names that shared_mutex uses, then unique_lock and shared_lock could be used instead of the RLOCK and RWLOCK macros. This does not prevent direct calls into those functions.

So:

  • start_read -> lock_shared
  • end_read -> unlock_shared
  • start_write -> lock
  • end_write -> unlock

Then RLOCK becomes boost::shared_lock<reader_writer_lock> and RWLOCK becomes boost::unique_lock<reader_writer_lock>. I think this is preferable; no allocation is performed.

However, using those classes may require other functions (such as try_lock_shared)?
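
To make that concrete, a hedged sketch of the renamed surface and its use with the boost lock templates (member bodies elided; try_lock/try_lock_shared are the extra functions boost may need):

#include <boost/thread/locks.hpp>

class reader_writer_lock
{
public:
    void lock();            // was start_write
    void unlock();          // was end_write
    void lock_shared();     // was start_read
    void unlock_shared();   // was end_read
    bool try_lock();        // used by some boost::unique_lock constructors
    bool try_lock_shared(); // used by some boost::shared_lock constructors
};

void example(reader_writer_lock &lk)
{
    boost::unique_lock<reader_writer_lock> write(lk); // replaces RWLOCK, no allocation
    // ...
    boost::shared_lock<reader_writer_lock> read(lk);  // replaces RLOCK (nested read
                                                      // inside a write, per the PR)
}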

@0xFFFC0000 (Collaborator, Author) Mar 3, 2024

I see your point. A few points to consider:

  1. RLOCK/RWLOCK was @moneromooo-monero's suggestion and appeared to be a good solution.
  2. What you are saying makes sense to me too, but we would have to go one level deeper, and possibly introduce a lot of complexity exactly where we don't want any. For example, we would have to evaluate the exact requirements of boost::shared_lock and boost::unique_lock [1], including support for all the operations they provide. On top of that, that code would have to be debugged to make sure 100% of those features (99% of which we don't want) have no bugs or unintended side effects. That looks like a slippery slope to me.

https://github.com/boostorg/thread/blob/aec18d337f41d8e3081ee65f5cf3b5090179ab0e/include/boost/thread/lock_types.hpp#L503

@0xFFFC0000 (Collaborator, Author) Mar 3, 2024

Keep in mind that we eventually want to remove recursive locking. This PR is a mid-term solution until we clean up all the APIs inside the blockchain. Once all of those APIs are cleaned up, we can simply use std::shared_mutex and std::mutex.

Cleaning those APIs will introduce a lot of semantic/logical changes to the code, so it is better to go slowly, IMHO.

rw_mutex.unlock_shared();
read_mode--;
if (rw_mutex.try_lock()) {
CHECK_AND_ASSERT_MES2(!read_mode, "Reader is not zero but goes to read_mode = 0 by " << boost::this_thread::get_id());
Contributor

The function returns here without calling unlock() ?

Collaborator Author

I might be missing something, but I specifically used CHECK_AND_ASSERT_MES2 so that it does not return and only fails the assert and reports it [1].

And the unlock for rw_mutex is at line 220.

  1. https://github.com/monero-project/monero/blob/7b7958bbd9d76375c47dc418b4adabba0f0b1785/contrib/epee/include/misc_log_ex.h#L205C64-L205C73

Contributor

Yes, I looked up the definition for the macro, and it doesn't return (whereas I thought it does).

}

void lock_reader() noexcept {
CHECK_AND_ASSERT_MES2(read_mode < UINT32_MAX, "Maximum number of readers reached.");
Contributor

This breaks a postcondition of lock_reader ... ? And doesn't the check need to be made inside of the scoped_lock?

Collaborator Author

Again, I am using CHECK_AND_ASSERT_MES2, and I believe it does not throw. So even if the assert fails, it only logs and moves ahead [1].

One thing to keep in mind is that the check for the UINT32_MAX number of readers is auxiliary.

For example, the Linux kernel uses this function to calculate the maximum number of threads allowed [2] and its value is int [3]. I cannot think of any scenario in which we can surpass UINT32_MAX.

  1. https://github.com/monero-project/monero/blob/7b7958bbd9d76375c47dc418b4adabba0f0b1785/contrib/epee/include/misc_log_ex.h#L205C64-L205C73
  2. https://github.com/torvalds/linux/blob/04b8076df2534f08bb4190f90a24e0f7f8930aca/kernel/fork.c#L1014
  3. https://github.com/torvalds/linux/blob/04b8076df2534f08bb4190f90a24e0f7f8930aca/kernel/fork.c#L132

Contributor

Wait, CHECK_AND_ASSERT_MES2 doesn't return immediately? Whereas CHECK_AND_ASSERT_MES_NO_RET does? Confusing.

And I still think you have to perform the check after acquiring the mutex.

For example, the Linux kernel uses this function to calculate the maximum number of threads allowed [2] and its value is int [3]. I cannot think of any scenario in which we can surpass UINT32_MAX.

What's the purpose of the check then? Just to inform the user? The only other option is to block in a spin loop.

Collaborator Author

And I still think you have to perform the check after acquiring the mutex.

The reason is that the mutex is blocking, and for debugging I wanted to see the checks before it blocks. But this is one of the cases where I am open to either way. I will put it on my TODO list; the next push will move the check inside the lock.

Just to inform the user?

Yes, exactly. Since debugging locks is hard and tricky, I left it there so that if, in an extremely rare scenario in 15 years, a user hits this problem, at least we can see it in the log files. I am open to removing that check too, since it is not directly related to the core logic of the lock.

Contributor

Even if this check is auxiliary, technically read_mode either needs to be atomic, or the check needs to happen when internal_mutex is owned, if you want the loaded value of read_mode to be well-defined.
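
Either option would look roughly like this (a sketch against the names used in this review):

// option 1: an atomic counter makes the pre-lock load well-defined
std::atomic<uint32_t> read_mode{0};

// option 2: take internal_mutex before checking
{
    boost::mutex::scoped_lock lock(internal_mutex);
    CHECK_AND_ASSERT_MES2(read_mode < UINT32_MAX, "Maximum number of readers reached.");
}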

Collaborator Author

Thanks @vtnerd @jeffro256. Fixed in the new push.

@vtnerd (Contributor) commented Mar 3, 2024

It is not about the pointer. Take a look at the example I provided to rbrunner, or look at the code.

You added locks where none were present before; that's why I keep going back to that example. But yes, it also changed to reader locks where a full mutex was present before.

@vtnerd (Contributor) commented Mar 3, 2024

Or at least so I thought - I can't find any now.

@vtnerd (Contributor) commented Mar 3, 2024

Sorry for spamming - ::scan_outputkeys_for_indexes was one such example I was thinking of.

@selsta (Collaborator) commented Mar 3, 2024

I've written this before, but I'll also add a comment here. There was (or is) an issue where the daemon would fall behind if there's really heavy RPC traffic. I've had my node fall behind 5+ blocks; after some time it would catch up again. The assumption was that it's due to the current lock mechanism. moneromooo suggested a while ago that a read/write lock might help. That's why I suggested that @0xFFFC0000 implement this.

This daemon falling behind issue happens rarely and is difficult to reproduce, so I don't know if this PR improves things, but I just wanted to add context why this was implemented.

@0xFFFC0000 (Collaborator, Author)

It is not about the pointer. Take a look at the example I provided to rbrunner, or look at the code.

You added locks where none were present before; that's why I keep going back to that example. But yes, it also changed to reader locks where a full mutex was present before.

Those 8 methods are different; I have listed them here [1]. The design we had was to not use any locks there and put the burden of locking/consistency on the caller. That is understandable.

But IMHO we should use a reader lock there too. However, since I don't want to introduce any semantic change to the code, I have already removed them.

Overall that is an extremely tiny part of the PR. Maybe 0.5%, or less!

  1. cryptonote_core: introducing rwlock mechanism #9181 (comment)

@0xFFFC0000 (Collaborator, Author) commented Mar 3, 2024

I've written this before, but I'll also add a comment here. There was (or is) an issue where the daemon would fall behind if there's really heavy RPC traffic. I've had my node fall behind 5+ blocks; after some time it would catch up again. The assumption was that it's due to the current lock mechanism. moneromooo suggested a while ago that a read/write lock might help. That's why I suggested that @0xFFFC0000 implement this.

This daemon falling behind issue happens rarely and is difficult to reproduce, so I don't know if this PR improves things, but I just wanted to add context why this was implemented.

This PR might (or might not) directly improve that falling behind. But this is necessary work that we have to take care of eventually. The first step is to introduce a read-write lock; then we have to reduce the locking surface. In my explanation to @rbrunner7 I mentioned:

  1. For example, there are methods where we lock the entire method, which is unnecessary and decreases performance.
  2. In some of the methods we will be able to use RLOCK instead of RWLOCK.

@0xFFFC0000 (Collaborator, Author)

Sorry for spamming - ::scan_outputkeys_for_indexes was one such example I was thinking of.

No worries. We are evaluating a serious PR, and these discussions are normal.

Yes, scan_outputkeys_for_indexes has an extra RLOCK. That is leftover from my tests and should be removed. I don't think any other function has extra locking, since I have intentionally tried not to introduce any logical/semantic changes.

@0xFFFC0000 (Collaborator, Author) commented Mar 3, 2024

It is not about the pointer. Take a look at the example I provided to rbrunner, or look at the code.

You added locks where none were present before; that's why I keep going back to that example. But yes, it also changed to reader locks where a full mutex was present before.

Please let me know, because that would be a serious mistake in the code and completely against my design goal: one of my main goals was to not introduce any logical/semantic changes to the code while changing the underlying locking behaviour.

read_mode(0) {};

bool start_read() noexcept {
if (have_write()) {
Contributor

Is this the behavior that we want: that we can't call a read function inside of a write function, even inside the same thread?

Contributor

Never mind: this returns whether the lock was just now acquired, not whether the overall acquisition succeeded.

@0xFFFC0000 (Collaborator, Author) Mar 4, 2024

Yes. have_write is for cases where you already hold the write lock and recursively want to acquire the read lock too.

Comment on lines 239 to 305
do {
boost::mutex::scoped_lock lock(internal_mutex);
if (!read_mode && rw_mutex.try_lock()) {
writer_id = new_owner;
write_mode = true;
read_mode = 0;
return;
}
condition.wait(lock);
} while(true);
}
@jeffro256 (Contributor) Mar 4, 2024

Unfortunately, this locking/waiting pattern may cause more issues than the current code, because we are now much more likely to experience writer starvation. Before, if there were 1000 readers and one writer, the writer was on equal footing with the rest of the read threads. The current code is still not fair, but the writer thread has the same chance of acquiring exclusive access as all other threads, which are attempting to acquire exclusively as well. Now, however, the write lock will continually thrash and refuse to even attempt the lock if there is even one active reader, indefinitely. As long as new readers keep coming in, the writer is never even guaranteed the chance to ask for the lock. In practice, with enough readers, this may cause the chain to never move forward.

@0xFFFC0000 (Collaborator, Author) Mar 4, 2024

Starvation might happen in extremely rare cases. That is why I am insisting this is an intermediary step until most of the APIs inside Blockchain are fixed.

But your analysis is not 100% accurate. In the current locking mechanism, we lock the entire blockchain for every read. Let me explain a hypothetical scenario:

There are multiple read transactions: R1, R2, R3, R4, R5, R6, R7, and R8.
There is a single write transaction: W1.

In the current locking mechanism:
Core1: R1 -> R2 -> R3 -> R4 -> R5 -> R6 -> R7 -> R8 -> W1
Core2:
Core3:
Core4:
Core5:
Core6:
Core7:
Core8:

But in the reader-writer locking mechanism:
Core1: R1 W1
Core2: R2
Core3: R3
Core4: R4
Core5: R5
Core6: R6
Core7: R7
Core8: R8

Now, what you are saying is that a stream of incoming read transactions could starve the write. It is a theoretical possibility.

Basically, an attacker would have to flood you with read transactions to the point that all of your cores are at 100%.

@0xFFFC0000 (Collaborator, Author) Mar 4, 2024

McKenney, Paul E., Silas Boyd-Wickizer, and Jonathan Walpole. "RCU usage in the Linux kernel: One decade later." Technical report (2013).

It discusses a similar issue in section 5.3, Retry Readers.

One other good discussion is: openssl/openssl#20715 (comment)

@0xFFFC0000 (Collaborator, Author) Mar 4, 2024

It is fixed in the new push. The reader_writer lock now has an internal queue, which keeps track of the order of arrival.

Let me give you an example of how it works now:

There are multiple read transactions: R1, R2, R3, R4, R5, R6, R7, R8, R9, R10, R11.
There is a single write transaction: W1.

The order of arrival is: R1, R2, R3, R4, R5, R6, R7, R8, W1, R9, R10, R11.

In the current locking mechanism:
Core1: R1 -> R2 -> R3 -> R4 -> R5 -> R6 -> R7 -> R8 -> W1 -> R9-> R10-> R11
Core2:
Core3:
Core4:
Core5:
Core6:
Core7:
Core8:

But in the new reader-writer locking mechanism:
Core1: R1 W1 R9
Core2: R2 ____ R10
Core3: R3 ____ R11
Core4: R4
Core5: R5
Core6: R6
Core7: R7
Core8: R8
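
A hedged sketch of the fairness idea (not the PR's actual implementation; recursion handling is elided, and ordering among multiple waiting writers is approximate): arrivals take a monotonically increasing ticket, and a reader may proceed only if no writer holds an earlier ticket.

#include <condition_variable>
#include <cstdint>
#include <mutex>

class fair_rwlock
{
    std::mutex m;
    std::condition_variable cv;
    uint64_t next_ticket = 0;            // handed out in arrival order
    uint64_t writer_ticket = UINT64_MAX; // earliest waiting/active writer
    uint32_t active_readers = 0;
    bool writer_active = false;

public:
    void lock_shared()
    {
        std::unique_lock<std::mutex> l(m);
        const uint64_t my = next_ticket++;
        // readers that arrived before the earliest waiting writer may proceed;
        // readers that arrived after it (R9-R11 above) wait behind it
        cv.wait(l, [&] { return !writer_active && my < writer_ticket; });
        ++active_readers;
    }

    void unlock_shared()
    {
        std::lock_guard<std::mutex> l(m);
        if (--active_readers == 0)
            cv.notify_all();
    }

    void lock()
    {
        std::unique_lock<std::mutex> l(m);
        const uint64_t my = next_ticket++;
        cv.wait(l, [&] {
            if (my < writer_ticket)
                writer_ticket = my;      // register as the earliest waiting writer
            return !writer_active && active_readers == 0 && my == writer_ticket;
        });
        writer_active = true;
    }

    void unlock()
    {
        std::lock_guard<std::mutex> l(m);
        writer_active = false;
        writer_ticket = UINT64_MAX;      // a still-waiting writer re-registers above
        cv.notify_all();
    }
};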

@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch from 494d6be to 88673fb on March 4, 2024 20:29
@0xFFFC0000 (Collaborator, Author) commented Mar 4, 2024

Updated the implementation of reader_writer. Changelog:

  1. Rebased to master and picked up recent changes from master.
  2. Added a queue to keep the order of operations. Reads are parallel and writes are exclusive, but now the order between read(s) and write(s) is preserved, and starvation cannot happen.
  3. Fixed the mistake in ::scan_outputkeys_for_indexes.
  4. Fixed the issue mentioned by @vtnerd about accessing a variable before locking.
  5. Added a few other minor test cases.

@SChernykh (Contributor)

I took a quick look at the code and I don't see any issues (yet). I'll check more thoroughly later. What would be really nice is if the unit test were compiled with the -fsanitize=thread -Og -fno-omit-frame-pointer -g compiler flags. Thread sanitizer is a must for this kind of code.

@0xFFFC0000 (Collaborator, Author) commented Mar 5, 2024

I took a quick look at the code and I don't see any issues (yet). I'll check more thoroughly later. What would be really nice is if the unit test were compiled with the -fsanitize=thread -Og -fno-omit-frame-pointer -g compiler flags. Thread sanitizer is a must for this kind of code.

Thanks, great suggestion. A few things to consider:

  1. We compile the test inside the unit_tests executable, so one approach is to separate the lock test and compile it with its own flags.
  2. Another approach is to use the preprocessor to detect the compiler and only add those flags when the compiler is GCC. (I have to check whether these flags can be enabled on a per-function basis with something like __attribute__.)
  3. I have to rename the test, since it is basically a deadlock test rather than a consistency test.

@SChernykh (Contributor) commented Mar 5, 2024

I think it's better to separate it into another executable. But you can try to hack CMakeLists.txt with something like:

if (CMAKE_CXX_COMPILER_ID MATCHES GNU OR CMAKE_CXX_COMPILER_ID MATCHES Clang)
  set_source_files_properties(reader_writer_lock_tests.cpp PROPERTIES COMPILE_FLAGS "-fsanitize=thread -Og -fno-omit-frame-pointer -g")
endif()

But I'm not sure if such a mix will work properly.

P.S. After a bit more thinking, it's still better to make it a separate binary and run the test with a timeout in CI builds, because it can potentially deadlock.

@0xFFFC0000 (Collaborator, Author)

I think it's better to separate it into another executable. But you can try to hack CMakeLists.txt with something like:

if (CMAKE_CXX_COMPILER_ID MATCHES GNU OR CMAKE_CXX_COMPILER_ID MATCHES Clang)
  set_source_files_properties(reader_writer_lock_tests.cpp PROPERTIES COMPILE_FLAGS "-fsanitize=thread -Og -fno-omit-frame-pointer -g")
endif()

But I'm not sure if such a mix will work properly.

P.S. After a bit more thinking, it's still better to make it a separate binary and run the test with a timeout in CI builds, because it can potentially deadlock.

A separate binary is preferable. Sadly, GTest doesn't provide any kind of timeout, so we have to make it separate.

About the CMake hack: I am not sure, since all of those source files are going to be combined into an object file and optimized by the linker, so passing those flags on a per-source-file basis might not work. Worth trying, though. Will try it.

@SChernykh (Contributor)

GTest doesn't provide a timeout, but you can set the timeout in the .yml files for the CI in https://github.com/monero-project/monero/tree/master/.github/workflows

@0xFFFC0000 (Collaborator, Author)

GTest doesn't provide a timeout, but you can set the timeout in the .yml files for the CI in https://github.com/monero-project/monero/tree/master/.github/workflows

Yes, I agree with you. A separate binary is preferable here.

@jeffro256 (Contributor)

Looks like this implementation still suffers from reproducible writer starvation (though I haven't debugged the reason this time). Here's a test case I wrote for it: jeffro256@3ae55c6. I wrote an alternative implementation for the recursive read/write lock in this commit here: jeffro256@98cb62a. It only needs 1 boost::shared_mutex per instance and a map of integers per thread (thread-local, doesn't need to be synchronized). There isn't any wait queue or additional mutexes or condition variables, so it should be a little more lightweight and easier to review IMO. It also fully implements Lockable and SharedLockable C++14 concepts.
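
For readers following along, a hedged sketch of that idea (not jeffro256's actual commit; writer recursion and write-then-read reentry are elided):

#include <unordered_map>
#include <boost/thread/shared_mutex.hpp>

class recursive_shared_mutex_sketch
{
    boost::shared_mutex m;

    // per-thread recursion depth, keyed by lock instance; thread_local, so it
    // needs no synchronization
    static int &depth(const recursive_shared_mutex_sketch *self)
    {
        thread_local std::unordered_map<const recursive_shared_mutex_sketch *, int> d;
        return d[self];
    }

public:
    void lock_shared()
    {
        int &n = depth(this);
        if (n == 0)
            m.lock_shared(); // first entry on this thread takes the mutex
        ++n;                 // reentry just bumps the thread-local counter
    }

    void unlock_shared()
    {
        int &n = depth(this);
        if (--n == 0)
            m.unlock_shared(); // last exit on this thread releases the mutex
    }

    // lock()/unlock() would track an exclusive depth the same way, and a thread
    // already holding exclusive ownership would skip m.lock_shared() (elided).
};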

@0xFFFC0000 (Collaborator, Author)

Looks like this implementation still suffers from reproducible writer starvation (though I haven't debugged the reason this time). Here's a test case I wrote for it: jeffro256@3ae55c6. I wrote an alternative implementation for the recursive read/write lock in this commit here: jeffro256@98cb62a. It only needs 1 boost::shared_mutex per instance and a map of integers per thread (thread-local, doesn't need to be synchronized). There isn't any wait queue or additional mutexes or condition variables, so it should be a little more lightweight and easier to review IMO. It also fully implements Lockable and SharedLockable C++14 concepts.

Just to double-check: are you sure you are using the new version? The lock I see in your test is the older one.

@0xFFFC0000 (Collaborator, Author) commented Mar 5, 2024

Looks like this implementation still suffers from reproducible writer starvation (though I haven't debugged the reason this time). Here's a test case I wrote for it: jeffro256@3ae55c6.

I think your description is not accurate. I tried your test with the new implementation but was not able to reproduce writer starvation. (Just make sure you are updating your clone.)

Here is a modified version of your test [1], which includes serialized prints to debug the order of requests and acquisitions. It demonstrates that once the writer requests the lock, everything proceeds in order, as it should. You can increase the number of readers, and no reader is able to successfully lock after a writer requests the lock. Please let me know if you think I am making a mistake somewhere.

I wrote an alternative implementation for the recursive read/write lock in this commit here: jeffro256@98cb62a. It only needs 1 boost::shared_mutex per instance and a map of integers per thread (thread-local, doesn't need to be synchronized). There isn't any wait queue or additional mutexes or condition variables, so it should be a little more lightweight and easier to review IMO. It also fully implements Lockable and SharedLockable C++14 concepts.

I have to debug this lock. As we discussed in our private conversation, the problem is not writing a lock, which is a simple task; the problem is supporting every use case of the recursive locking/unlocking we have in the APIs while not changing much of the code inside Blockchain. I will create a separate draft PR with your lock, and if it passes the tests, we can benchmark it. To be fair, though, I have not optimized mine, so there are many optimization opportunities I will add. ;)

  1. https://paste.debian.net/1309586/

@0xFFFC0000 (Collaborator, Author)

This repo contains performance benchmarks for the two lock versions.

Mine is not optimized, and the queue inefficiency does show itself in the numbers, although the other lock is much simpler. Overall this is not an accurate benchmark, but it can give us a clue. For testing purposes, I will next use @jeffro256's lock in the PR to see if it passes all the Blockchain tests (blockchain.cpp).

[image: benchmark results]

reader_writer_lock is my version, which has passed the tests.
recursive_shared_mutex is @jeffro256's submitted version, which hasn't been tested yet.

@selsta (Collaborator) commented Mar 8, 2024

I tried this PR on a live node with heavy traffic and it caused one core to always be at 100%, which made the node almost unusable. I'll wait for @0xFFFC0000 to compare the two implementations and for more optimizations.

@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch from 88673fb to 6284c1f on March 9, 2024 15:07
@0xFFFC0000 (Collaborator, Author) commented Mar 9, 2024

Changelog:

a. As the numbers from [1] show, @jeffro256's implementation of the core lock was more performant than the original queue-based lock. In this push, the lock is completely replaced by the new one.
b. Removed the extra places the PR was locking, to make sure the semantic change to monerod is zero. This makes the PR much easier to review.
c. Added the thread sanitizer to the tests, as @SChernykh suggested.
d. Made the test a separate executable and added a timeout parameter.

  1. cryptonote_core: introducing rwlock mechanism #9181 (comment)

@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch from 6284c1f to 859fb32 on March 9, 2024 15:15
@0xFFFC0000 changed the title from "Add RecursiveReadWriteLock Locking" to "cryptonote_core: introducing rwlock mechanism" on Mar 9, 2024
@0xFFFC0000 (Collaborator, Author)

Changing the title to reflect the PR name rules we have [1].

  1. https://github.com/monero-project/monero/pull/9210/files

0xFFFC0000 and others added 2 commits March 10, 2024 03:36
This will fix existing locking issues and provide a foundation for implementing more fine-grained locking mechanisms in future work. reader_writer_lock includes a wait_queue to prevent writer starvation. In this design, the order between read(s) and write(s) is preserved.

Co-authored-by: Jeffro <[email protected]>
With this PR, the performance improvements of the ReadWriteLock should be obvious.
@0xFFFC0000 force-pushed the dev/0xfffc/rw-lock-blockchain branch from 859fb32 to dea2e4b on March 10, 2024 20:09
@hinto-janai (Contributor)

  • Will this PR extend to db_lmdb.cpp? (specifically, replacing the spinlock)
  • Are there measurements of total CPU time compared to release builds during something like initial sync?

I have an intuition that the spinning contributes to why monerod uses so much CPU time on high thread count systems compared to something like bitcoind - although I haven't timed any functions to prove this.

AFAIK, all active reader threads will be spinning while the writer writes, and the writer thread will be spinning until all readers exit. Even if for a brief moment, the CPU waste gets multiplied by the thread count.

@jeffro256 (Contributor)

Which spinlock?

@hinto-janai (Contributor)

There's a struct acting as an atomic spinlock for multiple reads or 1 write:

struct mdb_txn_safe

which gets constructed (TXN_PREFIX_RDONLY macro here) and spinned on in some BlockchainLMDB functions:

uint64_t BlockchainLMDB::height() const
{
LOG_PRINT_L3("BlockchainLMDB::" << __func__);
check_open();
TXN_PREFIX_RDONLY();

@0xFFFC0000 I'm realizing that changing this too would add a lot more work, didn't mean to add pressure. Hopefully it can be changed in a different PR.

@vtnerd (Contributor) commented Apr 14, 2024

The writer case should only trigger when the mmap is being resized - so the readers are typically only stalled while an atomic counter is being incremented. So the wait should be fairly short.

Other projects in this situation will try to acquire the lock ~1000 times then yield CPU time. This might help in situations where the locking thread is suspended during the atomic increment. I doubt this happens often, but it could help a bit.

I wouldn't recommend anything more than a yield in this specific situation, unless I missed something else. And the difference in throughput is unlikely to be much, unless the OS frequently pauses executions during the short lock period.
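
For reference, a hedged sketch of that spin-then-yield pattern (illustrative only, not the monero code):

#include <atomic>
#include <thread>

// Spin a bounded number of times (cheap when the holder leaves quickly, e.g.
// after an atomic increment), then yield so a suspended holder can be rescheduled.
void spin_then_yield_lock(std::atomic_flag &flag)
{
    for (;;)
    {
        for (int i = 0; i < 1000; ++i)
            if (!flag.test_and_set(std::memory_order_acquire))
                return;
        std::this_thread::yield();
    }
}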

@0xFFFC0000 (Collaborator, Author)

There's a struct acting as an atomic spinlock for multiple reads or 1 write:

struct mdb_txn_safe

which gets constructed (TXN_PREFIX_RDONLY macro here) and spinned on in some BlockchainLMDB functions:

uint64_t BlockchainLMDB::height() const
{
LOG_PRINT_L3("BlockchainLMDB::" << __func__);
check_open();
TXN_PREFIX_RDONLY();

@0xFFFC0000 I'm realizing that changing this too would add a lot more work, didn't mean to add pressure. Hopefully it can be changed in a different PR.

No pressure at all; thanks for bringing it up, actually. I am investigating it right now and will talk about it with @vtnerd; I believe he is the reference person for blockchain_db.

@@ -4662,7 +4646,7 @@ bool Blockchain::add_new_block(const block& bl, block_verification_context& bvc)
{

LOG_PRINT_L3("Blockchain::" << __func__);
crypto::hash id = get_block_hash(bl);
crypto::hash id = get_block_hash(bl);
Contributor

Please revert whitespace change

Collaborator Author

Thanks. Will be fixed in the new push.


return handle_block_to_main_chain(bl, id, bvc);
Contributor

Please revert whitespace change

Collaborator Author

Thanks. Will be fixed in the new push.

@@ -2202,7 +2200,7 @@ bool Blockchain::get_blocks(uint64_t start_offset, size_t count, std::vector<std
bool Blockchain::handle_get_objects(NOTIFY_REQUEST_GET_OBJECTS::request& arg, NOTIFY_RESPONSE_GET_OBJECTS::request& rsp)
Contributor

If we're going to use RLOCK here, I'd recommend marking this method as const.

Collaborator Author

That is a great suggestion. At first I was hesitant, but after a little reflection, it might introduce some tiny (but important) optimizations and simplify the code (which we need).

I will add the const qualifier to the methods that need it, and benchmark it that way.

Contributor

It's not just an optimization. It is to prevent further bugs in the code when someone tries to modify something inside a read-only lock.

Collaborator Author

@SChernykh exactly. I am adding these (const qualifiers) to the final version of this PR.
