-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: adjust locking in signing_shares avoid needing one main RecursiveMutex and instead using smaller regular Mutexes #6410
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linter complains
…ursiveMutex and instead using smaller regular Mutexes
c8362b3
to
6da2f28
Compare
feature_llmq_rotation.py failed in tsan CI; gonna run that locally a bit. |
std::thread workThread; | ||
CThreadInterrupt workInterrupt; | ||
|
||
SigShareMap<CSigShare> sigShares GUARDED_BY(cs); | ||
std::unordered_map<uint256, CSignedSession, StaticSaltedHasher> signedSessions GUARDED_BY(cs); | ||
Mutex cs_sigShares; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it became a bit over-complicated IMO, 7 mutexes in just one class. Does it indeed improve performance by any benchmark? Maybe you can localize some specific data that affect performance and put only that one by its own mutex?
It looks like DEADLOCKS ARE WELCOME every time you change something a bit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bitcoin has tried to move away from using recursive mutexes as much as possible; and generally we've tried to follow. I think if you tried to do this with 1 non-recursive mutex, you'd have a very bad time. Instead we introduce mutex for the actual data structures they need to guard, and use annotations to give us decent assurance there are no issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in theory it sounds good, but how to proof validness of each piece of data, if there no common lock? Is there any dependency between members? Is it true that each piece of member can be changed without side effect to other calculations in neighbors function?
Maybe it will work, but that's a nightmare to proof that there is jo hidden dependencies between members and review that we can't get invalid state now.
Yes, bitcoin is moving away from RecursiveMutex, but it's not like "give own mutex to each variable in the header", at least most times
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe collect logs Enter: lock contention.... started / completed
at least from some node to narrow down scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I agree... Please see 6417 as a much more minimal alternative
397a157 refactor: introduce cs_pendingSigns (pasta) Pull request description: ## Issue being fixed or feature implemented Much more minimal version of #6410; data from testnet while MNs were doing lots of instantsend locking showed minimal contention over signing_shares cs, however, the contention that did exist, most was a result of AsyncSign conflicting with something else. This should resolve the vast majority of contention in signing_shares while minimizing the complication / risk of introducing dead locks compared to 6410 ## What was done? introduce cs_pendingSigns ## How Has This Been Tested? Built locally ## Breaking Changes None ## Checklist: _Go over all the following points, and put an `x` in all the boxes that apply._ - [x] I have performed a self-review of my own code - [ ] I have commented my code, particularly in hard-to-understand areas - [ ] I have added or updated relevant unit/integration/functional/e2e tests - [ ] I have made corresponding changes to the documentation - [x] I have assigned this pull request to a milestone _(for repository code-owners and collaborators only)_ ACKs for top commit: knst: utACK 397a157 UdjinM6: utACK 397a157 Tree-SHA512: aac741c274449cf9c22625ac95e964d275dd1454647eaa2fc064c90b44a215e5509c9f9b8aceccbd49002edc21cbea883900f202204aa2138c104f4989ae5e26
What was done?
Refactor the locking / mutex usage in signing_shares to focus on smaller scopes / avoid recursive mutex
How Has This Been Tested?
Ran functional tests lots of times with tsan / enable-debug
Breaking Changes
None
Checklist: