The komodod executable sometimes hangs and its RPCs become unresponsive. The process does not crash: it stays in memory and the only option is to kill it.
Backtrace analysis with gdb suggests a deadlock:
threads 23 and 24 are each waiting on a different critical section (cs), and each appears to already hold the lock the other one is waiting for.
Thread 23 (processing incoming messages) is waiting on pqueue->ControlMutex in CCheckQueueControl::CCheckQueueControl() and should already hold cs_main, taken in ActivateBestChain().
Thread 24 (mining thread) is waiting on cs_main and should already hold pqueue->ControlMutex, taken in ConnectBlock():
Thread 24 (Thread 0x7fcc367fc700 (LWP 7828)):
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fcced9e70f4 in __GI___pthread_mutex_lock (
mutex=mutex@entry=0x561e336a5a20 <cs_main>) at ../nptl/pthread_mutex_lock.c:115
#2 0x0000561e324f58b9 in boost::posix::pthread_mutex_lock (m=0x561e336a5a20 <cs_main>)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/thread/pthread/pthread_helpers.hpp:79
#3 boost::recursive_mutex::lock (this=0x561e336a5a20 <cs_main>)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/thread/pthread/recursive_mutex.hpp:108
#4 AnnotatedMixin<boost::recursive_mutex>::lock (this=0x561e336a5a20 <cs_main>)
at sync.h:76
#5 boost::unique_lock<AnnotatedMixin<boost::recursive_mutex> >::lock (this=0x7fcc367f6740)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/thread/lock_types.hpp:346
#6 CMutexLock<AnnotatedMixin<boost::recursive_mutex> >::Enter (nLine=<optimized out>,
pszFile=<optimized out>, pszName=<optimized out>, this=0x7fcc367f6740) at sync.h:132
#7 CMutexLock<AnnotatedMixin<boost::recursive_mutex> >::CMutexLock (this=0x7fcc367f6740,
mutexIn=..., pszName=<optimized out>, pszFile=<optimized out>, nLine=<optimized out>,
fTry=<optimized out>) at sync.h:153
#8 0x0000561e3251c94e in GetSpendHeight (inputs=...) at main.cpp:2764
#9 0x0000561e32526afd in ContextualCheckInputs (tx=..., state=..., inputs=...,
fScriptChecks=<optimized out>, flags=flags@entry=513,
cacheStore=cacheStore@entry=false, txdata=..., consensusParams=...,
consensusBranchId=1991772603, pvChecks=0x7fcc367f6b70) at main.cpp:2884
#10 0x0000561e3254d440 in ConnectBlock (block=..., state=..., pindex=<optimized out>,
pindex@entry=0x7fcc367f71d0, view=..., fJustCheck=fJustCheck@entry=true,
fCheckPOW=fCheckPOW@entry=true) at main.cpp:3676
#11 0x0000561e32560be9 in TestBlockValidity (state=..., block=...,
pindexPrev=0x7fcc248a1e30, fCheckPOW=fCheckPOW@entry=true,
fCheckMerkleRoot=fCheckMerkleRoot@entry=false) at main.cpp:5853
#12 0x0000561e325b58cb in <lambda(std::vector<unsigned char, std::allocator<unsigned char> >)>::operator()(std::vector<unsigned char, std::allocator<unsigned char> >) const (
__closure=0x7fcc1875c480, soln=std::vector of length 1344, capacity 1344 = {...})
at miner.cpp:1564
#13 0x0000561e325b6829 in std::_Function_handler<bool(std::vector<unsigned char, std::allocator<unsigned char> >), BitcoinMiner(CWallet*)::<lambda(std::vector<unsigned char, std::allocator<unsigned char> >)> >::_M_invoke(const std::_Any_data &, std::vector<unsigned char, std::allocator<unsigned char> > &&) (__functor=..., __args#0=...)
at /usr/include/c++/7/bits/std_function.h:302
#14 0x0000561e325c2fee in std::function<bool (std::vector<unsigned char, std::allocator<unsigned char> >)>::operator()(std::vector<unsigned char, std::allocator<unsigned char> >) const (this=this@entry=0x7fcc367f8240, __args#0=std::vector of length 0, capacity 0)
at /usr/include/c++/7/bits/std_function.h:706
#15 0x0000561e325befd3 in BitcoinMiner (pwallet=<optimized out>) at miner.cpp:1628
#16 0x0000561e325bfd0c in boost::_bi::list1<boost::_bi::value<CWallet*> >::operator()<void (*)(CWallet*), boost::_bi::list0> (a=<synthetic pointer>..., f=<optimized out>,
this=<optimized out>)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/bind/bind.hpp:259
#17 boost::_bi::bind_t<void, void (*)(CWallet*), boost::_bi::list1<boost::_bi::value<CWallet*> > >::operator() (this=<optimized out>)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/bind/bind.hpp:1294
#18 boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(CWallet*), boost::_bi::list1<boost::_bi::value<CWallet*> > > >::run (this=<optimized out>)
at /home/ubuntu/komodo/depends/x86_64-unknown-linux-gnu/share/../include/boost/thread/detail/thread.hpp:120
#19 0x0000561e32baa43c in thread_proxy ()
#20 0x00007fcced9e46db in start_thread (arg=0x7fcc367fc700) at pthread_create.c:463
#21 0x00007fccecd2461f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The root of the issue appears to be a lack of synchronisation between these two threads when they enter ConnectBlock(), which lets them deadlock. A lock on the cs_main critical section seems to be missing before the TestBlockValidity() call in the ValidBlock lambda function in miner.cpp; taking that lock would provide the needed synchronisation.
More explanation on the issue:
The deadlock occurs because two threads, one calling ProcessMessages() and the other a miner thread calling TestBlockValidity(), try to acquire cs_main and pqueue->ControlMutex in opposite orders.
To prevent this, LOCK(cs_main) should be taken before the TestBlockValidity() call in miner.cpp, but currently it is not.
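The lock ordering described above can be illustrated with a minimal standalone sketch. This is not the komodod code: all names with the _demo suffix are hypothetical, and std::recursive_mutex / std::mutex stand in for the Boost-based cs_main and pqueue->ControlMutex. It assumes that both threads take the cs_main stand-in before entering the ConnectBlock-like function, which is the proposed fix, so the two locks are always acquired in the same order.

```cpp
#include <mutex>
#include <thread>

// Hypothetical stand-ins for cs_main and pqueue->ControlMutex.
std::recursive_mutex cs_main_demo;
std::mutex control_mutex_demo;
int blocks_checked_demo = 0;  // only modified while both locks are held

// Mimics ConnectBlock(): takes the control mutex first, then code deeper
// in the call chain (like GetSpendHeight) locks cs_main again.
void connect_block_demo() {
    std::lock_guard<std::mutex> control(control_mutex_demo);
    std::lock_guard<std::recursive_mutex> inner(cs_main_demo);
    ++blocks_checked_demo;
}

// Mimics the message-processing thread: cs_main is already held
// (as in ActivateBestChain) when ConnectBlock is entered.
void message_thread_demo(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::recursive_mutex> outer(cs_main_demo);
        connect_block_demo();
    }
}

// Mimics the miner thread WITH the proposed fix: cs_main is taken before
// the TestBlockValidity-like call. Without this outer lock the miner would
// acquire control_mutex_demo first and cs_main_demo second, the reverse of
// the message thread, and the two threads could deadlock.
void miner_thread_demo(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        std::lock_guard<std::recursive_mutex> outer(cs_main_demo);
        connect_block_demo();
    }
}

// Runs both threads concurrently and returns the total number of
// ConnectBlock-like calls that completed.
int run_demo(int iterations) {
    blocks_checked_demo = 0;
    std::thread messages(message_thread_demo, iterations);
    std::thread miner(miner_thread_demo, iterations);
    messages.join();
    miner.join();
    return blocks_checked_demo;
}
```

Calling run_demo(1000) from a main function should return once both threads finish, because the recursive mutex lets the inner re-lock of cs_main_demo succeed in the thread that already holds it, and neither thread ever waits on the control mutex while the other holds it and wants cs_main_demo.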
There is a related issue: #483