C++ Client Crashes on ClientConductor::onInterServiceTimeout #371
If we were to delay the call to MemoryMappedFile::cleanUp() X ms after the actual ClientConductor::onInterServiceTimeout then we could avoid this crash. This period (X ms) would represent the maximum execution time for a single call to Publication::offer, or the time elapsed between calling Publication::tryClaim and BufferClaim::commit.
For reference, when I talk about managing the Image's log buffers as lingering resources, I'm referring to the following code:
void ClientConductor::onUnavailableImage(
}
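A minimal sketch of the lingering-resource idea being discussed (the class and member names below are illustrative, not the actual Aeron code): instead of unmapping immediately, the conductor parks the mapping with a timestamp and only releases it after a grace period.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

class LogBuffers;  // wraps the MemoryMappedFile(s) backing the term buffers

// Hypothetical holder: keeps the mapping alive together with the time it was retired.
struct LingeringResource
{
    std::int64_t timeOfRetirementMs;
    std::shared_ptr<LogBuffers> logBuffers;
};

class ResourceLingerer
{
public:
    // Park the mapping instead of unmapping it right away.
    void lingerResource(std::int64_t nowMs, std::shared_ptr<LogBuffers> logBuffers)
    {
        m_lingeringResources.push_back({ nowMs, std::move(logBuffers) });
    }

    // Called periodically from the conductor duty cycle: release anything that
    // has lingered longer than the grace period (the "X ms" above).
    void checkLingeringResources(std::int64_t nowMs, std::int64_t lingerTimeoutMs)
    {
        for (auto it = m_lingeringResources.begin(); it != m_lingeringResources.end();)
        {
            if (nowMs - it->timeOfRetirementMs > lingerTimeoutMs)
            {
                it = m_lingeringResources.erase(it);  // munmap happens here
            }
            else
            {
                ++it;
            }
        }
    }

private:
    std::vector<LingeringResource> m_lingeringResources;
};
```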
cc @mjpt777 Lingering doesn't solve the underlying issue. The same thing exists in the Java version, I do believe. Lingering simply moves the time horizon. At its heart, this is a race between the munmap due to the inter-service timeout and the BufferClaim commit/abort operations.
The Java code does not call the unavailable handlers when a forced close happens. I've also just pushed a change that will linger the resources for 1ms on a normal close and 1s on an inter-service timeout.
I will reflect this in C++ in the next couple of days, if not sooner. I also want to give the C++ API the agent invoker type option soon.
I agree that lingering only reduces the probability of hitting this issue. If we were to store a smart pointer in the Publication instance returned by findPublication, then the application would control the lifetime of the log buffers without a possible race. Is there anything I am missing here?
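A minimal sketch of that ownership model (simplified, not the actual Aeron class definitions): the Publication returned by findPublication would keep a shared_ptr to its LogBuffers, so the application controls the lifetime of the mapping simply by holding the Publication.

```cpp
#include <memory>

class LogBuffers;  // wraps the MemoryMappedFile(s) backing the term buffers

class Publication
{
public:
    explicit Publication(std::shared_ptr<LogBuffers> logBuffers) :
        m_logBuffers(std::move(logBuffers))
    {
    }

    // offer()/tryClaim() operate on buffers reached through m_logBuffers, so
    // the conductor cannot unmap them while this object is alive; the mapping
    // is released only when the last reference drops in ~Publication.

private:
    std::shared_ptr<LogBuffers> m_logBuffers;
};
```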
@goglusid Hmmm. Very very good point. That might work. Will give it a think. Yeah, that might be a nice way to handle it. Might also be usable for Java as well. Keep it around until |
@tmontgomery I meant keep it around until Publication::~Publication |
Agreed. Was thinking about Java as well, which requires an explicit close of the Publication instead of it simply going out of scope.
…ivePublication to keep mapping around while in scope. For #371. Updated naming and layout for subscriber position in available image.
@goglusid go ahead and see about this now. The Publication (and ExclusivePublication) have a shared_ptr to the LogBuffers. So, this should be cleaner now. |
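A simplified usage sketch under that model (the channel, stream id, and surrounding setup are illustrative, and a running media driver is assumed): holding the shared_ptr<Publication> is now what keeps the mapping valid, even if the conductor hits the inter-service timeout in between.

```cpp
#include <Aeron.h>
#include <cstdint>
#include <memory>
#include <thread>

int main()
{
    aeron::Context context;
    std::shared_ptr<aeron::Aeron> client = aeron::Aeron::connect(context);

    const std::int64_t id = client->addPublication("aeron:ipc", 1001);

    std::shared_ptr<aeron::Publication> publication = client->findPublication(id);
    while (!publication)
    {
        std::this_thread::yield();
        publication = client->findPublication(id);
    }

    // offer()/tryClaim() here touch buffers whose lifetime is tied to
    // `publication`; the underlying MemoryMappedFile is released only when the
    // last shared_ptr goes away, e.g. at the end of this scope.
    return 0;
}
```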
@tmontgomery Your awesomeness knows no bounds! ;p Problem solved. Thanks :D |
Thanks! No worries! We'll be making some other changes in this area shortly as well. |
@tmontgomery Could you please elaborate a bit on the other changes in this area? |
Experimenting with reference counting the mappings for #365 so multiple mappings are not needed. Also want to add the agent invoker style thread control to C++. And also change the mapping flags. |
When the following stack of functions is executed, if the C++ client still has pointers to the log buffers then it crashes.
Here is how it can happen (a sketch of the racing calls follows the steps):
Thread#1: Call Publication::tryClaim
Thread#1: Use the BufferClaim...
Thread#2[ConductorThread]: Detects a timeout and executes the following stack.
Thread#1: Calls BufferClaim.commit();
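A minimal sketch of the racing call pattern on Thread#1 (the helper function and message length are illustrative): between tryClaim() and commit() the conductor thread can hit the inter-service timeout, run onInterServiceTimeout(), and unmap the log buffers, so commit() then writes into memory that is no longer mapped.

```cpp
#include <Aeron.h>
#include <memory>

void claimAndCommit(std::shared_ptr<aeron::Publication> publication)
{
    aeron::concurrent::logbuffer::BufferClaim bufferClaim;

    if (publication->tryClaim(64, bufferClaim) > 0)
    {
        // ... Thread#1 fills bufferClaim.buffer() here. If this takes longer
        // than the inter-service timeout (e.g. while stopped in a debugger),
        // Thread#2 (the conductor) cleans up the MemoryMappedFile behind it ...
        bufferClaim.commit();  // crash: the mapping has already been unmapped
    }
}
```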
Obviously, here I'm debugging, so I hit the 5-second timeout.
That being said, to be thread safe it seems there's a need to manage the MemoryMappedFiles as lingering resources, like the subscription's images.
aeron::util::MemoryMappedFile::cleanUp() Line 206
aeron::util::MemoryMappedFile::~MemoryMappedFile() Line 219
std::_Ref_count<aeron::util::MemoryMappedFile>::_Destroy() Line 578 + 0x23 bytes
std::_Ref_count_base::_Decref() Line 538
std::vector<std::shared_ptr<aeron::util::MemoryMappedFile>,std::allocator<std::shared_ptr<aeron::util::MemoryMappedFile> > >::_Destroy(std::shared_ptr<aeron::util::MemoryMappedFile> * _First=0x00549310, std::shared_ptr<aeron::util::MemoryMappedFile> * _Last=0x00549318) Line 1885 + 0x40 bytes
std::vector<std::shared_ptr<aeron::util::MemoryMappedFile>,std::allocator<std::shared_ptr<aeron::util::MemoryMappedFile> > >::_Tidy() Line 1952
aeron::LogBuffers::~LogBuffers() Line 84 + 0x56 bytes
aeron::LogBuffers::`scalar deleting destructor'() + 0xf bytes
std::_Ref_count_obj<aeron::LogBuffers>::_Destroy() Line 1327
std::_Ref_count_base::_Decref() Line 538
aeron::ClientConductor::PublicationStateDefn::~PublicationStateDefn() + 0x65 bytes
std::vector<aeron::ClientConductor::PublicationStateDefn,std::allocator<aeron::ClientConductor::PublicationStateDefn> >::clear() Line 1616 + 0x64 bytes
aeron::ClientConductor::onInterServiceTimeout(__int64 now=1499112928196) Line 548
aeron::ClientConductor::onHeartbeatCheckTimeouts() Line 303
aeron::concurrent::AgentRunner<aeron::ClientConductor,aeron::concurrent::SleepingIdleStrategy>::run() Line 64 + 0x2e bytes