Deadlock during nng_close() - multi platform #1813
Comments
@gdamore |
Apologies for the thread hijack, but I wanted to bring up an issue that aligns with this discussion, even though it seems to be exclusive to Windows on our side. We've been experiencing similar deadlocks on our end when closing IPC sockets (reqrep and pair sockets), happening 1 to 2 times daily in our QA. We've been using nng 1.7.3, but I have now also tried out 1.8.0 to utilize the new logging functions.
I've been trying to reproduce the issue within our system and have managed to narrow it down to the later part of the
Could anyone offer insights into potential debugging steps or suggest additional logging that might help us narrow down the root cause? Any assistance would be greatly appreciated. Thank you! |
Environment Details: VS 2022
|
Does this occur when using synchronous APIs or only when using aio? If your callbacks are hanging that could cause this behavior. If you have callback functions on the socket can you share them? |
On my side using the synchronous API. I forgot to mention that I recently switched from nanomsg to nng and I'm using the compatibility layer |
Firstly, Apologies for my English. I'm very bullish on nng. |
I'm also seeing this running the tests in pair0_test.c |
Wait these are occurring with the included test suites not your own code??? Can I get details about the system you are using including OS, cpu, and compiler? Is this running in a cloud or virtual environment? |
Ok I see the details there. Will update as soon as I have a diagnosis. |
I have fixed the bug, source code
|
I can confirm that the above from leowang fixes the problem I see in the test suite. Of course I cannot tell yet if it fixes the issue within our system too. |
@gdamore can you please address the original issue I opened? Everything is detailed there, I don't want to lose context due to other examples from other users which should be in different issues. |
@mikisch81 can you share your code? I tried the code, but could not reproduce the issue. |
the issue reproduces fairly easily on both macOS and Windows based on the modified demo/reqrep example |
Please post your modified code |
|
the changes to make the issue more easily reproducible:
when the server closes the socket while there is a busy receive aio task - the system goes into deadlock |
the server never exits in normal flow - in my case it hangs after 20-30 iterations. |
consider saving the diff to a text file and applying it as a git diff - or just take the whole |
Sorry! Please forgive me for overlooking the for loop in the diff. VS 2022 result after running for 10 minutes; my code file is from https://gist.github.com/mikisch81/428c4ad87afcc1c8881b282cd5e80eb3 |
a potential fix is proposed in #1824 - the modified demo runs for quite a long time already on my end |
Describe the bug
Continued from the closed #1543
When calling nng_close() there is sometimes a deadlock which causes nng_close() to hang.
This also happens when using only sync APIs (no AIOs).
Expected behavior
nng_close() should finish successfully.
Actual Behavior
nng_close() hangs.
To Reproduce
I created a modified version of the reqrep example code here (I use it with IPC transport): https://gist.github.com/mikisch81/428c4ad87afcc1c8881b282cd5e80eb3
In the modified example, in the client code, right before calling nng_recv for the reply from the server, I start a thread which just calls nng_close. After a couple of successful runs the deadlock happens:
Environment Details
Additional context
Here is a snapshot of the threads in the client app during the deadlock reproduction:
I recall that the initial suspect was an application callback which is not done:
So in this example code there is no application callback at all and only blocking APIs are called.
The expected result (as I understand) is that nng_recv will always return ECLOSED.
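To make the expected semantics concrete, here is a minimal sketch of a blocking receive that is always released by a concurrent close. This is a toy model built on POSIX threads only, not nng's implementation and not the reproducer from the gist; `toy_sock`, `toy_recv`, `toy_close`, `run_once` and `TOY_ECLOSED` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a "socket closed" error code. */
enum { TOY_OK = 0, TOY_ECLOSED = 1 };

/* Toy socket: one message slot plus a closed flag, guarded by a mutex. */
typedef struct {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    bool            closed;
    bool            has_msg;
    int             msg;
} toy_sock;

static void toy_init(toy_sock *s)
{
    pthread_mutex_init(&s->mtx, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->closed  = false;
    s->has_msg = false;
    s->msg     = 0;
}

/* Blocks until a message is available or the socket is closed. */
static int toy_recv(toy_sock *s, int *out)
{
    int rv = TOY_OK;
    pthread_mutex_lock(&s->mtx);
    while (!s->has_msg && !s->closed) {
        pthread_cond_wait(&s->cv, &s->mtx);
    }
    if (s->has_msg) {
        *out       = s->msg;
        s->has_msg = false;
    } else {
        rv = TOY_ECLOSED; /* woken (or never blocked) because of close */
    }
    pthread_mutex_unlock(&s->mtx);
    return rv;
}

/* Marks the socket closed and wakes every blocked receiver. */
static void toy_close(toy_sock *s)
{
    pthread_mutex_lock(&s->mtx);
    s->closed = true;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mtx);
}

static void *closer(void *arg)
{
    toy_close((toy_sock *) arg);
    return NULL;
}

/* One receive/close race: no message is ever sent, so the receive
 * can only finish because of the close. Returns toy_recv's result. */
static int run_once(void)
{
    toy_sock  s;
    pthread_t t;
    int       msg = 0;
    int       rv;

    toy_init(&s);
    pthread_create(&t, NULL, closer, &s);
    rv = toy_recv(&s, &msg);
    pthread_join(t, NULL);
    assert(s.closed); /* the closer thread has finished by now */
    pthread_cond_destroy(&s.cv);
    pthread_mutex_destroy(&s.mtx);
    return rv;
}
```

Calling `run_once()` in a loop from a `main` function returns `TOY_ECLOSED` on every iteration, whether the close lands before or after the receiver starts waiting. That is the behavior I would expect from nng_recv and nng_close in the modified reqrep example, instead of the hang.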