macOS - fix deadlock on reqrep socket close #1824
This looks like a good find. Furthermore, it looks like nni_aio_abort also suffers from the same flaw. I want to look at this in more detail later today before I move forward.
I think I've convinced myself that this is precisely the right fix, and we just need to add the same change to the implementation of nni_aio_abort.
I'm on it.
So there are other callers. Basically we also need the same logic in nni_aio_cancel, and I think nni_aio_fini. It looks like it was missed in all the paths where we tear down or abort an aio.
when an `aio` has no `a_cancel_fn` and the task is in the `task_prep` state, abort it on the `nni_aio_stop` call
This was an excellent find, and probably a tricky one to track down as well. Thank you for your contribution!
I'm merging this... the hang waiting for pipes to be empty feels like a missed cv_wake somewhere. I'll look for it later. I'm out of time for today.
Created a new ticket for this: #1827
Coming back to this ... I'm now thinking that this change is responsible for a use-after-free. Essentially, we cannot simply "abort" the task; if there is no cancellation function, then the task has to run to completion. I'm now quite interested to understand what the original hang was that this was supposed to fix.
Ah, I think I see one possible problem. If we call nni_aio_abort() between a prep and a schedule, that could lead to problems. Then schedule can return an error properly.
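To make the prep/schedule window concrete, here is a minimal single-threaded sketch in C. It is a toy model under stated assumptions, not NNG's implementation: `toy_aio`, `toy_prep`, `toy_abort`, `toy_schedule`, and `toy_finish` are hypothetical names invented for illustration, and real code would do all of this under a lock. The idea it shows is that an abort arriving before a cancel function is installed is only recorded, so the later schedule call can report it as an error and the operation still runs to completion instead of being torn down underneath its owner.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of an async operation; not NNG's real types. */
typedef void (*toy_cancel_fn)(void *arg, int err);

typedef struct {
    bool          prepped;    /* toy_prep() was called, completion pending */
    bool          scheduled;  /* toy_schedule() succeeded */
    bool          done;       /* toy_finish() was called */
    int           abort_err;  /* nonzero if an abort could not be delivered */
    int           result;     /* result reported at completion */
    toy_cancel_fn cancel_fn;  /* may be NULL: operation cannot be cancelled */
    void         *cancel_arg;
} toy_aio;

/* Begin an operation: reset all state and mark it as prepared. */
void toy_prep(toy_aio *a)
{
    *a = (toy_aio){ 0 };
    a->prepped = true;
}

/* Request an abort. If a cancel function is installed, delegate to it.
 * Otherwise only record the error; the operation is never torn down
 * here, so whoever owns it can still complete it safely. */
void toy_abort(toy_aio *a, int err)
{
    if (a->done || !a->prepped) {
        return; /* nothing in flight */
    }
    if (a->scheduled && a->cancel_fn != NULL) {
        a->cancel_fn(a->cancel_arg, err);
        return;
    }
    if (a->abort_err == 0) {
        a->abort_err = err; /* keep the first abort reason */
    }
}

/* Install the cancel function. If an abort arrived after prep, return
 * that error so the caller can complete the operation itself. */
int toy_schedule(toy_aio *a, toy_cancel_fn fn, void *arg)
{
    if (a->abort_err != 0) {
        return a->abort_err;
    }
    a->cancel_fn  = fn;
    a->cancel_arg = arg;
    a->scheduled  = true;
    return 0;
}

/* Complete the operation exactly once, with the given result. */
void toy_finish(toy_aio *a, int result)
{
    a->result    = result;
    a->done      = true;
    a->prepped   = false;
    a->scheduled = false;
}
```

In this sketch a caller would run `toy_prep`, then check the return value of `toy_schedule`; on a nonzero return it calls `toy_finish` with that error rather than starting any I/O. Whether the actual fix should take this shape is an open question in the thread above.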
when an `aio` has no `a_cancel_fn` and the task is in `task_prep` state abort it on `nni_aio_stop` call

fixes #1813 Deadlock during nng_close() - multi platform