v1.10: coll/libnbc: fix race condition with multi threaded apps #2443
@ggouaillardet This code is pretty complicated, and honestly it deserves a comment on why this approach is correct. I haven't had time to dig deep into the code, but from a quick look I am not sure it is correct in a multi-threaded scenario. You added a mutex to protect modifications of the active request list, but because this list can be modified outside the progress function, your local next (the one protected by the newly added mutex) might become stale. However, if the progress function is the only function allowed to remove active requests, then my concern does not hold and the proposed code is correct (but deserves documentation).
We can keep the discussion here (moving my question from PR #2441): I'm curious why you didn't just reuse ... Thinking about this a bit more, and let me know if I'm correct here...
@bosilca the MPI layer
@jjhursey my understanding is that
@ggouaillardet This commit looks correct to me. It protects the
@bosilca The progress function is the only place where requests are removed from this structure so I think the locking is safe. I think that all this PR needs is a comment about what this lock is protecting. Maybe inside the
@jjhursey thanks! Will add a comment today, sorry for the delay.
protect the mca_coll_libnbc_component.active_requests list with
the new mca_coll_libnbc_component.lock mutex.
Thanks Jie Hu for the report
Signed-off-by: Gilles Gouaillardet <[email protected]>
(back-ported from commit open-mpi/ompi@2c94a3a)

no code change
Signed-off-by: Gilles Gouaillardet <[email protected]>
(cherry picked from commit open-mpi/ompi@1509816)
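
For readers following the thread, here is a minimal sketch of the locking pattern being discussed: a list of active requests guarded by one mutex, where any thread may append but only the progress function removes. It is not the libnbc code. The names `request_t`, `component_t`, `request_start` and `component_progress` are hypothetical, and plain pthreads stand in for whatever list and mutex primitives the component actually uses. The sketch also assumes the lock is held for the whole traversal, which is what keeps the saved `next` pointer from going stale.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one nonblocking-collective request. */
typedef struct request {
    struct request *next;
    int steps_left;   /* progress calls remaining until completion */
    bool complete;
} request_t;

/* Hypothetical stand-in for the component: the list plus its lock. */
typedef struct {
    pthread_mutex_t lock;   /* protects head, tail and every next link */
    request_t *head;
    request_t *tail;
} component_t;

void component_init(component_t *c) {
    pthread_mutex_init(&c->lock, NULL);
    c->head = NULL;
    c->tail = NULL;
}

/* May be called from any thread: appends under the lock, never removes. */
void request_start(component_t *c, request_t *r, int steps) {
    assert(steps > 0);
    r->next = NULL;
    r->steps_left = steps;
    r->complete = false;

    pthread_mutex_lock(&c->lock);
    if (c->tail != NULL) {
        c->tail->next = r;
    } else {
        c->head = r;
    }
    c->tail = r;
    pthread_mutex_unlock(&c->lock);
}

/* The only function that unlinks requests. Because the lock is held for
 * the entire walk, no other thread can change the list while `next` is
 * saved. Returns the number of requests completed by this call. */
int component_progress(component_t *c) {
    int completed = 0;

    pthread_mutex_lock(&c->lock);
    request_t *prev = NULL;
    request_t *r = c->head;
    while (r != NULL) {
        request_t *next = r->next;   /* saved before r may be unlinked */
        if (--r->steps_left == 0) {
            if (prev != NULL) {
                prev->next = next;
            } else {
                c->head = next;
            }
            if (c->tail == r) {
                c->tail = prev;
            }
            r->next = NULL;
            r->complete = true;
            completed++;
        } else {
            prev = r;
        }
        r = next;
    }
    pthread_mutex_unlock(&c->lock);
    return completed;
}
```

Holding a single lock across the whole traversal is the simplest choice, at the cost of blocking threads that want to start a new request while progress runs. If the lock were dropped between iterations, the sole-remover argument would have to carry the correctness on its own, which is why the thread asks for a comment documenting what the lock protects.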
53ce62f to ca12d2c
Lovely. This looks ready to go !