Completely removed ompi_request_lock and ompi_request_cond #2448
… need them anymore. Signed-off-by: Thananon Patinyasakdikul <[email protected]>
Force-pushed from 1b6b322 to b25a8c3
```diff
coll_request->super.req_complete = true;
opal_condition_broadcast(&ompi_request_cond);
IBOFFLOAD_VERBOSE(10, ("After opal_condition_broadcast.\n"));
OPAL_ATOMIC_SWAP_PTR(&coll_request->super.reg_complete, REQUEST_COMPLETED);
```
This has lost the condition-broadcast semantics. It probably needs an update_sync instead.
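A minimal sketch of the kind of handshake that can replace a global condition broadcast, assuming the request's completion slot holds either a "pending" sentinel, a "completed" sentinel, or a pointer to the waiting thread's own sync object. The completer swaps in "completed" and, if a waiter had attached, signals only that waiter, which is the role an update_sync-style call would play. All names here (`request_t`, `waiter_sync_t`, `request_mark_complete`, ...) are hypothetical and are not Open MPI's API; the sync object is reduced to an atomic flag where a real one would block on a condition variable or drive progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-waiter sync object. A real one would block the thread;
 * here it is just a flag the completer sets. */
typedef struct {
    atomic_bool signaled;
} waiter_sync_t;

/* Sentinels stored in the completion slot. Any other value is a pointer
 * to the waiter_sync_t of the thread waiting on this request. */
#define REQ_PENDING   ((void *)0)
#define REQ_COMPLETED ((void *)1)

typedef struct {
    _Atomic(void *) complete;  /* REQ_PENDING, REQ_COMPLETED or a waiter_sync_t* */
} request_t;

/* Waiter side: attach our sync object only if the request is still pending.
 * Returns false if the request already completed (no need to wait). */
static bool request_attach_waiter(request_t *req, waiter_sync_t *sync)
{
    void *expected = REQ_PENDING;
    return atomic_compare_exchange_strong(&req->complete, &expected, (void *)sync);
}

/* Completer side: no global lock. Atomically publish completion and wake
 * exactly the waiter that attached, instead of broadcasting to everyone. */
static void request_mark_complete(request_t *req)
{
    void *old = atomic_exchange(&req->complete, REQ_COMPLETED);
    assert(old != REQ_COMPLETED && "request completed twice");
    if (old != REQ_PENDING && old != REQ_COMPLETED) {
        waiter_sync_t *sync = old;
        atomic_store(&sync->signaled, true);
    }
}

static bool request_is_complete(request_t *req)
{
    return atomic_load(&req->complete) == REQ_COMPLETED;
}
```

Because the exchange is a single atomic step, either the waiter attaches first and is signaled, or completion happens first and the attach fails, so no lock is needed to avoid a lost wakeup.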
I think the original code already had an issue. The request is marked as completed and the waiting thread is notified. As the request is an MPI request, it will eventually be freed by the calling thread. However, a few lines below, coll_request is passed to handle_collfrag_done, which also puts the request into an internal free list without checking the status of the user-level request.
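One common way to avoid the double ownership described above, where both the user and the component believe they may recycle the same request, is to reference-count it and let only the last releaser put it back on the free list. The sketch below is a hypothetical illustration of that technique; `coll_req_t`, `free_list_t` and `coll_req_release` are invented names, not iboffload's actual code, and the free-list push is deliberately single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request shared by two owners: the MPI user (who frees it
 * through MPI_Wait / MPI_Request_free) and the component (which wants to
 * recycle it into an internal free list). */
typedef struct coll_req {
    atomic_int refcount;        /* owners still holding the request */
    struct coll_req *next_free; /* link used only while on the free list */
} coll_req_t;

/* Minimal free list for illustration; not thread-safe. */
typedef struct {
    coll_req_t *head;
    size_t length;
} free_list_t;

static void coll_req_init(coll_req_t *req)
{
    atomic_init(&req->refcount, 2);  /* one reference for the user, one for the component */
    req->next_free = NULL;
}

/* Drop one owner's reference. Only the last releaser recycles the request,
 * so the component never hands out memory the user may still touch.
 * Returns true if the request was returned to the free list. */
static bool coll_req_release(coll_req_t *req, free_list_t *fl)
{
    int prev = atomic_fetch_sub(&req->refcount, 1);
    assert(prev > 0 && "request released more times than it was owned");
    if (prev == 1) {
        /* A real free list would need its own synchronization here. */
        req->next_free = fl->head;
        fl->head = req;
        fl->length++;
        return true;
    }
    return false;
}
```

The order of the two releases does not matter: whichever owner finishes last performs the recycling.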
This component has no owner and is not maintained. I propose we remove it completely.
I agree. Let's suggest it at the next telecon and see if there are any objections.
```diff
@@ -36,8 +36,6 @@ opal_pointer_array_t ompi_request_f_to_c_table = {{0}};
size_t ompi_request_waiting = 0;
size_t ompi_request_completed = 0;
size_t ompi_request_failed = 0;
```
You should also remove all these lingering globals that no longer have any reason to exist (ompi_request_waiting, ompi_request_completed, ompi_request_failed and ompi_request_poll).
…riable. Signed-off-by: Thananon Patinyasakdikul <[email protected]>
@rhc54 I am testing this patch on my side right now. It is taking some time because we don't have MXM here locally.
@thananon How's it going on this PR?
There are some parts of the code that need discussion because they are obsolete, unused, and clearly wrong. We might as well remove them, but that is not our territory. @bosilca told me to wait for the weekly telecon discussion. I'm not sure about the status.
@bosilca What do we need to discuss on this issue on the weekly telecon?
We need to decide what we do with iboffload.
Unless someone raises their hand to take responsibility for it, I'd say we remove it - there is precedence for such action.
IIRC, perhaps this is the nail in the coffin, and we should remove:
?
You probably want to check with the ORNL folks: @manjugv
@manjugv hasn't been active in a long while with Open MPI, and I couldn't find a github ID for Geoffroy Vallee -- so I emailed him off-list asking him to comment here. 😄
@jsquyres @rhc54 Thanks for asking. iboffload can be removed. We have worked on other parts (coll/ml, sbgp) internally at ORNL last year, on improving the initialization performance and scalability with encouraging results. Unfortunately, I personally don’t have the cycles to push it or maintain the code. So, I can’t justify keeping this code as a part of main tree, particularly, if it is causing issues for others and if we don’t have volunteers to maintain it. As a courtesy, I would also ask @hjelmn 's opinion.
If we decide to go with iboffload removal, this PR can be merged right away. We can have another PR to remove iboffload.
I am okay with removal of iboffload.
I'm going to go ahead and commit this, then, and add the PR to remove the other areas so that @matcabral can see if this resolves the problem.
Hah -- never mind, it's already been merged. 😄 @rhc54 @matcabral Now that this has been merged, it would be interesting to see if this affects the performance measured in #2644.
ompi_request_lock and ompi_request_cond were meant to be completely removed as part of the request refactoring. This PR removes all remaining uses of them. We no longer need to take a lock before calling ompi_request_complete().
Removing them prevents confusion and wrong assumptions in the future.
However, there are still some parts in iboffload and cm_request_free whose code base I'm not familiar with, so I changed the code the way I think it should be. Please review.