
Completely removed ompi_request_lock and ompi_request_cond #2448

Merged (2 commits, Jan 4, 2017)

Conversation

thananon
Member

ompi_request_lock and ompi_request_cond were meant to be completely removed by the request refactoring. This PR removes all remaining uses of them. We no longer need to take a lock before calling ompi_request_complete().

The removal will prevent confusion and wrong assumptions in the future.
However, there are still some parts in iboffload and cm_request_free where I'm not familiar with the code base, so I changed the code the way I think it should be. Please review.
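
A minimal sketch of the calling-convention change, assuming the usual OPAL threading macros (the wrapper functions here are illustrative, not from the tree):

    #include "ompi/request/request.h"

    /* Old pattern (before this PR): completion happened while holding
     * the global ompi_request_lock. */
    static void complete_request_old(ompi_request_t *req)
    {
        OPAL_THREAD_LOCK(&ompi_request_lock);
        ompi_request_complete(req, true);
        OPAL_THREAD_UNLOCK(&ompi_request_lock);
    }

    /* New pattern: ompi_request_complete() synchronizes internally, so
     * callers invoke it directly with no global lock held. */
    static void complete_request_new(ompi_request_t *req)
    {
        ompi_request_complete(req, true);
    }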

@thananon thananon closed this Nov 22, 2016
@thananon thananon reopened this Nov 22, 2016
… need them anymore.

Signed-off-by: Thananon Patinyasakdikul <[email protected]>
@thananon thananon force-pushed the remove_request_lock branch from 1b6b322 to b25a8c3 on November 22, 2016 22:58
-coll_request->super.req_complete = true;
-opal_condition_broadcast(&ompi_request_cond);
-IBOFFLOAD_VERBOSE(10, ("After opal_condition_broadcast.\n"));
+OPAL_ATOMIC_SWAP_PTR(&coll_request->super.req_complete, REQUEST_COMPLETED);
Member

This has lost the cond bcast semantic. It probably needs an update_sync instead.
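
For reference, a hedged sketch of one way to restore a wakeup here without the removed globals: route completion through ompi_request_complete(), which performs the swap on req_complete and updates any sync object a waiter has attached to the request:

    /* Sketch only -- instead of swapping req_complete by hand (which
     * silently drops the old broadcast), let the generic completion
     * path do the swap and wake any attached waiter: */
    ompi_request_complete(&coll_request->super, true);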

Member

I think the original code already had an issue. The request is marked as completed and the waiting thread is informed. Since it is an MPI request, it will eventually be freed by the calling thread. However, a few lines below, the coll_request is passed to handle_collfrag_done, which also puts the request on an internal freelist without checking the status of the user-level request.
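
A simplified sketch of that hazard (names and control flow are illustrative, not the literal iboffload code):

    coll_request->super.req_complete = true;       /* user request marked done */
    opal_condition_broadcast(&ompi_request_cond);  /* waiter wakes up and may
                                                      free the MPI request */
    /* ... a few lines later in the same path ... */
    handle_collfrag_done(coll_frag);               /* also returns the request
                                                      to an internal freelist
                                                      without checking the
                                                      user-level request's
                                                      status: a double release */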

This component has no owner and is not maintained. I would propose we scrap it completely.

Contributor

I agree - let's suggest it at the next telecon and see if there are any objections.

@@ -36,8 +36,6 @@ opal_pointer_array_t ompi_request_f_to_c_table = {{0}};
size_t ompi_request_waiting = 0;
size_t ompi_request_completed = 0;
size_t ompi_request_failed = 0;
Member

You should also remove all these lingering globals that have no reason to exist anymore (ompi_request_waiting, ompi_request_completed, ompi_request_failed, and ompi_request_poll).
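
For the three counters visible in the hunk above, the suggested cleanup in ompi/request/request.c would look roughly like this (ompi_request_poll is declared elsewhere and not shown):

    -size_t ompi_request_waiting = 0;
    -size_t ompi_request_completed = 0;
    -size_t ompi_request_failed = 0;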

@rhc54
Contributor

rhc54 commented Dec 1, 2016

@thananon @bosilca Can we commit this and then (after a soak period) bring it to v2.x? I'd like to see if it solves the performance issues over there.

@thananon
Member Author

thananon commented Dec 1, 2016

@rhc54 I am testing this patch on my side right now. It is taking some time because we don't have MXM locally here.

@jsquyres
Member

@thananon How's it going on this PR?

@thananon
Member Author

There are some parts of the code that need discussion because they are obsolete, unused, and clearly wrong. We might as well remove those parts, but it's not our territory.

@bosilca told me to wait for the weekly telecon discussion.

I'm not sure about the status.

@jsquyres
Member

jsquyres commented Jan 3, 2017

@bosilca What do we need to discuss on this issue on the weekly telecon?

@bosilca
Member

bosilca commented Jan 3, 2017

We need to decide what we do with iboffload.

@rhc54
Contributor

rhc54 commented Jan 3, 2017

Unless someone raises their hand to take responsibility for it, I'd say we remove it - there is precedent for such action.

@jsquyres
Member

jsquyres commented Jan 3, 2017

IIRC, bcol (which contains iboffload) and sbgp are only used by coll/ml, which is ompi_ignored. ORNL has been unwilling to un-ignore it / maintain it properly.

Perhaps this is the nail in the coffin: should we remove

  • bcol
  • sbgp
  • coll/ml

?

@shamisp
Contributor

shamisp commented Jan 3, 2017

You probably want to check with the ORNL folks - @manjugv

@jsquyres
Member

jsquyres commented Jan 3, 2017

@manjugv hasn't been active with Open MPI in a long while, and I couldn't find a GitHub ID for Geoffroy Vallee -- so I emailed him off-list asking him to comment here. 😄

@manjugv
Contributor

manjugv commented Jan 3, 2017

@jsquyres @rhc54 Thanks for asking. iboffload can be removed. We worked on other parts (coll/ml, sbgp) internally at ORNL last year, improving the initialization performance and scalability with encouraging results. Unfortunately, I personally don't have the cycles to push or maintain the code, so I can't justify keeping it as part of the main tree, particularly if it is causing issues for others and we don't have volunteers to maintain it.

As a courtesy, I would also ask @hjelmn's opinion.

@thananon
Member Author

thananon commented Jan 4, 2017

If we decide to go with iboffload removal, this PR can be merged right away. We can have another PR to remove iboffload.

@hppritcha
Member

I am okay with removal of iboffload.

@rhc54
Contributor

rhc54 commented Jan 4, 2017

I'm going to go ahead and commit this, then, and add the PR to remove the other areas so that @matcabral can see if this resolves the problem.

@jsquyres
Member

jsquyres commented Jan 4, 2017

@thananon @bosilca @manjugv @gvallee iboffload (and friends) have been removed. This PR is now clear to rebase / merge.

@jsquyres
Member

jsquyres commented Jan 4, 2017

Hah -- never mind, it's already been merged. 😄

@rhc54 @matcabral Now that this has been merged, it would be interesting to see if this affects the performance measured in #2644.
