Persistent collective communication request #2758
e1d20cd
to
8f03b88
Compare
Are we ok on the size of ompi_predefined_communicator_t? We overran it when we added non-blocking. |
@hjelmn Thanks. I've confirmed the size.
#define PREDEFINED_COMMUNICATOR_PAD (sizeof(void*) * 192)
struct ompi_predefined_communicator_t {
    struct ompi_communicator_t comm;
    char padding[PREDEFINED_COMMUNICATOR_PAD - sizeof(ompi_communicator_t)];
};
With the default configure options, I confirmed
They all fit in the
Especially, increasing
|
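The padding pattern quoted above can be guarded at compile time so that an overrun is caught by the build rather than by manual inspection. The following is a minimal sketch, assuming a C11 compiler; `demo_comm`, `demo_predefined_comm`, `DEMO_PAD`, and the helper functions are hypothetical stand-ins, not Open MPI's actual definitions or checks.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>

/* Hypothetical pad size, mirroring the shape of PREDEFINED_COMMUNICATOR_PAD. */
#define DEMO_PAD (sizeof(void *) * 192)

/* Hypothetical stand-in for the real communicator structure. */
struct demo_comm {
    int   id;
    void *fields[64];
};

/* Same layout idea as the quoted struct: payload plus padding up to DEMO_PAD. */
struct demo_predefined_comm {
    struct demo_comm comm;
    char padding[DEMO_PAD - sizeof(struct demo_comm)];
};

/* Fails the build if the payload grows to or beyond the reserved size. */
static_assert(sizeof(struct demo_comm) < DEMO_PAD,
              "demo_comm no longer fits in the reserved padding");

/* Fails the build if the padded struct does not have the fixed size. */
static_assert(sizeof(struct demo_predefined_comm) == DEMO_PAD,
              "padded struct must keep a fixed size");

/* Total size of the padded structure in bytes. */
size_t demo_predefined_size(void)
{
    return sizeof(struct demo_predefined_comm);
}

/* Bytes still available before the payload overruns the pad. */
size_t demo_headroom(void)
{
    return DEMO_PAD - sizeof(struct demo_comm);
}
```

With a guard like this, adding fields to the payload struct either keeps the fixed size or stops the compile, which is the property the question about `ompi_predefined_communicator_t` is asking for.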
The current implementation has the same problem as #2151. Once it is fixed, I'll update this PR in a similar way. |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/3aed646317a83647afe31b3248975017 |
The IBM CI (GNU Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/8fc08ecc32dad6655013038464bbdc10 |
The IBM CI (XL Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/94991564daf6b803703ca776d93cfe68 |
The IBM CI (PGI Compiler) build failed! Please review the log, linked below. Gist: https://gist.github.com/8b934227786cd0bd140a132c264c81d1 |
@kawashima-fj i noticed there is a @jsquyres any thoughts ? |
:bot:retest |
@ggouaillardet My Ruby script generates |
Signed-off-by: KAWASHIMA Takahiro <[email protected]>
The nbpreq COLL component implements the persistent collective communication request feature proposed at the MPI Forum. "nbpreq" is an abbreviation of "nonblocking persistent request". Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Now that the nbpreq COLL component has been added, all `*_init` functions of the COLL interface are available. So let's enable the availability check for those functions on communicator creation. Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Until the MPI Forum decides to add the persistent collective communication request feature to the MPI Standard, these functions are supported through MPI extensions with the `MPIX_` prefix. Only C bindings are supported currently. Signed-off-by: KAWASHIMA Takahiro <[email protected]>
8b764f5
to
bd8d093
Compare
I've updated the PR to reflect the latest MPI Standard draft (mpi-forum/mpi-issues#25), especially the addition of Missing parts are:
|
Portable Python or Perl would still be greatly preferred over Ruby. By introducing Ruby, you're adding another requirement on our build servers, which we try hard not to do. |
Well, it sounds like Ruby would only be needed when the header file is generated. But that raises more questions than it answers, like why we're committing generated code rather than generating it during "make dist" or (preferably) "make". Otherwise, we're just asking for that file to get modified along the way and then for those modifications to be undone by rerunning the script for some other change. |
@jsquyres does Open MPI currently require Python? We do not commit automatically generated files. That being said, these files could be inside the dist tarball, so Ruby would only be required when building from git, and not from a tarball. |
@ggouaillardet currently, we do not. But if you gave me the choice between python and ruby, the answer would be python. All of the build systems already have python installed (because we do use it for a number of admin things outside of actually building a tarball from git), but don't use ruby anywhere. |
'zactly what Brian said. FWIW:
I don't think any of the Python scripts are used during building, but (in general) Perl is kinda on the way out and Python is on the way in... |
yeah, Python scripts are for running tests and stuff - not for building (at least, so far) |
To be clear: this Ruby script is not required when running autogen/configure/make. It is used by a component maintainer when the structure of this component is changed or a new collective communication routine is added. Manually updating the generated code is also not very painful. The discussion points are:
|
FYI: The component which includes the Ruby script will become unnecessary and will be removed when another component like #4515 becomes ready. |
@ggouaillardet I am afraid I cannot fully understand your intent. I understand |
@kawashima-fj that is a good point. |
We currently have:
struct ompi_request_t {
    opal_free_list_item_t super; /**< Base type */
    ompi_request_type_t req_type; /**< Enum indicating the type of the request */
    ompi_status_public_t req_status; /**< Completion status */
    volatile void *req_complete; /**< Flag indicating whether request has completed */
    volatile ompi_request_state_t req_state; /**< Enum indicating the state of the request */
    bool req_persistent; /**< Flag indicating if this is a persistent request */
    int req_f_to_c_index; /**< Index in Fortran <-> C translation array */
    ompi_request_start_fn_t req_start; /**< Called by MPI_START and MPI_STARTALL */
    ompi_request_free_fn_t req_free; /**< Called by free */
    ompi_request_cancel_fn_t req_cancel; /**< Optional function to cancel the request */
    ompi_request_complete_fn_t req_complete_cb; /**< Called when the request is MPI completed */
    void *req_complete_cb_data;
    ompi_mpi_object_t req_mpi_object; /**< Pointer to MPI object that created this request */
}; |
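To make the roles of a persistence flag and a start callback concrete, here is a toy model of a start-time check. It is only a sketch: every `demo_*` name is hypothetical, the error convention (`-1`) is an assumption, and it does not reproduce how Open MPI's `MPI_START` actually validates requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request states (not Open MPI's enum). */
typedef enum { DEMO_INACTIVE, DEMO_ACTIVE } demo_state_t;

struct demo_request;
typedef int (*demo_start_fn_t)(struct demo_request *req);

/* Hypothetical request carrying only the fields relevant to the check. */
struct demo_request {
    demo_state_t    state;       /* inactive until started */
    bool            persistent;  /* plays the role of a persistence flag */
    demo_start_fn_t start;       /* plays the role of a start callback */
    int             start_count; /* number of successful starts */
};

/* Component-side start callback: marks the request active. */
static int demo_coll_start(struct demo_request *req)
{
    req->state = DEMO_ACTIVE;
    req->start_count++;
    return 0;
}

/* Start entry point: rejects non-persistent or already active requests. */
int demo_start(struct demo_request *req)
{
    if (req == NULL || !req->persistent || req->start == NULL) {
        return -1;
    }
    if (req->state != DEMO_INACTIVE) {
        return -1;
    }
    return req->start(req);
}

/* Models completion: a persistent request becomes inactive and reusable. */
void demo_complete(struct demo_request *req)
{
    assert(req != NULL);
    req->state = DEMO_INACTIVE;
}

/* Builds a request in the inactive state. */
struct demo_request demo_make_request(bool persistent)
{
    struct demo_request r = { DEMO_INACTIVE, persistent, demo_coll_start, 0 };
    return r;
}
```

A check of this kind is only meaningful if the flag is set exclusively on requests that really are persistent, which is the concern raised in this thread.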
@kawashima-fj got it, thanks ! |
@ggouaillardet I'm not sure about the validity of a check right now. Looking through the code base, we are setting the persistent flag for different types of requests, not only for those that are persistent. But this looks like a bug. |
#4618 obsoletes this PR. Closing. |
This PR adds the persistent collective communication request feature proposed in the MPI Forum.
The proposal in the MPI Forum and this work are both in progress. The purposes of this PR are:
Once the standardization in the MPI Forum progresses and my implementation completes, I'll update this PR and remove the WIP-DNM label.
How this feature is implemented is described in a comment in the `ompi/mca/coll/nbpreq/coll_nbpreq.h` file. This implementation does not focus on performance. Performance improvement can be achieved in other COLL components.
All `MPI_*_INIT` routines have the prefix `MPIX_` instead of `MPI_`, and you need to include `mpi-ext.h` in your program to use this feature because it is not standardized yet. Requests created by `MPIX_*_INIT` can be passed to the normal `MPI_START` and `MPI_STARTALL`.
Any comments are welcome!
Signed-off-by: KAWASHIMA Takahiro [email protected]