
Persistent collective communication request #2758

Closed
kawashima-fj wants to merge 4 commits

Conversation

kawashima-fj
Member

@kawashima-fj kawashima-fj commented Jan 19, 2017

This PR adds the persistent collective communication request feature proposed in the MPI Forum.

The proposal in the MPI Forum and this work are both in progress. The purposes of this PR are:

  • Declare that Fujitsu is working on this feature and avoid possible duplicated work by others
  • Receive comments about my implementation
  • Enable evaluation of this feature by other people

Once the standardization in the MPI Forum progresses and my implementation completes, I'll update this PR and remove the WIP-DNM label.

How this feature is implemented is described in a comment in ompi/mca/coll/nbpreq/coll_nbpreq.h. This implementation does not focus on performance; performance improvements can be achieved in another COLL component.

Because this feature is not standardized yet, all MPI_*_INIT routines have the prefix MPIX_ instead of MPI_, and you need to include mpi-ext.h in your program to use them. Requests created by MPIX_*_INIT can be passed to the normal MPI_START and MPI_STARTALL.
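
For illustration, here is a minimal usage sketch. It assumes the proposed signature mirrors MPI_Ibcast plus a request argument (a later revision of this PR adds an MPI_Info argument; see below), so treat the exact prototype as provisional:

#include <mpi.h>
#include <mpi-ext.h>  /* MPIX_* prototypes */

void bcast_rounds(int *buf, int count, int niter)
{
    MPI_Request req;
    /* Create the persistent collective request once. */
    MPIX_Bcast_init(buf, count, MPI_INT, 0 /* root */, MPI_COMM_WORLD, &req);
    for (int i = 0; i < niter; i++) {
        MPI_Start(&req);                    /* start one round of the broadcast */
        MPI_Wait(&req, MPI_STATUS_IGNORE);  /* complete it; req remains reusable */
    }
    MPI_Request_free(&req);                 /* release the persistent request */
}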

Any comments are welcome!

Signed-off-by: KAWASHIMA Takahiro [email protected]

@hjelmn
Member

hjelmn commented Jan 23, 2017

Are we ok on the size of ompi_predefined_communicator_t? We overran it when we added non-blocking.

@kawashima-fj
Member Author

@hjelmn Thanks. I've confirmed the size.

sizeof(struct ompi_predefined_communicator_t) is defined to be 1536 bytes on 64-bit architectures and 768 bytes on 32-bit architectures.

#define PREDEFINED_COMMUNICATOR_PAD (sizeof(void*) * 192)

struct ompi_predefined_communicator_t {
    struct ompi_communicator_t comm;
    char padding[PREDEFINED_COMMUNICATOR_PAD - sizeof(ompi_communicator_t)];
};

With the default configure options, I confirmed sizeof(ompi_communicator_t) for this PR's source:

Platform       Configure options   sizeof(ompi_communicator_t)
Linux/x86_64   --disable-debug     1312
Linux/x86_64   --enable-debug      1376
Linux/i386     --disable-debug      700
Linux/i386     --enable-debug       744

All of these fit in ompi_predefined_communicator_t, but the values may differ depending on the platform or configure options:

  • If --with-max-object-name is used and MPI_MAX_OBJECT_NAME (default: 64, maximum: 256) is increased, sizeof(ompi_communicator_t) will increase.
  • If --enable-peruse is used, sizeof(ompi_communicator_t) will increase by sizeof(void *).
  • If sizeof(pthread_mutex_t) is larger than it is on Linux/x86_64 (40) or Linux/i386 (24), sizeof(ompi_communicator_t) will increase.

In particular, increasing MPI_MAX_OBJECT_NAME will overrun the struct. Which approach do you prefer?

  • Increase PREDEFINED_COMMUNICATOR_PAD at some point (this breaks binary compatibility)
  • Change the member c_coll of ompi_communicator_t to a pointer (mca_coll_base_comm_coll_t c_coll -> mca_coll_base_comm_coll_t *c_coll)
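
Whichever option is chosen, a compile-time guard could catch a future overrun early. Here is a minimal sketch (not part of this PR, and assuming a C11-capable compiler for _Static_assert):

/* Hypothetical guard next to the struct definition: fail the build if
 * ompi_communicator_t ever outgrows the reserved padding. */
_Static_assert(sizeof(struct ompi_communicator_t) <= PREDEFINED_COMMUNICATOR_PAD,
               "ompi_communicator_t overruns ompi_predefined_communicator_t");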

@kawashima-fj
Member Author

The current implementation has the same problem as #2151. Once it is fixed, I'll update this PR in a similar way.

@ibm-ompi

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/3aed646317a83647afe31b3248975017

@ibm-ompi

ibm-ompi commented Jun 2, 2017

The IBM CI (GNU Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/8fc08ecc32dad6655013038464bbdc10

@ibm-ompi

ibm-ompi commented Jun 2, 2017

The IBM CI (XL Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/94991564daf6b803703ca776d93cfe68

@ibm-ompi

ibm-ompi commented Jun 2, 2017

The IBM CI (PGI Compiler) build failed! Please review the log, linked below.

Gist: https://gist.github.com/8b934227786cd0bd140a132c264c81d1

@ggouaillardet
Contributor

@kawashima-fj I noticed there is a Ruby script. Currently, Open MPI requires only Perl (no Python, PHP, nor Ruby). If we decide to move forward with this PR, would you consider rewriting the script in Perl?

@jsquyres any thoughts ?

@ggouaillardet
Contributor

:bot:retest

@kawashima-fj
Member Author

@ggouaillardet My Ruby script generates coll_nbpreq_request.h, and the generated coll_nbpreq_request.h is committed. So it is not needed when running autogen/configure/make; it is needed only by a maintainer of this component.

The nbpreq COLL component implements the persistent collective
communication request feature proposed at the MPI Forum.
"nbpreq" is an abbreviation of "nonblocking persistent request".

Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Now that the nbpreq COLL component has been added, all `*_init`
functions of the COLL interface are available, so enable the
check of those functions' availability on communicator creation.

Signed-off-by: KAWASHIMA Takahiro <[email protected]>
Until the MPI Forum decides to add the persistent collective
communication request feature to the MPI Standard, these functions
are supported through MPI extensions with the `MPIX_` prefix.

Only C bindings are supported currently.

Signed-off-by: KAWASHIMA Takahiro <[email protected]>
@kawashima-fj
Member Author

kawashima-fj commented Nov 24, 2017

I've updated the PR to reflect the latest MPI Standard draft (mpi-forum/mpi-issues#25), especially the addition of the info argument to the MPI_*_INIT routines.
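
For example, each init routine gains an MPI_Info parameter; an illustrative (non-authoritative) prototype for the broadcast case:

int MPIX_Bcast_init(void *buf, int count, MPI_Datatype datatype, int root,
                    MPI_Comm comm, MPI_Info info, MPI_Request *request);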

Missing parts are:

@jsquyres
Member

Portable Python or Perl would still be greatly preferred over Ruby.

By introducing Ruby, you're adding another requirement on our build servers, which we try hard not to do.

@bwbarrett
Member

Well, it sounds like Ruby would only be needed when the header file is generated. But that raises more questions than it answers, like why we're committing generated code rather than generating it during "make dist" or (preferably) "make". Otherwise, we're just asking for that file to get modified along the way and then for those modifications to be undone by rerunning the script for some other change.

@ggouaillardet
Contributor

@jsquyres does Open MPI currently require Python?

We do not commit automatically generated files.

That being said, these files could be included in the dist tarball, so Ruby would only be required when building from git, not from a tarball.
This is far from ideal, but at least it would not add a burden to end users.

@bwbarrett
Member

@ggouaillardet Currently, we do not. But if you gave me the choice between Python and Ruby, the answer would be Python. All of the build systems already have Python installed (because we use it for a number of admin tasks outside of actually building a tarball from git), but none of them use Ruby anywhere.

@jsquyres
Member

'zactly what Brian said. FWIW:

$ find . -name \*.pl | wc -l
      49
$ find . -name \*.py | wc -l
       8
$ find . -name \*.rb | wc -l
       0

I don't think any of the Python scripts are used during building, but (in general) Perl is kinda on the way out and Python is on the way in...

@rhc54
Contributor

rhc54 commented Nov 27, 2017

yeah, Python scripts are for running tests and stuff - not for building (at least, so far)

@kawashima-fj
Member Author

To be clear: this Ruby script is not required when running autogen/configure/make. It is used by a component maintainer when the structure of this component changes or a new collective communication routine is added. And manually updating the generated code is not so painful.

The discussion points are:

  1. Requiring Ruby when running autogen/configure/make. I strongly agree that Ruby should not be required here. As described above, this PR does not have this problem.
  2. Requiring Ruby for maintenance of this component. Manually updating the header file is not so painful once it has been generated, so I'll remove the Ruby script if the community prefers that.
  3. Putting a Ruby script in the Open MPI source tree. Same as 2.
  4. Putting generated code in the Open MPI source tree. I'll convert the Ruby script to Python or Perl and call it from autogen.pl if the community prefers that. But give me some time to work on it.

@kawashima-fj
Member Author

FYI: The component which includes the Ruby script will become unnecessary and will be removed when another component like #4515 becomes ready.

@kawashima-fj
Member Author

@ggouaillardet I am afraid I cannot fully understand your intent. I understand that MPI_Start handles only persistent requests and that passing a nonblocking collective request to MPI_Start is erroneous. But the same is true for point-to-point communication: OMPI_REQUEST_PML is used for both nonblocking and persistent requests. Why should we distinguish them only for collectives?

@ggouaillardet
Contributor

@kawashima-fj that is a good point.
@bosilca, should we check that the request is persistent and fail otherwise?

@kawashima-fj
Member Author

We already have a req_persistent flag in ompi_request_t. Checking this flag in MPI_Start and MPI_Startall is sufficient.

current ompi/request/request.h:

struct ompi_request_t {
    opal_free_list_item_t super;                /**< Base type */
    ompi_request_type_t req_type;               /**< Enum indicating the type of the request */
    ompi_status_public_t req_status;            /**< Completion status */
    volatile void *req_complete;                /**< Flag indicating whether the request has completed */
    volatile ompi_request_state_t req_state;    /**< Enum indicating the state of the request */
    bool req_persistent;                        /**< Flag indicating whether this is a persistent request */
    int req_f_to_c_index;                       /**< Index in Fortran <-> C translation array */
    ompi_request_start_fn_t req_start;          /**< Called by MPI_START and MPI_STARTALL */
    ompi_request_free_fn_t req_free;            /**< Called by free */
    ompi_request_cancel_fn_t req_cancel;        /**< Optional function to cancel the request */
    ompi_request_complete_fn_t req_complete_cb; /**< Called when the request is MPI completed */
    void *req_complete_cb_data;
    ompi_mpi_object_t req_mpi_object;           /**< Pointer to MPI object that created this request */
};
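
For illustration, such a check could look like the following sketch. It is not the actual patch (that is in the commit referenced in the next comment), error handling is simplified, and the req_start hook's exact signature is assumed:

/* Sketch: reject non-persistent requests in MPI_START.
 * In Open MPI, MPI_Request is ompi_request_t *. */
int MPI_Start(MPI_Request *request)
{
    ompi_request_t *req = *request;
    if (!req->req_persistent) {
        return MPI_ERR_REQUEST;      /* real code would invoke the error handler */
    }
    return req->req_start(1, &req);  /* dispatch to the type-specific start */
}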

@ggouaillardet
Contributor

@kawashima-fj Got it, thanks!
I made ggouaillardet/ompi@a125778 for that purpose.

@kawashima-fj
Member Author

@ggouaillardet Thanks. I'll integrate your commits. But what's the intent of #4618? #4618 is the work I wanted during the conversation in #4515: it integrates this PR and the concept of #4515. If #4618 is intended to be merged, I'll throw away this PR and collaborate on #4618.

@bosilca
Member

bosilca commented Dec 13, 2017

@ggouaillardet I'm not sure about the validity of such a check right now. Looking through the code base, we are setting the persistent flag for different types of requests, not only for those that are persistent. But this looks like a bug.

@kawashima-fj
Member Author

#4618 obsoletes this PR. Closing.

@kawashima-fj kawashima-fj deleted the pcollreq branch July 9, 2018 01:47