Address MPI 4 errata for MPI_Psend_init and MPI_Precv_init #11967

Closed
cniethammer opened this issue Oct 4, 2023 · 33 comments

@cniethammer (Contributor)

Background information

There is a backward-incompatible erratum announced for MPI 4.0 affecting the interfaces/ABI of MPI_Psend_init and MPI_Precv_init (see mpi-forum/mpi-issues#765).

As the 5.0 release will provide the partitioned point-to-point interface in Open MPI for the first time, I would hope that it directly matches what will end up in MPI 4.1 / MPI 4.0 + errata.

@awlauria (Contributor) commented Oct 4, 2023

So this has been voted on/approved/merged?

Is this still accurate from what was approved/merged?

Rename the current MPI_P(send|recv)_init to MPI_P(send|recv)_init_c and add another binding for MPI_P(send|recv)_init that uses an int count / INTEGER count
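
For reference, the binding being renamed here is the one published in MPI-4.0, where count is already an MPI_Count. A sketch of that prototype, paraphrased from the MPI-4.0 document (not copied from an Open MPI header):

    /* MPI-4.0 as published: the only C binding, with an MPI_Count count */
    int MPI_Psend_init(const void *buf, int partitions, MPI_Count count,
                       MPI_Datatype datatype, int dest, int tag,
                       MPI_Comm comm, MPI_Info info, MPI_Request *request);
    /* MPI_Precv_init is analogous, with void *buf and int source */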

@jsquyres (Member) commented Oct 4, 2023

How do we want to handle this -- should we have both the old and the new APIs?

@awlauria (Contributor) commented Oct 4, 2023

In my opinion, since v5 hasn't been released yet, we should just go with the new way.

I don't think there's an obligation to maintain compatibility with prior release candidates?

@devreal (Contributor) commented Oct 4, 2023

FWIW, there was an agreement on the proposed errata text and the necessary changes but the Forum has yet to vote on it. The next voting meeting will be October 31-November 3: https://www.mpi-forum.org/meetings/2023/10/agenda (Issue 765)

@awlauria (Contributor) commented Oct 4, 2023

Oh fun. This complicates things.

Is this expected to pass?

@jsquyres (Member) commented Oct 4, 2023

@awlauria The old way is published in MPI-4.0. So it is the law of the land, and has been for a little while now. If the errata gets passed in the next meeting, it is supposed to take effect immediately (i.e., not have to wait for a formal MPI-4.1 release).

I guess the question is: are there any real apps out there that are using the old APIs?

E.g., what's MPICH doing about this?

Also: I didn't read what the Forum is doing: are they killing the old APIs entirely, or deprecating them?

@awlauria (Contributor) commented Oct 4, 2023

Right, but 5.0.0 hasn't been released yet, and it is the first Open MPI release with partitioned communication.

Even if apps are out there using it, do we have an obligation not to break ABI with prior release candidates?

I think the correct answer is: if we want to incur more technical debt, we can have both and release faster, or we can wait for the result of the vote and not incur the debt. I don't care -- @janjust, thoughts?

Actually -- what if the vote does not pass? It seems unlikely, but if the vote fails, that leaves us in a bad spot if the API was already added.

@qkoziol (Contributor) commented Oct 4, 2023

Based on the progress made at the last few MPI Forum meetings, the errata seems likely to pass.

I don't think we should hold to backward compatibility with previous release candidates.

So, I would suggest adopting the new version in 5.0.

@qkoziol (Contributor) commented Oct 4, 2023

@awlauria The old way is published in MPI-4.0. So it is the law of the land, and has been for a little while now. If the errata gets passed in the next meeting, it is supposed to take effect immediately (i.e., not have to wait for a formal MPI-4.1 release).

I guess the question is: are there any real apps out there that are using the old APIs?

E.g., what's MPICH doing about this?

Also: I didn't read what the Forum is doing: are they killing the old APIs entirely, or deprecating them?

I'll ask the MPICH team what they are doing.

@janjust (Contributor) commented Oct 4, 2023

@awlauria I agree with @qkoziol: I think we should adopt the new version. That seems like the simplest way to move forward. But I would like to know what MPICH is doing here as well.

@edgargabriel (Member)

In my opinion, if Open MPI 5.0.0 claims to be compliant with MPI 4.0, we have to use the interfaces in that version of the document. If MPI 4.1 or 5.0 changes the interfaces, subsequent versions of Open MPI can be adjusted.

@qkoziol (Contributor) commented Oct 4, 2023

In my opinion, if Open MPI 5.0.0 claims to be compliant with MPI 4.0, we have to use the interfaces in that version of the document. If MPI 4.1 or 5.0 changes the interfaces, subsequent versions of Open MPI can be adjusted.

The errata will be against MPI 4.0.

@jsquyres (Member) commented Oct 4, 2023

I agree with @qkoziol: no need to maintain ABI with prior 5.0.0rc's.

@edgargabriel Since it's an errata to MPI-4.0, it takes effect immediately (i.e., it doesn't have to wait until 4.1).

Question: is the MPI Forum completely deleting the old APIs, or are they deprecating the old APIs?

@awlauria (Contributor) commented Oct 4, 2023

I agree with all of this (no ABI expectation with prior RCs, adopt the new version, etc.)

@qkoziol (Contributor) commented Oct 4, 2023

@awlauria The old way is published in MPI-4.0. So it is the law of the land, and has been for a little while now. If the errata gets passed in the next meeting, it is supposed to take effect immediately (i.e., not have to wait for a formal MPI-4.1 release).
I guess the question is: are there any real apps out there that are using the old APIs?
E.g., what's MPICH doing about this?
Also: I didn't read what the Forum is doing: are they killing the old APIs entirely, or deprecating them?

I'll ask the MPICH team what they are doing.

Ken R. mentioned (on Slack) that MPICH is leaning toward adopting the new versions once they are voted in.

@edgargabriel (Member)

Do we have a handle on how significant the required code change would be? And has somebody volunteered to do the work :-)

@jsquyres (Member) commented Oct 5, 2023

Ken R. mentioned (on Slack) that MPICH is leaning toward adopting the new versions once they are voted in.

I talked with @raffenet further on Slack about this. We both agree that it would be best if both Open MPI and MPICH did the same thing. As @qkoziol mentioned, MPICH is leaning towards adopting the new APIs; @raffenet clarified for me that this also does, indeed, mean ditching the old APIs. I think that Open MPI should do the same -- i.e., just migrate to the new APIs (I haven't read the issue in detail, but I can't imagine it's much work) and ditch the old APIs.

Doing so in v5.0.0 means we won't have to carry legacy, invalid APIs forward in future releases.

@raffenet and I also agreed that if either of us changes our mind and decides to support both the old and the new APIs, we'll let each other know ASAP, because -- as stated above -- it would be best for users if Open MPI and MPICH did the same thing here.

@bwbarrett (Member)

I strongly object to this being a blocker for 5.0.0. There is no action we can take until the forum issues the errata, and we can't be beholden to their timelines.

@devreal (Contributor) commented Oct 5, 2023

I tend to agree with @bwbarrett. This may be radical, but: should we remove the API and reintroduce it when the errata is out, post OMPI 5.0.0? The current implementation (using isend/irecv) does not provide any benefit over application-side isend/irecv, and so does not deliver what the API promises (lower overheads in multi-threaded environments). By holding back the API until the errata has been voted on, we would (a) avoid an ABI break (which IMHO would be worse than delaying its introduction by one minor version), and (b) give us (whoever that 'us' is) time to implement something that actually adds value for users.
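
(For context on what the API is meant to buy users, here is a minimal, illustrative sketch of the intended usage pattern, assuming the MPI-4.0 C bindings. The helper name send_in_partitions and the OpenMP pragma are made up for illustration, and calling MPI_Pready from multiple threads assumes MPI_THREAD_MULTIPLE.)

    #include <mpi.h>

    /* Illustrative sketch: one persistent partitioned send in which each
     * partition is marked ready independently, e.g. by the thread that
     * produced it, rather than issuing one isend per chunk. */
    void send_in_partitions(const double *buf, int partitions,
                            MPI_Count count_per_partition, int dest,
                            MPI_Comm comm)
    {
        MPI_Request req;
        MPI_Psend_init(buf, partitions, count_per_partition, MPI_DOUBLE,
                       dest, /* tag */ 0, comm, MPI_INFO_NULL, &req);
        MPI_Start(&req);

        #pragma omp parallel for
        for (int p = 0; p < partitions; p++) {
            /* ... compute/fill partition p of buf here ... */
            MPI_Pready(p, req);   /* partition p may now be transferred */
        }

        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Request_free(&req);
    }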

@qkoziol (Contributor) commented Oct 5, 2023

I tend to agree with @bwbarrett. This may be radical, but: should we remove the API and reintroduce it when the errata is out, post OMPI 5.0.0? The current implementation (using isend/irecv) does not provide any benefit over application-side isend/irecv, and so does not deliver what the API promises (lower overheads in multi-threaded environments). By holding back the API until the errata has been voted on, we would (a) avoid an ABI break (which IMHO would be worse than delaying its introduction by one minor version), and (b) give us (whoever that 'us' is) time to implement something that actually adds value for users.

I was going to suggest removing the old APIs also. I believe this would be the least disruptive for users.

I also agree with @bwbarrett - we should not hold the 5.0 release for this.

@jsquyres (Member) commented Oct 5, 2023

For context, note that we're specifically talking about 2 APIs here:

  1. MPI_PSEND_INIT
  2. MPI_PRECV_INIT

I did a little digging this morning and spoke to @wesbland to clarify some details about this situation; let me throw a few data points in here:

  1. MPI-4.0 has the count argument to these 2 APIs as MPI_Count in all C and Fortran bindings.
  2. Open MPI main and v5.0.x have the following:
    • The C binding has the count argument as type MPI_Count
    • The mpi_f08 binding has the count argument as type INTEGER (according to MPI-4.0, this is wrong)
    • The other Fortran bindings have the count argument as type INTEGER (according to MPI-4.0, this is wrong)
  3. The Forum has an outstanding ticket for MPI-4.1 (MPI_Psend_init_c & MPI_Precv_init_c, mpi-forum/mpi-issues#765) that does the following (see the C sketch after this list):
    • Change the C bindings MPI_P(send|recv)_init to take an int argument for count
    • Add new C bindings MPI_P(send|recv)_init_c to take an MPI_Count argument for count
    • Add an F08 overloaded MPI_P(send|recv)_init interface in the mpi_f08 module to take an INTEGER argument for count (in addition to the MPI-4.0 INTEGER(KIND=MPI_COUNT_KIND) binding)
    • Change the other Fortran interfaces (mpif.h and use mpi) MPI_P(send|recv)_init to take an INTEGER argument for count
    • This ticket will be voted on at the Forum meeting at the end of this month. I'm not sure why mpi-forum/mpi-issues#765 is closed (possibly because its corresponding pull request was merged in preparation for the final MPI-4.1 meeting?), but @wesbland tells me that the issue is scheduled for voting at the Oct 30 Forum meeting.
    • Also in that meeting, the Forum will be having their final vote on MPI-4.1 itself. Hence, these bindings changes could get voted in and also be part of the final MPI-4.1 release.
  4. There is no trackable information at the moment about whether an MPI-4.0 errata will be created for this. If the vote passes and the change is put into MPI-4.1, it is likely that a corresponding 4.0 errata will be created for this issue. But no one can 100% promise this until/if it actually happens.
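
To make the ticket's C-side changes concrete, here is a rough sketch of the prototypes as they would look under mpi-forum/mpi-issues#765 (paraphrased from the proposal, not from a shipped header); the Fortran changes are analogous:

    /* The un-suffixed binding would take an int count ... */
    int MPI_Psend_init(const void *buf, int partitions, int count,
                       MPI_Datatype datatype, int dest, int tag,
                       MPI_Comm comm, MPI_Info info, MPI_Request *request);

    /* ... and a new _c ("large count") variant would keep MPI_Count */
    int MPI_Psend_init_c(const void *buf, int partitions, MPI_Count count,
                         MPI_Datatype datatype, int dest, int tag,
                         MPI_Comm comm, MPI_Info info, MPI_Request *request);

    /* MPI_Precv_init / MPI_Precv_init_c change the same way, with
     * void *buf and int source in place of int dest. */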

All this being said: OMPI has a bug that should be fixed before v5.0.0 (inconsistency between C and Fortran bindings). I think we have a choice as to how to fix it:

  1. Make all the bindings be like they are in MPI-4.0.
    • This seems like a bad idea, because we may (likely: will) have to change these bindings to agree with MPI-4.1.
  2. Make all the bindings as they are in mpi-forum/mpi-issues#765 / https://github.com/mpi-forum/mpi-standard/pull/888.
    • IMHO, this feels safe to do (the vote is almost certainly going to pass), but it is technically "wrong" in that the changed bindings are not actually officially blessed yet.
    • And there is always a small chance that the vote will fail and the Forum will not include the changed bindings in MPI-4.1.
  3. @devreal's proposal of removing the P(SEND|RECV)_INIT APIs from v5.0.0 (and potentially re-adding them in v5.0.x or later).
    • Just removing these 2 APIs does feel like the safest thing to do. With this proposal, OMPI v5.0.0 will definitely not have anything wrong.
    • We can someday do a 5.0.x release and re-introduce the APIs with whatever the result of the October 2023 MPI Forum meeting is.

Since there is a demonstrable bug in Open MPI v5.0.x with these bindings right now, I think we need to choose 1, 2, or 3 before v5.0.0.

@qkoziol (Contributor) commented Oct 5, 2023

Since there is a demonstrable bug in Open MPI v5.0.x with these bindings right now, I think we need to choose 1, 2, or 3 before v5.0.0.

Good info, thanks.

I still lean toward #3 (remove the APIs and re-add after the errata)

@cniethammer (Contributor, Author)

Many thanks, Jeff. For me, option 3 looks like the best way forward.

@lrbison (Contributor) commented Oct 5, 2023

I agree with removing the APIs until after the errata is issued as well.

Very complete summary Jeff. Thank you for taking the time to write it up.

@edgargabriel (Member)

I am also voting for option 3

@wesbland (Contributor) commented Oct 5, 2023

I'm not sure why mpi-forum/mpi-issues#765 is closed (possibly because its corresponding pull request was merged in preparation for the final MPI-4.1 meeting?)

That's correct. The last meeting before a new version of the standard is kinda weird this way.

Also in that meeting, the Forum will be having their final vote on MPI-4.1 itself. Hence, these bindings changes could get voted in and also be part of the final MPI-4.1 release.

I won't say for sure that it will happen, but it's likely that this will be true.

There is no trackable information at the moment about whether an MPI-4.0 errata will be created for this. If the vote passes and the change is put into MPI-4.1, it is likely that a corresponding 4.0 errata will be created for this issue. But no one can 100% promise this until/if it actually happens.

I believe @Wee-Free-Scot has been pushing for this hard, but I'm not sure if that's been agreed to or not.


Just to enumerate things from the Forum's side, the possibilities for mpi-forum/mpi-issues#765 are:

  1. Pass the reading and vote (3/4 majority) and be part of MPI 4.1, which would be completed at the same meeting and published shortly after.
  2. Fail the reading and have MPI 4.1 be published without this fix.
  3. MPI 4.1 isn't published and it doesn't matter.

@jsquyres (Member) commented Oct 5, 2023

I also vote for option 3 (remove the APIs for a v5.0.0 release).

Who can make a PR to do this? Remember to remove the interfaces in both C and all the Fortran interfaces.

@awlauria (Contributor) commented Oct 5, 2023

I also vote for option 3 (remove the APIs for a v5.0.0 release).

Who can make a PR to do this? Remember to remove the interfaces in both C and all the Fortran interfaces.

Yes, option 3.

Also, any docs with these APIs will need to be updated (or removed?)

@jsquyres (Member) commented Oct 5, 2023

Also, any docs with these APIs will need to be updated (or removed?)

Yes, their man pages should probably be removed, as well. Or possibly edited to say something about our plan to add these APIs after the October 2023 MPI Forum meeting yadda yadda yadda.

@Wee-Free-Scot

I believe @Wee-Free-Scot has been pushing for this hard, but I'm not sure if that's been agreed to or not.

Issue mpi-forum/mpi-issues#765 / PR https://github.com/mpi-forum/mpi-standard/pull/888 must follow the errata procedure in the MPI Forum to get the right level of voting threshold, but is actually a late-breaking during-prep-of-the-release-candidate change to MPI-4.1 itself.

I have advocated that we should treat the extant Issue & PR as an errata for MPI-4.0, as well as a change to MPI-4.1, because that massively simplifies the story for these APIs: they were never wrong, your memories of them being different are a spillover from a deleted timeline, there is no spoon/incompatibility. To be clear, I have not volunteered to create that errata entry. IIRC, it is usually a task for the document editor to add items into the errata document for an already-released document version (once the vote has passed, of course).

@hppritcha hppritcha self-assigned this Oct 10, 2023
@hppritcha (Member)

At the RM meeting yesterday we decided to go forward with the following for Open MPI 5.0.x: keep the existing MPI_PSEND_INIT/MPI_PRECV_INIT 'c' interfaces that are in 5.0.x, but rename them with an MPIX prefix. For the Fortran interfaces, do the same, and also fix them (they aren't implemented correctly per the MPI 4.0 standard). Also rename the accompanying man pages and add a blurb about why they are MPIX-prefixed.
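
(Assuming "keep the existing interfaces" means keeping the current C prototypes with their MPI_Count count, the renamed entry point would look roughly like the sketch below; the exact prototype is whatever the eventual PR contains.)

    /* Hypothetical sketch of the planned MPIX rename, keeping the
     * MPI_Count count argument from the current 5.0.x C binding */
    int MPIX_Psend_init(const void *buf, int partitions, MPI_Count count,
                        MPI_Datatype datatype, int dest, int tag,
                        MPI_Comm comm, MPI_Info info, MPI_Request *request);
    /* MPIX_Precv_init is analogous, with void *buf and int source */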

hppritcha added a commit to hppritcha/ompi that referenced this issue Oct 10, 2023
Related to open-mpi#11967

The MPI Forum decided to change the API for the partitioned communication subroutines MPI_Psend_init and MPI_Precv_init
as part of MPI 4.1. This has generated quite a bit of discussion; see mpi-forum/mpi-issues#765.

Rather than release a 5.0.0 with a set of functions that will be redefined if the current MPI 4.1 rc is ratified (and
we want OMPI to be 4.1 compliant), just rename the existing MPI_Psend_init and MPI_Precv_init to be MPIX_-prefixed.

This commit also fixes a bug with the Fortran interfaces.  They weren't compliant with the MPI 4.0 definitions in any case.

Signed-off-by: Howard Pritchard <[email protected]>
@jsquyres (Member)

Further information from the MPI Forum virtual meeting held earlier today to discuss exactly this issue: the errata proposal to change the MPI_PSEND|RECV_INIT count parameter type has been withdrawn from consideration for MPI-4.1. See mpi-forum/mpi-issues#765 (comment).

Hence, this issue is now moot for Open MPI.

@jsquyres (Member)

Fixing the OMPI Fortran bindings on main and v5.0.x to adhere to MPI-4.0 has been moved to a new issue: #11982
