Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MPI_Comm_create_from_group and friend - need clarification of errhandler argument #511

Closed
hppritcha opened this issue Jul 19, 2021 · 11 comments
Assignees
Labels
chap-contexts Groups, Contexts, Communicators, Caching Chapter Committee errata Errata items for the previous MPI Standard had reading Completed the formal proposal reading mpi-4.1 For inclusion in the MPI 4.1 standard passed final vote Passed the final formal vote wg-sessions Sessions Working Group

Comments

@hppritcha
Copy link

hppritcha commented Jul 19, 2021

Problem

The current wording in the MPI 4.0 standard concerning the errhandler argument to MPI_Comm_create_from_group reads

The errhandler argument specifies an error handler to be attached to the new intracommunicator.
This error handler will also be invoked if the
MPI_COMM_CREATE_FROM_GROUP function encounters an error.

Although this text is adequate for describing how the supplied errhandler is used, it does not describe what constitutes a valid error handler argument.

Also, there is currently no text describing the errhandler argument for MPI_Intercomm_create_from_groups so as part of the changes to the wording for MPI_Comm_create_from_group, the text should also be added to the description below MPI_Intercomm_create_from_groups.

Proposal

There are several possibilities for improving this text. Some options:

  1. A very minor change would be to add valid as adjective preceding the error handler noun in the existing sentence.

  2. Another option would be to add text that explicitly states that the error handler argument must be either a predefined error handler, or one created using MPI_Comm_create_errhandler - basically the text describing the errhandler argument to MPI_Comm_set_errhandler.

  3. Yet another option would be to add text that explicitly states that the error handler argument must be either a predefined error handler, or one created using MPI_Comm_create_errhandler, or MPI_ERRHANDLER_NULL. If MPI_ERRHANDLER_NULL is supplied as an argument, the initial error handler attached to the communicator if the operation completes successfully.

Changes to the Text

Wait till we decide on which option above, or other, to take.

Impact on Implementations

Impact on implementations should be limited to potential changes in checking the supplied error handler validity.

Impact on Users

Should help clarify whether or not MPI_ERRHANDLER_NULL is valid - which is what brought this issue to our attention.

References and Pull Requests

Fixed by: https://github.com/mpi-forum/mpi-standard/pull/629

@hppritcha hppritcha added chap-contexts Groups, Contexts, Communicators, Caching Chapter Committee errata Errata items for the previous MPI Standard mpi-4.1 For inclusion in the MPI 4.1 standard wg-sessions Sessions Working Group labels Jul 19, 2021
@hppritcha hppritcha self-assigned this Jul 19, 2021
@Wee-Free-Scot
Copy link

There's text in §9.3.1 that needs adjusting too. In particular, p462 lines 3-7, the advice to users.

@Wee-Free-Scot Wee-Free-Scot self-assigned this Jul 21, 2021
@Wee-Free-Scot
Copy link

IMHO, option 1 is a ticket 0/chapter committee change that does not adequately resolve the ambiguity and so should be discarded, but options 2 and 3 are errata because selection between them requires a functionality choice -- is MPI_ERRHANDLER_NULL a valid input value here?

@Wee-Free-Scot Wee-Free-Scot added this to the September 2021 milestone Aug 17, 2021
@Wee-Free-Scot
Copy link

I suggest we create a PR that does option 3 (the most general option) and read it at the Sept 2021 meeting. If there is pushback, we can re-work the PR to do option 2 instead in time for the Dec 2021 meeting.

@wesbland wesbland added the scheduled reading Reading is scheduled for the next meeting label Aug 18, 2021
@abouteiller
Copy link
Member

One question is: when an error happens during the COMM_CREATE_FROM_GROUP function, and the errhandler is a user-supplied function, what should be the value of the comm argument to that errhandler function?

We can see two different rationale that needs further exploration

  1. The communicator is not yet formed properly, so we will pass-in MPI_COMM_NULL
  2. Despite the communicator being not yet formed properly, we already have the group, so we can create locally a communicator on which MPI_COMM_RANK/SIZE works (but not much else), that may be useful to end-users?
  3. Don't specify, implementors do YOLO

A peculiarity is that the handler is used for both creation (this procedure) and usage (future MPI_RECV); according to the law of surprise, it is to be expected that end-users will more often than not write buggy code in their handler that doesn't account for the creation case. Getting MPI_COMM_NULL may give the opportunity for the end-user to discriminate between the cases of erroring during creation vs during use of the comm.

@Wee-Free-Scot
Copy link

Thanks @abouteiller. We will take this back to the Sessions WG and address this question before bringing the proposal back to the Forum.

@hppritcha hppritcha removed the scheduled reading Reading is scheduled for the next meeting label Dec 9, 2021
@wesbland wesbland added the scheduled reading Reading is scheduled for the next meeting label May 10, 2022
@wesbland wesbland modified the milestones: December 2021, May 2022 May 10, 2022
@wesbland
Copy link
Member

This issue will be split into two so the PRs attached to this can refer to one issue each.

cc: @abouteiller

@wesbland wesbland moved this to In Progress in MPI 4.1 Jul 5, 2022
@wesbland wesbland added this to MPI 4.1 Jul 5, 2022
@wesbland wesbland modified the milestones: May 2022, September 2022 Sep 12, 2022
@wesbland
Copy link
Member

wesbland commented Oct 3, 2022

This issue had a "no-no" vote on 2022-09-29, which passed:

Yes No Abstain
27 0 2

@wesbland
Copy link
Member

wesbland commented Oct 3, 2022

This issue had an errata vote on 2022-09-30, which passed:

Yes No Abstain
29 0 0

@wesbland wesbland moved this from In Progress to Passed 2nd Vote in MPI 4.1 Oct 3, 2022
@wesbland wesbland added passed final vote Passed the final formal vote had reading Completed the formal proposal reading and removed scheduled vote scheduled reading Reading is scheduled for the next meeting labels Oct 3, 2022
@wgropp wgropp closed this as completed Oct 3, 2022
Repository owner moved this from Passed 2nd Vote to Merged in MPI 4.1 Oct 3, 2022
@abouteiller
Copy link
Member

There has been some confusion with this issue due to the fact that it tracked 2 different PRs from when the original issue got split into to different directions.

What has been read in may, no-no voted and voted-in as an errata in Sept. 2022 is https://github.com/mpi-forum/mpi-standard/pull/644

I thought we had already voted-in #629 as an errata earlier, but cannot find track record of voting on https://github.com/mpi-forum/mpi-standard/pull/629, Wesley could you kindly verify?

@wesbland
Copy link
Member

I see. That's my confusion then as issues and PRs should be 1:1 to avoid this kind of confusion (and is the reason I put this comment above at the time. I should have double checked that history before the last meeting. I always put voting records on issues, not PRs. If I look at the email that @hppritcha sent before the September 2022 meeting, I see this listed as a reading request:

expansion of error handler association to cover sessions (and clarifications)
https://github.com/mpi-forum/mpi-issues/issues/511
https://github.com/mpi-forum/mpi-standard/pull/629

So that means mpi-forum/mpi-standard#629 was read, voted on, and merged this meeting (which is reflected by the fact that it was merged).

When I go back to your email from May, it says this:

* expansion of error handler association to cover sessions (and clarifications)
 * https://github.com/mpi-forum/mpi-standard/pull/644

Which is unfortunately inconclusive. The voting record also doesn't have anything about a no-no or errata vote on this issue. At the time, I would have gone to the PR, seen which issue it was linked with, and put that one on the agenda. So based on that, I can tell what should have been merged and what shouldn't have.

Are you saying that what we voted on at the September meeting should have covered both mpi-forum/mpi-standard#644 and mpi-forum/mpi-standard#629? If so, that was not at all clear and IMO, people would not have known what they were voting on at the time given that we only give out issue numbers on the agenda/voting list/voting emails.

So with all that said, my suggestion is that since we only discussed #511 / https://github.com/mpi-forum/mpi-standard/pull/629 at this meeting, those should stay merged as they currently are. This issue should be closed. At the next meeting, we should discuss https://github.com/mpi-forum/mpi-standard/pull/644 and it should be connected to a new issue.

What are your thoughts @schulzm?

@abouteiller
Copy link
Member

I see, thanks for clarifying the history. I got confused myself as I thought that the announce for 629 was actually for 644. I'll bring up 644 again next time with an independent issue then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chap-contexts Groups, Contexts, Communicators, Caching Chapter Committee errata Errata items for the previous MPI Standard had reading Completed the formal proposal reading mpi-4.1 For inclusion in the MPI 4.1 standard passed final vote Passed the final formal vote wg-sessions Sessions Working Group
Projects
No open projects
Status: Done
Development

No branches or pull requests

4 participants