
2022 01 31 webex j ftwg

Howard Pritchard edited this page Jan 31, 2022 · 1 revision

# 01/31/22 Webex notes for joint FT/Sessions WGs meeting

Attending: Howard Pritchard, Thomas Hines, Trupeshkumar Patel, Aurelien Bouteiller, Isais Urena, Martin Schulz, Ignacio Laguna, Grace Nansamba

Agenda items

PR 644

Aurelien rewrote part of the terms to remove the phrase "the associated operation has completed". Dan was not present today, so his points could not be presented.

General discussion

How are error handlers handled in the Sessions model? Does Sessions obey the initial error handler? Yes; this is unchanged from the World model and its initial error handler. An error handler has to be supplied as part of session initialization (MPI_Session_init). This error handler gets invoked, through a level of indirection, for the group created by the group-from-session-pset function (MPI_Group_from_session_pset). Note that this requires wording in PR 644.

Discussed how errors are handled before a communicator is created. All failures are local until a communicator connecting the processes has been created. Howard gave the example of using the World model before MPI_Init is called. We should double-check the wording around the initial error handler and how it works.

How should communicator objects be treated when there are failures? ULFM provides a shrink capability. We want to move to a more generic approach that also allows for growing. An operation on a session handle is needed for recovery; such operations would need to be collective in nature (Aurelien). Howard described the model of throwing everything away and starting over. If one did need to revoke ULFM-style, that implies something will go wrong with group management for groups created from group_from_session_pset.

Discussed ULFM PR https://github.com/mpi-forum/mpi-standard/pull/13 in this context.

TODOs

  • Double-check the wording around the initial error handler and how it works.