2022 03 14 webex j ftwg
# 03/14/22 meeting notes for joint FT/Sessions WGs meeting
Attending: Howard Pritchard, Thomas Hines, Trupeshkumar Patel, Aurelien Bouteiller, Martin Schulz, Grace Nansamba
- Follow-up to discussions at the MPI Forum regarding agreement/consensus
Text added from the Miro document
Notes from 3/14/22 zoom meeting
When a user queries a process set, they need to know which version of that process set they are getting from the runtime. There also needs to be a mechanism for processes to agree on a given epoch (or epochs). This is needed even outside of FT handling.
ULFM agreement: surviving processes come to agreement on who is still alive. Growing: not so sure whether this covers all cases. Which processes would take part in the fence-like agreement operation? Should this include any new processes in addition to the original processes? Maybe only a subset of the processes. Locality of some pset operations is desirable, and we want to retain that property.
Discussion of how the ULFM approach implements its consensus method. Agreement doesn't need to be blocking. In ULFM the agreement is hidden inside MPI_Comm_shrink.
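To make the "agreement hidden inside shrink" point concrete, here is a minimal toy model in Python. It is not the ULFM implementation and uses no MPI calls: the function names `agree_on_failures` and `shrink` and the `local_views` dictionary are hypothetical, and the real mechanism is a distributed, fault-tolerant consensus rather than a single function that sees every rank's view. The sketch only assumes that a failure noticed by any survivor ends up acknowledged by all survivors.

```python
def agree_on_failures(local_views):
    """Merge each survivor's locally suspected failures into one agreed set.

    local_views maps a surviving rank to the set of ranks it believes have
    failed. Taking the union means a failure seen by any survivor is
    acknowledged by all of them. (Hypothetical helper, not an MPI call.)
    """
    agreed = set()
    for suspected in local_views.values():
        agreed |= suspected
    return agreed


def shrink(members, local_views):
    """Toy shrink: run the agreement internally, then drop the failed ranks."""
    failed = agree_on_failures(local_views)
    return [rank for rank in members if rank not in failed]


# Rank 2 has failed; rank 1 has not noticed yet.
members = [0, 1, 2, 3]
local_views = {0: {2}, 1: set(), 3: {2}}
survivors = shrink(members, local_views)
print(survivors)  # [0, 1, 3]
```

The caller of `shrink` never sees the agreement step, which mirrors the observation in the notes. A non-blocking variant would start the agreement and complete it later; this sketch does not model that.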
Perhaps make the fencing mechanism optional, or maybe offer a fast version ("I'm feeling lucky") and one that is more robust in the face of errors.
Discussed options on the mpiexec command line to control the FT model. These could be used to influence the behavior of sessions-related consensus functions, failure modes, etc.
Discussion of distributed use of process sets vs the master process model employed in D.'s thesis.
Epochs in process set names - no, we discussed this before and it leads to problems.
A fence operation to make sure everyone's group_from_pset_name call really returns the same group. Need to be able to do this fence possibly over a subset of the processes included in the process set. An example where this might be needed is if a subset of processes wants to spawn additional processes without involving all processes in the process set.
Have to know ahead of time who will be fencing to use this approach for consensus.
Back to versions - moves the problem down a level.
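The version and fence ideas discussed above can be sketched together as a small Python toy model. Everything here is an assumption made for illustration: `PsetRegistry`, `fence`, and the pset name `app://workers` are hypothetical and do not correspond to any MPI Sessions API. The model assumes the runtime tags each process set with a version number, that each process records the (version, members) pair it received from its query, and that the list of fencing participants is known ahead of time.

```python
class PsetRegistry:
    """Hypothetical runtime-side table: pset name -> (version, members)."""

    def __init__(self):
        self._psets = {}

    def update(self, name, members):
        """Install a new membership for name and bump its version."""
        version = self._psets[name][0] + 1 if name in self._psets else 1
        self._psets[name] = (version, tuple(sorted(members)))
        return version

    def query(self, name):
        """Return the (version, members) pair currently held for name."""
        return self._psets[name]


def fence(observations, participants):
    """Check that every listed participant observed the same pset version.

    observations maps a rank to the (version, members) pair it got from its
    query. Returns the agreed pair, or None if the participants disagree.
    """
    seen = {observations[rank] for rank in participants}
    return seen.pop() if len(seen) == 1 else None


registry = PsetRegistry()
registry.update("app://workers", [0, 1, 2, 3])
observations = {0: registry.query("app://workers"),
                1: registry.query("app://workers")}

# The set grows before rank 2 gets around to querying it.
registry.update("app://workers", [0, 1, 2, 3, 4, 5])
observations[2] = registry.query("app://workers")

subset_result = fence(observations, participants=[0, 1])     # agrees on version 1
full_result = fence(observations, participants=[0, 1, 2])    # None: versions differ
```

Fencing over ranks 0 and 1 succeeds because both saw version 1, while including rank 2 fails because it saw version 2. The sketch also shows why versions only move the problem down a level: the comparison itself still needs some agreement step among the participants.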