
sessions_cheat_sheet

Howard Pritchard edited this page Apr 9, 2020 · 28 revisions

This page is somewhat obsolete and needs to be updated.

MPI Sessions Cheat Sheet

This wiki gives an overview of the MPI Sessions API as currently proposed, and shows how to use Sessions to create communicators. Once an application has a communicator, it can do everything the current MPI API allows.

General Scheme

The figure below illustrates the general scheme for using an MPI Session:

  • First, one creates a Session using MPI_Session_init, which returns a Session handle to the application;
  • This Session handle can then be used to query the runtime using the MPI_Session_get_names method. This returns a NULL-terminated array of strings representing the names of the available process sets. Currently the WG favors a URI-style format, e.g. mpi://WORLD being the name of the process set corresponding to MPI_COMM_WORLD (in a pre-Sessions MPI world);
  • An MPI group can be instantiated from a Session handle and one of the process set names returned from MPI_Session_get_names using the MPI_Group_create_from_session_name method;
  • The application can then use the MPI_Create_comm_from_group method to obtain a communicator.
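The four steps above can be sketched as a call sequence. The proposed functions are not assumed to exist in any installed MPI library, so this sketch uses hypothetical `mock_` stand-ins with simplified signatures; only the order of the calls and the flow of handles between them is meant to carry over.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the proposed handle types (not real MPI types). */
typedef struct { int valid; } mock_session;
typedef struct { const char *set_name; } mock_group;
typedef struct { const char *set_name; const char *uri; } mock_comm;

/* Step 1: stands in for the Session initialization call. */
static int mock_session_init(mock_session *session)
{
    assert(session != NULL);
    session->valid = 1;
    return 0;
}

/* Step 2: stands in for the name query; returns a NULL-terminated array. */
static const char *const *mock_session_get_names(const mock_session *session)
{
    static const char *const names[] = { "mpi://WORLD", "mpi://SELF", NULL };
    return session->valid ? names : NULL;
}

/* Step 3: stands in for creating a group from a Session and a set name.
 * Fails with -1 if the runtime did not report that name. */
static int mock_group_from_name(const mock_session *session,
                                const char *set_name, mock_group *group)
{
    const char *const *names = mock_session_get_names(session);
    if (names == NULL) return -1;
    for (size_t i = 0; names[i] != NULL; ++i) {
        if (strcmp(names[i], set_name) == 0) {
            group->set_name = names[i];
            return 0;
        }
    }
    return -1;
}

/* Step 4: stands in for creating a communicator from the group. */
static int mock_comm_from_group(const mock_group *group, const char *uri,
                                mock_comm *comm)
{
    if (group->set_name == NULL) return -1;
    comm->set_name = group->set_name;
    comm->uri = uri;
    return 0;
}

/* Runs the steps in order for one set name; returns 0 on success. */
int build_comm_for_set(const char *set_name, mock_comm *comm)
{
    mock_session session = { 0 };
    mock_group group = { NULL };
    if (mock_session_init(&session) != 0) return -1;
    if (mock_group_from_name(&session, set_name, &group) != 0) return -1;
    return mock_comm_from_group(&group, "example://tag", comm);
}
```

A real application would also finalize the Session once the objects derived from it are no longer needed; the mock omits that step because it holds no resources.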

Process Sets

Process sets are the mechanism for MPI applications to query the runtime. Each process set has a unique set name. In the current scheme, set names have a URI format. Two process sets are mandated:

mpi://WORLD
mpi://SELF

and maybe

mpi://UNIVERSE

Many additional process sets may be defined by the runtime, e.g.

location://rack/19
network://leaf-switch/37
arch://x86_64
application://redis-server/5

The mechanisms for defining process sets, and for assigning system resources to these sets, are currently assumed to be dependent on the runtime implementation.
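Since set names follow a URI-style format, an application may want to split a name into the part before `://` and the part after it, for example to select all sets under `location://`. The helper below is a minimal sketch; `split_set_name` is a hypothetical name, and the proposal itself does not define any parsing routine or a stricter grammar for set names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits "scheme://path" into its two parts.
 * Returns 0 on success, or -1 if the name has no "://" separator, has an
 * empty scheme or path, or does not fit in the caller's buffers. */
int split_set_name(const char *name,
                   char *scheme, size_t scheme_cap,
                   char *path, size_t path_cap)
{
    assert(name != NULL && scheme != NULL && path != NULL);

    const char *sep = strstr(name, "://");
    if (sep == NULL || sep == name) return -1;

    size_t scheme_len = (size_t)(sep - name);
    const char *rest = sep + 3;            /* skip the "://" separator */
    size_t path_len = strlen(rest);
    if (path_len == 0) return -1;
    if (scheme_len >= scheme_cap || path_len >= path_cap) return -1;

    memcpy(scheme, name, scheme_len);
    scheme[scheme_len] = '\0';
    memcpy(path, rest, path_len + 1);      /* copies the terminating NUL too */
    return 0;
}
```

With this sketch, `location://rack/19` yields the scheme `location` and the path `rack/19`.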

A process set caches key/value tuples, which an application can access by calling MPI_Session_get_info and then querying the returned info object with the existing MPI info object methods. The size key is mandatory for all process sets.
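MPI info values are strings, so the value stored under the size key has to be converted before it can be used as a count. The sketch below shows that lookup and conversion on a plain key/value array; `kv_pair` and `pset_size` are hypothetical stand-ins for an info object and its query functions, and the key spelling `size` is taken from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one entry of an info object. */
typedef struct {
    const char *key;
    const char *value;
} kv_pair;

/* Returns the process set size, or -1 if the "size" key is missing
 * or its value is not a positive decimal integer. */
long pset_size(const kv_pair *pairs, size_t count)
{
    assert(pairs != NULL || count == 0);

    for (size_t i = 0; i < count; ++i) {
        if (strcmp(pairs[i].key, "size") != 0) continue;

        char *end = NULL;
        errno = 0;
        long n = strtol(pairs[i].value, &end, 10);
        if (errno != 0 || end == pairs[i].value || *end != '\0' || n <= 0)
            return -1;
        return n;
    }
    return -1;
}
```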

Are MPI_Init and MPI_Finalize needed?

This is a whole different discussion and will be written up in a separate wiki. [Spoiler - needed: no, supported: yes.]

Starting new Processes

This is also a whole different discussion and will be written up in a separate wiki.

API in a nutshell

Initialization/Finalization

MPI_Session_init(
INOUT MPI_Flags *flags,
IN MPI_Info info,
IN MPI_Errhandler errhandler, // we need something else here, since error handlers are now attached to specific object types
OUT MPI_Session *session)

This function initializes a Session and returns the associated Session handle. The flags argument is currently envisioned as the place where the application requests the capabilities it would like for MPI objects associated with the Session; on output, it reports what the implementation can provide. The flags currently proposed are:

MPI_FLAG_THREAD_NONCONCURRENT_SINGLE
MPI_FLAG_THREAD_NONCONCURRENT_FUNNELED
MPI_FLAG_THREAD_NONCONCURRENT_SERIALIZED
MPI_FLAG_THREAD_CONCURRENT
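The request/provide behavior of an INOUT flags argument can be sketched as follows. This assumes the four flags form an ordered scale from least to most permissive, in the spirit of the required/provided pair of MPI_Init_thread; the numeric values, the `thread_flag` type, and `negotiate_thread_flag` are hypothetical and are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; the ordering from least to most permissive
 * is an assumption made for this sketch. */
typedef enum {
    FLAG_THREAD_NONCONCURRENT_SINGLE = 0,
    FLAG_THREAD_NONCONCURRENT_FUNNELED = 1,
    FLAG_THREAD_NONCONCURRENT_SERIALIZED = 2,
    FLAG_THREAD_CONCURRENT = 3
} thread_flag;

/* INOUT semantics: on entry *flag holds the requested level; on return it
 * holds the level actually provided. Returns 1 if the request was met in
 * full and 0 if it was lowered to what the implementation supports. */
int negotiate_thread_flag(thread_flag *flag, thread_flag supported)
{
    assert(flag != NULL);
    if (*flag > supported) {
        *flag = supported;
        return 0;
    }
    return 1;
}
```

An application that asks for concurrent access but receives a lower level would then need to serialize its own MPI calls accordingly.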

The info argument can be used to specify the level of thread safety required for the Session, and possibly other implementation-specific resource and functionality requirements. The errhandler argument specifies an error handler to invoke if the Session initialization call encounters an error. Session initialization is intended to be a lightweight operation. A single process may initialize multiple Sessions. MPI_Session_init is always thread-safe; multiple threads within an application may invoke it concurrently.

MPI_Session_finalize(
INOUT MPI_Session *session)

This function is the Session equivalent of MPI_Finalize. It can block waiting for destruction of objects derived from the Session handle. Every initialized Session must be finalized using MPI_Session_finalize.

Runtime Query Functions

MPI_Session_get_names(
IN MPI_Session session,
OUT char **set_names)

This function is used to query the runtime for the names of available process sets. The names are returned as a NULL-terminated array of strings. The caller is responsible for freeing the returned array of strings.
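Walking and releasing a NULL-terminated array of strings could look like the sketch below. The text does not say how the array is allocated, so this assumes that the array and each string were allocated separately with malloc; `count_set_names` and `free_set_names` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts the entries before the terminating NULL pointer. */
size_t count_set_names(char *const *set_names)
{
    assert(set_names != NULL);
    size_t n = 0;
    while (set_names[n] != NULL) {
        ++n;
    }
    return n;
}

/* Frees every string and then the array itself.
 * Assumption: each was allocated separately with malloc. */
void free_set_names(char **set_names)
{
    if (set_names == NULL) return;
    for (size_t i = 0; set_names[i] != NULL; ++i) {
        free(set_names[i]);
    }
    free(set_names);
}
```

If an implementation instead returned one contiguous allocation, only the final `free` of the array would apply.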

MPI_Session_get_info(
IN MPI_Session session,
IN const char *set_name,
OUT MPI_Info *info)

This function is used to query properties of a specific process set. The returned info object can in turn be queried with existing MPI info object query functions.

Group and Communicator Management Functions

MPI_Group_create_from_session_name(
IN MPI_Session session,
IN const char *set_name,
OUT MPI_Group *group);

This function can be used to create an MPI group given an input Session handle and a set name. The existing MPI_Comm_create_group function may be subsequently used to create an MPI communicator.

MPI_Create_comm_from_group(
IN MPI_Group group,
IN const char *uri, // for matching
IN MPI_Info info,
IN MPI_Errhandler errhandler,
OUT MPI_Comm *comm)

This function is proposed as an alternative way to create an MPI communicator from an MPI group. The uri argument allows the MPI implementation to discriminate between potentially concurrent calls by the application to create multiple MPI communicators using the same supplied group. The function also allows an alternate errhandler to be invoked if the MPI_Create_comm_from_group method encounters an internal error. Communicators derived from a static process set will have the same local rank, regardless of the Session with which the communicator is associated.

Additional functions for creating communicators from MPI groups that the WG is proposing are:

MPI_Create_cart_comm_from_group(
IN MPI_Group group,
IN const char *uri,
IN MPI_Info info,
IN MPI_Errhandler errhandler,
IN int ndims,
IN const int dims[],
IN const int periods[],
IN int reorder,
OUT MPI_Comm *comm)
MPI_Create_graph_comm_from_group(…)
MPI_Create_dist_graph_comm_from_group(…)
MPI_Create_dist_graph_adjacent_comm_from_group(…)

and

MPI_Create_intercomm_from_group(
IN MPI_Group local_group,
IN int local_leader,
IN MPI_Group remote_group,
IN int remote_leader,
IN const char *uri,
IN MPI_Info info,
IN MPI_Errhandler errhandler,
OUT MPI_Comm *comm)

But wait, there's more. While the WG was at it, the following were also thrown into the mix:

MPI_Create_win_from_group(
IN MPI_Group group,
IN void *base,
IN MPI_Aint size,
IN int disp_unit,
IN const char *uri,
IN MPI_Info info,
IN MPI_Errhandler errhandler, // do we want this?
OUT MPI_Win *win)

and

MPI_Create_file_from_group(
IN MPI_Group group,
IN const char *filename,
IN int amode,
IN const char *uri, // necessary/desirable?
IN MPI_Info info,
IN MPI_Errhandler errhandler, // do we want this?
OUT MPI_File *file)

Maybe all of these new from-group creation functions could be handled as a separate proposal.

Additional notes

Within a single MPI process:

  • Objects derived from Session A cannot be used to communicate with objects derived from Session B.
  • Requests from different Sessions cannot appear in a single call to the array TEST/WAIT functions.
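The restriction on array TEST/WAIT calls amounts to a precondition that an application (or a debugging layer) could check before passing a request array on. The sketch below is illustrative only: `mock_request` and its `session_id` field are hypothetical, since the proposal does not describe how a request records the Session it came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record that remembers its originating Session. */
typedef struct {
    int session_id;
} mock_request;

/* Returns true if every request in the array belongs to the same Session
 * (an empty or single-element array passes trivially). */
bool requests_share_session(const mock_request *reqs, size_t count)
{
    assert(reqs != NULL || count == 0);
    for (size_t i = 1; i < count; ++i) {
        if (reqs[i].session_id != reqs[0].session_id) {
            return false;
        }
    }
    return true;
}
```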