Comments on the MPI standard should be mailed to mpi-core@mpi-forum.org. Page and line numbers refer to the official MPI-2 document, not the HPCA issue or the 2nd edition of the Complete Reference.
This needs more discussion. The problem is that some of the C++
datatypes have no easily defined counterparts in C or Fortran. The
minimum fix is really a clarification that says that there is no
interlanguage support for the C++ complex types.
Proposed text
MPI-2, page 276, after line 4, add
Advice to users.
Most but not all datatypes in each language have corresponding datatypes in other languages. For example, there is no C or Fortran counterpart to MPI::BOOL, MPI::COMPLEX, MPI::DOUBLE_COMPLEX, or MPI::LONG_DOUBLE_COMPLEX. (End of advice to users.)
Extending the C++ datatypes to C and Fortran needs to include MPI::BOOL as well as the complex types, and should define what the equivalent types are in C and Fortran. The real issue here is the MPI::F_COMPLEX and completing the list of such routines.
MPI-2, page 164, line 16-30 should read:
7.3.5. Generalized All-to-all Functions
One of the basic data movement operations needed in parallel signal processing is the 2-D matrix transpose. This operation has motivated two generalizations of the MPI_ALLTOALLV function. These new collective operations are MPI_ALLTOALLW and MPI_ALLTOALLX; the ``W'' indicates that it is an extension to MPI_ALLTOALLV, and ``X'' indicates that it is an extension to MPI_ALLTOALLW. MPI_ALLTOALLX is the most general form of All-to-all. Like MPI_TYPE_CREATE_STRUCT, the most general type constructor, MPI_ALLTOALLW and MPI_ALLTOALLX allow separate specification of count, displacement and datatype. In addition, to allow maximum flexibility, the displacement of blocks within the send and receive buffers is specified in bytes. In MPI_ALLTOALLW, these displacements are specified as integer arguments and in MPI_ALLTOALLX they are specified as address integer.
Rationale. The MPI_ALLTOALLW function generalizes several MPI functions by carefully selecting the input arguments. For example, by making all but one process have sendcounts[i] = 0, this achieves an MPI_SCATTERW function. MPI_ALLTOALLX allows the usage of MPI_BOTTOM as buffer argument and defining the different buffer location via the displacement arguments rather than only via different datatype arguments. (End of rationale.)
Add to page 165, after line 38:
MPI_ALLTOALLX(sendbuf, sendcounts, sdispls, sendtypes, recvbuf, recvcounts, rdispls, recvtypes, comm)
- [ IN sendbuf]
- starting address of send buffer (choice)
- [ IN sendcounts]
- integer array equal to the group size specifying the number of elements to send to each processor (array of integers)
- [ IN sdispls]
- integer array (of length group size). Entry j specifies the displacement in bytes (relative to sendbuf) from which to take the outgoing data destined for process j (array of integers)
- [ IN sendtypes]
- array of datatypes (of length group size). Entry j specifies the type of data to send to process j (array of handles)
- [ OUT recvbuf]
- address of receive buffer (choice)
- [ IN recvcounts]
- integer array equal to the group size specifying the number of elements that can be received from each processor (array of integers)
- [ IN rdispls]
- integer array (of length group size). Entry i specifies the displacement in bytes (relative to recvbuf) at which to place the incoming data from process i (array of integers)
- [ IN recvtypes]
- array of datatypes (of length group size). Entry i specifies the type of data received from process i (array of handles)
- [ IN comm]
- communicator (handle)
int MPI_Alltoallx(void *sendbuf, int sendcounts[], MPI_Aint sdispls[], MPI_Datatype sendtypes[], void *recvbuf, int recvcounts[], MPI_Aint rdispls[], MPI_Datatype recvtypes[], MPI_Comm comm)
MPI_ALLTOALLX(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPES, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPES, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), SENDTYPES(*), RECVCOUNTS(*), RECVTYPES(*), COMM, IERROR
    INTEGER (KIND=MPI_ADDRESS_KIND) SDISPLS(*), RDISPLS(*)
void MPI::Comm::Alltoallx(const void* sendbuf, const int sendcounts[], const MPI::Aint sdispls[], const MPI::Datatype sendtypes[], void* recvbuf, const int recvcounts[], const MPI::Aint rdispls[], const MPI::Datatype recvtypes[]) const = 0
Add to page 312, after line 37:
int MPI_Alltoallx(void *sendbuf, int sendcounts[], MPI_Aint sdispls[], MPI_Datatype sendtypes[], void *recvbuf, int recvcounts[], MPI_Aint rdispls[], MPI_Datatype recvtypes[], MPI_Comm comm)
Add to page 322, after line 45:
MPI_ALLTOALLX(SENDBUF, SENDCOUNTS, SDISPLS, SENDTYPES, RECVBUF, RECVCOUNTS, RDISPLS, RECVTYPES, COMM, IERROR)
    <type> SENDBUF(*), RECVBUF(*)
    INTEGER SENDCOUNTS(*), SENDTYPES(*), RECVCOUNTS(*), RECVTYPES(*), COMM, IERROR
    INTEGER (KIND=MPI_ADDRESS_KIND) SDISPLS(*), RDISPLS(*)
Add to page 335, after line 19:
void MPI::Comm::Alltoallx(const void* sendbuf, const int sendcounts[], const MPI::Aint sdispls[], const MPI::Datatype sendtypes[], void* recvbuf, const int recvcounts[], const MPI::Aint rdispls[], const MPI::Datatype recvtypes[]) const = 0
Proposed change
MPI-2, page 79, line 11 is
MPI_UNPACK_EXTERNAL (datarep, inbuf, incount, datatype, outbuf, outsize, position)
but should be
MPI_UNPACK_EXTERNAL (datarep, inbuf, insize, position, outbuf, outcount, datatype)
MPI-2, page 337, lines 31-32 read
bool MPI::Win::Get_attr(const MPI::Win& win, int win_keyval, void* attribute_val) const
but should read
bool MPI::Win::Get_attr(int win_keyval, void* attribute_val) const
Status: The authors of the discussion are asked for a proposal. (Pending)
MPI_Scan Example
The example of MPI_Scan in the MPI 1.1 Standard on page 128,
line 11, has an extraneous root argument. That line should be
MPI_Scan( a, answer, 1, sspair, myOp, comm );
This could be added to section 3.2.10 (Minor Corrections) in the MPI 2
document.
MPI-2, page 179, lines 4-5 change
Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT will have the default of MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT.
to
Thus, the names of MPI_COMM_WORLD, MPI_COMM_SELF, and the communicator returned by MPI_COMM_GET_PARENT (if not MPI_COMM_NULL) will have the default of MPI_COMM_WORLD, MPI_COMM_SELF, and MPI_COMM_PARENT.
MPI-2, page 94, lines 3-5, change
 * The manager is represented as the process with rank 0 in (the remote
 * group of) MPI_COMM_PARENT. If the workers need to communicate among
 * themselves, they can use MPI_COMM_WORLD.
to
 * The manager is represented as the process with rank 0 in (the remote
 * group of) the parent communicator. If the workers need to communicate among
 * themselves, they can use MPI_COMM_WORLD.
MPI_IN_PLACE in description of MPI_ALLGATHER and MPI_ALLGATHERV
On MPI-2.0, page 158, lines 25-31, remove the text
Specifically, the outcome of a call to
MPI_ALLGATHER in the "in place" case is as if all processes executed n calls to
MPI_GATHER( MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, recvbuf, recvcount,
recvtype, root, comm )
for root = 0, ..., n - 1.
On MPI-2.0, page 159, lines 23-28, remove the text
Specifically, the outcome of a call to
MPI_ALLGATHER in the "in place" case is as if all processes executed n calls to
MPI_GATHERV( MPI_IN_PLACE, 0, MPI_DATATYPE_NULL, recvbuf, recvcount,
displs, recvtype, root, comm )
for root = 0, ..., n - 1.
MPI-2, section 8.2, page 172 mentions MPI_REQUEST_CANCEL;
this should be MPI_CANCEL.
\mpifbind{MPI\_FILE\_GET\_VIEW(FH, DISP, ETYPE, FILETYPE, DATAREP,
IERROR)\fargs INTEGER FH, ETYPE, FILETYPE, IERROR \\
CHARACTER*(*) DATAREP, INTEGER(KIND=MPI\_OFFSET\_KIND) DISP}
to
\mpifbind{MPI\_FILE\_GET\_VIEW(FH, DISP, ETYPE, FILETYPE, DATAREP,
IERROR)\fargs INTEGER FH, ETYPE, FILETYPE, IERROR \\
CHARACTER*(*) DATAREP\\ INTEGER(KIND=MPI\_OFFSET\_KIND) DISP}
in io-2.tex (see MPI-2, page 223, line 19). This replaces the comma after the
declaration of DATAREP with a line break.
\mpifbind{MPI\_TYPE\_CREATE\_HVECTOR(COUNT, BLOCKLENGTH, STIDE, OLDTYPE, NEWTYPE,
IERROR)\fargs INTEGER COUNT, BLOCKLENGTH, OLDTYPE, NEWTYPE,
IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) STRIDE}
to
\mpifbind{MPI\_TYPE\_CREATE\_HVECTOR(COUNT, BLOCKLENGTH, STRIDE, OLDTYPE, NEWTYPE,
IERROR)\fargs INTEGER COUNT, BLOCKLENGTH, OLDTYPE, NEWTYPE,
IERROR\\INTEGER(KIND=MPI\_ADDRESS\_KIND) STRIDE}
in misc-2.tex (see MPI-2, page 66, line 26) (replace STIDE with STRIDE).
The communicator argument is missing from the MPI communication calls in MPI 1.1 Example 3.12.
On MPI 1.1, page 43, line 47 and page 44 lines 1, 5, 8, 10, and 13,
the communicator argument comm must be added before the req argument.
The MPI_COMM_RANK call and several of the MPI_Wait
calls are missing the ierr argument.
The ierr argument must be added at the end of the argument list on
MPI 1.1, page 43, line 43, and page 44, lines 6 and 14.
The calls to MPI_Wait
are missing the ierr argument.
The ierr argument must be added at the end of the argument list on
MPI 1.1, page 44, lines 35 and 36.
The lines MPI 1.1, page 52 line 45, and page 53 line 17 read
IF (status(MPI_SOURCE) = 0) THEN
but should be
IF (status(MPI_SOURCE) .EQ. 0) THEN
The variable base should be declared
as MPI_Aint, not int, see MPI 1.1 page 80, line 2.
MPI::BOTTOM, as defined in the standard, conflicts with the definition of receive buffers. One possible fix is to define it as
namespace MPI {
...
extern void *const BOTTOM;
}
Change MPI-2 page 343 lines 22-23
// Type: const void * MPI::BOTTOM
to
// Type: void * const MPI::BOTTOM
Proposed change is to use strlen + 1 instead of
strlen in the MPI_Send call on MPI 1.1 page 16 line 33.
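The reason for strlen + 1 can be seen without MPI. The following is a minimal sketch in which fake_transfer is a hypothetical stand-in for a matching send and receive (it is not an MPI call): strlen does not count the terminating '\0', so a count of strlen leaves the receive buffer unterminated.

```c
#include <string.h>

/* Number of characters to send so that the receiver gets a complete
 * C string, including the terminating '\0'. */
int string_send_count(const char *msg)
{
    return (int)strlen(msg) + 1;
}

/* Hypothetical stand-in for a send/receive pair (not an MPI call).
 * The receive buffer starts out full of unrelated bytes ('x'), with a
 * final '\0' only so that it can be inspected safely afterwards.
 * Assumes recvsize >= 1 and count <= recvsize. */
void fake_transfer(const char *msg, int count, char *recvbuf, size_t recvsize)
{
    memset(recvbuf, 'x', recvsize);
    recvbuf[recvsize - 1] = '\0';
    memcpy(recvbuf, msg, (size_t)count);
}
```

With a count of strlen(msg), the received text runs on into whatever the buffer held before; with string_send_count(msg) it ends where the message ends.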
A LaTeX line break is needed at MPI 1.1, page 58, line 44, in section 3.9. The text should read:
be invoked in a sequence of the form,
Create (Start Complete)* Free ,
where * indicates zero or more repetitions. If the same communication ...
MPI-2.0, Sect. 4.1, page 37, remove line 44-46:
This is advice to implementors, rather than a required part of MPI-2. It is not suggested that this be the only way to start MPI programs. If an implementation does provide a command called mpiexec, however, it must be of the form described here.
Rationale for this removal:
It is largely a repetition of lines 34-36 (except the statement "It is not suggested that this be the only way..."):
Instead, MPI specifies an mpiexec startup command and recommends but does not require it, as advice to implementors. However, if an implementation does provide a command called mpiexec, it must be of the form described below.
MPI-2.0, Sect. 4.10, page 43, line 34:
Replace "It consists of (key,value) pairs" by
"It stores an unordered set of (key,value) pairs"
MPI-2.0, Sect. 4.10, page 43, line 34:
Replace "A key may have only one value." by "A key can have only one value."
Rationale for these replacements: To emphasize that the info object is a kind of dictionary (data structure).
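The dictionary reading of the info object can be sketched in a few lines of C. This is a toy model under stated assumptions, not the MPI_Info API: struct toy_info, the toy_info_* functions, and the fixed sizes are all hypothetical. It shows only the two properties named above: pairs are kept with no meaningful order, and setting an existing key replaces its single value.

```c
#include <stddef.h>
#include <string.h>

#define TOY_MAX_PAIRS 8   /* hypothetical capacity */
#define TOY_MAX_LEN   32  /* hypothetical maximum string size, incl. '\0' */

/* Hypothetical toy stand-in for an info object. */
struct toy_info {
    int  nkeys;
    char key[TOY_MAX_PAIRS][TOY_MAX_LEN];
    char value[TOY_MAX_PAIRS][TOY_MAX_LEN];
};

/* Returns the slot holding `key`, or -1 if the key is absent. */
int toy_info_find(const struct toy_info *info, const char *key)
{
    for (int i = 0; i < info->nkeys; i++)
        if (strcmp(info->key[i], key) == 0)
            return i;
    return -1;
}

/* Sets `key` to `value`, overwriting any earlier value for that key.
 * Returns 0 on success, -1 if a string is too long or the table is full. */
int toy_info_set(struct toy_info *info, const char *key, const char *value)
{
    if (strlen(key) >= TOY_MAX_LEN || strlen(value) >= TOY_MAX_LEN)
        return -1;
    int slot = toy_info_find(info, key);
    if (slot < 0) {
        if (info->nkeys == TOY_MAX_PAIRS)
            return -1;
        slot = info->nkeys++;
        strcpy(info->key[slot], key);
    }
    strcpy(info->value[slot], value);
    return 0;
}

/* Returns the value stored for `key`, or NULL if the key is absent. */
const char *toy_info_get(const struct toy_info *info, const char *key)
{
    int slot = toy_info_find(info, key);
    return slot < 0 ? NULL : info->value[slot];
}
```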
MPI-2.0, Sect. 4.11, page 49, add after line 21:
/* no memory is allocated */
MPI-2.0, Sect. 4.11, page 49, add after line 22:
/* memory allocated */
Rationale for this and the previous addition: to make consistent with Fortran example, add these comments.
MPI-2.0, Sect. 4.12.2, page 50, line 9:
Remove first "in" (typo).
MPI 2.0, Sect. 4.12.6, Exa. 4.12, page 55, lines 21-22 read:
INTEGER TYPE, IERR
INTEGER (KIND=MPI_ADDRESS_KIND) ADDR
but should read:
INTEGER TYPE, IERR, AOBLEN(1), AOTYPE(1)
INTEGER (KIND=MPI_ADDRESS_KIND) AODISP(1)
MPI 2.0, Sect. 4.12.6, Exa. 4.12, page 55, lines 25-26 read:
CALL MPI_GET_ADDRESS( R, ADDR, IERR)
CALL MPI_TYPE_CREATE_STRUCT(1, 5, ADDR, MPI_REAL, TYPE, IERR)
but should read:
AOBLEN(1) = 5
CALL MPI_GET_ADDRESS( R, AODISP(1), IERR)
AOTYPE(1) = MPI_REAL
CALL MPI_TYPE_CREATE_STRUCT(1, AOBLEN,AODISP,AOTYPE, TYPE, IERR)
MPI 2.0, Sect. 4.12.10, Exa. 4.14, page 60, lines 31-32 read:
INTEGER TYPE, IERR, MYRANK
INTEGER (KIND=MPI_ADDRESS_KIND) ADDR
but should read:
INTEGER TYPE, IERR, MYRANK, AOBLEN(1), AOTYPE(1)
INTEGER (KIND=MPI_ADDRESS_KIND) AODISP(1)
MPI 2.0, Sect. 4.12.10, Exa. 4.14, page 60, lines 35-36 read:
CALL MPI_GET_ADDRESS( R, ADDR, IERR)
CALL MPI_TYPE_CREATE_STRUCT(1, 5, ADDR, MPI_REAL, TYPE, IERR)
but should read:
AOBLEN(1) = 5
CALL MPI_GET_ADDRESS( R, AODISP(1), IERR)
AOTYPE(1) = MPI_REAL
CALL MPI_TYPE_CREATE_STRUCT(1, AOBLEN,AODISP,AOTYPE, TYPE, IERR)
Rationale for this modification: The original code was poor Fortran style and hard for C programmers to read.
MPI-2.0, Sect. 4.12.6, page 56, line 29:
"assciated" should be "associated" (typo).
MPI-2.0, Sect. 4.14.5, page 74, line 9:
Replace "it erroneous" by "it is erroneous" (typo).
MPI-2.0, Sect. 5.3.2, page 85, line 25:
Replace "as the as the" by "as the" (typo).
Note that further modifications and clarifications in MPI 2.0, Sect. 4.10, The Info Object, can be found in What info keys can be set?, MPI_File_get_info, MPI_File_set_view.
This section contains the discussion on these ambiguities and in cases where a consensus emerged, text has been proposed.
Does MPI_ALLOC_MEM return a null pointer when a request for memory cannot be satisfied but a request for a smaller amount may work? The question is really if the user must set MPI_ERRORS_RETURN on MPI_COMM_WORLD before calling MPI_ALLOC_MEM if the user wants to handle "not enough memory for your request" errors.
Some names in the MPI Namespace in the C++ binding can conflict with C preprocessor names in standard include files. Examples include MPI::SEEK_SET (conflicts with SEEK_SET in stdio.h).
The target is solved by the proposal to Example 4.13.
MPI-2.0, Sect. 4.12, page 58, line 36 reads:
IF (val.NE.5) THEN CALL ERROR
but should read
IF (val.NE.address_of_i) THEN CALL ERROR
Rationale for this modification:
MPI-2.0, Sect. 4.12, page 58, lines 12-13 and 16-18 clearly state that if an attribute is
set by C, retrieving it in Fortran will obtain the address of the attribute.
Note that this also resolves the question in Interlanguage use of Attributes.
MPI 2.0, Sect. 4.10 Info Objects, page 43, line 38-40 read
If a function does not recognize a key, it will ignore it, unless otherwise specified. If an implementation recognizes a key but does not recognize the format of the corresponding value, the result is undefined.
but should read
An implementation must support info objects as caches for arbitrary (key, value) pairs, regardless of whether it recognizes the key. Each function that takes hints in the form of an MPI_Info must be prepared to ignore any key it does not recognize. This description of info objects does not attempt to define how a particular function should react if it recognizes a key but not the associated value. MPI_INFO_GET_NKEYS, MPI_INFO_GET_NTHKEY, MPI_INFO_GET_VALUELEN, and MPI_INFO_GET must retain all (key, value) pairs so that layered functionality can also use the Info object.
Rationale for this clarification:
The MPI-2.0 text also allowed MPI_INFO_DELETE, MPI_INFO_SET, MPI_INFO_GET, and MPI_INFO_DUP to ignore (key, value) pairs that are not recognized by the routines in other chapters that take hints through info arguments. The proposed clarification is necessary if layered implementations of parts of the MPI-2 standard are to be possible and are to use MPI_Info objects for their own needs. This was a goal of the MPI-2 Forum and of the MPI-2.0 specification.
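The split between "the object retains every pair" and "a function ignores keys it does not recognize" can be sketched as follows. Everything here is hypothetical: struct toy_pair, toy_apply_hints, and the key name "toy_buffer_size" are illustrative only and are not part of MPI. The consumer reads the pairs through a const pointer, so an unrecognized key is skipped without being dropped, and a layered library could still find it afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical (key, value) pair, standing in for one entry of an info object. */
struct toy_pair {
    const char *key;
    const char *value;
};

/* Hypothetical consumer of hints. It recognizes only "toy_buffer_size".
 * Returns the number of hints it applied. The pairs are read-only here,
 * so unrecognized keys remain available to other layers. */
int toy_apply_hints(const struct toy_pair *pairs, int npairs, long *buffer_size)
{
    int applied = 0;
    for (int i = 0; i < npairs; i++) {
        if (strcmp(pairs[i].key, "toy_buffer_size") == 0) {
            *buffer_size = strtol(pairs[i].value, NULL, 10);
            applied++;
        }
        /* Any other key is skipped silently; it stays in `pairs`. */
    }
    return applied;
}
```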
Note that further modifications and clarifications in MPI 2.0, Sect. 4.10, The Info Object, can be found in Edits to MPI-2 Chapter 4, Miscellany, MPI_File_get_info, MPI_File_set_view.
MPI 1.1, page 22, line 48 reads
used after a call to MPI_PROBE. (End of rationale.)
but should read
used after a call to MPI_PROBE or MPI_IPROBE. With a status returned from MPI_PROBE or MPI_IPROBE, the same datatypes are allowed as in a call to MPI_RECV to receive this message. (End of rationale.)
Advice to users. The buffer size required for the receive can be affected by data conversions and by the stride of the receive datatype. In most cases, the safest approach is to use the same datatype with MPI_GET_COUNT and the receive. (End of advice to users.)
Rationale for this clarification:
Reason for the first part: The current MPI-1.1 text says "The datatype argument should match the argument provided by the receive call that set the status variable." With MPI_PROBE, there isn't such a receive call.
Reason for the advice to users: It helps to write portable code. Because malloc needs a byte count, users may write wrong programs by using MPI_BYTE.
MPI-2, page 114, after line 4 (and after the lines added about
MPI_PROC_NULL), add
After any RMA operation with rank MPI_PROC_NULL, it is still necessary to finish the RMA epoch with the synchronization method that started the epoch.
Rationale for this clarification:
The behavior of one-sided RMA with target MPI_PROC_NULL was not clear.
There are two different proposals. The Forum should decide which one is better.
Proposal A
Add in MPI-2.0, page 88, after line 24:
Advice to users. If the non-root processes do not use MPI_ERRCODES_IGNORE, then they have to allocate the appropriate number of entries (see maxproc at root) in the array_of_errcodes although the maxproc argument is unused in non-root processes. It is allowed to use an array_of_errcodes at some of the calling processes and MPI_ERRCODES_IGNORE at some others. (End of advice to users.)
Rationale for this clarification:
It was not clear that maxproc is significant as input argument only at root while it is needed at all processes to define the length of array_of_errcodes. It was not explicitly forbidden that MPI_ERRCODES_IGNORE is used only at some processes. And there isn't a general rule that all arguments must be the same.
Proposal B
MPI 2.0 page 84 line 45 reads (in MPI_COMM_SPAWN):
OUT array_of_errcodes one code per process (array of integer)
but should read:
OUT array_of_errcodes one code per process (array of integer,
significant only at root)
MPI 2.0 page 89 line 42 reads (in MPI_COMM_SPAWN_MULTIPLE):
OUT array_of_errcodes one code per process (array of integer)
but should read:
OUT array_of_errcodes one code per process (array of integer,
significant only at root)
Comment: This proposal modifies the MPI interface. User codes may be broken. Another reason not to make this modification is that the non-root processes have no chance (in error-return mode) to detect an error. And after an error, MPI does not guarantee that MPI communication still works. Only MPI_Abort should be guaranteed to work.
Add new paragraphs after MPI-2, 8.7.2 page 195 line 9 (the end of the clarification on "Collective calls"):
Advice to users. With three concurrent threads in each MPI process of a communicator comm, it is allowed that thread A in each MPI process calls a collective operation on comm, thread B calls a file operation on an existing filehandle that was formerly opened on comm, and thread C invokes one-sided operations on an existing window handle that was also formerly created on comm. (End of advice to users.)
Rationale. As already specified in MPI_FILE_OPEN and MPI_WIN_CREATE, a file handle and a window handle inherit only the group of processes of the underlying communicator, but not the communicator itself. Accesses to communicators, window handles and file handles cannot affect one another. (End of rationale.)
Advice to implementors. If the implementation of file or window operations internally uses MPI communication then a duplicated communicator may be cached on the file or window object. (End of advice to implementors.)
Rationale for this clarification: The emails have shown that the current MPI-2 text can be misunderstood.
The bindings for predefined types such as MPI::CHAR are const which was fine for MPI-1 but may be inappropriate in MPI-2 since datatypes have names and attributes, both of which can be set.
In addition to changes for MPI::Datatype, add these changes
In the second ballot, we voted to remove const from MPI::Datatype on pages 343, 344, and 345.
If the decision in Change "INOUT" to "IN" for MPI Handle Parameters in several routines is
See also the mail discussions of static const in C++ specification of predefined MPI objects; of Use of const; and of Change "INOUT" to "IN" for MPI Handle Parameters in several routines.
See also the mail discussions of const in C++ specification of predefined MPI objects (was just datatypes) and of Use of const.
The initial mail contains some comments that may be appropriate clarifications that do not change the standard. A follow-up proposes several edits to make the text more specific. The test program and test protocol mentioned in the 3rd email are available through these links.
Ballot 4, Item 10:
Here is a proposal about handling zero-dimensional Cartesian communicators that are produced with MPI_Cart_sub if all remain_dims are false.
MPI-1.1 Sect.6.5.4, page 187, line 42 (end of definition of MPI_Cart_sub) reads
(This function is closely related to MPI_COMM_SPLIT.)
but should read
If all entries in remain_dims are false or comm is already associated with a zero-dimensional Cartesian topology then newcomm is associated with a zero-dimensional Cartesian topology. (This function is closely related to MPI_COMM_SPLIT.)
MPI-1.1 Sect.6.5.4, page 183, add at the end of line 30 (definition of MPI_Cartdim_get and MPI_Cart_get):
If comm is associated with a zero-dimensional Cartesian topology, MPI_Cartdim_get returns ndims=0 and MPI_Cart_get will keep all output arguments unchanged.
MPI-1.1 Sect.6.5.4, page 184, add a new paragraph after line 23 (definition of MPI_Cart_rank):
If comm is associated with a zero-dimensional Cartesian topology, coord is not significant and 0 is returned in rank.
MPI-1.1 Sect.6.5.4, page 184, add a new paragraph after line 39 (definition of MPI_Cart_coords):
If comm is associated with a zero-dimensional Cartesian topology, coords will be unchanged.
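The zero-dimensional rules above fall out of the usual row-major numbering as the empty case. The sketch below is a hypothetical helper pair (toy_cart_rank and toy_cart_size are not MPI functions) and assumes every coordinate is already within range: with ndims equal to 0 neither loop runs, so the coordinate array is never read, the rank is 0, and the grid holds exactly one process.

```c
#include <stddef.h>

/* Row-major rank of `coords` in a grid with `ndims` dimensions.
 * Assumes 0 <= coords[i] < dims[i] for every i.
 * With ndims == 0 the loop body never runs: neither array is read
 * and the result is 0. */
int toy_cart_rank(int ndims, const int *dims, const int *coords)
{
    int rank = 0;
    for (int i = 0; i < ndims; i++)
        rank = rank * dims[i] + coords[i];
    return rank;
}

/* Number of processes in the grid: the product of all dimension sizes.
 * The empty product is 1, so a zero-dimensional grid has one process. */
int toy_cart_size(int ndims, const int *dims)
{
    int size = 1;
    for (int i = 0; i < ndims; i++)
        size *= dims[i];
    return size;
}
```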
Alternative A: (This proposal was chosen by the MPI Forum at the March 2008 meeting)
MPI-1.1 Sect.6.5.5, page 186, after line 47 (end of definition of MPI_Cart_shift), the following paragraph is added:
It is erroneous to call MPI_CART_SHIFT with a direction that is either negative or greater than or equal to the number of dimensions in the Cartesian communicator. This implies that it is erroneous to call MPI_CART_SHIFT with a comm that is associated with a zero-dimensional Cartesian topology.
Alternative B: (This proposal was rejected)
MPI-1.1 Sect.6.5.5, page 186, after line 47 (end of definition of MPI_Cart_shift), the following paragraph is added:
If comm is associated with a zero-dimensional Cartesian topology, then the input arguments direction and disp are ignored and always MPI_PROC_NULL is returned in rank_source and rank_dest.
(End of Alternative A and B)
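The implication stated in Alternative A can be checked mechanically. In this small sketch, toy_shift_direction_is_valid is a hypothetical helper, not an MPI routine; it encodes the rule that a direction must be non-negative and less than the number of dimensions, so no direction at all is valid when ndims is 0.

```c
/* Returns 1 if `direction` names an existing dimension of a Cartesian
 * topology with `ndims` dimensions, and 0 otherwise (Alternative A). */
int toy_shift_direction_is_valid(int ndims, int direction)
{
    return direction >= 0 && direction < ndims;
}
```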
MPI-1.1 Sect.6.5.1, page 179, lines 29-30 (end of definition of MPI_Cart_create) read
The call is erroneous if it specifies a grid that is larger than the group size.
but should read
If ndims is zero then a zero-dimensional Cartesian topology is created. The call is erroneous if it specifies a grid that is larger than the group size or if ndims is negative.
MPI-1.1 Sect.6.5.4, page 184, line 30 (definition of MPI_Cart_coords) reads
IN maxdims length of vector coord in the calling program (integer)
but should read (missing "s" at coords)
IN maxdims length of vector coords in the calling program (integer)
(Also included in MPI-1.3)
Ballot 4, Item 11:
A second part of this discussion thread handles graph topologies and addresses empty groups, multiple edges and self loops.
MPI-1.1, Sect. 6.5.3, page 181, line 1-3 read:
If the size, nnodes, of the graph is smaller than the size of the group of comm, then some processes are returned MPI_COMM_NULL, in analogy to MPI_CART_CREATE and MPI_COMM_SPLIT.
but should read
If the size, nnodes, of the graph is smaller than the size of the group of comm, then some processes are returned MPI_COMM_NULL, in analogy to MPI_CART_CREATE and MPI_COMM_SPLIT. If the graph is empty, i.e., nnodes == 0, then MPI_COMM_NULL is returned in all processes.
Rationale for this clarification:
As in MPI_COMM_CREATE, empty groups are allowed. Because empty groups are described in a different way here, they should be mentioned explicitly.
After MPI-1.1, Sect. 6.5.3, page 181, line 35, the following paragraph should be added:
A single process is allowed to be defined multiple times in the list of neighbors of a process (i.e., there may be multiple edges between two processes). A process is also allowed to be a neighbor to itself (i.e., a self loop in the graph). The adjacency matrix is allowed to be non-symmetric.
Advice to users. Performance implications of using multiple edges or a non-symmetric adjacency matrix are not defined. The definition of a node-neighbor edge does not imply a direction of the communication. (End of advice to users.)
Rationale for this clarification:
The Example 6.3, MPI-1.1, page 15, line 29 - page 186, line 13, clearly shows multiple edges between nodes and self loops: the two (multiple) self-loops of node 0 and of node 7. Nowhere is it forbidden for the graph to have edges in only one direction.
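A small made-up graph shows how multiple edges, a self loop, and a non-symmetric adjacency look in the cumulative index/edges layout used by MPI_GRAPH_CREATE. The helpers below (toy_neighbor_count, toy_edge_multiplicity) and the arrays named node_index and edges in the usage note are hypothetical and play the role of the index and edges arguments; this is not the graph of Example 6.3.

```c
/* node_index[i] holds the total number of neighbors of nodes 0..i,
 * so the neighbors of `rank` occupy a contiguous slice of `edges`. */
int toy_neighbor_count(const int *node_index, int rank)
{
    return rank == 0 ? node_index[0]
                     : node_index[rank] - node_index[rank - 1];
}

/* Counts how many times `other` appears in the neighbor list of `rank`.
 * A result above 1 means multiple edges; other == rank is a self loop. */
int toy_edge_multiplicity(const int *node_index, const int *edges,
                          int rank, int other)
{
    int begin = rank == 0 ? 0 : node_index[rank - 1];
    int count = 0;
    for (int i = begin; i < node_index[rank]; i++)
        if (edges[i] == other)
            count++;
    return count;
}
```

For a three-node graph with node_index = {3, 4, 4} and edges = {0, 1, 1, 0}, node 0 lists itself once and node 1 twice, node 1 lists node 0 only once, and node 2 has no neighbors.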
After MPI-1.1, Sect. 6.4, page 178, end of the sentence on lines 6-7, the following sentence should be added:
All input arguments must have identical values on all processes of the group of comm_old.
Rationale for this clarification: This statement is missing.
In the mails it is mentioned that the description of the send and receive counts arguments could be interpreted as allowing the receive counts to be at least as large as required by the send count, rather than exactly matching the count as defined by the type signatures.
MPI-1, Sect. 4.8, MPI_ALLTOALLV, page 112, lines 37-40 clearly states that resulting send and receive type signature must be pairwise the same.
Therefore, a clarification is not necessary and not proposed.
MPI_File_get_view return copies of the datatypes for
the filetype and etype?
The question really is "can (and must) the user free those datatypes"? For other MPI routines, the answer is always yes, but here the original datatype may be a predefined type, which may not be freed.
The request for clarification is related to a multi-threaded execution with concurrent completion of a request in one thread and canceling of the same request in another thread.
MPI_Waitall) with a count of zero
No specific proposal yet.
MPI-2, Sect. 7.3.6, page 167, lines 6-8 read:
The reason that MPI-1 chose the inclusive scan is that the definition of behavior on processes zero and one was thought to offer too many complexities in definition, particularly for user-defined operations. (End of rationale.)
but should read:
No in-place version is specified for MPI_EXSCAN because it is not clear what this means for the process for rank zero. The reason that MPI-1 chose the inclusive scan is that the definition of behavior on processes zero and one was thought to offer too many complexities in definition, particularly for user-defined operations. (End of rationale.)
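The difficulty at rank zero is easier to see in a serial model of the two scans. In this sketch, which assumes integer addition as the operation, element i of each array stands for rank i, and toy_inclusive_scan and toy_exclusive_scan are hypothetical helpers rather than MPI calls. The exclusive version never writes out[0], mirroring the fact that rank zero has no predecessors to combine.

```c
/* in[i] is the contribution of rank i; out[i] is the result at rank i. */
void toy_inclusive_scan(const int *in, int *out, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += in[i];
        out[i] = sum;   /* includes rank i's own contribution */
    }
}

/* Exclusive variant: out[i] combines ranks 0..i-1 only.
 * out[0] is deliberately left untouched, since rank 0 has no
 * predecessors and there is nothing to store for it. */
void toy_exclusive_scan(const int *in, int *out, int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            out[i] = sum;
        sum += in[i];
    }
}
```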
Add the following clarification to the current interface definitions of MPI_GET_PROCESSOR_NAME and MPI_COMM_GET_NAME.
MPI 1.1, Sect. 7.1, routine MPI_GET_PROCESSOR_NAME, page 193, add after line 20:
In C, a \0 is additionally stored at name[resultlen]. resultlen cannot be larger than MPI_MAX_PROCESSOR_NAME-1. In Fortran, name is padded on the right with blank characters. resultlen cannot be larger than MPI_MAX_PROCESSOR_NAME.
MPI-1.1, Sect. 7.1, page 193, beginning of line 29 reads
examine the ouput argument
but should read (additional t in output)
examine the output argument
MPI 2.0, Sect. 8.4, routine MPI_COMM_GET_NAME, page 178, add after line 48:
In C, a \0 is additionally stored at name[resultlen]. resultlen cannot be larger than MPI_MAX_OBJECT_NAME-1. In Fortran, name is padded on the right with blank characters. resultlen cannot be larger than MPI_MAX_OBJECT_NAME.
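The C rule can be sketched with a stand-in for the name routines. TOY_MAX_NAME is a hypothetical, deliberately small substitute for MPI_MAX_PROCESSOR_NAME or MPI_MAX_OBJECT_NAME, and toy_get_name is not an MPI function; truncating an over-long name is an assumption of this sketch. A buffer of TOY_MAX_NAME characters always has room for the name plus the '\0' stored at name[resultlen], which is why resultlen is bounded by the maximum minus one.

```c
#include <string.h>

#define TOY_MAX_NAME 8  /* hypothetical stand-in for the MPI_MAX_..._NAME constants */

/* Copies `src` into `name` (TOY_MAX_NAME chars), truncating if needed
 * (an assumption of this sketch), and stores '\0' at name[resultlen].
 * Returns resultlen, which is at most TOY_MAX_NAME - 1. */
int toy_get_name(const char *src, char name[TOY_MAX_NAME])
{
    size_t len = strlen(src);
    if (len > TOY_MAX_NAME - 1)
        len = TOY_MAX_NAME - 1;
    memcpy(name, src, len);
    name[len] = '\0';
    return (int)len;
}
```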
There is additional information, including results of checking the behavior of many MPI implementations, in the mail discussion. The test programs and some results are also available here:
MPI-2.0 Sect. 9.5.3 User-defined Data Representations, page 254, lines 13-15 read:
Then in subsequent calls to the conversion function, MPI will increment the value in position by the count of items converted in the previous call.
but should read:
Then in subsequent calls to the conversion function, MPI will increment the value in position by the count of items converted in the previous call, and the userbuf pointer will be unchanged.
Rationale for this clarification:
It was not clear whether the userbuf pointer must also be moved in the subsequent calls. This clarification was already made in 1999 and should already be implemented in existing implementations of user-defined data representations.
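The calling pattern described by the clarified text can be sketched with a much-simplified conversion function. The real MPI_Datarep_conversion_function also takes a datatype and an extra_state argument; the two functions below are hypothetical and assume a contiguous array of int in the user buffer and one unsigned char per item in the file representation. The driver passes the same userbuf on every call and advances only position (and the chunk of file data), so the conversion function itself must offset into userbuf.

```c
/* Simplified, hypothetical read conversion: converts `count` items from
 * `filebuf` and stores them in `userbuf` starting at item `position`. */
void toy_read_conversion(int *userbuf, const unsigned char *filebuf,
                         int count, int position)
{
    for (int i = 0; i < count; i++)
        userbuf[position + i] = (int)filebuf[i];
}

/* Hypothetical driver playing the role of the MPI library: it converts
 * `total` items in pieces of at most `chunk` items (chunk must be > 0).
 * userbuf is passed unchanged on every call; position grows by the
 * count converted in the previous call. */
void toy_read_in_chunks(int *userbuf, const unsigned char *file,
                        int total, int chunk)
{
    int position = 0;
    while (position < total) {
        int remaining = total - position;
        int count = remaining < chunk ? remaining : chunk;
        toy_read_conversion(userbuf, file + position, count, position);
        position += count;
    }
}
```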
A clarification is not necessary and not proposed.
Status: The authors of the discussion are asked for a proposal. (Pending)
A clarification is not necessary and not proposed.
MPI-1.1 Sect. 7.5, MPI_Abort, page 200, lines 23-26 read:
This routine makes a "best attempt" to abort all tasks in the group of comm. This function does not require that the invoking environment take any action with the error code. However, a Unix or POSIX environment should handle this as a return errorcode from the main program or an abort(errorcode).
but should read (" or an abort(errorcode)" removed):
This routine makes a "best attempt" to abort all tasks in the group of comm. This function does not require that the invoking environment take any action with the error code. However, a Unix or POSIX environment should handle this as a return errorcode from the main program.
Rationale for this clarification:
POSIX defines void abort(void). The routine void exit(int status) may be used to implement "handle this as a return errorcode from the main program". abort(errorcode) was not substituted by exit(errorcode) because this is technically not enough, if the MPI implementation wants to return it also from mpiexec, see next proposal.
MPI-1.1 Sect. 7.5, MPI_Abort, page 200, add after line 34 (end of rationale):
Advice to users. Whether the errorcode is returned from the executable or from the MPI process startup mechanism (e.g., mpiexec), is an aspect of quality of the MPI library but not mandatory. (End of advice to users.)
Advice to implementors. Where possible, a high quality implementation will try to return the errorcode from the MPI process startup mechanism (e.g. mpiexec or singleton init). (End of advice to implementors.)
Rationale for this clarification:
The intent of the word "should" in "should handle this as a return errorcode from the main program" is only a quality of implementation aspect and not a requirement. This was not clear and could be misinterpreted.
Problem:
An application may repeatedly call
(probably with same (p,r) combination) the MPI_TYPE_CREATE_F90_xxxx routines.
Proposal:
Add after MPI-2.0 Sect. 10.2.5, MPI_TYPE_CREATE_F90_xxxx, page 295, line 47
(End of advice to users.):
Advice to implementors. An application may often repeat a call to MPI_TYPE_CREATE_F90_xxxx with the same combination of (xxxx, p, r). The application is not allowed to free the returned predefined, unnamed datatype handles. To prevent the creation of a potentially huge amount of handles, the MPI implementation should return the same datatype handle for the same (REAL/COMPLEX/INTEGER, p, r) combination. Checking for the combination (p,r) in the preceding call to MPI_TYPE_CREATE_F90_xxxx and using a hash-table to find formerly generated handles should limit the overhead of finding a previously generated datatype with same combination of (xxxx,p,r). (End of advice to implementors.)
Rationale for this clarification:
Currently most MPI implementations handle the MPI_TYPE_CREATE_F90_xxxx functions incorrectly or not with the requested quality.
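The caching idea in the advice to implementors might look like the following sketch. It is an assumption-laden toy, not how any particular MPI library does it: handles are plain integers, the table is a fixed-size open-addressing hash table, and all names (toy_type_create_f90, toy_cache, enum toy_kind) are hypothetical. The point it illustrates is only that repeated requests for the same (kind, p, r) combination return the same handle instead of creating a new one.

```c
#define TOY_CACHE_SIZE 64  /* hypothetical fixed table size */

enum toy_kind { TOY_REAL, TOY_COMPLEX, TOY_INTEGER };

struct toy_entry {
    int used;
    enum toy_kind kind;
    int p, r;
    int handle;        /* stand-in for a datatype handle */
};

static struct toy_entry toy_cache[TOY_CACHE_SIZE];
static int toy_next_handle = 1;

static unsigned toy_hash(enum toy_kind kind, int p, int r)
{
    return ((unsigned)kind * 31u + (unsigned)p) * 31u + (unsigned)r;
}

/* Returns the handle for (kind, p, r), creating it on the first request
 * and reusing it afterwards. Returns -1 if the table is full.
 * Uses linear probing; entries are never removed, which matches the rule
 * that the application may not free these handles. */
int toy_type_create_f90(enum toy_kind kind, int p, int r)
{
    unsigned start = toy_hash(kind, p, r) % TOY_CACHE_SIZE;
    for (unsigned probe = 0; probe < TOY_CACHE_SIZE; probe++) {
        struct toy_entry *e = &toy_cache[(start + probe) % TOY_CACHE_SIZE];
        if (!e->used) {
            e->used = 1;
            e->kind = kind;
            e->p = p;
            e->r = r;
            e->handle = toy_next_handle++;
            return e->handle;
        }
        if (e->kind == kind && e->p == p && e->r == r)
            return e->handle;
    }
    return -1;
}
```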
Alternative proposal (Rejected by the MPI Forum in the March 2008 meeting):
Instead of giving the implementation hint in the form of advice to implementors, the MPI Forum can modify the MPI standard and require that each call to MPI_TYPE_CREATE_F90_xxxx generates a new datatype handle, which may be freed when it is no longer in use (if the user does not want to waste space).
Because this alternative proposal may break existing application codes, it would be an MPI 3.0 proposal.
Is it an error to specify more than one of these?
The answer is yes (MPI-2, section 9.2.1, page 213, lines 3-4).
A clarification is not necessary and not proposed.
Add text stating that the return value can be null:
MPI-2.0 Sect. 9.2.8, File Info, page 219, lines 11-13 read:
MPI_FILE_GET_INFO returns a new info object containing the hints of the file associated with fh. The current setting of all hints actually used by the system related to this open file is returned in info_used. The user is responsible for freeing info_used via MPI_INFO_FREE.
but should read:
MPI_FILE_GET_INFO returns a new info object containing the hints of the file associated with fh. The current setting of all hints actually used by the system related to this open file is returned in info_used. If no such hints exist, a handle to a newly created info object is returned that contains no key/value pair. The user is responsible for freeing info_used via MPI_INFO_FREE.
Rationale for this clarification:
This text was missing. It was not clear whether an MPI_Info handle would be returned for which MPI_INFO_GET_NKEYS returns nkeys=0. From the user's point of view, this behavior might have been expected even without this clarification. For most implementations, this clarification is irrelevant because they always return several default hints, e.g., the filename.
Note that further modifications and clarifications in MPI 2.0, Sect. 4.10, The Info Object, can be found in Edits to MPI-2 Chapter 4, Miscellany, What info keys can be set?, MPI_File_set_view.
Does the info passed to MPI_File_set_view replace all of the previous info keys? (The answer given in this clarification is "no".)
Proposal:
Add in MPI-2.0 Sect. 9.2.8, File Info, page 218, after line 18 the
following sentences:
When an info object that specifies a subset of valid hints is passed to MPI_FILE_SET_VIEW or MPI_FILE_SET_INFO, there will be no effect on previously set or defaulted hints that the info does not specify.
Rationale for this clarification:
This text was missing. It was not clear whether an info object passed to MPI_FILE_SET_VIEW or MPI_FILE_SET_INFO was intended to replace only the hints it mentions or to substitute a completely new set of hints for the prior set.
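The "update only what is specified" semantics of this clarification can be modeled with a small key/value table. The sketch below is not MPI code: struct hint_set, hint_get, hint_set_one and hint_apply are hypothetical names, and the fixed table and string sizes are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <string.h>

#define MAX_HINTS 16   /* assumed capacity of the table */
#define HINT_LEN  64   /* assumed maximum key/value length incl. terminator */

/* Hypothetical stand-in for the hints attached to an open file. */
struct hint_set {
    int  n;
    char key[MAX_HINTS][HINT_LEN];
    char val[MAX_HINTS][HINT_LEN];
};

/* Return the value stored for key, or NULL if the hint is not set. */
const char *hint_get(const struct hint_set *s, const char *key)
{
    for (int i = 0; i < s->n; i++)
        if (strcmp(s->key[i], key) == 0)
            return s->val[i];
    return NULL;
}

/* Set or overwrite one hint; returns 0 on success, -1 if the table is
   full or a string is too long. */
int hint_set_one(struct hint_set *s, const char *key, const char *val)
{
    if (strlen(key) >= HINT_LEN || strlen(val) >= HINT_LEN)
        return -1;
    for (int i = 0; i < s->n; i++) {
        if (strcmp(s->key[i], key) == 0) {
            strcpy(s->val[i], val);
            return 0;
        }
    }
    if (s->n == MAX_HINTS)
        return -1;
    strcpy(s->key[s->n], key);
    strcpy(s->val[s->n], val);
    s->n++;
    return 0;
}

/* Apply the hints in update to current. Only keys named in update change;
   previously set or defaulted hints that update omits are left alone. */
int hint_apply(struct hint_set *current, const struct hint_set *update)
{
    for (int i = 0; i < update->n; i++)
        if (hint_set_one(current, update->key[i], update->val[i]) != 0)
            return -1;
    return 0;
}
```

For example, applying an update that names only cb_buffer_size to a set that also holds striping_factor changes the former and leaves the latter as it was.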
Note that further modifications and clarifications in MPI 2.0, Sect. 4.10, The Info Object, can be found in Edits to MPI-2 Chapter 4, Miscellany, What info keys can be set?, MPI_File_get_info.
The text added in MPI 1.1 on the error return for MPI_Waitall etc. is written as if the only error handler is MPI_ERRORS_RETURN.
Status: The authors of the discussion are asked for a proposal. (Pending)
A clarification is not necessary and not proposed.
A clarification is not necessary and not proposed.
MPI-2.0, Sect. 7.3.3, routine MPI_REDUCE_SCATTER, page 163, delete the sentence on line 19-20:
Note that the area occupied by the input data may be either longer or shorter than the data filled by the output data.
Rationale for this clarification:
The sentence makes no sense because the input data can never be shorter than the output data. The output, determined by recvcounts[i], is a subset of the input.
Blocklengths of zero are allowed. Do we need to add a statement to this effect? The mail threads contain a deeper discussion of the implications of zero-length blocks in MPI datatypes.
Proposal:
Add the following paragraph in MPI 1.1, Sect. 3.12, page 62,
after line 2 (i.e., after ... "of the types defined by Typesig."):
Most datatype constructors have replication count or block length arguments. Allowed values are nonnegative integers. If the value is zero, no elements are generated in the type map and there is no effect on datatype bounds or extent.
MPI 1.1, Sect 3.12.1, MPI_TYPE_HINDEXED, page 67, line 22-24 read:
IN count number of blocks - also number of entries in
array_of_displacements and array_of_blocklengths
(integer)
but should read:
IN count number of blocks - also number of entries in
array_of_displacements and array_of_blocklengths
(nonnegative integer)
MPI 1.1, Sect 3.12.1, MPI_TYPE_STRUCT, page 68, line 19-22 read:
IN count number of blocks (integer) - also number
of entries in arrays array_of_types,
array_of_displacements and array_of_blocklengths
IN array_of_blocklength number of elements in each
block (array of integer)
but should read:
IN count number of blocks (nonnegative integer) - also number
of entries in arrays array_of_types,
array_of_displacements and array_of_blocklengths
IN array_of_blocklength number of elements in each
block (array of nonnegative integer)
MPI 2.0, Sect 4.14.1, MPI_TYPE_CREATE_HINDEXED, page 66, line 36-38 read:
IN count number of blocks - also number of entries in
array_of_displacements and array_of_blocklengths
(integer)
but should read:
IN count number of blocks - also number of entries in
array_of_displacements and array_of_blocklengths
(nonnegative integer)
MPI 2.0, Sect 4.14.1, MPI_TYPE_CREATE_STRUCT, page 67, line 14-18 read:
IN count number of blocks (integer) - also number
of entries in arrays array_of_types,
array_of_displacements and array_of_blocklengths
IN array_of_blocklength number of elements in each
block (array of integer)
but should read:
IN count number of blocks (nonnegative integer) - also number
of entries in arrays array_of_types,
array_of_displacements and array_of_blocklengths
IN array_of_blocklength number of elements in each
block (array of nonnegative integer)
Rationale for this clarification and modification:
The outcome of zero-count entries in the type map was not defined, so a clarification was needed. The interfaces of MPI_TYPE_CREATE_HINDEXED and MPI_TYPE_CREATE_STRUCT were inconsistent with the rest of the derived datatype routines. This was probably due to editing errors. A meaning for negative values was never defined nor intended. Therefore, portable applications could not use negative values. These editing errors are fixed by this proposal.
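The effect of the zero-length rule proposed above can be illustrated with a small bounds calculation for an hindexed-style layout. This is a simplified model, not the MPI definition: the function name layout_bounds is hypothetical, the old type is assumed to have a lower bound of 0 and a positive extent, and alignment padding is ignored.

```c
#include <assert.h>

/* Compute the lower and upper bound (in bytes) of a layout made of count
   blocks, where block i holds blocklens[i] elements of extent old_extent
   starting at byte displacement displs[i].
   Blocks of length zero contribute no type map entries and therefore do
   not influence the bounds. Returns the number of elements in the type
   map; if that number is 0, *lb and *ub are both set to 0. */
long layout_bounds(int count, const int blocklens[], const long displs[],
                   long old_extent, long *lb, long *ub)
{
    long elements = 0;
    int first = 1;

    *lb = 0;
    *ub = 0;
    for (int i = 0; i < count; i++) {
        assert(blocklens[i] >= 0);   /* negative lengths are not allowed */
        if (blocklens[i] == 0)
            continue;                /* zero-length block: no effect */
        long lo = displs[i];
        long hi = displs[i] + (long)blocklens[i] * old_extent;
        if (first || lo < *lb)
            *lb = lo;
        if (first || hi > *ub)
            *ub = hi;
        first = 0;
        elements += blocklens[i];
    }
    return elements;
}
```

In this model, a zero-length block at a far-away displacement leaves the bounds, and hence the extent ub - lb, exactly as they would be without that block.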
MPI-2.0 Sect. 8.7.3, MPI_Init_thread, page 196, lines 25-26 read:
MPI_THREAD_FUNNELED The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are "funneled" to the main thread).
but should read:
MPI_THREAD_FUNNELED The process may be multi-threaded, but the application must ensure that only the main thread makes MPI calls (for the definition of main thread, see MPI_IS_THREAD_MAIN).
Rationale for this clarification:
The existing document doesn't make it clear that the MPI user has to funnel the calls to the main thread; it's not the job of the MPI library. I have seen multiple MPI users confused by this issue, and when I first read this section, I was confused by it, too.
The ISO/IEC Standard for C has added a number of new required and optional datatypes, such as int32_t and _Bool.
const
Add const to the C binding of MPI routines.
See also discussion of const for COMM_WORLD and COMM_SELF.
See also the mail discussing const in the C++ specification of predefined MPI objects (was just datatypes) and the mail discussing static const in the C++ specification of predefined MPI objects.
In brief, in some cases when using dynamic processes, an application may need to know when a process, recently disconnected with MPI_Comm_disconnect, has exited. There is no easy way within MPI to do this since MPI_Comm_disconnect doesn't wait for the process to exit (and it shouldn't, of course).
MPI_Reduce_scatter
This proposes an extension to MPI to add a constant block-size version of
MPI_Reduce_scatter, much as MPI-2 added
MPI_Type_create_indexed_block. This allows implementations to
optimize this routine. Several uses of
MPI_Reduce_scatter with constant block sizes have recently been
discussed at the EuroPVM/MPI meetings.
Status: The authors of the discussion are asked for a proposal. (Pending)
The example appears to make use of data before the necessary
MPI_Win_complete is called to end the exposure epoch.
The use of integers (and even address-sized integers) in MPI routines such as the point-to-point routines and datatype creation can limit the size of message that may be sent, at least with the natural choice of arguments. Some MPI users now want to send messages greater than 2 GB; the use of MPI datatypes to describe file layouts can also run into trouble on 32-bit systems.
MPI_Type_size returns size as an int:
This is a perennial erratum. However, with file types used for file views, it is not only possible but likely that datatypes will be (in fact, have been) constructed with a size greater than 2 GB (the range of a signed 32-bit int). Since we don't want to change MPI_Type_size, do we need another routine (and deprecate MPI_Type_size)?
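The arithmetic behind this concern can be sketched without any MPI calls. The helper names type_size_64 and size_fits_int below are hypothetical; the sketch simply computes a size in 64-bit arithmetic and checks whether it could be reported through an int, as the C binding of MPI_Type_size does.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Size in bytes of count elements of elem_size bytes each, computed in
   64-bit arithmetic. Returns -1 if either argument is negative or if the
   product would overflow even a 64-bit integer. */
int64_t type_size_64(int64_t count, int64_t elem_size)
{
    if (count < 0 || elem_size < 0)
        return -1;
    if (elem_size != 0 && count > INT64_MAX / elem_size)
        return -1;
    return count * elem_size;
}

/* Nonzero if a nonnegative size can be represented in a C int. */
int size_fits_int(int64_t size)
{
    return size >= 0 && size <= (int64_t)INT_MAX;
}
```

On a platform with a 32-bit int, a file type covering 2^20 blocks of 4096 bytes has a size of 2^32 bytes, which this check reports as not representable in an int.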
This proposes a form of non-blocking connect and accept.
This proposes to remove the restriction that the user may not access a send buffer that is used in a nonblocking send operation.
This asks about the reasoning behind the restriction noted in the MPI standard, and may suggest either the addition of a rationale or dropping the restriction.
This proposal suggests a variation on Exscan that also returns the final result (as if an Allreduce was performed with the same data).
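A sequential model may make the intended result of such an operation concrete. The function name exscan_with_total is hypothetical and nothing here is taken from the proposal text itself; the sketch assumes a sum reduction and models each rank's contribution as one array element.

```c
#include <assert.h>

/* Sequential model of an exclusive scan that also yields the full
   reduction. in[i] is the contribution of rank i.
   For i > 0, prefix[i] receives in[0] + ... + in[i-1]. prefix[0] is left
   untouched, mirroring MPI_Exscan, where the result on rank 0 is
   undefined. The return value is the sum over all ranks, i.e. what an
   Allreduce with the same data would deliver to every rank. */
long exscan_with_total(int nranks, const long in[], long prefix[])
{
    long running = 0;

    for (int i = 0; i < nranks; i++) {
        if (i > 0)
            prefix[i] = running;
        running += in[i];
    }
    return running;
}
```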
This proposal suggests additional operations to support segmented scans and similar computations.
There is an inconsistency between the INOUT description for handle arguments and their usage in the language-independent definitions of MPI-1.1 and MPI-2.0. There are three possible ways to resolve this. The MPI Forum should decide which one is best.
Alternative A: (This proposal was chosen by the MPI Forum in the March 2008 meeting)
Proposal: MPI 2.0 Sect. 2.3 Procedure Specification, page 6 lines 30-34 read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call, then the argument is marked OUT. It is marked this way even though the handle itself is not modified - we use the OUT attribute to denote that what the handle references is updated. Thus, in C++, IN arguments are either references or pointers to const objects.
but should read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call, then the argument is marked INOUT or OUT. It is marked this way even though the handle itself is not modified - we use the INOUT or OUT attribute to denote that what the handle references is updated. Thus, in C++, IN arguments are either references or pointers to const objects.
Change the three inconsistent interface definitions from IN to INOUT in MPI-1.1 - see list of MPI 1.1 routines below.
Rationale for this clarification:
This is the minimal change to remove the existing inconsistency. Only the Fortran interfaces of three deprecated MPI-1.1 routines are modified from IN to INOUT. Because Fortran uses call by reference, this has no impact on applications. In the C interfaces, the handle argument is passed by value.
Alternative B:
This proposal was deferred to MPI 2.2.
Keep the argument definition for handling the opaque objects (INOUT) and add the argument definition for the handles as IN.
Proposal: MPI 2.0 Sect. 2.3 Procedure Specification, page 6 lines 30-34 read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call, then the argument is marked OUT. It is marked this way even though the handle itself is not modified - we use the OUT attribute to denote that what the handle references is updated. Thus, in C++, IN arguments are either references or pointers to const objects.
but should read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call but the handle itself is not modified, then the argument is marked IN/INOUT. We use the first part (IN) to specify the use of the handle and the second part (INOUT) to specify the use of the opaque object. Thus, in C++, IN arguments are either references or pointers to const objects, IN/INOUT arguments are references to const handles to non-const objects.
In the following routines, the INOUT handle declaration (in MPI-2.0) and the IN handle declaration (in MPI-1.1) are changed to an IN/INOUT handle declaration.
MPI 1.1:
MPI_ATTR_PUT, MPI_ATTR_DELETE, MPI_ERRHANDLER_SET
MPI 2.0:
MPI_INFO_SET, MPI_INFO_DELETE, MPI_COMM_SET_ERRHANDLER, MPI_TYPE_SET_ERRHANDLER, MPI_WIN_SET_ERRHANDLER, MPI_GREQUEST_COMPLETE, MPI_COMM_SET_NAME, MPI_TYPE_SET_NAME, MPI_WIN_SET_NAME, MPI_COMM_SET_ATTR, MPI_TYPE_SET_ATTR, MPI_WIN_SET_ATTR, MPI_COMM_DELETE_ATTR, MPI_TYPE_DELETE_ATTR, MPI_WIN_DELETE_ATTR, MPI_FILE_SET_SIZE, MPI_FILE_PREALLOCATE, MPI_FILE_SET_INFO, MPI_FILE_SET_VIEW, MPI_FILE_WRITE_AT, MPI_FILE_WRITE_AT_ALL, MPI_FILE_IWRITE_AT, MPI_FILE_READ, MPI_FILE_READ_ALL, MPI_FILE_WRITE, MPI_FILE_WRITE_ALL, MPI_FILE_IREAD, MPI_FILE_IWRITE, MPI_FILE_SEEK, MPI_FILE_READ_SHARED, MPI_FILE_WRITE_SHARED, MPI_FILE_IREAD_SHARED, MPI_FILE_IWRITE_SHARED, MPI_FILE_READ_ORDERED, MPI_FILE_WRITE_ORDERED, MPI_FILE_SEEK_SHARED, MPI_FILE_WRITE_AT_ALL_BEGIN, MPI_FILE_WRITE_AT_ALL_END, MPI_FILE_READ_ALL_BEGIN, MPI_FILE_READ_ALL_END, MPI_FILE_WRITE_ALL_BEGIN, MPI_FILE_WRITE_ALL_END, MPI_FILE_READ_ORDERED_BEGIN, MPI_FILE_READ_ORDERED_END, MPI_FILE_WRITE_ORDERED_BEGIN, MPI_FILE_WRITE_ORDERED_END, MPI_FILE_SET_ATOMICITY, MPI_FILE_SYNC
Rationale for this proposal:
I have checked the complete MPI 1.1 and 2.0 standards to find all routines with an argument specification that follows this declaration pattern:
Language independent interface: INOUT handle
C interface: MPI_handletype handle
All these routines keep the handle itself unchanged, but the opaque object is modified in a way that can be detected with other MPI routines. For example, an attribute is cached or changed, a file pointer is moved, or the content of a file is modified.
The current specification with IN (in MPI 1.1) or INOUT (in MPI 2.0) is inadequate and led to misinterpretation in the const declarations of the C++ interface.
It is not explicitly mentioned that IN/IN is abbreviated with IN, and OUT/OUT with OUT. (Therefore, no change is needed in routines with pure IN and pure OUT handles/opaque objects.)
This proposal changes the Fortran interface, because the handles themselves are now declared as IN. MPI-2.0 did not decide whether they are IN or INOUT. Only the C/C++ interfaces specified call by value for the handles themselves. This has no impact on applications. It is not expected to have any impact on any MPI implementation.
Alternative C:
This proposal was deferred to MPI 2.2.
Replace the argument definition for the opaque objects (INOUT) with the argument definition for the handles (IN).
Proposal: MPI 2.0 Sect. 2.3 Procedure Specification, page 6 lines 30-34 read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call, then the argument is marked OUT. It is marked this way even though the handle itself is not modified - we use the OUT attribute to denote that what the handle references is updated. Thus, in C++, IN arguments are either references or pointers to const objects.
but should read:
There is one special case - if an argument is a handle to an opaque object (these terms are defined in Section 2.5.1), and the object is updated by the procedure call but the handle itself is not modified, then the argument is marked IN. Thus, in C++, IN arguments are either references or pointers to const objects, or references to const handles to non-const objects.
In all MPI-2.0 routines from the list above, the INOUT handle declaration is changed to an IN handle declaration.
Rationale for this proposal:
This proposal is easier, but loses the INOUT information on the opaque object itself.
As with Alternative B, this proposal changes the Fortran interface, because the handles themselves are now declared as IN. MPI-2.0 did not decide whether they are IN or INOUT. Only the C/C++ interfaces specified call by value for the handles themselves. This has no impact on applications. It is not expected to have any impact on any MPI implementation.
See also the mail discussing const in the C++ specification of predefined MPI objects (was just datatypes). The decision there must be based on the decision for this item.
Note that this error does not appear in the MPI standard.
Note that this error does not appear in the MPI-2 standard (see page 40).