Why have communicators?
This section gives a general explanation of communicators. The
topic is covered in much greater depth in the module
MPI Groups and Communicator Management.
Whether a message is eligible to be picked up by a specific receive call depends on its
tag, source, and communicator. The tag allows the program to distinguish between types
of messages. The source simplifies programming: instead of needing a unique tag for each
message, every process sending the same kind of information can use the same tag, and the
receiver tells the messages apart by their sender. But why is a communicator needed?
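The matching rule can be made concrete with a small sketch. The Python below is a toy model, not MPI itself: the `Message` tuple, the `matches` and `receive` helpers, and the `ANY` wildcard (standing in for MPI's `MPI_ANY_SOURCE` and `MPI_ANY_TAG`) are hypothetical names invented for illustration. It assumes a receive takes the first queued message whose communicator, source, and tag all match.

```python
from collections import namedtuple

# Toy model of an in-flight message; the field names are illustrative, not MPI API.
Message = namedtuple("Message", "comm source tag payload")

ANY = object()  # stands in for wildcards such as MPI_ANY_SOURCE / MPI_ANY_TAG


def matches(msg, comm, source, tag):
    """Return True if a receive posted with (comm, source, tag) may pick up msg."""
    if msg.comm != comm:  # the communicator must always match; it has no wildcard
        return False
    if source is not ANY and msg.source != source:
        return False
    if tag is not ANY and msg.tag != tag:
        return False
    return True


def receive(queue, comm, source=ANY, tag=ANY):
    """Remove and return the first queued message this receive is eligible for."""
    for i, msg in enumerate(queue):
        if matches(msg, comm, source, tag):
            return queue.pop(i)
    return None  # a real blocking receive would wait here instead


queue = [
    Message("world", 1, 7, "from rank 1"),
    Message("world", 2, 7, "from rank 2"),
]
# Same tag, different senders: the source argument tells the messages apart.
first = receive(queue, "world", source=2, tag=7)
```

Both messages carry tag 7, yet the receive picks the one from rank 2 because the source is part of the match. A receive posted on a different communicator would match neither message, whatever its source and tag.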
An example
Suppose you are sending messages between your processes, but you are also calling
a library you obtained elsewhere, which also runs on multiple nodes and
communicates internally using MPI. In this case, you want to make sure that the messages
you send go to your processes and do not get confused with the messages being sent
internally between the processes that make up the library routine. This is where
communicators become important: they allow you to distinguish your program's
MPI calls from the library's MPI calls.
In this example, we have three processes communicating with each other. Each process
also calls a library routine, and the three parallel parts of the library routine
communicate with each other. We want to have two different message "spaces", one
for our messages, and one for the library's messages. We do not want any intermingling
of the messages.
The boxes represent parts of three parallel processes. Time progresses from the
top to the bottom of each diagram. The numbers in parentheses are NOT parameters,
but rather process numbers. For example, send(1) means send a message to process
1, and recv(any) means receive a message from any process. The user's (caller's) code
is in the white (unshaded) boxes. The shaded boxes (callee) represent a parallel
library package being called by the user. Finally, the arrows represent the movement
of a message from sender to receiver.
The diagram below shows what we would like to happen. In this case, everything works
as intended.
However, there is no guarantee that things will occur in this order, since the relative
scheduling of processes on different nodes can vary from run to run. Suppose we
change the third process by adding some computation at the beginning. The sequence
of events might then occur as follows:
In this case, communications do not occur as intended. The first "receive" in process
0 now receives the "send" from the library routine in process 1, not the intended
(and now delayed) "send" from process 2. As a result, all three processes hang.
The solution is for the library developer to request a new, unique communicator
and to specify this communicator in every send and receive call the library makes.
This creates a library ("callee") message space separate from the user's ("caller")
message space.
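The failure and its fix can be imitated with a toy message queue. The sketch below is plain Python rather than MPI, and everything in it is an assumption made for illustration: the `recv` helper, the `arrivals_at_process_0` function, and the communicator names `"world"` and `"lib-private"` are hypothetical. It models only the situation at process 0, where the library's message from process 1 arrives before the delayed user message from process 2. In real MPI code, a library would typically obtain its private communicator by duplicating the caller's with `MPI_Comm_dup`.

```python
ANY = None  # wildcard source, standing in for MPI_ANY_SOURCE


def recv(queue, comm, source=ANY):
    """Pop the first message sent on `comm` (from `source`, if one is given)."""
    for i, (msg_comm, msg_source, _payload) in enumerate(queue):
        if msg_comm == comm and source in (ANY, msg_source):
            return queue.pop(i)[2]
    return None  # a real blocking receive would keep waiting


def arrivals_at_process_0(user_comm, lib_comm):
    """Messages in arrival order at process 0 when process 2 is delayed."""
    return [
        (lib_comm, 1, "library data from process 1"),  # arrives early
        (user_comm, 2, "user data from process 2"),    # delayed
    ]


# Case 1: the library reuses the caller's communicator.
queue = arrivals_at_process_0(user_comm="world", lib_comm="world")
shared_result = recv(queue, "world")  # user code: recv(any)

# Case 2: the library sends on its own private communicator.
queue = arrivals_at_process_0(user_comm="world", lib_comm="lib-private")
separate_result = recv(queue, "world")  # user code: recv(any)
library_result = recv(queue, "lib-private", source=1)  # library code: recv(1)
```

In the first case the user's recv(any) consumes the library's message, which is the mix-up described above. In the second case the same receive skips the library's message, because that message belongs to a different communicator, and the library's own receive still finds it.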
Can tags be used to create separate message spaces? The problem with tags is
that their values are chosen by the programmer, who might pick the same tag
that a parallel library uses internally. With communicators, the system, not the programmer,
assigns the identification: it gives one communicator to the user and a different
communicator to the library, so there is no possibility of overlap.
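The difference between programmer-chosen tags and system-assigned communicators can be sketched as follows. This is again a toy model in Python, not MPI: the `Communicator` class, its `context` attribute, and the module-level counter are hypothetical stand-ins for the bookkeeping an MPI implementation does internally. The `dup` method is meant to mirror the role of `MPI_Comm_dup`, which returns a communicator with the same group of processes but a new communication context.

```python
import itertools

_next_context = itertools.count(1)  # hypothetical system-wide context counter


class Communicator:
    """Toy communicator: the 'system' (this class) picks the context id."""

    def __init__(self, name):
        self.name = name
        self.context = next(_next_context)  # never chosen by the programmer

    def dup(self):
        """Return a communicator for the same processes with a fresh context."""
        return Communicator(self.name + "-dup")


# Tags are picked by hand, so two independent authors can collide.
USER_TAG = 1     # chosen by the application programmer
LIBRARY_TAG = 1  # chosen, independently, by the library author
tags_collide = USER_TAG == LIBRARY_TAG

# Contexts are handed out by the system, so they cannot collide.
user_comm = Communicator("world")
lib_comm = user_comm.dup()
contexts_collide = user_comm.context == lib_comm.context
```

Because neither the application programmer nor the library author ever writes down a context value, there is nothing for them to choose identically by accident.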