Major Decision #2: SPMD or Manager/Worker?
If one has chosen the data parallel strategy, there is an additional decision to make:
- Single Program, Multiple Data (SPMD)
- All processes run the same program, operating on different data.
This model is particularly appropriate for problems with a regular, predictable
communication pattern. SPMD programs tend to be scalable if all processes read/write to
files and if global communication is avoided.
- Manager/Worker
- A single process (called the Manager) coordinates the work done on all the other
processes, which are called Workers. The Manager may or may not contribute
to computation. This model has limited scalability due to the communication bottleneck
caused by all Workers needing to communicate with a single Manager.
- The Manager and Workers may all run the same program, or different programs.
If they are running the same program, conditional (if) statements
cause different tasks to run different code segments.
- "Embarrassingly parallel" is a special case. If all tasks can
be run completely independently from each other—i.e., they differ only in their
input, and they are coupled only through their final output—then detailed parallel
programming may be unnecessary. The Manager in this case could be a script (at the
OS level) that initiates separate instances of a serial program on different processors,
then collects and processes the final output.
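To make the "same program, different code segments" case above more concrete, here is a
minimal Python sketch. It is an illustration under stated assumptions, not a production
pattern: threads and queues stand in for separate processes and message passing (a real
code would more likely use a library such as MPI), and the names run, launch, and inbox
are hypothetical. Every rank executes the same function; an if statement on the rank
selects the Manager or the Worker code segment.

```python
import queue
import threading


def run(rank, size, data, inbox, results):
    """Executed by every rank; the branch taken depends only on `rank`."""
    if rank == 0:
        # Manager segment: send one contiguous chunk to each Worker,
        # then gather one partial sum from each of them.
        workers = size - 1
        for w in range(1, size):
            lo = (w - 1) * len(data) // workers
            hi = w * len(data) // workers
            inbox[w].put(data[lo:hi])
        results["total"] = sum(inbox[0].get() for _ in range(workers))
    else:
        # Worker segment: receive a chunk, compute on it, and report back.
        chunk = inbox[rank].get()
        inbox[0].put(sum(x * x for x in chunk))


def launch(size, data):
    """Start `size` ranks (threads here, as an assumption) and return the total.

    Requires size >= 2: one Manager plus at least one Worker.
    """
    inbox = [queue.Queue() for _ in range(size)]  # one mailbox per rank
    results = {}
    threads = [
        threading.Thread(target=run, args=(r, size, data, inbox, results))
        for r in range(size)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results["total"]


if __name__ == "__main__":
    # Sum of squares of 0..9, computed by 1 Manager and 3 Workers.
    print(launch(4, list(range(10))))  # 285
```

With 4 ranks, rank 0 acts as the Manager and ranks 1 to 3 each process one chunk. Every
partial result passes through the Manager's single mailbox, which is the communication
bottleneck described above.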
Additional background material on SPMD and Manager/Worker can be found in the modules
Distributed Memory Programming (specifically section 6.2) and
Performance Basics (specifically