Major Decision #1: Functional or Data Parallelism?
If your application features widespread, unavoidable data dependencies, you may
choose to...
- Partition by task (functional parallelism)
- Each process performs a different "function" or executes a different code section
- First identify functions, then look at the data requirements
- Commonly programmed with message-passing libraries
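A minimal sketch of partitioning by task, in Python for brevity. It assumes that a thread pool stands in for separate processes; a real code would more likely use MPI ranks. The three task functions are hypothetical. Each worker runs a different code section on the same input. Under CPython, threads will not speed up this CPU-bound work, so read the sketch as an illustration of the partitioning, not of performance.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tasks: each one is a different "function" of the program.
def total(values):
    return sum(values)

def largest(values):
    return max(values)

def count_even(values):
    return sum(1 for v in values if v % 2 == 0)

def run_by_task(values):
    """Functional parallelism: one worker per task, all sharing the same input.

    Assumption: threads stand in for processes (e.g. MPI ranks).
    """
    tasks = {"total": total, "largest": largest, "count_even": count_even}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, values) for name, fn in tasks.items()}
        return {name: f.result() for name, f in futures.items()}

print(run_by_task([3, 8, 1, 6, 5]))
# -> {'total': 23, 'largest': 8, 'count_even': 2}
```

Every task sees the whole data set, so the number of distinct tasks (three here) caps the available parallelism.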
Conversely, if your application consists of functions that operate on relatively
independent pieces of data, you may opt to...
- Partition by data (data parallelism)
- Each process does the same work on a unique piece of data
- "Owner computes": first divide the data; each process then becomes responsible for
whatever work is needed to process its data.
- Data placement is an essential part of a data-parallel algorithm
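- Usually more scalable than functional parallelism, because the number of data
partitions can grow with the problem size, while the number of distinct functions is fixed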
- Can be programmed at a high level (loop directives) with OpenMP, especially on a
shared-memory machine
- Can be programmed at a lower level (subroutine calls) using a message-passing library
like MPI, especially on a distributed-memory machine
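A minimal "owner computes" sketch, in Python for brevity. It makes three assumptions. First, threads stand in for processes. Second, contiguous block partitioning is used, which is one common choice among several. Third, the sum-of-squares kernel is hypothetical. The data is divided first. Each worker then does the same work on the block it owns, and the partial results are combined. In an MPI version, each rank would typically compute its own bounds from its rank number and keep only its slice in local memory, which is why data placement matters.

```python
from concurrent.futures import ThreadPoolExecutor

def block_bounds(n, nworkers, rank):
    """Return the [start, stop) range of the contiguous block owned by `rank`.

    The first n % nworkers ranks each receive one extra element.
    """
    base, extra = divmod(n, nworkers)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def sum_of_squares_block(data, start, stop):
    # Same work on every worker; only the slice it owns differs.
    return sum(x * x for x in data[start:stop])

def sum_of_squares(data, nworkers=4):
    """Data parallelism: divide the data first, then each owner computes on its block.

    Assumption: threads stand in for processes (e.g. MPI ranks).
    """
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        partials = [
            pool.submit(sum_of_squares_block, data, *block_bounds(len(data), nworkers, r))
            for r in range(nworkers)
        ]
        # Combine the per-owner partial results (a reduction).
        return sum(p.result() for p in partials)

print([block_bounds(10, 4, r) for r in range(4)])
# -> [(0, 3), (3, 6), (6, 8), (8, 10)]
print(sum_of_squares(list(range(10))))
# -> 285
```

Adding workers only means cutting the data into more blocks, which is the scalability argument made above.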
The two approaches can be used in combination: a program can be partitioned
by function, and each function can then be partitioned by data. In addition, there are
some cases in which the distinction between the two categories blurs.
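A minimal sketch of the combined approach, in Python for brevity. It assumes that threads stand in for processes, and the two task functions are hypothetical. The program is first partitioned by function into two tasks. Each task is then partitioned by data across its own group of workers, and each task's partial results are reduced separately.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(data, nparts):
    """Split data into nparts contiguous, nearly equal slices."""
    base, extra = divmod(len(data), nparts)
    out, start = [], 0
    for r in range(nparts):
        stop = start + base + (1 if r < extra else 0)
        out.append(data[start:stop])
        start = stop
    return out

# Hypothetical functions; each is later split across its own group of workers.
def sum_squares(part):
    return sum(x * x for x in part)

def count_negative(part):
    return sum(1 for x in part if x < 0)

def hybrid(data, workers_per_task=2):
    """Partition by function first, then partition each function by data.

    Assumption: threads stand in for processes (e.g. MPI ranks).
    """
    tasks = {"sum_squares": sum_squares, "count_negative": count_negative}
    parts = chunks(data, workers_per_task)
    with ThreadPoolExecutor(max_workers=len(tasks) * workers_per_task) as pool:
        futures = {
            name: [pool.submit(fn, p) for p in parts]  # data parallelism inside one task
            for name, fn in tasks.items()              # functional parallelism across tasks
        }
        # Reduce each task's partial results separately.
        return {name: sum(f.result() for f in fs) for name, fs in futures.items()}

print(hybrid([-3, 1, -2, 4, 0]))
# -> {'sum_squares': 30, 'count_negative': 2}
```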
Additional material on data and functional parallelism is in the module
Distributed Memory Programming, specifically
section 9.2.