
The diagram shows five processes, each running a different program. In this case,
each program calculates the population of a given group, where each group's growth
depends on that of its neighbors. At each time step, every process calculates its
current state, then shares information with the processes handling the neighboring
populations, so that they can all go on to calculate the state at the next time step.
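The compute-then-share cycle described above can be sketched in a few lines of Python. This is only a sequential stand-in for the five processes, not a real message-passing program: the names `ring_neighbors`, `step`, and `RULES`, as well as every growth rule and starting value, are assumptions made up for illustration.

```python
# Sequential stand-in for five processes on a ring; every rule and number
# below is invented for illustration.

def ring_neighbors(rank, size):
    """Return the (left, right) neighbor ranks of `rank` in a ring."""
    return (rank - 1) % size, (rank + 1) % size


def step(populations, rules):
    """Advance all populations by one time step.

    Every rule reads only the previous step's values, which mimics each
    process receiving its neighbors' state before computing the next step.
    """
    size = len(populations)
    updated = []
    for rank in range(size):
        left, right = ring_neighbors(rank, size)
        updated.append(
            rules[rank](populations[rank], populations[left], populations[right])
        )
    return updated


# One hypothetical update rule per group: five different "programs".
RULES = [
    lambda own, left, right: own + (left + right) // 10,
    lambda own, left, right: own + left // 5,
    lambda own, left, right: own + right // 5,
    lambda own, left, right: own - (left + right) // 20,
    lambda own, left, right: own,
]

if __name__ == "__main__":
    state = [100, 200, 300, 400, 500]
    for t in range(3):
        state = step(state, RULES)
        print(t + 1, state)
```

In a real run, each rule would live in its own program and the list lookups would be replaced by messages exchanged between neighboring processes.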
The load balancing for this program is static (pre-scheduled): each process's load
is fixed at the start of the application and cannot be adjusted while it runs. The
loads are also likely to be unequal, because the different programs require different
amounts of computation before sharing state.
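To see what unequal static loads can cost, here is a rough sketch that estimates idle time per step. It assumes, as a simplification, that every process waits for the slowest one before the next step begins; with neighbor-only exchanges the waiting actually propagates around the ring instead of happening all at once. The cost figures and the function names are hypothetical.

```python
def idle_times(costs):
    """Time each process spends waiting if all must wait for the slowest."""
    slowest = max(costs)
    return [slowest - c for c in costs]


def parallel_efficiency(costs):
    """Fraction of the total processor time spent on useful computation."""
    return sum(costs) / (len(costs) * max(costs))


# Hypothetical compute cost per time step for each of the five programs.
COSTS = [4, 7, 5, 10, 6]

if __name__ == "__main__":
    print("idle per process:", idle_times(COSTS))
    print("efficiency:", parallel_efficiency(COSTS))
```

Because the schedule is static, nothing in the program can shift work away from the slowest process at run time; any rebalancing would have to be done before the application starts.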
The communication pattern is a ring. This pattern influences how the different programs
are mapped to physical processors: programs that need to communicate should
ideally be only one communication "hop" from each other. Note that at CTC, this
issue is currently irrelevant because on each cluster the interconnect is a single
"all-to-all" switch. Thus, every node in the cluster is always one hop away from
all other nodes. If you use HPC resources elsewhere, then you may have to worry
about network topology.
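The effect of the mapping can be illustrated with a small hop-count sketch. It compares an all-to-all switch with a hypothetical machine whose nodes are themselves wired in a physical ring; the helper names, the two placements, and the assumption that there is exactly one node per program are all made up for this example.

```python
def ring_hops(a, b, n):
    """Hops between nodes a and b when n nodes are wired in a physical ring."""
    d = abs(a - b) % n
    return min(d, n - d)


def switch_hops(a, b, n):
    """Hops between nodes a and b behind a single all-to-all switch."""
    return 0 if a == b else 1


def neighbor_hops(placement, hops):
    """Hop count for each pair of logical ring neighbors.

    placement[i] is the physical node that runs program i; the number of
    nodes is assumed to equal the number of programs.
    """
    n = len(placement)
    return [hops(placement[i], placement[(i + 1) % n], n) for i in range(n)]


if __name__ == "__main__":
    in_order = [0, 1, 2, 3, 4]   # logical neighbors sit on adjacent nodes
    scattered = [0, 2, 4, 1, 3]  # logical neighbors sit two nodes apart
    print(neighbor_hops(in_order, ring_hops))
    print(neighbor_hops(scattered, ring_hops))
    print(neighbor_hops(scattered, switch_hops))
```

On the ring-wired machine the placement changes the hop count for every exchange, while behind the switch any placement gives one hop, which matches the remark about CTC above.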