If you already have a serial version of your program and plan to modify it
to use MPI, make sure the serial version is thoroughly debugged before going parallel.
This will make debugging the parallel version much easier. Then
add calls to MPI routines in the appropriate places in your program.

If you are writing an MPI program from scratch and it is not much extra work to
write a serial version first (without MPI calls), do that.
Again, identifying and removing the non-parallel bugs first will make parallel debugging
that much easier. Then design your parallel algorithm, taking advantage of any parallelism
inherent in your serial code, e.g., large arrays that can be split into blocks
and processed independently.
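
As an illustration of splitting a large array into independent blocks, here is a
minimal sketch in C. It contains no MPI calls, so it can be tested serially first.
The helper names block_range and sum_range are hypothetical and are not part of MPI;
in a real MPI program the rank and size arguments would typically come from
MPI_Comm_rank and MPI_Comm_size.

```c
#include <stddef.h>

/* Hypothetical helper (not an MPI routine): compute the contiguous block
 * of an n-element array owned by process `rank` out of `size` processes.
 * The first (n % size) ranks receive one extra element, so block lengths
 * differ by at most one. */
void block_range(size_t n, int rank, int size, size_t *start, size_t *count)
{
    size_t base  = n / (size_t)size;   /* elements every rank gets */
    size_t extra = n % (size_t)size;   /* leftover elements to spread out */
    size_t r     = (size_t)rank;

    *count = base + (r < extra ? 1 : 0);
    *start = r * base + (r < extra ? r : extra);
}

/* Serial kernel: sum the elements a[start] .. a[start + count - 1].
 * Each process could run this on its own block independently. */
double sum_range(const double *a, size_t start, size_t count)
{
    double s = 0.0;
    for (size_t i = start; i < start + count; i++) {
        s += a[i];
    }
    return s;
}
```

One way to use this while still serial is to loop over rank = 0 .. size-1, add up
the per-block results from sum_range, and compare the total with sum_range over
the whole array. If the two agree, the decomposition itself is probably not the
source of later parallel bugs.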

When debugging in parallel, make sure your program runs successfully on a small
number of machines first.
Then increase the number of machines gradually, e.g., from 2
to 4, to 8, etc. That way, any remaining bugs surface in small, cheap runs instead of
wasting a lot of machine time.

We have a set of four development machines, each with two processors,
that are a good place to test your parallel codes. These machines
have a 20-minute time limit, so turnaround is quick.