The goal of the previous lab was to produce an executable of a parallel MPI program.
In this section we will run that executable on the compute machines, using the Velocity
Scheduler (vsched).
Prerequisites:
- Read Velocity Scheduler for Parallel
- You must have the following files in $HOME/lab, either copied from the vwlabs folder
or created in previous lab sections:
- An executable of a parallel MPI program, named karp.parallel
- The input file values
- Batch scripts for serial runs: karp.serial.xml and karp.serial.sh
- Batch scripts for parallel programs:
karp.parallel.xml: submit to request compute nodes
karp.parallel.sh: main batch script, executed when your job starts
karp.setup.sh: called by the main batch script; sets up the files on your job's batch nodes
karp.cleanup.sh: called by the main batch script; cleans up the files on your job's batch nodes
Instructions:
- If you haven't already, run the script setup_ssh_mpd_linux.sh as instructed on the first page of
this lab, Logon and Copy Files.
- Edit karp.parallel.sh to check paths, etc. Compare karp.serial.sh, the script you used in
the serial batch section, to karp.parallel.sh. These are some of the changes you will notice:
- After cd /tmp there is an additional line to create a machines file in the /tmp folder:
vsched -m
- A line starts the MPI daemons:
mpdboot -n 1 -f /tmp/machines
- The line that runs the program has been modified to use mpiexec:
mpiexec -n 4 ./karp.parallel >& karp.parallel.out
- Note that karp.parallel.xml and karp.parallel.sh are written for a multiple-node job. We want to run our first test on a single node:
- Edit karp.parallel.xml to request only one node.
- Edit karp.parallel.sh to run two processes (NPROCS) on one machine (NMACHINES).
- Submit the batch script:
vsched -s karp.parallel.xml
- Check your output files to see that the program and scripts are running
correctly. karpc.parallel.out (C) or karpf.parallel.out (Fortran)
should look like the output in the Results section below.
When it does, go on to the next step.
- Now run your parallel executable on two machines. Edit karp.parallel.xml
to request 2 nodes. Edit karp.parallel.sh
to run four processes (NPROCS) on two machines (NMACHINES). Check karp.setup.sh
and karp.cleanup.sh for paths and file names, and to understand
how they work.
- Submit this job to the batch system with the command:
vsched -s karp.parallel.xml
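The parallel-specific commands described in the instructions can be summarized in a small dry-run sketch. It only prints the commands, so it can be tried off the cluster. The function name parallel_steps is hypothetical, and using NMACHINES as the count passed to mpdboot -n is an assumption; check your own karp.parallel.sh to see how it actually uses NPROCS and NMACHINES.

```shell
#!/bin/bash
# Dry-run sketch: print the parallel-specific commands a batch script such as
# karp.parallel.sh would run. Nothing is executed on the cluster.
# Assumption: NMACHINES is the value given to "mpdboot -n" and NPROCS the
# value given to "mpiexec -n"; the real script may be organized differently.
parallel_steps() {
    nprocs=$1      # number of MPI processes (NPROCS)
    nmachines=$2   # number of machines (NMACHINES)
    echo "cd /tmp"
    echo "vsched -m"
    echo "mpdboot -n $nmachines -f /tmp/machines"
    echo "mpiexec -n $nprocs ./karp.parallel >& karp.parallel.out"
}

# First test in this lab: two processes on one machine.
parallel_steps 2 1
# Second run: four processes on two machines.
parallel_steps 4 2
```

Comparing the two printed sequences shows that only the mpdboot and mpiexec lines change between the single-node and two-node runs.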
Results: When your job has ended, there should be a new folder
$HOME/lab/output that contains your output files.
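A quick way to confirm that the output folder was populated is sketched below. The function name check_outputs is hypothetical, and the *.out pattern is an assumption based on the output file names used in this lab.

```shell
# Report each *.out file in the given folder (default: $HOME/lab/output).
# Returns non-zero if no *.out file is found or if any of them is empty.
check_outputs() {
    dir=${1:-$HOME/lab/output}
    rc=0
    for f in "$dir"/*.out; do
        if [ -s "$f" ]; then
            echo "ok: $f"
        else
            # Also reached when the pattern matches nothing at all.
            echo "missing or empty: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

After the job has ended, running check_outputs with no argument checks $HOME/lab/output; pass another folder as the first argument to check it instead.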
For a C program, karpc.parallel.out should look like this:
Approximation interval is 10
host calculated x = 0.96386
host got x = 0.90855
host got x = 0.65767
host got x = 0.61235
sum, err = 3.14243, 8.333314e-04
Approximation interval is 100
host calculated x = 0.79288
host got x = 0.78793
host got x = 0.78292
host got x = 0.77787
sum, err = 3.14160, 8.333333e-06
Approximation interval is 0
node 0 left
node 2 left
node 1 left
node 3 left
For a Fortran program, karpf.parallel.out should look like this:
Number of approximation intervals = 10
host calculated x= 0.963863445745749
host got x= 0.908549451125783
host got x= 0.657665673972368
host got x= 0.612347443971916
sum, err = 3.14242601481582 8.332738032428288E-004
Number of approximation intervals = 100
host calculated x= 0.792876329390252
host got x= 0.787926017074688
host got x= 0.782924454031102
host got x= 0.777874141723412
sum, err = 3.14160094221945 8.201206881164325E-006
Number of approximation intervals = 0
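Rather than comparing the output by eye, the "sum, err" lines can be pulled out and checked with a short script. This is a minimal sketch: the function names sum_err and sums_look_right are hypothetical, and the 0.01 tolerance is an arbitrary choice that the sums shown above satisfy. It accepts both the C format (comma between the two values) and the Fortran format (spaces only).

```shell
# Print "sum err" pairs from a karp output file (C or Fortran format).
sum_err() {
    awk '/sum, err/ {
        sub(/.*= */, "")   # drop everything up to and including the equals sign
        gsub(/,/, " ")     # the C output puts a comma between the two values
        print $1, $2
    }' "$1"
}

# Succeed only if at least one sum was found and every sum is within 0.01 of pi.
sums_look_right() {
    sum_err "$1" | awk '
        { n++; d = $1 - 3.14159265; if (d < 0) d = -d; if (d > 0.01) bad = 1 }
        END { exit (n == 0 || bad) }'
}
```

For example, sums_look_right $HOME/lab/output/karpc.parallel.out && echo "sums are close to pi" prints the message only when every reported sum passes the check.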