Parallel batch

The goal of the previous lab was to produce an executable of a parallel MPI program. In this section we will run that executable on the compute machines, using the Velocity Scheduler (vsched).

Prerequisites:

  1. Read the Velocity Scheduler for Parallel page.
  2. You must have the following files in $HOME/lab, either copied from the vwlabs folder or created in previous lab sections:
  • An executable of a parallel MPI program, named karp.parallel
  • The input file values
  • Batch scripts for serial runs: karp.serial.xml and karp.serial.sh
  • Batch scripts for parallel programs:

    karp.parallel.xml: submit to request compute nodes
    karp.parallel.sh: main batch script, executed when your job starts
    karp.setup.sh: called by the main batch script to set up the files on your job's batch nodes
    karp.cleanup.sh: called by the main batch script to clean up the files on your job's batch nodes
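Before submitting anything, it can help to confirm that the files listed above are in place. The function below is a minimal bash sketch of such a check. The name check_lab_files is hypothetical and not part of the lab materials; the file list is copied from the prerequisites above.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the lab materials): report which
# of the files this section needs are missing from the lab directory.
check_lab_files() {
  local dir=${1:-"$HOME/lab"}   # default to $HOME/lab, as in the prerequisites
  local rc=0 f
  for f in karp.parallel values \
           karp.serial.xml karp.serial.sh \
           karp.parallel.xml karp.parallel.sh \
           karp.setup.sh karp.cleanup.sh; do
    if [[ ! -e "$dir/$f" ]]; then
      echo "missing: $dir/$f"
      rc=1
    fi
  done
  # mpiexec launches karp.parallel directly, so it must be executable.
  if [[ -e "$dir/karp.parallel" && ! -x "$dir/karp.parallel" ]]; then
    echo "not executable: $dir/karp.parallel"
    rc=1
  fi
  return "$rc"
}
```

Run it as check_lab_files (or pass another directory as the first argument). It prints one line per problem and returns a nonzero status if anything is missing, so it can be chained as check_lab_files && vsched -s karp.parallel.xml.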

Instructions:

  1. If you haven't already, run the script setup_ssh_mpd_linux.sh as instructed on the first page of this lab, Logon and Copy Files.
  2. Edit karp.parallel.sh to check paths and file names. Compare karp.serial.sh, the script you used in the serial batch section, with karp.parallel.sh. These are some of the changes you will notice:
    • After cd /tmp there is an additional line to create a machines file in the /tmp folder:
      vsched -m
    • Another new line starts the MPI daemons:
      mpdboot -n 1 -f /tmp/machines
    • The line which runs the program has been modified to use mpiexec:
      mpiexec -n 4 ./karp.parallel >& karp.parallel.out
  3. Note that karp.parallel.xml and karp.parallel.sh are written for a multiple-node job. We want to run our first test on a single node.
    • Edit karp.parallel.xml to request only one node.
    • Edit karp.parallel.sh to run two processes (set NPROCS to 2) on one machine (set NMACHINES to 1).
  4. Submit the batch script:
    vsched -s karp.parallel.xml
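  5. Check your output files to confirm that the program and scripts are running correctly. Depending on whether you built the C or the Fortran version, karpc.parallel.out or karpf.parallel.out should resemble the corresponding output in the Results section below (a two-process run prints fewer per-process lines). When it does, go on to the next step.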
  6. Now run your parallel executable on two machines. Edit karp.parallel.xml to request 2 nodes. Edit karp.parallel.sh to run four processes (set NPROCS to 4) on two machines (set NMACHINES to 2). Check karp.setup.sh and karp.cleanup.sh for paths and file names, and to understand how they work.
  7. Submit this job to the batch system with the command:
    vsched -s karp.parallel.xml
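Steps 3 and 6 both change the job size by editing NPROCS and NMACHINES in karp.parallel.sh. The function below is a minimal sketch of making that edit from the command line. It assumes, without this being confirmed by the lab scripts, that karp.parallel.sh sets each variable on its own line as NPROCS=<n> and NMACHINES=<n>. The name set_job_size is hypothetical, and the node request in karp.parallel.xml still has to be edited separately.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set the process and machine counts in a batch script,
# ASSUMING it assigns them on their own lines as NPROCS=<n> and NMACHINES=<n>.
# The node count in karp.parallel.xml is NOT changed by this function.
set_job_size() {
  local script=$1 nprocs=$2 nmachines=$3
  # Rewrite only assignments at the start of a line; keep a .bak copy.
  sed -i.bak \
      -e "s/^NPROCS=.*/NPROCS=${nprocs}/" \
      -e "s/^NMACHINES=.*/NMACHINES=${nmachines}/" \
      "$script"
  # Print the resulting assignments so the change can be checked.
  grep -E '^(NPROCS|NMACHINES)=' "$script"
}
```

For the single-node test you would run set_job_size karp.parallel.sh 2 1, and for the two-machine run set_job_size karp.parallel.sh 4 2. If the script sets these values some other way (for example on an export line), the sed patterns will not match and the printed lines will show the old values.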

Results: When your job has ended, there should be a new folder $HOME/lab/output that contains your output files.

For a C program, karpc.parallel.out should look like this:

Approximation interval is 10
host calculated x = 0.96386
host got x = 0.90855
host got x = 0.65767
host got x = 0.61235
sum, err = 3.14243, 8.333314e-04
Approximation interval is 100
host calculated x = 0.79288
host got x = 0.78793
host got x = 0.78292
host got x = 0.77787
sum, err = 3.14160, 8.333333e-06
Approximation interval is 0
node 0 left
node 2 left
node 1 left
node 3 left
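If your numbers differ from the sample, it can help to know where they come from. The values above are consistent with karp approximating pi as the integral of 4/(1+x^2) from 0 to 1 with the midpoint rule, with the intervals dealt out to the processes in round-robin order. That reading is an inference from the sample output, not something this lab states. Under that assumption, the hypothetical function below recomputes each process's partial sum for a given number of intervals and processes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the per-rank partial sums karp should report,
# ASSUMING it applies the midpoint rule to 4/(1+x^2) on [0,1] and deals the
# intervals to ranks cyclically (rank r handles intervals r, r+P, r+2P, ...).
expected_partials() {
  local n=$1 nprocs=$2
  awk -v n="$n" -v p="$nprocs" 'BEGIN {
    h = 1.0 / n                        # width of one interval
    total = 0
    for (r = 0; r < p; r++) {
      s = 0
      for (i = r; i < n; i += p) {
        x = h * (i + 0.5)              # midpoint of interval i
        s += 4.0 / (1.0 + x * x)
      }
      s *= h                           # partial integral for rank r
      total += s
      printf "rank %d x = %.5f\n", r, s
    }
    # atan2(0, -1) is pi; the error is the approximation minus pi
    printf "sum, err = %.5f, %e\n", total, total - atan2(0, -1)
  }'
}
```

With 10 intervals and 4 processes, expected_partials 10 4 prints rank 0 x = 0.96386 through rank 3 x = 0.61235, followed by sum, err = 3.14243, 8.333314e-04, which matches the first block of the C sample; rank 0 corresponds to the "host calculated" line.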


For a Fortran program, karpf.parallel.out should look like this:

Number of approximation intervals =           10
host calculated x=  0.963863445745749
host got x=  0.908549451125783
host got x=  0.657665673972368
host got x=  0.612347443971916
sum, err =   3.14242601481582       8.332738032428288E-004
Number of approximation intervals =          100
host calculated x=  0.792876329390252
host got x=  0.787926017074688
host got x=  0.782924454031102
host got x=  0.777874141723412
sum, err =   3.14160094221945       8.201206881164325E-006
Number of approximation intervals =            0
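To check an output file without comparing it by eye, you can test that each "sum, err" line reports a value close to pi. The function below is a minimal sketch of that check. Its name and the default tolerance of 0.001 are assumptions (the 10-interval samples above are already within about 8.4e-4 of pi). It parses both the comma-separated C format and the whitespace-separated Fortran format shown above.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check for karpc.parallel.out or karpf.parallel.out:
# every "sum, err" line should carry a sum within a tolerance of pi.
check_karp_sums() {
  local out=$1 tol=${2:-0.001}   # default tolerance is an assumption
  awk -v tol="$tol" '
    BEGIN { n = 0; bad = 0 }
    /sum, err =/ {
      line = $0
      sub(/.*sum, err =/, "", line)   # keep only the numbers after "="
      gsub(/,/, " ", line)            # the C version separates them with a comma
      split(line, f, " ")             # default whitespace splitting
      d = (f[1] + 0) - atan2(0, -1)   # distance from pi
      if (d < 0) d = -d
      n++
      if (d > tol) { printf "sum %s is not within %s of pi\n", f[1], tol; bad = 1 }
    }
    END {
      if (n == 0) { print "no \"sum, err\" lines found"; bad = 1 }
      exit bad
    }' "$out"
}
```

For example, check_karp_sums $HOME/lab/output/karpc.parallel.out returns status 0 when every sum is within the tolerance and prints a message for each line that is not.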