ANU College of Engineering and Computer Science (CECS)
Department of Computer Science
COMP4300 2009: Laboratory 1
Introduction to MPI on the Saratoga Cluster
Saratoga Login IDs

First Name   Surname       Login ID
Simon        Bragg         c43sxb
Markus       Brenner       c43mxb
Steven       Chang         c43sxc
Michael      Chapman       c43mxc
Sumedha      De Silva      c43sxd
Sotirios     Diamand       c43syd
Zi           Dong          c43zxd
Artyom       Dziouba       c43axd
Mathew       Ellis         c43mxe
Christopher  Fraser        c43cxf
Stephen      Gream         c43sxg
Paul         Hartlipp      c43pxh
Omar         Hashmi        c43oxh
John         Haynes        c43jxh
Jie          Hua           c43jzh
Yue          Huang         c43yxh
Jonathon     Hunklinger    c43jyh
Michael      Karas         c43mxk
Paul         Krix          c43pxk
Yuanpeng     Li            c43yxl
Xiang        Li            c43xxl
Wilson       Ly            c43wxl
Swe          Lynn          c43sxl
Francis      Markham       c43fxm
Briely       Marum         c43bxm
Timothy      Mathas        c43txm
Gaurav       Mitra         c43gxm
Ian          Munsie        c43ixm
Sheharyar    Naeem         c43sxn
Minh         Nguyen        c43mxn
Kevin        O'Shea        c43kxo
Alexander    Osborne       c43axo
Christopher  Pelling       c43cxp
Peter        Penglis       c43pxp
Joel         Plane         c43jxp
Benjamin     Polkinghorne  c43bxp
Matthew      Rankin        c43mxr
James        Richards      c43jxr
Huw          Rowlands      c43hxr
Ian          Roxburgh      c43ixr
Matthew      Scott         c43mxs
Travis       Stenborg      c43txs
Richard      Thomas        c43rxt
James        Thomson       c43jxt
Khoi-Nguyen  Tran          c43kxt
Temi         Varghese      c43txv
Naveen       Venugopal     c43nxv
Andrew       Waldron       c43axw
Danny        Wang          c43dxw
Guo          Wang          c43gxw
Ming-Lun     Wen           c43mxw
Nick         Withers       c43nxw
Ji           Wong          c43jxw
Yat          Yiu           c43yxy
Li           Zhou          c43lxz

YOU WILL NEED TO ATTEND THE LAB AND ASK ME FOR THE PASSWORD.

Log on to the machine via

ssh saratoga -l c43xxx

Saratoga is a resource within the Computer Systems research group, so it is run and administered by the group (not the Technical Support Group). As a consequence, please use the machine with respect. If you have problems you will need to see me, Rui Yang or Ganesh Venkateshwara (both in room N232).

Be aware: THERE ARE NO BACKUPS ON SARATOGA. It is your responsibility to periodically move your files back to the student system. File space is also tight. You have a quota of 200MB, but there is not enough space on the disk for everyone to use their full quota, so please clean up periodically.

Saratoga is the front end of a Beowulf cluster, providing a bridge between the outside world and the actual cluster.
Saratoga has a single 2GHz Opteron processor (see cat /proc/cpuinfo). The actual cluster has 7 working nodes (one is not working!) that are (imaginatively) named node00 and node02 to node07 (guess which one is dead!). Each of these nodes has a 2.2GHz dual-core Athlon processor, so in total you can run on 14 cores over 7 nodes. While you can log on to the nodes of the cluster (do "ssh node00" from saratoga), you will not normally do this. Rather, you will submit jobs to the cluster using a queuing system. We will use Sun N1 Grid Engine (more on this later). On Saratoga you should be able to use emacs, kate and vi/vim to edit your files.
Example Programs

cp /tmp/COMP4300/lab1.tar .
tar -xvf lab1.tar

mpiexample1.c

This program is just to get started. It looks like:

#include

Note there are 3 basic requirements for ALL MPI codes:
#include "mpi.h"
MPI_Init( &argc, &argv );
MPI_Finalize();
You can find the header file in
/opt/cluster/mpich_127/include/mpi.h. Take a look at it.
It provides the definition of MPI_COMM_WORLD - what integer
value does this take?
MPI_Init and MPI_Finalize should be the first and
last executable statements in your code, basically because it is not clear
what happens before or after calls to these functions! "man
MPI_Init" says: "The MPI standard does not say what a program can do before an MPI_INIT or after an MPI_FINALIZE. In the MPICH implementation, you should do as little as possible. In particular, avoid anything that changes the external state of the program, such as opening files, reading standard input or writing to standard output." If you want to know what an MPI function does you can:
Compile the code:

make mpiexample1

This will result in

/opt/cluster/mpich_127/bin/mpicc -c mpiexample1.c
/opt/cluster/mpich_127/bin/mpicc -o mpiexample1 mpiexample1.o

mpicc is a wrapper that ends up calling a standard C compiler (in this case gcc). (Do mpicc -v mpiexample1.c to see all the details!) mpicc also ensures that the program links with the MPI library.

Run the code interactively by typing

mpiexample1

You should find the executable runs, but using just one process. With some MPI implementations the code will fail because you have not defined the number of processes to be used. Using MPICH this is done with the command mpirun. Try running the code interactively again, but this time by typing

mpirun -np 2 mpiexample1

Now try

mpirun -np 6 mpiexample1

(Don't set -np to anything over 10.) If you run this program enough times you may see that the order in which the output appears changes. Output to stdout is line buffered, but beyond that lines from different processes can appear in any order. mpirun has a host of different options; do "man mpirun" for information. The "-np" option gives the number of processes that you wish to spawn.

Now we will run the same job, but using the batch queuing system. To submit a job to the queuing system we have to write a batch script. An example of this is given in the file batch_job; take a look at it. Lines starting with "#$" are commands to the queuing system, informing it of how many resources you require and how your job should be executed. We use one of these lines to set the number of processors you want to use. After all this setup information you run the job by issuing the mpirun command, taking the number of processes from the number of processors allocated by the queuing system. To submit your job to the queuing system use qsub; it will respond with something like:
qsub batch_job
Your job 84 ("mpich_job") has been submitted
where 84 is the id of the job in the queuing system. To see
what is happening on the batch queue do
c43tut@saratoga:~/lab1> qstat
job-ID  prior    name       user    state  submit/start at      queue         slots  ja-task-ID
----------------------------------------------------------------------------------------------
    86  0.55500  mpich_job  c43tut  r      03/07/2009 10:46:45  all.q@node04      8
    87  0.55500  mpich_job  c43tut  qw     03/07/2009 10:46:37

This shows two jobs, one running (state r) and one waiting (state qw). The running job is using 8 slots. To delete a job from the queue, do

qdel 86
c43tut has deleted job 86

The queuing system has a total of 14 execution slots (this is determined at setup, and is set to 14 because we have 7 nodes, each with a dual-core processor). If you submit a job that uses all 14 slots, no other user will be able to run until your job finishes. In this respect the system is said to be space sharing, rather than time sharing. Make sure you are happy with all of the above.
Exercise 1
Modify the code in mpiexample1 to also print out the name of the node each process is executing on. Do this by using the system call

gethostname(name, sizeof(name));
Exercise 2
Throughout the course we will be measuring the elapsed time taken to run our parallel jobs, so we start by assessing how good our various timing routines are.
Exercise 3
Exercise 4
Exercise 5
This is a tricky question! mpiexample4.c uses a binary tree to perform a basic broadcast.
Please direct all enquiries to: Alistair.Rendell@anu.edu.au
Page authorised by: Head of Department, DCS
The Australian National University, CRICOS Provider Number 00120C