The Australian National University
ANU College of Engineering and Computer Science (CECS)
Department of Computer Science
High Performance Scientific Computing COMP4300

COMP4300/6430 2009: Laboratory 6

Unix System V IPC and MPI-2 RMA

The aim of this lab is to give you a basic introduction to the use of Unix System V Interprocess Communication (IPC) operations and to MPI-2 Remote Memory Access (RMA).

All the code for today's lab is available on Fremont in /tmp/COMP4300. Copy it and untar it as follows:

cp /tmp/COMP4300/lab6.tar .
tar -xvf lab6.tar
Type make to build all the executables. For the purposes of today's lab you should run all tests interactively.

There are four parts to this lab:

  • Part 1: concerns use of shared memory segments
  • Part 2: concerns use of semaphore arrays
  • Part 3: involves using both of the above to parallelise the heat distribution problem
  • Part 4: concerns use of MPI-2 RMA with the active target model

Part 1: Unix System V Shared Memory Segments


Program shmem_sum.c computes the sum of the integers from 0 to MAXSUM (i.e. 0+1+2+...+MAXSUM). The calculation is first performed sequentially so we know the correct result. The calculation is then performed in parallel using:
  • fork to spawn multiple processes
  • shared memory segments (shmget) to hold partial sums and to communicate those values back to the master process
  • waitpid to enforce synchronization at the termination of a child process
Currently the code computes the entire sum on every process. Start by running the code and making sure that you understand how it works.

Shared memory segments allow two or more processes to share a given region of memory. A process creates a shared memory segment using shmget(). Processes with proper permission can perform various control functions on the shared memory segment using shmctl(). Once created, a shared segment can be attached to a process address space using shmat(). It can be detached using shmdt(). The attaching process must have the appropriate permissions for shmat(). Once attached, the process can read or write to the segment, as allowed by the permission requested in the attach operation. A shared memory segment is described by a control structure with a unique ID that points to an area of physical memory. The identifier of the segment is called the shmid.

In some ways allocating a shared memory segment is like malloc'ing a region of memory. However, in contrast to malloc, the shared memory segment is maintained by the kernel, and unless you specifically remove it, it will persist after your process dies. Not surprisingly this causes some problems for system administrators! The command ipcs can be used to determine what Interprocess Communication (IPC) resources you have allocated.

  • Use ipcs -m to examine whether you have shared memory segments remaining on the system.
  • Use ipcrm -m "shmid" to remove them (where "shmid" is the id number printed out by the ipcs -m command).
In shmem_sum.c the shared memory segment is removed by the master process at the very end of the program in the shmctl call. If you comment out this line, run the code, and then run ipcs -m, you will see that you have a remaining shared memory segment. You will now need to use ipcrm -m "shmid" to remove it!

  1. Modify shmem_sum.c such that a partial sum is evaluated by each process and the results are summed together by the master process after the children have terminated. Have the master process print out the maximum partial sum and the iam value of the process that evaluated it.

Part 2: Unix System V Semaphores


As with shared memory segments, Unix provides semaphores that are created by the kernel, can be manipulated by a variety of different processes, and persist after process termination unless they have been explicitly removed. (You can again use the ipcrm command, this time with the -s option.)

To create a semaphore we use semget. This allows the user to create an array of semaphores and subsequently perform operations on any one of them. At the point of creation it is possible either to ask the O/S to create a new array with its own unique ID (using the key IPC_PRIVATE) or to provide a specific key. Other processes wishing to manipulate the same semaphore array make the same semget system call with the same key value.

Initialization, removal and a number of other control operations on the semaphore set are performed using semctl, while individual semaphore operations are performed using semop. The latter requires the ID of the semaphore array, an array of sembuf structures containing details of the operations (each entry names the semaphore within the array that it acts on), and the number of operations in that array.

Program sem_init.c illustrates how to create and initialize a semaphore array. Programs sema.c and semb.c manipulate the semaphore once it has been created.

  • Compile sem_init.c, sema.c and semb.c (typing make should do this).
  • Run sem_init.exe, this will create the semaphore array if it has not already been created.
  • Open another window on Fremont and make sure your current working directory is the same in both windows.
  • In one window run sema.exe
  • In the other window run semb.exe
  • Make sure you follow what is happening. You should also be aware that everyone in the lab is manipulating the same semaphore array!
  1. Modify the various semaphore programs so that sem_init.c is assigned a semaphore ID by the O/S (use IPC_PRIVATE) and then have this value taken as command line input to sema.c and semb.c. At the same time modify the protection so that the semaphore array can only be modified by you! (This will prevent your use of the semaphore from interfering with other people's use of it.)
  2. Consider modifying this program so that sem_init.exe starts, creates the semaphore array, and then forks two processes: one executes sema.exe using exec, passing the semaphore ID value as an argument, and the other executes semb.exe in the same way. (This is less important than doing some MPI-2 exercises.)

Part 3: Combining Fork, Shared Memory and Semaphores


  1. Implement a version of heat2.c that uses fork to create a number of processes, shared memory segments to store the grid data points, and semaphores to coordinate the processes. (Process creation should happen just once, before the main iterative update loop. After the iterations are complete, the parent should wait for the child processes to terminate.)

Part 4: MPI-2 Remote Memory Access (RMA)


As discussed in lectures, one of the new features of MPI-2 is the capability to perform remote memory access (RMA), or one-sided message passing. MPI-2 provides two models of RMA:
  • Active target synchronization
  • Passive target synchronization
Both cases require the MPI processes involved to define a "memory window" that is referred to as an MPI_Win object. Data is then fetched from, written to, or accumulated into remote windows by specifying the relevant data offsets. The two models differ in that active target RMA requires all processes to collectively call MPI_Win_fence before any of the RMA operations are complete. In passive target RMA no such requirement is imposed. In this respect active target RMA is somewhat like non-blocking message passing (MPI_Isend/MPI_Irecv), where MPI_Win_fence equates to an MPI_Wait and we no longer need to specify explicitly either the Isend or the Irecv buffer (depending on whether we wish to do RMA get or put operations). On the other hand, passive target RMA is much more similar to shmem operations on Cray T3D/E systems, or to having multiple processes access a shared memory segment on an SMP system. In this lab we will investigate active target RMA operations. We will discuss passive targets in lectures.

Program mpi2.c contains a very simple example of active target RMA operations. Every process allocates a send and a receive buffer. The send buffer is initialized to the rank of the process, while the receive buffer is initialized to -99. Each process creates a memory window using its receive buffer, thereby permitting RMA operations on this address range. This is done using the call

        MPI_Win_create(rbuf, len*sizeof(int), sizeof(int), 
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);
where
  • rbuf is the starting address of the memory window
  • len*sizeof(int) is the total number of bytes
  • sizeof(int) is the size of the data units used to index elements in this window
  • MPI_INFO_NULL is passed as the info argument, which can carry hints to enhance performance (but we don't care about it here)
  • MPI_COMM_WORLD is the communicator over which the window is defined
  • win is a return argument used to reference the MPI_Win object
A remote put (or get) operation takes the form of
MPI_Put(sbuf, len, MPI_INT, dest, 0, len, MPI_INT, win);
  • sbuf, len, MPI_INT are the starting address, number of data items and type of the data item that will be put (or fetched into, for a get). They are identical to the first 3 arguments in a send or recv call.
  • dest is the rank of the process that we are sending the data to (or getting it from).
  • 0, len, MPI_INT are the displacement within the remote window, the number of data items we will put (or get), and the type of the data item. The displacement is measured in the units given as the third argument to MPI_Win_create (sizeof(int) here), not in bytes; 0 refers to the first element in that window.
  • win is the MPI_Win object that we are doing the RMA operation on (i.e. you could have multiple MPI_Win objects).
Completion of the RMA operation is enforced by:
        MPI_Win_fence(0, win);
  • 0 is the assert argument; a nonzero value supplies assertions such as stating that this particular MPI_Win will only be read from, or that there will be no RMA operations on the local window. We just keep it as 0.
  • win is the MPI_Win object that the fence refers to
Finally, MPI_Win objects are allocated objects, hence it is wise to free them when no longer required. Note that MPI_Win_free takes the address of the window handle:
        MPI_Win_free(&win);
  1. Reprogram heat2.c using RMA operations in place of the existing sends and receives.