COMP4300/6430 2009: Laboratory 6
Unix System V IPC and MPI-2 RMA
The aim of this lab is to give you a basic introduction
to the use of Unix System V Interprocess Communication (IPC) operations
and also to MPI-2 Remote Memory Access (RMA).
All the code for today's lab is available on Fremont
in /tmp/COMP4300. Copy this and untar it as follows:
cp /tmp/COMP4300/lab6.tar .
tar -xvf lab6.tar
Type make to build all the executables. For the purpose of today's lab
you should run all tests interactively.
There are four parts to this lab:
- Part 1: concerns use of shared memory segments
- Part 2: concerns use of semaphore arrays
- Part 3: involves using both the above to
parallelise the heat distribution problem.
- Part 4: concerns use of MPI-2 RMA with the
active target model.
Part 1: Unix System V Shared Memory Segments
Program shmem_sum.c computes the sum of
integers from 0 to MAXSUM (i.e. 0+1+2+...+MAXSUM). The calculation is
first performed sequentially so we know the correct result. The
calculation is then performed in parallel using:
- fork to spawn multiple processes
- shared memory segments (shmget) to hold partial sums and
to communicate those values back to the master process
- waitpid to enforce synchronization at the termination of a
child process
Currently the code is computing the entire sum on each process. Start
by running the code and making sure that you understand how it works.
Shared memory segments allow two or more processes to share a given
region of memory. A process creates a shared memory segment using
shmget(). Processes with proper permission can perform various control
functions on the shared memory segment using shmctl(). Once created, a
shared segment can be attached to a process address space using shmat(). It
can be detached using shmdt(). The attaching process must have the
appropriate permissions for shmat(). Once attached, the process can
read or write to the segment, as allowed by the permission requested
in the attach operation. A shared memory segment is described by a
control structure with a unique ID that points to an area of physical
memory. The identifier of the segment is called the shmid.
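The life cycle described above (create, attach, use, detach, remove) can be sketched as follows. This is only an illustration under stated assumptions, not the code in shmem_sum.c: the function name shared_squares and the choice of storing i*i in each slot are made up for this example, and the segment is created with IPC_PRIVATE and owner-only permissions (0600).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of the lab code): fork nproc children,
 * let child i store i*i in slot i of a shared segment, and return the
 * sum of all slots as read by the parent. Returns -1 on error. */
long shared_squares(int nproc)
{
    assert(nproc > 0);

    /* Create a new segment; 0600 gives read/write to the owner only. */
    int shmid = shmget(IPC_PRIVATE, (size_t)nproc * sizeof(long),
                       IPC_CREAT | 0600);
    if (shmid == -1)
        return -1;

    /* Attach before forking: children inherit the attachment. */
    long *slots = shmat(shmid, NULL, 0);
    if (slots == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }

    pid_t pids[nproc];
    int started = 0;
    for (int i = 0; i < nproc; i++) {
        pid_t pid = fork();
        if (pid == 0) {              /* child: write own slot and exit */
            slots[i] = (long)i * i;
            shmdt(slots);
            _exit(0);
        }
        if (pid < 0)
            break;                   /* fork failed: stop spawning */
        pids[started++] = pid;
    }

    /* waitpid provides the synchronization: a slot is only read
     * after the child that wrote it has terminated. */
    for (int i = 0; i < started; i++)
        waitpid(pids[i], NULL, 0);

    long total = -1;
    if (started == nproc) {
        total = 0;
        for (int i = 0; i < nproc; i++)
            total += slots[i];
    }

    shmdt(slots);                    /* detach from this process */
    shmctl(shmid, IPC_RMID, NULL);   /* remove, or it outlives us */
    return total;
}
```

Calling shared_squares(4) from a main function should return 0+1+4+9. The final shmctl call is the step that stops the segment from being left behind on the system.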
In some ways allocating a shared memory segment is like malloc'ing a
region of memory. However, in contrast to malloc, the shared memory
segment is maintained by the kernel and, unless you specifically
remove it, it will persist after your process dies. Not surprisingly
this causes some problems for system administrators! The command
ipcs can be used to determine what InterProcess Communication
(IPC) resources you have
allocated.
- Use ipcs -m to examine whether you have shared memory
segments remaining on the system.
- Use ipcrm -m "shmid" to remove them (where
"shmid" is the id number printed out by the ipcs
-m command).
In shmem_sum.c the shared memory segment is removed by the
master process at the very end of the program in the shmctl
call. If you comment out this line and run the code, then do an
ipcs -m, you will see that you have a remaining shared memory
segment. You will now need to use ipcrm -m "shmid" to remove it!
- Modify shmem_sum.c such that partial sums are evaluated by each
process and the results are summed together by the master process
after the children have terminated. Have the master process
print out the maximum partial sum and the iam value of the process that
evaluated the maximum partial sum.
Part 2: Unix System V Semaphores
Like shared memory segments, Unix provides semaphores that are created by
the kernel, can be manipulated by a variety of different processes,
and persist after process termination unless they have been explicitly
removed. (You can again use the ipcrm command with the -s option.)
To create a semaphore we use semget. This allows the user to
create an array of semaphores, and then subsequently perform
operations on any one of these semaphores. At the point of creation it
is possible to ask the O/S to assign a new unique ID to the semaphore
array (by passing the key IPC_PRIVATE), or to provide a specific key.
Other processes wishing to manipulate the same semaphore array make the
same semget system call with that key value to obtain the array's ID.
Initialization, removal and a number of other control operations on
the semaphore set are performed using semctl, while
individual semaphore operations are performed using
semop. The latter requires the ID of the semaphore array, an array of
structures containing the details of each operation (including which
semaphore in the array it applies to), and the number of operations
in that array.
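The three calls can be put together in a short sketch. It is not the code in sem_init.c, sema.c or semb.c: the names sem_change and semaphore_demo are invented for illustration, the set is created with IPC_PRIVATE and mode 0600, and the union semun definition assumes a Linux system, where the calling program has to declare it itself.

```c
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>

/* Assumption: Linux, where the program must define this union itself. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: change semaphore 0 of the set by delta.
 * delta = -1 is a wait (blocks while the value is 0), +1 is a signal. */
static int sem_change(int semid, short delta)
{
    struct sembuf op = { .sem_num = 0, .sem_op = delta, .sem_flg = 0 };
    return semop(semid, &op, 1);     /* one operation in the array */
}

/* Hypothetical demo: create a private one-element set, initialise it,
 * do one wait and two signals, and return the final value.
 * Returns -1 on error. The set is always removed before returning. */
int semaphore_demo(int initial)
{
    assert(initial >= 1);            /* otherwise the wait would block */

    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid == -1)
        return -1;

    union semun arg;
    arg.val = initial;

    int result = -1;
    if (semctl(semid, 0, SETVAL, arg) != -1 &&
        sem_change(semid, -1) != -1 &&
        sem_change(semid, +1) != -1 &&
        sem_change(semid, +1) != -1)
        result = semctl(semid, 0, GETVAL);

    semctl(semid, 0, IPC_RMID);      /* remove, or it persists */
    return result;
}
```

With an initial value of 1 the demo ends at 2 (1 - 1 + 1 + 1). In a real program the wait and the signal would normally be issued by different processes that share the same semaphore ID.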
Program sem_init.c illustrates how to create and initialize a
semaphore array. Programs sema.c and semb.c
manipulate the semaphore once created.
- Compile sem_init.c, sema.c and
semb.c (typing make should do this).
- Run sem_init.exe, this will create the
semaphore array if it has not already been created.
- Open another window on Fremont and make sure your current working
directory is the same in both windows.
- In one window run sema.exe
- In the other window run semb.exe
- Make sure you follow what is happening. You should also be aware
that everyone in the lab is manipulating the same semaphore array!
- Modify the various semaphore programs so that sem_init.c
is assigned a semaphore id by the O/S (use IPC_PRIVATE) and then have
this value taken as command line input to sema.c and
semb.c. At the same time modify the protection so that the
semaphore array can only be modified by you! (This will prevent your
use of the semaphore from interfering with other people's use of it.)
- Consider modifying this program so that sem_init.exe
starts, then after creating the semaphore array it forks two processes
one of which executes sema.exe using exec and
passing the semaphore ID value as an argument, and the other executes
semb.exe. (This is less important than doing some MPI-2
exercises.)
Part 3: Combining Fork, Shared Memory and Semaphores
- Implement a version of heat2.c that uses fork to create a number
of processes, shared memory segments to store the grid data points,
and semaphores to coordinate the processes. (Your process creation
should happen just once, before the main iterative update loop. But
after the iterations are complete the parent should wait for the child
processes to terminate.)
Part 4: MPI-2 Remote Memory Access (RMA)
As discussed in lectures one of the new features of MPI-2 is the
capability to perform remote memory access (RMA), or one-sided message
passing. MPI-2 provides two models of RMA:
- Active target synchronization
- Passive target synchronization
Both cases require the MPI processes involved to define a "memory
window" that is referred to as an MPI_Win object. Data is then fetched
from, written to, or accumulated into remote windows by specifying
relevant data offsets. The two models differ in that active target RMA
requires all processes to collectively call MPI_Win_fence before
any of the RMA operations are guaranteed complete. In passive target RMA no such
requirement is imposed. In this respect active target RMA is somewhat
like non-blocking message passing (MPI_Isend/Irecv) where
MPI_Win_fence equates to an MPI_Wait and we no longer need
to specify explicitly either the Isend or Irecv buffer (depending on
whether we wish to
do RMA get or put operations). On the other hand, passive targets are
much more similar to shmem operations on Cray T3D/E systems or having
multiple processes access a shared memory segment on an SMP system. In
this lab we will investigate active target RMA operations. We will
discuss passive targets in lectures.
Program mpi2.c contains a very simple example of active target
RMA operations. Every process allocates a send and a receive
buffer. The send buffer is initialized to the rank of the process, while
the receive buffer is initialized to -99. Each process creates a memory
window using its receive buffer, thereby permitting RMA operations on
this address range. This is done using the call
MPI_Win_create(rbuf, len*sizeof(int), sizeof(int),
MPI_INFO_NULL, MPI_COMM_WORLD, &win);
where
- rbuf is the starting address of the memory window
- len*sizeof(int) is the total number of bytes
- sizeof(int) is the size of the data units used to index elements in
this window
- MPI_INFO_NULL is an info argument that can be used to enhance
performance (but we don't care about it here)
- MPI_COMM_WORLD is the communicator over which the window is
defined
- win is a return argument used to reference the MPI_Win object
A remote put (or get) operation takes the form of
MPI_Put(sbuf, len, MPI_INT, dest, 0, len, MPI_INT, win);
- sbuf, len, MPI_INT is the starting address, number of
data items and
type of the data item that will be put (or fetched into if get). It
is identical to the first 3 arguments in a send or recv call
- dest is the process number that we are sending the data
to (getting it from)
- 0, len, MPI_INT is the displacement within the remote
window, counted in the displacement units given to MPI_Win_create
(sizeof(int) here, so 0 is the first element in that window), the
number of data items we will put (or get), and the type of the data item.
- win is the MPI_Win object that we are doing the RMA
operation on (i.e. you could have multiple MPI_Win objects).
Completion of the RMA operation is enforced by:
MPI_Win_fence(0, win);
- 0 is an assert parameter; when non-zero it corresponds
to options such as stating that this particular MPI_Win will only be
read from, or blocking RMA operations to the local window. We just
keep it as 0.
- win is the MPI_Win object that the fence refers to
Finally, MPI_Win objects are allocated objects, hence it is wise to
free them when no longer required. Note that the call takes the
address of the window:
MPI_Win_free(&win);
- Reprogram heat2.c using RMA operations in place of the existing
sends and receives