The Australian National University
ANU College of Engineering and Computer Science (CECS)
Department of Computer Science
High Performance Scientific Computing COMP4300

COMP4300/6430: Laboratory 5

Shared Memory Programming with Pthreads

This lab will build on the "Shared Memory: Programming" lecture material, with a focus on Pthreads. You are provided with a variety of pthread codes that are all broken to various degrees. Your aim is to identify and fix the problems. A tar file for this lab is available on Fremont in /tmp/COMP4300. Copy this and untar it as follows:
cp /tmp/COMP4300/lab5.tar .
tar -xvf lab5.tar
Type make to build all the executables. For the purposes of today's lab you should run all tests interactively.

Sometimes My Code Doesn't Work!


  • Program die.c illustrates how to create threads using the Pthreads library. The program creates 4 threads, has each thread evaluate the series 1/n from n=1 to n=MAX_INT, and then prints out the result from each thread. Most of the time on fremont you will find that no results are printed, although if you are lucky it may print all of them. Identify and fix the error.
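
    As background for this exercise, the general life cycle of a group of worker threads can be sketched as below. This is a minimal illustration and not the code in die.c: the names series_task, series_worker and run_series are hypothetical, and the number of terms is passed in as a parameter rather than fixed at MAX_INT.

    ```c
    #include <pthread.h>

    #define NUM_WORKERS 4   /* matches the 4 threads described above */

    /* Hypothetical per-thread record: input (terms) and output (sum). */
    typedef struct {
        long   terms;
        double sum;
    } series_task;

    /* Thread body: accumulate 1/1 + 1/2 + ... + 1/terms. */
    static void *series_worker(void *arg)
    {
        series_task *task = arg;
        double s = 0.0;
        for (long n = 1; n <= task->terms; n++)
            s += 1.0 / (double)n;
        task->sum = s;
        return NULL;
    }

    /* Start the workers, wait for each one, then copy the sums into out[].
       Returns 0 on success, or the pthread_create error code. */
    int run_series(long terms, double out[NUM_WORKERS])
    {
        pthread_t   tid[NUM_WORKERS];
        series_task task[NUM_WORKERS];
        int started = 0;
        int rc = 0;

        for (int i = 0; i < NUM_WORKERS; i++) {
            task[i].terms = terms;
            task[i].sum   = 0.0;
            rc = pthread_create(&tid[i], NULL, series_worker, &task[i]);
            if (rc != 0)
                break;
            started++;
        }

        /* pthread_join blocks until the given thread has finished, so each
           result is read only after it has been written. */
        for (int i = 0; i < started; i++) {
            pthread_join(tid[i], NULL);
            out[i] = task[i].sum;
        }
        return rc;
    }
    ```

    Calling run_series(4, out) from main should leave the value 1 + 1/2 + 1/3 + 1/4 in every element of out. Comparing this life cycle with the one in die.c may help you locate the error.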

  • Program hello.c illustrates the difference between global and local data, and how to pass arguments from the main thread to the child threads. The program is supposed to create 8 threads and have each thread print out a unique "Hello World" message, where the exact message is determined by the thread id. However, something is wrong and all threads are printing the same message. Identify and fix the error.
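
    One common way to give each thread its own argument is sketched below. It is a generic pattern under stated assumptions, not the code in hello.c: greeter_arg, greeter and run_greeters are hypothetical names, and the seen_id field exists only so that the caller can check what each thread observed.

    ```c
    #include <pthread.h>
    #include <stdio.h>

    #define NUM_GREETERS 8   /* matches the 8 threads described above */

    /* Hypothetical argument record; each thread gets its own slot. */
    typedef struct {
        int id;       /* set by main before the thread starts */
        int seen_id;  /* written by the thread: the id it observed */
    } greeter_arg;

    static void *greeter(void *arg)
    {
        greeter_arg *mine = arg;   /* points at this thread's own slot */
        printf("Hello World from thread %d\n", mine->id);
        mine->seen_id = mine->id;
        return NULL;
    }

    /* Returns 0 on success; seen[i] receives the id observed by thread i. */
    int run_greeters(int seen[NUM_GREETERS])
    {
        pthread_t   tid[NUM_GREETERS];
        greeter_arg args[NUM_GREETERS];   /* one slot per thread */
        int started = 0;
        int rc = 0;

        for (int i = 0; i < NUM_GREETERS; i++) {
            args[i].id      = i;
            args[i].seen_id = -1;
            rc = pthread_create(&tid[i], NULL, greeter, &args[i]);
            if (rc != 0)
                break;
            started++;
        }

        /* args[] lives in this stack frame, so join before returning. */
        for (int i = 0; i < started; i++) {
            pthread_join(tid[i], NULL);
            seen[i] = args[i].seen_id;
        }
        return rc;
    }
    ```

    The design point is that the memory each thread reads its argument from is not modified by main after pthread_create returns, and stays valid until the thread has been joined.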

  • Program killed.c is part of a larger program that has been dying under mysterious circumstances. You have pruned the larger code back to this small block of code that exhibits similar problems. You find that if the code is built with ARRAY_SIZE set to 100000 everything works well, but if you double this quantity to 200000 the code dies with the message "Segmentation Fault". Determine what the problem is and fix it.
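
    When a crash depends on the size of an array, one thing worth measuring is how much memory the array needs compared with the limits in force. The sketch below queries the default stack size for new threads and computes the size of an array in bytes. It assumes the array holds doubles, which the description above does not state, and both function names are hypothetical. Whether this is relevant to killed.c is for you to determine.

    ```c
    #include <pthread.h>
    #include <stddef.h>

    /* Default stack size for new threads, in bytes, or 0 if the query fails. */
    size_t default_thread_stack_size(void)
    {
        pthread_attr_t attr;
        size_t size = 0;

        if (pthread_attr_init(&attr) != 0)
            return 0;
        if (pthread_attr_getstacksize(&attr, &size) != 0)
            size = 0;
        pthread_attr_destroy(&attr);
        return size;
    }

    /* Bytes needed by an array of `count` doubles (element type is an assumption). */
    size_t array_bytes(size_t count)
    {
        return count * sizeof(double);
    }
    ```

    Printing both values for ARRAY_SIZE values of 100000 and 200000 gives you concrete numbers to reason about. The related call pthread_attr_setstacksize changes the stack size used for threads created with a given attribute object.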

  • Program sum.c creates a single shared vector where each element is initialised according to its index, i.e. a(i)=i. You wish to sum the elements of this array using 8 threads. You partition this work so that each thread sums a unique portion of the array. For a vector of length N (and starting from 0) the value of this sum should be N*(N-1)/2. You find that your code is producing erratic results. You can demonstrate this by running the executable sum.exe several times and looking for the "ERROR" string to appear in the output. To do this in csh, issue the command
    repeat 10000 sum.exe | grep ERROR
    or, in bash, use something like
    for ((i=0; i<10000; i++)) ; do sum.exe; done | grep ERROR
    This will execute sum.exe 10000 times, and the number of lines printed shows how often the code fails. (Note that this is not very often, which illustrates that bugs like this are not easy to observe.) Identify the problem and fix it.
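
    The expected value and the partitioning described above can be checked without any threads, which helps separate arithmetic or partitioning mistakes from concurrency mistakes. The sketch below is a single-threaded reference and not the code in sum.c: the function names and the particular way of splitting the index range are assumptions made for illustration.

    ```c
    /* Closed form for 0 + 1 + ... + (n-1). */
    long closed_form_sum(long n)
    {
        return n * (n - 1) / 2;
    }

    /* Half-open index range [*lo, *hi) owned by worker t out of nworkers.
       Consecutive ranges tile [0, n) exactly, even if n % nworkers != 0. */
    void chunk_bounds(long n, int nworkers, int t, long *lo, long *hi)
    {
        *lo = n * t / nworkers;
        *hi = n * (t + 1) / nworkers;
    }

    /* Sum a[i] = i chunk by chunk, in a single thread. */
    long partitioned_sum(long n, int nworkers)
    {
        long total = 0;
        for (int t = 0; t < nworkers; t++) {
            long lo, hi;
            chunk_bounds(n, nworkers, t, &lo, &hi);
            for (long i = lo; i < hi; i++)
                total += i;   /* a[i] == i, so the index stands in for the element */
        }
        return total;
    }
    ```

    For N = 10 and 8 workers both functions give 45. If a sequential version of the partitioned sum always agrees with N*(N-1)/2 while the threaded program does not, the fault lies in how the threads interact rather than in how the work is divided.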

  • (There's a bug in my Pi): Code pi.c computes Pi using numerical integration. The program requires two integers as input:
    • m the number of threads
    • n the number of integration points
    These are given on the command line. Run the code interactively with the following input:
    > pi.exe 1 10000
    
    The calculation should complete very quickly. Now run the code with the following inputs:
    > pi.exe 2 10000000
    > pi.exe 4 10000000
    > pi.exe 8 10000000
    > pi.exe 2 10000
    > pi.exe 4 10000
    > pi.exe 1 100
    > pi.exe 2 100
    
    You should find that in some cases the code runs, but in other cases it deadlocks. If the code fails to give an answer within 1 second, use Ctrl-C to kill the job. Whether the code runs or deadlocks is not reproducible.

    Note on Condition Variables

    Each condition variable must be associated with a specific mutex, and with a predicate condition. When a thread waits on a condition variable it must always have the associated mutex locked. The condition variable wait operation will unlock the mutex for you before blocking the thread, and will relock the mutex before returning to your code.

    It is important that you test the predicate after locking the appropriate mutex and before waiting on the condition variable. If a thread signals or broadcasts a condition variable while no threads are waiting, nothing happens. If some thread calls pthread_cond_wait right after that, it will keep waiting regardless of the fact that the condition variable was just signaled, which means it waits for a long time. (Sound familiar?)

    It is equally important that you test the predicate again when the thread wakes up. There are various reasons for this, e.g.:

    • Intercepted wakeups: threads are asynchronous. Waking up from a condition variable wait involves locking the associated mutex. But what if another thread acquires the mutex first?
    • Loose predicates: it is often easier to have loose predicates, e.g. there may be work available, so wake everyone up.
    • Spurious wakeups: on multiprocessor machines, making condition wakeups completely predictable may substantially slow all condition variable operations.
    This good programming practice explains the bit of code in pi.c:
      do {
          pthread_cond_wait(&(mybarrier->barrier_cond),
                            &(mybarrier->barrier_mutex));
      } while (mybarrier->cur_count != 0);
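
    The rules in the note above can be illustrated with a generic sketch that is deliberately unrelated to the barrier in pi.c. It assumes a hypothetical one-shot "gate" whose predicate is a single flag, and the names gate_t, gate_init, gate_wait and gate_open are invented for this example.

    ```c
    #include <pthread.h>
    #include <stdbool.h>

    /* Hypothetical one-shot gate: the predicate is the flag `open`. */
    typedef struct {
        pthread_mutex_t lock;   /* the mutex associated with cond */
        pthread_cond_t  cond;
        bool            open;   /* predicate, only touched with lock held */
    } gate_t;

    void gate_init(gate_t *g)
    {
        pthread_mutex_init(&g->lock, NULL);
        pthread_cond_init(&g->cond, NULL);
        g->open = false;
    }

    /* Block until the gate has been opened. */
    void gate_wait(gate_t *g)
    {
        pthread_mutex_lock(&g->lock);
        /* The predicate is tested before the first wait and again after
           every wakeup, with the mutex held each time. */
        while (!g->open)
            pthread_cond_wait(&g->cond, &g->lock);
        pthread_mutex_unlock(&g->lock);
    }

    /* Open the gate and wake every waiting thread. */
    void gate_open(gate_t *g)
    {
        pthread_mutex_lock(&g->lock);
        g->open = true;                      /* change the predicate under the mutex */
        pthread_cond_broadcast(&g->cond);
        pthread_mutex_unlock(&g->lock);
    }
    ```

    With this structure a call to gate_wait made after gate_open returns immediately, because the predicate is already true and no wait is attempted. It also tolerates intercepted and spurious wakeups, since a woken thread goes back to waiting whenever the predicate is still false.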
    
    • Why does the code sometimes deadlock? How could your code be changed to avoid potential deadlock while still conforming to the good programming practice mentioned above?
    • Implement your proposed code changes and verify that they work.

References