
COMP8320 Laboratory 3 - week 3, 2011

Advanced Topics in OpenMP


This laboratory session will give you hands-on experience using two advanced features of the OpenMP API: Nested parallelism and tasks.

Basic Setup

Copy the files for the session into your home directory area:

    cp -r /dept/dcs/comp8320/public/lab03 .
and then change into the new directory:

    cd lab03

Nested Parallelism

Background briefing material for this portion of the lab is provided in the accompanying slides.

Basic Nesting

Compile the sample program example1-Basic.c:

    /opt/sunstudio12.1/prod/bin/cc -fast -xopenmp example1-Basic.c -o example1-Basic

Run the program as-is. What does it do? Look at the source code: Do you understand how the program is nesting parallel regions?

Make a copy of this program and play with it. Adjust the number of threads in some of the regions, then recompile and re-run the program. How does it behave? Now try adding the function call

    omp_set_nested(1);
recompile and run again. Try to get at least 20 threads running at Level 3.
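
A minimal sketch of the idea (this is not one of the lab files, and the thread counts are illustrative only): three nested parallel regions giving 2 x 2 x 5 = 20 threads at Level 3, provided nesting is enabled.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_nested(1);                           /* allow nested regions */

        #pragma omp parallel num_threads(2)          /* Level 1: 2 threads   */
        {
            #pragma omp parallel num_threads(2)      /* Level 2: 2 x 2 = 4   */
            {
                #pragma omp parallel num_threads(5)  /* Level 3: 4 x 5 = 20  */
                {
                    printf("level %d, thread %d of %d\n",
                           omp_get_level(), omp_get_thread_num(),
                           omp_get_num_threads());
                }
            }
        }
        return 0;
    }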

Add an additional level of parallelism to a copy of example1-Basic.c, and compile and run it.

Next, examine the source of example2-FlexLevels.c. Compile it using the command

    /opt/sunstudio12.1/prod/bin/cc -fast -xopenmp example2-FlexLevels.c -o example2-FlexLevels

Run the executable. We will now examine the effects of the standard OpenMP environment variables OMP_MAX_ACTIVE_LEVELS and OMP_THREAD_LIMIT (the notes refer to the equivalent Sun-specific variables SUNW_MP_MAX_NESTED_LEVELS and SUNW_MP_MAX_POOL_THREADS). Before doing this, retrieve and record their default settings:

    echo $OMP_MAX_ACTIVE_LEVELS
    echo $OMP_THREAD_LIMIT

Now, play around with the environment variables OMP_MAX_ACTIVE_LEVELS (e.g. export OMP_MAX_ACTIVE_LEVELS=2) and OMP_THREAD_LIMIT, restricting their values to small quantities that will allow fewer levels of parallelism or fewer threads than the example program wants by default. What is the relationship between these variables and the number of levels where there is more than 1 thread?
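
If it helps to see these limits from inside a program, the short sketch below (not one of the lab files) simply prints the values the runtime is currently using; both library routines are part of OpenMP 3.0.

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        /* Report the nesting and thread-pool limits currently in force. */
        printf("max active levels: %d\n", omp_get_max_active_levels());
        printf("thread limit     : %d\n", omp_get_thread_limit());
        return 0;
    }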

Restore the default values corresponding to these environment variables by unsetting them:

    unset OMP_THREAD_LIMIT OMP_MAX_ACTIVE_LEVELS

Now, copy example2-FlexLevels.c into a new file, say example2a-FlexLevels.c and play with some OpenMP library calls in this file. Compile it with

    /opt/sunstudio12.1/prod/bin/cc -fast -xopenmp example2a-FlexLevels.c -o example2a-FlexLevels
Add library calls to set/get (and print) the maximum number of threads, and to enable dynamic adjustment of thread numbers and nested parallelism. Satisfy yourself that these library routines behave as outlined in the briefing. (You might find omp_set_num_threads() ineffective; in that case use export OMP_NUM_THREADS=8, etc.)
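
As a rough guide, the calls involved look something like the sketch below (this is not the lab file itself, and the request for 8 threads is illustrative only):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_dynamic(1);     /* allow the runtime to adjust team sizes    */
        omp_set_nested(1);      /* enable nested parallelism                 */
        omp_set_num_threads(8); /* request 8 threads for subsequent regions  */

        printf("dynamic: %d, nested: %d, max threads: %d\n",
               omp_get_dynamic(), omp_get_nested(), omp_get_max_threads());

        #pragma omp parallel
        {
            /* With dynamic adjustment on, the team may be smaller
               than the number requested above. */
            #pragma omp single
            printf("team size actually obtained: %d\n", omp_get_num_threads());
        }
        return 0;
    }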

Finally, there are new routines in OpenMP 3.0 that allow further queries of a thread's place in the nesting hierarchy. You can see examples of these calls in example4-OpenMP3.0-LibCalls.c. Compile and run this program (after export OMP_NUM_THREADS=4).
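
The kinds of queries involved look roughly like this (a compact sketch, not the example4 file itself):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        omp_set_nested(1);

        #pragma omp parallel num_threads(2)
        #pragma omp parallel num_threads(2)
        {
            int level = omp_get_level();
            /* Each thread reports where it sits in the nesting hierarchy. */
            printf("level %d (active %d): thread %d, parent thread %d, parent team size %d\n",
                   level, omp_get_active_level(), omp_get_thread_num(),
                   omp_get_ancestor_thread_num(level - 1),
                   omp_get_team_size(level - 1));
        }
        return 0;
    }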

Adding Nesting to a Nontrivial Example

Examine the matrix-matrix multiplication program example5-MatMatMul.c. You will note that it contains only one level of parallelism, but a substantial amount of work. You will also notice that this program determines the number of OpenMP threads it uses from the environment variable OMP_NUM_THREADS.

Compile this program by executing the command

    /opt/sunstudio12.1/prod/bin/cc -fast -xopenmp example5-MatMatMul.c -o example5-MatMatMul

Study its scaling behavior by timing it on 1, 2, 4, 8, 16, 24, and 32 cores. This will require that you set OMP_NUM_THREADS accordingly. If you are particularly ambitious, perform multiple timing trials to get a feel for the uncertainties in your results. As computer scientists, you should know how to automate this process! ;-) What sort of scaling behavior do you see? Plot the speedup relative to the single-thread time and, if you have multiple timing trials, include vertical error bars in your results.

Armed with these results as a baseline, investigate the efficacy of adding a second level of parallelism in the main loop. You should be able to figure out how to do this on your own. But if you are genuinely stymied, check out the program example6-MatMatMul-Nested.c (warning: it illustrates the idea of nested parallelism, but it has a deliberate bug in the variable scoping).
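
For reference, a correctly scoped two-level version has roughly the shape sketched below (this is not example5 or example6; the matrix sizes and team sizes are illustrative, and the real files declare their data differently):

    #include <stdio.h>
    #include <omp.h>

    #define L 512           /* illustrative sizes only */
    #define M 512
    #define N 512

    static double a[L][M], b[M][N], c[L][N];

    int main(void)
    {
        int i, j, k;
        double t0;

        omp_set_nested(1);

        for (i = 0; i < L; i++)
            for (k = 0; k < M; k++)
                a[i][k] = 1.0;
        for (k = 0; k < M; k++)
            for (j = 0; j < N; j++)
                b[k][j] = 1.0;

        t0 = omp_get_wtime();

        /* Outer level: distribute rows of C. i is the loop variable and is
           private automatically; j and k must be scoped explicitly. */
        #pragma omp parallel for private(j, k) num_threads(4)
        for (i = 0; i < L; i++) {
            /* Inner level: distribute columns of C. k must again be made
               private to the inner team, or the inner threads race on it. */
            #pragma omp parallel for private(k) num_threads(4)
            for (j = 0; j < N; j++) {
                double sum = 0.0;
                for (k = 0; k < M; k++)
                    sum += a[i][k] * b[k][j];
                c[i][j] = sum;
            }
        }

        printf("c[0][0] = %f, elapsed = %f s\n", c[0][0], omp_get_wtime() - t0);
        return 0;
    }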

Run your nested code on thread counts that are perfect squares and compare its performance with the results from your single-level code. If you automated multiple timing trials for the single-level example, do the same here. What do you see? How do the results differ? Are the differences statistically significant, that is, greater than the span of the error bars?

Try using OpenMP's facilities for setting the number of threads within levels and repeat these timing experiments; one possible pattern is sketched below. Also, try increasing the matrix dimensions (the variables l, m, and n in main()) and see how large you need to make them to reduce the difference between single-level and dual-level parallelism to a statistically insignificant level (i.e., the error bars overlap). Can you even do this?
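
One such pattern (a sketch of the general idea, not taken from the lab files; p_outer and p_inner are hypothetical names for whatever counts you choose) is to give each level its own num_threads clause, or to call omp_set_num_threads() inside the outer region before the inner region is created:

    #include <omp.h>

    /* p_outer and p_inner are hypothetical parameters for the team sizes
       wanted at each nesting level. */
    void run_nested(int p_outer, int p_inner)
    {
        omp_set_nested(1);

        #pragma omp parallel num_threads(p_outer)
        {
            /* Each outer thread sets the team size for the inner
               region it is about to create. */
            omp_set_num_threads(p_inner);

            #pragma omp parallel
            {
                /* ... inner-level work goes here ... */
            }
        }
    }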

Finally, can this algorithm be re-implemented using OpenMP 3.0 tasks?

Parallel Quicksort

The program qsort.c uses several advanced OpenMP features, including tasks. Study the file. Compile it using:
    /opt/sunstudio12.1/prod/bin/cc -xopenmp -xO3 qsort.c -o qsort
The number of tasks can be controlled from the command line. Inspect the code to see how this is performed. Try running the program:
    ./qsort 1000000 p 1
where p = 1, 2, 4, 8, .... What is its scaling behavior? Try reducing the array size (the first parameter), say by a factor of 10 and 100. How does it scale now? What does this tell you about the creation overhead of the task construct?

Finally, remove the firstprivate(p,q) clauses in par_quick_sort. Recompile and run the program again. What does this tell you about the clause you removed?
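
To make the role of tasks and the firstprivate clauses concrete, here is a minimal task-based quicksort sketch (this is not the course's qsort.c; the cutoff value, names, and partitioning details are illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define CUTOFF 1000     /* illustrative: below this, sort sequentially */

    static void par_qsort(int *a, int lo, int hi)
    {
        int pivot, p, q, t;

        if (lo >= hi)
            return;

        /* Partition around the middle element, leaving two halves
           [lo, q] and [p, hi] that can be sorted independently. */
        pivot = a[(lo + hi) / 2];
        p = lo;
        q = hi;
        while (p <= q) {
            while (a[p] < pivot) p++;
            while (a[q] > pivot) q--;
            if (p <= q) {
                t = a[p]; a[p] = a[q]; a[q] = t;
                p++; q--;
            }
        }

        if (hi - lo > CUTOFF) {
            /* Each deferred task captures its own copies of the bounds at
               creation time via firstprivate, so the recursive calls see
               the values computed by this invocation. */
            #pragma omp task firstprivate(lo, q)
            par_qsort(a, lo, q);
            #pragma omp task firstprivate(p, hi)
            par_qsort(a, p, hi);
        } else {
            /* Small partitions: recurse sequentially to avoid paying
               task-creation overhead for tiny amounts of work. */
            par_qsort(a, lo, q);
            par_qsort(a, p, hi);
        }
    }

    int main(void)
    {
        int n = 1000000, i;
        int *a = malloc(n * sizeof *a);

        for (i = 0; i < n; i++)
            a[i] = rand();

        /* One thread starts the recursion; the rest of the team executes
           the tasks it generates. All tasks finish before the implicit
           barrier at the end of the parallel region. */
        #pragma omp parallel
        #pragma omp single nowait
        par_qsort(a, 0, n - 1);

        printf("a[0] = %d  a[n-1] = %d\n", a[0], a[n - 1]);
        free(a);
        return 0;
    }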
