COMP3320/6464 2008: Laboratory 1
Performance Benchmarking
The aim of this lab is to give you some experience of timing
code, performance measurement and benchmarking
Unless explicitly stated this lab will be using machine IWAKI
First login to Iwaki. Then creat a directory and copying all the
relevant programs into that directory:
ssh iwaki
mkdir lab1
cd lab1
cp -r /dept/dcs/comp3320/public/lab1/* .
Lab Assessment
As mentioned in the course assessment, labs are not marked, but the
content is examinable. I will place model answers on the lab web page
in a week or so time.
If you copied the files as above you will have a template answer file
called:
COMP3320_LAB1.txt
Where relevant write answers in plain text in this file.
For the COMP6464 students there are a few additional
questions.
Timer Resolution: Elapsed Time
Objective - to measure the overhead and resolution of two elapsed timers
For benchmarking and performance measuring it is important to have
some knowledge of the resolution and overhead of your timing
routine. As mentioned in lectures we can obtain this information by
making repeated calls to the timer routine. Program
walltime.c does this for the C function
gettimeofday.
Inspect the code and make sure you understand what it is doing. Then
compile the code by issuing the command
make walltime
- What is your estimate for the resolution (Tr) and
overhead (Tc) of gettimeofday on Iwaki.
(You may need to run the code a few times).
- Now repeat the above measurements but on your desktop machine.
You will need to compile the code using gcc.
On Iwkai another routine measuring elapsed time is available in the library
libunknown.a. It is called
elapsetime and has the
following prototype:
void elapsetime(int* isec, int* iusec)
where argument
isec returns the current second component
of the elapsed time while argument
iusec returns the
current microsecond (10
-6) component (similar to
gettimeofday).
- On Iwaki copy walltime.c to mywalltime.c and modify it
to determine the overhead and resolution
of elapsetime. What are the values of (Tr) and
overhead (Tc) for this routine?
(Note the makefile will compile and link your
file mywalltime.c with
libunknown.a correctly).
- Keep a copy of your code as mywalltime.c.
CPU timing and Scaling Properties
Objective - to measure CPU time and appreciate computational scaling
Program
linear.c sets up a system of linear equations and
solves them using Gaussian elimination. That is we are solving a set
of equations of the form
Ax = b
where
A is a square matrix of dimension
n and
b is a vector of length
n. The elements of both
A and
b are known, and the aim is to determine
the elements of
x. This sort of problem is very common in
computational science, and forms the basis
of the metric that is used to rank the
top500 most powerful
supercomputers. For the purpose of this lab it is not
important for you to understand the details of Gaussian elimination,
just to appreciate that the problem to be solved depends in some manner
on the input dimension
n.
On Iwaki compile linear.c by typing make linear. Run the code
by giving the problem size as a command line argument. Start with a
small value of n, say n=10
linear 10
Run the code for a few different input dimensions, e.g. 10, 100, 200.
You will see that the code reports the CPU, SYS and CPU+SYS time.
- What units are the CPU, SYS and CPU+SYS time reported in?
(You should make an educated guess based on the sort of time the
program takes to run - eg does it seem to take hours to run!
Then look at the code and read the relevant man pages).
- Briefly state what you understand to be the meaning of CPU time
and SYS time.
- We can time programs using the Unix time utility (do
man -s1 time for details). Use this utility and complete
the following table comparing the results from time with
the CPU+SYS time. Run 3 experiments for each data size,
or more if your results are inconsistent between runs. (You might
want to consider doing something like "repeat 7 time linear 128 |
egrep '(CPU\+SYS|%)' ")
-------------------------------------------------------------------
--------------Input Dimension----------
80 80 80 160 160 160 320 320 320
-------------------------------------------------------------------
via code
CPU+SYS
via time
real
user
sys
-------------------------------------------------------------------
--------------Input Dimension----------
640 640 640 1280 1280 1280
-------------------------------------------------------------------
via code
CPU+SYS
via time
real
user
sys
-------------------------------------------------------------------
- Why are the results from CPU+SYS different from the
time results (you may need to inspect the code to confirm
your thoughts).
- Using the above data, plus perhaps additional timing runs, deduce
how the computational cost of linear varies with the input
dimension n. You must include experimental justification
for your answer. (We are looking for an expression of the form
O(nk) where k is an integer).
Code Profiling
Objective - to understand how to profile code and analyze the results
Program
linear.c contains the following routines:
void main
void iprtmatrix
void prtmatrix
void gauss_elim
void largest
void elim
void pivot
void subst
double cpu_time
In addition the executable will make use of some system libraries,
e.g.
drand48 is used to provide pseudo random numbers. We
wish now to determine what routine(s) are consuming most of the CPU
time. We will do this using
gprof
Compile linear.c with profiling enabled. This is done using
the -xpg compiler option. The Makefile has been
modified to easily do this by typing
make linear_prof
Now run the profile enabled version of the code, e.g.
linear_prof 400
if you look in your directory you will find that a file
gmon.out has been created. Analyze this by typing
gprof linear_prof
It will produce a lot of output that you may want to redirect into a
file (e.g. to put output in a file called
file.profile
gprof linear_prof >& file.profile
Alternatively you can limit the size of the profile via the
-n option to
gprof:
gprof -n10 linear_prof
Profile the code for
n=800
- Which routine is the dominant routine and what fraction of the
total CPU is it taking?
- How many times are the routines elim, pivot and
largest called?
- Use the profile
output to construct a calling graph for those routines contained
in linear.c, i.e. show how each routine is
called and approximately how much time is spent in a given routine
when it is called from the parent routine. You should
verify your calling graph by inspecting the code.
We will now profile the code using basic block profiling with
tcov. This will tell us how many times different lines of
code are actually executed. To profile the code in this manner we
must compile with the option xprofile=tcov. The
Makefile has been conveniently modified to do this, type:-
make linear_tcov
Now run the tcov enabled version of the code, e.g.
linear_tcov 400
if you look in your directory you will find that you now have a directory
linear_tcov.profile. To obtain the coverage information we
must provide tcov with the name of this directory and the relevant
source code file, e.g.
tcov -x linear_tcov.profile linear.c
(do
man tcov for more information). This produces an output
file
linear.c.tcov.
Use tcov profiling for n=400
- Which line of the code is executed the most times (cut out the
source code and report the number of times this line is executed).
- Is the location of this line consistent with your gprof profiles?
- How does the number of times this line is executed change as you
vary the problem size? (NOTE, between tcov runs you should remove
any tcov data files that relate to the previous run - else your
results will be a combination of all previous tcov runs).
Is this consistent with your earlier
conclusions concerning the overall scaling of the code. Is it also
consistent with the loop structure around this line? (You may want
to run tcov profiling for a variety of problem sizes).
- What other lines in the code have significant execution counts and
how do they scale with the input problem size?
- How many basic blocks does the code contain, and how many of
these were actually executed?
- tcov is a Sun/Solaris utility. How do you perform equivalent
profiling on linux with gcc compilers?
Finally, Sun provides a rather fancy profiling tool called collector
and analyzer. This will provide profiling similar to gprof but without
having to recompile the code. Unfortunately, the graphical interface
is not installed on the student system - but using another machine I
will demonstrate this to you in labs. We can, however, still use
this profiler via its command line interface. As a simple demonstration type
collect linear 400
this will create a profile directory called
test.1.er or
similar (numbers increment each time). To analyze the profile you can
type
er_print -function test.1.er
this will print the top routines and the time spent in each.
- Read the man page for er_print. How would you use
er_print to
obtain details on which routine called which, ie a calling graph
similar to that from gprof? (Try this to make sure that it works).
Analyzer Demo
At this point I will show you use of the analyzer tool. This is a GUI
to view the results of using collect. It is on a different machine.
For COMP6464/Honours Students (and others if desired) : Machine
Information
Objective - find information out about the hardware we are using!
In lectures we have discussed RISC hardware. Often exactly which
algorithm is used in a high performance computing application will
depend on the exact configuration of the platform being used. For both
iwaki and partch
- What is the name of the machine you are considering?
- What Operating system and what version of it is running. How did
you tell?
- What speed processor (MHZ) is on this machine
and how did you determine this information?
- How much memory is on this machine and how did you determine this?
- How many CPUs does it have and how did you determine this?
- What is size of the level 1 data cache and level 2 cache (also
known as ecache or external cache) and how did you gain this information?
You will find the following:-
uname,
dmesg,
fpversion,
cpuinfo,
meminfo and
prtconf to be useful starting points!