COMP2310/COMP6310:
Concurrent and Distributed Systems
Semester 2 2012
COMP2310 Assignment 2: Concurrent and Distributed QuickSort
Deadline: 17:00 on
Friday 26 October
(Please report any errors, omissions and ambiguities
to the course lecturer)
(Amendments to this document after its release
will be marked in
blue)
This assignment is worth 18% of your total course mark. It will be
marked out of 36.
Submission
This assignment must be submitted electronically. This can be done
using the following commands (according to your enrollment):
submit comp2310 ass2 distquicklib.c report.txt
submit comp6310 ass2 distquicklib.c report.txt
Extensions
See the
course Assessment page.
| How Late | Penalty from 36 Marks |
|---|
| less than 1 hour | -2 |
| between 1 and 6 hours | -4 |
| between 6 and 24 hours (1 day) | -8 |
| between 24 and 48 hours (2 days) | -16 |
| between 48 and 72 hours (3 days) | -32 |
| more than 72 hours (4 days) | -64 (forget it!)
|
Objectives
- To implement a concurrent algorithm using three Posix-based concurrency
architectures: pipes, sockets, and threads.
- To gain experience in Unix system calls and libraries related towards
communication and concurrency, including using appropriate
defensive programming methods.
- To explore issues of concurrency and parallelism on a modern
multicore processor.
Background
Many algorithms requiring substantial computational resources have a
tree-like structure (i.e. use a divide-and -conquer strategy). These can
normally adapted to a distributed computation, and especially the case
if the tree can be kept balanced (i.e. use p=2k
processors by assigning p/2 processes to the left sub-tree and
p/2 to the right sub-tree). quicksort is an
example of this.
In this assignment you will implement various forms of a distributed
quicksort using various IPC and shared memory features of Unix and the
C programming language.
The figure below depicts the structure of a distributed quicksort with
p = 8 processes. The tree indicates the process structure,
beginning with process 0 at the top-level (d = 0).
The process partitions the array into (hopefully balanced) left and right
segments. This means that a call to the
partition() function has performed the appropriate
swapping of array elements, so that all
elements in the left partition are ≤ all elements in the right
partition, in which case the sorting can proceed on both the
left and right partitions ndependently.
Then,
instead of spawning 2 child processes, it only spawns 1 for the right
sub-tree, and then it processes the left sub-tree itself. This continues
recursively for k = log2(p) levels, at which point
p processes have been created.
These processes sort their respective segment of the array.
Then, the tree structure is traversed
upwards, with the respective process at each node then copying the array
segment from its right child into its own segment.
In this step the right child may have to explicitly communicate
its array segment; then that process is free to terminate. As the tree
traversal returns to level d = 0, process 0 will have
gathered the whole sorted array. The following diagram illustrates
the process with N=16 and p=8.
Each process must initially be given the left land
right r indices for its initial array segment. Let's say it
is at level d. As it descends a level, the partition step
splits the interval [l,r] into [l,m]
and [m+1,r] (note that the split in general will not be
exactly even, as shown above). Thus at level d+1, the left
index remains constant for that process, but the right changes. The
right indices can be recorded in an array
structure, r[0..k-1]. When it later re-ascends to
level d, it can determine the starting index of the
segment for the child on
the right using r[d+1]+1.
Setup
The file ass2.zip
(modified 16/10)
contains template files for
the files that you will submit. It also contains a library of provided
functions in quicklib.h and quicklib.c. It also
contains a Makefile which should be used to compile the
programs (note the use of the -Wall and -lmcheck).
Copy these files into your local directory area before you begin work
on your programs. You can compile the system with the simple command
make. The synopsis of the test program program is:
quicksort [-p|-s|-t] [-v v] [-d]
n [p [s]]
where -p selects the sort using pipes (the default),
-s selects the sort using sockets and -t
selects the sort using (Posix) threads. n is the array length.
If -d is selected,
printing of the array is turned on -
for debugging purposes. p is the number of processes or
threads to be used in the sort (default value is 1); you may assume that
p is a power of 2. s is a random number seed
(default value 0) used for generating array values.
The use of the -v v option will be explained below; it is
only needed when -t is specified.
Requirements
Your code will be tested on a 4-core Linux machine like those
in the CS student labs, so make sure it works on these machines!
Your programs should be written with good programming style (and so
should not produce any warnings when compiled with the -Wall
option!). I.e. it should be well commented, so as to enable the reader
(this includes the tutor/marker!) to understand how it works.
Identifiers should be meaningful and (unless their role is trivial)
commented where declared, any constants should be named rather than
hard-coded, and defensive programming should be used. The latter
includes checks being made for system calls that can fail due to
resource limitations; once detected, a message should be generated
with a call to the system error function perror() and the
process should exit with a status of 2. It also includes using
assert() to perform other kinds of checks, and the
freeing of any malloced data before (normal) exit
(this permits the mcheck library to detect any buffer
overwrites). All file descriptors should be closed before the process exits.
You must not change the interfaces to the sort functions in
the template distquicklib.c; while you are free
to modify if needed the given quicksort.c for debugging purposes,
your sort functions must work from the standard version of
quicksort.c.
Your program must compile on the CS student system
with the standard (un-modified) makefile that is provided.
Your sort functions should be robust for array sizes up to (say)
10 million and 32 processes / threads.
You are required to submit a file called report.txt.
This must be a plain ASCII text file.
Both C code and report must be formatted to a standard width
of 80 characters.
All submitted files must have a preamble making a declaration
about the work you have submitted. The provided template files come with
such a preamble. You must add your name and student number where
indicated. If you need to modify the declaration, e.g. you did in fact
receive substantial assistance from another person who is not course staff,
you should modify the text to say so. Code obtained from the internet
will receive no marks even if acknowledged.
Quick Sort using Forks and Pipes (worth 40% of the assignment mark)
In the function quickPipe() in distquicklib.c,
implement a distributed quicksort using forks to spawn (child)
processes and pipes to communicate the (sorted) array segments.
The distributed quicksort is then performed, which involves the (direct
or indirect) forking of a further p-1 processes.
The array segments should be transferred between processes using Unix
pipes. Note that the semantics of fork() can be utilized to
reduce the amount of explicit data transfer. The provided external
function m = partition(int *A, int n)
can be for the partition step, and
quickSort(int A[], int n) can be used to sort a
segment of the array (where A points to the start of the
segment and n is its length).
For full marks, no process should contain more than k copies of
pipe descriptor pairs, and processes should close any pipe as soon as no
longer needed. Child processes should exit when their participation in
the sort is finished, and their parent should wait on them to do this.
Debugging: the use of printf("%d:...\n", i, ...)
statements, where i is the process or thread id, at strategic
points, is recommended. printArray() can be used to print out
array segments. Remember, you may need to fflush(stdout)
if the process or thread is soon to exit.
Noting that your code should not produce any
output directly; you should remove these and any other print statements
before submitting. Running the program under strace -f may
also be helpful for debugging.
Quick Sort using Sockets (worth 15% of the assignment mark)
In the function quickSocket(), create a variant of the above
function which uses sockets instead of pipes to transfer data, but is
otherwise identical.
Shared Memory Quick Sort using Posix Threads (worth 30% of the
assignment mark)
In quickThread(), create a variant of the above function which
creates Posix threads instead of processes, and uses shared memory to
communicate data. You must use the thread structure as indicated in
the diagram above for your solution to be considered valid
(although you may use a different scheme to number the threads if you wish).
Note that the full solution requires various implementations of waiting
on child threads (see below).
Hints: it is recommended that you include the logical
thread id (as in the above diagram) in the parameters of the function
that the threads execute. It will be easier if you declare this
function with the exact type that pthread_create() requires
(returns void * with a single void * parameter). You
will probably need to pass more than 1 integer value to the function;
in which case, copy them into a small array, and use the pointer to
the array as the parameter. The use of shared memory may call for a
substantially different algorithm structure! And remember, the easiest
ways to the threads to access data they may need to share (e.g. the
array length and the pointer to array itself) is through global
variables (which should be declared static).
Issue: by default, quicksort is compiled with
-lmcheck (linking in the memory checking library). This may
detect an error that is possibly not caused by your code
(`memory clobbered before allocated block' towards the
end of the sort). If you think that this is the case, modify
the Makefile accordingly (it contains the instructions)
use quicksort-nomc, the version of the test program compiled
without -l mcheck, instead.
Write-up for this Assignment (worth 15% of the assignment mark)
Try compiling and running your programs with large n on the
CS student system. The CS
student workstation are 4 core machines, so a parallelism
of p=4 is possible (partch has however only 2,
and is unsuitable for the experiments).
In the file report.txt, answer the following questions.
- You should have noticed
quickSocket() develop problems for large n,
at least for a naive use of sockets.
Determine the message size associated with this problem and record
the value of n which bis (just) big enough to expose this.
Given an explanation of what you think is the cause of this problem,
fix it in your code, and describe the fix in the report.
- Time each sort with p=4 and
n=8000000 (or thereabouts). Compare these times with
those for p=1.
For p=4 take say 5 trials and include the best one.
Cut-and-paste your timings into report.txt (include the
prompts and the commands used).
Which version is fastest (for p=4)? Explain whether this
what you expect, given your knowledge of the underlying mechanisms
used. The theoretical maximum speedup for the p=4 timings
over the p=1 timing is 4. Are you observing close to this
speedup? Explain why this might be the case (Hint:try
doubling (or more) n; does the speedup improve
significantly?)
- One possible idea for improving the performance of
quickThread() is that if the parent does not have to wait
for the child to exit (call pthread_join()), the computation can be
sped up. But the parent still has to wait for the child to finish the
sort! Build into quickThread() the following variant code paths:
- wait_on_join: use pthread_join() to
tell the parent that the child has finished (you have probably
implemented this already). This is the default; it can
is explicitly activated with
the command line option -v 0 .
- wait_on_mutexlock: use (an array of) Pthread mutexes
to tell the parent that the child has finished (i.e.
the child unlocks a lock when finished, which the parent
must acquire before it exits).
This is activated when the option -v 1 is specified.
- wait_on_memlock: like the above but using a normal
array (i.e. a value of 1 signifies the
child is yet to finish; a value of 0 signifies it
has finished and the parent may finish).
This is activated when when the option -v 2 is specified.
Hint: the array (pointer) should be a global variable declared
as something like static volatile char *p_memlocks;. In
quickThread(), use a call to malloc() to initialize the
pointer. You should give careful thought as to when the array elements
should be initialized (and similarly for wait_on_mutexlock).
- Briefly explain why the simpler wait_on_memlock scheme can
be used safely in this situation. Also explain what the C keyword
volatile means and why it is needed (try running the code
without it!).
- Time the three variants of quickThreads as per part 2,
(except use a smaller n=8000) and
record your results similarly. Which version is fastest, and did the
idea work?
- Time the three variants of quickThreads() as per part 5,
except now use p=32, i.e. there are now many more threads
than CPUs.
Compare your results to the above, and note and try to explain
any significant differences in the relative speeds.