Previous Honours Topics
These projects topics were offered in 1999. While these - or
similar - topics may not offered this year, they do serve as a
guide to the diverse range of topics that will be available.
If you wish you could work on a topic like one of these, then why
not write to the person who offered it. They might be persuaded to
let you pursue a similar project under their supervision this
year.
Analysis and prototyping of tools that enable organisations to
identify individuals on the basis of some anatomical or physiological
feature. Further sources of information relevant to the topic are available on the web.
Analysis and prototyping of tools that enable copyright owners to
address various risks that their content is subject to, such as
interception in transit, unauthorised or unpaid use by the intended
recipient, unauthorised provision by an intended recipient to another
party, and unauthorised or unpaid use of the content as a basis for
producing new works.
It would be useful to have some prior understanding of copyright
law and/or multimedia content and/or
water-marking/content-fingerprinting and/or public key cryptography.
But I can provide access to materials and people who can provide such
information. Some sources of relevance to the topic are provided
on the web.
Java's high level byte-code makes very easy to implement interesting
program transformations like specialisation, partial evaluation,
in-lining, cloning, etc. This byte-code modification allows to implement
many sort of optimisation and semantic transformation in a transparent
way. Our group have been applying this technique as a way to add
orthogonal persistence and object versioning to Java programs. We also
have developed a first prototype of Object In-lining in Java. In this
project we would like to explore several byte-code transformations for
optimisation of Java code. In particular, we want to complete the
Object in-lining prototype and evaluate different folding
strategies.
Text documents, which include computer programs, reports and
essays, can often be characterised by the types and distribution of
the (key) words that they contain. In the case of computer programs,
these words can include identifiers, which have no predefined meaning.
Key word characteristics can thus be used to define some metric of
similarity between documents. Practical uses of this include
determining whether the two documents have the same main subject,
and/or whether they have a common origin.
Most existing means of computing the similarity between such
documents rely purely in the spelling of the words, without utilising
any predefined meaning that a word may have. Another method (so far
only partially developed) which is also reasonably successful uses the
frequency counts of key words, which have some predefined meaning. In
this case, the concepts of `synonyms' of the key words may be useful
in certain contexts.
However, much more information can be extracted from the words in a
document. One is applying the frequency counts idea to words without
predefined meaning, ie program identifiers, which introduces some new
challenges. Others include the occurrences of small groups of words
(with some reasonable definition of locality), and the distribution of
particular words throughout the document.
This project seeks to investigate simple but useful methods to
analyse and compare documents based on their composition of (key)
words, and to prove their usefulness over collections of documents
from a variety of different areas. Its relation to text retrieval
methods, which are also key word based, and have concepts of locality,
frequency and synonyms, should also be investigated. As such it has a
strong research component, and the diversity of its application means
that there will be a strong interest in the academic community in any
significant results.
Multi-cast operations arise in many parallel computing applications,
including backtracking problems, and parallel Fortran. Much research
has been done on general multi-cast implementations, which can work
well on random collections of nodes. However, the `optimal'
algorithms are complex and hence there is considerable software
overhead in both deriving the optimal routing for the multi-cast, and
in the multi-cast itself. This detracts from the practical
significance of this work. Also, the current research mainly focuses
on start-to-end latencies to measure the performance of a multi-cast.
However, there are some applications in which a more useful
performance measure is the maximum time any participating nodes has to
spend in the multi-cast.
However, in many applications that require multi-cast, there are
regularities in the collections of participating nodes. In the case
of parallel Fortran arrays, the multi-cast will involve a portion of a
regular data structure with a regular distribution over a regular
(logical) processor grid. Hence the participating nodes must also be
regular. Possibly, a simple multi-cast algorithm with very low
overheads can suffice for this purpose, and will yield far better
performance than general multi-cast approaches.
Furthermore, the Fujitsu AP1000, whose nodes are arranged in a
physical network with a torus configuration, has several hardware
broadcast mechanisms which are very efficient. For example, the
`C-net' broadcast could be modified to perform very efficient
multi-cast when the collection of number in the multi-casts is more than
half the total of nodes (this would faster than sending an ordinary
message!) . It also has a row or column broadcast column using the
net, and this could be fairly easily extended to perform a multi-cast
over any rectangular sub-array of the processor grid (at the cost of
sending an ordinary message). Also, whether the pipelining of large
messages can be effectively implemented on the AP1000, even though its
hardware lacks the capability to simultaneously send and receive
messages, which is believed to be necessary for this purpose.
This project will firstly investigate the applications that require
multi-cast operations, in order to determine the regularity of the
multi-cast patterns and what is the most appropriate performance
measure for that application. Then it will investigate what (semi-)
specialised algorithms are most appropriate for the application with
regular multi-cast patterns, with some versions utilising the AP1000's
special hardware capabilities. The performance of these will then be
compared with the general multi-cast implementations, both simple and
sophisticated (some these have already been implemented on the
AP1000), and an evaluation of these approaches will then be made.
While assembly language code appears to be monolithic and
unstructured, much useful information can be extracted from the
code. This includes:
- recovering the algorithmic `control structure'
(a la high level languages) of the code
- highlighting (non-obvious) dependencies between instructions
- identifying, in some formal way, the `function' of an
instruction
- identifying, in practice, redundant instructions and code sections
Such information is tedious to generate by hand, and is useful in
the reading of assembler programs and also in measuring the similarity
between assembler programs.
This project seeks to investigate simple and reliable methods to
analyse assembler programs to isolate their `essential' algorithmic
properties, over a space of input data of interest. The assembler
code can either be source code or decoded object codes.
Methods include using static instruction analysis to recover the
control structures, and then using dynamic instruction analysis to
determine how instructions have been used and what instructional
dependencies exist. Also, information on the `function' of an
instruction can be deduced from which sets of input data the
instruction was used. From this extra information, redundant
condition evaluations and code sections can be identified, and the
control structures deduced from the static analysis can be revised
accordingly.
This project involves both substantial research and software
engineering components. A program analysis system is first to be
produced to implement these methods. This system must be flexible so
that the assembly language format and (essential) semantics can be
table-driven. Example assembly language code can be taken from the
PeANUt and SPARC assembly languages. Its output should be
(ultimately) an augmented version of the assembler code, in which it
can present its analysis in a readable way side-by-side of the
corresponding instructions. Such a system will have immediate use to
assist tutors in the rather tedious task of marking assembler
programs.
The second stage of this project is to connect the output of the
program analysis system to a program comparator. Here, the concept of
`instruction' equivalence (a 1-1 or 1-many mapping), must be added;
from this, and information on the control structure nesting and
`function' of each instruction, enables the comparator to determine
what instructions from the two programs can be matched against each
other (and how strongly). Again, information on `instruction'
equivalence and matching `function' and control structure nesting
should be table-driven. The output of the comparator should not only
be an overall rating of similarity, but to also be able to `explain'
its results, in other words, to give the mapping of matchable
instructions from one program to the other.
A dynamic web search system takes as input a site's home page and a
query. It then enters the site by following its link structure (this
is also known as spidering/crawling) and employs some dynamic search
algorithm in order to find pages relevant to the query as quickly as
possible. When searching within a known site, dynamic search systems
produce superior results to those from search engines in terms of
freshness (no index to become out of date) and coverage (search engine
coverage may not be that good for any given site [Bharat,
Lawrence]).
The quality of results from such a system depends on its guidance
algorithm, and this may be measured according to how many documents
are found within a certain time or bandwidth limit. The ACSys WAR
(Web and Associated Research) project already has a TREC style [TREC]
web data testbed suitable for adaption to the dynamic search case.
Good research outcomes for an honours project would be:
- To set up a new TREC style testbed for dynamic search algorithms,
- To implement existing algorithms, in particular fish-search and
shark-search from IBM Labs [Hersovici],
- To experiment with new algorithms, with different heuristics and
strategies, possibly incorporating new WAR project relevance
scoring techniques, and
- To evaluate all the algorithms using the testbed.
References
- Krishna Bharat and Andrei Broder.
- A technique for measuring the
relative size and overlap of public Web search engines. WWW7.
- Steve Lawrence and C Lee Giles.
- Searching the World Wide Web. Science
Magazine, Apr 1998, Vol 280, Number 5360
- TREC. Text REtrieval Conference.
- Michael Hersovici, Michal Jacovi, Yoelle S. Maarek, Dan Pelleg,
Menachem Shtalhaim, and Sigalit Ur.
- The shark-search algorithm. WWW7.
Object-oriented (O-O) languages provide a `dotting' syntax for
calling methods: e.g. in Java:
X ob;
ob = new Y();
ob.f(arg1,arg2);
Here, ob is defined to be of type X. But by the
time the method call is reached, ob actually refers to an
object of type Y (assumed to be a subclass of X).
So it is Y's implementation of f that will be
invoked, not X's. The run-time mechanism that chooses the
particular implementation is called `dynamic dispatch', and it's one
of the major run-time overheads of an object-oriented system. In this
example, if the compiler could somehow `figure out' that ob will
always be a Y, it could instead generate a call to
Y's f, and perhaps in-line it, making further
optimisations possible.
A number of algorithms have been proposed to analyse O-O programs
to eliminate dynamic dispatch where possible. Some are fast, but
extremely conservative. Others take a long time to run but are
supposedly more successful. But are they worth the trouble?
In this project, you'll create a framework for implementing,
testing, and evaluating these algorithms.
This is a `red hot topic'; results will be suitable for submission
to one of the major compiler conferences.
The main prerequisite is a good understanding of how a compiler
works.
(two separate projects)
Formal (rigorous) logical proofs are needed in the proof of the
correctness of hardware design (hardware verification) or of software
programs (software verification). These proofs are themselves so
complex that they have to be done with computer-based assistance. For
any particular problem area (e.g., software verification), the right
choice of an appropriate logical system helps in making a proof
practicable. Thus the logical systems available, and the methods for
computer based proofs in them, are an important area of computer
science.
Display Logic is a paradigm for setting out certain logical
systems, and for obtaining proofs in those systems. In most cases it
cannot give an effective decision procedure, so a interactive
computer-based proof system is required.
Some Display Logic systems have been implemented in Isabelle, an
interactive theorem prover based on Standard ML. There is much
further work that could be done. Below we have proposed two projects
but please come and speak with either of us if you wish to work in
this area.
The aim of the project is to design and implement a graphical user
interface so that non-logicians can also use these proof systems.
There are two alternatives, one theoretically minded, one application
minded, and we would be happy to have two students working on one
each.
- Project 1
- This may be done using XIsabelle, an existing GUI-based
front-end for Isabelle, developed by DSTO. It is written using
Expect/Tcl/Tk. In this case the project would involve tailoring
and enhancing XIsabelle to make it suitable for the proof
techniques used in Display Logic.
- Project 2
- An alternative project would be to produce a GUI front-end
specifically for the Display Logic system for relation algebras.
In this case it should be possible to provide a representation
of statements in the relational logic as graphs (networks), and
to relate steps in the proof to incremental changes in the
graph.
The student will need to become acquainted with the design and
implementation of GUIs.
For project 1: also Standard ML, Isabelle, basic proof theory and
some propositional non-classical logics; project (1) will also involve
Expect/Tcl/Tk.
Project 2 will require working out the details of how to portray
relational assertions as graphs, and will involve programming the
display of graphs on the screen. This project will also need basic
proof theory and some propositional non-classical logics but at a
lower level than the first.
A colleague from Germany has recently invented a term rewriting
system called NRA for relation algebras using insights gained from the
display logic calculus for relation algebras. The project involves
implementing this system using existing theorem provers based on term
rewriting e.g Larch. Rewrite Rule Laboratory, Isabelle etc.
The student would need to become acquainted with logic, term
rewriting, and the chosen theorem proving system. Expertise in both
relation algebras and the term rewriting system NRA is available
locally. The system NRA is ideally suited to implementation using the
abovementioned theorem provers, but the project is non-trivial for two
reasons. First, some of the NRA rules have side-conditions associated
with their applicability, and the student would need to find a way to
express these constraints in the chosen theorem prover. And second
because the decision problem in relation algebras is known to be
undecidable, so no implementation is going to be perfect.
BSP (bulk synchronous parallel) is a platform independent parallel
computing model, along with PVM and MPI. Recently a library version
of the model, called BSPLib, has been defined and has been implemented
on a number of platforms. A
number of demonstrator programs have been published.
The project will involve the porting of BSPLib to the AP1000+ and
the AP3000 systems. This can be first done using the MPI library
available on those systems. After benchmarking measurements
appropriate optimisations (more direct calls of the AP-series
communication primitives) can be considered.
The public domain demonstrator programs can then be run. It should
then be possible to write BSP versions of some local application
programs developed for the AP-series machines.
This project would suit a student who is interested in systems
programming and applications programming in novel situations (here
parallel programming). The outcome (thesis and system documentation)
would be very useful when applying for a software engineering
position. If a more formal approach was taken it would suit a student
aiming towards graduate studies.
The FISh programming language has been designed and implemented by
Dr Barry Jay of UTS. It can be thought of as a language with
functional and imperative aspects where data structures are restricted
to where their shape can be determined statically. It can thus be
translated into very efficient programs. More information is
available from the FISh
home page.
A considerable motivation for FISh is that the static knowledge of
the shape of the data structures can potentially allow the
partitioning of the computation across several processes. In other
words, it can possibly allow compiler support for parallel programming
in a way that is not possible with standard parallel programming
languages.
This project will first involve extending the FISh implementation
by appropriate parallel primitives. The FISh implementation is
written in an object-oriented version of ML (with lex-yacc support) so
the extension should not be problematic. It is not clear a priori
what the extensions should look like. Options are
- BSP
- bulk synchronous parallel
- MPI
- message passing interface
- PVM
- parallel virtual machine
It will then involve assessing the extended FISh dialect as a
parallel programming language, by writing a range of parallel programs
for `canonical' problems and interpreting the situation.
This experiment focuses on explicit program construction by the
programmer. Barry Jay has hypothesised that the detailed data
structure knowledge in FISh might allow some automatic decomposition
into parallel program fragments.
The design of FISh is strongly driven from modern type theory, and
a sensitivity to these issues would be an advantage. Although some
implementation work is involved, this is not primarily a constructive
project. It would best suit a student looking towards graduate
studies.
High performance computing has been driven by developments in
architectures and programming languages. Essentially users have to
interact with a HPC system on the system's terms. Recently it has
been argued that this interaction should be on the user's terms. The
user should be presented with a problem solving environment (PSE) for
the problem involved, rather than a general programming environment.
Modern communication technologies (Internet, world-wide web) can allow
the physical decoupling of the user and the computational engine
itself.
A research group at Imperial College has developed a PSE in the context
of the AP-series machines (in the problem domain of data mining) that
can be regarded as representative of this work. More information is
available on the web.
This project will seek to build a demonstrator PSE in the context
of the department's HPC resources (AP-series systems, computational
science and engineering laboratory). The problem domain has not yet
been selected. The Imperial College demonstrator should be available
to the project and this will serve as a starting point for a local
experiment.
Although the project will be `grounded' in a demonstrator system,
the intellectual focus will be on the architecture of such PSE
systems. The thesis will be constructed as a technology review, and
will try to identify architectural issues that are common to such
systems. A new PSE for a different problem domain should not require
the wheel to be re-invented.
This project would suit a student who is interested in design and
architecture issues in software engineering. The thesis would have to
wrestle with the problem of describing such issues and perhaps the
concepts of design patterns might need to be used. The outcome
(thesis and system documentation) would be very useful when applying
for a software engineering position. It would also suit a student
intending to pursue graduate work in some aspect of experimental
computer science.
The enhanced feeling of `presence' that is provided by virtual
environments motivates their incorporation into future communications
technology. In fact, prototype collaborative virtual environments have
already begun to be developed and have been demonstrated across
continents (as was the case with an international hook-up involving
the ANU in November 1998). This project will seek to design and
implement a simple collaborative environment that will link the Barco
Baron facility of the ACSys VE program with the WEDGE two screen
system at the Research School of Physical Sciences and Engineering and
ANU Supercomputer Facility. The issues which need to be addressed
include the different operating environments of the two systems, the
various lags in the communication links across campus and the demands
of the target applications themselves. It is likely that the first
target application will be a simple interactive game.
Design by Contract (DBC) is a major feature of the Eiffel
programming language and is meant to facilitate the construction of
correct object oriented code. It was going to be a feature of Java
before it was left out to the `major regret' of the author. Phil Maker
of the University of the Northern Territory has designed a system to
implement a form of DBC for C/C++ which you can read about at the GNU Nana Home
Page. This project will seek to evaluate the usefulness of such a
system in scientific computation and will start by translating Maker's
ideas (with his help!) to FORTRAN90. It will then proceed with a
re-draft of a `living' scientific code using these methods and its
usefulness will be evaluated in a reasonably scientific way.
Nothing beats formal methods as a means for ensuring trust in
any engineered artifact, whether it is a bridge, an amplifier
or avionic software. If a program is formally specified and
appropriate proofs are constructed then properties of termination,
safety, timing, space usage and correctness then the program can
be guaranteed to have these properties with mathematical certainty.
If Java is to become the dominant language for engineering software
and if it is to be used for trusted systems then the ability to
construct proofs of properties must be there. Ultimately there
must be good tools but these must be based on solid foundations,
which is where we are right now.
Some work has been done on formalising the JVM (the Java Virtual
Machine) in HOL (Higher Order Logic) but there is more to do and
then there must be decision procedures written to make the burden
of theorem proving as automatic as possible.
This project is about building on these foundations with the aim
of proving some real properties of Java applets.
As we work towards the development of standards for the description
of various types of models we build in data mining, a toolkit is
required to support the visualisation and execution of those models.
The Extensible Markup Language (XML) is proposed as a standard for
this, leading to the development of PMML, a Predictive Modelling
Markup Language. Parsers for XML have been developed in Java. The
ACSys Data Miner's Arcade includes a number of tools that need to
generate PMML, including Gamming, BMars, and CBos. This project will
develop models for each of these types of predictive modelling tools,
leading to specifications and Java implementations of XML models.
The ACSys Data Miner's Arcade supports the process of identifying
Hot Spots in very large datasets. A new approach is under
development, using ideas from evolutionary computation. An initial
prototype has being developed and will form the basis of this project
which will research the concept of interestingness and work towards
capturing the concept in a formal language. Expressions of
interestingness will then be evolved to explore for measures which
match the requirements of the Data Mining exercise.
Data Mining is inherently interactive, relying on the domain
expertise to drive the data mining process, which in turn provides
information to guide the domain expert's exploration. Visualisation
is a very effective mechanism for interacting with domain experts but
is limited to what can be displayed on the screen. Virtual
Environments provide a lot more real estate for the presentation of
information. This project explores the use of virtual environments
for data mining, in general, and for the support of the ACSys Hot
Spots methodology in particular.
ACE (Abstract Clause Engine) enables rapid prototyping and
experiments with a wide variety of logical
languages. This approach has a
rigorous basis. It has been tested very successfully via a
smaller-scale forerunner that was written in Standard ML
and used internationally by industrial and academic research groups
for several languages.
The core code of ACE has recently been re-implemented in Java, and
the CUP Parser
Generator has been used for the front-end. This
project entails adding to the core to produce an ACE applet.
In particular, you will be designing and implementing a
Java graphical user interface, implementing search procedures, and
experimenting with the system.
References
Copies of the following papers are available on request.
- D. A. Wolfram.
Semantics for Abstract Clauses.
In Henk Barendregt and Tobias Nipkow, editors,
Types for Proofs and Programs: International Workshop
TYPES'93,
Selected Papers, Lecture Notes in Computer Science
806 (Springer, Berlin, 1994) 366-383.
- D. A. Wolfram.
ACE: the abstract clause engine.
In Mark E. Stickel, editor,
Proceedings of the
Tenth International
Conference on Automated Deduction (CADE-10),
Lecture Notes in Artificial Intelligence 449
(Springer, Berlin, 1990) 679-680.
For many years the desktop computer has enjoyed the concept of a
windowing system to partition the display into multiple independent
task spaces that the user can operate in parallel. To date virtual
environments have offered no such generic task space partitioning
system, relying on a single monolithic scene with a few widgets around
the periphery.
This project will focus on the design of a user interface for
virtual environments that is similar in spirit to a windowing system on
a desktop. It is based on the idea of controlling a complex task by
using independent task spaces to control simpler tasks. Issues such as
navigation of the interface, task space occlusion, and VE 'desktop'
design will be addressed. It is likely that a projection table based
terrain visualisation will be used as the testbed for this work, but
other modes of VE operation could also be explored.
Agents are small, independent and automonous programs that attempt
to intelligently work towards a goal often on behalf of a human. This
project will attempt to model and direct the behaviour of a number of
agents in an economics simulation. The agents should have a number of
senses that provide information about changes in the world. The aim
of the project will be to make the agents reason and direct their
behaviour based on the input from their senses in a local way. That
is, the agents will not require global knowledge of the system to act
on some small part of it.