Previous honours topics
These projects topics were offered in 2000. While these - or
similar - topics may not offered this year, they do serve as a
guide to the diverse range of topics that will be available.
If you wish you could work on a topic like one of these, then why
not write to the person who offered it. They might be persuaded to
let you pursue a similar project under their supervision this
year.
Text documents, which include computer programs, reports and
essays, can often be characterised by the types and distribution of
the (key) words that they contain. In the case of computer programs,
these words can include identifiers, which have no predefined meaning.
Key word characteristics can thus be used to define some metric of
similarity between documents. Practical uses of this include
determining whether the two documents have the same main subject,
and/or whether they have a common origin.
Most existing means of computing the similarity between such
documents rely purely in the spelling of the words, without utilising
any predefined meaning that a word may have. Another method (so far
only partially developed) which is also reasonably successful uses the
frequency counts of key words, which have some predefined meaning. In
this case, the concepts of `synonyms' of the key words may be useful
in certain contexts.
However, much more information can be extracted from the words in a
document. One is applying the frequency counts idea to words without
predefined meaning, ie program identifiers, which introduces some new
challenges. Others include the occurrences of small groups of words
(with some reasonable definition of locality), and the distribution of
particular words throughout the document.
This project seeks to investigate simple but useful methods to
analyse and compare documents based on their composition of (key)
words, and to prove their usefulness over collections of documents
from a variety of different areas. Its relation to text retrieval
methods, which are also key word based, and have concepts of locality,
frequency and synonyms, should also be investigated. As such it has a
strong research component, and the diversity of its application means
that there will be a strong interest in the academic community in any
significant results.
The major aims of this project are the design, implementation and
evaluation of algorithms for view selection for materialisation, view
replacement, and view refreshment in data warehouses. Thus our
objectives are:
- Design of new algorithms for view selection and replacements in
the dynamic data warehouse environment, and evaluation of new
strategies for such applications through implementation and
performance evaluation.
- Examination of the existing view refreshment policies,
development of new view refreshment algorithms that shorten view
refreshment time for the data warehouse.
The query response time and the view refreshment time are very
crucial factors in determining data warehouse applicability. These
factors in turn depend heavily on which views are selected for
materialisation, which views will be replaced, and how fast the
materialised views will be refreshed, given limited storage space and
limited refreshment time window. Therefore, any improvement in
handling view selection, refreshment and replacement will lead to a
substantial improvement of the entire performance of the data
warehouses.
A number of efficient sequential algorithms have been suggested for
mining the generalized association rules. In this project, we will
investigate different strategies to mine generalized association rules
in parallel and distributed computing environments. Particularly we
will focus on designing and implementing efficient mining algorithms
where the dimension tables and the fact table are decoupled and
distributed in different remote sites.
Investigate the practicability of implementing agents, and negotiations
between agents, in particular by measuring the increase in the complexity
of code as the complexity of the interactions between the two agents
increases.
The suggested implementation context is the W3C's Platform for
Privacy Preferences (P3P) specification. This defines how client-side
software can store people's privacy preferences, and server-side
software can store privacy policy statements by corporations and
government agencies. The two agents can then negotiate with one
another in order to permit a transaction to be entered into (such as
the provision of shoe-size and credit-card details), to prevent the
transaction, or to refer a mismatch to the consumer for a
decision.
It is envisaged that a succession of prototypes of increasing
completeness would be implemented, and key process and product factors
would be measured. This would depend on a thorough appreciation of
theories relating to agents, P3P, software development and
maintenance, software complexity, and development and maintenance
productivity.
Some background reading is at:
Investigate the extent to which it is feasible to devise and
implement a tool that provides the location in physical space of
Internet nodes.
Netizens assert that cyberspace is different from the real world.
Important among the tenets of netizenship is the belief that the
machines that people use cannot be readily found by other parties,
even by multiple law enforcement agencies working in concert,
especially if quite basic countermeasures are adopted. The increasing
incidence of wireless communications channels to mobile nodes adds to
the difficulties involved.
A number of sources of relevant data are publicly available (such
as IP-address registers and the DNS), and others are available to law
enforcement agencies (such as registers maintained by
telecommunications companies and Internet services providers). The
question is whether such sources of information can be combined into a
tool that provides real-time (or even retrospective) computation of
the node's geophysical coordinates at a point in time.
It is envisaged that this would involve an analysis of a range of
relevant literature (including IETF and probably other engineering
standards), study of relevant tools, establishment of a model of the
relevant characteristics of nodes, manual investigation of the
locatability of nodes in a sub-set of the circumstances defined by the
model, and prototyping and testing of tools and tool enhancements.
Because the task is likely to be partly dependent on data schemas that
are not in the public domain, it is probably necessary to conduct
interviews with telecommunications companies and ISPs.
Some background reading is at:
Investigate the risks to private signature keys, and the extent to
which security regimes and products are able to satisfy the need for
protection against access to the key by any other person.
Digital signature technology is entirely dependent on the
protection of the private signature key. There are difficulties in
designing a secure environment for that key. This is the case in
corporate settings, and especially so in the case of personal
work-and-playstations. Throughout its life-cycle, the key must only
ever be accessible to the owner, commencing with the generation of the
key-pair, and through the phases of usage, storage, recovery from
storage, backup, and recovery from backup. Artefacts designed to
provide protections of this kind are sometimes referred to as 'secure
wallets', and the most obvious implementation platform is a
smartcard.
It is envisaged that a theoretical analysis of requirements, risks,
security designs, and countermeasures would be undertaken; and that
existing proposals for, and implementations of, 'secure wallets' would
be evaluated. It may be appropriate to perform prototyping of some
aspects of such products, and/or to devise and implement attacks on
secure wallet products or prototypes. Because it is possible that some
relevant information may not be in the public domain, it may be
necessary to conduct interviews with smartcard technology providers
and consultants.
Some background reading is at:
Investigate the extent to which the centralised storage of
biometric measures of humans creates the risk of masquerade.
Biometrics is a generic term encompassing a wide range of measures
of human physiography and behaviour. Measures of relatively stable
aspects of the body include fingerprints, thumb geometry, aspects of
the iris and ear-lobes, and DNA. Dynamic measures of behaviour
include the process (as distinct from the product) of creating a
hand-written signature, and the process of keying a password.
Schemes can be devised that apply biometrics in such a manner that
the measure is only ever known to a chip held by the individual, and
the device currently measuring the person concerned. (This is
analogous to the mechanism used for protecting secure PINs input on
ATM and EFT/POS keyboards).
It is very common, however, for proposals for biometric schemes to
involve central storage of the biometrics, as police fingerprint
records do now, and as current proposals by the Australian Government
would do in relation to DNA records. This raises the question as to
whether a person who gains access to the store could masquerade as
that individual. Possible uses would be to gain access to buildings,
software or data, to digitally sign messages and transactions, to
capture the person's identity, to harm the person's reputation, or to
`frame' the person.
It is envisaged that this would involve an analysis of a range of
literature relating to identification and biometrics, and
investigation of the extent to which abstract representations of
biometric data can be used to produce artefacts that will satisfy
biometric measurement devices. Because the task is likely to be
partly dependent on information that is not in the public domain, it
is probably necessary to conduct interviews with biometrics technology
providers, and perhaps also physicists, biologists and engineers.
Some background reading is at:
Write a prototype next-generation Web crawler, capable of both Web
browsing and searching. Test its ability to automatically collate
subject-specific Web resources for use in a "vertical portal".
Vertical portals [1] are a hot area in the Web industry, acting as a
gathering place for a particular industry or interest group. They can
include a complete and very up-to-date search facility for pages in
the portal's area, as opposed to a general search engine which covers
16% of every area [2]. A hard task, however, is to efficiently find
pages which a portal should search. Current portals rely on manually
identified sites in the portal's subject area and crawling their pages
using a dumb Web crawler [3]. Automating this process would find more
pages with less human effort.
The project involves building a better crawler, that given the
portal's area e.g. Linux, Chinese language or SCUBA industry, would
automatically find matching pages. It would find pages using:
1) targeted browsing (crawling), building on the work of a 1999
honours student, and 2) automated searches of existing search engines.
The crawler would accept or reject pages based on a string match
(e.g. "Linux" or "SCUBA") or a language test (e.g. software for
detecting Chinese). Evaluation would involve measuring how many pages
the crawler can find with only browsing, only searching and then
browsing plus searching. The set of pages found could also be
compared with the coverage of an existing service such as [4]. This
project suits students who want a deep understanding of HTTP, Web
crawling and Web search for use in their later career.
Part of human nature is searching and foraging behaviour, even if
today it is most often used at the supermarket or looking for car
keys. Current text search applications, such as search engines, are
not much like real-world searches. They require a user to enter a
text query and view a textual list of results. The only indication of
the "lay of the land" presented by current Web search engines is the
10 results and a number indicating e.g. 750,000 other "hits".
The structural nature of the Web and of relations between documents in
a given search forms a network or graph. This dynamic network could be
explored visually, giving the user many paths that they could choose
to follow as they "forage" for information. The project involves
development and assessment of a system that displays a visual map of
document properties and relations. The project focuses on
understanding existing and developing new ways of exploring and
navigating through a complex information space. This project suits a
student who wants to learn about the Web and interactive graphics.
Initial information:
Write and evaluate a site summariser. Time permitting, apply the same
principles to subject-specific search services.
Yahoo! [1] is an example of a successful company which performs manual
site summarisation: all their listed sites have hand-written
summaries. InvisibleWeb [2] do the same job as Yahoo!, but with
subject-specific search services. Either type of company might find
automatic summarisation techniques useful.
There are many document summarisation algorithms, such as those
referenced in [3,4]. This project first involves implementing one of
them and trying it out on pages from a site or search service. The
ideal would be to modify the algorithm to build a single summary based
on several pages from a single source, then evaluate whether having
more pages improves the summary. Evaluation would involve listing
several summaries and asking a human judge (a volunteer) which is
best. This project suits a student who is interesting in conducting
really original research with commercial applications, performing an
experiment and analysing the results.
Analyse the Web search problem to see whether searching fewer
documents can give better results.
IR since the 60's has asked "How to search?" a single set of
documents. Today, any Web search engine could attempt to index a
billion documents, and the question has become "What to search?" or
"How much do I need to search?". Cutting down on what you index saves
time, money, bandwidth and other resources. The question is whether
doing so helps or hinders people in finding useful information.
The project is to evaluate three approaches used on the Web when
deciding what to index. One is to build the biggest index possible.
If you cover every document you ensure that every document is
findable. Another is to only index "good" documents, based on link
information. This is what large search engines do, and some
(AltaVista, Excite and Google) give hints of their criteria for
"goodness". The third approach is to only index documents that are
within the right subject area. This is the vertical portal approach
listed in one of the other honours topics. We now have ways of
evaluating these different approaches, using Web-data test collections
and over up to 100 gigabytes of Web pages (the size of a small Web
search engine). This project would suit a student who wants a deep
understanding of Web search, for use either in the Web search industry
or Web research.
Information on limited search coverage:
Users of desktop computers possess the ability to interact with
multiple applications concurrently through the advent of the WIMP (Windows,
Icons, Menus, Pointer) interface. Currently there is no such similar system
for Virtual Environment (VE) applications. In 1999 Justin Waddell carried out
an exploration of this concept, resulting in the development of a small
prototype system that used rectangular prisms to contain varied applications
in a single VE workspace.
This project continues work on the concept of a "window manager" for VE
applications. There are a number of avenues that may be explored,
possibilities include development of a different space metaphor (with
comparison to the prism metaphor), or extensions to the prism-based
system and detailed user analysis thereof. This project would suit a student
who is interested in user interface design, task analysis and VE applications
development.
Details are here.
Details are here.
Details are here.
The Beowulf is a class of parallel computers which are built from commodityhardware rather than highly specialised, expensive hardware. The
Beowulf system being constructed at ANU Computer Sciences Laboratory
(RSISE) and Dept of Computer Science (FEIT) will have at least 128,
possibly 192, processors running variants of the Linux operating
system, interconnected by 100 Mbps Ethernet networks. Application programs
include machine learning.
Effective use of parallel computers requires debugging and
performance analysis tools. The standard approach is off-line
visualisation of large event trace files; numbers of tools exist on
many different systems. Until recently most parallel computers had
proprietory operating systems and there are few standardised tools.
An investigation of existing tools is needed, with the practical
part of the project involving implementation or porting of one or more
tools onto the Beowulf. Issues include:
- design of trace collection and management utilities (large
amounts of data are involved)
- design and implementation of intelligent analysis assistant
software such as recognition of patterns of events, discovering
and displaying anomalies and deviations from the intended
communication patterns, identification of bottlenecks and
hotspots (some of this possibly relates to data mining, but
simpler approaches are more common)
- design and implementation of visualisation and navigation tools
- elimination or minimisation of probe effects (unintended modification of
program behaviour caused by tracing the behaviour)
- analysing event streams from processors with unsynchronised clocks
Background required
- an interest in programming utilities and programming
development environments
- COMP3037 Operating Systems Implementation, or similar
- an interest in concurrency, parallel computing, message passing
programs.
Further reading
- P.B. Thistlewaite and C.W. Johnson
- Towards Debugging and Analysis Tools for Kilo-Processor
Computers, Fujitsu Scientific and Technical Journal, vol 29
no. 1 (Spring 1993), pp. 32-40 (available in hard copy from
Chris Johnson)
- Debugging
parallel programs using incomplete information
- current, related work on debugging program topologies at University of Western Australia
- Debugging Concurrent Programs
- (an earlier survey article): C. E. McDowell and D. P. Helmbold,
ACM Computing Surveys, vol. 21, no. 4, December 1989, pp. 593-622.
- UPSHOT
- the UPSHOT
program visualisation system from Argonne (the latest version is
Tcl/tk based).
- Beowulf systems
- NASA Beowulf project
home page from NASA Jet Propulsion Lab (the origin of Beowulf systems)
When a user gets a program from across the web (an applet, for
example) there is need for some guarrantee that the code does
what it says it does and does not do nasty things. The Java
sandbox is a way of addressing the second of these concerns.
The ultimate way to give confidence to a user that appropriate
correctness and safety properties hold is to provide a
machine checkable proof of those properties. This is the
basis for the notion of PCC (proof carrying code).
For any particular property of some given code there will be
many possible versions of many different proofs; some more
suitable than others for transmission and checking, For
example, a base level proof in a primitive logic can be shown
to be valid by a simple checker in which the user can have
great confidence; on the other hand, such a proof is likely
to be so bulky that its transmission will be too expensive
and its parsing will be time-consuming. On the other hand,
some properties need just a one-line proof which is simply
the name of a powerful decision procedure; in this case,
transmission is not bottleneck but decision procedures are
often exponential in execution time and the user may find
the delay in executing an applet unbearable.
This project is about exploring how one can select an optimal
version of a family of proofs - an appropriate compromise
between bulk of proof to transmit and speed of checking at
the receiving end. For the sake of the study, the properties
will be expressed in higher order logic using propositional
logic, integer arithmetic and finite set theory. The
suggested mechanical prover/checker is the HOL system (which
is used in the honours course on mechanical verification).
Involves developing aspects of an algorithm to simulate genetic
recombination. Need to test the algorithm on genetic data and compare
results with those generated by existing algorithms. Background in
algorithms needed.
Java's high level bytecode makes very easy to implement interesting
program transformations like specialization, partial evaluation, inlining,
cloning, etc. This bytecode modification allows to implement many sort of
optimization and semantic transformation in a transparent way. Our group
have been applying this technique as a way to add orthogonal persistence
and object versioning to Java programs. We also have developed a first
prototype of Object Inlining in Java. In this project we would like
to explore several bytecode transformations for optimization of Java
code. In particular, we want to complete the Object inlining prototype
and evaluate different folding strategies.
Making Java programs orthogonally persistence requires that persistent
objects be stored in a long term storage medium like a database. One of
the different types of databases that can be used for this purpose
is a relational database. In order to use the database efficiently,
persistent objects that are no longer required need to be removed from
the database. This project looks at developing and implementing a garbage
collector that does the above.
This project would look at ways of using the PHANToM haptic device
to explore or "visualise" scientific data sets. The PHANToM is an
excellent 3D input device for probing a 3D space. Further, force
feedback can be used to constrain the exploration to surfaces or
areas of interest. The probe can be connected to other forms of
feedback, such as visual or auditory, to provide detailed information
about values at a point. Further, the haptic properties of the device
could be used to convey data values through force, vibration, etc.
The project could tackle one, some, or all of these challenges, and
more:
- integrating haptics with existing visualisation or scenegraph systems
- using haptic constraints to facilitate probing
- integrating probing with visual and/or audio feedback
- design of techniques for conveying data through forces
- investigation of whether the haptics improves appreciation of
the data set by the user
- determining what sort of data sets are amenable to this sort of
display, and finding some good examples
This project would extend work done by Hanson and Wernert (Vis '97, Vis '99)
on constrained navigation of 3D scenes. Navigation can be seen as selecting
a sequence of viewing parameters from a fairly high-dimensional space.
The task can be made easier by defining a lower-dimensional constraint
surface and navigating that instead. This could be an ideal situation to
use haptically constrained input techniques. The approach also has
natural extensions to collaborative exploration of a VE by multiple
participants. The project could tackle one, some or all of these
challenges, and more:
- combining haptic input with 3D display and navigation software
- design of interaction techniques to empower navigation
- implementation of constraint surfaces
- extensions to collaborative exploration
- extensions to general exploration of multi-dimensional spaces