Honours topics
Half of your time as an honours student is spent working on a project. But first you have to find a project topic.
The "official" reference for projects proposed by potential supervisors is the CECS projects database.
There are projects there available for all levels of research, including short projects, summer scholarship projects, Honours projects, and Masters and PhD projects. All potential research students at any level are urged to browse it.
If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.
You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, then feel free to discuss it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.
Former Project Topics
For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.
2005 Honours project proposals
Network-level Optimization of Communication on Cluster Computers
Supervisor: Dr Peter Strazdins
Background
A Beowulf-style cluster computer is a parallel computer that uses a Commercial-off-the-Shelf (COTS) switch-based network to communicate between the processors. These have proved a highly cost-effective high performance computing model, and have largely displaced the traditional massively parallel computers built with expensive vendor-supplied networks. However, the raw communication speed of COTS networks has not kept up with the dramatic increases in processor speed, and limits the performance of many applications on clusters. It is therefore important to utilize all of the available hardware capabilities of these networks in order to increase effective communication speed.
The switch-based networks used by clusters support not only normal point-to-point messaging (via TCP/IP in the case of Ethernet networks), but also data transfer by collective communications (e.g. multicasts) and by remote direct memory access (RDMA). The nodes of a cluster are connected to these switches via Network Interface Cards (NICs); it is also possible for nodes to have multiple connections to the network and to speed up data transfer by splitting a single message and sending its parts over each connection simultaneously.
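The multi-channel idea can be sketched in a few lines. The following toy example (the chunk size, channel count, and framing are invented for illustration, not taken from Bunyip's actual protocol) splits a message round-robin across several send queues and reassembles it on the receiving side:

```python
# Hypothetical sketch of multi-channel message striping: a message is split
# into fixed-size chunks distributed round-robin over the available NIC
# connections, then reassembled in order on the receiving side.

CHUNK = 4  # bytes per chunk (tiny, for illustration)

def stripe(message: bytes, n_channels: int):
    """Split a message round-robin across n_channels send queues."""
    queues = [[] for _ in range(n_channels)]
    chunks = [message[i:i + CHUNK] for i in range(0, len(message), CHUNK)]
    for seq, chunk in enumerate(chunks):
        # Tag each chunk with its sequence number so the receiver can reorder.
        queues[seq % n_channels].append((seq, chunk))
    return queues

def reassemble(queues):
    """Merge the per-channel queues back into the original message."""
    tagged = [item for q in queues for item in q]
    tagged.sort(key=lambda t: t[0])  # chunks may arrive out of order
    return b"".join(chunk for _, chunk in tagged)

msg = b"hello, gigabit cluster!"
assert reassemble(stripe(msg, 3)) == msg
```

In a real implementation the per-channel queues would be separate TCP connections, and the interesting engineering questions are exactly the ones the project raises: chunk sizing, reordering cost, and interaction with the NIC.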
The ANU Beowulf cluster Bunyip is a cluster based on Fast (100Mb) Ethernet switches. In partnership with the local company Alexander Technology, a subset of Bunyip will be upgraded to Gigabit Ethernet technology. Alexander Technology and DCS have recently begun a project to evaluate the effectiveness of multiple channels on this sub-cluster.
This project will complement the above project by investigating other aspects of communication performance for Gigabit Ethernet. This includes improvement of collective communications such as TCP/IP multicast, NIC-level optimizations, and RDMA over Ethernet. It will begin with the search for, installation of, and evaluation of existing (open source) software for this purpose. Depending on time and opportunity, new optimizations will be attempted and evaluated.
A companion project would perform similar work but for clusters connected by Quadrics switches, using the newly installed (again, courtesy of Alexander Technology) Sunnyvale cluster at DCS.
These projects are of strategic interest not only to DCS but also to Alexander Technology, which is rapidly expanding its research and development in cluster technology.
References
See the links above, and also:
- the IEEE Task Force on Cluster Computing home page.
- for a description of optimizing communication patterns at the software level, see Wi Bing Tan's Hons thesis
- the RDMA Consortium
- related work done at Colorado
- for NIC-level communication optimizations, the NOW Lab at Ohio State Uni has produced several papers in this area, particularly on Quadrics.
Extended Threads Emulation in an SMP Computer Simulator
Supervisor: Dr Peter Strazdins and other members of the Sparc-Sulima team.
Background
Simulation is playing an increasingly important role in the design and evaluation of current and future high performance computer systems. Sparc-Sulima is an UltraSPARC SMP full-machine simulator currently being developed by the CC-NUMA Project (before that by the ANU-Fujitsu CAP Project). Sparc-Sulima simulates an UltraSPARC processor by interpreting each instruction.
Currently, Sparc-Sulima can simulate an unmodified Solaris (UltraSPARC) binary (executable program). When that program executes a system call trap instruction, the corresponding operating system effects are `emulated' (the simulated computer state is updated directly, rather than via simulating a lengthy and complex sequence of kernel-level instructions), and simulated execution then resumes. This mechanism is handled by the SolEmn emulation layer in Sparc-Sulima. Threaded programs (e.g. using the pthreads POSIX thread libraries) ultimately access thread-related operating system services via the Solaris _lwp (lightweight process) trap instructions. The limitation of the current implementation is that, at any time, there can only be one active thread per CPU. This limitation is also present in all emulation-based computer simulators in the literature.
The aim of this project is to extend the emulation of the _lwp trap instructions to remove this limitation. It will require implementing thread switching, which can occur through time-outs, and also through the _lwp_suspend and yield system calls. The current thread context save/restore mechanisms will also have to be extended. If successful, the work should be publishable.
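The kind of switching the project would implement can be illustrated with a toy scheduler. This is a sketch only, in no way Sparc-Sulima code: Python generators stand in for the real _lwp context save/restore, and the quantum stands in for the time-out:

```python
# Toy round-robin scheduler: each simulated thread is a generator, and a
# thread is preempted after `quantum` steps (the time-out) or when it
# finishes. Requeuing a generator is the stand-in for context save/restore.

from collections import deque

def scheduler(threads, quantum=2):
    """Round-robin over (name, generator) pairs; each next() is one step."""
    ready = deque(threads)
    trace = []
    while ready:
        tid, thread = ready.popleft()
        for _ in range(quantum):
            try:
                trace.append((tid, next(thread)))
            except StopIteration:
                break          # thread finished; do not requeue
        else:
            ready.append((tid, thread))  # time-out: context saved, requeued
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"

trace = scheduler([("A", worker("A", 3)), ("B", worker("B", 2))])
# Threads A and B interleave in quantum-sized bursts.
```

The real problem is of course much harder: contexts are full UltraSPARC register windows, and switches must also be triggered by _lwp_suspend and yield traps rather than a fixed quantum.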
An extension of this project, more implementation oriented, would be to simulate, rather than emulate, the _lwp traps. This would involve writing a mini-kernel implementing lightweight threads. Now, when Sparc-Sulima executes a threaded program and encounters an _lwp trap, it will start executing the kernel, just like on a real machine. Note that the same underlying algorithms of the emulation mode can be used; however, some consideration must now be given to the `locking' of shared data structures. The advantage of this is that kernel-level (notably memory system-related) overheads can now be recorded by the simulator. Thus, the simulator will be able to predict the threaded program's overheads more accurately.
This project is related to the project Using Threading Techniques to speed up SMP Computer Simulation. It is part of the CC-NUMA Project.
References
- Slides for a background talk for Sparc-Sulima.
- Pthreads programming by O'Reilly, and other books on POSIX threads. Solaris on-line documentation on Solaris threads, including man -s 2 intro.
- from the Sparc-Sulima home page, there is a survey paper on machine simulation (citing many more references) and also papers on the implementation of Sparc-Sulima and SolEmn. There are manuals and source codes too.
The Effect of Architectural Variations on the NAS Parallel Benchmarks
Supervisor: Dr Peter Strazdins and other members of the Sparc-Sulima team.
Background
Simulation is playing an increasingly important role in the design and evaluation of current and future high performance computer systems. Sparc-Sulima is an UltraSPARC SMP full-machine simulator currently being developed by the CC-NUMA Project. SMPs (Symmetric Multiprocessors, or shared memory multiprocessors) are widely used as server machines for demanding scientific and commercial uses; in most situations, the performance (or lack thereof) of the shared memory system is of key interest. In particular, the memory system is NUMA (Non-Uniform Memory Access); that is, the time for a processor to access a piece of data depends on where that data is situated in the memory system. The CC-NUMA Project has a 12-CPU UltraSPARC III system called alcatraz.
The NAS Parallel Benchmarks (NPB) are a widely used set of benchmarks derived from scientific applications (largely related to computational fluid dynamics). For SMPs, the OMP (OpenMP) version of the NPB is of chief interest, as OpenMP is emerging as a high-level and highly portable programming paradigm for shared memory multiprocessors.
Currently, Sparc-Sulima can accurately simulate the UltraSPARC III Cu system (CPU, caches and memory system (backplane)), and has facilities to count events of interest with respect to program performance (e.g. the number of 2nd-level cache misses). Very recently, some preliminary work on analysing the NPB on alcatraz has been performed by vacation student Yan Zhang; this is of course for the current (actual) UltraSPARC III Cu system. Of greater interest to possible future system design would be to explore architectural variants and evaluate their effect on performance. For example, what effect would doubling the 2nd-level cache size have on the performance of these benchmarks?
The aim of this project is to propose and quantitatively answer such questions, with the ultimate purpose of identifying the architectural features that are most critical to the performance of these benchmarks. These features include cache configurations, cache coherency protocols, and the configuration of the memory system `backplane' (particularly those which create NUMA effects). This project is related to the project Extended Threads Emulation in an SMP Computer Simulator. It is part of the CC-NUMA Project.
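The flavour of such an experiment can be shown with a deliberately tiny model. The sketch below (a direct-mapped cache; real L2 caches are set-associative, and all parameters here are invented) runs the same address trace through two cache sizes and compares miss counts:

```python
# Minimal architectural-variation experiment: a direct-mapped cache model
# run at two sizes over one address trace, comparing miss counts.

def count_misses(trace, cache_bytes, line_bytes=64):
    """Count misses of a direct-mapped cache on a sequence of byte addresses."""
    n_lines = cache_bytes // line_bytes
    lines = [None] * n_lines          # tag (block number) stored per line
    misses = 0
    for addr in trace:
        block = addr // line_bytes    # memory block number
        idx = block % n_lines         # direct-mapped index
        if lines[idx] != block:
            misses += 1
            lines[idx] = block        # fill the line
    return misses

# A strided trace that thrashes a small cache but fits in a larger one.
trace = [i * 4096 for i in range(32)] * 4

small = count_misses(trace, cache_bytes=64 * 1024)    # 64 KB: every access misses
large = count_misses(trace, cache_bytes=256 * 1024)   # 256 KB: only cold misses
```

A full-machine simulator like Sparc-Sulima does the same comparison, but with realistic associativity, coherency protocols, and backplane timing, over real benchmark executions rather than synthetic traces.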
References
- Pradip Bose and Thomas M. Conte. Performance Analysis and its Impact on Design. IEEE Computer, pages 41--49, May 1998.
- web sites (hyperlinks above) for the NPB, OMP and UltraSPARC.
- Slides for a background talk for Sparc-Sulima.
- from the Sparc-Sulima home page, there is a survey paper on machine simulation (citing many more references) and also papers on the implementation of Sparc-Sulima and SolEmn. There are manuals and source codes too.
Energy-Efficient Data Gathering in Wireless Sensor Networks
Supervisor: Dr Weifa Liang
E-Mail: wliang@cs.anu.edu.au
Background
Wireless sensor networks have received significant recent attention due to their potential applications in civil and military domains, including surveillance, fine-grain monitoring of natural habitats, and measuring variations in local salinity levels in riparian environments. The sensors in such networks periodically collect sensed data as they monitor their vicinities for a specific interest.
The sensed data can be transmitted directly to the base station where the client queries are posted; the base station then processes the collected data centrally and responds to users' queries. However, because each sensor is equipped with an energy-limited battery, energy efficiency rather than the response time for each query is of paramount importance in sensor networks. The energy consumption of such centralized processing is very high. Instead, we adopt energy-efficient in-network processing, i.e., data gathering is implemented through a tree rooted at the base station, and the data at each sensor is filtered or combined locally before it is transmitted to its parent.
Unlike most existing work, which assumes that each sensor transmits a message of fixed length, we consider another data gathering model, in which the length of the message that each sensor transmits to its parent is proportional to the sum of the message lengths of its children and itself. Thus, the total energy consumption at a node is proportional to the length of the transmitted message and the distance between the node and its parent in the tree. Specifically, on one hand, we aim to find an energy-efficient spanning tree rooted at the sink node such that the network lifetime is maximized. On the other hand, we prefer an approximate result rather than the exact result of traditional data models for a given query, since communications among the sensors are failure-prone. Under this circumstance, how close the approximate result is to the exact result becomes the essential issue.
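The energy model described above is easy to compute for a given tree, which is what any heuristic for choosing a good tree must do repeatedly. The following sketch (the tree, distances, and unit cost are invented for illustration) accumulates message lengths bottom-up and charges each node length times distance to its parent:

```python
# Sketch of the data-gathering energy model: message lengths accumulate up
# the tree; each node's transmission energy is proportional to its message
# length times the distance to its parent.

def message_lengths(children, own_length, root):
    """Compute each node's transmitted message length bottom-up."""
    lengths = {}
    def visit(node):
        total = own_length[node] + sum(visit(c) for c in children.get(node, []))
        lengths[node] = total
        return total
    visit(root)
    return lengths

def transmission_energy(lengths, parent_distance, unit_cost=1.0):
    """Energy per node: proportional to message length x distance to parent."""
    return {n: unit_cost * lengths[n] * d for n, d in parent_distance.items()}

# Tiny example tree: sink <- a <- {b, c}
children = {"sink": ["a"], "a": ["b", "c"]}
own_length = {"sink": 0, "a": 1, "b": 2, "c": 3}
parent_distance = {"a": 2.0, "b": 1.0, "c": 1.5}  # sink has no parent

lengths = message_lengths(children, own_length, "sink")
energy = transmission_energy(lengths, parent_distance)
# Node a forwards its own data plus b's and c's: length 1 + 2 + 3 = 6.
```

Maximizing network lifetime then amounts to choosing the tree (and hence each node's children and parent distance) so that the most heavily charged node drains its battery as late as possible.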
In summary, this project will pursue energy efficiency on the basis of database modelling in sensor network environments. A student working on this project would:
- participate in the research project;
- study the complexity of the problem when taking the network lifetime as the sole metric;
- propose heuristic algorithms for finding a data gathering tree and providing an approximate aggregate result for a given aggregate query;
- conduct experiments by simulation to validate the efficiency of the proposed algorithms.
Neural networks theory and applications
Supervisor: Prof Tom Gedeon
E-Mail: tom@cs.anu.edu.au
Background
A number of projects are possible in this area. I am interested in a number of topics, such as extracting rules from neural networks, information retrieval using neural networks, data mining and feature selection, cascade neural network structures, hierarchical neural network structures, and neural network applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with neural networks is necessary. Most projects will use the very popular backpropagation neural network training algorithm. If you are interested in neural networks and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.
Fuzzy logic theory and applications
Supervisor: Prof Tom Gedeon
E-Mail: tom@cs.anu.edu.au
Background
A number of projects are possible in this area. I am interested in a number of topics, such as automated construction of fuzzy rule bases from data, hierarchical fuzzy systems, fuzzy interpolation, information retrieval using fuzzy logic, universal approximators, and fuzzy logic applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with fuzzy logic is necessary. If you are interested in fuzzy logic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.
Face Recognition
Supervisor: Prof Tom Gedeon
E-Mail: tom@cs.anu.edu.au
Background
A number of projects are possible in this area. I am interested in a number of topics, such as image processing for face recognition, HCI research tool to collect facial images, biologically plausible architectures for face recognition, building and modifying computer models of real faces. No previous work on face recognition is necessary. For the image processing topic, some experience with computer graphics would be useful. If you are interested in face recognition or some similar topic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.
Modeling the Human Brain
Supervisor: Mr Andrew Coward and Prof Tom Gedeon
E-Mail: andrew.coward@anu.edu.au
Background
Understanding how neurons are organized to support higher cognition in human beings is an important research area. This project would be part of a research program aimed at modeling memory and attention in terms of the behaviour of neurons within a structured neural architecture.
Honours student research project
The cortex column (www.cs.toronto.edu/~miguel/papers/ps/ecs02.pdf) is a prominent feature of the mammal (including human) brain. This project would create a computer model of a cortex column within an architectural framework called the recommendation architecture (http://cs.anu.edu.au/~Andrew.Coward/scotland.pdf). The model would include a model for a neuron with inputs resembling axon action potentials, which could learn cognitively useful responses to combinations of those inputs under the guidance of other neurons in the column.
A student working on this project would be involved through:
- participation in a research project on modeling human cognitive behaviour in terms of neurons
- object-oriented cross-platform programming (using the Smalltalk language).
For more information, publications etc. please visit the web page
http://cs.anu.edu.au/~Andrew.Coward
The Active National University: applying location awareness
Supervisor: A/Prof Chris Johnson
E-Mail: cwj@cs.anu.edu.au
Background
The proliferation of wireless access points and mobile access devices across university campuses provides an opportunity to use information on people's locations, provided by their wireless computing devices, to support their working and recreational life on campus. The ActiveCampus project at the University of California, San Diego is an example of a prototype system. In any such system there are technical issues about the effectiveness of the wireless network as a location sensor and information carrier, issues of who provides and administers access, identification of devices and people, the design of applications and their interfaces on mobile devices, privacy and ethics in tracking and informing on people, and the effectiveness of systems as people use them and adapt to their use. Wireless location on this scale has low precision and is highly variable. The ANU is deploying a network of 802.11b (WiFi) wireless access points; it's time to turn the campus into the Active National University.
Software Engineering honours student research project
Several projects in this area will potentially support each other:
1. Support framework for applications: (software engineering biased) Analyse, design and implement a robust, extendable framework that demonstrates support for a small suite of basic applications suited to the ANU campus. Mobile computing platforms can be those anticipated in the near future, and the applications can be selected or invented to support education, research, and social activities. The system must incorporate a strong, flexible emphasis on privacy and security.
2. Location Authority and Service: Various structures and requirements for Location Authorities for location based services have been suggested, in particular by Shafer. A robust, long-life, easily maintained Location Authority is essentially a database that maps the device identity of wireless access points onto their geographic locations on campus, but it also provides procedures and utilities for creating and maintaining the actual data as access points are installed, moved, go down and return to service. The LA should interoperate with Geographic Information Systems, such as those providing the coordinate systems of available maps of the ANU campus, as a demonstration example.
3. Novel applications: Design, implement and evaluate the effectiveness and social issues of novel location-aware applications on the Active National University framework system. Human and social factors are as important here as communications and information services.
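The core of the Location Authority in project 2 can be sketched in a few lines: a mapping from access point identities to campus coordinates, plus the maintenance operations for installing, moving, and retiring access points. All identifiers and coordinates below are invented for illustration:

```python
# Toy Location Authority: MAC address -> campus coordinates, with simple
# lifecycle operations. A real LA would be a persistent database with
# GIS interoperation, audit trails, and access control.

class LocationAuthority:
    """Maps access point identities to geographic locations on campus."""

    def __init__(self):
        self._locations = {}   # MAC address -> (easting, northing)
        self._in_service = set()

    def install(self, mac, coords):
        self._locations[mac] = coords
        self._in_service.add(mac)

    def move(self, mac, coords):
        self._locations[mac] = coords

    def set_in_service(self, mac, up):
        (self._in_service.add if up else self._in_service.discard)(mac)

    def locate(self, mac):
        """Return coordinates of an in-service access point, or None."""
        if mac in self._in_service:
            return self._locations.get(mac)
        return None

la = LocationAuthority()
la.install("00:11:22:33:44:55", (691000, 6091500))
la.set_in_service("00:11:22:33:44:55", up=False)   # AP goes down
assert la.locate("00:11:22:33:44:55") is None
```

The interesting design work is everything around this core: keeping the data current as the physical network changes, and answering queries from many applications with appropriate privacy controls.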
More information
- Guiding, Tracking and perceptual computing: http://www.designing-ubicomp.com/Arbeitsdateien/7_tracking+guiding.html
- Ubiquitous Computing (in Daily Wireless): http://www.dailywireless.org/modules.php?name=News&file=article&sid=1631
- ActiveCampus: Experiments in Community-Oriented Ubiquitous Computing, William G. Griswold, Patricia Shanahan, Steven W. Brown, Robert Boyer, Matt Ratto, R. Benjamin Shapiro, Tan Minh Truong, IEEE Computer, October 2004 pp. 73-81
- The Carrot Approach: Encouraging Use of Location Systems, by Kieran Mansley, Alastair R. Beresford, David Scott. in N. Davies et al .(eds) UbiComp 2004, LNCS 3205, pp 366-383, 2004.
- Everyday Encounters with Context-Aware Computing in a Campus Environment Louise Barkhuus and Paul Dourish in N. Davies et al .(eds) UbiComp 2004, LNCS 3205, pp 232-249, 2004
- Steven A. N. Shafer, Location Authorities for Ubiquitous Computing, in Mike Hazas, James Scott, and John Krumm (eds), UbiComp workshop on Location-Aware Computing, 2003. at http://www.ubicomp.org/ubicomp2003/workshops/locationaware
Geocoding Addresses with Probabilistic Data Linkage
Supervisor: Dr Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Background
Geocoding is defined as the ability to create longitude and latitude coordinates (GPS coordinates) from address, place name, or other descriptive information. Geocoding is useful where descriptive location information is available but quantitative spatial information is required for analysis. In the health sector, for example, geocoded records can be used to draw maps of disease outbreaks, or of the unusual distribution or clustering of certain health indicators.
Because most real world data collections contain noisy, incomplete and incorrectly formatted information, the process of matching addresses or place names with a geo-referenced database is not trivial. Addresses may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. Incomplete or noisy addresses therefore need to be cleaned and standardised first before they can be used for geocoding.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques for geocoding of noisy and incomplete address data. We have so far developed a prototype geocoding match engine including a simple Web interface. Our aim is to include these geocoding techniques into our open source data linkage system Febrl. A publication describing this project can be found here.
Honours student research project
The aim of this honours research project is to further develop geocoding techniques and to integrate them into an open source data linkage system. Issues that will be addressed include efficient indexing of records for fast retrieval, cleaning and standardisation of input records, fuzzy matching (including scoring/rating of matches), processing of individual records as well as batch processing, and reverse geocoding (i.e. searching for the nearest address to a given coordinate). The geocoding system should be implemented so that it can be run as a service across the Internet (and intranets), using standard protocols such as HTTP, XML-RPC, or SOAP.
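The fuzzy matching and scoring step can be illustrated with standard-library string similarity. This is a sketch only (the addresses, threshold, and scoring method are invented; Febrl's actual matching techniques are considerably more sophisticated):

```python
# Toy fuzzy address matching: score candidate addresses from a
# geo-referenced database against a noisy input address and rank them.

from difflib import SequenceMatcher

def score(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalised address strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_matches(query, candidates, threshold=0.7):
    """Return (address, score) pairs above threshold, best first."""
    scored = [(c, score(query, c)) for c in candidates]
    return sorted(
        [(c, s) for c, s in scored if s >= threshold],
        key=lambda t: t[1], reverse=True,
    )

candidates = [
    "12 Marcus Clarke Street, Canberra ACT 2601",
    "12 Childers Street, Canberra ACT 2601",
    "1 King Street, Sydney NSW 2000",
]
matches = best_matches("12 marcus clark st, canberra act 2601", candidates)
# The slightly misspelled query still ranks the first candidate highest.
```

In a real geocoding engine this scoring would run only over a small candidate set retrieved by an index (one of the efficiency issues the project addresses), not over the whole geo-referenced database.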
A student working on this project would be involved through
- Participation in an exciting research project.
- Exploration and development of geocoding and related techniques.
- Development of novel privacy-preserving geocoding services.
- Object-oriented cross-platform programming (using Python and friends).
- Open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links regarding our data linkage project please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Automated Probabilistic Name and Address Standardisation
Supervisor: Dr Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Background
Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), police/intelligence or telecommunications. Data mining techniques are used to analyse such large data sets to find patterns and rules, or to detect outliers. In many data mining projects information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc.
As most real world data collections contain noisy, incomplete and incorrectly formatted information, data cleaning and standardisation are important pre-processing steps for successful data linkage, and before such data can be loaded into data warehouses or used for further analysis. Data may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. The cleaning and standardisation of names and addresses is especially important, to make sure that no misleading or redundant information is introduced (e.g. duplicate records).
The main task of data cleaning and standardisation is the conversion of the raw input data into well defined, consistent forms and the resolution of inconsistencies in the way information is represented or encoded. Rule-based data cleaning and standardisation as currently done by most commercial systems is cumbersome to set up and maintain, and often needs adjustments for new data sets.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the improvement of techniques for data cleaning and standardisation, and record linkage. In a recent honours project (J. Zhu, Probabilistic Address Segmentation and Record Linkage, 2002) we developed new probabilistic techniques based on hidden Markov models (HMMs), which were shown to achieve better standardisation accuracy and to be easier to set up and maintain than a popular commercial linkage software. A paper describing our approach is available online.
Honours student research project
The aim of this honours research project is to further explore how hidden Markov model (HMM) techniques can achieve better accuracy and facilitate the data cleaning and standardisation tasks, with the ultimate aim of having a fully automated data cleaning and standardisation system which needs minimal user configuration and intervention. For example, we would like to explore the use of the forward-backward algorithm for developing HMMs without explicitly specifying the hidden states, explore the utility of the Baum-Welch EM (expectation maximisation) algorithm in optimising the probabilities in an HMM, or the use of fuzzy look-up tables for tagging lists so that a user does not have to specify all possible misspellings of words.
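The HMM idea can be made concrete with a toy Viterbi decoder that tags address tokens with hidden states. The states, vocabulary, and probabilities below are invented for illustration; a real system would learn them from training data (e.g. via Baum-Welch, as the project proposes):

```python
# Toy HMM address segmentation: tag tokens of a raw address with hidden
# states ("number", "street", "type") by Viterbi decoding.

def viterbi(tokens, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observed tokens."""
    V = [{s: (start_p[s] * emit_p[s].get(tokens[0], 1e-6), [s]) for s in states}]
    for tok in tokens[1:]:
        row = {}
        for s in states:
            prob, prev = max(
                (V[-1][p][0] * trans_p[p][s] * emit_p[s].get(tok, 1e-6), p)
                for p in states
            )
            row[s] = (prob, V[-1][prev][1] + [s])
        V.append(row)
    return max(V[-1].values())[1]

states = ["number", "street", "type"]
start_p = {"number": 0.8, "street": 0.15, "type": 0.05}
trans_p = {
    "number": {"number": 0.1, "street": 0.8, "type": 0.1},
    "street": {"number": 0.05, "street": 0.45, "type": 0.5},
    "type":   {"number": 0.1, "street": 0.1, "type": 0.8},
}
emit_p = {
    "number": {"42": 0.9},
    "street": {"miller": 0.5, "george": 0.5},
    "type":   {"st": 0.5, "rd": 0.5},
}

tags = viterbi(["42", "miller", "st"], states, start_p, trans_p, emit_p)
# Expected segmentation: number, street, type.
```

The small fallback emission probability (1e-6) is what lets the model tag tokens it has never seen, which is exactly where learned probabilities and fuzzy look-up tables improve on hand-written rules.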
A student working on this project would be involved through
- Participation in an exciting research project.
- Exploration and development of machine learning and data mining techniques.
- Object-oriented cross-platform programming (using Python and friends).
- Open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Probabilistic Temporal Planning
Supervisor: Dr Doug Aberdeen, Dr Sylvie Thiébaux, Dr Olivier Buffet
E-Mail: doug.aberdeen@anu.edu.au, sylvie.thiebaux@anu.edu.au, olivier.buffet-AT-nicta.com.au
Background
Planning is the process of finding which sequence of decisions to take so as to achieve a given goal. We are currently working on a large planning domain aimed at scheduling tasks with probabilistic outcomes [1]. To our knowledge, the tool which has been developed is the first one to handle time, resources and probabilities at the same time.
Various questions can be addressed in this framework. Improving the planning algorithms is a major issue, be it in the framework of Markov Decision Processes or in more classical planning. Other issues, such as plan visualization or human-machine interaction, are also of interest, as the plan should be easy to understand and the user may need to give feedback to the software (by setting new constraints for planning).
There are several sub-projects available. We highly encourage interested students to come and talk to us to discuss specific possibilities. Software used in this project may end up being used by the Defence Science and Technology Organisation.
References
[1] D. Aberdeen and S. Thiébaux and L. Zhang,
"Decision-Theoretic Military Operations Planning", in Proceedings of
the Fourteenth International Conference on Automated Planning and
Scheduling (ICAPS'04), Whistler, Canada, June 2004.
Sport Signal Analysis
Supervisor: Dr Olivier Buffet, Dr Doug Aberdeen
E-Mail: olivier.buffet-AT-nicta.com.au, doug.aberdeen@anu.edu.au
Background
Kinetic Performance Technology is developing an on-line performance analysis service (GymAware) which takes data collected by sensors in a strength training gym and produces detailed reports to assist coaches in the preparation and training of athletes. The system is currently under development and relies on a robust analysis of the signal: identifying phases in the athlete's moves (i.e. segmenting the signal) must be done with great accuracy.
The first system developed for this segmentation was a rule-based approach. The main aim of this project is to replace this by an algorithm which learns how to perform the segmentation from examples. This should be achievable through graphical models such as Hidden Markov Models [1] or Conditional Random Fields [2]. The project aims at determining more precisely which machine learning tools can be used, making a theoretical and practical study in the framework of GymAware.
References
[1] L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", Proceedings of the IEEE, 77(2), pp 257--286, February 1989.
[2] J. Lafferty, A. McCallum, and F. Pereira, "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data", in Proceedings of the 18th International Conference on Machine Learning (ICML), 2001.
Automatic Hierarchy Discovery
Supervisor: Dr Olivier Buffet
E-Mail: olivier.buffet-AT-nicta.com.au
Background
Reinforcement Learning consists of learning a policy (how to act in various situations) in order to maximise a payoff (which describes a goal to achieve). Considering probabilistic environments, we work in the framework of Markov Decision Processes (MDPs) [1], where an action may lead to different outcomes. As state spaces may be very large, it is of major interest to consider a given problem as a hierarchy of sub-problems (a "divide and conquer" approach). Unfortunately, although various works have led to efficient algorithms exploiting such a hierarchical structure, automatically discovering a hierarchy remains a very difficult task.
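The MDP framework referred to above can be illustrated with value iteration on a tiny two-state problem where actions have probabilistic outcomes. The states, actions, transition probabilities, and rewards are invented; see Kaelbling et al. [1] for the general formulation:

```python
# Minimal value iteration: compute optimal state values V* by repeated
# Bellman backups. P[s][a] is a list of (probability, next_state) outcomes;
# R[s][a] is the expected immediate reward for taking action a in state s.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-8):
    V = {s: 0.0 for s in states}
    while True:
        V_new = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in actions
            )
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < eps:
            return V_new
        V = V_new

states = ["idle", "busy"]
actions = ["work", "rest"]
P = {
    "idle": {"work": [(0.9, "busy"), (0.1, "idle")], "rest": [(1.0, "idle")]},
    "busy": {"work": [(1.0, "busy")], "rest": [(0.7, "idle"), (0.3, "busy")]},
}
R = {
    "idle": {"work": 0.0, "rest": 0.0},
    "busy": {"work": 1.0, "rest": 0.0},
}

V = value_iteration(states, actions, P, R)
# "busy" is worth more than "idle", since work only pays off when busy.
```

The hierarchy-discovery problem arises because on realistic state spaces this flat sweep over all states is intractable, which is what decomposing the MDP into sub-problems is meant to avoid.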
A recent work [2] has proposed a first approach to automatic hierarchy
discovery, based on analysing which observed variables seem to be the
most dependent on the decisions made. Yet, this algorithm relies on an
appropriate choice of variables. The main concern of this project is
therefore to find how to get rid of this choice of variables, the
system being able to create "artificial variables" if they seem more
appropriate than the original ones.
References
[1] L. Kaelbling, M. Littman and A. Moore, "Reinforcement Learning:
A Survey", Journal of Artificial Intelligence Research, 4, pp 237--285,
1996.
[2] B. Hengst, "Discovering Hierarchy in Reinforcement Learning with
HEXQ", in Proceedings of the Nineteenth International Conference on
Machine Learning (ICML'02), pp 243--250, 2002.
Dr Pascal Vuylsteker's projects
Pascal Vuylsteker's honours projects.
Efficient Tableaux Provers for Modal Logics Using the Tableau WorkBench
Supervisor: A/Prof Rajeev Gore and Dr Pietro Abate
E-Mail: rajeev.gore@anu.edu.au
Background
Tableau calculi are now routinely used to determine whether or not a
particular statement A is a logical consequence of another collection
of statements Gamma in some (non-classical) logic L. Such provers have
applications in Hardware Verification, Artificial Intelligence and
Hybrid Systems. For example:
(i) Gamma is a logical description of a digital circuit and A is a logical description of some desired property that you want to hold of the circuit.
(ii) Gamma is a logical description of the knowledge and beliefs of a collection of artificial agents and A is a logical query like "Do you believe that agent x knows that agent y knows that ...?"
(iii) Gamma is a logical description of allowable flight patterns through the three dimensional space around an airport and A is a query asking whether one particular aeroplane flight path is legal.
Pietro Abate has built a generic theorem prover called the Tableau WorkBench (http://rsise.anu.edu.au/~abate/twb/twb.html). This prover can be programmed with a new set of tableau rules to create a theorem prover for your favourite logic.
Honours student research project
Your task will be to join the research that we are doing on tableau-based theorem proving by creating theorem provers for various non-classical logics. For starters, you would be expected to create provers for the 15 basic modal logics that can be built from the modal axioms for reflexivity, transitivity, Euclideanness, symmetry, and seriality. Most of these logics already have known tableau calculi, but for some you would have to invent your own (with our help).
You will need to become familiar with sequent and tableau proof calculi for various non-classical logics, and with modal logic (local expertise abounds). You will also need to become familiar with the TWB. This project is likely to be extendible to a PhD topic, and our previous students have published their work in international conferences and journals (see my home page).
You will need a strong background in theoretical computer science or mathematics. A grounding in functional programming and logic would be useful but is not essential, as there is plenty of local expertise in this area. This project is ideal for students who wish to pursue a PhD in any area of theoretical computer science.
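To give a feel for the rule-based proving involved, here is a toy propositional tableau in Python. The formula encoding (tagged tuples, atoms as strings) is our own invention; the TWB itself works quite differently, with its own rule language and support for modal logics.

```python
def satisfiable(branch):
    """Expand a tableau branch (a list of formulas); True iff some
    fully-expanded extension of the branch stays open."""
    for i, f in enumerate(branch):
        if isinstance(f, str) or (f[0] == 'not' and isinstance(f[1], str)):
            continue                                  # literal: nothing to expand
        rest = branch[:i] + branch[i + 1:]
        if f[0] == 'and':                             # alpha rule
            return satisfiable(rest + [f[1], f[2]])
        if f[0] == 'or':                              # beta rule: two branches
            return satisfiable(rest + [f[1]]) or satisfiable(rest + [f[2]])
        g = f[1]                                      # f = ('not', g)
        if g[0] == 'not':                             # double negation
            return satisfiable(rest + [g[1]])
        if g[0] == 'and':                             # not-and: branch
            return (satisfiable(rest + [('not', g[1])])
                    or satisfiable(rest + [('not', g[2])]))
        return satisfiable(rest + [('not', g[1]), ('not', g[2])])  # not-or
    # Only literals remain: the branch closes iff p and not-p both occur.
    atoms = {f for f in branch if isinstance(f, str)}
    return not any(f[1] in atoms for f in branch if not isinstance(f, str))

def valid(formula):
    """A formula is valid iff its negation has no open tableau branch."""
    return not satisfiable([('not', formula)])
```

A modal prover adds rules for box and diamond that create or visit new worlds, which is where the 15 basic modal logics diverge.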
For more information feel free to email Raj to arrange a meeting, or
email Pietro for a demonstration of the TWB.
http://rsise.anu.edu.au/~abate/twb/twb.html
http://rsise.anu.edu.au/~rpg
Visualising Proof Trees for the Tableau WorkBench
Supervisors: A/Prof Rajeev Gore and Dr Pietro Abate. E-Mail: rajeev.gore@anu.edu.au
Background
Tableau calculi are now routinely used to determine whether or not a particular statement A is a logical consequence of another collection of statements Gamma in some (non-classical) logic L. Such provers have applications in Hardware Verification, Artificial Intelligence and Hybrid Systems. Tableau calculi (and sequent calculi) can be efficiently automated for a broad range of logics.
Pietro Abate has built a generic theorem prover called the Tableau WorkBench [1]. This prover can be programmed with a new set of tableau rules to create a theorem prover for your favourite logic. Proofs in such calculi tend to be n-ary (cyclic) trees.
Visualizing large proof trees at the moment is an open area of research [2]. While the reasoning capabilities of interactive proof systems have increased markedly in recent years, the system interfaces have often not enjoyed the same attention as the proof engines themselves. In many cases, interfaces remain basic and under-designed.
The Tableau WorkBench currently has only a basic user interface that makes it difficult to use for the naive user.
Honours student research project
Your task will be to join the research that we are doing on tableau-based theorem proving by adding a user interface to the TWB and by adding a framework for extracting counter-models from failed proofs.
Counter model extraction:
(i)
Counter-model extraction is an important aspect of tableau-based
theorem proving, since failure to find a proof can give useful
insights to the user. The TWB currently has no explicit mechanism for
the extraction of counter-models, although it does have a rudimentary
trace facility from which users could extract counter-models by hand.
(ii)
You will need to become familiar with sequent and tableau proof
calculi for various non-classical logics, and become familiar with
modal logic (local expertise abounds). You will need to become
familiar with the TWB. This project is likely to be extensible to a
PhD topic and our previous students have published their work in
international conferences and journals.
Visualizing large proof trees:
(iii)
For starters, you would be expected to create a widget to display
proof trees and integrate it into the existing prover within a usable
GUI. You will need to be familiar with XML [3] and learn about XUL
[4], JS, C++, OCaml [5] and related technologies [6]. The main goal
of this subproject is to offer a web-based user interface to the TWB
as well as a general toolkit to display proof trees based on an XML
specification.
This is not just a programming task, because GUIs for theorem provers are an emerging research area in their own right [2]. This work could easily lead to a new XML standard for displaying proof trees, with applications in many other theorem provers apart from the TWB.
You will need a strong background in computer science. A grounding in functional programming and logic would be useful but is not essential, as there is plenty of local expertise in this area.
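As a hint of what an XML proof-tree format might look like, here is a minimal serialisation sketch. The element names (`inference`, `conclusion`) and the `rule` attribute are invented for illustration; they are not an existing standard.

```python
import xml.etree.ElementTree as ET

def tree_to_xml(node):
    """node = (rule_name, conclusion_text, child_nodes); children are the
    premises, nested as sub-inferences."""
    rule, conclusion, children = node
    elem = ET.Element('inference', rule=rule)
    ET.SubElement(elem, 'conclusion').text = conclusion
    for child in children:
        elem.append(tree_to_xml(child))
    return elem

# A two-premise proof step, as a toy example.
proof = ('and-intro', 'p & q',
         [('axiom', 'p', []), ('axiom', 'q', [])])
xml_text = ET.tostring(tree_to_xml(proof), encoding='unicode')
```

A display widget would then only need to consume this XML, independently of which prover produced it.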
[1]
TWB Home Page
[2]
Workshop on User Interfaces for Theorem Provers
[3]
XML
[4]
XUL
[5]
OCAML
[6]
OCaml Related Technologies
For more information feel free to email Raj to arrange a meeting, or email Pietro for a demonstration of the TWB.
Numerical Simulation on Graphics Processing Units (GPU)
Supervisor: Dr Alistair Rendell. E-Mail: Alistair.Rendell@anu.edu.au
Background
Over the past 10 years, graphics processing units (GPUs) have become ubiquitous in desktop computers. Modern GPUs now allow substantial user programmability that lets them be used for more general-purpose computation. Indeed, in a recent paper presented at SC04, Fan et al. detail their efforts to construct a "GPU Cluster for High Performance Computing". In this work the authors plugged 32 GPUs into a Xeon cluster system and in so doing increased the total theoretical peak performance of the machine by 512 Gflops - for a cost of just under US$13,000!
Tracking this development, a number of groups have been developing programming environments that can exploit the GPU for general computations. Details of many of these are available at the GPGPU web site. Two of particular interest to this project are Brook and Sh. The former is built on top of C while the latter is built on top of C++. Brook, for example, extends standard ANSI C with the concept of a "stream", a new data type representing data that can be operated on in parallel.
The aim of this project is to investigate the potential for using the GPU in molecular science computations. The initial target application is likely to be N-body simulations, as typified by the simulation of atom motions. Careful consideration will be given to accuracy, since currently the GPU is limited to single precision computations. Thus the project will address two issues, i) the in principle feasibility of using the GPU for these types of problems, and ii) the scientific usefulness given the current single precision limitation.
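The N-body target can be illustrated by the O(n^2) pairwise kernel that a GPU stream implementation would parallelise. The plain-Python sketch below (with G = 1 and a softening term) shows only the arithmetic that would be mapped onto Brook or Sh streams; it is not GPU code.

```python
def accelerations(pos, mass, eps=1e-3):
    """pos: list of (x, y, z) tuples; mass: list of floats.
    Returns the gravitational acceleration on each particle (G = 1)."""
    n = len(pos)
    acc = []
    for i in range(n):
        ax = ay = az = 0.0
        xi, yi, zi = pos[i]
        for j in range(n):
            if i == j:
                continue
            dx = pos[j][0] - xi
            dy = pos[j][1] - yi
            dz = pos[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps * eps  # softened distance
            inv_r3 = r2 ** -1.5
            ax += mass[j] * dx * inv_r3
            ay += mass[j] * dy * inv_r3
            az += mass[j] * dz * inv_r3
        acc.append((ax, ay, az))
    return acc
```

Note the single-precision question arises exactly here: the accumulation of many small force contributions is where reduced precision can hurt accuracy.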
An Automatically Tuned Message Passing Library
Supervisor: Dr Alistair Rendell. E-Mail: Alistair.Rendell@anu.edu.au
Background
The majority of high performance computing platforms are cluster based systems built by linking a number of compute nodes together and programmed using message passing - usually MPI. In these systems the compute nodes typically contain a few CPUs sharing memory, while the network linking the nodes might be ethernet, Myrinet, Infiniband etc. The exact nature of the network and its topology determines communication latency and bandwidth and this may not be uniform across the entire machine.
Increasingly, clusters are assembled by individual groups, such that no two clusters are identical. As a consequence it is very hard to design communication libraries (like MPI) that perform optimally on all platforms. A similar problem is encountered when trying to design numerical algebra libraries (e.g. for matrix multiplication) that perform optimally across a wide variety of processor architectures and memory hierarchies. In this case there has been much work on designing "self optimizing numerical software", e.g. the ATLAS project.
The aim of this project is to design self optimising global communication routines for modern cluster computing environments. The initial target will be global reduction operations, where data in all processes must be combined under some operation (e.g. addition). Can you design software that automatically determines optimal algorithms for a cluster of single CPU nodes compared to a cluster with 4 CPU nodes compared to a cluster with multiple network connections to each node?
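One of the candidate algorithms such a self-tuning library would choose among is recursive-doubling allreduce. The sketch below simulates it, with a plain list standing in for the per-process values and the XOR pairing standing in for the actual message exchange (power-of-two process counts only).

```python
def allreduce_recursive_doubling(values, op):
    """Simulate allreduce: after log2(p) exchange rounds, every
    'process' holds the reduction of all initial values under op."""
    p = len(values)
    assert p & (p - 1) == 0, "this sketch assumes a power-of-two process count"
    vals = list(values)
    step = 1
    while step < p:
        new = list(vals)
        for rank in range(p):
            partner = rank ^ step          # pairwise exchange partner
            new[rank] = op(vals[rank], vals[partner])
        vals = new
        step <<= 1
    return vals
```

A self-optimising library would time this against tree-based and ring-based alternatives (and shared-memory variants within a node) and pick the winner per message size.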
Declarative theorem-proving
Supervisor: Dr Michael Norrish. Email: michael.norrish@nicta.com.au
This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.
When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”: important bugs that are easy to find.
The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.
Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.
The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions; to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data.
Implementing proof kernels
Supervisor: Dr Michael Norrish. Email: michael.norrish@nicta.com.au
At the core of LCF systems like HOL, there is a small “kernel” that implements the primitive rules of inference of the logic. There are a number of interesting issues relating to the design of these kernels. For example:
- How to represent free and bound variables?
- Optimising common calculations (such as calculating the set of free variables)
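One standard answer to the first question is de Bruijn indices, where a bound variable is the number of binders between it and its binder. A minimal Python sketch (the tuple encoding of terms is our own, not HOL's):

```python
def shift(t, d, cutoff=0):
    """Shift the free variables of t up by d; variables below cutoff
    are bound locally and left alone.
    Terms: ('var', n), ('lam', body), ('app', f, a)."""
    tag = t[0]
    if tag == 'var':
        return ('var', t[1] + d) if t[1] >= cutoff else t
    if tag == 'lam':
        return ('lam', shift(t[1], d, cutoff + 1))   # one more binder inside
    return ('app', shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def free_vars(t, depth=0):
    """Set of free-variable indices, counted from outside the term."""
    tag = t[0]
    if tag == 'var':
        return {t[1] - depth} if t[1] >= depth else set()
    if tag == 'lam':
        return free_vars(t[1], depth + 1)
    return free_vars(t[1], depth) | free_vars(t[2], depth)
```

The second issue in the list above then becomes a caching question: a kernel can store each term's free-variable set at construction time rather than recomputing it.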
Object recognition and localisation of known objects by inverting ray-tracing
Supervisor: Dr Eric McCreath. Email: Eric.McCreath@anu.edu.au
Background
Often model-driven object recognition systems will extract features such as lines within a scene. These are then mapped into some invariant space and matched against a set of known objects [1]. The quality of the recognition and localisation is dependent on the quality of the features extracted. One approach that does not require feature extraction is a direct comparison between a rendered image and the camera image. This involves searching over a space of the object's orientation and localisation. Clearly, this has considerable computational overhead.
The aim of this project is to survey current single-camera object recognition and localisation approaches, and to study the viability of an approach that simply inverts a rendering system. The project will also involve considering novel approaches that could be used to reduce the search space.
A student who takes on this project would be required to attend a weekly research meeting. The aim is also to produce a paper from the results of this research.
[1] Weiss and Ray, "Model-Based Recognition of 3D Objects from One View", Proc. of the European Conference on Computer Vision, 1998.
Optimizing for Design Patterns
Supervisors: Dr Steve Blackburn and Dr Alonso Marquez. Email: Steve.Blackburn@anu.edu.au and alonsomarquezes@yahoo.es
Background
The software engineering benefits of design patterns are widely understood. Consequently, their use has become ubiquitous in industry. This widespread uptake is in spite of many patterns incurring significant performance penalties. This phenomenon is only just emerging and so far has not received attention from the programming languages community.
Previously we have shown that bytecode transformations at class loading time can be used to make significant performance optimizations for Java programs. Our approach is powerful and unorthodox, sidestepping heavyweight pointer analyses typically used by the mainstream compiler community. This project will leverage this approach to optimize for some heavily used design patterns.
We anticipate that this work could lead to publication.
Performance profiling for Java
Supervisors: Dr Steve Blackburn and Daniel Frampton. Email: Steve.Blackburn@anu.edu.au and Daniel.Frampton@anu.edu.au
Background
A popular profiling technique is to use sampling. Periodically the program counter is sampled, and over the run of a program (involving many thousands of samples), a histogram is produced. The program counter values are mapped back to source lines of code to identify which source code is most heavily executed. This is often very helpful in debugging performance problems. Refinements to this idea include walking the stack at each sample point, to enrich the context.
Another refinement is to use hardware performance counters. Hardware performance counters can be set to trigger interrupts after a given number of events (such as L2 cache misses, for example). This can be used to narrow down performance problems more specifically, by identifying what code is triggering particular performance problems.
Our group is leading work in Java performance and optimization and we are responsible for key research infrastructure to this end. One crucial tool we lack is the capacity to do performance counter driven profiling. This project would build a performance counter based profiler for Jikes RVM (the widely used research virtual machine), and would analyze the performance of some key Java programs which are currently not well understood.
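The sampling idea can be sketched in a few lines of Unix-only Python, with an interval timer standing in for the hardware performance counter interrupts the project would actually use (the real profiler would of course live inside Jikes RVM, in Java).

```python
import collections
import signal

samples = collections.Counter()

def _sample(signum, frame):
    # The interrupted frame stands in for the sampled program counter.
    samples[(frame.f_code.co_name, frame.f_lineno)] += 1

def profile(fn, interval=0.001):
    """Run fn under a periodic timer; return a (function, line) histogram."""
    old = signal.signal(signal.SIGALRM, _sample)
    signal.setitimer(signal.ITIMER_REAL, interval, interval)
    try:
        fn()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # cancel the timer
        signal.signal(signal.SIGALRM, old)
    return samples

def busy():
    # A deliberately hot loop for the profiler to catch.
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total
```

Replacing the timer with a counter-overflow interrupt (e.g. every N L2 misses) turns the same histogram into a map of *which code* causes a particular class of event.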
Sane Garbage
Supervisors: Dr Steve Blackburn and Daniel Frampton. Email: Steve.Blackburn@anu.edu.au and Daniel.Frampton@anu.edu.au
Background
Garbage collection is key to modern languages like Java and C#. Our group plays a leading role in the development of garbage collection algorithms and we maintain the leading infrastructure for memory management research, MMTk. Garbage collectors are notoriously difficult to debug, as they subvert most assumptions about program behavior (by manipulating the object graph behind the back of the application). There are very few useful tools for debugging garbage collectors---typically we must resort to primitive methods such as the careful use of print statements!
This project will develop a heap sanity checker. The heap sanity checker will be completely unobtrusive, only observing, not altering the state of the system. The sanity checker will always be present in the system, though inactive unless invoked at the command line. The checker would report on dangling pointers, uncollected garbage, etc. The tool would also be extensible, allowing the addition of algorithm-specific checks (such as reference count checks for a reference counting collector). This tool would be invaluable to garbage collection research.
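The checks involved can be sketched on a toy heap model, where objects are ids and the heap maps each live object to the ids it references; the real tool would of course walk MMTk's actual heap layout.

```python
def sanity_check(heap, roots):
    """heap: dict mapping object id -> list of referenced ids.
    Returns (dangling, garbage): references to dead objects, and live
    objects unreachable from the roots (i.e. uncollected garbage)."""
    # Dangling pointers: references whose target is not a live object.
    dangling = {(obj, ref) for obj, refs in heap.items()
                for ref in refs if ref not in heap}
    # Transitive closure from the roots to find reachable objects.
    reachable, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj in reachable or obj not in heap:
            continue
        reachable.add(obj)
        stack.extend(heap[obj])
    garbage = set(heap) - reachable
    return dangling, garbage
```

Algorithm-specific extensions plug in at the same traversal, e.g. recomputing reference counts during the walk and comparing them against the collector's stored counts.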
An allocator for machine code
Supervisors: Dr Steve Blackburn and Daniel Frampton. Email: Steve.Blackburn@anu.edu.au and Daniel.Frampton@anu.edu.au
Background
Modern virtual machines use Just In Time (JIT) compilers to dynamically generate machine code. The placement of the code in memory can have significant performance impacts. For example, if two contemporaneously hot methods are located in memory in such a way as to conflict within the instruction cache, then performance will suffer. Furthermore, the needs of code allocation are somewhat different to conventional allocation, because code objects tend to be larger than normal objects and the size distribution is much less homogeneous. Finally, code allocation is rare, so a slightly more expensive allocation mechanism that is more careful with space may be far preferable.
Currently in Jikes RVM code is allocated using a conventional allocator. This project would develop a new allocator that would allow the user a higher degree of flexibility in the placement of code (possibly specifying cache lines to avoid, or cache lines to favor), and would be better tailored to the space and time requirements of code allocation.
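As a sketch of the space-over-speed trade-off described above, here is a toy best-fit free-list allocator with alignment. The interface and policy are illustrative assumptions, not Jikes RVM's actual allocator API; a real code allocator would add the cache-line placement constraints.

```python
class CodeAllocator:
    """Best-fit allocation over a list of (address, size) free extents.
    Slower than a bump allocator, but careful with space - acceptable
    because code allocation is rare."""

    def __init__(self, base, size):
        self.free = [(base, size)]

    def alloc(self, size, align=16):
        best = None
        for i, (addr, avail) in enumerate(self.free):
            start = (addr + align - 1) & ~(align - 1)   # round up to alignment
            pad = start - addr
            if avail >= pad + size and (best is None or avail < best[2]):
                best = (i, start, avail, pad)            # smallest fitting extent
        if best is None:
            return None                                  # out of space
        i, start, avail, pad = best
        addr, _ = self.free.pop(i)
        if pad:
            self.free.insert(i, (addr, pad))             # leading fragment
        rest = avail - pad - size
        if rest:
            self.free.append((start + size, rest))       # trailing fragment
        return start
```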
Java to C++ Translation
Supervisors: Dr Steve Blackburn and Robin Garner. Email: Steve.Blackburn@anu.edu.au and Robin.Garner@crsrehab.gov.au
Background
MMTk is a memory management toolkit that is an important tool for memory management research. MMTk is written in a special dialect of Java which avoids dynamic memory allocation and the use of Java libraries which might perform such allocation, and adds to Java unboxed types for memory primitives such as Address, Word, and ObjectReference. The result is a powerful, flexible and high-performing toolkit which allows researchers to rapidly experiment with garbage collection ideas.
MMTk was implemented within Jikes RVM, a Java virtual machine written in Java. Since then MMTk has been ported to another virtual machine written in C. Rather than port the source code from Java to C, it is desirable to somehow produce a C library usable by the C program. In the first instance, this was done using the GNU Compiler for Java, gcj. We would like a more general solution.
This project will write a Java to C++ translator for MMTk. The limited subset of Java used by MMTk makes the project feasible. We envisage the automatic translator generating both C++ files and headers with inline functions in them, allowing key elements of the interface to be inlined into the host virtual machine, greatly improving performance.
Facial Expression Tracking
Supervisors: Dr Roland Goecke and Prof Tom Gedeon. Email: Roland.Goecke@nicta.com.au and Tom.Gedeon@anu.edu.au
Background
As computer systems become more and more part of our daily life, the issues of HCI and user-adaptive systems are highly important. In the past, the user had to adapt to the hardware, but the trend nowadays is clearly towards more human-like interaction through user-sensing systems. Such interaction is inherently multi-modal and it is that integrated multi-modality that leads to robustness in real-world situations. One new area of research is affective computing, i.e. the ability of computer systems to sense and adapt to the affective state ('mood', 'emotion') of a person. An essential capability is to track and recognise facial expressions.
Project
The aim of this project is to perform some experiments on facial expression tracking and recognition. This project suits a student who is interested in human-computer interaction with an emphasis on computer vision / image processing.
The project involves an evaluation of currently used methods for facial feature tracking, the implementation of the most promising ones, an evaluation of features representative of facial expressions, and experimental work on automatically recognising some facial expressions. The facial feature tracking involves implementing solutions to the basic problems of finding a face in an image by skin colour modelling and tracking the face by an appropriate face model (e.g. active appearance models). Furthermore, methods need to be analysed and developed for the tracking of facial features like the mouth or the eyebrows. Based on the locations and movements of these facial features, models need to be built that are representative of certain facial expressions, so that these can be recognised.
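The face-finding step often starts from a per-pixel skin-colour test. The rule below is one classic RGB heuristic from the literature, a crude stand-in for the statistical skin-colour models the project would actually evaluate.

```python
def is_skin(r, g, b):
    """A rule-of-thumb RGB skin test for daylight images: skin pixels
    are reddish, fairly bright, and not grey."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15   # not grey
            and abs(r - g) > 15 and r > g and r > b)

def skin_mask(image):
    """image: rows of (r, g, b) tuples -> rows of booleans."""
    return [[is_skin(*px) for px in row] for row in image]
```

The connected regions of a mask like this seed the face model; a learned colour model (e.g. a Gaussian mixture in a chrominance space) would replace the fixed thresholds.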
Audio-Video Speech Recogniser
Supervisors: Dr Roland Goecke and Prof Tom Gedeon. Email: Roland.Goecke@nicta.com.au and Tom.Gedeon@anu.edu.au
Background
This project is also in the area of human-computer interaction. Automatic speech recognition (ASR) software has reached a state where it can be employed successfully in environments that don't have a lot of acoustic background noise, like an office for example. However, its performance is still low in noisy environments, for example in a car or outdoors. One way of overcoming these problems is through the addition of video of the person talking. Visual speech recognition is successfully used by people who are able to lip-read, and it has been shown to improve ASR performance as well.
Project
This project is suitable for a student who is interested in HCI research, with an emphasis on signal processing in the audio and video modality.
The aim of this project is to build an audio-video (AV) ASR system and to perform experiments on different ways of combining the audio and video data to see which gives the best results. ASR systems often use statistical models like Hidden Markov Models (HMM) and this technique shall also be used here. The HMMs are built based on training data and then they can be used in the recognition experiments. A convenient way of building HMMs is through the Hidden Markov Toolkit (HTK). The Audio-Video Australian English Speech data corpus AVOZES will be used to provide both training and experimental data. The project will also investigate different visual features to evaluate the usefulness of geometric versus image-based features. Geometric features are based on facial feature points (e.g. lip corners) and their movements while speaking. Image-based features derive speech-related information from the pixel values of, for example, the mouth region. Both approaches have their pros and cons, and their respective performance will be compared.
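At the core of such a recogniser is HMM evaluation. The forward algorithm below computes the probability a model assigns to an observation sequence, using toy discrete observations; HTK handles the real training, continuous features, and decoding.

```python
def forward(obs, init, trans, emit):
    """HMM forward algorithm.
    init[s]: P(first state is s); trans[s][t]: P(s -> t);
    emit[s][o]: P(observing o in state s). Returns P(obs sequence)."""
    n = len(init)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[s] * trans[s][t] for s in range(n)) * emit[t][o]
                 for t in range(n)]
    return sum(alpha)
```

For AV fusion, the experiments then compare e.g. scoring a joint model over combined audio-video features against multiplying the scores of separate audio and video models.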
Symbolic Optimization of Robot Dynamics Calculations
Supervisor: Dr Roy Featherstone. Email: Roy.Featherstone@anu.edu.au
Background
Robot dynamics functions calculate a robot's equations of motion -- either the acceleration resulting from a given force, or the force needed to produce a given acceleration. They are used in control systems and simulators, and there is a need for them to be efficient. The most effective way to optimize their efficiency is a technique called symbolic optimization: the dynamics algorithm is executed symbolically, instead of numerically, resulting in an ordered list of expressions describing what calculations the algorithm performed when applied to a particular robot. This list of expressions is then fed to a symbolic optimizer, which prunes and simplifies these expressions as much as possible, and then outputs the source code of the simplified calculation list. The resulting code typically runs about 10 times faster than the original.
Project:
- Write a program to apply dynamics algorithms symbolically to a model of a robot, outputting a file containing an ordered list of assignment statements.
- Write a program to read a file containing an ordered list of assignment statements, apply various optimizations to the list, and output the source code of a function to perform the simplified version of the calculations.
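Both halves can be sketched together: treat "symbolic execution" as building expression tuples instead of numbers, then optimise the ordered assignment list. The sketch below does only common-subexpression elimination and copy propagation; a real symbolic optimizer would also exploit algebraic identities and the zeros and ones in a particular robot's model.

```python
def cse(assignments):
    """Eliminate repeated right-hand sides from an ordered list of
    (name, expr) assignments. Exprs are atoms (strings) or operator
    tuples such as ('*', 'm', 'a')."""
    alias, value_of, out = {}, {}, []
    for name, expr in assignments:
        if isinstance(expr, tuple):
            # Canonicalise operands through any aliases discovered so far.
            expr = (expr[0],) + tuple(alias.get(a, a) for a in expr[1:])
        else:
            expr = alias.get(expr, expr)
        if expr in value_of:
            alias[name] = value_of[expr]   # recomputation: reuse earlier result
        else:
            value_of[expr] = name
            out.append((name, expr))
    return out
```

The final code-generation step then just prints each surviving (name, expr) pair as an assignment statement in the target language.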
Agent Negotiation
Supervisor: Dr Roger Clarke. Email: Roger.Clarke@anu.edu.au
Background
Investigate the practicability of implementing agents, and negotiations between agents, in particular by measuring the increase in the complexity of code as the complexity of the interactions between the two agents increases.
The suggested implementation context is the W3C's Platform for Privacy Preferences (P3P) specification. This defines how client-side software can store people's privacy preferences, and server-side software can store privacy policy statements by corporations and government agencies. The two agents can then negotiate with one another in order to permit a transaction to be entered into (such as the provision of shoe-size and credit-card details), to prevent the transaction, or to refer a mismatch to the consumer for a decision.
It is envisaged that a succession of prototypes of increasing completeness would be implemented, and key process and product factors would be measured. This would depend on a thorough appreciation of theories relating to agents, P3P, software development and maintenance, software complexity, and development and maintenance productivity.
Some background reading is at:
- http://www.anu.edu.au/people/Roger.Clarke/DV/P3POview.html
- http://www.anu.edu.au/people/Roger.Clarke/DV/P3PCrit.html
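A first prototype of the negotiation round might look like the sketch below. The preference/policy vocabulary is invented for illustration and is far simpler than P3P's actual schema; the point is the three possible outcomes the description above identifies (permit, prevent, refer to the consumer).

```python
def negotiate(preferences, policy, requested):
    """preferences: data item -> set of purposes the user accepts,
    or 'ask' to always refer to the user.
    policy: data item -> the purpose the server's policy states.
    requested: the data items the transaction needs."""
    blockers, referrals = [], []
    for item in requested:
        pref = preferences.get(item, 'ask')
        purpose = policy.get(item)
        if pref == 'ask' or purpose is None:
            referrals.append(item)        # mismatch: refer to the consumer
        elif purpose not in pref:
            blockers.append(item)         # stated purpose violates preference
    if blockers:
        return ('reject', blockers)
    if referrals:
        return ('ask', referrals)
    return ('accept', list(requested))
```

Each successive prototype would enrich this loop (counter-offers, partial disclosure, logging) while the project measures how the code complexity grows.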
Conception, Design and Implementation of Nyms
Supervisor: Dr Roger Clarke. Email: Roger.Clarke@anu.edu.au
Background
Most network protocols deal in Entities and Identifiers.
An Entity (which covers things as diverse as a person, a company, a network-connected device, and a process) has precisely one Identity, which is, very roughly speaking, its 'essence'. In the case of a device or process, that might be operationalised as the specification of the functions it performs.
An Entity has one or more Identifiers, each of which is a data-item or group of data-items which reliably distinguish it from other Entities, especially those of the same class.
In complex networks, this model is too simplistic, and two additional concepts are necessary.
A Role is a particular presentation of an Entity. An Entity may have many Roles; and a Role may be associated with more than one Entity. As examples, think of a SIM Card as an Entity, and the multiple Mobile-Phone housings into which it is successively placed as Roles; and then there are the many Roles that you play yourself, as student, worker, sportsperson, voter, dole-bludger, scout-master, tax-payer, lover ... There are various ways in which you can accidentally or on purpose enable someone else to adopt one of your Roles (e.g. give them your password; although there are juicier examples than that).
A Nym is a data-item or group of data-items which reliably distinguishes a Role. However, because a Role is not reliably related to an Entity, there is no reliable mapping between a Nym and the underlying Entity or Entities (i.e. the mapping is not only m:n, but it's not determinable).
I'm particularly interested in Nyms because of the vital part they are going to play in keeping us reasonably sane and reasonably free, as the State and the Hypercorps increasingly abuse personal data in order to impose themselves more and more on individuals. But of course they could turn out to be important within open networks too, quite independently of privacy concerns. I'd like to see some serious research done, drawing on the emergent literature, and performing some laboratory experimentation.
Here are some starting materials (these happen to be on my own site, but
they point to plenty of other sources as well):
- Concepts
- Human identity
- Tools
- Digital persona
- PKI
- Notes on the relevant section of the Computers, Freedom & Privacy Conference in 1999
- Some references
- Intro (needs some updating)
- Inet
- Tracking crims
P2P File-Sharing Technologies
Supervisor: Dr Roger Clarke
http://www.anu.edu.au/people/Roger.Clarke
E-mail:
Roger.Clarke@anu.edu.au
Background
Enormous tensions currently exist between, on the one hand, the need for musicians and music publishers to earn revenue and, on the other, the desire of consumers to get their music for free.
Peer-to-peer file-sharing technologies started out as centralised repositories, then became centralised directories of dispersed repositories (the Napster model), and are rapidly maturing into forms in which both the repositories and the directories are dispersed (the Gnutella model). The FastTrack model (which underlies Kazaa and several other services) uses a two-tier refinement to the fully-dispersed model.
But do they work? On the one hand, are there still choke-points? And on the other, is it feasible to run both anarchic, revenue-denying schemes and paid services all using the one architecture? Are there significant differences among the emergent products? Is a taxonomy feasible? And does such a taxonomy lead to the discovery of variants that no-one's implemented yet?
Here are some starting materials:
http://www.anu.edu.au/people/Roger.Clarke/EC/P2POview.html
http://www.anu.edu.au/people/Roger.Clarke/EC/P2PRes.html
