Honours topics
Half of your time as an honours student is spent working on a project. But first you have to find a project topic.
The "official" reference for projects proposed by potential supervisors is the CECS projects database.
It lists projects for all levels of research, including short projects, summer scholarship projects, Honours projects, and Masters and PhD projects. All potential research students, at any level, are urged to browse it.
If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.
You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, then feel free to discuss it with the honours convener to see whether it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.
Former Project Topics
For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.
2004 Honours project proposals
Blind Data Linkage
Supervisor: Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Background
Integrating or linking data from different sources is becoming an increasingly important task in the preprocessing stage of many data mining projects. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. If no common unique entity identifiers (keys) are available in all data sources, the linkage needs to be performed using the available identifying attributes, like names and addresses. Data confidentiality often limits or even prohibits successful data linkage, as either no consent can be gained (for example in biomedical studies) or the data holders are not willing to openly provide their data.
- A medical research project, for example, may need to determine which individuals, whose identities are recorded in a population-based register of people suffering from hepatitis C infection, also appear in a separate population-based register of cases of liver cancer. Clearly such linkage between these two databases involves the invasion of privacy of the individuals whose details are stored in these databases. It is usually infeasible to obtain consent from each individual identified in each of the register databases - instead one or more ethics committees or institutional review boards provide consent for the linkage of the two databases on the behalf of the individuals involved, after weighing the public good which is expected to accrue from the research against the unavoidable invasion of individual privacy which the research involves.
- Similarly, consider two research databases of genetic sequences of DNA, each of which are commercially valuable in their own right. The owners or custodians of these two databases may wish to determine whether their databases contain any sequences in common before deciding to collaborate and share their data, without having to first reveal the contents of their database to the other party, or to a third party.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of blind data linkage methods (i.e. methods where data linkage can be performed between different parties without the disclosure of any personal or valuable information). To achieve this goal, we are using various techniques from cryptology and related areas.
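As a rough illustration of the kind of building block involved (this is not the Febrl implementation, and the record values and shared key below are invented), two parties who share a secret key can exchange only keyed hashes of standardised identifiers and match records on hash equality, so that neither side sees the other's raw data. Exact hashing like this cannot tolerate typographical differences, which is one reason more sophisticated techniques are needed:

```python
# Minimal sketch (not the Febrl implementation): two parties who share a
# secret key exchange only keyed hashes (HMAC-SHA256) of cleaned identifiers,
# so records can be matched on equality without revealing the identifiers.
import hmac
import hashlib

def clean(name: str, dob: str) -> str:
    """Crude standardisation: lower-case, drop spaces and punctuation."""
    raw = (name + "|" + dob).lower()
    return "".join(ch for ch in raw if ch.isalnum() or ch == "|")

def blind(record: tuple[str, str], key: bytes) -> str:
    """Return a keyed hash of the cleaned identifiers."""
    return hmac.new(key, clean(*record).encode(), hashlib.sha256).hexdigest()

if __name__ == "__main__":
    key = b"shared-secret-agreed-out-of-band"   # hypothetical shared key
    register_a = [("Joanne Smith", "1971-03-04"), ("Li Wei", "1969-12-01")]
    register_b = [("joanne  smith", "1971-03-04"), ("Peter Jones", "1980-07-21")]

    hashes_a = {blind(r, key) for r in register_a}
    matches = [r for r in register_b if blind(r, key) in hashes_a]
    print(matches)   # [('joanne  smith', '1971-03-04')] -- matched blindly
```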
Honours student research project
The aim of this honours research project is to further develop these blind data linkage techniques. So far we have implemented simple proof-of-concept programs written in Python. Performance issues with large (real world) databases need to be addressed, as well as distributed services and Internet connectivity. One aim is to integrate blind data linkage into our open source data linkage system Febrl.
A student working on this project would be involved in:
- Participation in an exciting research project in collaboration with an industry partner.
- Exploration and development of novel privacy preserving data linkage methods using algorithms and techniques from cryptography, machine learning, indexing, data mining, and distributed and remote computing.
- Object-oriented cross-platform programming (using Python and friends).
- Open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Geocoding Addresses with Fuzzy Data Linkage
Supervisor: Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Background
Geocoding is defined as the ability to create longitude and latitude coordinates (GPS coordinates) from an address, place name or other descriptive location information. Geocoding is useful where descriptive location information is available but quantitative spatial information is required for analysis. In the health sector, for example, geocoded records can be used to draw maps of disease outbreaks, or of the unusual distribution or clustering of certain health indicators.
Because most real world data collections contain noisy, incomplete and incorrectly formatted information, the process of matching addresses or place names with a georeferenced database is not trivial. Addresses may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. Incomplete or noisy addresses therefore need to be cleaned and standardised first before they can be used for geocoding.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques for geocoding of noisy and incomplete address data. We are also developing novel approaches for privacy-preserving geocoding services. We aim to include these geocoding techniques into our open source data linkage system Febrl.
Honours student research project
The aim of this honours research project is to further develop geocoding techniques and to integrate them into an open source data linkage system. Issues that will be addressed include efficient indexing of records for fast retrieval, cleaning and standardisation of input records, fuzzy matching (including scoring/rating of matches), processing of individual records as well as batch processing, and reverse geocoding (i.e. searching for the nearest address to a given coordinate). The geocoding system should be implemented in such a way that it can be run as a service across the Internet (and intranets), using standard protocols such as HTTP, XML-RPC, or SOAP.
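To give a feel for the fuzzy-matching step, here is a toy sketch (not Febrl's actual geocoder; the reference addresses, coordinates and threshold are invented) that standardises an input address and scores it against a small georeferenced table:

```python
# Toy sketch of the fuzzy-matching step (not Febrl's actual geocoder):
# standardise an input address, score it against a small georeferenced
# reference table, and return the best-scoring coordinates.
from difflib import SequenceMatcher

REFERENCE = {   # hypothetical georeferenced addresses
    "12 alinga street city act 2601": (-35.2789, 149.1300),
    "1 london circuit city act 2601": (-35.2813, 149.1270),
}

ABBREVIATIONS = {"st": "street", "cct": "circuit", "cir": "circuit"}

def standardise(address: str) -> str:
    tokens = "".join(c for c in address.lower() if c.isalnum() or c.isspace()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def geocode(address: str, threshold: float = 0.8):
    """Return (coords, score) for the closest reference address, or None."""
    cleaned = standardise(address)
    best = max(REFERENCE, key=lambda ref: SequenceMatcher(None, cleaned, ref).ratio())
    score = SequenceMatcher(None, cleaned, best).ratio()
    return (REFERENCE[best], score) if score >= threshold else None

if __name__ == "__main__":
    print(geocode("12 Alinga St, City, ACT 2601"))
```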
A student working on this project would be involved in:
- Participation in an exciting research project.
- Exploration and development of geocoding and related techniques.
- Development of novel privacy-preserving geocoding services.
- Object-oriented cross-platform programming (using Python and friends).
- Open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links regarding our data linkage project please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Automated Probabilistic Name and Address Standardisation
Supervisor: Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Background
Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), police/intelligence or telecommunications. Data mining techniques are used to analyse such large data sets to find patterns and rules, or to detect outliers. In many data mining projects information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc.
As most real world data collections contain noisy, incomplete and incorrectly formatted information, data cleaning and standardisation are important pre-processing steps for successful data linkage, and before such data can be loaded into data warehouses or used for further analysis. Data may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. The cleaning and standardisation of names and addresses is especially important, to make sure that no misleading or redundant information is introduced (e.g. duplicate records).
The main task of data cleaning and standardisation is the conversion of the raw input data into well defined, consistent forms and the resolution of inconsistencies in the way information is represented or encoded. Rule-based data cleaning and standardisation as currently done by most commercial systems is cumbersome to set up and maintain, and often needs adjustments for new data sets.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the improvement of techniques for data cleaning and standardisation, and record linkage. In a recent honours project (J. Zhu, Probabilistic Address Segmentation and Record Linkage, 2002) we developed new probabilistic techniques based on hidden Markov models (HMMs), which were shown to achieve better standardisation accuracy, and to be easier to set up and maintain, than a popular commercial linkage package. A paper describing our approach is available online.
Honours student research project
The aim of this honours research project is to further explore how hidden Markov model (HMM) techniques can achieve better accuracy and facilitate the data cleaning and standardisation tasks, with the ultimate aim of a fully automated data cleaning and standardisation system which needs minimal user configuration and intervention. For example, we would like to explore the use of the forward-backward algorithm for developing HMMs without explicitly specifying the hidden states, the utility of the Baum-Welch EM (expectation maximisation) algorithm in optimising the probabilities in a HMM, and the use of fuzzy look-up tables for tagging lists, so that a user does not have to specify all possible misspellings of words.
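To make the idea concrete, the sketch below tags address tokens with hidden states using the Viterbi algorithm over a tiny hand-specified HMM. The states, probabilities and example address are invented rather than learned from data, which is exactly the set-up step this project aims to automate:

```python
# A minimal hidden Markov model segmenter in the spirit of the approach above
# (illustrative only; the states and probabilities are invented, not learned).
# Each address token is tagged with a hidden state using the Viterbi algorithm.
STATES = ["number", "street_name", "street_type", "locality"]
START = {"number": 0.7, "street_name": 0.2, "street_type": 0.05, "locality": 0.05}
TRANS = {
    "number":      {"number": 0.05, "street_name": 0.85, "street_type": 0.05, "locality": 0.05},
    "street_name": {"number": 0.05, "street_name": 0.30, "street_type": 0.55, "locality": 0.10},
    "street_type": {"number": 0.05, "street_name": 0.05, "street_type": 0.05, "locality": 0.85},
    "locality":    {"number": 0.05, "street_name": 0.05, "street_type": 0.05, "locality": 0.85},
}
STREET_TYPES = {"street", "st", "road", "rd", "avenue", "ave", "circuit", "cct"}

def emission(state: str, token: str) -> float:
    if token.isdigit():
        return 0.9 if state == "number" else 0.03
    if token in STREET_TYPES:
        return 0.8 if state == "street_type" else 0.05
    return 0.45 if state in ("street_name", "locality") else 0.05

def viterbi(tokens):
    """Return the most likely state sequence for the token list."""
    best = {s: ([s], START[s] * emission(s, tokens[0])) for s in STATES}
    for tok in tokens[1:]:
        new_best = {}
        for s in STATES:
            prev, (path, p) = max(best.items(),
                                  key=lambda kv: kv[1][1] * TRANS[kv[0]][s])
            new_best[s] = (path + [s], p * TRANS[prev][s] * emission(s, tok))
        best = new_best
    return max(best.values(), key=lambda item: item[1])[0]

if __name__ == "__main__":
    tokens = "42 limestone ave braddon".split()
    print(list(zip(tokens, viterbi(tokens))))
```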
A student working on this project would be involved in:
- Participation in an exciting research project.
- Exploration and development of machine learning and data mining techniques.
- Object-oriented cross-platform programming (using Python and friends).
- Open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Agent Negotiation
Supervisor: Roger Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472
Investigate the practicability of implementing agents, and negotiations between agents, in particular by measuring the increase in the complexity of code as the complexity of the interactions between the two agents increases.
The suggested implementation context is the W3C's Platform for Privacy Preferences (P3P) specification. This defines how client-side software can store people's privacy preferences, and server-side software can store privacy policy statements by corporations and government agencies. The two agents can then negotiate with one another in order to permit a transaction to be entered into (such as the provision of shoe-size and credit-card details), to prevent the transaction, or to refer a mismatch to the consumer for a decision.
It is envisaged that a succession of prototypes of increasing completeness would be implemented, and key process and product factors would be measured. This would depend on a thorough appreciation of theories relating to agents, P3P, software development and maintenance, software complexity, and development and maintenance productivity.
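The core of such a negotiation can be sketched in a few lines. The data-item names and the allow/prompt/block vocabulary below are invented for illustration; real P3P policies and APPEL preference rules are XML documents with a much richer vocabulary:

```python
# Toy sketch of the client/server negotiation described above. The data-item
# names and rule vocabulary are invented; real P3P policies and APPEL
# preference rules are considerably richer XML documents.
CLIENT_PREFERENCES = {          # what the user's agent is willing to release
    "shoe-size":   "allow",
    "email":       "prompt",    # refer the mismatch to the consumer
    "credit-card": "prompt",
    "dob":         "block",
}

SERVER_POLICY = ["shoe-size", "credit-card"]   # data the merchant requests

def negotiate(policy, preferences):
    """Return 'proceed', 'abort', or the list of items needing user consent."""
    decisions = [preferences.get(item, "block") for item in policy]
    if "block" in decisions:
        return "abort"
    pending = [item for item in policy if preferences.get(item) == "prompt"]
    return pending if pending else "proceed"

if __name__ == "__main__":
    print(negotiate(SERVER_POLICY, CLIENT_PREFERENCES))   # ['credit-card']
```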
Some background reading is at:
- http://www.anu.edu.au/people/Roger.Clarke/DV/P3POview.html
- http://www.anu.edu.au/people/Roger.Clarke/DV/P3PCrit.html
Conception, Design and Implementation of Nyms
Supervisor: Roger Clarke (Visiting Fellow)
http://www.anu.edu.au/people/Roger.Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472
Most network protocols deal in Entities and Identifiers.
An Entity (which covers things as diverse as a person, a company, a network-connected device, and a process) has precisely one Identity (which is, very roughly speaking, its 'essence'. In the case of a device or process, that might be operationalised as the specification of the functions it performs).
An Entity has one or more Identifiers, each of which is a data-item or group of data-items which reliably distinguishes it from other Entities, especially those of the same class.
In complex networks, this model is too simplistic, and two additional concepts are necessary.
A Role is a particular presentation of an Entity. An Entity may have many Roles; and a Role may be associated with more than one Entity. As examples, think of a SIM Card as an Entity, and the multiple Mobile-Phone housings into which it is successively placed as Roles; and then there are the many Roles that you play yourself, as student, worker, sportsperson, voter, dole-bludger, scout-master, tax-payer, lover ... There are various ways in which you can accidentally or on purpose enable someone else to adopt one of your Roles (e.g. give them your password; although there are juicier examples than that).
A Nym is a data-item or group of data-items which reliably distinguishes a Role. However, because a Role is not reliably related to an Entity, there is no reliable mapping between a Nym and the underlying Entity or Entities (i.e. the mapping is not only m:n, but it's not determinable).
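A minimal data-model sketch of these concepts might look as follows (the class and field names are mine, not a standard); the point is that a Nym refers only to a Role, so no reliable Nym-to-Entity mapping exists:

```python
# A minimal data-model sketch of the concepts above (names are illustrative,
# not a standard). A Nym points only to a Role, and a Role may be associated
# with several Entities, so there is no reliable Nym-to-Entity mapping.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    identity: str                      # the entity's single Identity
    identifiers: frozenset = frozenset()

@dataclass
class Role:
    description: str
    entities: set = field(default_factory=set)   # m:n, and mutable over time

@dataclass
class Nym:
    value: str
    role: Role                         # distinguishes a Role, not an Entity

if __name__ == "__main__":
    # The example above: the SIM card is the Entity; each handset it is placed
    # into gives rise to a Role, and the prepaid number acts as a Nym.
    sim = Entity("a particular SIM card", frozenset({"IMSI (illustrative)"}))
    in_new_handset = Role("SIM as used in the new handset", {sim.identity})
    nym = Nym("prepaid-0412-xxx-xxx", in_new_handset)
    # A Nym distinguishes a Role; recovering the underlying Entity (or
    # Entities) from the Nym alone is not, in general, possible.
    print(nym.value, "->", nym.role.description)
```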
I'm particularly interested in Nyms because of the vital part they are going to play in keeping us reasonably sane and reasonably free, as the State and the Hypercorps increasingly abuse personal data in order to impose themselves more and more on individuals. But of course they could turn out to be important within open networks too, quite independently of privacy concerns. I'd like to see some serious research done, drawing on the emergent literature, and performing some laboratory experimentation.
Here are some starting materials (these happen to be on my own site, but they point to plenty of other sources as well):
- Concepts
- human identity
- tools
- digital persona
- PKI
- Notes on the relevant section of the Computers, Freedom & Privacy Conference in 1999
- some references
- Intro (needs some updating)
- Inet
- tracking crims
File-Sharing Technologies
Supervisor: Roger Clarke (Visiting Fellow)
http://www.anu.edu.au/people/Roger.Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472
If you didn't do COMP3410 - Information Technology in Electronic Commerce in Semester 2, 2000, the assignment that I set was: "Enormous tensions currently exist between, on the one hand, the need for musicians and music publishers to earn revenue and, on the other, the desire of consumers to get their music for free. Provide constructive suggestions as to how technology might be used to address these problems".
There are some leads in the slides for Lecture 5, at: http://www.anu.edu.au/people/Roger.Clarke/EC/ETIntro.html#LOutline.
File-sharing technologies started out as centralised repositories, then became centralised directories of dispersed repositories (the Napster model), and are rapidly maturing into forms in which both the repositories and the directories are dispersed (the Gnutella model).
But do they work? On the one hand, are there still choke-points? And on the other, is it feasible to run both anarchic, revenue-denying schemes and paid services all using the one architecture? Are there significant differences among the emergent products? Is a taxonomy feasible? And does such a taxonomy lead to the discovery of variants that no-one's implemented yet?
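As a starting point for the taxonomy question, one could classify architectures by where the directory and the repository live. The sketch below (product labels taken from the paragraph above) makes the unexplored combination stand out:

```python
# A first cut at the taxonomy question above: classify architectures by where
# the directory (index) and the repository (content) live. The labels come
# from the discussion above; the unlisted combination suggests a variant
# worth looking for.
from dataclasses import dataclass
from itertools import product

CENTRALISED, DISPERSED = "centralised", "dispersed"

@dataclass(frozen=True)
class Architecture:
    directory: str
    repository: str

EXAMPLES = {
    Architecture(CENTRALISED, CENTRALISED): "the original centralised repositories",
    Architecture(CENTRALISED, DISPERSED):   "the Napster model",
    Architecture(DISPERSED,   DISPERSED):   "the Gnutella model",
}

if __name__ == "__main__":
    for d, r in product((CENTRALISED, DISPERSED), repeat=2):
        arch = Architecture(d, r)
        print(arch, "->", EXAMPLES.get(arch, "no well-known example?"))
```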
Here are some starting materials:
- http://www.anu.edu.au/people/Roger.Clarke/EC/FDST.html
- http://www.anu.edu.au/people/Roger.Clarke/EC/KingEP.html
- A (still growing, probably incomplete) catalogue of technologies: http://www.anu.edu.au/people/Roger.Clarke/EC/FDST.html#Friends
- http://www.anu.edu.au/people/Roger.Clarke/EC/Bled2K.html
Declarative theorem-proving
Supervisor: Michael Norrish
Email: Michael.Norrish@cs.anu.edu.au
This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.
When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”: important bugs that are easy to find.
The HOL system currently doesn’t support this style of working very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.
Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.
The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions; to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data.
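To give a flavour of what such a format and tool might look like, here is a very rough sketch. The syntax below is entirely hypothetical, and the checker only verifies name scoping, which is a stand-in for the genuine type-checking the real tool would perform:

```python
# Hypothetical "SML-less" declaration format: one declaration per line,
#     def  <name> := <expression over previously defined names and literals>
# The checker below only verifies that every name used has been defined
# earlier -- a stand-in for the real type-checking a HOL-based tool would do.
import re

SCRIPT = """\
def zero := 0
def succ_two := zero + 1 + 1
def doubled := succ_two * two
"""

def check(script: str) -> list:
    errors, defined = [], set()
    for lineno, line in enumerate(script.splitlines(), start=1):
        m = re.match(r"def\s+(\w+)\s*:=\s*(.+)", line)
        if not m:
            errors.append(f"line {lineno}: cannot parse {line!r}")
            continue
        name, body = m.groups()
        for ident in re.findall(r"[A-Za-z_]\w*", body):
            if ident not in defined:
                errors.append(f"line {lineno}: undefined name {ident!r}")
        defined.add(name)
    return errors

if __name__ == "__main__":
    print(check(SCRIPT))   # flags 'two' on line 3 as undefined
```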
Implementing proof kernels
Supervisor: Michael NorrishEmail: Michael.Norrish@cs.anu.edu.au
At the core of LCF systems like HOL, there is a small “kernel” that implements the primitive rules of inference of the logic. There are a number of interesting issues relating to the design of these kernels, for example (see the sketch after this list):
- How to represent free and bound variables?
- Optimising common calculations (such as calculating the set of free variables)
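As a toy answer to the first question, the sketch below represents terms with named variables and computes free variables recursively; a real kernel might instead use de Bruijn indices and cache free-variable sets, which is where the optimisation questions arise:

```python
# One possible term representation, sketched in Python rather than SML:
# terms with named variables, and a recursive free-variable computation.
# (A real kernel might use de Bruijn indices and cache these sets.)
from dataclasses import dataclass

class Term:
    pass

@dataclass(frozen=True)
class Var(Term):
    name: str

@dataclass(frozen=True)
class App(Term):
    fun: Term
    arg: Term

@dataclass(frozen=True)
class Abs(Term):
    bound: str
    body: Term

def free_vars(t: Term) -> frozenset:
    if isinstance(t, Var):
        return frozenset({t.name})
    if isinstance(t, App):
        return free_vars(t.fun) | free_vars(t.arg)
    if isinstance(t, Abs):
        return free_vars(t.body) - {t.bound}   # the binder is not free
    raise TypeError(t)

if __name__ == "__main__":
    term = Abs("x", App(Var("f"), App(Var("x"), Var("y"))))
    print(sorted(free_vars(term)))   # ['f', 'y']
```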
E-Publishing formats and techniques
Supervisors: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
In theory, XML has provided a universal format for the communication of documents. In practice, a lack of standards and of easy-to-use software has slowed progress. In this topic you will research e-publishing formats and techniques. You will propose an XML-based format for publishing scholarly work and build a prototype system for quickly creating on-line scholarly journals. One approach to be investigated is to adapt the Open Office open source package, interfaced to the Plone open source content management system, to produce an on-line pre-flight system, as discussed in unit Comp 3410. A scholarship for this work is available from the Australian Computer Society, with funding to include presentation of a paper at the Open Publish 2004 conference in Sydney.
See: Electronic Publishing Options for Academic Material, Tom Worthington, For: Computing 3410 Students, 20 August 2002: http://www.tomw.net.au/2002/epo.html
Agent based simulation of Capability Models
Supervisor: Shayne Flint
See http://se.anu.edu.au/2004_flint_1.html.
MMTk: Toolkit based GC for multiple runtimes
Supervisor: Steve Blackburn
Email: Steve.Blackburn@anu.edu.au
There are a number of projects under this title, all of which involve performance analysis, system implementation and investigation of programming language runtimes. MMTk is a toolkit for efficiently implementing memory management algorithms. It is the memory manager for the Jikes RVM Java virtual machine and is being ported to other runtimes including OVM (Java) and Rotor (C#). Because MMTk implements numerous algorithms and can run within more than one language, it provides an ideal context for performance analysis of memory management algorithms, both within and among language contexts. One project will integrate MMTk into another runtime (a number of candidates exist, including Mercury, nhc98, GNU Smalltalk, Gwydion Dylan, Bigloo Scheme, OCaml, and SmartEiffel). This project follows on from the work of Robin Garner and will most likely have input from him. Another project will investigate the performance of a port of MMTk to a C runtime (CMTk). CMTk is needed by any runtime that is implemented in C or C++ (MMTk is written in a dialect of Java). Robin Garner did the initial implementation of CMTk. This project will investigate a range of techniques for improving its performance. These will include performing code transformations to translate Java into C code.
Support for accurate GC in gcc
Supervisor: Steve Blackburn
Email: Steve.Blackburn@anu.edu.au
This is an ambitious project that will investigate extending gcc to produce GC maps (maps which show where pointers exist) and thus allow strongly typed languages (such as Java) to be compiled with gcc and yet be collected with "exact" garbage collectors. Currently such implementations (e.g. gcj) are restricted to using "conservative" garbage collectors.
Performance analysis of Java
Supervisor: Steve Blackburn
Email: Steve.Blackburn@anu.edu.au
Java is very important commercially, and yet the runtime systems required to support it and other languages such as C# are relatively immature. This project will involve measurement and analysis of the performance of Java programs, with a view to better understanding where there are opportunities for performance improvement. This work will involve using and modifying MMTk, a memory management toolkit for JikesRVM.
OpenMP for Cluster Computers
Supervisor: Alistair Rendell
OpenMP is a directive-based approach to shared memory parallel programming. This means you put a few directives in your code to partition the work associated with, e.g., for loops. You could also view it as a simple interface to thread programming.
From Japan there is a cluster computing environment called score/scash/omni. Part of this is an OpenMP implementation for clusters: OpenMP layered on top of a page-based software distributed shared memory model. I'm very interested in the performance of this and in the use of data distribution and affinity clauses (how threads and memory get mapped to specific physical locations in the cluster); this relates closely to work in the CC-NUMA project.
This project would involve getting the latest version of the omni/scash software and installing it on the Bunyip cluster. You would measure performance using, for example, the ...
Interval Arithmetic: The New Floating-Point Paradigm?
Supervisor: Alistair Rendell
From using a calculator we are all familiar with the fact that many numbers must, at some level, be truncated and rounded. Thus 2/3 may appear as 0.67, 0.66667, or 0.6666667 etc., depending on the precision of your calculator. More generally, representing the continuum of floating point numbers by the discrete, finite representations used in all modern computers gives rise to rounding errors, overflows, underflows etc. Failure to account for these limitations can give rise to spectacular (and expensive!) results, e.g. the failure of some Patriot missiles to hit their targets during the Gulf War, or the explosion of the Ariane 5 space rocket (see the well-known lists of floating-point disasters).
Interval arithmetic is a fundamental paradigm shift that attempts to address these issues. It defines an "interval" as a continuum of real numbers bounded by its end-points, and performs all arithmetic operations with respect to intervals rather than discrete floating point numbers. As a consequence, errors are automatically propagated and accounted for. Moreover, the mathematical properties of intervals can be exploited to develop entirely new algorithms.
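A toy interval type shows the flavour. Note that the sketch below uses ordinary float arithmetic; a genuine implementation must round the lower bound down and the upper bound up at every operation so that the enclosure remains guaranteed:

```python
# Toy interval type illustrating the idea. It uses ordinary float arithmetic;
# a real implementation would apply outward (directed) rounding at every step
# so that the computed interval is a guaranteed enclosure.
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(min(products), max(products))

if __name__ == "__main__":
    two_thirds = Interval(0.6666, 0.6667)   # 2/3 enclosed by its rounding
    x = Interval(0.99, 1.01)                # a measured value with error
    result = two_thirds * x - Interval(0.5, 0.5)
    print(result)   # the width of the result bounds the accumulated error
```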
The concept of intervals is actually not new, but was considered by
Moore in 1965. What is relatively new is the fact that some
hardware vendors are now providing direct support for interval
arithmetic within their compilers/hardware.
The aim of this project will be to investigate the use of interval arithmetic for computational science applications. The initial goal will be to use intervals to study error propagation for some "problematic" algorithms found within the Gaussian computational chemistry code. A second goal will be to investigate the use of intervals for nonlinear global optimisation, e.g. determining the most stable conformation (geometry) of a large molecular system (e.g. a protein).
For more information on interval arithmetic see the
interval home page.
Automatic Differentiation
Supervisor: Alistair Rendell
Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to those elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enables derivatives to be easily updated when the original code is changed.
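The mechanics can be illustrated with forward-mode differentiation using dual numbers, where each value carries its derivative and the overloaded elementary operations apply the chain rule. This is a toy sketch, not one of the production tools mentioned below:

```python
# Minimal forward-mode automatic differentiation with dual numbers: each value
# carries its derivative, and the chain rule is applied by the overloaded
# elementary operations.
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    val: float
    dot: float = 0.0        # derivative with respect to the chosen input

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

def exp(x: Dual) -> Dual:
    return Dual(math.exp(x.val), math.exp(x.val) * x.dot)

def f(x: Dual) -> Dual:
    return x * exp(x)               # an arbitrary computed function

if __name__ == "__main__":
    x = Dual(2.0, 1.0)              # seed dx/dx = 1
    y = f(x)
    print(y.val, y.dot)             # f(2) and f'(2) = (1 + 2) * e**2
```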
A number of
programs are now available to perform automatic
differentiation for code written in a variety of languages. The aim of
this project would be to use one of these tools to perform derivative
evaluations for some computational science applications. Specifically, in the modelling of systems at the atomic level, derivatives with respect to atomic positions are hugely important, determining things like the frequencies at which molecules vibrate. Others and I have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process, and for it to be relatively efficient, would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes.
Using Machine Learning in a Linux Disk Block Pre-Fetcher
Supervisor: Eric McCreath
Machine learning may be defined as learning a description of a concept from a set of training examples. This description may then be used for prediction. This project involves investigating the application of machine learning within an operating system to pre-fetch disk blocks. The central outcomes of this research will be an analysis of machine learning approaches to undertaking such a task. This would also involve forming a theoretical model of the domain in question, which would be vital for the analysis. This project builds on work done by an honours student last year.
Details:
Operating Systems often employ read-ahead windowing approaches for
sequentially accessed files. Basically the next part of a file is
pre-fetched into memory even before it is requested. This greatly
improves performance of the system.
Often when we interact with computers the same files are required during our interaction. For example, we may always sit down and start up our mail reader. If the computer could learn this pattern of usage and pre-fetch the blocks for these files, the responsiveness of the system could be improved. This project involves developing such a system and investigating its potential.
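A deliberately naive sketch of the learning component is shown below: a first-order model of which files tend to follow which, built from an access trace and used to suggest pre-fetch candidates. The file names and trace are invented, and a real pre-fetcher would hook into the kernel and work at the block level:

```python
# Toy sketch of the learning step: a first-order model of "which files tend
# to follow which", built from an access trace and used to suggest files
# worth pre-fetching next. (Trace and file names are invented.)
from collections import Counter, defaultdict

def train(trace):
    """Count, for each file, which files were opened next."""
    model = defaultdict(Counter)
    for current, nxt in zip(trace, trace[1:]):
        model[current][nxt] += 1
    return model

def suggest(model, current, k=2):
    """Return up to k files most likely to be needed after `current`."""
    return [f for f, _ in model[current].most_common(k)]

if __name__ == "__main__":
    trace = [".bashrc", "mailrc", "inbox", ".bashrc", "mailrc", "inbox",
             ".bashrc", "todo.txt", "mailrc", "inbox"]
    model = train(trace)
    print(suggest(model, ".bashrc"))   # files worth pre-fetching next
```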
Within machine learning there are a number of issues that make this project both challenging and interesting. These include: the temporal nature of the data, the large number of examples, and how the learnt hypothesis could be applied to recommend which files to pre-fetch. Also, within operating systems there are a number of challenges, such as how the system hooks into the kernel and how to measure the performance of the system when it is functioning within the operating system.
The results of this research have the potential to be published in both Operating Systems and Machine Learning venues. The project would also form a good starting point for graduate study.
Students will need to be able to reason logically and should have done well in theory subjects (either within Mathematics or Computer Science). Successful completion will require both determination and imagination. Finally, a student considering embarking on this project must have good programming skills and will need to gain an understanding of the Linux kernel.
Internet futures projects
Supervisor: Markus Buchhorn
Email: Markus.Buchhorn@anu.edu.au
Phone: 6125 8810
The Internet Futures group of the ANU is working with high-speed networks, advanced network services, visualisation and telecollaboration technologies, and large scale data facilities, to support a diverse range of user communities. Many of its activities have a national focus, e.g. through the GrangeNet project.
In the large-scale-data area, the users include Physics, Astronomy, BioInformatics and Cultural Archives (e.g. audio, spoken and music, or video). Archives can easily be many Terabytes (thousands of Gigabytes) in size, need to be delivered to users on networks ranging from multi-Gigabit down to slow dial-up modem speeds, and may need to interface to large-scale computing systems.
The IF group has several potential projects feasible for an honours
project. These include
Technologies in these fields include grid toolkits (Globus for
example), metadata standards, various network protocols and
performance issues, large scale databases and database/directory
access methods, and web services and SOAP. Programming skills in C,
C++, Java, Perl, Python, are all useful.
You would get to work with a range of users across the ANU, on
real-world problems. You'd also get access to the APAC Mass Store
facility (a 1.2 Petabyte (1200 Terabyte) tape robot) and associated
computing hardware including possibly the APAC National Facility
supercomputer, the ANU Supercomputer Facility's Visualisation
Laboratory (VizLab) , and the Internet Futures' Access Grid (AG)
(telecollaboration) nodes in various locations on campus.
You can go as deep into these areas as you like (although we have some
ideas), and maybe even come back for a PhD if you're really keen :-)
TeleCollaboration (videoconferencing on steroids) projects (Internet Futures)
Contact: Markus.Buchhorn@anu.edu.au, CSIT-N244, 6125-8810
TeleCollaboration can be seen as very high-end videoconferencing, with many additional features. Some of these features require the user's space to be intelligent or flexible; others require advanced network or application services. Any kind of collaboration, for research and for education, can be supported. This could include music teaching (masterclasses), architecture, history or medical training with visualisation or even virtual reality interfaces, and large-scale data browsers and instrument consoles (e.g. in Astronomy or TeleMicroscopy).
The Internet Futures' telecollaboration activities revolve mainly around Access Grid (AG)
nodes. There are 3 full-size nodes being built at the ANU (including
one in the CS&IT building), and several additional nodes are being
discussed. There are also plans for smaller-scale (even desktop)
nodes, and ports to a variety of systems and other technology spaces
(e.g. the Wedge virtual reality/visualisation facility). Around
Australia 15 nodes are currently in planning or construction, and
there are over 130 nodes around the world at the moment. ANU users
will include Astronomers at Mt Stromlo Observatory (20km west of
Canberra), and the new ANU Medical School (on campus).
The underlying architecture of these systems is open and flexible, and
the ANU has the potential to become a worldwide focus of AG
development, with strong international collaboration.
The IF group has several potential projects feasible for an
honours project. These include
Technologies in these fields include IP multicast, hardware interfaces, various network protocols and performance issues, user-interface design, and audio/video codecs. Programming skills in C, C++, Java, Perl, Python, are all useful.
You can go as deep into these areas as you like (although we have some ideas), and maybe even come back for a PhD if you're really keen :-) Possibly more than one area could be looked at during the project.
Interactive Visualisations of Chaos
Chief Supervisor: Henry Gardner, Computer Science, FEIT
Supervisor: Prof Robert Dewar, Theoretical Physics, Research School of Physical Sciences and Engineering
This project will investigate gestural interfaces for the interactive investigation of chaotic systems. It will start with a review of existing systems, with a particular focus on the visualisation of fluid flows in immersive virtual reality theatres. It will examine two existing software applications in the Wedge theatre and will use these, and the literature review, to develop a taxonomy of interaction. Key parts of the taxonomy will then be implemented and subjected to a formal usability test.
Avoiding Redundancy During Unit Propagation Look-Ahead for an Efficient SAT Solver
Supervisor: Anbulagan
Email: anbulagan@nicta.com.au
Phone: 6125 8632
Suitability: This project would suit a student keen to work in an area involving logic, constraints, and search in AI.
Prerequisites: Knowledge of and interest in logic, efficient programming using C in a UNIX environment, and search in AI.
Project Description:
The SAT problem is a special case of the constraint satisfaction
problem (CSP) where the variables take the Boolean value 0 or 1. CSP
itself plays an important role in artificial intelligence (AI). The
study of the SAT problem is essential because it is at the heart of
many AI problems, including automatic deduction, VLSI circuit control,
logic programming, learning, planning and scheduling.
In 1962, Davis, Logemann and Loveland presented their algorithm, now called the DPLL procedure, to solve the SAT problem. Based on this procedure, many complete systems have been created, for example GRASP, Posit, relsat, Satz, kcnfs, zchaff, and 2clseq. DPLL
essentially enumerates all possible solutions to a given SAT problem
by setting up a binary search tree and proceeding until it either
finds a satisfying truth assignment or concludes that no such
assignment exists. It is well known that the search tree size of a SAT
problem is generally an exponential function of the problem size, and
that the branching variable selected by a branching rule at a node is
crucial for determining the size of the sub-tree rooted at that node.
One of the main improvements to DPLL has been smart branch selection
heuristics through the use of unit propagation look-ahead (UPLA)
[1]. Recently, another sophisticated branching rule, called Dynamic Variable Filtering (DVF), was proposed, which does more reasoning to narrow the search tree of hard random 3-SAT problems [2]. A DVF-based solver outperforms a UPLA-based solver by an order of magnitude in terms of the number of branching nodes in the search tree when solving hard random 3-SAT problems.
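For reference, a compact DPLL procedure with unit propagation can be written as follows. This is a teaching sketch only: it has a naive branching rule and none of the look-ahead, filtering or backjumping machinery that this project is concerned with adding to a real solver such as ssc355:

```python
# Compact DPLL with unit propagation, to make the procedure above concrete.
# Clauses are lists of non-zero integers; -3 means "not x3".
def unit_propagate(clauses, assignment):
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue                       # clause already satisfied
            unassigned = [l for l in clause if abs(l) not in assignment]
            if not unassigned:
                return None                    # conflict: clause falsified
            if len(unassigned) == 1:           # unit clause forces a value
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

def dpll(clauses, assignment=None):
    assignment = dict(assignment or {})
    if unit_propagate(clauses, assignment) is None:
        return None
    free = {abs(l) for c in clauses for l in c} - set(assignment)
    if not free:
        return assignment                      # every clause is satisfied
    var = min(free)                            # naive branching rule
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None

if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x3) and (not x2 or not x3) and (x2 or x3)
    cnf = [[1, 2], [-1, 3], [-2, -3], [2, 3]]
    print(dpll(cnf))                           # a satisfying assignment
```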
The objectives of this research project are:
- To improve the performance of the ssc355 SAT solver, which integrates the DVF heuristic, by avoiding the redundancy that occurs during the UPLA process.
- To integrate backjumping techniques into the ssc355 SAT solver.
References:
[1] Li, C. M., and Anbulagan. Heuristics Based on Unit Propagation for
Satisfiability Problems. In Proceedings of 15th International Joint
Conference on Artificial Intelligence, 1997, Nagoya, Japan,
pp. 366-371.
[2] Anbulagan, Thornton, J., and Sattar, A., Dynamic Variable
Filtering for Hard Random 3-Sat Problems. LNAI 2903, Proceedings of
16th Australian Conference on AI, December 2003, Perth, Australia,
pp. 100-111.
Proof Optimization
Supervisor: Malcolm Newey
Email: Malcolm.Newey@cs.anu.edu.au
When a theorem is proved using an interactive prover (such as the HOL system) a proof object is created that can be replayed at a later date, for a different audience or, perhaps, in a different system. It will typically be one of a multitude of equivalent proofs for that same theorem, but may not be optimal in terms of attributes such as time for replay, size, portability, fragility etc.
Since proofs can be mechanically transformed, it would be useful to have a tool that would optimize proofs of programs or of library theorems, so that libraries would build faster or proof-carrying code would download faster.
The direction of a project in this area could focus on a simple
tool that manipulated proofs in a limited domain or it could
be more mathematical and study higher order logic proof
transformations in the presence of decision procedures.
Using Threading Techniques to speed up SMP Computer Simulation
Supervisor: Peter Strazdins
See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/ThreadedSulima.html
Efficiently Enhancing the Timing Accuracy of Computer Simulators
Supervisor: Peter Strazdins
See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/AccurateSulima.html
Performance Modelling of Parallel Programs on SMP Clusters
Supervisor: Peter Strazdins
See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/SMPClusterComm.html
A Framework for Investigating Job Scheduling on Linux Clusters
Supervisor: Peter Strazdins
See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/LinuxSchedFramework.html
