
Honours topics

Half of your time as an honours student is spent working on a project. But first you have to find a project topic.

The "official" reference for projects proposed by potential supervisors is the CECS projects database.

Projects are available there for all levels of research, including short projects, summer scholarship projects, Honours projects, and Masters and PhD projects. All potential research students, at any level, are urged to browse it.

If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.

You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, then feel free to discuss it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.

Former Project Topics

For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.
  • 2009 Proposed Project Topics
  • 2008 Proposed Project Topics
  • 2007 Proposed Project Topics
  • 2006 Proposed Project Topics
  • 2005 Proposed Project Topics
  • 2004 Proposed Project Topics
  • 2003 Proposed Project Topics
  • 2002 Proposed Project Topics
  • 2001 Proposed Project Topics


    2004 Honours project proposals



    Blind Data Linkage

    Supervisor: Peter Christen
    E-Mail: Peter.Christen@anu.edu.au

    Background

    Integrating or linking data from different sources is becoming an increasingly important task in the preprocessing stage of many data mining projects. The aim of such linkages is to merge all records relating to the same entity, such as a patient or a customer. If no common unique entity identifiers (keys) are available in all data sources, the linkage needs to be performed using the available identifying attributes, like names and addresses. Data confidentiality often limits or even prohibits successful data linkage, as either no consent can be gained (for example in biomedical studies) or the data holders are not willing to openly provide their data.

    • A medical research project, for example, may need to determine which individuals, whose identities are recorded in a population-based register of people suffering from hepatitis C infection, also appear in a separate population-based register of cases of liver cancer. Clearly such linkage between these two databases involves an invasion of the privacy of the individuals whose details are stored in them. It is usually infeasible to obtain consent from each individual identified in each of the register databases; instead, one or more ethics committees or institutional review boards provide consent for the linkage of the two databases on behalf of the individuals involved, after weighing the public good which is expected to accrue from the research against the unavoidable invasion of individual privacy which the research involves.

    • Similarly, consider two research databases of DNA sequences, each of which is commercially valuable in its own right. The owners or custodians of these two databases may wish to determine whether their databases contain any sequences in common before deciding to collaborate and share their data, without first having to reveal the contents of their databases to the other party, or to a third party.
    In both cases, comparisons need to be made between sequences of symbols - alphanumeric characters like names and addresses in the first example, nucleic bases in the second - in order to measure their similarity, without revealing what those sequences are.

    The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of blind data linkage methods (i.e. methods where data linkage can be performed between different parties without the disclosure of any personal or valuable information). To achieve this goal, we are using various techniques from cryptology and related areas.
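
    To make the flavour of this concrete, here is a deliberately simplified sketch of blind exact matching using keyed hashes: each party hashes its own identifying values with a shared secret key, and only the hashes are compared. The field names, key and records below are invented, and real blind linkage must also handle approximate matches and stronger attack models.

        # A minimal sketch of "blind" exact matching: each party publishes only
        # keyed hashes (HMAC-SHA256) of its identifying values, so the values
        # themselves are never revealed.  Field names and the key are hypothetical.
        import hmac, hashlib

        SHARED_KEY = b"agreed-out-of-band"   # assumed to be exchanged securely

        def blind(value: str) -> str:
            """Return a keyed hash of a cleaned identifying value."""
            cleaned = value.strip().lower()
            return hmac.new(SHARED_KEY, cleaned.encode(), hashlib.sha256).hexdigest()

        def blind_record(record: dict) -> str:
            """Hash the concatenation of the linkage fields of one record."""
            return blind("|".join(record[f] for f in ("surname", "given_name", "dob")))

        # Each party computes hashes locally ...
        party_a = [{"surname": "Smith", "given_name": "Ann", "dob": "1970-01-02"}]
        party_b = [{"surname": "smith", "given_name": "Ann ", "dob": "1970-01-02"}]
        hashes_a = {blind_record(r) for r in party_a}
        hashes_b = {blind_record(r) for r in party_b}

        # ... and only the hashes are compared (possibly by a third party).
        print(len(hashes_a & hashes_b), "candidate link(s)")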

    Honours student research project

    The aim of this honours research project is to further develop these blind data linkage techniques. So far we have implemented simple proof-of-concept programs written in Python. Performance issues with large (real world) databases need to be addressed, as well as distributed services and Internet connectivity. One aim is to integrate blind data linkage into our open source data linkage system Febrl.

    A student working on this project would be involved in:

    • Participation in an exciting research project in collaboration with an industry partner.
    • Exploration and development of novel privacy preserving data linkage methods using algorithms and techniques from cryptography, machine learning, indexing, data mining, and distributed and remote computing.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).

    For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.


    Geocoding Addresses with Fuzzy Data Linkage

    Supervisor: Peter Christen
    E-Mail: Peter.Christen@anu.edu.au

    Background

    Geocoding is defined as the ability to create longitude and latitude coordinates (GPS coordinates) from an address, place name or other descriptive information. Geocoding is useful where descriptive location information is available but quantitative spatial information is required for analysis. In the health sector, for example, geocoded records can be used to draw maps of disease outbreaks, or of the unusual distribution or clustering of certain health indicators.

    Because most real world data collections contain noisy, incomplete and incorrectly formatted information, the process of matching addresses or place names with a georeferenced database is not trivial. Addresses may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. Incomplete or noisy addresses therefore need to be cleaned and standardised first before they can be used for geocoding.

    The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques for geocoding of noisy and incomplete address data. We are also developing novel approaches for privacy-preserving geocoding services. We aim to include these geocoding techniques into our open source data linkage system Febrl.

    Honours student research project

    The aim of this honours research project is to further develop geocoding techniques and to integrate them into an open source data linkage system. Issues that will be addressed include efficient indexing of records for fast retrieval, cleaning and standardisation of input records, fuzzy matching (including scoring/rating of matches), processing of individual records as well as batch processing, and reverse geocoding (i.e. finding the nearest address for a given coordinate). The geocoding system should be implemented in such a way that it can be run as a service across the Internet (and intranets), using standard protocols such as HTTP, XML-RPC, or SOAP.
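
    As a small illustration of the reverse-geocoding step, the sketch below finds the nearest address to a query coordinate using the haversine great-circle distance over a made-up list of geocoded addresses; a real system would use a spatial index rather than a linear scan.

        # A minimal sketch of reverse geocoding: find the geocoded address nearest
        # to a query coordinate.  The reference list is made up; a real system
        # would use a spatial index (e.g. an R-tree) instead of a linear scan.
        from math import radians, sin, cos, asin, sqrt

        def haversine_km(lat1, lon1, lat2, lon2):
            """Great-circle distance between two (lat, lon) points in kilometres."""
            lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
            a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
            return 2 * 6371.0 * asin(sqrt(a))

        # Hypothetical georeferenced address records: (address, latitude, longitude).
        GEOCODED = [
            ("1 Example St, Canberra ACT", -35.2809, 149.1300),
            ("42 Sample Ave, Sydney NSW", -33.8688, 151.2093),
        ]

        def reverse_geocode(lat, lon):
            """Return (distance_km, address) of the nearest known address."""
            return min(((haversine_km(lat, lon, a_lat, a_lon), addr)
                        for addr, a_lat, a_lon in GEOCODED))

        print(reverse_geocode(-35.28, 149.13))   # -> (distance in km, nearest address)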

    A student working on this project would be involved in:

    • Participation in an exciting research project.
    • Exploration and development of geocoding and related techniques.
    • Development of novel privacy-preserving geocoding services.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).

    For more information, publications, and further links regarding our data linkage project please visit the project web page at http://datamining.anu.edu.au/linkage.html.


    Automated Probabilistic Name and Address Standardisation

    Supervisor: Peter Christen
    E-Mail: Peter.Christen@anu.edu.au

    Background

    Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), police/intelligence or telecommunications. Data mining techniques are used to analyse such large data sets to find patterns and rules, or to detect outliers. In many data mining projects information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc.

    As most real world data collections contain noisy, incomplete and incorrectly formatted information, data cleaning and standardisation are important pre-processing steps for successful data linkage, and before such data can be loaded into data warehouses or used for further analysis. Data may be recorded or captured in various, possibly obsolete, formats and data items may be missing, out of date, or contain errors. The cleaning and standardisation of names and addresses is especially important, to make sure that no misleading or redundant information is introduced (e.g. duplicate records).

    The main task of data cleaning and standardisation is the conversion of the raw input data into well defined, consistent forms and the resolution of inconsistencies in the way information is represented or encoded. Rule-based data cleaning and standardisation as currently done by most commercial systems is cumbersome to set up and maintain, and often needs adjustments for new data sets.

    The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the improvement of techniques for data cleaning and standardisation, and record linkage. In a recent honours project (J. Zhu, Probabilistic Address Segmentation and Record Linkage, 2002) we developed new probabilistic techniques based on hidden Markov models (HMMs), which were shown to achieve better standardisation accuracy and to be easier to set up and maintain than a popular commercial linkage package. A paper describing our approach is available online.

    Honours student research project

    The aim of this honours research project is to further explore how hidden Markov model (HMM) techniques can achieve better accuracy and facilitate the data cleaning and standardisation tasks, with the ultimate aim of a fully automated data cleaning and standardisation system that needs minimal user configuration and intervention. For example, we would like to explore the use of the forward-backward algorithm for developing HMMs without explicitly specifying the hidden states, the utility of the Baum-Welch EM (expectation maximisation) algorithm in optimising the probabilities in an HMM, or the use of fuzzy look-up tables for tagging lists so that a user does not have to specify all possible misspellings of words.
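
    The toy sketch below illustrates the underlying idea of HMM-based segmentation: address tokens are the observations, and the Viterbi algorithm recovers the most likely sequence of hidden states. The states, probabilities and example address are invented for illustration and are not taken from Febrl or the work cited above.

        # Toy illustration of HMM-based address segmentation using the Viterbi
        # algorithm.  States, probabilities and the example are invented; a real
        # system learns these from training data (e.g. with Baum-Welch).
        STATES = ["number", "street_name", "street_type"]
        START = {"number": 0.8, "street_name": 0.2, "street_type": 0.0}
        TRANS = {
            "number":      {"number": 0.1, "street_name": 0.9, "street_type": 0.0},
            "street_name": {"number": 0.0, "street_name": 0.4, "street_type": 0.6},
            "street_type": {"number": 0.0, "street_name": 0.0, "street_type": 1.0},
        }

        def emit(state, token):
            """Crude emission model based only on the shape of the token."""
            if state == "number":
                return 0.9 if token.isdigit() else 0.05
            if state == "street_type":
                return 0.9 if token.lower() in {"st", "street", "rd", "road", "ave"} else 0.05
            return 0.5  # street_name: accept anything with middling probability

        def viterbi(tokens):
            """Return the most likely state sequence for the observed tokens."""
            paths = {s: (START[s] * emit(s, tokens[0]), [s]) for s in STATES}
            for tok in tokens[1:]:
                new = {}
                for s in STATES:
                    prob, path = max(
                        (paths[p][0] * TRANS[p][s] * emit(s, tok), paths[p][1])
                        for p in STATES)
                    new[s] = (prob, path + [s])
                paths = new
            return max(paths.values())[1]

        print(viterbi(["42", "wattle", "st"]))   # -> ['number', 'street_name', 'street_type']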

    A student working on this project would be involved in:

    • Participation in an exciting research project.
    • Exploration and development of machine learning and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).

    For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.


    Agent Negotiation

    Supervisor: Roger Clarke
    E-mail: Roger.Clarke@anu.edu.au
    Phone: (02) 6288 1472

    Investigate the practicability of implementing agents, and negotiations between agents, in particular by measuring the increase in the complexity of code as the complexity of the interactions between the two agents increases.

    The suggested implementation context is the W3C's Platform for Privacy Preferences (P3P) specification. This defines how client-side software can store people's privacy preferences, and server-side software can store privacy policy statements by corporations and government agencies. The two agents can then negotiate with one another in order to permit a transaction to be entered into (such as the provision of shoe-size and credit-card details), to prevent the transaction, or to refer a mismatch to the consumer for a decision.

    It is envisaged that a succession of prototypes of increasing completeness would be implemented, and key process and product factors would be measured. This would depend on a thorough appreciation of theories relating to agents, P3P, software development and maintenance, software complexity, and development and maintenance productivity.
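
    Purely by way of illustration, the sketch below shows the simplest possible form of such a negotiation: a client-side agent compares the consumer's stored preferences against a site's declared policy, item by item, and either allows the transaction, blocks it, or refers the mismatch to the consumer. The vocabulary and data structures are invented and do not follow the actual P3P/APPEL specifications.

        # Illustration of the simplest agent-to-agent negotiation: compare a
        # consumer's privacy preferences with a site's declared policy, per data
        # item.  The vocabularies here are invented, not real P3P/APPEL terms.
        CONSUMER_PREFS = {          # data item -> most permissive acceptable use
            "shoe_size":   "transaction-only",
            "credit_card": "transaction-only",
            "email":       "ask-me",            # refer to the user for a decision
        }

        SITE_POLICY = {             # data item -> declared use by the merchant
            "shoe_size":   "transaction-only",
            "credit_card": "marketing",
            "email":       "transaction-only",
        }

        def negotiate(prefs, policy):
            """Return per-item outcomes: 'allow', 'block' or 'refer-to-user'."""
            outcomes = {}
            for item, declared_use in policy.items():
                wanted = prefs.get(item, "ask-me")
                if wanted == "ask-me":
                    outcomes[item] = "refer-to-user"
                elif declared_use == wanted:
                    outcomes[item] = "allow"
                else:
                    outcomes[item] = "block"
            return outcomes

        print(negotiate(CONSUMER_PREFS, SITE_POLICY))
        # {'shoe_size': 'allow', 'credit_card': 'block', 'email': 'refer-to-user'}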

    Some background reading is at:


    Conception, Design and Implementation of Nyms

    Supervisor: Roger Clarke (Visiting Fellow)
    http://www.anu.edu.au/people/Roger.Clarke
    E-mail: Roger.Clarke@anu.edu.au
    Phone: (02) 6288 1472

    Most network protocols deal in Entities and Identifiers.

    An Entity (which covers things as diverse as a person, a company, a network-connected device, and a process) has precisely one Identity (which is, very roughly speaking, its 'essence'. In the case of a device or process, that might be operationalised as the specification of the functions it performs).

    An Entity has one or more Identifiers, each of which is a data-item or group of data-items which reliably distinguishes it from other Entities, especially those of the same class.

    In complex networks, this model is too simplistic, and two additional concepts are necessary.

    A Role is a particular presentation of an Entity. An Entity may have many Roles; and a Role may be associated with more than one Entity. As examples, think of a SIM Card as an Entity, and the multiple Mobile-Phone housings into which it is successively placed as Roles; and then there are the many Roles that you play yourself, as student, worker, sportsperson, voter, dole-bludger, scout-master, tax-payer, lover ... There are various ways in which you can accidentally or on purpose enable someone else to adopt one of your Roles (e.g. give them your password; although there are juicier examples than that).

    A Nym is a data-item or group of data-items which reliably distinguishes a Role. However, because a Role is not reliably related to an Entity, there is no reliable mapping between a Nym and the underlying Entity or Entities (i.e. the mapping is not only m:n, but it's not determinable).
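
    A minimal data-model sketch of these concepts is given below. The class and attribute names are mine, not part of any standard; the point is simply that Identifiers attach to Entities, Nyms attach to Roles, and the Role-to-Entity association is many-to-many and need not be recorded at all.

        # A minimal sketch of the Entity / Role / Identifier / Nym model described
        # above.  Names and example values are illustrative only.
        from dataclasses import dataclass, field

        @dataclass
        class Entity:
            identity: str                                       # its one 'essence'
            identifiers: set[str] = field(default_factory=set)  # data that reliably distinguish it

        @dataclass
        class Role:
            nyms: set[str] = field(default_factory=set)         # data that reliably distinguish the Role
            # Deliberately no mandatory link to an Entity: the Role-to-Entity mapping
            # is m:n and, in general, not determinable from the Nym.

        sim_card = Entity(identity="the physical SIM card", identifiers={"its IMSI"})
        in_handset_a = Role(nyms={"the mobile number as seen while in handset A"})
        in_handset_b = Role(nyms={"the mobile number as seen while in handset B"})
        # Nothing above records which Entity is behind either Role, and a single Role
        # (say, 'scout-master') could equally be filled by several different Entities.
        print(sim_card, in_handset_a, in_handset_b)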

    I'm particularly interested in Nyms because of the vital part they are going to play in keeping us reasonably sane and reasonably free, as the State and the Hypercorps increasingly abuse personal data in order to impose themselves more and more on individuals. But of course they could turn out to be important within open networks too, quite independently of privacy concerns. I'd like to see some serious research done, drawing on the emergent literature, and performing some laboratory experimentation.

    Here are some starting materials (these happen to be on my own site, but they point to plenty of other sources as well):
    Concepts; human identity; tools; digital persona; PKI; Notes on the relevant section of the Computers, Freedom & Privacy Conference in 1999; some references; Intro (needs some updating); Inet; tracking crims.


    File-Sharing Technologies

    Supervisor: Roger Clarke (Visiting Fellow)
    http://www.anu.edu.au/people/Roger.Clarke
    E-mail: Roger.Clarke@anu.edu.au
    Phone: (02) 6288 1472

    If you didn't do COMP3410 - Information Technology in Electronic Commerce in Semester 2, 2000, the assignment that I set was: "Enormous tensions currently exist between, on the one hand, the need for musicians and music publishers to earn revenue and, on the other, the desire of consumers to get their music for free. Provide constructive suggestions as to how technology might be used to address these problems".

    There are some leads in the slides for Lecture 5, at: http://www.anu.edu.au/people/Roger.Clarke/EC/ETIntro.html#LOutline.

    File-sharing technologies started out as centralised repositories, then became centralised directories of dispersed repositories (the Napster model), and are rapidly maturing into forms in which both the repositories and the directories are dispersed (the Gnutella model).

    But do they work? On the one hand, are there still choke-points? And on the other, is it feasible to run both anarchic, revenue-denying schemes and paid services all using the one architecture? Are there significant differences among the emergent products? Is a taxonomy feasible? And does such a taxonomy lead to the discovery of variants that no-one's implemented yet?

    Here are some starting materials:

    http://www.anu.edu.au/people/Roger.Clarke/EC/FDST.html; http://www.anu.edu.au/people/Roger.Clarke/EC/KingEP.html.

    A (still growing, probably incomplete) catalogue of technologies: http://www.anu.edu.au/people/Roger.Clarke/EC/FDST.html#Friends.

    http://www.anu.edu.au/people/Roger.Clarke/EC/Bled2K.html.

    Declarative theorem-proving

    Supervisor: Michael Norrish
    Email: Michael.Norrish@cs.anu.edu.au

    This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.

    When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic, and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”: important bugs that are easy to find.

    The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.

    Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.

    The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions: to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data.
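
    Purely to make the flavour of the proposal concrete, here is an invented example of what an "SML-less" script might look like, together with a trivial reader that splits it into declarations; designing the real format, and the type-checking and theory-building tools behind it, is the substance of the project.

        # Purely illustrative: one possible shape for an "SML-less HOL" script and a
        # trivial reader that splits it into declarations.  The syntax is invented;
        # designing the real format (and its type-checker) is the project itself.
        EXAMPLE = """
        include arithmeticTheory

        define  double x = x + x
        theorem double_even: !x. EVEN (double x)
        """

        def declarations(text):
            """Yield (keyword, body) pairs for each non-empty line of the script."""
            for line in text.strip().splitlines():
                if line.strip():
                    keyword, _, body = line.strip().partition(" ")
                    yield keyword, body.strip()

        for kw, body in declarations(EXAMPLE):
            print(kw, "->", body)
        # A type-checking tool would parse each body as a HOL term and check it;
        # a second tool would additionally build a genuine HOL theory from it.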

     

    Implementing proof kernels

    Supervisor: Michael Norrish
    Email: Michael.Norrish@cs.anu.edu.au

    At the core of LCF systems like HOL, there is a small “kernel” that implements the primitive rules of inference of the logic. There are a number of interesting issues relating to the design of these kernels. For example:

    • How to represent free and bound variables?
    • Optimising common calculations (such as calculating the set of free variables)

    The project would be to explore this design space by first implementing a very simple-minded kernel for HOL, and then testing a set of possible optimisations on top of this initial design. This problem comes with its own test suite, and admits a very explicit and clear specification, because of its position as part of HOL.
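
    As one illustration of this design space (and not a description of HOL's actual kernel), the sketch below represents bound variables with de Bruijn indices and caches the free-variable set of every term at construction time, so that the common free-variables calculation costs nothing to repeat.

        # One point in the kernel design space (not HOL's actual implementation):
        # lambda terms with de Bruijn indices for bound variables, and the set of
        # free variable names cached at construction so it is never recomputed.
        from dataclasses import dataclass, field

        @dataclass
        class Free:                       # a free variable, referred to by name
            name: str
            frees: frozenset = field(init=False)
            def __post_init__(self): self.frees = frozenset({self.name})

        @dataclass
        class Bound:                      # a bound variable, as a de Bruijn index
            index: int
            frees: frozenset = frozenset()

        @dataclass
        class App:                        # application of one term to another
            rator: object
            rand: object
            frees: frozenset = field(init=False)
            def __post_init__(self): self.frees = self.rator.frees | self.rand.frees

        @dataclass
        class Abs:                        # abstraction; binds de Bruijn index 0 in its body
            body: object
            frees: frozenset = field(init=False)
            def __post_init__(self): self.frees = self.body.frees

        # (\x. x y) with de Bruijn indices: the bound x becomes index 0, y stays free.
        term = Abs(App(Bound(0), Free("y")))
        print(term.frees)                 # frozenset({'y'})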

     

    E-Publishing formats and techniques

    Supervisors: Tom Worthington, Ian Barnes
    Email: Tom.Worthington@tomw.net.au

    In theory XML has provided a universal format for the communication of documents. In practice, a lack of standards and of easy-to-use software has slowed progress. In this topic you will research e-publishing formats and techniques. You will propose an XML-based format for publishing scholarly work and build a prototype system that allows on-line scholarly journals to be created quickly. One approach to be investigated is to adapt the Open Office open source package, interfaced to the Plone open source content management system, to produce an on-line pre-flight system, as discussed in unit Comp 3410. A scholarship for this work is available from the Australian Computer Society, with funding to include presentation of a paper at the Open Publish 2004 conference in Sydney.

    See: Electronic Publishing Options for Academic Material, Tom Worthington, For: Computing 3410 Students, 20 August 2002: http://www.tomw.net.au/2002/epo.html


    Agent based simulation of Capability Models

    Supervisor: Shayne Flint
    See http://se.anu.edu.au/2004_flint_1.html.

    MMTk: Toolkit based GC for multiple runtimes

    Supervisor: Steve Blackburn
    Email: Steve.Blackburn@anu.edu.au
    There are a number of projects under this title, all of which involve performance analysis, system implementation and investigation of programming language runtimes. MMTk is a toolkit for efficiently implementing memory management algorithms. It is the memory manager for the Jikes RVM Java virtual machine and is being ported to other runtimes including OVM (Java) and Rotor (C#). Because MMTk implements numerous algorithms and can run within more than one language runtime, it provides an ideal context for performance analysis of memory management algorithms, both within and among language contexts.

    One project will integrate MMTk into another runtime (a number of candidates exist, including Mercury, nhc98, GNU Smalltalk, Gwydion Dylan, Bigloo Scheme, OCaml, and SmartEiffel). This project follows on from the work of Robin Garner and will most likely have input from him.

    Another project will investigate the performance of a port of MMTk to a C runtime (CMTk). CMTk is needed by any runtime that is implemented in C or C++ (MMTk is written in a dialect of Java). Robin Garner did the initial implementation of CMTk. This project will investigate a range of techniques for improving its performance, including code transformations that translate Java into C code.
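
    To give a flavour of the kind of building block such a toolkit provides (this is an illustrative sketch, not MMTk code), the fragment below shows a bump-pointer allocator: allocation is just rounding the request up to the alignment, advancing a cursor and checking it against a limit.

        # Illustrative only (not MMTk code): a bump-pointer allocator, one of the
        # simplest building blocks a memory-management toolkit provides.
        class BumpPointerSpace:
            ALIGNMENT = 8

            def __init__(self, start: int, size: int):
                self.cursor = start          # next free address
                self.limit = start + size    # end of the space

            def alloc(self, nbytes: int) -> int:
                """Return the address of a new object, or raise if the space is full."""
                aligned = (nbytes + self.ALIGNMENT - 1) & ~(self.ALIGNMENT - 1)
                addr = self.cursor
                if addr + aligned > self.limit:
                    raise MemoryError("space exhausted: a real collector would trigger GC here")
                self.cursor = addr + aligned
                return addr

        space = BumpPointerSpace(start=0x10000, size=1 << 20)
        print(hex(space.alloc(24)), hex(space.alloc(3)))   # 0x10000 0x10018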

    Support for accurate GC in gcc

    Supervisor: Steve Blackburn
    Email: Steve.Blackburn@anu.edu.au
    This is an ambitious project that will investigate extending gcc to produce GC maps (maps which show where pointers exist) and thus allow strongly typed languages (such as Java) to be compiled with gcc and yet be collected with "exact" garbage collectors. Currently such implementations (eg gcj) are restricted to using "conservative" garbage collectors.

    Performance analysis of Java

    Supervisor: Steve Blackburn
    Email: Steve.Blackburn@anu.edu.au
    Java is very important commercially, and yet the runtime systems required to support it and other languages such as C# are relatively immature. This project will involve measurement and analysis of the performance of Java programs, with a view to better understanding where there are opportunities for performance improvement. This work will involve using and modifying MMTk, a memory management toolkit for JikesRVM.

    OpenMP for Cluster Computers

    Supervisor: Alistair Rendell

    OpenMP is a directive-based approach to shared memory parallel programming. This means you put a few directives in your code to partition the work associated with, for example, for loops. You could also view it as a simple interface to thread programming.

    From Japan there is a cluster computing environment called score/scash/omni. Part of this is an OpenMP implementation for clusters. This is OpenMP layered on top of a page-based software distributed memory model. I'm very interested in the performance of this and in the use of data distribution and affinity clauses (how threads and memory get mapped to specific physical locations in the cluster); this relates closely to work in the CC-NUMA project.

    This project would involve getting the latest version of the omni/scash software and installing it on the Bunyip cluster. You would measure performance using, for example, the OpenMP microbenchmarks from the Edinburgh Parallel Computing Centre, then explore performance as a function of thread and data locality for a range of kernel applications. Your fundamental objective will be to determine whether there is a non-trivial class of applications for which OpenMP on a cluster is a viable programming model.


    Interval Arithmetic: The New Floating-Point Paradigm?

    Supervisor: Alistair Rendell

    From using a calculator we are all familiar with the fact that many numbers must, at some level, be truncated and rounded. Thus 2/3 may appear as 0.67, 0.66667, or 0.6666667 etc., depending on the precision of your calculator. More generally, representing the continuum of real numbers by the discrete, finite floating point representations used in all modern computers gives rise to rounding errors, overflows, underflows, etc. Failure to account for these limitations can give rise to spectacular (expensive!!) results, e.g. the failure of some Patriot missiles to hit their targets during the Gulf War or the explosion of the Ariane 5 space rocket (see disasters).

    Interval arithmetic is a fundamental paradigm shift that attempts to address these issues. It defines an "interval" as a continuum of real numbers bounded by its end-points, and performs all arithmetic operations on intervals rather than on discrete floating point numbers. As a consequence, errors are automatically propagated and accounted for. Moreover, the mathematical properties of intervals can be exploited to develop entirely new algorithms.
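
    As a minimal illustration of the idea (not of any vendor's library), the sketch below implements interval addition and multiplication and shows how the width of the result carries the accumulated uncertainty; a production implementation would also round the end-points outwards at each step.

        # A minimal sketch of interval arithmetic: every value is a [lo, hi] pair,
        # and each operation returns an interval guaranteed to contain the true
        # result, so rounding/measurement error is carried along automatically.
        from dataclasses import dataclass

        @dataclass(frozen=True)
        class Interval:
            lo: float
            hi: float

            def __add__(self, other):
                return Interval(self.lo + other.lo, self.hi + other.hi)

            def __mul__(self, other):
                products = (self.lo * other.lo, self.lo * other.hi,
                            self.hi * other.lo, self.hi * other.hi)
                return Interval(min(products), max(products))

            def width(self):
                return self.hi - self.lo

        # 2/3 is not exactly representable: enclose it instead of rounding it.
        two_thirds = Interval(0.666666, 0.666667)
        x = two_thirds * Interval(3.0, 3.0) + Interval(-0.01, 0.01)
        print(x, "width =", x.width())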

    The concept of intervals is actually not new, but was considered by Moore in 1965. What is relatively new is the fact that some hardware vendors are now providing direct support for interval arithmetic within their compilers/hardware.

    The aim of this project will be to investigate the use of interval arithmetic for computational science applications. The initial goal will be to use intervals to study error propagation for some "problematic" algorithms found within the Gaussian computational chemistry code. A second goal will be to investigate the use of intervals for nonlinear global optimisation, e.g. determining the most stable conformation (geometry) of a large molecular system (e.g. a protein).

    For more information on interval arithmetic see the interval home page.


    Automatic Differentiation

    Supervisor: Alistair Rendell

    Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to those elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enables derivatives to be easily updated when the original code is changed.
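
    The sketch below shows this mechanical chain-rule idea in its simplest form, forward-mode automatic differentiation with "dual numbers": every value carries its derivative along with it, so the derivative of a whole computation falls out as a by-product. The project itself would use existing AD tools rather than hand-rolled code like this.

        # Forward-mode automatic differentiation with dual numbers: each value
        # carries (value, derivative), and every elementary operation applies the
        # chain rule, so derivatives of whole programs fall out mechanically.
        import math

        class Dual:
            def __init__(self, val, dot=0.0):
                self.val, self.dot = val, dot
            def __add__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val + other.val, self.dot + other.dot)
            def __mul__(self, other):
                other = other if isinstance(other, Dual) else Dual(other)
                return Dual(self.val * other.val,
                            self.dot * other.val + self.val * other.dot)  # product rule
            def exp(self):
                e = math.exp(self.val)
                return Dual(e, e * self.dot)                              # chain rule

        def f(x):
            return x * x + x.exp()        # f(x) = x^2 + e^x, coded as ordinary operations

        x = Dual(1.0, 1.0)                # seed dx/dx = 1
        y = f(x)
        print(y.val, y.dot)               # f(1) = 1 + e,  f'(1) = 2 + e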

    A number of programs are now available to perform automatic differentiation for code written in a variety of languages. The aim of this project would be to use one of these tools to perform derivative evaluations for some computational science applications. Specifically, in the modelling of systems at the atomic level, derivatives with respect to atomic positions are hugely important, determining things like the frequencies at which molecules vibrate. I and others have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process, and for it to be relatively efficient, would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes.


    Using Machine Learning in a Linux Disk Block Pre-Fetcher

    Supervisor: Eric McCreath

    Machine learning may be defined as learning a description of a concept from a set of training examples. This description may then be used for prediction. This project involves investigating the application of machine learning within an operating system to pre-fetch disk blocks. The central outcome of this research will be an analysis of machine learning approaches to undertaking such a task. This would also involve forming a theoretical model of the domain in question, which would be vital for the analysis. This project builds on work done by an honours student last year.

    Details:

    Operating Systems often employ read-ahead windowing approaches for sequentially accessed files. Basically the next part of a file is pre-fetched into memory even before it is requested. This greatly improves performance of the system.

    Often the same files are required each time we interact with a computer. For example, we may always sit down and start up our mail reader. If the computer could learn this pattern of usage and pre-fetch the blocks for these files, the responsiveness of the system could be improved. This project involves developing such a system and investigating its potential.
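
    As a toy illustration of the learning side of the problem only (the kernel integration is a separate challenge), the sketch below learns first-order "what is opened after what" statistics from a trace of past file accesses and uses them to suggest what to pre-fetch next; the trace and file names are invented.

        # Toy illustration of the learning side only: learn first-order statistics of
        # "which file is opened after which" from a past access trace, then suggest
        # what to pre-fetch after the current access.  Trace and file names are made up.
        from collections import Counter, defaultdict

        trace = ["mailrc", "inbox", "addressbook", "mailrc", "inbox", "todo.txt",
                 "mailrc", "inbox", "addressbook"]

        followers = defaultdict(Counter)
        for current, following in zip(trace, trace[1:]):
            followers[current][following] += 1      # count observed transitions

        def prefetch_candidates(current_file, n=2):
            """Files most often seen immediately after current_file in the trace."""
            return [f for f, _ in followers[current_file].most_common(n)]

        print(prefetch_candidates("inbox"))          # ['addressbook', 'todo.txt']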

    Within machine learning there are a number of issues that make this project both challenging and interesting. These include: the temporal nature of the data, the large number of examples, and how the learnt hypothesis could be applied to recommend which files to pre-fetch. Also within operating systems there are a number of challenges, such as how the system hooks into the kernel and how to measure the performance of the system when it is functioning within the operating system.

    The results of this research have the potential to be published in both Operating Systems and Machine Learning venues. The project would also form a good starting point for graduate study.

    Students will need to be able to reason logically and should have done well in theory subjects (either within Mathematics or Computer Science). Successful completion will require both determination and imagination. Finally, students considering embarking on this project must have good programming skills and will need to gain an understanding of the Linux kernel.


    Internet futures projects

    Supervisor: Markus Buchhorn
    Email: Markus Buchhorn
    Phone: 6125 8810

    The Internet Futures group of the ANU is working with high-speed networks, advanced network services, visualisation and telecollaboration technologies, and large scale data facilities, to support a diverse range of user communities. Many of its activities have a national focus, e.g. through the GrangeNet project.

    In the large-scale-data area, the users include Physics, Astronomy, BioInformatics and Cultural Archives (e.g. audio, spoken and music, or video). Archives can easily be many Terabytes (thousands of Gigabytes) in size, need to be delivered to users on networks ranging from multi-Gigabit down to slow dial-up modem speeds, and may need to interface to large-scale computing systems.

    The IF group has several potential projects suitable for honours work. These include:

    • development of new Bio-Mirror (www.bio-mirror.net) archive services, to greatly reduce their bandwidth needs, while increasing their performance and usability to users. This project would have international impact, as a good solution would be adopted by many biomirror sites. This project would build on some initial work in 2003 (thesis available on request).
    • development of a specialised workflow infrastructure for a large scale data grid environment, focussing on oceanographic data but with broader application. This would include controlled complex-data ingestion procedures, as well as the more common data output/processing chains.

    Technologies in these fields include grid toolkits (Globus for example), metadata standards, various network protocols and performance issues, large scale databases and database/directory access methods, and web services and SOAP. Programming skills in C, C++, Java, Perl, and Python are all useful.

    You would get to work with a range of users across the ANU, on real-world problems. You'd also get access to the APAC Mass Store facility (a 1.2 Petabyte (1200 Terabyte) tape robot) and associated computing hardware including possibly the APAC National Facility supercomputer, the ANU Supercomputer Facility's Visualisation Laboratory (VizLab), and the Internet Futures' Access Grid (AG) (telecollaboration) nodes in various locations on campus.

    You can go as deep into these areas as you like (although we have some ideas), and maybe even come back for a PhD if you're really keen :-)


    TeleCollaboration (videoconferencing on steroids) projects (Internet Futures)


    Contact: Markus.Buchhorn@anu.edu.au, CSIT-N244, 6125-8810

    TeleCollaboration can be seen as very high-end videoconferencing, with many additional features. Some of these features require the user's space to be intelligent or flexible, others require advanced network or application services. Any kind of collaboration, for research and for education can be supported. This could include music teaching (masterclasses), architecture, history or medical training with visualisation or even virtual reality interfaces, and large-scale data browsers and instrument consoles (e.g. in Astronomy or TeleMicroscopy).

    The Internet Futures' telecollaboration activities revolve mainly around Access Grid (AG) nodes. There are 3 full-size nodes being built at the ANU (including one in the CS&IT building), and several additional nodes are being discussed. There are also plans for smaller-scale (even desktop) nodes, and ports to a variety of systems and other technology spaces (e.g. the Wedge virtual reality/visualisation facility). Around Australia 15 nodes are currently in planning or construction, and there are over 130 nodes around the world at the moment. ANU users will include Astronomers at Mt Stromlo Observatory (20km west of Canberra), and the new ANU Medical School (on campus).

    The underlying architecture of these systems is open and flexible, and the ANU has the potential to become a worldwide focus of AG development, with strong international collaboration.

    The IF group has several potential projects suitable for honours work. These include:

    • high-quality Audio/Video capture and delivery e.g.
      • tiled (multi-camera) video capture for higher resolutions,
      • broadcast quality video capture,
      • low latency Audio/Video codecs
    • intelligent A/V capture (using the steerable cameras, and possibly audio-capture interfaces). Some applications, e.g. music teaching, require very specific views of students as they play music. For general workshop/meeting interaction, the steerable cameras could be used to track a speaker as they wander around the room, or to focus on specific audience members when they ask questions.
    • integration of other inputs, e.g. laptops and dedicated applications (which aren't multicast aware, or are part of the "standard" AG toolkit), as well as intelligent/electronic whiteboards, wireless PDA's, etc.
    • cluster systems, for codec performance and robustness
    • layered encodings to support differential quality audio/video (e.g. to support high-bandwidth and low-bandwidth clients simultaneously).
    • integration with H323/SIP (the two major standard videoconferencing-over-IP technologies) and streaming systems (for one-way delivery to non-AG clients). This will involve some interesting design issues, e.g. H.323 is designed for essentially one-view-at-a-time, where the AG is designed for multi-site viewing in parallel.
    • User Interface design; the AG user interface is pretty ordinary and unfriendly to new users. There is a lot of untapped information in AG streams that could be used to greatly enhance the user experience. These include audio localisation, visual feedback of audio activity, and audio-video synchronisation ("lip-sync").
    • physical security of AG spaces; protecting the (expensive) hardware in AG spaces when it is not being used. Given multiple steerable cameras, sensitive microphones, and possibly other sensors, it should be possible to create a space that secures itself (and screams if attacked :-) ).
    • and others...

    Technologies in these fields include IP multicast, hardware interfaces, various network protocols and performance issues, user-interface design, and audio/video codecs. Programming skills in C, C++, Java, Perl, and Python are all useful.

    You can go as deep into these areas as you like (although we have some ideas), and maybe even come back for a PhD if you're really keen :-) Possibly more than one area could be looked at during the project.

     

    Interactive Visualisations of Chaos

    Supervisor: Henry Gardner

    This project will investigate gestural interfaces for the interactive investigation of chaotic systems. It will start with a review of existing systems, with a particular focus on the visualisation of fluid flows in immersive virtual reality theatres. It will examine two existing software applications in the Wedge theatre and will use these, and the literature review, to develop a taxonomy of interaction. Key parts of the taxonomy will then be implemented and subjected to a formal usability test.

    Chief Supervisor: Henry Gardner, Computer Science, FEIT.
    Supervisor: Prof Robert Dewar, Theoretical Physics, Research School of Physical Sciences and Engineering.


    Using Threading Techniques to speed up SMP Computer Simulation

    Supervisor: Peter Strazdins
    See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/ThreadedSulima.html

    Efficiently Enhancing the Timing Accuracy of Computer Simulators

    Supervisor: Peter Strazdins
    See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/AccurateSulima.html

    Performance Modelling of Parallel Programs on SMP Clusters

    Supervisor: Peter Strazdins
    See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/SMPClusterComm.html

    A Framework for Investigating Job Scheduling on Linux Clusters

    Supervisor: Peter Strazdins
    See http://cs.anu.edu.au/~Peter.Strazdins/postgrad/LinuxSchedFramework.html

    Avoiding Redundancy During Unit Propagation Look-Ahead for Efficient SAT Solver

    Supervisor: Anbulagan
    Email: anbulagan@nicta.com.au
    Phone: 6125 8632
    Suitability: This project would suit a student keen to work in an area involving logic, constraint, and search in AI.

    Prerequisites: Knowledge of and interest in logic, efficient programming using "C" in a UNIX environment, and search in AI.

    Project Description:

    The SAT problem is a special case of the constraint satisfaction problem (CSP) where the variables take the Boolean value 0 or 1. CSP itself plays an important role in artificial intelligence (AI). The study of the SAT problem is essential because it is at the heart of many AI problems, including automatic deduction, VLSI circuit control, logic programming, learning, planning and scheduling.

    In 1962, Davis, Logemann and Loveland presented their algorithm, called the DPLL procedure, to solve the SAT problem. Based on this procedure, many complete systems have been created, for example, GRASP, Posit, relsat, Satz, kcnfs, zchaff, and 2clseq. DPLL essentially enumerates all possible solutions to a given SAT problem by setting up a binary search tree and proceeding until it either finds a satisfying truth assignment or concludes that no such assignment exists. It is well known that the search tree size of a SAT problem is generally an exponential function of the problem size, and that the branching variable selected by a branching rule at a node is crucial for determining the size of the sub-tree rooted at that node.
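
    For readers unfamiliar with the procedure, here is a compact sketch of DPLL with unit propagation over clauses represented as lists of integers; the branching rule here simply picks the first unassigned variable, which is exactly the step that UPLA and DVF replace with far more sophisticated reasoning.

        # A compact sketch of the DPLL procedure with unit propagation.  Clauses are
        # lists of non-zero integers (negative = negated variable).  The branching
        # heuristic here is deliberately naive; UPLA/DVF replace exactly this step.
        def unit_propagate(clauses, assignment):
            changed = True
            while changed:
                changed = False
                for clause in clauses:
                    if any(l in assignment for l in clause):
                        continue                      # clause already satisfied
                    unassigned = [l for l in clause
                                  if l not in assignment and -l not in assignment]
                    if not unassigned:
                        return None                   # conflict: clause is falsified
                    if len(unassigned) == 1:          # unit clause forces a literal
                        assignment = assignment | {unassigned[0]}
                        changed = True
            return assignment

        def dpll(clauses, assignment=frozenset()):
            assignment = unit_propagate(clauses, assignment)
            if assignment is None:
                return None                           # unsatisfiable under this branch
            variables = {abs(l) for c in clauses for l in c}
            free = [v for v in variables if v not in assignment and -v not in assignment]
            if not free:
                return assignment                     # all variables decided: satisfiable
            v = free[0]                               # naive branching rule
            return dpll(clauses, assignment | {v}) or dpll(clauses, assignment | {-v})

        print(dpll([[1, -2], [2, 3], [-1, -3]]))      # e.g. a satisfying assignment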

    One of the main improvements to DPLL has been smart branch selection heuristics through the use of unit propagation look-ahead (UPLA) [1]. Recently, another sophisticated branching rule, called Dynamic Variable Filtering (DVF), was proposed to do more reasoning to narrow the search tree of hard random 3-SAT problems [2]. A DVF-based solver outperforms a UPLA-based solver by an order of magnitude in terms of the number of branching nodes in the search tree when solving hard random 3-SAT problems.

    The objectives of this research project are:

    • To improve the performance of the ssc355 SAT solver, which integrates the DVF heuristic, by avoiding the redundancy that arises during the UPLA process.
    • To integrate backjumping techniques into the ssc355 SAT solver.

    References:

    [1] Li, C. M., and Anbulagan. Heuristics Based on Unit Propagation for Satisfiability Problems. In Proceedings of 15th International Joint Conference on Artificial Intelligence, 1997, Nagoya, Japan, pp. 366-371.

    [2] Anbulagan, Thornton, J., and Sattar, A., Dynamic Variable Filtering for Hard Random 3-Sat Problems. LNAI 2903, Proceedings of 16th Australian Conference on AI, December 2003, Perth, Australia, pp. 100-111.


    Proof Optimization

    Supervisor: Malcolm Newey
    Email: Malcolm.Newey@cs.anu.edu.au
    When a theorem is proved using an interactive prover (such as the HOL system) a proof object is created that can be replayed at a later date, for a different audience or, perhaps, in a different system. It will typically be one of a multitude of equivalent proofs for that same theorem, but may not be optimal in terms of attributes such as time for replay, size, portability, fragility etc.

    Since proofs can be mechanically transformed, it would be useful to have a tool that would optimize proofs of programs or of library theorems so that libraries would build faster or that proof carrying code would download faster.

    The direction of a project in this area could focus on a simple tool that manipulated proofs in a limited domain or it could be more mathematical and study higher order logic proof transformations in the presence of decision procedures.
