The Australian National University

Honours topics

Half of your time as an honours student is spent working on a project. But first you have to find a project topic.

The "official" reference for projects proposed by potential supervisors is the CECS projects database.

Projects are available there for all levels of research, including short projects, summer scholarship projects, Honours projects, Masters and PhD projects. All potential research students at any level are urged to browse it.

If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.

You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, feel free to discuss it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.

Former Project Topics

For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.
  • 2009 Proposed Project Topics
  • 2008 Proposed Project Topics
  • 2007 Proposed Project Topics
  • 2006 Proposed Project Topics
  • 2005 Proposed Project Topics
  • 2004 Proposed Project Topics
  • 2003 Proposed Project Topics
  • 2002 Proposed Project Topics
  • 2001 Proposed Project Topics


    2009 Honours project proposals




    Web Service Management System (WSMS)

    Service Oriented Architecture (SOA) is an emerging computing paradigm in which infrastructure, platforms and software are transformed into services that can be consumed by anyone, anywhere, anytime. The trend towards SOA has become noticeable in recent years: an increasing number of infrastructure and software services have become available on the Web, such as Amazon EC2 and Google Docs. The WSMS project of CSIRO aims at building an integrated framework for managing Web services. In this framework, Web services are first-class objects that can be queried, composed, trusted, and changed. This project will explore research issues pertaining to the whole Web service lifecycle. We expect the outcome to become a driving technology for the future service paradigm.

    In WSMS, we offer the following opportunities as Honours projects:


    Web Services Query Language

    Web services have been the main force driving service-oriented computing, which promises not only a new paradigm for B2B collaboration, but also an opportunity for the creation of value-added services and new businesses. We foresee that web services will become as popular as today's web pages. However, how to manage and apply the large number of web services in end-users' daily lives remains a big challenge. The CSIRO ICT Centre is conducting a research project (WSMS: Web Services Management System) to address this challenge. One of the research efforts in this project is to allow end-users/applications to express their requirements in a query language so that WSMS can resolve the query using the underlying web services.

    Goals for this project: In this project, we will design and implement a query language for automatic web services discovery and execution.

    Requirements: Programming skills in Java or C/C++ are essential; knowledge of and experience with web services and compilers are preferred.

    Student's gain: The student will learn how to design and implement a computer language from scratch. He/she will also gain experience and skills in web services, ontologies and XML. The outcomes of this project may be published at an international conference or in a journal, which would increase the student's research credibility.

    Supervisor: Dr. Shiping Chen (shiping DOT chen AT csiro DOT au) and Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)


    Evolutionary Algorithms for Optimal Composition of Web services

    Service composition is an important part of the life cycle of services innovation research, as is evident from the increasing attention Web service composition has attracted recently. When the quality of Web services (QoWS) is taken into account, the objective of Web service composition becomes finding an optimal composition, which is an NP-hard problem. Different strategies have been proposed to make service composition easier, for example Genetic Algorithms (GAs) and Multi-objective Genetic Algorithms, Integer Programming (IP) and Mixed Integer Programming (MIP), Constraint Programming, Particle Swarm Optimisation, Ant Colony Optimisation, and so on. This study aims to investigate different evolutionary algorithms and related techniques for optimal composition of Web services.
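    To make the optimisation setting concrete, below is a minimal sketch (illustrative only; the class and method names are my own assumptions, not part of the project) of how a genetic algorithm might encode a composition: each gene selects one candidate service for an abstract task, and the fitness function aggregates the QoS scores of the selected services. Selection and the generational loop are omitted.

        import java.util.Random;

        // Illustrative GA encoding for QoS-aware composition (hypothetical names):
        // chromosome[i] holds the index of the candidate service chosen for abstract
        // task i; fitness sums normalised QoS scores of the selected services.
        public class CompositionGA {
            private final double[][] qos;      // qos[task][candidate] = normalised quality score
            private final Random rng = new Random();

            public CompositionGA(double[][] qos) { this.qos = qos; }

            // Fitness of a composition: higher aggregated QoS is better.
            public double fitness(int[] chromosome) {
                double total = 0;
                for (int task = 0; task < chromosome.length; task++)
                    total += qos[task][chromosome[task]];
                return total;
            }

            // One-point crossover of two parent compositions, plus one random mutation.
            public int[] offspring(int[] a, int[] b) {
                int tasks = a.length, cut = rng.nextInt(tasks);
                int[] child = new int[tasks];
                for (int i = 0; i < tasks; i++) child[i] = (i < cut) ? a[i] : b[i];
                child[rng.nextInt(tasks)] = rng.nextInt(qos[0].length);   // mutate one gene
                return child;
            }
        }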

    Goals for this project: This project will implement a number of different evolutionary algorithms and compare them thoroughly on an empirical dataset.

    Requirements: Students need to be proficient with Java programming.

    Student's gain: The student will gain a deep insight into evolutionary algorithms and their applications in the field of Web services. The work will be carried out in a world-leading research group (CSIRO) and will be conducted within leading academic research infrastructure. The results of this project will include a proof-of-concept prototype and an international journal publication.

    Supervisor: Dr. Li Li (lily.li@csiro.au) and Dr. Dongxi Liu (dongxi.liu@csiro.au)


    Context Sensitive Web Service Composition

    Context is recognised as important and has been widely studied in the field of Mobile and Pervasive Computing. It needs to be explicitly described and managed in Web service composition so that different participating Web services can be connected correctly. As the context of a Web service changes while the state of a predefined environment evolves, hard-coding context into the application is insufficient. We argue that the following supports are necessary to develop a fully workable service composition engine: (1) a service composition infrastructure/middleware (to achieve context-awareness); (2) a context-sensitive composition model and algorithms.

    Goals for this project: This project will first develop a model and some algorithms. The proposed algorithms will be evaluated in a real Web service environment (currently, this environment is running within the CSIRO Web service Theme).

    Requirements: Students need to be proficient with Java programming. General knowledge of Service-oriented Computing is required.

    Student's gain: The student will gain a deep insight into context-sensitive service composition and its applications in the field of Web services. The work will be carried out in a world-leading research group (CSIRO) and will be conducted within leading academic research infrastructure. The results of this project will include a working prototype and publications to be submitted to international conferences or journals.

    Supervisor: Dr. Dongxi Liu (dongxi.liu@csiro.au) and Dr. Li Li (lily.li@csiro.au)


    Reputation-based Trust Management in Composite Web Services

    With the introduction of Web Services and the Semantic Web, the World Wide Web has evolved into a collection of hundreds of thousands of services deployed with the aim of providing a wide variety of services to the general public. Examples of such services range from general information services like Wikipedia to specific business services like Amazon. As services proliferate on the Web, several services may provide the same functionality to a service consumer. In these situations, trust is one of the essential criteria a consumer can use in selecting the right set of services. Therefore, there is a need for a robust trust management system that utilizes the perception-based knowledge of the communities on the Web. In order to develop such a system, we are investigating a mechanism for providing a robust trust model for web services. Our model is based on the reputation of services as rated by service consumers in Web communities. This project aims to investigate a mechanism for managing reputation-based trust in composite services, i.e. services composed of a set of services.
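    As a purely illustrative starting point (not the project's actual trust model; the names below are invented), the simplest way to score a composite service is to aggregate the reputation ratings of its component services, for example by a weighted average:

        import java.util.List;

        // Illustrative sketch: trust of a composite service as a weighted average of
        // the reputation ratings of its component services.
        public class CompositeTrust {
            public record Component(double reputation, double weight) {}

            // reputation in [0,1] as rated by consumers; weight reflects the component's
            // importance in the composition (e.g. how often it is invoked).
            public static double compositeTrust(List<Component> components) {
                double weighted = 0, totalWeight = 0;
                for (Component c : components) {
                    weighted += c.reputation() * c.weight();
                    totalWeight += c.weight();
                }
                return totalWeight == 0 ? 0 : weighted / totalWeight;
            }
        }

    The research challenge is everything this sketch ignores: how ratings are collected and weighted, how unfair or malicious ratings are handled, and how trust propagates through the structure of the composition.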

    Goals for this project: In this project, we will design, implement and deploy a distributed reputation-based trust management model for composite services.

    Requirements: The student should have good knowledge and experience with Java programming. Knowledge of Web Services is desirable but not mandatory.

    Student's gain: The project provides an opportunity for the student to contribute to an emerging area in computer science: Web Services and Service Computing. The work will be carried out in a research group of CSIRO and the student will have an opportunity to gain insight into R&D in both industry and academia. The project outcome will include a working prototype that can be demonstrated in an international conference and possibly a publication in an academic journal.

    Supervisor: Dr. Wanita Sherchan (wanita DOT sherchan AT csiro DOT au) and Dr. Surya Nepal (surya DOT nepal AT csiro DOT au)


    Change Management for Composite Web Services

    This project aims to develop a bottom-up change management architecture for composite services and to design a change management model that can capture triggering changes in outsourced Web services, propagate the change to the composite service, and generate reactive changes at the service schema level. The work will include a taxonomy or specification of bottom-up changes in an SOE (Service Oriented Enterprise) context, covering both triggering changes and reactive changes. For each type of change, a list of functional and non-functional changes will be defined. An appropriate model will be proposed to capture these changes, and mapping rules will be defined to map between different levels of changes. Automatic change management algorithms will be proposed to capture and manage the bottom-up changes. Finally, an evaluation will be carried out to assess the bottom-up change management system.

    Goals for this project: The goal is to design a bottom-up change management architecture for composite services.

    Requirements: Students will need to be proficient systems programmers with good C++ or Java skills.

    Student's gain: The student will work on an interesting research problem with potential impact on Web services. The work will be carried out in a research group of CSIRO. The student will gain insight into R&D in both industry and academia. The results will include a working prototype that can be demonstrated at an international conference and a publication in an academic journal.

    Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au)


    Web Service Optimization

    When selecting services on the Web, a user is usually faced with too many choices. For instance, both Yahoo and Google provide map services, characterized by different features. The user needs to pick the one that best satisfies her needs. This project aims at developing a system for helping users make the best selection among thousands of Web services. We will apply techniques from decision making and artificial intelligence to support our solution.

    Goals for this project: The goal is to design a Web based service selection system.

    Requirements: Students need to be proficient with C++ or Java programming.

    Student's gain: The student will work on an interesting research problem with potential impact on Web services. The work will be carried out in a research group of CSIRO. The student will gain insight into R&D in both industry and academia. The results will include a working prototype that can be demonstrated at an international conference and a publication in an academic journal.

    Supervisor: Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)


    Efficient network monitoring and routing via FPGAs

    Research area: software engineering, networking, formal languages, HW/SW design

    Difficulty: high

    Supervisor: Bob Edwards (DCS), Dr. Andreas Bauer (RSISE & NICTA)

    Background:

    The network code of modern operating systems is very involved, as it allows users to configure routing in detail, either to protect themselves from threats on the Internet, or to set up their own networks and control which traffic goes to which machine or subnet.

    Consequently, high-traffic networks can dramatically increase the system load on machines that are responsible for steering the traffic or blocking unwanted requests, up to the point where routing is no longer possible (e.g., under denial-of-service attacks).

    This project aims at providing an efficient routing mechanism, directly implementable on FPGAs, i.e., in hardware, to free the operating system, and hence the main CPU, from this time-consuming task as much as possible. The routing policy is to be specified in terms of some formalism (e.g., regular expressions, finite automata), which should then be mapped onto an FPGA for execution.
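    As a toy illustration of the "policy as automaton" idea (the states, packet classes and accept condition below are invented for exposition and are not part of the project), a policy can be run as a deterministic finite automaton whose transition table is exactly the kind of structure that could later be mapped onto FPGA logic:

        // Toy sketch: a routing/filtering policy expressed as a DFA over packet classes.
        // On an FPGA the transition table would become combinational logic plus a state register.
        public class PolicyAutomaton {
            // Packet classes observed on the wire, encoded as symbol indices.
            static final int SYN = 0, ACK = 1, DATA = 2;

            // transition[state][symbol] -> next state; state 2 is a "drop" sink state.
            static final int[][] TRANSITION = {
                {1, 2, 2},   // state 0: a flow must start with SYN
                {2, 1, 1},   // state 1: after SYN, ACK/DATA are allowed, another SYN is not
                {2, 2, 2},   // state 2: violation, stay here
            };

            // Returns true if the observed packet sequence conforms to the policy.
            static boolean accept(int[] packets) {
                int state = 0;
                for (int p : packets) state = TRANSITION[state][p];
                return state != 2;
            }
        }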

    The main goals of this project are

    • to analyse common routing policies and to examine which formalisms are suitable for capturing these,
    • to devise algorithms for translating formal policies to FPGA programs,
    • to find trade-offs between different types of translations resulting not only in differently large programs, but also in different means for their implementation,
    • to implement and experiment with some of the found solutions.

    The candidate to work on this exciting project is, ideally, familiar with a Unix-type operating system and the netfilter framework for specifying routing policies. This project will also require basic knowledge of formal languages and automata theory, such as gained by having taken COMP3360/6363 (cf. also [1]).

    References:

    1. John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman.
      Introduction to Automata Theory, Languages, and Computation.
      Addison Wesley, 2000.
      [TOC]

    Algorithms for Efficient Manipulation of Boolean formulas

    Supervisor: Dr. Jussi Rintanen, DPO Group (ANU) & Managing Complexity (NICTA)

    After the success of Binary Decision Diagrams (BDDs) in computer-aided verification and many other areas, there has been a lot of interest in identifying other normal forms of propositional formulas (Boolean functions) for which efficient (polynomial-time) algorithms exist for the basic operations required in these applications. Research in the last 8 years has led to the identification of Decomposable Negation Normal Form (DNNF), which is more compact than BDDs but allows a slightly narrower set of efficient (polynomial-time) operations. The general conclusion from these works is that more restrictive normal forms lead to larger representations but more efficient reasoning algorithms.

    In the standard framework of Boolean function representations based on the connectives for disjunction, conjunction and negation, the least restrictive form is general NNF (Negation Normal Form). This normal form always leads to representations that are at least as succinct as, e.g., BDDs or DNNF.

    A practically interesting problem is the identification of more efficient, but not necessarily polynomial-time, algorithms for the manipulation of general NNF formulas. In comparison to BDDs or DNNF, NNF is sometimes exponentially more succinct. The seemingly efficient operations on BDDs or DNNF could simply be a result of these normal forms being less succinct: a non-polynomial operation on NNF could conceivably be, in absolute terms, more efficient than a polynomial operation on the exponentially bigger BDD or DNNF.

    This project is an investigation of operations on NNF that are in absolute terms equally or more efficient than the operations on the equivalent BDD or DNNF. We know that such operations exist, because one way of implementing these operations is to first translate the functions to BDD or DNNF.

    The goal of the work is the development of practically useful alternatives to BDD and DNNF representations that are based on the more succinct NNF.
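    For concreteness, here is a minimal sketch of an NNF representation (my own illustration, with invented class names): internal nodes are conjunctions or disjunctions, negation appears only on literals, and one operation that is always cheap on NNF is evaluating the formula under a complete assignment. The hard operations the project targets are the ones that are polynomial on BDD/DNNF but not obviously so on NNF.

        // Minimal NNF sketch: leaves are literals, internal nodes are AND/OR.
        // Negation appears only on literals, as NNF requires.
        public abstract class Nnf {
            public abstract boolean eval(boolean[] assignment);

            public static final class Literal extends Nnf {
                final int variable; final boolean negated;
                Literal(int variable, boolean negated) { this.variable = variable; this.negated = negated; }
                public boolean eval(boolean[] a) { return negated ? !a[variable] : a[variable]; }
            }

            public static final class And extends Nnf {
                final Nnf left, right;
                And(Nnf left, Nnf right) { this.left = left; this.right = right; }
                public boolean eval(boolean[] a) { return left.eval(a) && right.eval(a); }
            }

            public static final class Or extends Nnf {
                final Nnf left, right;
                Or(Nnf left, Nnf right) { this.left = left; this.right = right; }
                public boolean eval(boolean[] a) { return left.eval(a) || right.eval(a); }
            }
        }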

    Contact: Dr. Jussi Rintanen


    Extending GNU regular expressions matching with trace patterns

    Research area: software engineering, formal languages, algorithms

    Difficulty: medium

    Supervisor: Dr. Andreas Bauer (RSISE & NICTA)

    Related courses: COMP3360/6363

    Background:

    Regular expressions in Unix and many other systems allow the matching of strings against user-defined expressions. In [1], trace patterns were introduced, which are strings where some information is missing (i.e., strings which contain variables as place-holders), and it is shown that the problem of how to match such incomplete strings against regular expressions (given, e.g., in terms of a finite automaton) is decidable.

    The GNU C-library [3] provides its own implementation of expression matching, which is nowadays a standard in Linux and many Unix-like systems.

    The main goals of this project are to

    • devise efficient algorithms from [1] for matching trace patterns against expressions,
    • implement them first in terms of a stand-alone system library,
    • and to implement these algorithms also in the GNU C-library, as an extension to the standard functionality, thus making them widely available.

    The final implementation should also be tested and its performance evaluated empirically.

    This project will require familiarity with the C programming language, and basic knowledge of formal languages and automata theory, such as gained by having taken COMP3360/6363 (cf. also [2]).

    References:

    1. Franz Baader, Andreas Bauer, and Alwen Tiu.
      Matching linear and non-linear trace patterns with regular policies.
      In M. Marin, editor, Proceedings of the 22nd International Workshop on Unification (UNIF), RISC-Linz Report Series, no. 08-11, pages 16-24. Research Institute for Symbolic Computation, Johannes Kepler University, Linz, July 2008.
      [pdf] (extended version to appear in proceedings of LATA'09)

    2. John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman.
      Introduction to Automata Theory, Languages, and Computation.
      Addison Wesley, 2000.
      [TOC]

    3. GNU C-library, regular expressions

    Systems monitoring via runtime verification

    Research area: software engineering, formal languages

    Difficulty: medium-high

    Supervisor: Dr. Andreas Bauer (RSISE & NICTA)

    Related courses: COMP3360/6363

    Background:

    Monitoring systems is useful at design time as a means of testing an actual implementation before it ships to customers. However, it can also be useful in the final product, at runtime, to make sure that the system does not enter an invalid state that would lead to failure while it runs.

    In [1] an approach to monitoring based on temporal logic was devised, and a so-called monitor generator was implemented in terms of [2], which takes as input a user-defined temporal logic specification and outputs a finite state machine that processes a system's actions and decides whether these satisfy or violate the specification (or neither).
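    To make the monitoring loop concrete, here is a schematic sketch of a three-valued monitor (my own illustration; actual monitors are generated automatically from the temporal logic specification by the tools in [2]): it consumes a system's actions one at a time and reports whether the specification is already satisfied, already violated, or neither.

        // Schematic runtime monitor: a finite state machine over observed actions that
        // reports TRUE (specification satisfied), FALSE (violated) or INCONCLUSIVE.
        // The transition table is hand-written here for illustration; in practice it
        // would be generated from the temporal logic specification.
        public class Monitor {
            enum Verdict { TRUE, FALSE, INCONCLUSIVE }

            private final int[][] transition;     // transition[state][action] -> next state
            private final Verdict[] verdictOf;    // verdict attached to each state
            private int state = 0;

            Monitor(int[][] transition, Verdict[] verdictOf) {
                this.transition = transition;
                this.verdictOf = verdictOf;
            }

            // Feed one observed action (mapped to a proposition index) and get the current verdict.
            Verdict step(int action) {
                state = transition[state][action];
                return verdictOf[state];
            }
        }

    The integration questions the project asks are precisely how such a step() call gets fed: via logging frameworks, dedicated monitoring interfaces, or debugging tools.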

    The goal of this project is to

    • determine which means of integrating the monitors generated from [2] with real world programs exist (e.g., via standard logging mechanisms such as log4j, via dedicated monitoring classes/interfaces of the Java language, via debugging tools such as valgrind or dtrace, etc.),
    • determine a suitable mapping of a program's actions to the propositions occurring in a temporal logic specification,
    • devise a template-based code generation scheme for the monitors generated from [2] for a popular programming language, such as Java, C++, Ocaml, etc.,
    • implement this scheme, and undertake a small case study demonstrating the feasibility of this approach.

    This project is likely to involve some form of low-level programming, wrapping function calls, and other sorts of fun stuff, and will therefore require deep familiarity with a programming language (I am not picky about which one, really), and basic knowledge of formal languages and automata theory, such as gained by having taken COMP3360/6363 (cf. also [3]). Ideally, the student also has a background in temporal logic and/or model checking.

    References:

    1. Andreas Bauer, Martin Leucker, and Christian Schallhart.
      The good, the bad, and the ugly - but how ugly is ugly?
      Technical Report TUM-I0803, Institut für Informatik, Technische Universität München, February 2008.
      [pdf] [postscript]

    2. LTL3 tools

    3. John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman.
      Introduction to Automata Theory, Languages, and Computation.
      Addison Wesley, 2000.
      [TOC]

    Title: Computing the Smallest Winning Coalition in a Multi-Agent Game

    • Project type: type and extent of work can be tailored to an SE research project, a CS honours project, an MComp (research-oriented) project, or a PhB research project.
    • Orientation: research and implementation
    • Research Area(s): Artificial Intelligence, Game Theory, Algorithms
    • Related Courses: COMP3620/6320 (AI), COMP3600/6466 (Alg), COMP4640/6460 (RL/Pl), COMP8620 (Adv. AI)
    • Supervisor: Dr Sylvie Thiebaux, School of Computer Science, ANU, Phone: (02) 6267 6311

    Description

    This topic is proposed as part of the French-Australian Science and Technology cooperation project "Planning Approaches to Software Verification", whose partners are INRIA, ANU, and UWA.

    Games are a computational model used in computer science, economics, biology, and social sciences to name a few. The motivation for the present project arises from the area of multi-agent planning in artificial intelligence, but the results are applicable to other areas.

    Given a description of the possible states of the game, of the states that are considered as winning positions, and of the actions that agents can take in each state, the basic problem is to find a winning strategy for a given agent coalition (a set of agents). Such a strategy specifies the actions that the coalition agents must take in each state, and guarantees that the coalition will end up in a winning state, regardless of the actions taken by agents outside the coalition. In the project, we focus on a slightly different problem, where the coalition is not given. Instead, we want to find the smallest coalition which has a winning strategy. This is a computationally hard problem, so we will be building efficient algorithms which attempt to perform well on average. Such algorithms will use symbolic representations of sets of states based on binary decision diagrams, and will automatically derive heuristics to guide the search from the description of the input game.
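    As a very rough sketch of the overall search strategy (my simplification, with placeholder names, ignoring the symbolic BDD machinery and the derived heuristics that make the approach practical), one can enumerate coalitions in order of increasing size and return the first one that has a winning strategy:

        import java.util.ArrayList;
        import java.util.List;

        // Naive sketch: enumerate coalitions by increasing size and return the first
        // one with a winning strategy. hasWinningStrategy is a placeholder for the
        // (symbolic) strategy computation the project would actually implement.
        public class SmallestCoalition {
            interface Game { boolean hasWinningStrategy(List<Integer> coalition); }

            static List<Integer> smallest(Game game, int numAgents) {
                for (int size = 1; size <= numAgents; size++) {
                    List<Integer> found = search(game, new ArrayList<>(), 0, size, numAgents);
                    if (found != null) return found;
                }
                return null; // no coalition wins
            }

            // Enumerate all coalitions of exactly 'size' agents chosen from {next..numAgents-1}.
            private static List<Integer> search(Game game, List<Integer> partial, int next, int size, int numAgents) {
                if (partial.size() == size)
                    return game.hasWinningStrategy(partial) ? new ArrayList<>(partial) : null;
                for (int agent = next; agent < numAgents; agent++) {
                    partial.add(agent);
                    List<Integer> found = search(game, partial, agent + 1, size, numAgents);
                    partial.remove(partial.size() - 1);
                    if (found != null) return found;
                }
                return null;
            }
        }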

    The student will start from a basic algorithm idea, which (s)he will flesh out, implement and extend to more complex cases such as: 1) computing a coalition of minimal cost, given a specification of the cost of collaboration, 2) partially observable games (agent decisions are based on partial information about the current state of the game), 3) more complex objectives than just reaching a winning state.

    References

    • T. Brihaye, N. Markey, M. Ghannem, L. Rieg: Good Friends are Hard to Find! Proc. TIME pages 32-40, 2008. [don't worry, there are a couple of things that are obscure to me as well, but it's nice to know what the complexity is]
    • M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann, 2004.
    • R.M. Jensen, M.M. Veloso and M.H. Bowling, OBDD-Based Optimistic and Strong Cyclic Adversarial Planning, Proc. 6th European Conference on Planning (ECP-01), 2001.

    Title: Distributed Planning

    • Project type: type and extent of work can be tailored to an SE research project, a CS honours project, or a PhB research project.
    • Orientation: research and implementation
    • Research Area(s): Artificial Intelligence, Algorithms
    • Related Courses: COMP3620/6320 (AI), COMP3600/6466 (Alg), COMP8620 (Adv. AI)
    • Supervisor: Dr Sylvie Thiebaux, School of Computer Science, ANU, Phone: (02) 6267 6311

    Description

    This topic is proposed as part of a collaboration with CNRS in the framework of the ROSACE project funded by the French Scientific and Technological Research Network for Aeronautics and Space. ROSACE studies the design, specification, implementation and deployment of a fleet of mobile autonomous cooperating robots with well-established properties particularly in terms of safety, self-healability, ability to achieve a set of missions and self-adaptation in a dynamic environment.

    In the ROSACE project, a high-level mission is expanded into a set of goals which are allocated to the robots. The robots must act cooperatively to achieve these goals, and this might require explicitly delegating subgoals to other robots. Cooperatively synthesizing plans of actions is still a largely open problem. The student will design and implement a cooperative and distributed version of a standard planning algorithm. The algorithm will have to guarantee a number of desirable properties (termination, correctness, etc). This will require becoming familiar with automated planning, constraint satisfaction and distributed search methods.

    This project is best suited to a student with interest and background in artificial intelligence, distributed algorithms, and with good programming skills.

    References

    • C. Bessiere, A. Maestre, I. Brito, and P. Meseguer. Asynchronous Backtracking without Adding Links: A New Member in the ABT Family, Artificial Intelligence, 161:7-24, 2005.
    • M. Ghallab, D. Nau, and P. Traverso. Automated Planning: Theory and Practice. Morgan Kaufmann, 2004.
    • D. Pellier and H. Fiorino. A unified framework based on HTN and POP approaches for multi-agent planning. Proc. International Conference on Intelligent Agent Technology (IAT-07), pages 2--5, 2007.
    • S. Thiebaux, An Asynchronous Backtracking Algorithm for Distributed, Partial-Order Planning, LAAS-CNRS report, Feb 2009.

    Honours Projects

    Marcus Hutter, RSISE@ANU and SML@NICTA, Canberra, Australia
    http://rsise.anu.edu.au/~marcus/       marcus.hutter@anu.edu.au.

    General Reinforcement Learning Agents (GRLA)

    Agent applications are ubiquitous in commerce and industry, and the sophistication, complexity, and importance of these applications is increasing rapidly; they include speech recognition systems, vision systems, search engines, planetary explorers, auto-pilots, spam filters, and robots [RN03]. Existing agent technology can be improved by developing systems that can automatically acquire during deployment much of the knowledge that would otherwise be required to be built in by agent designers. This greatly reduces the effort required for agent construction, and results in agents that are more adaptive and operate successfully in a wide variety of environments [LH07].

    Goals of this project.
    Technically, the project is about a recent general approach to learning that bridges the gap between theory and practice in reinforcement learning (RL). General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian [RN03]. On the other hand, RL is well-developed for small finite state Markov decision processes (MDPs) [SB98]. Extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The project is to investigate (by simulation or theoretically) recent models [Hut09] that automate the reduction process and thereby significantly expand the scope of many existing RL algorithms and the agents that employ them.
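    For orientation, here is a minimal tabular Q-learning sketch for a small finite MDP, i.e. the standard textbook setting of [SB98] that the models of [Hut09] aim to reach automatically from raw observations; the class structure and parameter values are illustrative assumptions only.

        import java.util.Random;

        // Minimal tabular Q-learning for a small finite MDP (standard RL; the project's
        // point is to learn such a state representation automatically from observations).
        public class QLearning {
            final double[][] q;          // q[state][action]
            final double alpha = 0.1;    // learning rate
            final double gamma = 0.95;   // discount factor
            final double epsilon = 0.1;  // exploration rate
            final Random rng = new Random();

            QLearning(int states, int actions) { q = new double[states][actions]; }

            // Epsilon-greedy action selection.
            int selectAction(int state) {
                if (rng.nextDouble() < epsilon) return rng.nextInt(q[state].length);
                int best = 0;
                for (int a = 1; a < q[state].length; a++) if (q[state][a] > q[state][best]) best = a;
                return best;
            }

            // One-step Q-learning update after observing (state, action, reward, nextState).
            void update(int state, int action, double reward, int nextState) {
                double maxNext = q[nextState][0];
                for (int a = 1; a < q[nextState].length; a++) maxNext = Math.max(maxNext, q[nextState][a]);
                q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
            }
        }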

    Requirements.
    • background in Artificial Intelligence and Machine Learning
    • good programming skills
    • performing (computer) experiments and analyzing results
    • good math skills; linear algebra at the very minimum
    • mastering elementary probability calculus
    Student's gain.
    • getting acquainted with state-of-the art RL algorithms
    • improving your math skills: linear algebra, statistics, probability, and information theory
    Literature
    • [RN03] S.J. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 2003.
    • [SB98] R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
    • [LH07] S. Legg and M. Hutter. Universal intelligence: A definition of machine intelligence. Minds & Machines, 17(4):391--444, 2007.
    • [Hut09] M. Hutter. Feature Markov decision processes. In Proc. 2nd Conf. on Artificial General Intelligence (AGI'09), pages 61--66. Atlantis Press, 2009.

    Universal Artificial Intelligence (UAI)

    The dream of creating artificial devices that reach or outperform human intelligence is an old one. Most AI research is bottom-up, extending existing ideas and algorithms beyond their limited domain of applicability. The information-theoretic top-down approach pursued in [Hut05] justifies, formalizes, investigates, and approximates the core of intelligence: the ability to succeed in a wide range of environments [LH07]. All other properties are emergent.

    Goals of this project.
    The fundamentals of UAI are already laid out, but there are literally hundreds of open questions (see the exercises in [Hut05]) in this approach that have not yet been answered. The complexity ranges from suitable-for-short-projects to full PhD theses and beyond.

    Requirements.
    • background in AI or ML or statistics or information theory.
    • good math skills
    • good writing skills
    Student's gain.
    • getting acquainted with the most comprehensive theory of rational intelligence to date.
    • getting experience in writing a literature survey (if well done may be publishable), -or-
    • learn how to prove non-trivial theorems.
    Literature
    • [Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2005.
    • [LH07] S. Legg and M. Hutter. Universal intelligence: A definition of machine intelligence. Minds & Machines, 17(4):391--444, 2007.

    Human Knowledge Compression Contest (HKCC)

    Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 100MB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of the Human Knowledge Compression Prize [Hut06] is to encourage development of intelligent compressors/programs.

    Goals of this project.
    Some of the following four subgoals shall be addressed:
    • Get acquainted with current state-of-the-art compressors, in particular the prize-winning paq8hpX series, and write a brief comparative survey.
    • Test some novel compression ideas. If successful,
    • Integrate them into one of the state-of-the-art compressors.
    • Investigate some alternative performance measures that take the compressor more seriously into account (rather than only the decompressor).
    Requirements.
    • good programming skills
    • experience in understanding and extending existing code
    • good writing skills
    • performing (computer) experiments and analyzing results
    Student's gain.
    • getting acquainted with state-of-the-art compression methodologies.
    • winning a prize (but don't count on it, it's going to be very tough).
    Literature

    On the Foundations of Inductive Reasoning (FIR)

    Humans and many other intelligent systems (have to) learn from experience, build models of the environment from the acquired knowledge, and use these models for prediction. In philosophy this is called inductive inference, in statistics it is called estimation and prediction, and in computer science it is addressed by machine learning. The problem of how we (should) do inductive inference is of utmost importance in science and beyond. There are many apparently open problems regarding induction: the confirmation problem (black raven paradox), the zero p(oste)rior problem, reparametrization invariance, the old-evidence and updating problems, to mention just a few. Solomonoff's theory of universal induction, based on Occam's and Epicurus' principles, Bayesian probability theory, and Turing's universal machine [Hut05], presents a theoretical solution [Hut07].

    Goals of this project.
    • Elaborate on some of the solutions presented in [Hut07].
    • Present them in a generally accessible form, illustrating them with lots of examples.
    • Address some other open induction problem.
    Requirements.
    • being able to think sharply
    • reasonable to good math skills
    • good writing skills
    • interest in the philosophical foundations of induction
    Student's gain.
    • getting acquainted with Bayesian reasoning and algorithmic information theory.
    • interdisciplinary work between philosophy, computer science, and statistics
    • writing a small paper (e.g. for a philosophy journal)
    Literature.
    • [Hut07] M. Hutter. On universal prediction and Bayesian confirmation. Theoretical Computer Science, 384(1):33--48, 2007.
    • [Hut05] M. Hutter. Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2005.

    Fast and Optimal Pathfinding on Game Maps

    Description and Goals
    Pathfinding involves computing a route between two given locations on a map, while avoiding all obstacles. The problem is important in many real-life applications, including games, robotics, and GPS navigation.

    Pathfinding problems are usually solved by running a search algorithm, such as A*. A* is able to compute cost-optimal solutions, but it can require significant memory and time resources. Recent approaches use hierarchical abstraction to greatly speed up the process, at the cost of losing the solution optimality guarantees.

    In this project, we aim at designing a new algorithm that combines the speed of hierarchical methods with the solution optimality guarantees of standard A*. We start from the observation that many path fragments are equivalent, and therefore there is no need to explore all of them.
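    For reference, a compact sketch of the standard A* loop on a 4-connected grid with unit step costs and a Manhattan-distance heuristic (illustrative only; the project's contribution would be the abstraction and pruning machinery built on top of such a search, not this loop itself):

        import java.util.PriorityQueue;

        // Compact A* on a 4-connected grid with unit move costs and a Manhattan-distance
        // heuristic (admissible, so the returned cost is optimal).
        public class AStarGrid {
            static int shortestPath(boolean[][] blocked, int sx, int sy, int gx, int gy) {
                int h = blocked.length, w = blocked[0].length;
                int[][] g = new int[h][w];
                for (int[] row : g) java.util.Arrays.fill(row, Integer.MAX_VALUE);
                // Queue entries: {f, x, y}, ordered by f = g + heuristic.
                PriorityQueue<int[]> open = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
                g[sx][sy] = 0;
                open.add(new int[]{Math.abs(sx - gx) + Math.abs(sy - gy), sx, sy});
                int[] dx = {1, -1, 0, 0}, dy = {0, 0, 1, -1};
                while (!open.isEmpty()) {
                    int[] node = open.poll();
                    int x = node[1], y = node[2];
                    if (x == gx && y == gy) return g[x][y];
                    for (int d = 0; d < 4; d++) {
                        int nx = x + dx[d], ny = y + dy[d];
                        if (nx < 0 || ny < 0 || nx >= h || ny >= w || blocked[nx][ny]) continue;
                        int cost = g[x][y] + 1;
                        if (cost < g[nx][ny]) {
                            g[nx][ny] = cost;
                            open.add(new int[]{cost + Math.abs(nx - gx) + Math.abs(ny - gy), nx, ny});
                        }
                    }
                }
                return -1; // goal unreachable
            }
        }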

    Requirements
    • Background in artificial intelligence and especially heuristic search.
    • Good programming skills, preferably in C++.
    Student's gain
    • Get knowledge of advanced pathfinding techniques.
    • Possibly contribute to advancing the state of the art in pathfinding.
    • Possibly continue working in this area as a graduate student.
    Supervisor
    Adi Botea
    adi.botea@nicta.com.au
    http://abotea.rsise.anu.edu.au


    Plasma MHD simulations of Concept Fusion Power Plants

    Background:

    Controlled magnetic confinement fusion offers the possibility of an inexhaustible supply of energy with zero greenhouse gas emissions. At the very high temperatures required to sustain a fusion reaction (millions of degrees), the fuel mixture is in a plasma state.

    Magnetohydrodynamics (MHD) is the study of magnetised plasmas, with applications ranging from solar flares through to laboratory confinement. The study of such plasmas using MHD in complicated toroidal magnetic confinement geometry necessitates the use of numerically intensive PDE and eigenvalue solvers. Different stability codes often include different physical effects, and utilise a wide range of input parameterisations and/or coordinate systems. The benchmarking of results from different codes is important in validating code predictions, especially in the design of concept experiments. Such physics studies frequently involve determining the equilibrium and stability across parameter sets and configuration profiles.

    Figure 1: Cut-away of a concept Spherical Tokamak Fusion Power Plant (3GW thermal, 1GW electric)

    A parallel development has been the development of coupling algorithms in the climate modelling community. Climate simulations often require different physics models which function on a range of different temporal and spatial scales. Motivated by these needs, Dr Jay Larson of Argonne National Laboratory has developed the Model Coupling Toolkit (MCT), a software toolkit that facilitates programming of parallel couplings between MPI-based applications. In this project we will (1) deploy MCT on an array of different ideal MHD equilibrium and stability codes, (2) develop algorithms within MCT to perform parameter scans of the Component Test Facility and Spherical Tokamak Power Plant, and (3) interpret the stability implications.

    Project outcomes: Deployment of MCT to multiple plasma MHD codes, algorithm development to undertake configuration scans, and application to assess the stable operating configuration of next generation fusion energy experiments.

    Requirements: Students will need to have an interest in, or be familiar with, working in a range of numerical simulation languages (Fortran 90, C) and graphics/visualisation languages such as Matlab, and have an interest in physics and sustainable energy. The project suits a student at the interface of computational science, engineering, and physics.

    Student's gain: Experience in the science of large scale numerical analysis, code parallelisation, and addressing the coupling issues associated with different configuration metrics. The skill base is portable to large scale numerical analysis in any field. Wider outcomes include an exposure to sustainable energy research.
    Supervisor: Dr. Matthew Hole (matthew.hole@anu.edu.au)
    Co-supervisor: Dr. Jay Larson (Jay.Larson@anu.edu.au)






    Data mining projects

    Why do students fail? Data mining the DCS student database (Computer Science honours project)

    The ANU Department of Computer Science has been collecting student course enrolments, marks and grades through its Faculty Information System (FAIS) for several years.

    The aim of this honours project is to develop data mining techniques that allow analysis of the FAIS database with the objective of better understanding student performance, progress and retention. The questions that the department is interested in include, for example: Why do students stop studying computer science? What correlations are there between their marks in different courses? Can we predict that a student will have problems in a certain course based on her or his past performance? Can we identify students who might be at risk of failing or dropping out of computer science courses early in their studies?

    This project will include the following components:

    • Exploration, development and application of data mining techniques that allow mining of the multi-relational data as available in FAIS. This will include using the open source data mining tool Rattle as well as development of specific techniques required for this project (likely using the Python programming language).

    • Development of a data generator - based on real, summarised FAIS data - that will allow creation of synthetic data to be used for later testing and evaluation of the data mining techniques to be developed in this project.

    • Analysis of real, anonymised FAIS student data. This work will be done on a secure compute server (which is currently being purchased by the ANU Data Mining group for real world data mining projects such as this that involve sensitive data).
    Note: Parts of this project will require a human ethics clearance which is currently being applied for through the Human Ethics committee at the ANU Research Office. Depending upon the outcomes of this application there might be changes to this project.

    Supervisor: Peter Christen


    Data linkage projects

    Background

    Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), taxation, statistics, police/intelligence and telecommunications. Data mining techniques are used to analyse such large data sets, in order to find patterns and rules, or to detect outliers.

    In many data mining projects, information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such data linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic data linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc.

    The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques and algorithms for large scale high-performance data linkage. We have been developing an open source data linkage system called Febrl (Freely extensible biomedical record linkage), which includes modules for data cleaning and standardisation (of names, addresses, dates of birth, etc.), deduplication, data linkage (or matching), and geocoding.

    For more information please have a look at our publications as well as presentations that describe our research in more detail.

    The following student projects address important issues within our data linkage project.

    Supervisor for all projects: Peter Christen


    Probabilistic data generation (Computer Science honours project)

    One of the difficulties in data linkage research is getting access to real-world data sets containing real names and addresses, as such data is protected by privacy and confidentiality regulations. Developing and testing new and improved data linkage algorithms therefore becomes very difficult.

    We have developed a data set generator (which is part of Febrl). However, this generator is very simple, and various ways of improving it are possible. For more details please have a look at the recent publication Probabilistic Data Generation for Deduplication and Data Linkage.

    Student research project

    The aim of this honours research project is to further explore how our simple data set generator can be improved. This would include analysing the literature in this area (artificially creating data sets), analysing the existing data generator code, and developing improved techniques that reflect reality more accurately. Issues involved are:

    • Accurately model typographical errors as introduced by (a) scanning and optical character recognition (OCR), (b) keying errors (i.e. errors introduced when people type data on a keyboard), and (c) phonetic errors introduced when data is entered over the phone (and is manually typed in).
    • Model the dependencies between attributes (for example if a person moves, most of her or his address details will change).
    • Model household and family data (i.e. not just generate single records independently, but create sets of records for families, or shared households).

    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Parallelisation of data linkage techniques (Computer Science honours project)

    One of the problems when linking large collections of data with tens or even hundreds of millions of records is the complexity of data linkage techniques. Potentially, each record in one data set has to be compared with every record in the second data set, resulting in a quadratic complexity O(n²). Techniques known as blocking are normally used to reduce the number of record pair comparisons (for example, only records that have the same postcode value are compared with each other). However, linking large data sets still takes a lot of time.
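    A minimal sketch of the blocking idea (a Java illustration with invented names; Febrl itself is written in Python): records are grouped by a blocking key such as postcode, and candidate pairs are only generated within each block. Parallelising data linkage largely amounts to distributing these blocks, and the record pair comparisons within them, across cores while keeping the load balanced.

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        // Illustrative blocking: only records sharing a blocking key (here, postcode)
        // are paired for detailed comparison, avoiding the full O(n^2) cross-product.
        public class Blocking {
            public record Record(String id, String name, String postcode) {}

            static List<Record[]> candidatePairs(List<Record> a, List<Record> b) {
                Map<String, List<Record>> blocks = new HashMap<>();
                for (Record r : b) blocks.computeIfAbsent(r.postcode(), k -> new ArrayList<>()).add(r);

                List<Record[]> pairs = new ArrayList<>();
                for (Record r : a)
                    for (Record s : blocks.getOrDefault(r.postcode(), List.of()))
                        pairs.add(new Record[]{r, s});
                return pairs;
            }
        }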

    Only a very limited amount of research has been done in this area (we are only aware of four publications so far that deal with parallel data linkage, including one of our own papers: Febrl - A Parallel Open Source Data Linkage System, PAKDD'04).

    Student research project

    The aim of this project is to develop parallel data linkage techniques that allow the linking of large data sets on modern multi-core computing platforms. Issues which have to be considered include:

    • Data distribution and load balancing
    • Scalability
    • Privacy and confidentiality (as names and addresses are normally used for the linkage)
    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of parallel computing techniques, as well as statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Improved record pair classification techniques (Computer Science honours project)

    A crucial step in the data linkage process is the classification of pairs of records that are being compared into either matches or non-matches. For example, consider the following two records containing a name and address each:

    Dr Smith, Peter 42 Miller Street 2602 O'Connor
    Pete Smith 42 Miller St 2600 Canberra A.C.T.

    Are these two records representing the same person, or two different people?

    In the past few years researchers from the data mining, machine learning, and AI communities have explored various classification techniques, including decision trees, genetic algorithms, clustering, active learning, support vector machines, and instance-based classifiers, to improve the classification of record pairs. Other researchers have applied techniques from information retrieval and database research to improve the linkage classification quality. However, this research has been fragmented, and so far no large study comparing the many classifiers on different data sets (both synthetic and real-world) has been conducted.
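    As a baseline for comparison, here is a hypothetical sketch of the simplest, threshold-based approach: compare the fields of a record pair, form a similarity vector, and classify the pair by a weighted score. The classifiers listed above would replace this decision rule, and approximate string comparators would replace the exact match used here.

        // Simplest record-pair classifier: weighted sum of field similarities compared
        // against a threshold. Field similarities use exact match for brevity; in
        // practice approximate string comparators (e.g. edit distance) would be used.
        public class PairClassifier {
            enum Decision { MATCH, NON_MATCH }

            static Decision classify(String[] recordA, String[] recordB, double[] weights, double threshold) {
                double score = 0;
                for (int field = 0; field < weights.length; field++) {
                    double similarity = recordA[field].equalsIgnoreCase(recordB[field]) ? 1.0 : 0.0;
                    score += weights[field] * similarity;
                }
                return score >= threshold ? Decision.MATCH : Decision.NON_MATCH;
            }
        }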

    Student research project

    The aim of this project is to (1) implement prototypes of different classifiers and then conduct studies comparing their performance using different data sets, and (2) develop new classifiers, based on existing methods, that are well suited to the data linkage classification problem. Issues involved are:

    • Which classifier performs best for what types of errors in the data (e.g. simple typographical errors, missing words, swapped words, etc.)
    • Does a combination of classifiers work better than one single classifier?
    • Comparison of supervised techniques and un-supervised techniques.
    • Exploration of active learning techniques (in order to reduce the manual effort required to create training data)
    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Semantic Web projects

    User Consumable Interface for the Semantic Web

    The vision of the Semantic Web is to equip the World Wide Web with machine-processible and machine-understandable semantics, so that the Web can become a universal medium of information and knowledge exchange for both computers and human beings. Towards this vision, a number of building blocks of the Semantic Web have been created, including the Resource Description Framework (RDF), the Web Ontology Language (OWL) and several query languages such as SPARQL. It can be foreseen that in the near future a lot of information on the Web will be described or annotated using these facilities, resulting in a large-scale knowledge base. Creating a consumable interface that lets naive users access such a large knowledge base is becoming a new challenge, as traditional interfaces to databases or knowledge bases are programmer-oriented rather than user-oriented.

    Goals for this project: In this project, we will design and implement a number of techniques to facilitate users' access to large-scale data sources on the Semantic Web.

    Requirements: Students will need to be proficient systems programmers with good C++ or Java skills.

    Student's gain: The student will work on an interesting research problem with potential impact on the future World Wide Web. The work will be carried out in a research group of CSIRO. The student will gain insight into R&D in both industry and academia. The results will include a working prototype that can be demonstrated at an international conference and a publication in an academic journal.

    Literature and Background: Database usability; DBpedia -- a dataset for the semantic web.

    Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au), Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)


    Semantics in Sensor Networks

    Background: Despite different domain-specific deployments and the heterogeneity of sensor networks and sensing data, sensor networks share a common feature: they all collect and periodically transmit information from some set of sensors, and they all must carefully manage limited power and radio bandwidth to ensure that essential information is collected and reported in a timely fashion. From a data storage point of view, one may think of a sensor network as a distributed database that is able to conduct query processing. This has made the design and implementation of query processing a very popular topic. However, query processing in sensor networks works differently from that in a traditional DBMS, given that the former is resource constrained (e.g. limited energy, memory and computational ability). Furthermore, the volume of the sensing data can be very variable (e.g. different sampling rates) within a time period, and access to this data may be delayed arbitrarily due to the motion of sensors and network latency. These add new challenges to the not yet fully solved query processing problem, and more needs to be done to enable the success of sensor network applications.

    Project outcomes: In this project, we will investigate how "semantics" can help to improve query processing, beyond the conventional development of a schema for metadata about (semantic) transformation. Prototypes will be based on a real-world problem. The Fleck platform (*Fleck* - the sensors and sensor operating system developed by the CSIRO ICT Centre) is available for prototyping work. The outcome will include a working prototype for future work to be built on.

    Requirements: Students will need to be familiar with the C programming language.

    Student's gain: The student will be working on one of the most interesting sensor network issues with other research scientists from CSIRO. At the end of this project, students will gain insights into R&D from both industry and academia and will be ready to explore other issues arising from sensor network applications.

    Supervisor: Dr. Lily Li
    Email: lily.li@csiro.au


    Using semantics to better understand the Web

    Background: The Semantic Web means different things to different people because it is very broad. It may mean:
    • RDF, Microformats, OWL
    • web services
    • AI techniques
    • Simple and tangible applications
    So it is not surprising that it has been used by Google, Yahoo, and Firefox to attract users, and it will certainly play a more important role in the next generation of search engines. Because users simply look for the utility and usefulness of a system rather than the technologies on which it is built, developing new tools and applications on top of the Semantic Web is the real challenge. In this project, we aim to develop algorithms that (1) extract entities out of unstructured web pages; (2) handle a continuous influx of user data (e.g. from wikis); (3) parse it; and then (4) update the semantic repositories, thus keeping up with the change of the world (a world on the Web).

    Project outcomes: A number of algorithms will be developed, eventually leading to easier access to information on the Web, beyond simple keyword search.

    Requirements: Students will need (1) knowledge of web services technologies; (2) a good understanding of RDF, OWL and reasoning over OWL; and (3) C++/Java skills.

    Student's gain: By keeping up with the latest developments in the Semantic Web, students will not only gain insights into the vision of the Semantic Web but also get hands-on experience in how to realise semantics on the Web. Algorithms developed in this project may have the potential to be used by popular search engines to deal with unstructured information on the Web more efficiently.

    Supervisor: Dr. Lily Li
    Email: lily.li@csiro.au


    Scalable Work Queues for Parallel Garbage Collection

    Physical limits mean that future performance improvements in processor design will mostly come from increased application-level parallelism. Thus the ability for systems to scale well on parallel architectures is an extremely important issue. Since managed languages such as Java and C# have become dominant in the application development domain, the scalability of managed runtimes has become a critical research issue. In this project we explore the scalability of a fundamental data structure at the heart of a Java runtime system---a load balancing parallel work queue.

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will examine the scalability and pathologies of the load balancing parallel work queue used by MMTk, which is the memory management toolkit used by Jikes RVM. Hopefully the project will either confirm the scalability of the existing system or identify (and implement) a more scalable work queue design.
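    For orientation only (this is not MMTk's actual data structure; all names below are invented), the sketch shows the general shape of such a load-balancing work pool: each thread works on a thread-local buffer and spills to, or refills from, a shared pool. The single lock on the shared pool is exactly the kind of serialisation point whose scalability this project would measure.

        import java.util.ArrayDeque;
        import java.util.Deque;

        // Simplified load-balancing work pool (illustrative; not MMTk's design).
        // Threads push/pop work locally and fall back to a shared overflow queue.
        public class WorkPool<T> {
            private final Deque<T> shared = new ArrayDeque<>();           // global overflow, lock-protected
            private final ThreadLocal<Deque<T>> local = ThreadLocal.withInitial(ArrayDeque::new);
            private static final int LOCAL_LIMIT = 1024;

            public void push(T work) {
                Deque<T> mine = local.get();
                mine.push(work);
                if (mine.size() > LOCAL_LIMIT) {                          // spill half to the shared queue
                    synchronized (shared) {
                        for (int i = 0; i < LOCAL_LIMIT / 2; i++) shared.push(mine.pollLast());
                    }
                }
            }

            public T pop() {
                Deque<T> mine = local.get();
                T work = mine.poll();
                if (work != null) return work;
                synchronized (shared) { return shared.poll(); }           // fall back to the shared pool
            }
        }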

    Requirements

    Students will need to be proficient programmers with a good basic understanding of computer architecture and parallel algorithms. The work will all be done in a slightly restrictive subset of Java, so experience with Java is essential. Experimental design and data analysis are key to the first phase of the project, so experience in this area will be an asset (of course there will be plenty of assistance from your supervisor, particularly in this area).

    Student's gain

    The student will gain hands-on experience with an increasingly important problem in the context of the leading research infrastructure for managed runtime research, in the setting of a world leading team.

    Literature

    This is a well-focussed problem and does not require any significant background reading. The load balancing work queue is reasonably well insulated from the rest of the environment we work in, so a deep understanding of MMTk or Jikes RVM is not necessary.

    www.jikesrvm.org

    Contact Steve Blackburn

    Architectural Analysis of Garbage Collection Behavior

    Trends in microprocessor design are toward massively multicore chips. One direction within this burgeoning research area is that of asymmetric multicore design, where the cores within a chip are not homogeneous. In this setting some cores may serve very specific purposes. Given a) the centrality of garbage collection to managed runtimes, and b) the very limited requirements of garbage collection algorithms, it may make sense to have dedicated garbage collection cores. To this end, this project will explore the architectural "footprint" of a garbage collector with a view to ascertaining the base requirements for a dedicated garbage collecting core.

    Supervisor: A. Prof Steve Blackburn, ANU
    Goals for this project
    The project will use an architectural profiling tool (valgrind) to profile a range of garbage collection workloads and thus attempt to characterize the architectural requirements of garbage collectors.

    Requirements

    Students will need to be proficient programmers with a good basic understanding of computer architecture. The primary tool for this project will be valgrind, a publicly available virtualization tool written in C. The student working on this project should be an accomplished systems programmer.

    Student's gain

    The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes at the architectural level. This project is a logical starting place for a much deeper examination of the problem and, eventually, realization in an FPGA.

    Literature

    A basic understanding of garbage collection would be helpful, but is not essential. Hennessy and Patterson's standard text on computer architecture ("Computer Architecture: A Quantitative Approach", 4th edition) is an invaluable starting point for this work.

    Contact Steve Blackburn


    Profiling Java Virtual Machine Performance

    The performance of managed runtime systems and the applications which execute on them is a hot topic of academic and industrial research. However, dynamic compilation, feedback directed optimization, and concurrency each make performance analysis significantly more complex than in traditional ahead-of-time compiled, static environments such as C programs. This project will explore techniques for profiling the performance of virtual machines and applications running over them. These techniques will include sampling via interrupts (due to timers and/or hardware performance counter triggers) and direct instrumentation of compiled code. Additionally, the project will explore dynamic generation and maintenance of code annotations so that machine code and calling contexts can be tied back to application or VM source code.

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will implement a number of different profiling techniques and a basic mechanism for dynamic maintenance of mappings between machine code and source code, in the context of Jikes RVM, a widely used research JVM.
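    To make the sampling idea concrete, here is a deliberately crude Java sketch of a timer-driven sampling profiler that periodically records the top stack frame of every thread; it is an assumption-laden illustration only, not the mechanism used inside Jikes RVM.

        import java.util.HashMap;
        import java.util.Map;
        import java.util.concurrent.Executors;
        import java.util.concurrent.ScheduledExecutorService;
        import java.util.concurrent.TimeUnit;

        // Periodically samples every thread's top stack frame and counts hits per method.
        final class SamplingProfiler {
            private final Map<String, Long> counts = new HashMap<>();
            private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

            void start(long periodMillis) {
                timer.scheduleAtFixedRate(this::sample, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
            }

            private synchronized void sample() {
                for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
                    StackTraceElement[] frames = e.getValue();
                    if (frames.length == 0) continue;
                    String top = frames[0].getClassName() + "." + frames[0].getMethodName();
                    counts.merge(top, 1L, Long::sum);
                }
            }

            // Print the ten most frequently sampled methods.
            synchronized void report() {
                counts.entrySet().stream()
                      .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                      .limit(10)
                      .forEach(e -> System.out.printf("%8d  %s%n", e.getValue(), e.getKey()));
            }

            void stop() { timer.shutdownNow(); }
        }

    The project proper would replace this coarse, allocation-heavy approach with interrupt- or hardware-counter-driven sampling and direct instrumentation of compiled code.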

    Requirements

    Students will need to be proficient systems programmers with good C and Java skills.

    Student's gain

    The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.

    Literature and Background

    Familiarity with a simple profiling tool such as gprof would be an asset. Some interesting related work can be found at the following links:
    http://portal.acm.org/citation.cfm?id=1134012
    http://www.cs.utexas.edu/users/mikebond/pcc-tr-2007.pdf

    Contact Steve Blackburn


    Bubble VM: the VM within the VM within the VM....

    Surprisingly, and despite the stated advantages of managed languages (such as Java and C#), no commercial managed runtime systems are written in the language they implement. However, there are two successful research runtimes which take this approach: Jikes RVM (an open source Java-in-Java VM), and Bartok (an ahead-of-time C# system on which the Singularity OS project is built). Both of these projects have demonstrated considerable promise, and both have demonstrated that performance need not be a roadblock to such VM implementations. One of the most vexing issues faced by such VMs is the potential for blurring the distinction between the application and the host on which it runs. In this project we will explore how to better understand the boundary between the application and the supporting VM, using a rough analogy to the isolation that exists between user mode and kernel mode in a modern operating system. A good understanding of this problem could fundamentally change the design of such JVMs, increasing their reliability, improving their debuggability, and improving their security. Such a VM could run on any other VM and could run any application---thus it could be self-hosting and in principle could host an arbitrary nesting of instances of itself within itself!

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will start by trying to clearly identify the application-VM boundary in Jikes RVM, a widely used research JVM. The project will try to apply the OS metaphor of trapping from one context or mode to another. Depending on the student's ability and ambitions, the project could lead to much more ambitious goals such as enabling Jikes RVM to self host.

    Requirements
    Students will need to be proficient systems programmers with good C and Java skills.

    Student's gain

    The student will gain a deep insight into the design and implementation of managed runtime systems. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.

    Literature and Background
    There is not a great deal of prior work in this area. However, the original paper describing Jalapeno (which became Jikes RVM) describes some of the most basic issues that a Java-in-Java design must face.

    Contact: Steve Blackburn


    Projects offered by Dr Eric McCreath


    Instance Based Methods

    Supervisors: Dr Peter Baumgartner, Prof John Slaney
    E-Mail: Peter.Baumgartner AT nicta.com.au

    Formal logic provides a mathematical foundation for many areas of computer science. Logical languages are used as specification language within, e.g., program development and verification, hardware design and verification, relational databases, and many subfields of Artificial Intelligence. Automated Deduction is concerned with the design and implementation of algorithms based on logical deduction for solving problems in these areas. Instance Based Methods are a relatively new family of methods for Automated Deduction; they share the principle of carrying out proof search by maintaining a set of instances of input formulas and analyzing it for satisfiability until completion.

    I offer several research-oriented honours projects centered around Instance Based Methods, in particular in connection (experimentation/extension) with our own Darwin system (http://goedel.cs.uiowa.edu/Darwin/). Please see this page for more details.


    Supercom

    Supervisor: A/Prof Sylvie Thiébaux, Dr Anbulagan, Dr Olivier Buffet, Dr Jinbo Huang, Dr Yannick Pencolé
    E-Mail: sylvie.thiebaux-AT-anu.edu.au, anbulagan-AT-nicta.edu.au, olivier.buffet-AT-nicta.edu.au, jinbo.huang-AT-nicta.edu.au, yannick.pencole-AT-anu.edu.au

    Background

    Composite systems feature simple components organised into a highly reconfigurable and modular architecture, thanks to which they can be appropriately assembled to achieve complex operations. Examples include web and grid services, energy distribution systems, water management systems, chemical plant control systems, and telecommunication and computer networks, to name but a few.

    In the "Supercom" project, we are interested in building Artificial Intelligence tools to supervise such composite systems. These tools will need to confer on the system the ability to self-monitor (diagnose) and self-organise (assemble, reconfigure) to achieve optimal performance.

    In this framework, various projects are available on topics such as decentralised diagnosis [1], decentralised planning [2] or diagnosis using satisfiability and knowledge compilation methods [3]. The main supervisor will depend on the project, but interactions with other researchers will arise naturally. A strong background in computer science, software engineering, or mathematics, and an interest in artificial intelligence are required, as well as good programming skills.

    References:

    [1] Yannick Pencolé, Marie-Odile Cordier. A formal framework for the decentralised diagnosis of large scale discrete event systems and its application to telecommunication networks. Artificial Intelligence, Volume 164, Number 1-2, May 2005.
    [W: http://www.informatik.uni-trier.de/%7Eley/db/journals/ai/ai164.html#PencoleC05]

    [2] E. Amir and B. Engelhardt, Factored Planning, in 18th Intl' Joint Conference on Artificial Intelligence (IJCAI'03), 2003.
    [W: http://www.cs.uiuc.edu/~eyal/papers/factored-plans-ijcai03.pdf]

    [3] Chu Min Li and Anbulagan. Look-Ahead Versus Look-Back for Satisfiability Problems. CP 1997, Linz, Austria.
    [W: http://web.rsise.anu.edu.au/~anbu/papers/CP97Anbulagan.pdf]


    A compiler for program synthesis

    Supervisors: Dr Michael Compton, Dr Mark Cameron (CSIRO), A/Prof Rajeev Gore, Dr Alwen Tiu (RSISE, ANU)
    E-Mail: Mark.Cameron@csiro.au
    E-Mail: Rajeev.Gore@anu.edu.au

    Background

    This project is about writing a compiler for the automatic synthesis of 'programs' from specifications. Given specifications for services S1,...,Sn and the specification of a goal G, the compiler will generate a 'program' satisfying G by using the services S1,...,Sn. This compiler could be targeted at a number of possible settings. For example, the specifications could describe web services, in which case the goal is the automatic generation of composite web services from other web services. Alternatively, the specifications could describe components of a security system and the goal some larger security property, so the compiler automatically generates the larger security framework.

    The Information Engineering Lab at CSIRO's ICT Centre already has such a compiler for web services [1], in which a logic mapping language relates resources, such as data and web services, to relevant concepts and properties [2], and the composition compiler automatically translates a high-level declarative composition goal into an executable workflow responsible for orchestrating the data and web-service calls necessary to materialize the specified data product. However, we are currently restricted to composing over services that have stateless, functional and deterministic operations. We would like to extend this composition environment to work with web services that have a stateful interaction style (i.e. the operation/message that the service can perform depends on the current state of the service and the service state machine). A published method for web service composition [3,4,5] does so by using a modal logic and tableaux theorem proving, but it concentrates on message flow, not data. We would like to integrate the two.

    The project would involve investigating how to integrate both data and state into the specification of services and a composition goal. The outcome would be a logic for describing the specification of each, and a compiler that can automatically generate a 'program' satisfying the goal by using the provided services. The project would suit a student with an interest in logic and specification.

    References

    [1] Cameron, M. A., Taylor, K. L., Baxter, R. Web-Service Composition and Record Linking. VLDB Workshop on Information Integration on the Web (IIWeb-2004), Toronto, Canada, August 2004. URL: http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf

    [2] Cameron, M. A. and Taylor, K. (2005), First Order Patterns for Information Integration. Proceedings of the 5th International Conference on Web Engineering (ICWE2005), 25-29 July, Sydney, Australia. URL: http://dx.doi.org/10.1007/11531371_25

    [3] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), Synthesis of Underspecified Composite e-Services based on Automated Reasoning. In Proc. of the 2nd Int. Conf. on Service Oriented Computing (ICSOC 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html

    [4] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), ESC: A Tool for Automatic Composition of e-Services based on Logics of Programs. In Proc. of the 5th VLDB Workshop on Technologies for E-Services (TES 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html

    [5] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Rick Hull, and Massimo Mecella (2005), Automatic Composition of Transition-based Semantic Web Services with Messaging. In Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB 2005). URL: http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html


    Tableaux reasoning for web service composition

    Supervisors: Dr Mark Cameron (CSIRO), A/Prof Rajeev Gore (RSISE, ANU)
    E-Mail: Mark.Cameron@csiro.au
    E-Mail: Rajeev.Gore@anu.edu.au

    Background

    The Information Engineering Lab has a logic-based service composition framework that is being applied across a number of domains, including health data integration (Cameron et al., http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf) and environmental management (http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/304_cameron.pdf). A published method for web service composition (Berardi et al. 2004, http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html and http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html; Berardi et al. 2005, http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html) relies on a translation of finite-state machine representations of a composition goal and stateful Web services into a modal/description logic, PDL. Tableaux theorem proving can then be applied to find a web service composition that satisfies the goal. This project would aim to replicate and implement that basic approach, while considering alternative logics to improve the scope of the compositions that may be achieved.


    Eye-gaze tracking as an interface for virtual environments

    Supervisors: Chris Gunn (CSIRO), Henry Gardner (DCS ANU)
    E-Mail: Chris.Gunn@csiro.au
    Henry.Gardner@anu.edu.au

    Background

    This project will seek to characterise the effectiveness of eye-gaze tracking as a human-computer interface for virtual environments. The concept is that participants should be able to interact with projected images using a combination of keyboard controls and eye gaze. By focussing their visual attention on icons or images on the screen, users should be able to select these icons or activate other interaction. Two application areas of interest are the Mind Attention Interface for the Wedge virtual reality theatre and a something-or-other system in which participants stand in front of a half-spherical shell screen. In the Mind Attention Interface, eye-gaze tracking equipment will be integrated with electroencephalogram (EEG) equipment which measures electric signals from the brain of a participant. Correlating eye-gaze measurements of attention with EEG data would be a possible extension of this project.


    Projects offered by Peter Strazdins

    Basic Linear Algebra kernels for the T2 Multicore Processor

    The Niagara or CoolThreads (T) series of processors has been motivated by commercial applications such as web, database and eCommerce servers. These processors achieve high throughput on workloads with many execution threads, which are integer, memory and I/O intensive. However, each core also comes equipped with a floating point unit. The question arises: how effective is the T2 for (parallel) numerical applications, and how can it best be programmed for them?

    This project will investigate how to program the matrix-matrix multiply kernel from the BLAS library, dgemm(), for the UltraSPARC T2, optimizing it for both single core and multicore/thread performance. It will also evaluate its performance. Since dgemm() forms the computational heart of almost all dense linear algebra computations, it is surprising that there seems to be very little work done on this topic so far.
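    For orientation only, the Java sketch below shows the cache-blocked structure of the matrix-matrix multiply at the heart of dgemm(); a real T2 kernel would be written in C or assembly and tuned to the chip's cache hierarchy and hardware threads, and the block size here is just a placeholder.

        // Cache-blocked C = C + A*B for square n x n matrices stored row-major in double[].
        final class BlockedGemm {
            static final int BLOCK = 64; // placeholder; would be tuned per core

            static void dgemm(int n, double[] a, double[] b, double[] c) {
                for (int ii = 0; ii < n; ii += BLOCK)
                    for (int kk = 0; kk < n; kk += BLOCK)
                        for (int jj = 0; jj < n; jj += BLOCK)
                            for (int i = ii; i < Math.min(ii + BLOCK, n); i++)
                                for (int k = kk; k < Math.min(kk + BLOCK, n); k++) {
                                    double aik = a[i * n + k];
                                    for (int j = jj; j < Math.min(jj + BLOCK, n); j++)
                                        c[i * n + j] += aik * b[k * n + j];
                                }
            }
        }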

    Evaluation of the T2's Multithreading Performance

    The T2 not only has 8 computational cores, but each core can concurrently support 8 computational threads. The question arises: how much speedup can you get from the multiple cores, and how much more can you get out of the hardware threading? Furthermore, what is the best way to synchronize software threads, running across different cores and hardware threads? In what ways is such a processor different from a traditional shared memory processor?
    This project will build on a previous project by Hugh Blemings based on the OpenPower 720. It involves evaluating the T2 on memory- and compute-intensive parallel benchmarks and synchronization algorithms.

    A variant/extension to this project is described below.

    Tuning the Message Passing Interface library for large-scale Shared Memory / Multicore Processors

    The Message Passing Interface (MPI) is the standard programming paradigm for distributed memory parallel computing, and many large and important applications have already been written in terms of MPI and have run successfully on recent cluster computers. With the recent advent of large-scale chip multi-processors, the issue emerges of how these applications can reap the potentially massive performance of the new generation of clusters, which will be comprised of a number of processor chips connected by shared memory, where each chip is a chip multi-processor. The simplest method is for such applications to run a separate MPI process on each CPU in the system, but the efficiency of this depends on whether the MPI implementation is sufficiently intelligent to make use of the fact that there will be large sub-groups of processes running on the same node, which can hence communicate by much faster mechanisms.

    This project will investigate the efficiency of the highly sophisticated OpenMPI implementation of MPI on large-scale shared memory nodes, such as the T2 (8 core by 8 strand SPARC), Freemont (an 8 x 2 CPU Opteron) and IBM OpenPower 720 (2 x 2 CPU x 2-way multithreading) systems hosted in DCS. It will begin with an investigation of the relevant algorithms used in OpenMPI, concentrating on barriers and collective operations, and comparing them (where applicable) with corresponding operations using threaded programming paradigms.

    Automatic Tuning of the High Performance Linpack Benchmark

    (co-supervised by Alistair Rendell and Warren Armstrong)

    Supercomputers today are large-scale parallel processors, and the Top 500 list is the most definitive ranking of the world's fastest machines. While the status of this list is contentious, the list is based on the results of a single benchmark - the Linpack benchmark - which in this case solves a (very) large dense linear system. The list is dominated by Beowulf-style cluster computers: parallel computers using a (typically commercial-off-the-shelf) switch-based network to communicate between the processors.

    There is a publicly available open-source code for a highly optimized benchmark called High Performance Linpack (HPL). It is also highly tunable, but the tuning currently must be done by hand. This is problematic, in the sense that it requires knowledge both of the complex algorithms used in HPL (beyond the understanding of most people who wish to use the benchmark) and of the (cluster) supercomputer itself. Furthermore, in the case of clusters, this problem is exacerbated by the fact that the cluster may be non-homogeneous (so the benchmark results may depend on which parts of the cluster are selected), and by the fact that by their nature clusters may be easily reconfigured. Thus, it is highly desirable that this tuning process be automated.

    The LAPACK For Clusters (LFC) project went partway towards addressing this, in that it did some work on determining, given p processors, the best logical P x Q processor configuration, where P*Q <= p. However, automatically choosing the critical blocking parameter and among a plethora of algorithm variants remains to be done.

    This project will investigate automatic tuning methods for the HPL package for cluster computers. The search can be done statically, and employ efficient empirical methods based on techniques such as the Nelder-Mead simplex search, as is done for the ATLAS package (which itself can be used as a component of HPL).
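    As a sketch of the kind of derivative-free empirical search involved, the bare-bones two-dimensional Nelder-Mead routine below minimises an arbitrary objective; for HPL tuning the objective would be something like negated benchmark performance as a function of two tuning parameters. This is an illustration under those assumptions, not ATLAS's or HPL's actual search code.

        import java.util.Arrays;
        import java.util.Comparator;
        import java.util.function.ToDoubleFunction;

        // Simplified 2-D Nelder-Mead simplex search (reflect/expand/contract/shrink).
        final class NelderMead2D {
            static double[] minimize(ToDoubleFunction<double[]> f, double[][] simplex, int iters) {
                for (int it = 0; it < iters; it++) {
                    Arrays.sort(simplex, Comparator.comparingDouble(f::applyAsDouble));
                    double[] best = simplex[0], worst = simplex[2];
                    double[] cen = {(simplex[0][0] + simplex[1][0]) / 2, (simplex[0][1] + simplex[1][1]) / 2};
                    double[] refl = {2 * cen[0] - worst[0], 2 * cen[1] - worst[1]};            // reflection
                    if (f.applyAsDouble(refl) < f.applyAsDouble(best)) {
                        double[] exp = {3 * cen[0] - 2 * worst[0], 3 * cen[1] - 2 * worst[1]}; // expansion
                        simplex[2] = f.applyAsDouble(exp) < f.applyAsDouble(refl) ? exp : refl;
                    } else if (f.applyAsDouble(refl) < f.applyAsDouble(simplex[1])) {
                        simplex[2] = refl;
                    } else {
                        double[] con = {(cen[0] + worst[0]) / 2, (cen[1] + worst[1]) / 2};     // contraction
                        if (f.applyAsDouble(con) < f.applyAsDouble(worst)) simplex[2] = con;
                        else for (int i = 1; i < 3; i++) {                                     // shrink
                            simplex[i][0] = (simplex[i][0] + best[0]) / 2;
                            simplex[i][1] = (simplex[i][1] + best[1]) / 2;
                        }
                    }
                }
                Arrays.sort(simplex, Comparator.comparingDouble(f::applyAsDouble));
                return simplex[0];
            }

            public static void main(String[] args) {
                double[][] start = {{0, 0}, {1, 0}, {0, 1}};
                // Toy objective; for HPL it would be a (timed) benchmark run.
                double[] min = minimize(p -> Math.pow(p[0] - 3, 2) + Math.pow(p[1] + 1, 2), start, 200);
                System.out.println(Arrays.toString(min)); // approximately [3, -1]
            }
        }

    In practice the tuning parameters (block size, process grid, algorithm variants) are discrete, so the search would need to be wrapped in rounding and bounds handling.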

    The project involves experimenting with a variety of simple and more sophisticated tuning methods, and would evaluate their suitability in this context. The DCS Saratoga cluster will form a suitable experimental platform. The project has the potential for publishable work and for the production of widely used open-source software.

    Relevant references include the IEEE Task Force on Cluster Computing ; a paper on solving dense linear systems on this platform (exposing some problems with HPL); Dongarra, J et al. Self Adapting Numerical Software (SANS) Effort, IBM Journal of Research and Development 50(2-3), 2006; A Case Study of HPL Benchmarking and Tuning, Verdi March, Technical Report APSTC-TR-2008-01, 07 Jan 2008.

    Parallelizing Applications on a Heterogeneous Cluster

    A Beowulf-style cluster computer is a parallel computer using a commercial-off-the-shelf switch-based network to communicate between the processors. Clusters have proved a highly cost-effective high performance computing model, and have largely displaced the traditional massively parallel computers built with expensive vendor-supplied networks. A typical cluster tends to evolve, and hence tends to become heterogeneous.

    The efficient parallelization of applications using the Message Passing Interface (MPI) is reasonably well understood for homogeneous clusters. However, for heterogeneous clusters, which can not only have nodes with different processor architectures but also sections of nodes connected by different networks, the situation is far more challenging. In particular, differences in processor and network speeds make it difficult to utilize the faster components effectively.

    Using the Jabberwocky heterogeneous cluster, this project will investigate techniques used so far to parallelize (scientific/engineering) applications for heterogeneous clusters. The project will then make a selection of these applications and evaluate the effectiveness of these techniques on the Jabberwocky. New or variant techniques may then be proposed and evaluated.

    Relevant references include the IEEE Task Force on Cluster Computing home page, and An Overview of Heterogeneous High Performance and Grid Computing, by Jack Dongarra and Alexey Lastovetsky.

    Extended Threads Emulation in an SMP Computer Simulator

    Simulation is playing an increasingly important role in the design and evaluation of current and future high performance computer systems. Sparc-Sulima is an open-source UltraSPARC SMP simulator, written in C++ and developed by the CC-NUMA Project. Sparc-Sulima simulates an UltraSPARC processor by interpreting each instruction. It has timing models which predict an UltraSPARC III Cu shared memory processor performance to a high degree of accuracy.

    Currently, Sparc-Sulima can simulate an unmodified Solaris (UltraSPARC) binary (executable program). When that program executes a system call trap instruction, the corresponding operating system effects are `emulated' (the simulated computer state is updated directly, rather than via simulating a lengthy and complex sequence of kernel-level instructions), and simulated execution then resumes. This mechanism is handled by the SolEmn emulation layer in Sparc-Sulima. Threaded programs (e.g. using the pthreads Posix thread libraries) ultimately access thread-related operating system services via the Solaris _lwp (lightweight process) trap instructions. The limitation of the current implementation is that, at any time, there can only be one active thread per CPU. Such a limitation is present in all emulation-based computer simulators in the literature.

    The aim of this project is to extend the emulation of the _lwp trap instructions to remove this limitation. It will require implementing thread switching, which can occur via time-outs and also via the _lwp_suspend and yield system calls. The current thread context save/restore mechanisms will also have to be extended. If successful, the work should be publishable (along with the existing work on threads emulation). An extension of this project would be to investigate which thread scheduling heuristics best approximate real machine behavior.

    Relevant references include: Kevin Skadron et al. Challenges in Computer Architecture Evaluation, Computer, August 2003; and papers from the Sparc-Sulima home page.


    Energy-Efficient Data Gathering in Wireless Sensor Networks

    Supervisor: A/Prof Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Background

    Wireless sensor networks have received significant recent attention due to their potential applications in civil and military domains, including surveillance, fine-grain monitoring of natural habitats, and measuring variations in local salinity levels in riparian environments. The sensors in such networks periodically collect sensed data as they monitor their vicinity for a specific phenomenon of interest.

    The sensed data can be transmitted directly to the base station where client queries are posted; the base station then processes the collected data centrally and responds to users' queries. However, because each sensor is equipped with an energy-limited battery, energy efficiency rather than the response time for each query is of paramount importance in sensor networks. The energy consumption of the above centralized processing is very high. Instead, we adopt energy-efficient in-network processing, i.e., the data gathering is implemented through a tree rooted at the base station, and the data at each sensor is filtered or combined locally before it is transmitted to its parent.

    Unlike most existing work, which assumes that each sensor transmits a message of fixed length, we consider another data-gathering model, in which the length of the message that each sensor transmits to its parent is proportional to the sum of the message lengths of its children and itself. Thus, the total energy consumption at a node is proportional to the length of the transmitted message and the distance between the node and its parent in the tree. Specifically, on one hand, we aim to find an energy-efficient spanning tree rooted at the sink node such that the network lifetime is maximized. On the other hand, we prefer an approximate result over the exact result of traditional data models for a given query, since the communications among the sensors are failure-prone. Under this circumstance, how close the approximate result is to the exact result becomes the essential issue.
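    To illustrate the cost model just described, the small Java sketch below walks a gathering tree bottom-up: each node forwards its own reading plus everything received from its children, and its transmission energy is taken to be proportional to that length times the distance to its parent. The class names and the unit energy constant are assumptions made for this example.

        import java.util.ArrayList;
        import java.util.List;

        // One node of a data-gathering tree; the sink is a node with distToParent = 0.
        final class GatherNode {
            final double ownLength;     // length of the node's own reading
            final double distToParent;  // distance to its parent in the tree
            final List<GatherNode> children = new ArrayList<>();

            GatherNode(double ownLength, double distToParent) {
                this.ownLength = ownLength;
                this.distToParent = distToParent;
            }

            // Returns the message length forwarded to the parent; accumulates energy in total[0].
            double forwardedLength(double[] total) {
                double length = ownLength;
                for (GatherNode c : children) length += c.forwardedLength(total);
                total[0] += length * distToParent; // energy ~ message length x distance
                return length;
            }

            public static void main(String[] args) {
                GatherNode sink = new GatherNode(0, 0);
                GatherNode a = new GatherNode(1, 5), b = new GatherNode(1, 3), c = new GatherNode(1, 4);
                sink.children.add(a); a.children.add(b); a.children.add(c);
                double[] total = {0};
                sink.forwardedLength(total);
                System.out.println("total transmission energy ~ " + total[0]); // 3 + 4 + 15 = 22
            }
        }

    Different tree topologies over the same nodes give different totals, which is exactly why the choice of spanning tree matters for network lifetime.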

    In summary, this project will pursue energy efficiency on the basis of database modelling in sensor network environments. A student working on this project would be involved in:

    • participating in the research project;
    • studying the problem complexity when the network lifetime is taken as the sole metric;
    • proposing heuristic algorithms for finding a data gathering tree and providing approximate aggregate results for a given aggregate query;
    • conducting experiments by simulation to validate the efficiency of the proposed algorithms.


    Energy-Efficient Aggregate Node Placement for Data Gathering in Wireless Sensor Networks

    Supervisor: A/Prof Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Recent advances in microelectronic technology have made it possible to construct compact and inexpensive wireless sensors. Networks formed by such sensors, termed wireless sensor networks, have been receiving significant attention due to their potential applications in environmental monitoring, surveillance, military operations and other domains. While these applications reveal tremendous potential for capturing important environmental phenomena using sensor networks, they also come with certain limitations. One of the major limitations of sensor nodes is that they are powered by low power batteries, which limits the network lifetime and affects the quality of the network. Energy conservation in sensor network operations is thus of paramount importance. As most currently deployed sensor networks are used for monitoring and surveillance purposes, the sensed data generated by the sensor network must be available to users at the base station; data gathering, which extracts and collects sensed data from sensors, is thus one fundamental operation in sensor networks. Energy conservation in data gathering leads to a prolonged network lifetime. The study of energy-efficient data gathering poses great challenges due to the unique characteristics of, and physical constraints imposed on, sensor networks.

    In this project we study the aggregate node placement problem for data gathering, which aims to place a few powerful aggregate nodes into a dense sensor network such that the network performance is significantly improved. Assume that a heterogeneous sensor network consists of two types of sensor nodes: cheap sensor nodes that have a fixed, identical transmission range, and expensive aggregate sensor nodes that have various transmission ranges. Aggregate nodes have more energy, higher data transmission rates, and better processing and storage capabilities than sensor nodes. The main task of an aggregate node is to process and relay data for sensor nodes and the other aggregate nodes. Due to the expensive cost associated with aggregate nodes, the number of them in a sensor network is very limited; thus, they need to be placed in the network carefully in order to increase the network connectivity, reduce the data gathering delay, and prolong the network lifetime. Specifically, the aggregate node placement problem for data gathering is as follows. Given a sensor network consisting of n dense sensor nodes that are randomly deployed in a region of interest, and a small number K of expensive aggregate nodes (K << n), the problem is to place the K aggregate nodes in the region so that the network lifetime is maximised when answering data gathering queries.


    Energy-Efficient Skyline Computation and Maintenance in Sensor Networks

    Supervisor: A/Prof Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Background

    Technological advances in recent years have enabled the deployment of large-scale wireless sensor networks (WSNs) consisting of hundreds or thousands of inexpensive sensors deployed in an ad-hoc fashion for a variety of environmental monitoring and surveillance purposes. In these applications, a large volume of continuously sensed data generated by sensors needs to be either collected at the base station or aggregated within the network. Thus, WSNs are usually treated as a virtual database. The skyline query, an important operator in modern databases for multi-preference analysis and decision making, has received much attention recently due to its wide applications. In this project we consider the skyline query problem in WSNs.

    We aim to devise novel, distributed algorithms for skyline query evaluation from scratch. We also consider skyline maintenance within sliding-window environments by developing distributed maintenance algorithms for it. We will conduct extensive experiments by simulation to evaluate the performance of the proposed algorithms in terms of multiple performance metrics, including the total energy consumption, the maximum energy consumption among the network nodes, and the network lifetime.
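    For concreteness, a centralised block-nested-loops skyline computation (assuming smaller is better on every attribute) looks roughly like the Java sketch below; the distributed, in-network algorithms this project targets would partition this work across sensors, but the dominance test is the same.

        import java.util.ArrayList;
        import java.util.List;

        // p dominates q if p is <= q on every attribute and strictly < on at least one.
        final class Skyline {
            static boolean dominates(double[] p, double[] q) {
                boolean strictlyBetter = false;
                for (int i = 0; i < p.length; i++) {
                    if (p[i] > q[i]) return false;
                    if (p[i] < q[i]) strictlyBetter = true;
                }
                return strictlyBetter;
            }

            // Block-nested-loops skyline: keep a window of points not dominated so far.
            static List<double[]> skyline(List<double[]> points) {
                List<double[]> window = new ArrayList<>();
                for (double[] p : points) {
                    window.removeIf(s -> dominates(p, s)); // p knocks out points it dominates
                    boolean dominated = false;
                    for (double[] s : window) {
                        if (dominates(s, p)) { dominated = true; break; }
                    }
                    if (!dominated) window.add(p);
                }
                return window;
            }
        }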

    A student working on this project would be involved in:

    • participating in the research project;
    • studying the problem complexity with several optimization metrics, particularly the network lifetime metric;
    • developing energy-efficient heuristic algorithms for skyline finding and maintenance;
    • conducting experiments by simulation to validate the efficiency of the proposed algorithms.


    Neural networks theory and applications

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as extracting rules from neural networks, information retrieval using neural networks, data mining and feature selection, cascade neural network structures, hierarchical neural network structures, and neural network applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with neural networks is necessary. Most projects will use the very popular backpropagation neural network training algorithm. If you are interested in neural networks and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects: i. Cascade neural network structures can be built automatically, by adding single neurons, without deciding in advance how many neurons are required to solve a problem. This project would investigate the use of larger chunks such as feature maps as cascade components, which would be useful for recognising images (including faces).

    ii. Rule extraction: Neural networks can learn complex tasks but face the problem that human beings do not trust them because they cannot understand _why_ a particular decision is made. This project focuses on rule extraction for explanation.


    Fuzzy logic theory and applications

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as automated construction of fuzzy rule bases from data, hierarchical fuzzy systems, fuzzy interpolation, information retrieval using fuzzy logic, universal approximators, and fuzzy logic applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with fuzzy logic is necessary. If you are interested in fuzzy logic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects: i. Fuzzy document filtering: Web search engines return many documents which are not relevant to queries. Fuzzy techniques to enhance the results from search engines would be very useful. This project will investigate a number of fuzzy techniques for this purpose.

    ii. Combining uncertainty: Fuzzy logic is a technique which deals with uncertainty well. Sometimes data contains multiple kinds or sources of uncertainty. For example, a pedestrian could report that he saw someone he was quite sure was stealing something. He was not certain, and his own reliability is another, different kind of uncertainty. This project is to develop methods to combine different kinds of uncertainty in intelligence-led investigation. (Data from a company specialising in this area will be available.)


    Bacterial evolutionary algorithms

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    There are several optimisation methods inspired by natural processes. It has been shown that evolutionary algorithms are efficient tools for solving non-linear, multi-objective and constrained optimisation problems. The principle is a search over a population of solutions, where tuning is done using mechanisms similar to biological recombination. The operations of the bacterial evolutionary algorithm were inspired by microbial evolution: bacteria can transfer genes to other bacteria, so the rules are optimised in each bacterium.
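    As a tiny illustration of the gene transfer idea, the Java fragment below copies a random segment of a better-scoring bacterium's chromosome over the corresponding segment of a worse one; the encoding, segment length and fitness measure are placeholders rather than a full bacterial evolutionary algorithm.

        import java.util.Arrays;
        import java.util.Random;

        // Gene transfer: a segment of a better individual's chromosome overwrites a worse one's.
        final class GeneTransfer {
            static void transfer(double[] better, double[] worse, int segLen, Random rng) {
                int start = rng.nextInt(better.length - segLen + 1);
                System.arraycopy(better, start, worse, start, segLen);
            }

            public static void main(String[] args) {
                Random rng = new Random(1);
                double[] good = {0.9, 0.8, 0.7, 0.95}; // higher-fitness chromosome (placeholder encoding)
                double[] poor = {0.1, 0.2, 0.3, 0.15}; // lower-fitness chromosome
                transfer(good, poor, 2, rng);
                System.out.println(Arrays.toString(poor)); // two consecutive genes now copied from 'good'
            }
        }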

    Example projects:
    i. Clustering using BEAs: Clustering is more usually performed by gradient descent algorithms. BEAs should allow more interesting clusters to be formed where there are complex evaluation criteria for cluster validity. Objects clustered could be images (e.g. see 1.i or 4.ii), or text (e.g. see 2.i, 2.ii). Clusters could have predefined target criteria, such as hierarchical structure of a certain depth.

    ii. Making pictures: If bacteria encoded the components of an abstract computer-generated picture, an individual could repeatedly identify nice and not-nice images to generate some 'art' tuned to their aesthetic sense.

    iii. Scheduling: A previous student has worked on a GA for University exam scheduling. A comparison with bacterial algorithms would be an interesting project.


    Face Recognition

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as image processing for face recognition, HCI research tool to collect facial images, biologically plausible architectures for face recognition, building and modifying computer models of real faces. No previous work on face recognition is necessary. For the image processing topic, some experience with computer graphics would be useful. If you are interested in face recognition or some similar topic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects:
    i. View morphing: View morphing between two images of an object taken from two different viewpoints produces the illusion of physically moving a virtual camera. We generate compelling 2D transitions between images. Initially we will use the same face with small rotations and generate intermediate images to produce a realistic QuickTimeVR head.

    ii. Video annotation: Techniques such as PCA can recognise the same face so long as the rotation/expression/illumination is not too different. This can be used to recognise one person throughout a segment of video.

    iii. 3-D Face model: An average face 3D model can be constructed from a single face image and animated via rotation, a simple face expression model, or illumination. This can then be used to recognise faces in multiple poses, expressions or lighting.


    Data Cleaning Via Logic

    Supervisor: A/Prof Rajeev Gore
    E-Mail: rajeev.gore@anu.edu.au

    Background

    The Australian Bureau of Statistics stores its census information as a collection of tables with one row (record) for every individual with many columns like age, sex, marital status, current occupation etc. Such tables invariably contain errors, either introduced during data collection, or during data entry. For example, the census data may contain a record for a person named X who is aged 6, is married and in high school.

    Such data usually has an associated collection of extra conditions which capture important inter-relationships between the columns of the table. For example, the census table may have constraints like "the legal age of marriage is 16" and "a high-school student is aged between 11 and 18". According to these constraints, the entry for X is incorrect since the entry breaks both conditions, so the obvious question is to find a correction which is as close to the original as possible.

    The problem of finding an error in a data table is known as error localisation, and the problem of changing an incorrect record so that it passes all the constraints is known as error correction. The standard method for solving both of these problems is the Fellegi-Holt method, but it is highly inefficient. Recently, Agnes Boskovitz, a PhD student at RSISE, has shown a deep connection between the Fellegi-Holt method and a well-known method from propositional logic called resolution.
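    For readers unfamiliar with resolution, the Java fragment below performs a single propositional resolution step, with literals encoded as signed integers; deriving implied edit restrictions in the Fellegi-Holt style repeatedly applies steps of exactly this shape. The toy clauses are invented for illustration.

        import java.util.HashSet;
        import java.util.Set;

        // Resolve two clauses on proposition p: from (p OR A) and (NOT p OR B) derive (A OR B).
        final class Resolution {
            static Set<Integer> resolve(Set<Integer> c1, Set<Integer> c2, int p) {
                if (!c1.contains(p) || !c2.contains(-p)) return null; // not resolvable on p
                Set<Integer> out = new HashSet<>(c1);
                out.remove(p);
                for (int lit : c2) if (lit != -p) out.add(lit);
                return out;
            }

            public static void main(String[] args) {
                Set<Integer> c1 = new HashSet<>(Set.of(2, 3));  // q OR r
                Set<Integer> c2 = new HashSet<>(Set.of(1, -2)); // p OR NOT q
                System.out.println(resolve(c1, c2, 2));         // [1, 3], i.e. p OR r (order may vary)
            }
        }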

    Efficient automated tools for performing resolution and associated logical deductions are readily available. The project is to implement the Fellegi-Holt method using these logical tools to gauge the efficacy of this logical approach to data cleaning. If successful, the prototype has huge potential in real world applications and will almost certainly lead to a publication in an international conference or journal.

    Feel free to email me if you are at all interested in this project.


    Interval Arithmetic for Computational Science

    Supervisor: A/Prof Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    An interval is a continuum of real numbers bounded by its end-points. Interval arithmetic performs operations on intervals rather than on discrete floating point representations. As a consequence, errors associated with numerical rounding are automatically propagated and accounted for. Moreover, the mathematical properties of intervals can be exploited to develop entirely new algorithms.

    The concept of intervals is not new; it was considered by Moore as early as 1965. What is relatively new is that some hardware vendors now provide direct support for interval arithmetic within their compilers and hardware.

    In this project we wish to develop an interval code for evaluating the type of integrals that are used in quantum calculations on atoms and molecules. Evaluation of these integrals involves truncation of an infinite series, followed by the application of a variety of recursion relations. As a consequence, the values of the final integrals reflect both a series truncation error and a rounding error, which can vary according to the recursive path taken. This project will study the balance between these two sources of error and its implications for large scale quantum calculations.
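    To fix ideas, a minimal interval type in Java is sketched below. Genuine interval arithmetic also requires directed (outward) rounding of the end-points, which the compiler/hardware support mentioned above provides but plain Java doubles do not, so this version is illustrative only.

        // Closed interval [lo, hi] with the basic arithmetic operations.
        final class Interval {
            final double lo, hi;

            Interval(double lo, double hi) { this.lo = lo; this.hi = hi; }

            Interval add(Interval o) { return new Interval(lo + o.lo, hi + o.hi); }

            Interval sub(Interval o) { return new Interval(lo - o.hi, hi - o.lo); }

            Interval mul(Interval o) {
                // The product interval is bounded by the extremes of the four end-point products.
                double a = lo * o.lo, b = lo * o.hi, c = hi * o.lo, d = hi * o.hi;
                return new Interval(Math.min(Math.min(a, b), Math.min(c, d)),
                                    Math.max(Math.max(a, b), Math.max(c, d)));
            }

            @Override public String toString() { return "[" + lo + ", " + hi + "]"; }
        }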

    This project will build on work undertaken in 2005 by honours student Josh Milthorpe who studied the basic performance of interval operations and use of intervals in fast multipole calculations. You will also be part of a team working on interval related methods funded from an Australian Research Council Discovery grant.


    Automatic Differentiation

    Supervisor: A/Prof Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to these elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enables derivatives to be easily updated when the original code is changed.
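    The mechanical chain-rule idea can be made concrete with forward-mode "dual numbers", sketched in Java below: every elementary operation propagates a (value, derivative) pair. This is a toy illustration of the principle only; the tools mentioned below work on whole programs via source transformation or operator overloading.

        // Forward-mode automatic differentiation with dual numbers.
        final class Dual {
            final double val, der;

            Dual(double val, double der) { this.val = val; this.der = der; }

            static Dual variable(double x) { return new Dual(x, 1.0); } // d/dx x = 1
            static Dual constant(double c) { return new Dual(c, 0.0); }

            Dual add(Dual o) { return new Dual(val + o.val, der + o.der); }
            Dual mul(Dual o) { return new Dual(val * o.val, der * o.val + val * o.der); } // product rule
            Dual exp()       { double e = Math.exp(val); return new Dual(e, e * der); }   // chain rule

            public static void main(String[] args) {
                // f(x) = x * exp(x); at x = 2, f = 2e^2 and f' = e^2 + 2e^2 = 3e^2.
                Dual x = variable(2.0);
                Dual f = x.mul(x.exp());
                System.out.println("f(2) = " + f.val + ", f'(2) = " + f.der);
            }
        }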

    A number of programs are now available to perform automatic differentiation for code written in a variety of languages. The aim of this project would be to use one of these tools to perform derivative evaluations for some computational science applications. Specifically, in the modelling of systems at the atomic level, derivatives with respect to atomic positions are hugely important, determining things like the frequencies at which molecules vibrate. Others and I have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process, and for it to be relatively efficient, would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes.


    Numerical Simulation on Graphics Processing Units (GPU)

    Supervisor: A/Prof Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    Background

    Over the past 10 years graphics processing units (GPUs) have become ubiquitous in desktop computers. Modern GPUs now allow substantial user programmability, which lets them be used for more general-purpose computation. Indeed, in a recent paper presented at SC04, Fan et al. detail their efforts to construct a "GPU Cluster for High Performance Computing". In this work the authors plugged 32 GPUs into a Xeon cluster system and in so doing increased the total theoretical peak performance of the machine by 512 Gflops - for a cost of just under US$13,000!

    Tracking this development, a number of groups have been developing programming environments that can exploit the GPU for general computations. Details of many of these are available at the GPGPU web site. Two of particular interest to this project are Brook and Sh. The former is built on top of C while the latter is built on top of C++. Brook, for example, extends standard ANSI C with the concept of a "stream", a new data type representing data that can be operated on in parallel.

    The aim of this project is to investigate the potential for using the GPU in molecular science computations. The initial target application is likely to be N-body simulations, as typified by the simulation of atom motions. Careful consideration will be given to accuracy, since the GPU is currently limited to single precision computations. Thus the project will address two issues: i) the in-principle feasibility of using the GPU for these types of problems, and ii) the scientific usefulness given the current single precision limitation.


    MPI and Global File System I/O over InfiniBand using Virtual Lanes

    Supervisors: Richard Alexander, A/Prof Alistair Rendell, Peter Strazdins
    E-Mail: Richard@AlexanderTechnology.com, Alistair.Rendell@anu.edu.au, Peter.Strazdins@cs.anu.edu.au

    InfiniBand networks provide for multiple, parallel system area networks over the same physical hardware, using a concept called Virtual Lanes. Virtual Lanes allow the network to be split for a variety of purposes, including the separation of HPC cluster management, message passing and I/O traffic. There are a number of parallel filesystems used in heterogeneous Linux clusters, the most widely used being PVFS2, GPFS and Lustre, in addition to the legacy NFS system. One possibility with large SMP nodes in a cluster is to use InfiniBand Virtual Lanes and operating system virtual machines in a way that allows nodes of a cluster to be used as both compute nodes and filesystem storage nodes. This project will implement the Lustre filesystem on a small test cluster with large SMP nodes and the InfiniBand interconnect, using virtual machines in a novel way to segregate compute functions from I/O.

    References:

    • J Liu, A Vishnu, DK Panda, Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation. Proceedings Supercomputing 2004.
    • J Cope, M Oberg, H Tufo and M Woitaszek, Linux Multi-Cluster Environments.

    Declarative theorem-proving

    Supervisor: Dr Michael Norrish
    Email: michael.norrish@nicta.com.au

    This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.

    When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic and getting it all to type-check. Working in this way allows users to quickly deal with the "low-hanging fruit": important bugs that are easy to find.

    The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.

    Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.

    The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions; to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data.

     

    Finding Models of Unsolvable First Order Goals

    Supervisors: Dr Michael Norrish
    Email: michael.norrish@nicta.com.au

    When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof.

    The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts.

    The student's task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user's benefit.

    This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers). The student would be jointly supervised by Michael Norrish and Peter Baumgartner.


    Projects from NICTA researchers

    Project I

    Project Title: Knowledge Discovery from Internet Malicious Data Collected by Honeypots

    Supervisor: Dr Warren Jin

    Keywords: Data Mining, Internet security, intrusion

    Internet security has become more and more important as cyber attacks have significantly increased in volume, coordination and sophistication due to the spread of worms and botnets. A botnet is a large collection of compromised machines under a common command-and-control infrastructure, typically used for nefarious purposes. A honeypot is a computer trap specifically designed to attract and detect malicious attacks, in particular botnet attacks. For example, the Leurre.com and the SGNet honeypots are distributed over 30 countries and have collected hundreds of gigabytes of malicious Internet traffic. Warren is NICTA's representative for these honeypot projects.

     

    This Honours project, as part of the DSTO-NICTA joint honeypot project, is to improve or develop techniques for discovering actionable knowledge from Internet malicious data collected by the Leurre.com and the SGNet honeypots. The project may concentrate on the following topics, each of which could be developed into a quality Honours thesis:

    • detecting emerging viruses or botnets before they become prevalent,
    • discovering hot spots of Internet security,
    • discovering attack trends, especially botnets,
    • modelling all the Internet attacks based on attack samples collected,
    • generating attack signatures.

    The new technique would be very useful for proactive security, such as Internet intrusion prevention.

     

    Participants will have the opportunity to get hands-on experience with some of the latest technologies. Participants will have access to large real-world databases collected by honeypots. They can experience both academic and industrial research work environments, under the supervision of Dr Warren Jin. Each student participant may receive a stipend for living expenses of up to $1500 per semester, depending on their contribution to the honeypot project.

     

    Student participants should either have excellent programming skills in Java, C++ or Python, or have a strong mathematical background. Experience of networking, Internet security or databases is useful. Knowledge of data mining, machine learning or statistics is an advantage, but not necessary.

     

    Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information.



    Project II

    Project Title: Large Complicated Data Stream Mining

    Supervisor: Dr Warren Jin

    Keywords: Data stream mining, exploratory analysis

    Owing to rapid progress in data collection, advanced database system technologies and the World Wide Web, various kinds of complicated data have been growing explosively. There are many examples of data streams. For example, a satellite-mounted remote sensor generates data constantly. World-wide distributed honeypots collect malicious Internet traffic data every second in order to monitor the spread of worms and botnets. These data streams are massive (e.g., the Leurre.com and the SGNet honeypots collect hundreds of megabytes of data over 30 countries every day), temporally ordered, rapidly evolving, contain various types of attributes (e.g., continuous, categorical, string) and are potentially infinite. Mining complicated data streams is one of the most important real-world data mining tasks.

     

    This Honours project is to develop or improve a data mining technique for complicated data streams. The data mining technique could be, but is not limited to, data visualisation, temporal association analysis, clustering, time series analysis, outlier detection or classification. The technique can be used to find interesting events or hot spots within data streams. The project will consider mixed data attributes such as continuous (e.g., packet length), categorical (e.g., TCP or not) and sequential (e.g., payload strings) attributes. The project may also take into account scalability for large amounts of streaming data.

     

    Participants will have the opportunity to get hands-on experience with up-to-date stream mining techniques. Participants may be engaged in using these techniques to handle real-world challenges. They can experience both academic and industrial research work environments, under the supervision of Dr Warren Jin. Each student participant may receive a stipend for living expenses of up to $1500 per semester, depending on their contribution to the use-inspired research.

     

    Interested students should have a good understanding of data mining, visualisation or time series analysis. Programming skills in C++, MATLAB or R are useful. Experience of databases, data analysis or distributed computing is a plus.

     

    Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information.



    Project III

    Project Title: Population Projection for New Canberra Suburbs

    Supervisors: Dr Warren Jin and Dr Yongxiang Xia

    Keywords: Data mining, time series analysis, prediction

    Population projection computes, given certain assumptions about future trends, future changes in population sizes for specific areas, say, states or cities. Population projection is valuable. For example, population sizes are a prerequisite for ensuring an adequate and appropriate supply of land for sustainable city development, as development is supported by the timely delivery of commercial land, dwelling land, infrastructure and community facilities. Collaborating with the ACT Planning and Land Authority (ACTPLA), NICTA researchers have developed a stratified hidden Markov model to predict population sizes for existing suburbs. It performs quite well in terms of data independence, output flexibility and projection accuracy.

    This Honours project aims to extend these population projection techniques to new suburbs. Research challenges include small areas, lack of training data and scientific evaluation of the projections. Besides population sizes, the techniques may take into account other economic, social, environmental and political factors. The participant will learn to solve real-world problems using data mining techniques, such as temporal classification or clustering. He/she will also learn to handle uncertainty for decision making. He/she will experience how a novel technique is developed within an industrial research organisation like NICTA.

    The project requires good modelling or programming skills. Some familiarity with elementary statistics, time series or data mining is useful. Supervision will be provided by Dr Warren Jin and Dr Yongxiang Xia. A successful applicant might receive a stipend for living expenses of up to $1000 depending on the project progress.

     

    Contact: Warren Jin (Huidong.Jin@nicta.com.au) or Yongxiang Xia (Yongxiang.Xia@nicta.com.au)


     

  • Updated:  18 February 2011