![]() |
ANU College of Engineering and Computer Science
Department of Computer Science
|
|
|
Honours topicsHalf of your time as an honours student is spent working on a project. But first you have to find a project topic. You don't have to be constrained by what you see on the list below. If you have something that you would like to work on as a project, then feel free to discus it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Proposed Project TopicsPlease note that some of the projects below are very specific in nature. However, you may be able to discus minor changes to the proposal with the supervisor. Other proposals are, in fact, too broad in scope to describe an individual honours project. You should take these descriptions as an indication of the areas of interest of the supervisor. If you are interested in the area then you can discuss potential specific projects with the supervisor. If the projects proposed for this year don't inspire you , check out the projects proposed in previous years as well:
In no particular order, here are the topics proposed so far for 2008:
Web Service Management System (WSMS)Service Oriented Architecture (SOA) is an emerging computing paradigm, in which infrastructure, platforms and softwares are transformed into services that can be consumed by anyone, anywhere, anytime. The trend of SOA has become noticeable in recent years. An increasing number of infrastructure and software services have been available on the Web, such as Amazon EC2 and Google Doc. The WSMS project of CSIRO aim at building an integrated framework for managing Web services. In this framework, Web services are first class objects that can be queried, composed, trusted, and changed. This project will explore research issues pertaining to the whole Web service lifecycle. We expect the outcome to become driving technology for the future service paradigm.
In WSMS, we offer the following opportunities as Honours projects:
Web Services Query LanguageWeb services have been the main force in driving service-oriented computing that promises not only a new paradigm for B2B collaboration, but also an opportunity for the creation of value-added services and new businesses. We foresee that web services will be as popular as today's web pages in future. However, how to manage and apply the large number of web services for end-users in their daily life remains a big challenge. CSIRO ICT Centre is conducting a research project (WSMS: Web Services Management System) to address the above challenge. One of research efforts in this project is to allow end-users/applications to express their requirements in a query language so that WSMS can resolve the query with the underlying web services. Goals for this project: In this project, we will design and implement a query language for automatic web services discovery and execution. Requirements: Programming skills in Java or C/C++ is essential; knowledge and experiences with web services and compiler are preferred. Student's gain: The student will learn how to design and implement a computer language from scratch. He/she will also gain experiences and skills in web services, ontology and XML. The outcomes of this project may be published in an international conference/journal, which would increase the student's research credibility. Supervisor: Dr. Shiping Chen (shiping DOT chen AT csiro DOT au) and Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)
Evolutionary Algorithms for Optimal Composition of Web servicesService composition is an important part of the life cycle of services innovation research. It is evidently seen by the fact that Web service composition has attracted an increasing attention recently. When the quality of Web services (QoWS) is concerned, the objective of Web service composition becomes to find an optimal solution of composition. It is an NP-hard problem. Different strategies have been proposed so far to make service composition easier, for example, Genetic Algorithms (GAs) and Multi-objective Genetic Algorithms, Integer Programming (IP) and Mixed Integer Programming (MIP), Constraint Programming, Particle swarm optimisation, Ant Colony optimisation, and so on. This study aims to investigate different evolutionary algorithms and related techniques for optimal composition of Web services. Goals for this project: this project will implement a number of different evolutionary algorithms and a comparison of these algorithms will be examined thoroughly (based on the empirical dataset). Requirements: Students need to be proficient with Java programming. Student's gain: The student will gain a deep insight into the evolutionary algorithms and their applications in the field of Web services. The work will be carried out in a world-leading research group (CSIRO) and will be conducted within the leading academic research infrastructure. The results of this project will include a proof-of-concept prototype and an international journal publication. Supervisor: Dr. Li Li (lily.li@csiro.au) and Dr. Dongxi Liu (dongxi.liu@csiro.au)
Context Sensitive Web Service CompositionContext is recognised as important and has been widely studied in the field of Mobile and Pervasive Computing. It needs to be explicitly described and managed in Web service composition so that different participating Web services can be connected correctly. As the context of a Web service changes as the state of a predefined environment evolves, hard-coding context into the application is insufficient. We argue that the following supports are necessary to develop a full workable service composition (engine): (1) Service composition infrastructure/middleware (to achieve context-awareness); (2) Context sensitive composition model and algorithms. Goals for this project: this project will first develop a model and some algorithms. The proposed algorithms will be evaluated a real Web service environment (currently, this environment is running well within CSIRO Web service Theme). Requirements: Students need to be proficient with Java programming. General knowledge of Service-oriented Computing is required. Student's gain: The student will gain a deep insight into the evolutionary algorithms and their applications in the field of Web services. The work will be carried out in a world-leading research group (CSIRO) and will be conducted within the leading academic research infrastructure. The results of this project will include a working prototype and publications to be submitted to the international conferences or journals. Supervisor: Dr. Dongxi Liu (dongxi.liu@csiro.au) and Dr. Li Li (lily.li@csiro.au)
Reputation-based Trust Management in Composite Web ServicesWith the introduction of Web Services and Semantic Web, the World Wide Web has evolved into a collection of hundreds and thousands of services deployed with an aim of providing a wide variety of services to general public. Examples of such services range from general information services like Wikipedia to specific business services like Amazon. As services proliferate on the Web, several services may provide same functionalities to a service consumer. In these situations, trust is one of the essential criteria a consumer can use in selecting the right set of services. Therefore, there is a need for a robust trust management system that utilizes the perception-based knowledge of the communities in the Web. In order to develop such a system, we are investigating a mechanism of providing a robust trust model for web services. Our model is based on the reputation of services as rated by service consumers in the Web communities. This project aims to investigate a mechanism of managing reputation-based trust in composite services, i.e. services composed of a set of services. Goals for this project: In this project, we will design, implement and deploy a distributed reputation-based trust management model for composite services. Requirements: The student should have good knowledge and experience with Java programming. Knowledge of Web Services is desirable but not mandatory. Student's gain: The project provides an opportunity for the student to contribute to an emerging area in computer science: Web Services and Service Computing. The work will be carried out in a research group of CSIRO and the student will have an opportunity to gain insight into R&D in both industry and academia. The project outcome will include a working prototype that can be demonstrated in an international conference and possibly a publication in an academic journal. Supervisor: Dr. Wanita Sherchan (wanita DOT sherchan AT csiro DOT au) and Dr. Surya Nepal (surya DOT nepal AT csiro DOT au)
Change Management for Composite Web ServicesThis project aims at developing a bottom-up change management architecture for composite service and design a change management model which can capture the triggering changes of the outsourced Web services and propagate the change to composite service and generate reactive changes at the service schema level. The work will include a taxonomy or specification of bottom-up changes in an SOE (Service Oriented Enterprise) context which include triggering changes and reactive changes. For each type of change, a list of functional and non-functional changes will be defined. An appropriate model will be proposed to model these changes. Some mappings rules can be defined to map between different levels of changes. Automatic change management algorithms will be proposed to capture and manage the bottom-up changes. Finally, some evaluation will be carried on to assess the bottom-up change management system. Goals for this project: The goal is to design a bottom-up change management architecture for composite services. Requirements: Students will need to be proficient systems programmers with good C++ or Java skills. Student's gain: The student will work on an interesting research problem with potential impact on Web services. The work will be carried in a research group of CSIRO. The student will get an insight of the R&D in both industry and academia. The results will include a working prototype that can be demonstrated in an international conference and a publication in an academic journal. Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au)
Web Service OptimizationWhen selecting services on the Web, a user is usually faced with too many choices. For instance, both Yahoo and Google provide map services, characterized by different features. The user needs to pick one that can best satisfy her needs. This project aims at developing a system for helping user make the best selection among thousands of Web services. We will apply the technologies in decision making and artificial intelligence to support our solution. Goals for this project: The goal is to design a Web based service selection system. Requirements: Students need to be proficient will C++ or Java programming. Student's gain: The student will work on an interesting research problem with potential impact on Web services. The work will be carried in a research group of CSIRO. The student will get an insight of the R&D in both industry and academia. The results will include a working prototype that can be demonstrated in an international conference and a publication in an academic journal. Supervisor: Dr. Xuan Zhou(jxuan DOT zhou AT csiro DOT au) Efficient network monitoring and routing via FPGAsResearch area: software engineering, networking, formal languages, HW/SW design Difficulty: high Supervisor: Bob Edwards (DCS), Dr. Andreas Bauer (RSISE & NICTA) Background: The network code of modern operating systems is very involved as it allows users to configure the entire routing either to protect themselves from dangers in the Internet, or to set up their own networks and to control which traffic goes on which machine or to which subnet. In consequence, high-traffic networks can dramatically increase the system load in machines that are responsible to steer the traffic or block unwanted requests, up to the point where no routing is possible any longer (e.g., denial of service attacks). This project aims at providing an efficient routing mechanism, directly implementable on FPGAs, i.e., on hardware, to free the operating system, and hence the main CPU, from this time consuming task as much as possible. The routing policy is to be specified in terms of some formalism (e.g., regular expressions, finite automata), which should then be mapped onto an FPGA for execution. The main goals of this project are
The candidate to work on this exciting project is, ideally, familiar with a Unix-type operating system and the netfilter framework for specifying routing policies. This project will also require basic knowledge of formal languages and automata theory such as gained by having visited COMP3360/6363 (cf. also [2]). References:
Algorithms for Efficient Manipulation of Boolean formulasSupervisor: Dr. Jussi Rintanen, DPO Group (ANU) & Managing Complexity (NICTA)
After the success of Binary Decision Diagrams (BDD) in computer-aided verification and many other areas, there has been a lot of interest in identifying other normal forms of propositional formulas (Boolean functions) for which efficient (polynomial time) algorithms exist for the basic operations required in these applications. Research in the last 8 years have led to the identification of Decomposable Negation Normal Form DNNF which is more compact but allows a slightly more narrow set of efficient (polynomial-time) operations.
In the standard framework of Boolean function representations based on the connectives for Disjunction, Conjunction and Negation, the least restrictive form is that of general NNF (Negation Normal Form). This normal form always leads to at least as succinct representations as e.g. BDD or DNNF.
A practically interesting problem is the identification of more efficient but not necessarily polynomial-time algorithms for the manipulation of general NNF formulas. In comparison to BDD or DNNF, NNF are sometimes exponentially more succint. The seemingly efficient operations on BDD or DNNF could be simply a result of these normal forms being less succinct: a non-polynomial operation on NNF could conceivably be in absolute terms more efficient than a polynomial operation on the exponentially bigger BDD or NNF.
This project is an investigation of operations on NNF that are in absolute terms equally or more efficient than the operations on the equivalent BDD or DNNF. We know that such operations exist, because one way of implementing these operations is to first translate the functions to BDD or DNNF.
The goal of the work is the development of practically useful alternatives to BDD and DNNF representations that are based on the more succinct NNF.
Contact:
Dr. Jussi Rintanen
Research area: software engineering, formal languages,
algorithms Difficulty: medium Supervisor: Dr. Andreas Bauer (RSISE & NICTA) Related courses: COMP3360/6363 Background:
Regular expressions in Unix and many other systems allow the matching of
strings against user-defined expressions. In [1], trace patterns were
introduced, which are strings where some information is missing (i.e.,
strings which contain variables as place-holders), and it is shown that
the problem of how to match such incomplete strings against regular
expressions (given, e.g., in terms of a finite automaton) is decidable.
The GNU C-library [3] provides its own implementation of expression
matching, which is nowadays a standard in Linux and many Unix-like
systems.
The main goals of this project are to
The final implementations should also be tested and performance of the
implementation evaluated empirically.
This project will require familiarity with the C programming
language, and basic knowledge of formal languages and automata theory
such as having visited COMP3360/6363 (cf. also [2]).
References: Research area: software engineering, formal languages Difficulty: medium-high Supervisor: Dr. Andreas Bauer (RSISE & NICTA) Related courses: COMP3360/6363 Background:
Monitoring systems is useful during design time as a means of testing an
actual implementation before it ships out to the customers. However, it
can also be useful in the final product, at runtime, to make sure that
some system does not enter an invalid state leading to the failure of
the system when it runs.
In [1] an approach to monitoring based on temporal logic was devised and
a so called monitor generator implemented in terms of [2], which takes
as input a user-defined temporal logic specification, and outputs a
finite state machine, which processes a system's actions and decides
whether or not these satisfy or violate the specification (or neither).
The goal of this project is to
This project is likely to involve some form of low-level
programming, wrapping function calls, and other sorts of fun stuff, and
will therefore require deep familiarity with a programming language (I
am not picky about which one, really), and basic knowledge of formal
languages and automata theory such as having visited COMP3360/6363
(cf. also [3]). Ideally, there's also a background on temporal logic
and/or model checking.
References: This topic is proposed as part of the French-Australian Science and
Technology cooperation project "Planning Approaches to Sofware
Verification", whose partners are INRIA, ANU, and UWA.
Games are a computational model used in computer science, economics,
biology, and social sciences to name a few. The motivation for the
present project arises from the area of multi-agent planning in
artificial intelligence, but the results are applicable to other
areas.
Given a description of the possible states of the game, of the
states that are considered as winning positions, and of the actions
that agents can take in each state, the basic problem is to find a
winning strategy for a given agent coalition (a set of agents). Such
a strategy specifies the actions that the coalition agents must take
in each state, and guarantees that the coalition will end up in a
winning state, regardless of the actions taken by agents outside the
coalition. In the project, we focus on a slightly different problem,
where the coalition is not given. Instead, we want to find the
smallest size coalition which has a winning strategy. This is a
computationally hard problem, so we will be building efficient
algorithms which attempt to perform well on average. Such algorithms
will use symbolic representations of set of states based on binary
decision diagrams, and will automatically derive heuristics to guide
the search from the description of the input game.
The student will start from a basic algorithm idea, which (s)he will
flesh out, implement and extend to more complex cases such as: 1)
computing a coalition of minimal cost, given a specification of the
cost of collaboration, 2) partially observable games (agent decisions
are based on partial information about the current state of the game),
3) more complex objectives than just reaching a winning state.
This topic is proposed as part of a collaboration with
CNRS in the
framework of
the ROSACE
project funded by the French
Scientific and Technological Research Network for Aeronautics and
Space. ROSACE studies the design, specification, implementation
and deployment of a fleet of mobile autonomous cooperating robots with
well-established properties particularly in terms of safety,
self-healability, ability to achieve a set of missions and
self-adaptation in a dynamic environment.
In the ROSACE project, a high-level mission is expanded into a set
of goals which are allocated to the robots. The robots must act
cooperatively to achieve these goals, and this might require
explicitly delegating subgoals to other robots. Cooperatively
synthesizing plans of actions is still a largely open problem. The
student will design and implement a cooperative and distributed
version of a standard planning algorithm. The algorithm will have to
guarantee a number of desirable properties (termination, correctness,
etc). This will require becoming familiar with automated planning,
constraint satisfaction and distributed search methods.
This project is best suited to a student with interest and
background in artificial intelligence, distributed algorithms, and
with good programming skills.
Background:
Controlled magnetic confinement fusion offers the
possibility of an inexhaustible supply of energy with zero greenhouse
gas emissions. At the very high temperatures required to sustain a
fusion reaction (millions of degrees), the fuel mixture is in a
plasma state.
Magnetohydrodynamics (MHD) is the study of
magneticised plasmas, with applications ranging from solar flares
through to laboratory confinement. The study of such plasmas using
MHD in complicated toroidal magnetic confinement geometry
necessitates the use of numerically intensive PDE and eigenvalue
solvers. Different stability codes often include different physical
effects, and utilise a wide range of input parameterisations, and/or
coordinate systems. The benchmarking of results from different codes
is important in validating code predictions, especially in the design
of concept experiments. Such physics studies frequently involve
determining the equilibrium and stability across parameter sets and
configuration profiles. A
Figure
1: Cut-away of a concept Spherical Tokamak Fusion Power Plant (3GW
thermal, 1GW electric) Requirements: Students will need to have an
interest or be familiar in working in a range of numerical simulation
languages, Fortran 90, C; graphics visualisation languages such as
Matlab, and an interest in physics and sustainable energy. The
project suits a student at the interface of computational science,
engineering, and physics.
Student's gain: Experience in the science
of large scale numerical analysis, code parallelisation, and
addressing the coupling issues associated with different
configuration metrics. The skill base is portable to large scale
numerical analysis in any field. Wider outcomes include an exposure
to sustainable energy research.
The ANU Department of Computer
Science has been collecting student course enrolments, marks and
grades through its FAculty Information System (FAIS) for
several years.
The aim of this honours project is to develop data mining techniques
that allow analysis of the FAIS database with the objective to better
understand student performance, progress and retention. The questions
that the department is interested in include, for example: why
students stop studying computer science?, what correlations there are
between their marks in different courses?, can we predict that a
student will have problems in a certain course based on her or his
past performance?, can we identify students that might be at risk of
failing or dropping out of computer sciences courses early in their
studies?
This project will include the following components:
Supervisor: Peter
Christen
Background
Many organisations today collect massive amounts of data in their
daily businesses. Examples include credit card and insurance
companies, the health sector (e.g. Medicare), taxation, statistics,
police/intelligence and telecommunications. Data mining techniques are
used to analyse such large data sets, in order to find patterns and
rules, or to detect outliers.
In many data mining projects, information from multiple data sources
needs to be integrated, combined or linked in order to allow more
detailed analysis. The aim of such data linkages is to merge
all records relating to the same entity, such as a customer or
patient. Most of the time the linkage process is challenged by the
lack of a common unique identifier, and thus becomes non-trivial.
Probabilistic data linkage techniques have to be applied using
personal identifiers like names, addresses, dates of birth, etc.
The ANU Data
Mining Group is currently working in collaboration with the
NSW Department of Health, Centre for Epidemiology and
Research on the development of
improved
techniques and algorithms for large scale high-performance data
linkage. We have been developing an open source data linkage
system called
Febrl
(Freely extensible biomedical record linkage), which includes
modules for data cleaning and standardisation (of names, addresses,
dates of birth, etc.), deduplication, data linkage (or matching), and
geocoding.
For more information please have a look at our
publications as well as
presentations that describe our research in more details.
The following student projects address important issues within our
data linkage project.
Supervisor for all projects:
Peter Christen
One of the difficulties in data linkage research is getting access to
real world data sets containing real names and addresses, as such data
is protected by privacy and confidentiality regulations. Developing,
and testing new and improved data linkage algorithms therefore becomes
very difficult.
We have developed a data set generator (which is part of
Febrl). However, this generator is very simple, and various
ways of improving it are possible. For more details please have a look
at the recent publication
Probabilistic Data Generation for Deduplication and Data Linkage.
Student research project
The aim of this honours research project is to further explore how
our simple data set generator can be improved. This would include
analysing the literature in this area (artificially creating data
sets), analysing the existing data generator code, and developing
improved techniques that reflect reality more accurately. Issues
involved are:
A student working on this project would be involved through
One of the problems when linking large collections of data with tens
or even hundreds of millions of records is the complexity of data
linkage techniques. Potentially each record in one data set has to be
compared with every record in the second data set, resulting in a
quadratic complexity O(n2). Techniques known as
blocking are normally used to reduce the number of record
pair comparisons (for example, only records that have the same
postcode value are compared with each other). However, linking
large data sets still takes a lot of time.
Only a very limited amount of research has been done in this area (we
are only aware of four publications so far the deal with parallel data
linkage, including one of our own papers:
Febrl
- A Parallel Open Source Data Linkage System, PAKDD'04).
Student research project
The aim of this project is to develop parallel data linkage techniques
that allow the linking of large data sets on modern multi-core
computing platforms. Issues which have to be considered include:
A crucial step in the data linkage process is the classification of
pairs of records that are being compared into either matches or
non-matches. For example, consider the following two records
containing a name and address each:
Are these two records representing the same person, or two different
people?
In the past few years researchers from the data mining, machine
learning, and AI communities have explored various classification
techniques, including decision trees, genetic algorithms, clustering,
active learning, support vector machines, and instance-based
classifier to improve the classification of record pairs. Other
researchers have applied techniques from information retrieval and
database research to improve the linkage classification quality.
However, this research has been fragmented and so far no large study
comparing the many classifiers has been conducted using different data
sets (both synthetic and from the real world).
Student research project
The aim of this project is to (1) implement prototypes of different
classifiers and then conduct studies comparing their performance using
different data sets, and to (2) develop new - based on existing
methods - classifiers that are well suited for the data linkage
classification problem. Issues involved are:
The vision of the Semantic Web is to equip the World Wide Web with machine-processible and machine-understandable semantics,
so that the Web can becomes a universal medium of information and knowledge exchange for both computer and human being.
Towards this vision, a number of building blocks of the semantic web have been created, including the Resource Description
Framework (RDF), the Web Ontology Language (OWL) and several query languages such as SPARQL. It can be foreseen that in the
near future a lot of information on the Web will be described or annotated using these facilities, resulting in a large scaled
knowledge base. Creating a consumable interface for naive users to access such a large knowledge base is becoming a new challenge,
as traditional interfaces on databases or knowledge bases are programmer-oriented rather than user-oriented.
Goals for this project: In this project, we will design and implement a number of techniques to facilitate users' accesses to large scaled data sources on the semantic web.
Requirements: Students will need to be proficient systems programmers with good C++ or Java skills.
Student's gain: The student will work on an interesting research problem with potential impact on the future World Wide Web.
The work will be carried in a research group of CSIRO. The student will get an insight of the R&D in both industry and academia.
The results will include a working prototype that can be demonstrated in an international conference and a publication in an academic
journal.
Literature and Background: Database usability;
DBpedia -- a dataset for the semantic web.
Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au), Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)
Using semantics to better understand the WebBackground: The Semantic Web means different things to different people because it is very broad. It may mean:
Project outcomes: A number of algorithms will be developed, and eventually leading to easy access to the information on the Web other than just key words search. Requirements: Students will need to have knowledge/skills about (1) web services technologies; (2) good understanding of RDF, OWL and reasoning on OWL; and (3) C++/Java. Student's gain: By keeping up with the latest developments in the Semantic Web, students will not only gain insights into the vision of the Semantic Web but also have some hand-on experience about how to realise the semantics on the Web. Algorithms developed in this project may have potential to be used by popular search engines to deal with unstructured information on the Web more efficiently. Supervisor: Dr. Lily Li Email: lily.li@csiro.au Scalable Work Queues for Parallel Garbage CollectionPhysical limits mean that future performance improvements in processor design will mostly come from increased application-level parallelism. Thus the ability for systems to scale well on parallel architectures is an extremely important issue. Since managed languages such as Java and C# have become dominant in the application development domain, the scalability of managed runtimes has become a critical research issue. In this project we explore the scalability of a fundamental data structure at the heart of a Java runtime system---a load balancing parallel work queue. Supervisor: A. Prof Steve Blackburn, ANUGoals for this projectThis project will examine the scalability and pathologies of the load balancing parallel work queue used by MMTk, which is the memory management toolkit used by Jikes RVM. Hopefully the project will either confirm the scalability of the existing system or identify (and implement) a more scalable work queue design.
Requirements Students will need to be proficient programmers with a good basic understanding of computer architecture and parallel algorithms. The work will all be done in a slightly restrictive subset of Java, so experience with Java is essential. Experimental design and data analysis is key to the first phase of the project, so experience in this area will be an asset (of course there will be plenty of assistance from your supervisor, particularly in this area).Student's gain The student will gain hands-on experience with an increasingly important problem in the context of the leading research infrastructure for managed runtime research, in the setting of a world leading team.
Literature This is a well-focussed problem and does not require any significant background reading. The load balancing work queue is reasonably well insulated from the rest of the environment we work in, so a deep understanding of MMTk or Jikes RVM is not necessary. www.jikesrvm.org Contact Steve BlackburnArchitectural Analysis of Garbage Collection BehaviorTrends in microprocessor design are toward massively multicore chips. One direction within this burgeoning research area is that of asymmetric multicore design, where the cores within a chip are not homogeneous. In this setting some cores may serve very specific purposes. Given a) the centrality of garbage collection to managed runtimes, and b) the very limited requirements of garbage collection algorithms, it may make sense to have dedicated garbage collection cores. To this end, this project will explore the architectural "footprint" of a garbage collector with a view to ascertaining the base requirements for a dedicated garbage collecting core. Supervisor: A. Prof Steve Blackburn, ANUGoals for this project The project will use an architectural profiling tool (valgrind) to profile a range of garbage collection workloads and thus attempt to characterize the architectural requirments of garbage collectors. Requirements Students will need to be proficient programmers with a good basic understanding of computer architecture. The primary tool for this project will be valgrind, a publicly available virtualization tool written in C. The student working on this project should be an accomplished systems programmer. Student's gain The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes, at the architectural level. This project is a logical starting place for a much deeper examination of the project and eventually, realization in an FPGA. LiteratureA basic understanding of garbage collection would be helpful, but is not essential. Henessy and Patterson's standard text on computer architecture ("Computer Architecture: A Quantitative Approach", 4th edition) is an invaluable starting point for this work. Contact Steve Blackburn Profiling Java Virtual Machine PerformanceThe performance of managed runtime systems and the applications which execute on them is a hot topic of academic and industrial research. However, dynamic compilation, feedback directed optimization, and concurrency each make performance analysis significantly more complex than in traditional ahead-of-time compiled, static environments such as C programs. This project will explore techniques for profiling the performance of virtual machines and applications running over them. These techniques will include sampling via interrupts (due to timers and/or hardware performance counter triggers) and direct instrumentation of compiled code. Additionally, the project will explore dynamic generation and maintenance of code annotations so that machine code and calling contexts can be tied back to source code in the application or VM source code.
Supervisor:A. Prof Steve Blackburn, ANU Requirements Students will need to be proficient systems programmers with a good C and Java skills. Student's gain The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes. Literature and Background Familiarity with a simple profiling tool such as gprof would be an asset. Some interesting related work can be found at the following links: http://portal.acm.org/citation.cfm?id=1134012 http://www.cs.utexas.edu/users/mikebond/pcc-tr-2007.pdf Contact Steve Blackburn Bubble VM: the VM within the VM within the VM....Surprisingly, and despite the stated advantages of managed languages (such as Java and C#), no commercial managed runtime systems are written in the language they implement. However, there are two successful research runtimes which take this approach, Jikes RVM (an open source Java-in-Java VM), and Bartok (an ahead-of-time C# system on which the Singularity OS project is built). Both of these projects have demonstrated considerable promise, and both have demonstrated that performance need not be a roadblock to such VM implementations. One of the most vexing issues faced by such VMs is the potential for blurring the distinction between the application and the host on which it runs. In this project we will explore how to better understand the boundary between the application and the supporting VM, using a rough analogy to the isolation that exists between user mode and kernel mode in a modern operating system. A good understanding to this problem could fundamentally change the design of such JVMs, increasing their reliability, improving their debugability, and improving security. Such a VM could run on any other VM and could run any application---thus could be self hosting and in principle could host an arbitrary nesting of instances of itself within itself!Supervisor: A. Prof Steve Blackburn, ANU Goals for this project This project will start by trying to clearly identify the application-VM boundary in Jikes RVM, a widely used research JVM. The project will try to apply the OS metaphor of trapping from one context or mode to another. Depending on the student's ability and ambitions, the project could lead to much more ambitious goals such as enabling Jikes RVM to self host.
Requirements Student's gain The student will gain a deep insight into the design and implementation of managed runtime systems. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.
Literature and Background Projects offered by Dr Eric. McCreathInstance Based MethodsSupervisors: Dr Peter Baumgartner, Prof John Slaney E-Mail: Peter.Baumgartner AT nicta.com.au Formal logic provides a mathematical foundation for many areas of computer science. Logical languages are used as specification language within, e.g., program development and verification, hardware design and verification, relational databases, and many subfields of Artificial Intelligence. Automated Deduction is concerned with the design and implementation of algorithms based on logical deduction for solving problems in these areas. Instance Based Methods are a relatively new family of methods for Automated Deduction; they share the principle of carrying out proof search by maintaining a set of instances of input formulas and analyzing it for satisfiability until completion. I offer several research-oriented honours projects centered around Instance Based Methods, in particular in connection (experimentation/extension) with our own Darwin system (http://goedel.cs.uiowa.edu/Darwin/). Please see this page for more details. Supercom
Supervisor:
A/Prof Sylvie Thiébaux,
Dr Anbulagan,
Dr Olivier Buffet,
Dr Jinbo Huang,
Dr Yannick Pencolé
Background Composite systems feature simple components organised into a highly reconfigurable and modular architecture, due to which they can appropriately assemble to achieve complex operations. Examples include web and grid services, energy distribution systems, water management systems, chemical plant control systems, telecommunication and computer networks, to name but a few. In the "Supercom" project, we are interested in building Artificial Intelligence tools to supervise such composite systems. These will need to confer the system the ability to self-monitor (diagnose) and self-organise (assemble, reconfigure) to achieve optimal performance. In this framework, various projects are available on topics such as decentralised diagnosis [1], decentralised planning [2] or diagnosis using satisfiability and knowledge compilation methods [3]. The main supervisor will depend on the project, but interactions with other researcher should appear naturally. A strong background in computer science, software engineering, or mathematics, and an interest in artificial intelligence are required, as well as good programming skills. References:
A compiler for program synthesis
Supervisors: Dr Michael Comton, Dr Mark Cameron (CSIRO) A/Prof Rajeev Gore, Dr Alwen Tiu (RSISE, ANU) Background This project is about writing a compiler for automatic synthesis of 'programs' from specifications. Given the specifications for services S1,...,Sn and the specification of goal G the compiler will generate a 'program' satisfying G by using the services S1,...,Sn. This compiler could be targeted to a number of possible settings. For example, the specifications could describe web services and the goal in this case is automatic generation of composite web services from other web services. Alternitively, the specifications could describe components of a security system and the goal some larger securty property thus the compiler automatically generates the larger security framework. The Information Engineering Lab at CSIRO's ICT centre already has such a compiler for web services [1], in which a logic mapping language relates resources, such as data and web-services, to relevant concepts and properties [2], and the composition compiler automatically translates a high-level declarative composition goal into an executable workflow responsible for orchestrating the data and web-service calls necessary to materialize the specified data product. But, we are currently restricted to composing over services that have stateless, functional and deterministic operations. We would like to extend this composition environment to work with web-services that have a stateful interaction style (i.e. the operation/message that the service can perform depends on the current state of the service and the service state machine). A published method for web Service composition [3,4,5] does so, by using a modal logic and tableaux theorem proving. But it concentrates on message flow, not data. We would like to integrate the two. The project would involve investigating how to integrate both data and state into the specification of services and a composition goal. The outcome would be a logic for describing the specification of each and a compiler that can automatically generate a 'program' satisfying the goal by using the provided services. The project would suit a student with an interest in logic and specification. Refrences [1] Cameron, M. A., Taylor, K. L., Baxter, R. Web-Service Composition and Record Linking VLDB Workshop on Information Integration on the Web (IIWeb-2004) Toronto, Canada, August 2004. URL: http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf [2] Cameron, M.A. and Taylor, K. (2005), First order patterns for information integration, Proceedings of the 5th International Conference on Web Engineering (ICWE2005), 25-29 July, Sydney, Australia. URL: http://dx.doi.org/10.1007/11531371_25 [3] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), Synthesis of Underspecified Composite e-Services based on Automated Reasoning in Proc. of the 2nd Int. Conf. on Service Oriented Computing (ICSOC 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html [4] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), ESC : A Tool for Automatic Composition of e-Services based on Logics of Programs in Proc. of the 5th VLDB Workshop on Technologies for E-Services (TES 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html [5] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Rick Hull, and Massimo Mecella (2005), Automatic Composition of Transition-based Semantic Web Services with Messaging in Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB 2005). URL: http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html Tableaux reasoning for web service composition
Supervisors: Dr Mark Cameron (CSIRO) A/Prof Rajeev Gore (RSISE, ANU) Background The Information Engineering Lab has a logic-based service composition framework that is being applied across a number of domains, including Health Data Integration (Cameron et al) http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf and environmental management http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/304_cameron. pdf . A published method for web Service composition (Berardi et al 2004) http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html (Berardi et al 2005) http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html relies on a translation of finite-state machine representations of a composition goal and stateful Web services into a modal/description logic, PDL. Tableaux theorem proving can then be applied to find a web service composition that statisfies the goal. This project would aim to replicate and implement that basic approach, but considering alternative logics to improve the scope of the compositions that may be achieved. Eye-gaze tracking as an interface for virtual environments
Supervisors: Chris Gunn (CSIRO) Henry Gardner (DCS ANU) Background This project will seek to characterise the effectiveness of eye-gaze tracking as a human-computer interface for virtual environments. The concept is that participants should be able to interact with projected images using a combination of keyboard controls and eye gaze. By focussing their visual attention on icons or images on the screen, users should be able to select these icons or activate other interaction. Two application areas of interest are the Mind Attention Interface for the Wedge virtual reality theatre and a something-or-other system in which participants stand in front of a half-spherical shell screen. In the Mind Attention Interface, eye-gaze tracking equipment will be integrated with electroencephalogram (EEG) equipment which measures electric signals from the brain of a participant. Correlating eye-gaze measurements of attention with EEG data would be a possible extension of this project. Projects offered by Peter StrazdinsBasic Linear Algebra kernels for the T2 Multicore ProcessorThe Niagara or CoolThreads (T) series of processors has been motivated by commercial applications such as web, database and eCommerce servers. Thus, they have been able to achieve high throughput on workloads with many execution threads, which are integer, memory and I/O intensive. However, it has come equipped with a floating point unit on each core. The question arises: how effective is it for (parallel) numerical applications, and how can it be best programmed for these?
This project will investigate how to program the matrix-matrix multiply
kernel from the BLAS library, dgemm(), for the UltraSPARC T2,
optimizing it for both single core and multicore/thread performance. It
will also evaluate its performance. Since dgemm() forms the
computational heart of almost all dense linear algebra computations, it
is suprising that there seems to be very little work done on this topic
so far.
A variant/extension to this project is described below. Tuning the Message Passing Interface library for large-scale Shared Memory / Multicore ProcessorsThe Message Passing Interface (MPI) is the standard programming paradigm for distributed memory parallel computing, and many large and important applications have already been written in terms of MPI, and have run successfully on recent cluster computers. With the recent advent of large-scale chip multi-processors, the issue emerges of how can these applications can reap the potentially massive performance of the new generation of clusters that will be comprised of a number of processor chips connected by shared memory, where each chip is a chip multi-processor. The simplest method for such applications to run a separate MPI process on each CPU in the system, but the efficiency of this depends on whether the MPI implementation is sufficiently intelligent to make use of the fact that there will be large sub-groups of processes that are running on the same node and hence can communicate by much faster communication mechanisms.This project will investigate the efficiency of the highly sophisticated OpenMPI implementation of MPI on large-scale shared memory nodes, such as the T2 (8 core by 8 strand SPARC) , Freemont (an 8 x 2 CPU Opteron) and IBM OpenPower 720 (2 x 2 CPU x 2-way multithreading) systems hosted in DCS. It will begin by an investigation of the relevant algorithms used in OpenMPI, concentrating on barriers and collective operations, comparing them (where applicable) with corresponding operations using threaded programming paradigms. Automatic Tuning of the High Performance Linpack Benchmark(co-supervised by Alistair Rendell and Warren Armstrong)Supercomputers today are large-scale parallel processors, and the Top 500 list is the most definitive ranking of the world's fastest. While the status of this list is contentious, the list is based on the results of a single benchmark - the Linpack benchmark - which in this case solves a (very) large dense linear system. The list is dominated by Beowulf-style cluster computers: a parallel computer using (typically Commercial-off-the-Shelf) switch-based network to communicate between the processors. There is a publically available open-source code for a highly optimized benchmark called High Performance Linpack (HPL). It is also highly tunable, but the tuning currently must be done by hand. This is problematic, in the sense that it requires both knowledge if the complex algorithms used in HPL (beyond the understanding of most people who wish to use the benchmarks) and of the (cluster) supercomputer itself. Furthermore, in the case of clusters, this problem is exacerbated by the fact that the cluster may be non-homogeneous (so the benchmark results may depend on which parts of the cluster are selected), and by the fact that by their nature clusters may be easily reconfigured. Thus, it is highly desirable that this tuning process can be automated. The LAPACK For Clusters (LFC) project went partway in addressing this, in that some work on determining, given p processors, to the best logical P x Q logical processor configuration, where P*Q <= p was made. However, automatically choosing the critical blocking parameter and a plethora of algorithm variants remains to be done. This project will investigate automatic tuning methods for the HPL package for cluster computers. The search can be done statically, and employ efficient empirical methods based on techniques such as the Nelder-Mead simplex search, as is done for the ATLAS package (which itself can be used as a component of HPL). The project involves experimenting with a variety of simple and more sophisticated tuning methods, and would evaluate their suitability in this context. The DCS Saratoga cluster will form a suitable experimental platform. The project has the potential for publishable work and for the production of widely used open-source software. Relevant references include the IEEE Task Force on Cluster Computing ; a paper on solving dense linear systems on this platform (exposing some problems with HPL); Dongarra, J et al. Self Adapting Numerical Software (SANS) Effort, IBM Journal of Research and Development 50(2-3), 2006; A Case Study of HPL Benchmarking and Tuning, Verdi March, Technical Report APSTC-TR-2008-01, 07 Jan 2008. Parallelizing Applications on a Heterogeneous ClusterA Beowulf-style cluster computer is a parallel computer using Commercial-off-the-Shelf switch-based network to communicate between the processors. Clusters have proved a highly cost-effective high performance computer model, and have largely displaced the traditional massively parallel computers built with expensive vendor-supplied networks. A typical cluster tends to evolve, and hence tend to become heterogeneous. The efficient parallelization of applications using the Message Passing Interface (MPI) is reasonably well understood for homogeneous clusters. However, for heterogeneous clusters, which can not only have nodes with different processor architectures but also sections of nodes connected by different networks, the situation is still far more challenging. In particular, differences in processor and network speeds make it difficult to utilize the faster components more effectively. Using the Jabberwocky heterogeneous cluster, this project will investigate techniques used so far to parallelize (scientific/engineering) applications for heterogeneous clusters. The project will then make a selection of these applications and evaluate the effectiveness of these techniques on the Jabberwocky. New or variant techniques may then be proposed and evaluated. Relevant references include the IEEE Task Force on Cluster Computing home page, and An Overview of Heterogeneous High Performance and Grid Computing, by Jack Dongarra and Alexey Lastovetsky.
Extended Threads Emulation in an SMP Computer SimulatorSimulation is playing an increasingly important role in the design and evaluation of current and future high performance computer systems. Sparc-Sulima is an open-source UltraSPARC SMP simulator, written in C++ and developed by the CC-NUMA Project. Sparc-Sulima simulates an UltraSPARC processor by interpreting each instruction. It has timing models which predict an UltraSPARC III Cu shared memory processor performance to a high degree of accuracy. Currently, Sparc-Sulima can simulate an unmodified Solaris (UltraSPARC) binary (executable program). When that program executes a system call trap instruction, the corresponding operating system effects are `emulated' (the simulated computer state is updated directly, rather than via simulating a lengthy and complex sequence of kernel-level instructions), and simulated execution then resumes. This mechanism is handled by the SolEmn emulation layer in Sparc-Sulima. Threaded programs (e.g. using the pthreads Posix thread libraries) ultimately access thread-related operating system services via the Solaris _lwp (lightweight process) trap instructions. The limitation of the current implementation is that, at any time, there can only be one active thread per CPU. Such a limitation has been in all emulation-based computer simulators in the literature also. The aim of this project is to extend the emulation of the _lwp trap instructions to remove this limitation. It will require implementing thread switching, which can occur by time-outs, and also by the _lwp_suspend and yield system calls. The current thread context save/restore mechanisms will also have to be extended. If successful, the work should be publishable (along with the existing work on threads emulation). An extension of this project, would be to investigate which thread scheduling heuristics best approximate real machine behavior. Relevant references include: Kevin Skadron et al. Challenges in Computer Architecture Evaluation, Computer, August 2003; and papers from the Sparc-Sulima home page. Energy-Efficient Data Gathering in Wireless Sensor Networks
Supervisor: A/Prof Weifa Liang Background Wireless sensor networks have received significant recent attention due to their potential applications from civil to military domains including military and civilian surveillance, fine-grain monitoring of natural habitats, measuring variations in local salinity levels in riparian environment, etc. The sensors in sensor networks periodically collect the sensed data as they monitor their vicinities for a specific interest. The sensed data can be transmitted directly to the base station where the client queries are posted, the base station then processes the collected data centrally and responds to users queries. However, due to the fact that each sensor is equipped with energy-limited battery, the energy efficiency rather than the response time for each query is paramount of importance in sensor networks. The energy consumption for the above centralized processing is very expensive. Instead, we adopt energy-efficient in-network processing, i.e., the data gathering is implemented through a tree rooted at the base station, and the data at each sensor has to be filtered or combined locally before it is transmitted to its parent. Unlike most existing assumptions that each sensor only transmits a fixed length of message, we consider another data gathering, in which the length of the message that each sensor transmits to its parent is proportional to the sum of the message lengths of its children and itself. Thus, the total energy consumption at a node is proportional to the length of the transmitted message and the distance between the node and its parent in the tree. Specifically, on one hand, we aim to find such an energy efficient spanning tree rooted at the sink node that the network lifetime is maximized. On the other hand, we prefer the approximate result instead of the exact result in traditional data models for a given query, since the communications among the sensors are fail-prone. Under this circumstance, how close the approximate result is to the exact result becomes the essential issue. In summary, this project will be in pursue of energy efficiency on the basis of database modeling in sensor network environments. A student working on this project would be involved:
Energy-Efficient Aggregate Node Placement for Data Gathering in Wireless Sensor Networks
Supervisor: A/Prof Weifa Liang Recent advances in microelectronic technology have made it possible to construct compact and inexpensive wireless sensors. Networks formed by such sensors, termed as wireless sensor networks, have been receiving significant attention due to their potential applications in environmental monitoring, surveillance, military operations and other domains. While these applications revealed tremendous potential for capturing important environment phenomenon using sensor networks, they have also posed certain associated limitations. One of the major limitations of sensor nodes that they are powered by low power batteries, which limit the network lifetime and impact on the quality of the network. Energy conversation in sensor network operations is thus of paramount importance. As most currently deployed sensor networks are used for monitoring and surveillance purposes, the sensed data generated by the sensor network must be available for users at the base station, data gathering that extracts and collects sensed data from sensors thus is one fundamental operation in sensor networks. The energy conservation on data gathering leads to prolongation of network lifetime. The study of energy-efficient data gathering poses great challenges due to the unique characteristics and physical constraints imposed on sensor networks. In this project we study the {\it aggregate node placement problem for data gathering}, which aims to place a few powerful aggregate nodes to a dense sensor network such that the network performance can be significantly improved. Assume that a heterogeneous sensor network consists of two types of sensor nodes: cheap sensor nodes that have fixed identical transmission range and expensive aggregate sensor nodes that have various transmission ranges. Aggregate nodes have more energies, higher data transmission rate, and better processing and storage capabilities than sensor nodes. The main task of an aggregate node is to process and relay data for sensor nodes and the other aggregate nodes. Due to expensive cost associated with aggregate nodes, the number of them in a sensor network is very limited, thus, they need to be placed in the network carefully in order to increase the network connectivity, reduce the data gathering delay, and prolong the network lifetime. Specifically, the aggregate node placement problem for data gathering is as follows. Given a sensor network consisting of $n$ dense sensor nodes that are randomly deployed in a region of interest and a few number of expensive aggregate nodes $K$ ($K<< n$), the problem is to place the $K$ aggregate nodes in the region so that the network lifetime can be further maximised by answering data gathering queries. Energy-Efficient Skyline Computation and Maintenance in Sensor Networks
Supervisor: A/Prof Weifa Liang Background Technological advances in recent years have enabled the deployment of large-scalable wireless sensor networks (WSNs) consisting of hundreds or thousands of inexpensive sensors in an ad-hoc fashion for a variety of environmental monitoring and surveillance purposes. In these applications, a large volume of continuous sensed data generated by sensors needs to be either collected at the base station or aggregated within the network. Thus, WSNs usually are treated as a virtual database. The skyline query, as an important operator in modern databases for multi-preference analysis and decision making, has received much attention recently due to its wide applications. In this project we consider the skyline query problem in WSNs. We aim to devise novel, distributed evaluation algorithms for skyline query evaluation from the scratch. We also consider the skyline maintenance within sliding window environments by developing distributed maintenance algorithms for it. We will conduct extensive experiments by simulation to evaluate the performance of the proposed algorithms in terms of multiple performance metrics including the total energy consumption, the maximum energy consumption among the network nodes, and the network lifetime. A student working on this project would be involved:
Neural networks theory and applications
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as extracting rules from neural networks, information retrieval using neural networks, data mining and feature selection, cascade neural network structures, hierarchical neural network structures, and neural network applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with neural networks is necessary. Most projects will use the very popular backpropagation neural network training algorithm. If you are interested in neural networks and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me. Example projects: i. Cascade neural network stuctures can be built automatically without making decisions as to the number of neurons required to solve a problem by adding single neurons. This project would investigate the use of larger chunks such as feature maps as cascade components, which would be useful for recognising images (including faces). ii. Rule extraction: Neural networks can learn complex tasks but face the problem that human beings do not trust them as they can not understand _why_ a particular decision is made. This project focuses on rule extraction for explanation.
Fuzzy logic theory and applications
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as automated construction of fuzzy rule bases from data, hierarchical fuzzy systems, fuzzy interpolation, information retrieval using fuzzy logic,, universal approximators, and fuzzy logic applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with fuzzy logic is necessary. If you are interested in fuzzy logic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me. Example projects: i. Fuzzy document filtering: Web search engines return many documents which are not relevant to queries. Fuzzy techniques to enhance the results from search engines will be very useful. This project will investigate a number of fuzzy techniques for this purpose. ii. Combining uncertainty: Fuzzy logic is a technique which deals with uncertainty well. Sometimes data contains multiple kinds or sources of uncertainty. For example, a pedestrian could report that he saw someone he was quite sure was stealing something. He was not certain and his own reliability is another but different kind of uncertainty. This project is to develop some methods to combine different kinds of uncertainty in intelligence led investigation. (Data from a company specialising in this area will be available.) Bacterial evolutionary algorithms
Supervisor: Prof Tom Gedeon Background There are several optimisation methods inspired by natural processes, It has been shown that evolutionary algorithms are efficient tools for solving non-linear, multi-objective and constrained optimizations. The principle is a search for a population of solutions, where tuning is done using mechanisms similar to biological recombination. The operations the bacterial evolutionary algorithm were inspired by microbial evolution. Bacteria can transfer genes to other bacteria, so the rules are optimised in each bacterium.
Example projects: ii. Making pictures: If bacteria encoded the components of an abstract computer generated picture, an individual could identify nice and not nice images repeatedly to generate some 'art' which is tuned for their esthetic sense. iii. Scheduling: A previous student has worked on a GA for University exam scheduling. A comparison with bacterial algorithms would be an interesting project.
Face Recognition
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as image processing for face recognition, HCI research tool to collect facial images, biologically plausible architectures for face recognition, building and modifying computer models of real faces. No previous work on face recognition is necessary. For the image processing topic, some experience with computer graphics would be useful. If you are interested in face recognition or some similar topic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.
Example projects: ii. Video annotation: Techniques such as PCA can recognise the same face so long as the rotation/expression/illumination is not too different. This can be used to recognise one person throughout a segment of video. iii. 3-D Face model: An average face 3D model can be constructed from a single face image and animated via rotation, a simple face expression model, or illumination. This can then be used to recognise faces in multiple poses, expressions or lighting.
Data Cleaning Via Logic
Supervisor: A/Prof Rajeev Gore Background The Australian Bureau of Statistics stores its census information as a collection of tables with one row (record) for every individual with many columns like age, sex, marital status, current occupation etc. Such tables invariably contain errors, either introduced during data collection, or during data entry. For example, the census data may contain a record for a person named X who is aged 6, is married and in high school. Such data usually has an associated collection of extra conditions which capture important inter-relationships between the columns of the table. For example, the census table may have constraints like "the legal age of marriage is 16" and "a high-school student is aged between 11 and 18". According to these constraints, the entry for X is incorrect since the entry breaks both conditions, so the obvious question is to find a correction which is as close to the original as possible. The problem of finding an error in a data table is known as error-localisation, and the problem of changing an incorrect record so that it passes all the constraints is known as error-correction. The standard method for solving both of these problems is called the Fellegi-Holt method, but it is highly inefficient. Recently, Agnes Boskovitz, a phd student with the RSISE, has shown a deep connection between the Fellgi-Holt method and a well-known method from propositional logic called resolution. Efficient automated tools for performing resolution and associated logical deductions are readily available. The project is to implement the Fellegi-Holt method using these logical tools to gauge the efficacy of this logical approach to data cleaning. If successful, the prototype has huge potential in real world applications and will almost certainly lead to a publication in an international conference or journal. Feel free to email me if you are at all interested in this project. Interval Arithmetic for Computational ScienceSupervisor: A/Prof Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au An interval is a continuum of real numbers bounded by its end-points. Interval arithmetic performs operations on intervals rather than using discrete floating point representations. As a consequence of this errors associated with numerical rounding are automatically propagated and accounted for. Moreover the mathematical properties of intervals can be exploited to develop entirely new algorithms. The concept of intervals is actually not new, but was considered by Moore in 1965. What is relatively new is the fact that some hardware vendors are now providing direct support for interval arithmetic within their compilers/hardware. In this project we wish to develop an interval code for evaluating the type of integrals that are used in quantum calculations on atoms and molecules. Evaluations of these integrals involves truncation of an infinite series, followed by the application of a variety of recursion relations. As a consequence the value of the final integrals reflect both a series truncation error, and a rounding error which can vary according to the recursive path taken. This project will study the balance between these two sources of errors and the implication of this in large scale quantum calculations. This project will build on work undertaken in 2005 by honours student Josh Milthorpe who studied the basic performance of interval operations and use of intervals in fast multipole calculations. You will also be part of a team working on interval related methods funded from an Australian Research Council Discovery grant. Automatic DifferentiationSupervisor: A/Prof Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to those elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enable derivatives to be easily updated when the original code is changed. A number of programs are now available to perform automatic differentiation for code written in a variety of languages The aim of this project would be to use one of these tools to perform derivative evaluations for some computational science applications. Specifically in the modeling of systems at the atomic level derivatives with respect to atomic positions are hugely important determining things like the frequencies at which molecules vibrate. Myself and others have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process and for it to be relatively efficient would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes. Numerical Simulation on Graphics Processing Units (GPU)Supervisor: A/Prof Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au Background Over the past 10 years graphics processing units (GPU) have become ubiquitous in desktop computers. Modern GPUs now allow substantial user programability that lets them be used for more general-pupose computation. Indeed in a recent paper that was presented at SC04, Fan et al detail their efforts to construct a "GPU Cluster for High Performance Computing". In this work the authors plugged in 32 GPUs to a Xeon cluster system and in so doing increased the total theoretical peak performance of the machine by 512Gflops - for a cost just under US$13,000!! Tracking this development there has been a number of groups developing programming environments that can exploit the GPU for general computations. Details of many of these are available at the GPGPU web site. Two of particular interest to this project are Brook and Sh. The former is built on top of C while the latter is built on top of C++. Brook for example extends standard ANSI C with the concept of a "stream", which is a new data type representing data that can be operated on in parallel. The aim of this project is to investigate the potential for using the GPU in molecular science computations. The initial target application is likely to be N-body simulations, as typified by the simulation of atom motions. Careful consideration will be given to accuracy, since currently the GPU is limited to single precision computations. Thus the project will address two issues, i) the in principle feasibility of using the GPU for these types of problems, and ii) the scientific usefulness given the current single precision limitation.
MPI and Global File System I/O over InfiniBand using Virtual Lanes
Supervisors: Richard Alexander,
A/Prof Alistair Rendell,
Peter Strazdins InfiniBand networks provide for multiple, parallel system area networks using the same physical hardware, using a concept called Virtual Lanes. Virtual Lanes allow the network to be split for a variety of purposes including the separation of HPC cluster management, message passing and I/O traffic. There are a number of parallel filesystems used in heterogeneous Linux clusters, with the most widely used being PVFS2, GPFS, Lustre in addition to the legacy NFS system. One possibility with large SMP nodes in a cluster is to use InfiniBand virtual lanes and operating system virtual machines in a way that allows nodes of a cluster to be used as both compute nodes and filesystem storage nodes. This project will implement the Lustre filesystem on a small test cluster with large SMP nodes and the InfiniBand interconnect, using virtual machines in a novel way to segregate compute functions from I/O. References:
Declarative theorem-proving
Supervisor: Dr Michael Norrish This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities. When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic, and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”; important bugs that are easy to find. The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well. Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up. The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions; to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data. Finding Models of Unsolvable First Order Goals
Supervisors: Dr Michael Norrish When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof. The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts. The student's task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user's benefit. This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers). The student would be jointly supervised by Michael Norrish and Peter Baumgartner. Projects from NICTA researchersProject I
Project Title: Knowledge
Discovery from Internet Malicious Data Collected by Honeypots
Supervisor: Dr Warren JinKeywords: Data Mining, Internet security, intrusionThe Internet security has been
becoming more and more important as Cyber attacks have significantly increased
in volume, coordination and sophistication due to the spread of worms and
botnets. A botnet is a large collection of compromised machines under a common
Command-and-Control infrastructure, typically used for nefarious purposes. A
honeypot is a computer trap specifically designed to attract and detect
malicious attacks, in particular botnet attacks. For example, the Leurre.com
and the SGNet honeypots are distributed over 30 countries and they have
collected hundreds of gigabytes of Internet malicious traffics. This Honours project, as a part of DSTO-NICTA joint honeypot project, is to improve or develop techniques for discovering actionable knowledge from Internet malicious data collected by the Leurre.com and the SGNet honeypots. The project may concentrate on following topics, each of which could be developed into a quality Honours thesis.
The new technique would be very useful for proactive security, such as Internet intrusion prevention. Participants will have the opportunity to get hands-on experience with some of the latest technologies. The participants will access large real-world databases collected by honeypots. They can experience both academic and industrial research work environments, under supervision of Dr Warren Jin. Each student participant may receive stipend for living expenses up to $1500 per semester depending on the contribution to the honeypot project. Student participant should either have
excellent programming skills in Java, C++, or Python, or have strong
mathematical background. Experience of networking or Internet security or
database is useful. Knowledge of data mining, machine learning or statistics
is an advantage, but not necessary. Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information. Project II
Project Title: Large
Complicated Data Stream Mining
Supervisor: Dr Warren JinKeywords: Data stream mining, exploratory analysis
Owing to the rapid progress on data collection and advanced database system technologies and World Wide Web, various complicated data have been explosively growing. There are lot of examples of data streams. For example, a satellite-mounted remote sensor is generating data constantly. World-wide distributed honeypots are collecting malicious Internet traffic data every second in order to monitor the spread of worms and botnets. These data streams are massive (e.g., the Leurre.com and the SGNet honeypots are collecting hundreds of megabytes data over 30 countries every day), temporally ordered, rapidly evolving, various types of attributes (e.g., continuous, categorical, string) and potentially infinite. Mining complicated data streams is one of very important real-world data mining tasks. This Honours project is to develop or improve a data mining technique on complicated data streams. The data mining technique could be, but not limited to, data visualisation, temporal association analysis, clustering, time series analysis, outlier detection and classification. The technique can be used to find interesting events or hot spots within data streams. The project will consider mixed data attributes such as continuous (e.g., packet length), categorical (e.g., TCP or not) and sequential (e.g., payload strings) attributes. The project may also take into account of scalability for large amount of streaming data. Participants will have the opportunity to get hands-on experience on up-to-dated stream mining techniques. The participants may be engaged in using these techniques to handle real-world challenges. They can experience both academic and industrial research work environments, under supervision of Dr Warren Jin. Each student participant may receive stipend for living expenses up to $1500 per semester depending on the contribution to the use-inspired research. Interested students should have good
understanding of data mining, visualisation or time series analysis.
Programming skills in C++, MatLab or R are useful. Experience of databases or
data analysis or distributed computing is a plus. Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information. Project III
Project Title: Population
Projection for New
|
|||||||||||||||||||||||||||||||||||||||||||||||||
|
Page last updated: 25 May 2009 Please direct all enquiries to: webmaster@cs.anu.edu.au Page authorised by: Head of Department, DCS |
| The Australian National University — CRICOS Provider Number 00120C |