
Honours topics

Half of your time as an honours student is spent working on a project. But first you have to find a project topic.

The "official" reference for projects proposed by potential supervisors is the CECS projects database.

Projects are available there for all levels of research, including short projects, summer scholarship projects, Honours projects, Masters and PhD projects. All potential research students at any level are urged to browse it.

If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.

You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, then feel free to discuss it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.

Former Project Topics

For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.
  • 2009 Proposed Project Topics
  • 2008 Proposed Project Topics
  • 2007 Proposed Project Topics
  • 2006 Proposed Project Topics
  • 2005 Proposed Project Topics
  • 2004 Proposed Project Topics
  • 2003 Proposed Project Topics
  • 2002 Proposed Project Topics
  • 2001 Proposed Project Topics


    2008 Honours project proposals



    Data mining projects

    Why do students fail? Data mining the DCS student database (Computer Science honours project)

    The ANU Department of Computer Science has been collecting student course enrolments, marks and grades through its Faculty Information System (FAIS) for several years.

    The aim of this honours project is to develop data mining techniques that allow analysis of the FAIS database with the objective of better understanding student performance, progress and retention. The questions that the department is interested in include, for example: Why do students stop studying computer science? What correlations are there between their marks in different courses? Can we predict that a student will have problems in a certain course based on her or his past performance? Can we identify students who might be at risk of failing or dropping out of computer science courses early in their studies?
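
    As a purely illustrative sketch (the features and marks below are invented, not the real FAIS schema, and scikit-learn is just one convenient library), a first experiment along these lines might train a simple classifier on past marks to flag students at risk in a later course:

        # Illustrative only: synthetic records with hypothetical features.
        from sklearn.tree import DecisionTreeClassifier

        # (mark in first-year course A, mark in course B) -> later failure? (1 = yes)
        past_students = [
            ((85, 78), 0), ((45, 52), 1), ((60, 40), 1),
            ((72, 68), 0), ((50, 55), 1), ((90, 88), 0),
        ]
        X = [marks for marks, _ in past_students]
        y = [label for _, label in past_students]

        model = DecisionTreeClassifier(max_depth=2).fit(X, y)

        # Flag a current student with marks (58, 49) as "at risk" or not.
        print(model.predict([(58, 49)]))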

    This project will include the following components:

    • Exploration, development and application of data mining techniques that allow mining of the multi-relational data as available in FAIS. This will include using the open source data mining tool Rattle as well as development of specific techniques required for this project (likely using the Python programming language).

    • Development of a data generator - based on real, summarised FAIS data - that will allow creation of synthetic data to be used for later testing and evaluation of the data mining techniques to be developed in this project.

    • Analysis of real, anonymised FAIS student data. This work will be done on a secure compute server (which is currently being purchased by the ANU Data Mining group for real world data mining projects such as this that involve sensitive data).
    Note: Parts of this project will require a human ethics clearance which is currently being applied for through the Human Ethics committee at the ANU Research Office. Depending upon the outcomes of this application there might be changes to this project.

    Supervisor: Peter Christen


    Data linkage projects

    Background

    Many organisations today collect massive amounts of data in their daily business. Examples include credit card and insurance companies, the health sector (e.g. Medicare), taxation, statistics, police/intelligence and telecommunications. Data mining techniques are used to analyse such large data sets, in order to find patterns and rules, or to detect outliers.

    In many data mining projects, information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such data linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic data linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc.

    The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques and algorithms for large scale high-performance data linkage. We have been developing an open source data linkage system called Febrl (Freely extensible biomedical record linkage), which includes modules for data cleaning and standardisation (of names, addresses, dates of birth, etc.), deduplication, data linkage (or matching), and geocoding.

    For more information please have a look at our publications as well as presentations that describe our research in more detail.

    The following student projects address important issues within our data linkage project.

    Supervisor for all projects: Peter Christen


    Probabilistic data generation (Computer Science honours project)

    One of the difficulties in data linkage research is getting access to real world data sets containing real names and addresses, as such data is protected by privacy and confidentiality regulations. Developing and testing new and improved data linkage algorithms therefore becomes very difficult.

    We have developed a data set generator (which is part of Febrl). However, this generator is very simple, and various ways of improving it are possible. For more details please have a look at the recent publication Probabilistic Data Generation for Deduplication and Data Linkage.

    Student research project

    The aim of this honours research project is to further explore how our simple data set generator can be improved. This would include analysing the literature in this area (artificially creating data sets), analysing the existing data generator code, and developing improved techniques that reflect reality more accurately. Issues involved are:

    • Accurately model typographical errors as introduced by (a) scanning and optical character recognition (OCR), (b) keying (i.e. errors introduced when people type data on a keyboard), and (c) phonetic errors introduced when data is entered over the phone (and is manually typed in). A toy sketch of this kind of corruption appears after this list.
    • Model the dependencies between attributes (for example if a person moves, most of her or his address details will change).
    • Model household and family data (i.e. not just generate single records independently, but create sets of records for families, or shared households).
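
    As a minimal sketch of what modelling such errors might look like (the confusion tables and probability below are invented for illustration and are not Febrl's), a clean name could be corrupted as follows:

        import random

        # Hypothetical confusion tables; a real generator would derive these from
        # studies of OCR, keying and phonetic error frequencies.
        OCR_CONFUSIONS = {"o": "0", "l": "1", "s": "5"}
        KEYBOARD_NEIGHBOURS = {"a": "s", "e": "r", "n": "m"}
        PHONETIC_SUBS = {"ph": "f", "ck": "k"}

        def corrupt(name, p_error=0.3):
            """Return a copy of `name` with at most one injected error."""
            if random.random() > p_error:
                return name
            table = random.choice([OCR_CONFUSIONS, KEYBOARD_NEIGHBOURS, PHONETIC_SUBS])
            for src, dst in table.items():
                if src in name:
                    return name.replace(src, dst, 1)
            return name

        print(corrupt("stephanie"))   # e.g. "5tephanie", "stephanir" or "stefanie"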

    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Parallelisation of data linkage techniques (Computer Science honours project)

    One of the problems when linking large collections of data with tens or even hundreds of millions of records is the complexity of data linkage techniques. Potentially each record in one data set has to be compared with every record in the second data set, resulting in a quadratic complexity O(n²). Techniques known as blocking are normally used to reduce the number of record pair comparisons (for example, only records that have the same postcode value are compared with each other). However, linking large data sets still takes a lot of time.
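
    To make the effect of blocking concrete, here is a minimal sketch (invented example records, not Febrl code) in which only records sharing a postcode are ever compared:

        from collections import defaultdict
        from itertools import product

        dataset_a = [("peter", "smith", "2602"), ("sally", "jones", "2600")]
        dataset_b = [("pete", "smith", "2602"), ("s", "jones", "2601")]

        def block_by_postcode(records):
            blocks = defaultdict(list)
            for rec in records:
                blocks[rec[2]].append(rec)   # postcode is the blocking key
            return blocks

        blocks_a = block_by_postcode(dataset_a)
        blocks_b = block_by_postcode(dataset_b)

        # Compare only pairs falling into the same block, instead of all
        # len(dataset_a) * len(dataset_b) pairs.
        candidate_pairs = [pair
                           for key in blocks_a if key in blocks_b
                           for pair in product(blocks_a[key], blocks_b[key])]
        print(candidate_pairs)   # 1 candidate pair instead of 4

    One natural way to parallelise this step is to distribute whole blocks (or partitions of large blocks) across cores, which is where the data distribution and load balancing issues listed below arise.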

    Only a very limited amount of research has been done in this area (we are only aware of four publications so far that deal with parallel data linkage, including one of our own papers: Febrl - A Parallel Open Source Data Linkage System, PAKDD'04).

    Student research project

    The aim of this project is to develop parallel data linkage techniques that allow the linking of large data sets on modern multi-core computing platforms. Issues which have to be considered include:

    • Data distribution and load balancing
    • Scalability
    • Privacy and confidentiality (as names and addresses are normally used for the linkage)
    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of parallel computing techniques, as well as statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Improved record pair classification techniques (Computer Science honours project)

    A crucial step in the data linkage process is the classification of pairs of records that are being compared into either matches or non-matches. For example, consider the following two records containing a name and address each:

    Dr Smith, Peter 42 Miller Street 2602 O'Connor
    Pete Smith 42 Miller St 2600 Canberra A.C.T.

    Are these two records representing the same person, or two different people?
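
    A minimal sketch of this classification step (a hand-weighted threshold rule over field similarities, invented purely for illustration; the classifiers studied in this project would be far more sophisticated):

        from difflib import SequenceMatcher

        def sim(a, b):
            """Approximate string similarity in [0, 1]."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        rec1 = {"name": "Dr Smith, Peter", "street": "42 Miller Street",
                "postcode": "2602", "suburb": "O'Connor"}
        rec2 = {"name": "Pete Smith", "street": "42 Miller St",
                "postcode": "2600", "suburb": "Canberra A.C.T."}

        # Weighted sum of per-field similarities, then a threshold decision.
        weights = {"name": 0.4, "street": 0.3, "postcode": 0.2, "suburb": 0.1}
        score = sum(w * sim(rec1[f], rec2[f]) for f, w in weights.items())

        print(round(score, 2), "match" if score > 0.7 else "non-match")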

    In the past few years researchers from the data mining, machine learning, and AI communities have explored various classification techniques, including decision trees, genetic algorithms, clustering, active learning, support vector machines, and instance-based classifiers, to improve the classification of record pairs. Other researchers have applied techniques from information retrieval and database research to improve the linkage classification quality. However, this research has been fragmented and so far no large study comparing the many classifiers has been conducted using different data sets (both synthetic and from the real world).

    Student research project

    The aim of this project is to (1) implement prototypes of different classifiers and then conduct studies comparing their performance using different data sets, and (2) develop new classifiers, based on existing methods, that are well suited to the data linkage classification problem. Issues involved are:

    • Which classifier performs best for what types of errors in the data (e.g. simple typographical errors, missing words, swapped words, etc.)
    • Does a combination of classifiers work better than one single classifier?
    • Comparison of supervised techniques and unsupervised techniques.
    • Exploration of active learning techniques (in order to reduce the manual effort required to create training data)
    A student working on this project would be involved through

    • Participation in an exciting research project.
    • Exploration and development of statistical, machine learning, and data mining techniques.
    • Object-oriented cross-platform programming (using Python and friends).
    • Open source software development (see http://sourceforge.net/projects/febrl).


    Semantic Web projects

    User Consumable Interface for the Semantic Web

    The vision of the Semantic Web is to equip the World Wide Web with machine-processible and machine-understandable semantics, so that the Web can become a universal medium of information and knowledge exchange for both computers and human beings. Towards this vision, a number of building blocks of the Semantic Web have been created, including the Resource Description Framework (RDF), the Web Ontology Language (OWL) and several query languages such as SPARQL. It can be foreseen that in the near future a lot of information on the Web will be described or annotated using these facilities, resulting in a large-scale knowledge base. Creating a consumable interface for naive users to access such a large knowledge base is becoming a new challenge, as traditional interfaces to databases or knowledge bases are programmer-oriented rather than user-oriented.

    Goals for this project: In this project, we will design and implement a number of techniques to facilitate users' access to large-scale data sources on the Semantic Web.

    Requirements: Students will need to be proficient systems programmers with good C++ or Java skills.

    Student's gain: The student will work on an interesting research problem with potential impact on the future World Wide Web. The work will be carried out in a research group at CSIRO. The student will gain insight into R&D in both industry and academia. The results will include a working prototype that can be demonstrated at an international conference and a publication in an academic journal.

    Literature and Background: Database usability; DBpedia -- a dataset for the semantic web.

    Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au), Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)


    Semantics in Sensor Networks

    Background: Despite the many different domain-specific deployments and the heterogeneity of sensor networks and sensing data, sensor networks share a common feature: they all collect and periodically transmit information from some set of sensors, and they all must carefully manage limited power and radio bandwidth to ensure that essential information is collected and reported in a timely fashion. From a data storage point of view, one may think of a sensor network as a distributed database that is able to conduct query processing. This has made the design and implementation of query processing a very popular topic. However, query processing in sensor networks works differently from that in a traditional DBMS, given that the former is resource constrained (e.g. limited energy, memory and computation ability). Furthermore, the volume of the sensed data can be highly variable (e.g. different sampling rates) within a time period, and access to this data may be delayed arbitrarily due to the motion of sensors and network latency. These add new challenges to the not yet fully solved query processing problem, and more needs to be done to enable the success of sensor network applications.

    Project outcomes: In this project, we will investigate how "semantics" can help to improve query processing, beyond the conventional development of a schema for metadata about (semantic) transformation. Prototypes will be based on a real world problem. The Fleck platform (*Fleck* - the sensors and sensor operating system developed by the CSIRO ICT Centre) is available for prototyping work. The outcome will include a working prototype for future work to build on.

    Requirements: Students will need to be familiar with the C programming language.

    Student's gain: The student will be working on one of the most interesting sensor network issues with other research scientists from CSIRO. At the end of this project, students will have gained insights into R&D from both industry and academia and will be ready to explore other issues arising from sensor network applications.

    Supervisor: Dr. Lily Li
    Email: lily.li@csiro.au


    Using semantics to better understand the Web

    Background: The Semantic Web means different things to different people because it is very broad. It may mean:
    • RDF, Microformats, OWL
    • web services
    • AI techniques
    • Simple and tangible applications
    So it is not surprising that it has been used by Google, Yahoo, and Firefox to attract users, and certainly it will play a more important role in the next generation of search engines. Because users simply look for the utility and usefulness of a system rather than the technologies on which it is built, developing new tools and applications on top of the Semantic Web is the real challenge. In this project, we aim to develop algorithms that (1) extract entities out of unstructured web pages; (2) handle a continuous influx of user data (e.g. from Wikis); (3) parse it; and then (4) update the semantic repositories to keep up with changes in the world (a world on the Web).

    Project outcomes: A number of algorithms will be developed, eventually leading to easier access to information on the Web beyond plain keyword search.

    Requirements: Students will need (1) knowledge of web services technologies; (2) a good understanding of RDF, OWL and reasoning over OWL; and (3) C++/Java skills.

    Student's gain: By keeping up with the latest developments in the Semantic Web, students will not only gain insights into the vision of the Semantic Web but also get some hands-on experience of how to realise semantics on the Web. Algorithms developed in this project may have the potential to be used by popular search engines to deal with unstructured information on the Web more efficiently.

    Supervisor: Dr. Lily Li
    Email: lily.li@csiro.au


    Scalable Work Queues for Parallel Garbage Collection

    Physical limits mean that future performance improvements in processor design will mostly come from increased application-level parallelism. Thus the ability for systems to scale well on parallel architectures is an extremely important issue. Since managed languages such as Java and C# have become dominant in the application development domain, the scalability of managed runtimes has become a critical research issue. In this project we explore the scalability of a fundamental data structure at the heart of a Java runtime system---a load balancing parallel work queue.

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will examine the scalability and pathologies of the load balancing parallel work queue used by MMTk, which is the memory management toolkit used by Jikes RVM. Hopefully the project will either confirm the scalability of the existing system or identify (and implement) a more scalable work queue design.
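
    As a toy illustration of the data structure in question (a Python sketch of per-worker deques with stealing, not MMTk's actual Java implementation, which uses much finer-grained synchronisation than a single global lock), consider:

        import collections, random, threading

        NUM_WORKERS = 4
        deques = [collections.deque(range(i * 100, (i + 1) * 100))
                  for i in range(NUM_WORKERS)]
        lock = threading.Lock()
        done = collections.Counter()

        def worker(wid):
            while True:
                with lock:                     # a scalable design would avoid this global lock
                    if deques[wid]:
                        item = deques[wid].pop()                  # take from own tail
                    else:
                        victims = [d for d in deques if d]
                        if not victims:
                            return                                # no work left anywhere
                        item = random.choice(victims).popleft()   # steal from a victim
                done[wid] += 1                 # stand-in for "process work item `item`"

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
        for t in threads: t.start()
        for t in threads: t.join()
        print(dict(done))                      # work ends up spread across the workers

    The scalability questions in this project concern how such a structure behaves when many collector threads contend for work at once.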

    Requirements

    Students will need to be proficient programmers with a good basic understanding of computer architecture and parallel algorithms. The work will all be done in a slightly restrictive subset of Java, so experience with Java is essential. Experimental design and data analysis are key to the first phase of the project, so experience in this area will be an asset (of course there will be plenty of assistance from your supervisor, particularly in this area).

    Student's gain

    The student will gain hands-on experience with an increasingly important problem in the context of the leading research infrastructure for managed runtime research, in the setting of a world leading team.

    Literature

    This is a well-focussed problem and does not require any significant background reading. The load balancing work queue is reasonably well insulated from the rest of the environment we work in, so a deep understanding of MMTk or Jikes RVM is not necessary. See www.jikesrvm.org

    Contact Steve Blackburn

    Architectural Analysis of Garbage Collection Behavior

    Trends in microprocessor design are toward massively multicore chips. One direction within this burgeoning research area is that of asymmetric multicore design, where the cores within a chip are not homogeneous. In this setting some cores may serve very specific purposes. Given a) the centrality of garbage collection to managed runtimes, and b) the very limited requirements of garbage collection algorithms, it may make sense to have dedicated garbage collection cores. To this end, this project will explore the architectural "footprint" of a garbage collector with a view to ascertaining the base requirements for a dedicated garbage collecting core.

    Supervisor: A. Prof Steve Blackburn, ANU
    Goals for this project

    The project will use an architectural profiling tool (valgrind) to profile a range of garbage collection workloads and thus attempt to characterize the architectural requirements of garbage collectors.

    Requirements

    Students will need to be proficient programmers with a good basic understanding of computer architecture. The primary tool for this project will be valgrind, a publicly available virtualization tool written in C. The student working on this project should be an accomplished systems programmer.

    Student's gain

    The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes at the architectural level. This project is a logical starting place for a much deeper examination of the problem and, eventually, realization in an FPGA.

    Literature

    A basic understanding of garbage collection would be helpful, but is not essential. Hennessy and Patterson's standard text on computer architecture ("Computer Architecture: A Quantitative Approach", 4th edition) is an invaluable starting point for this work.

    Contact Steve Blackburn


    Profiling Java Virtual Machine Performance

    The performance of managed runtime systems and the applications which execute on them is a hot topic of academic and industrial research. However, dynamic compilation, feedback directed optimization, and concurrency each make performance analysis significantly more complex than in traditional ahead-of-time compiled, static environments such as C programs. This project will explore techniques for profiling the performance of virtual machines and applications running over them. These techniques will include sampling via interrupts (due to timers and/or hardware performance counter triggers) and direct instrumentation of compiled code. Additionally, the project will explore dynamic generation and maintenance of code annotations so that machine code and calling contexts can be tied back to source code in the application or VM source code.

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will implement a number of different profiling techniques and a basic mechanism for dynamic maintenance of mappings between machine code and source code, in the context of Jikes RVM, a widely used research JVM.

    Requirements

    Students will need to be proficient systems programmers with good C and Java skills.

    Student's gain

    The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.

    Literature and Background

    Familiarity with a simple profiling tool such as gprof would be an asset. Some interesting related work can be found at the following links: http://portal.acm.org/citation.cfm?id=1134012 and http://www.cs.utexas.edu/users/mikebond/pcc-tr-2007.pdf

    Contact Steve Blackburn


    Bubble VM: the VM within the VM within the VM....

    Surprisingly, and despite the stated advantages of managed languages (such as Java and C#), no commercial managed runtime systems are written in the language they implement. However, there are two successful research runtimes which take this approach, Jikes RVM (an open source Java-in-Java VM), and Bartok (an ahead-of-time C# system on which the Singularity OS project is built). Both of these projects have demonstrated considerable promise, and both have demonstrated that performance need not be a roadblock to such VM implementations. One of the most vexing issues faced by such VMs is the potential for blurring the distinction between the application and the host on which it runs. In this project we will explore how to better understand the boundary between the application and the supporting VM, using a rough analogy to the isolation that exists between user mode and kernel mode in a modern operating system. A good understanding of this problem could fundamentally change the design of such JVMs, increasing their reliability, improving their debuggability, and improving security. Such a VM could run on any other VM and could run any application---thus it could be self hosting and in principle could host an arbitrary nesting of instances of itself within itself!

    Supervisor: A. Prof Steve Blackburn, ANU

    Goals for this project

    This project will start by trying to clearly identify the application-VM boundary in Jikes RVM, a widely used research JVM. The project will try to apply the OS metaphor of trapping from one context or mode to another. Depending on the student's ability and ambitions, the project could lead to much more ambitious goals such as enabling Jikes RVM to self host.

    Requirements
    Students will need to be proficient systems programmers with good C and Java skills.

    Student's gain

    The student will gain a deep insight into the design and implementation of managed runtime systems. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.

    Literature and Background
    There is not a great deal of prior work in this area. However, the original paper describing Jalapeno (which became Jikes RVM) describes some of the most basic issues that a Java-in-Java design must face.

    Contact: Steve Blackburn


    Projects offered by Dr Eric McCreath


    Instance Based Methods

    Supervisors: Dr Peter Baumgartner, Prof John Slaney
    E-Mail: Peter.Baumgartner AT nicta.com.au

    Formal logic provides a mathematical foundation for many areas of computer science. Logical languages are used as specification language within, e.g., program development and verification, hardware design and verification, relational databases, and many subfields of Artificial Intelligence. Automated Deduction is concerned with the design and implementation of algorithms based on logical deduction for solving problems in these areas. Instance Based Methods are a relatively new family of methods for Automated Deduction; they share the principle of carrying out proof search by maintaining a set of instances of input formulas and analyzing it for satisfiability until completion.

    I offer several research-oriented honours projects centered around Instance Based Methods, in particular in connection (experimentation/extension) with our own Darwin system (http://goedel.cs.uiowa.edu/Darwin/).



    Supercom

    Supervisor: Dr Sylvie Thiébaux, Dr Anbulagan, Dr Olivier Buffet, Dr Jinbo Huang, Dr Yannick Pencolé
    E-Mail: sylvie.thiebaux-AT-anu.edu.au, anbulagan-AT-nicta.edu.au, olivier.buffet-AT-nicta.edu.au, jinbo.huang-AT-nicta.edu.au, yannick.pencole-AT-anu.edu.au

    Background

    Composite systems feature simple components organised into a highly reconfigurable and modular architecture, which allows them to assemble appropriately to achieve complex operations. Examples include web and grid services, energy distribution systems, water management systems, chemical plant control systems, telecommunication and computer networks, to name but a few.

    In the "Supercom" project, we are interested in building Artificial Intelligence tools to supervise such composite systems. These will need to confer the system the ability to self-monitor (diagnose) and self-organise (assemble, reconfigure) to achieve optimal performance.

    In this framework, various projects are available on topics such as decentralised diagnosis [1], decentralised planning [2] or diagnosis using satisfiability and knowledge compilation methods [3]. The main supervisor will depend on the project, but interactions with the other researchers will arise naturally. A strong background in computer science, software engineering, or mathematics, and an interest in artificial intelligence are required, as well as good programming skills.

    References:

    [1] Yannick Pencolé and Marie-Odile Cordier, A formal framework for the decentralised diagnosis of large scale discrete event systems and its application to telecommunication networks, Artificial Intelligence, Volume 164, Number 1-2, May 2005.
    [W: http://www.informatik.uni-trier.de/%7Eley/db/journals/ai/ai164.html#PencoleC05]

    [2] E. Amir and B. Engelhardt, Factored Planning, in 18th Intl' Joint Conference on Artificial Intelligence (IJCAI'03), 2003.
    [W: http://www.cs.uiuc.edu/~eyal/papers/factored-plans-ijcai03.pdf]

    [3] Chu Min Li and Anbulagan, Look-Ahead Versus Look-Back for Satisfiability Problems, CP 1997, Linz, Austria.
    [W: http://web.rsise.anu.edu.au/~anbu/papers/CP97Anbulagan.pdf]


    Word Processing Structure Analyser

    Supervisors: Dr Ian Barnes and Dr Peter Raftos

    Background

    An important part of current research being done into the long-term preservation and accessibility of word processor (WP) documents is the conversion of (primarily) Word and OpenOffice files to some structured XML (usually DocBook-XML).

    This work assumes that the WP documents have already been structured in a systematic manner using the WP software's inbuilt style functions. However, most users do not use styles; generally, they are not even aware they exist. And even if every user were to start using them from today, there remains a legacy of unstyled (and therefore unstructured) WP documents.

    Most users rely on the look of document content to deduce structure: reasonably arbitrary choices of font size and weighting. It would be very useful if algorithms could be developed that analyse (or “guess”) the structure of a WP document. This should then lead to the development of a working tool to perform the structuring of unstructured WP documents.

    The Work

    Using a series of unstructured trial documents of varying complexity, develop:

    1. Algorithm(s) that can deduce, with a reasonable level of accuracy (80-90%), the structure within the document (a toy sketch of one such heuristic appears after this list). Basic structure includes body text and headings from level 1 to 6, as well as associated sections, multi-level lists (both numbered and bulleted), block quotes and definition lists. This may be expanded.

    2. Using the algorithms, a Java/XML-based tool to convert a set of WP documents from Word or OpenOffice formats to structured (that is, styled) OpenOffice documents, suitable for submission to Dr Ian Barnes' Scholars Workbench (SWB) application.
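
    To make the flavour of such a heuristic concrete, here is a toy sketch (invented thresholds and a made-up paragraph list, not a proposed design): given paragraphs annotated with font size and weight, guess heading levels relative to the dominant body-text size.

        from collections import Counter

        # (text, font size in points, bold) runs extracted from an unstyled document.
        paragraphs = [
            ("My Thesis", 20, True),
            ("Introduction", 16, True),
            ("This chapter describes ...", 11, False),
            ("Background", 16, True),
            ("Earlier work on ...", 11, False),
        ]

        # Treat the most common font size as body text; larger or bold runs become headings.
        body_size = Counter(size for _, size, _ in paragraphs).most_common(1)[0][0]

        def guess_style(size, bold):
            if size <= body_size and not bold:
                return "body text"
            level = max(1, min(6, 1 + round((20 - size) / 4)))   # crude size-to-level mapping
            return "heading %d" % level

        for text, size, bold in paragraphs:
            print(guess_style(size, bold), "->", text)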

    There may be a scholarship attached to this project.


    A compiler for program synthesis

    Supervisors: Dr Michael Compton, Dr Mark Cameron (CSIRO), A/Prof Rajeev Gore, Dr Alwen Tiu (RSISE, ANU)
    E-Mail: Mark.Cameron@csiro.au
    E-Mail: Rajeev.Gore@anu.edu.au

    Background

    This project is about writing a compiler for automatic synthesis of 'programs' from specifications. Given the specifications for services S1,...,Sn and the specification of a goal G, the compiler will generate a 'program' satisfying G by using the services S1,...,Sn. This compiler could be targeted at a number of possible settings. For example, the specifications could describe web services, and the goal in this case is automatic generation of composite web services from other web services. Alternatively, the specifications could describe components of a security system and the goal some larger security property, so that the compiler automatically generates the larger security framework.

    The Information Engineering Lab at CSIRO's ICT Centre already has such a compiler for web services [1], in which a logic mapping language relates resources, such as data and web services, to relevant concepts and properties [2], and the composition compiler automatically translates a high-level declarative composition goal into an executable workflow responsible for orchestrating the data and web-service calls necessary to materialize the specified data product. However, we are currently restricted to composing over services that have stateless, functional and deterministic operations. We would like to extend this composition environment to work with web services that have a stateful interaction style (i.e. the operation/message that the service can perform depends on the current state of the service and the service state machine). A published method for web service composition [3,4,5] does so by using a modal logic and tableaux theorem proving, but it concentrates on message flow, not data. We would like to integrate the two.

    The project would involve investigating how to integrate both data and state into the specification of services and a composition goal. The outcome would be a logic for describing the specification of each, and a compiler that can automatically generate a 'program' satisfying the goal by using the provided services. The project would suit a student with an interest in logic and specification.

    References

    [1] Cameron, M. A., Taylor, K. L., Baxter, R., Web-Service Composition and Record Linking, VLDB Workshop on Information Integration on the Web (IIWeb-2004), Toronto, Canada, August 2004. URL: http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf

    [2] Cameron, M. A. and Taylor, K. (2005), First Order Patterns for Information Integration, Proceedings of the 5th International Conference on Web Engineering (ICWE2005), 25-29 July, Sydney, Australia. URL: http://dx.doi.org/10.1007/11531371_25

    [3] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), Synthesis of Underspecified Composite e-Services based on Automated Reasoning, in Proc. of the 2nd Int. Conf. on Service Oriented Computing (ICSOC 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html

    [4] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), ESC: A Tool for Automatic Composition of e-Services based on Logics of Programs, in Proc. of the 5th VLDB Workshop on Technologies for E-Services (TES 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html

    [5] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Rick Hull, and Massimo Mecella (2005), Automatic Composition of Transition-based Semantic Web Services with Messaging, in Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB 2005). URL: http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html

    Tableaux reasoning for web service composition

    Supervisors: Dr Mark Cameron (CSIRO) A/Prof Rajeev Gore (RSISE, ANU)
    E-Mail: Mark.Cameron@csiro.au
    E-Mail: Rajeev.Gore@anu.edu.au

    Background

    The Information Engineering Lab has a logic-based service composition framework that is being applied across a number of domains, including Health Data Integration (Cameron et al) http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf and environmental management http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/304_cameron.pdf . A published method for web service composition (Berardi et al 2004) http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html (Berardi et al 2005) http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html relies on a translation of finite-state machine representations of a composition goal and stateful Web services into a modal/description logic, PDL. Tableaux theorem proving can then be applied to find a web service composition that satisfies the goal. This project would aim to replicate and implement that basic approach, but considering alternative logics to improve the scope of the compositions that may be achieved.


    Eye-gaze tracking as an interface for virtual environments

    Supervisors: Chris Gunn (CSIRO) Henry Gardner (DCS ANU)
    E-Mail: Chris.Gunn@csiro.au
    Henry.Gardner@anu.edu.au

    Background

    This project will seek to characterise the effectiveness of eye-gaze tracking as a human-computer interface for virtual environments. The concept is that participants should be able to interact with projected images using a combination of keyboard controls and eye gaze. By focussing their visual attention on icons or images on the screen, users should be able to select these icons or activate other interaction. Two application areas of interest are the Mind Attention Interface for the Wedge virtual reality theatre and a something-or-other system in which participants stand in front of a half-spherical shell screen. In the Mind Attention Interface, eye-gaze tracking equipment will be integrated with electroencephalogram (EEG) equipment which measures electric signals from the brain of a participant. Correlating eye-gaze measurements of attention with EEG data would be a possible extension of this project.


    Projects offered by Peter Strazdins


    Energy-Efficient Data Gathering in Wireless Sensor Networks

    Supervisor: Dr Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Background

    Wireless sensor networks have received significant recent attention due to their potential applications in civil and military domains, including military and civilian surveillance, fine-grained monitoring of natural habitats, measuring variations in local salinity levels in riparian environments, etc. The sensors in sensor networks periodically collect the sensed data as they monitor their vicinities for a specific interest.

    The sensed data can be transmitted directly to the base station where the client queries are posted; the base station then processes the collected data centrally and responds to users' queries. However, because each sensor is equipped with an energy-limited battery, energy efficiency rather than the response time for each query is of paramount importance in sensor networks. The energy consumption of such centralized processing is very expensive. Instead, we adopt energy-efficient in-network processing, i.e., the data gathering is implemented through a tree rooted at the base station, and the data at each sensor has to be filtered or combined locally before it is transmitted to its parent.

    Unlike most existing work, which assumes that each sensor only transmits a fixed-length message, we consider another data gathering model, in which the length of the message that each sensor transmits to its parent is proportional to the sum of the message lengths of its children and itself. Thus, the total energy consumption at a node is proportional to the length of the transmitted message and the distance between the node and its parent in the tree. Specifically, on one hand, we aim to find an energy-efficient spanning tree rooted at the sink node such that the network lifetime is maximized. On the other hand, we prefer an approximate result instead of the exact result in traditional data models for a given query, since the communications among the sensors are failure-prone. Under this circumstance, how close the approximate result is to the exact result becomes the essential issue.
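
    A minimal sketch of the cost model described above (with an invented topology and unit costs, purely to illustrate how one candidate gathering tree might be scored against another):

        # Each node sends a message whose length is its own reading plus the lengths
        # received from its children; its transmission energy is taken as proportional
        # to that length times the distance to its parent.
        tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}   # node 0 is the base station
        own_len = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}           # message units generated locally
        dist_to_parent = {1: 5.0, 2: 3.0, 3: 2.0, 4: 4.0}    # metres

        def message_length(node):
            return own_len.get(node, 0.0) + sum(message_length(c) for c in tree[node])

        def total_energy():
            return sum(message_length(n) * dist_to_parent[n] for n in dist_to_parent)

        print(message_length(1))   # 3.0: node 1 relays its children's data plus its own
        print(total_energy())      # 24.0: a score for comparing alternative spanning trees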

    In summary, this project will pursue energy efficiency on the basis of database modelling in sensor network environments. A student working on this project would:

    • participate in the research project.
    • study the problem complexity when taking the network lifetime as the sole metric.
    • propose heuristic algorithms for finding a data gathering tree and providing an approximate aggregate result for a given aggregate query.
    • conduct experiments by simulation to validate the efficiency of the proposed algorithms.


    Energy-Efficient Aggregate Node Placement for Data Gathering in Wireless Sensor Networks

    Supervisor: Dr Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Recent advances in microelectronic technology have made it possible to construct compact and inexpensive wireless sensors. Networks formed by such sensors, termed wireless sensor networks, have been receiving significant attention due to their potential applications in environmental monitoring, surveillance, military operations and other domains. While these applications have revealed tremendous potential for capturing important environmental phenomena using sensor networks, they have also exposed certain associated limitations. One of the major limitations of sensor nodes is that they are powered by low-power batteries, which limit the network lifetime and impact the quality of the network. Energy conservation in sensor network operations is thus of paramount importance. As most currently deployed sensor networks are used for monitoring and surveillance purposes, the sensed data generated by the sensor network must be available to users at the base station; data gathering, which extracts and collects sensed data from sensors, is thus a fundamental operation in sensor networks. Energy conservation in data gathering leads to prolongation of the network lifetime. The study of energy-efficient data gathering poses great challenges due to the unique characteristics and physical constraints imposed on sensor networks.

    In this project we study the aggregate node placement problem for data gathering, which aims to place a few powerful aggregate nodes in a dense sensor network such that the network performance can be significantly improved. Assume that a heterogeneous sensor network consists of two types of sensor nodes: cheap sensor nodes that have a fixed, identical transmission range, and expensive aggregate sensor nodes that have various transmission ranges. Aggregate nodes have more energy, higher data transmission rates, and better processing and storage capabilities than sensor nodes. The main task of an aggregate node is to process and relay data for sensor nodes and the other aggregate nodes. Due to the expensive cost associated with aggregate nodes, the number of them in a sensor network is very limited; thus, they need to be placed in the network carefully in order to increase the network connectivity, reduce the data gathering delay, and prolong the network lifetime. Specifically, the aggregate node placement problem for data gathering is as follows. Given a sensor network consisting of n dense sensor nodes that are randomly deployed in a region of interest and a small number K of expensive aggregate nodes (K << n), the problem is to place the K aggregate nodes in the region so that the network lifetime can be further maximised when answering data gathering queries.


    Energy-Efficient Skyline Computation and Maintenance in Sensor Networks

    Supervisor: Dr Weifa Liang
    E-Mail: wliang@cs.anu.edu.au

    Background

    Technological advances in recent years have enabled the deployment of large-scale wireless sensor networks (WSNs) consisting of hundreds or thousands of inexpensive sensors in an ad-hoc fashion for a variety of environmental monitoring and surveillance purposes. In these applications, a large volume of continuous sensed data generated by sensors needs to be either collected at the base station or aggregated within the network. Thus, WSNs are usually treated as a virtual database. The skyline query, an important operator in modern databases for multi-preference analysis and decision making, has received much attention recently due to its wide applications. In this project we consider the skyline query problem in WSNs.

    We aim to devise novel distributed algorithms for skyline query evaluation from scratch. We also consider skyline maintenance within sliding window environments by developing distributed maintenance algorithms for it. We will conduct extensive experiments by simulation to evaluate the performance of the proposed algorithms in terms of multiple performance metrics, including the total energy consumption, the maximum energy consumption among the network nodes, and the network lifetime.

    A student working on this project would:

    • participate in the research project.
    • study the problem complexity with several optimization metrics, particularly the network lifetime metric.
    • develop energy-efficient heuristic algorithms for skyline finding and maintenance.
    • conduct experiments by simulation to validate the efficiency of the proposed algorithms.


    Neural networks theory and applications

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as extracting rules from neural networks, information retrieval using neural networks, data mining and feature selection, cascade neural network structures, hierarchical neural network structures, and neural network applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with neural networks is necessary. Most projects will use the very popular backpropagation neural network training algorithm. If you are interested in neural networks and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects: i. Cascade neural network structures can be built automatically, without making decisions about the number of neurons required to solve a problem, by adding single neurons. This project would investigate the use of larger chunks such as feature maps as cascade components, which would be useful for recognising images (including faces).

    ii. Rule extraction: Neural networks can learn complex tasks but face the problem that human beings do not trust them as they can not understand _why_ a particular decision is made. This project focuses on rule extraction for explanation.


    Fuzzy logic theory and applications

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as automated construction of fuzzy rule bases from data, hierarchical fuzzy systems, fuzzy interpolation, information retrieval using fuzzy logic, universal approximators, and fuzzy logic applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with fuzzy logic is necessary. If you are interested in fuzzy logic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects: i. Fuzzy document filtering: Web search engines return many documents which are not relevant to queries. Fuzzy techniques to enhance the results from search engines will be very useful. This project will investigate a number of fuzzy techniques for this purpose.

    ii. Combining uncertainty: Fuzzy logic is a technique which deals with uncertainty well. Sometimes data contains multiple kinds or sources of uncertainty. For example, a pedestrian could report that he saw someone he was quite sure was stealing something. He was not certain, and his own reliability is another, different kind of uncertainty. This project is to develop some methods to combine different kinds of uncertainty in intelligence-led investigation. (Data from a company specialising in this area will be available.)


    Bacterial evolutionary algorithms

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    There are several optimisation methods inspired by natural processes. It has been shown that evolutionary algorithms are efficient tools for solving non-linear, multi-objective and constrained optimisation problems. The principle is a search over a population of solutions, where tuning is done using mechanisms similar to biological recombination. The operations of the bacterial evolutionary algorithm were inspired by microbial evolution. Bacteria can transfer genes to other bacteria, so the rules are optimised in each bacterium.

    Example projects:
    i. Clustering using BEAs: Clustering is more usually performed by gradient descent algorithms. BEAs should allow more interesting clusters to be formed where there are complex evaluation criteria for cluster validity. Objects clustered could be images (e.g. see 1.i or 4.ii), or text (e.g. see 2.i, 2.ii). Clusters could have predefined target criteria, such as hierarchical structure of a certain depth.

    ii. Making pictures: If bacteria encoded the components of an abstract computer-generated picture, an individual could identify nice and not-nice images repeatedly to generate some 'art' which is tuned to their aesthetic sense.

    iii. Scheduling: A previous student has worked on a GA for University exam scheduling. A comparison with bacterial algorithms would be an interesting project.


    Face Recognition

    Supervisor: Prof Tom Gedeon
    E-Mail: tom@cs.anu.edu.au

    Background

    A number of projects are possible in this area. I am interested in a number of topics, such as image processing for face recognition, HCI research tool to collect facial images, biologically plausible architectures for face recognition, building and modifying computer models of real faces. No previous work on face recognition is necessary. For the image processing topic, some experience with computer graphics would be useful. If you are interested in face recognition or some similar topic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.

    Example projects:
    i. View morphing: View morphing between two images of an object taken from two different viewpoints produces the illusion of physically moving a virtual camera. We generate compelling 2D transitions between images. Initially we will use the same face with small rotations and produce intermediate images to produce a realistic QuickTimeVR head.

    ii. Video annotation: Techniques such as PCA can recognise the same face so long as the rotation/expression/illumination is not too different. This can be used to recognise one person throughout a segment of video.

    iii. 3-D Face model: An average face 3D model can be constructed from a single face image and animated via rotation, a simple face expression model, or illumination. This can then be used to recognise faces in multiple poses, expressions or lighting.


    Data Cleaning Via Logic

    Supervisor: A/Prof Rajeev Gore
    E-Mail: rajeev.gore@anu.edu.au

    Background

    The Australian Bureau of Statistics stores its census information as a collection of tables with one row (record) for every individual with many columns like age, sex, marital status, current occupation etc. Such tables invariably contain errors, either introduced during data collection, or during data entry. For example, the census data may contain a record for a person named X who is aged 6, is married and in high school.

    Such data usually has an associated collection of extra conditions which capture important inter-relationships between the columns of the table. For example, the census table may have constraints like "the legal age of marriage is 16" and "a high-school student is aged between 11 and 18". According to these constraints, the entry for X is incorrect since the entry breaks both conditions, so the obvious question is to find a correction which is as close to the original as possible.

    The problem of finding an error in a data table is known as error-localisation, and the problem of changing an incorrect record so that it passes all the constraints is known as error-correction. The standard method for solving both of these problems is called the Fellegi-Holt method, but it is highly inefficient. Recently, Agnes Boskovitz, a PhD student at the RSISE, has shown a deep connection between the Fellegi-Holt method and a well-known method from propositional logic called resolution.
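
    As a minimal sketch of the kind of edit rules involved (hand-written checks against the example above, not the Fellegi-Holt method or a resolution prover):

        # A census-style record and the two constraints from the example above.
        record = {"age": 6, "marital_status": "married", "occupation": "high school student"}

        def violated_edits(r):
            """Return the constraints that the record breaks."""
            failures = []
            if r["marital_status"] == "married" and r["age"] < 16:
                failures.append("the legal age of marriage is 16")
            if r["occupation"] == "high school student" and not (11 <= r["age"] <= 18):
                failures.append("a high-school student is aged between 11 and 18")
            return failures

        print(violated_edits(record))
        # Error-localisation asks which fields must change (here, most plausibly the age);
        # error-correction asks for the closest record that passes every constraint.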

    Efficient automated tools for performing resolution and associated logical deductions are readily available. The project is to implement the Fellegi-Holt method using these logical tools to gauge the efficacy of this logical approach to data cleaning. If successful, the prototype has huge potential in real world applications and will almost certainly lead to a publication in an international conference or journal.

    Feel free to email me if you are at all interested in this project.


    Interval Arithmetic for Computational Science

    Supervisor: Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    An interval is a continuum of real numbers bounded by its end-points. Interval arithmetic performs operations on intervals rather than using discrete floating point representations. As a consequence, errors associated with numerical rounding are automatically propagated and accounted for. Moreover, the mathematical properties of intervals can be exploited to develop entirely new algorithms.
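
    A minimal sketch of the basic idea (naive Python that ignores directed rounding, which real interval libraries and the hardware support mentioned below handle properly):

        class Interval:
            """A closed interval [lo, hi] used in place of a single floating point value."""
            def __init__(self, lo, hi):
                self.lo, self.hi = lo, hi

            def __add__(self, other):
                return Interval(self.lo + other.lo, self.hi + other.hi)

            def __mul__(self, other):
                p = [self.lo * other.lo, self.lo * other.hi,
                     self.hi * other.lo, self.hi * other.hi]
                return Interval(min(p), max(p))

            def __repr__(self):
                return "[%g, %g]" % (self.lo, self.hi)

        # An input known only to within +/- 0.01 stays rigorously bounded throughout.
        x = Interval(0.99, 1.01)
        y = Interval(2.99, 3.01)
        print(x + y, x * y)   # [3.98, 4.02] [2.9601, 3.0401]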

    The concept of intervals is actually not new, but was considered by Moore in 1965. What is relatively new is the fact that some hardware vendors are now providing direct support for interval arithmetic within their compilers/hardware.

    In this project we wish to develop an interval code for evaluating the type of integrals that are used in quantum calculations on atoms and molecules. Evaluation of these integrals involves truncation of an infinite series, followed by the application of a variety of recursion relations. As a consequence, the values of the final integrals reflect both a series truncation error and a rounding error which can vary according to the recursive path taken. This project will study the balance between these two sources of error and the implications for large-scale quantum calculations.

    This project will build on work undertaken in 2005 by honours student Josh Milthorpe who studied the basic performance of interval operations and use of intervals in fast multipole calculations. You will also be part of a team working on interval related methods funded from an Australian Research Council Discovery grant.


    Automatic Differentiation

    Supervisor: Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to those elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enables derivatives to be easily updated when the original code is changed.
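
    A minimal sketch of the idea behind forward-mode automatic differentiation (dual numbers in plain Python; the real tools mentioned below transform whole source programs rather than individual values):

        import math

        class Dual:
            """A value paired with its derivative, propagated by the chain rule."""
            def __init__(self, val, dot):
                self.val, self.dot = val, dot

            def __add__(self, other):
                return Dual(self.val + other.val, self.dot + other.dot)

            def __mul__(self, other):
                return Dual(self.val * other.val,
                            self.val * other.dot + self.dot * other.val)

        def exp(x):
            return Dual(math.exp(x.val), math.exp(x.val) * x.dot)

        def f(x):
            return x * x + exp(x)      # f(x) = x^2 + e^x, built only from elementary ops

        x = Dual(2.0, 1.0)             # seed with dx/dx = 1
        y = f(x)
        print(y.val, y.dot)            # f(2) = 4 + e^2 and f'(2) = 4 + e^2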

    A number of programs are now available to perform automatic differentiation for code written in a variety of languages. The aim of this project would be to use one of these tools to perform derivative evaluations for some computational science applications. Specifically, in the modelling of systems at the atomic level, derivatives with respect to atomic positions are hugely important, determining things like the frequencies at which molecules vibrate. I and others have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process, and for it to be relatively efficient, would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes.


    Numerical Simulation on Graphics Processing Units (GPU)

    Supervisor: Dr Alistair Rendell
    E-Mail: Alistair.Rendell@anu.edu.au

    Background

    Over the past 10 years graphics processing units (GPUs) have become ubiquitous in desktop computers. Modern GPUs now allow substantial user programmability that lets them be used for more general-purpose computation. Indeed, in a recent paper presented at SC04, Fan et al detail their efforts to construct a "GPU Cluster for High Performance Computing". In this work the authors plugged 32 GPUs into a Xeon cluster system and in so doing increased the total theoretical peak performance of the machine by 512 Gflops - for a cost of just under US$13,000!

    Tracking this development, a number of groups have been developing programming environments that can exploit the GPU for general computations. Details of many of these are available at the GPGPU web site. Two of particular interest to this project are Brook and Sh. The former is built on top of C while the latter is built on top of C++. Brook, for example, extends standard ANSI C with the concept of a "stream", which is a new data type representing data that can be operated on in parallel.

    The aim of this project is to investigate the potential for using the GPU in molecular science computations. The initial target application is likely to be N-body simulations, as typified by the simulation of atom motions. Careful consideration will be given to accuracy, since the GPU is currently limited to single-precision computations. Thus the project will address two issues: (i) the in-principle feasibility of using the GPU for these types of problems, and (ii) the scientific usefulness given the current single-precision limitation.
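    The accuracy question can be explored before any GPU code is written. The sketch below (a hypothetical example in plain NumPy on the CPU, not GPU code) evaluates the same pairwise N-body sum in single and double precision, giving a feel for how much the single-precision restriction matters for this kind of accumulation:

        import numpy as np

        rng = np.random.default_rng(0)
        pos64 = rng.uniform(0.0, 10.0, size=(2000, 3))   # random particle positions
        pos32 = pos64.astype(np.float32)                  # same positions, single precision

        def pairwise_sum(pos):
            # Sum of 1/r over all particle pairs, accumulated in the array's precision.
            total = pos.dtype.type(0.0)
            for i in range(len(pos)):
                d = pos[i + 1:] - pos[i]
                r = np.sqrt((d * d).sum(axis=1))
                total += (1.0 / r).sum(dtype=pos.dtype)
            return total

        e64 = pairwise_sum(pos64)
        e32 = pairwise_sum(pos32)
        print("double:", e64)
        print("single:", e32)
        print("relative difference:", abs(e64 - e32) / abs(e64))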


    MPI and Global File System I/O over InfiniBand using Virtual Lanes

    Supervisors: Richard Alexander, Alistair Rendell, Peter Strazdins
    E-Mail: Richard@AlexanderTechnology.com, Alistair.Rendell@anu.edu.au, Peter.Strazdins@cs.anu.edu.au

    InfiniBand networks support multiple, parallel system area networks over the same physical hardware through a concept called Virtual Lanes. Virtual Lanes allow the network to be split for a variety of purposes, including the separation of HPC cluster management, message-passing and I/O traffic. A number of parallel filesystems are used in heterogeneous Linux clusters, the most widely used being PVFS2, GPFS and Lustre, in addition to the legacy NFS system. One possibility with large SMP nodes in a cluster is to use InfiniBand virtual lanes and operating system virtual machines in a way that allows nodes of a cluster to be used as both compute nodes and filesystem storage nodes. This project will implement the Lustre filesystem on a small test cluster with large SMP nodes and the InfiniBand interconnect, using virtual machines in a novel way to segregate compute functions from I/O.

    References:

    • J Liu, A Vishnu, DK Panda, Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation. Proceedings Supercomputing 2004.
    • J Cope, M Oberg, H Tufo and M Woitaszek, Linux Multi-Cluster Environments.

    Declarative theorem-proving

    Supervisor: Dr Michael Norrish
    Email: michael.norrish@nicta.com.au

    This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.

    When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”: important bugs that are easy to find.

    The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.

    Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.

    The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions: to escape the use of files that include (SML) code, and to move to formats that are easy to parse and manipulate as data.

     

    Finding Models of Unsolvable First Order Goals

    Supervisors: Dr Michael Norrish
    Email: michael.norrish@nicta.com.au

    When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof.

    The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts.

    The student's task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user's benefit.

    This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers). The student would be jointly supervised by Michael Norrish and Peter Baumgartner.


    Projects from NICTA researchers

    Project I

    Project Title: Knowledge Discovery from Internet Malicious Data Collected by Honeypots

    Supervisor: Dr Warren Jin

    Keywords: Data Mining, Internet security, intrusion

    Internet security has become more and more important as cyber attacks have significantly increased in volume, coordination and sophistication due to the spread of worms and botnets. A botnet is a large collection of compromised machines under a common Command-and-Control infrastructure, typically used for nefarious purposes. A honeypot is a computer trap specifically designed to attract and detect malicious attacks, in particular botnet attacks. For example, the Leurre.com and the SGNet honeypots are distributed over 30 countries and have collected hundreds of gigabytes of malicious Internet traffic. Warren is NICTA’s representative for these honeypot projects.

     

    This Honours project, as part of the DSTO-NICTA joint honeypot project, is to improve or develop techniques for discovering actionable knowledge from malicious Internet data collected by the Leurre.com and the SGNet honeypots. The project may concentrate on the following topics, each of which could be developed into a quality Honours thesis:

    • detecting emerging viruses or botnets before they become prevalent,
    • discovering hot spots of Internet security,
    • discovering attack trends, especially botnets,
    • modelling all the Internet attacks based on attack samples collected,
    • generating attack signatures.

    The new technique would be very useful for proactive security, such as Internet intrusion prevention.
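    As a small illustration of the kind of exploratory analysis involved (a toy sketch with made-up feature names and scikit-learn's KMeans as one possible tool, not the Leurre.com/SGNet schema), one might cluster attacking sources by the ports they probe and inspect the largest clusters for emerging patterns:

        import numpy as np
        from sklearn.cluster import KMeans

        # Assumed per-source feature vectors: counts of probes seen on a few common
        # ports. In the real project these would come from the honeypot databases.
        ports = [22, 23, 80, 135, 445, 1433]
        rng = np.random.default_rng(1)
        X = rng.poisson(lam=2.0, size=(500, len(ports))).astype(float)

        km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
        for label in range(5):
            members = X[km.labels_ == label]
            print("cluster %d: %d sources, mean probes per port %s"
                  % (label, len(members), members.mean(axis=0).round(2)))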

     

    Participants will have the opportunity to get hands-on experience with some of the latest technologies. They will access large real-world databases collected by honeypots, and will experience both academic and industrial research work environments under the supervision of Dr Warren Jin. Each student participant may receive a stipend for living expenses of up to $1500 per semester, depending on their contribution to the honeypot project.

     

    Student participants should either have excellent programming skills in Java, C++ or Python, or have a strong mathematical background. Experience of networking, Internet security or databases is useful. Knowledge of data mining, machine learning or statistics is an advantage, but not necessary.

     

    Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information.



    Project II

    Project Title: Large Complicated Data Stream Mining

    Supervisor: Dr Warren Jin

    Keywords: Data stream mining, exploratory analysis

    Owing to rapid progress in data collection, advanced database system technologies and the World Wide Web, complicated data sets have been growing explosively. There are many examples of data streams: a satellite-mounted remote sensor generates data constantly, and world-wide distributed honeypots collect malicious Internet traffic data every second in order to monitor the spread of worms and botnets. These data streams are massive (e.g., the Leurre.com and the SGNet honeypots collect hundreds of megabytes of data across 30 countries every day), temporally ordered, rapidly evolving, potentially infinite, and contain various types of attributes (e.g., continuous, categorical, string). Mining complicated data streams is one of the most important real-world data mining tasks.

     

    This Honours project is to develop or improve a data mining technique for complicated data streams. The technique could be, but is not limited to, data visualisation, temporal association analysis, clustering, time series analysis, outlier detection or classification. The technique can be used to find interesting events or hot spots within data streams. The project will consider mixed data attributes such as continuous (e.g., packet length), categorical (e.g., TCP or not) and sequential (e.g., payload strings) attributes. The project may also take into account scalability for large amounts of streaming data.
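    To make the one-pass flavour of stream mining concrete, here is a toy sketch (the record format, decay rate and thresholds are invented for illustration): exponentially decaying estimates for a continuous attribute and for categorical frequencies are updated as each record arrives, and records that score far from those estimates are flagged.

        import math
        import random
        from collections import defaultdict

        ALPHA = 0.01                   # decay rate: how quickly old data is forgotten
        mean, var = 0.0, 1.0           # running estimates for packet length
        cat_freq = defaultdict(float)  # decaying frequencies for the protocol field

        def update_and_score(length, proto):
            global mean, var
            # Continuous attribute: z-score against the decaying mean/variance.
            z = abs(length - mean) / math.sqrt(var + 1e-9)
            mean = (1 - ALPHA) * mean + ALPHA * length
            var = (1 - ALPHA) * var + ALPHA * (length - mean) ** 2
            # Categorical attribute: rarity of the observed protocol so far.
            total = sum(cat_freq.values()) + 1e-9
            rarity = 1.0 - cat_freq[proto] / total
            for k in cat_freq:
                cat_freq[k] *= (1 - ALPHA)
            cat_freq[proto] += ALPHA
            return z + rarity          # crude combined "interestingness" score

        random.seed(0)
        for t in range(10000):
            length = random.gauss(500, 50) if t != 7000 else 5000.0   # injected anomaly
            proto = random.choice(["TCP", "TCP", "TCP", "UDP"])
            score = update_and_score(length, proto)
            if t > 1000 and score > 5:   # ignore the warm-up period
                print("t=%d: unusual record (length=%.0f, proto=%s, score=%.1f)"
                      % (t, length, proto, score))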

     

    Participants will have the opportunity to get hands-on experience with up-to-date stream mining techniques, and may be engaged in using these techniques to handle real-world challenges. They can experience both academic and industrial research work environments under the supervision of Dr Warren Jin. Each student participant may receive a stipend for living expenses of up to $1500 per semester, depending on their contribution to the use-inspired research.

     

    Interested students should have a good understanding of data mining, visualisation or time series analysis. Programming skills in C++, MATLAB or R are useful. Experience of databases, data analysis or distributed computing is a plus.

     

    Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information.



    Project III

    Project Title: Population Projection for New Canberra Suburbs

    Supervisors: Dr Warren Jin and Dr Yongxiang Xia

    Keywords: Data mining, time series analysis, prediction

    Population projection is the computation, given certain assumptions about future trends, of future changes in population size for specific areas such as states or cities. Population projection is valuable: for example, projected population sizes are a prerequisite for ensuring an adequate and appropriate supply of land for sustainable city development, as the development is supported by the timely delivery of commercial land, dwelling land, infrastructure and community facilities. Collaborating with the ACT Planning and Land Authority (ACTPLA), NICTA researchers have developed a stratified hidden Markov model to predict population sizes for existing suburbs. It performs quite well in terms of data independency, output flexibility and projection accuracy.

    This Honours project aims to extend these population projection techniques to new suburbs. Research challenges include the small size of the areas concerned, the lack of training data and the scientific evaluation of projections. Besides population sizes, the techniques may take into account other economic, social, environmental and political factors. The participant will learn to solve real-world problems using data mining techniques, such as temporal classification or clustering. He/she will also learn to handle uncertainty for decision making, and will experience how a novel technique is developed within an industrial research organisation like NICTA.
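    One simple baseline for the lack-of-training-data problem (a toy sketch with made-up numbers, not the ACTPLA data or NICTA's stratified hidden Markov model) is to fit a growth curve to an established suburb's build-up phase and rescale it to a new suburb's planned capacity:

        import numpy as np
        from scipy.optimize import curve_fit

        def logistic(t, K, r, t0):
            """Population approaching capacity K at growth rate r, centred at year t0."""
            return K / (1.0 + np.exp(-r * (t - t0)))

        # Years since land release and observed populations for an established suburb
        # (illustrative numbers only).
        years = np.arange(0, 15)
        pop = np.array([200, 600, 1400, 2600, 4000, 5200, 6100, 6700, 7100, 7300,
                        7450, 7520, 7560, 7580, 7590], dtype=float)

        (K, r, t0), _ = curve_fit(logistic, years, pop, p0=[8000.0, 0.5, 5.0])

        # New suburb: only a planned dwelling capacity is known, so reuse the fitted
        # growth shape (r, t0) and rescale it to that capacity.
        planned_capacity = 5000.0
        projection = logistic(np.arange(0, 10), planned_capacity, r, t0)
        print(np.round(projection).astype(int))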

    The project requires good modelling or programming skills. Some familiarity with elementary statistics, time series or data mining is useful. Supervision will be provided by Dr Warren Jin and Dr Yongxiang Xia. The successful applicant might receive a stipend for living expenses of up to $1000, depending on project progress.

     

    Contact: Warren Jin (Huidong.Jin@nicta.com.au) or Yongxiang Xia (Yongxiang.Xia@nicta.com.au)


     
