![]() |
ANU College of Engineering and Computer Science
Department of Computer Science
|
|
|
Honours topicsHalf of your time as an honours student is spent working on a project. But first you have to find a project topic. You don't have to be constrained by what you see on the list below. If you have something that you would like to work on as a project, then feel free to discus it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Proposed Project TopicsPlease note that some of the projects below are very specific in nature. However, you may be able to discus minor changes to the proposal with the supervisor. Other proposals are, in fact, too broad in scope to describe an individual honours project. You should take these descriptions as an indication of the areas of interest of the supervisor. If you are interested in the area then you can discuss potential specific projects with the supervisor. If the projects proposed for this year don't inspire you , check out the projects proposed in previous years as well:
In no particular order, here are the topics proposed so far for 2008:
Data mining projects
Why do students fail? Data mining the DCS student database (Computer Science honours project)The ANU Department of Computer Science has been collecting student course enrolments, marks and grades through its FAculty Information System (FAIS) for several years. The aim of this honours project is to develop data mining techniques that allow analysis of the FAIS database with the objective to better understand student performance, progress and retention. The questions that the department is interested in include, for example: why students stop studying computer science?, what correlations there are between their marks in different courses?, can we predict that a student will have problems in a certain course based on her or his past performance?, can we identify students that might be at risk of failing or dropping out of computer sciences courses early in their studies? This project will include the following components:
Supervisor: Peter Christen
Data linkage projectsBackground Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), taxation, statistics, police/intelligence and telecommunications. Data mining techniques are used to analyse such large data sets, in order to find patterns and rules, or to detect outliers. In many data mining projects, information from multiple data sources needs to be integrated, combined or linked in order to allow more detailed analysis. The aim of such data linkages is to merge all records relating to the same entity, such as a customer or patient. Most of the time the linkage process is challenged by the lack of a common unique identifier, and thus becomes non-trivial. Probabilistic data linkage techniques have to be applied using personal identifiers like names, addresses, dates of birth, etc. The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research on the development of improved techniques and algorithms for large scale high-performance data linkage. We have been developing an open source data linkage system called Febrl (Freely extensible biomedical record linkage), which includes modules for data cleaning and standardisation (of names, addresses, dates of birth, etc.), deduplication, data linkage (or matching), and geocoding. For more information please have a look at our publications as well as presentations that describe our research in more details. The following student projects address important issues within our data linkage project. Supervisor for all projects: Peter Christen
Probabilistic data generation (Computer Science honours project)One of the difficulties in data linkage research is getting access to real world data sets containing real names and addresses, as such data is protected by privacy and confidentiality regulations. Developing, and testing new and improved data linkage algorithms therefore becomes very difficult. We have developed a data set generator (which is part of Febrl). However, this generator is very simple, and various ways of improving it are possible. For more details please have a look at the recent publication Probabilistic Data Generation for Deduplication and Data Linkage. Student research project The aim of this honours research project is to further explore how our simple data set generator can be improved. This would include analysing the literature in this area (artificially creating data sets), analysing the existing data generator code, and developing improved techniques that reflect reality more accurately. Issues involved are:
A student working on this project would be involved through
Parallelisation of data linkage techniques (Computer Science honours project)One of the problems when linking large collections of data with tens or even hundreds of millions of records is the complexity of data linkage techniques. Potentially each record in one data set has to be compared with every record in the second data set, resulting in a quadratic complexity O(n2). Techniques known as blocking are normally used to reduce the number of record pair comparisons (for example, only records that have the same postcode value are compared with each other). However, linking large data sets still takes a lot of time. Only a very limited amount of research has been done in this area (we are only aware of four publications so far the deal with parallel data linkage, including one of our own papers: Febrl - A Parallel Open Source Data Linkage System, PAKDD'04). Student research project The aim of this project is to develop parallel data linkage techniques that allow the linking of large data sets on modern multi-core computing platforms. Issues which have to be considered include:
Improved record pair classification techniques (Computer Science honours project)A crucial step in the data linkage process is the classification of pairs of records that are being compared into either matches or non-matches. For example, consider the following two records containing a name and address each:
Are these two records representing the same person, or two different people? In the past few years researchers from the data mining, machine learning, and AI communities have explored various classification techniques, including decision trees, genetic algorithms, clustering, active learning, support vector machines, and instance-based classifier to improve the classification of record pairs. Other researchers have applied techniques from information retrieval and database research to improve the linkage classification quality. However, this research has been fragmented and so far no large study comparing the many classifiers has been conducted using different data sets (both synthetic and from the real world). Student research project The aim of this project is to (1) implement prototypes of different classifiers and then conduct studies comparing their performance using different data sets, and to (2) develop new - based on existing methods - classifiers that are well suited for the data linkage classification problem. Issues involved are:
Semantic Web projects
User Consumable Interface for the Semantic WebThe vision of the Semantic Web is to equip the World Wide Web with machine-processible and machine-understandable semantics, so that the Web can becomes a universal medium of information and knowledge exchange for both computer and human being. Towards this vision, a number of building blocks of the semantic web have been created, including the Resource Description Framework (RDF), the Web Ontology Language (OWL) and several query languages such as SPARQL. It can be foreseen that in the near future a lot of information on the Web will be described or annotated using these facilities, resulting in a large scaled knowledge base. Creating a consumable interface for naive users to access such a large knowledge base is becoming a new challenge, as traditional interfaces on databases or knowledge bases are programmer-oriented rather than user-oriented. Goals for this project: In this project, we will design and implement a number of techniques to facilitate users' accesses to large scaled data sources on the semantic web. Requirements: Students will need to be proficient systems programmers with good C++ or Java skills. Student's gain: The student will work on an interesting research problem with potential impact on the future World Wide Web. The work will be carried in a research group of CSIRO. The student will get an insight of the R&D in both industry and academia. The results will include a working prototype that can be demonstrated in an international conference and a publication in an academic journal. Literature and Background: Database usability; DBpedia -- a dataset for the semantic web. Supervisor: Dr. Jemma Wu (jemma DOT wu AT csiro DOT au), Dr. Xuan Zhou (xuan DOT zhou AT csiro DOT au)
Semantics in Sensor NetworksBackground: Despite existing different domain-specific deployments and the heterogeneity of sensor networks and sensing data, sensor networks share a common feature: they all collect and periodically transmit information from some set of sensors, and they all must carefully manage limited power and radio bandwidth to ensure that essential information is collected and reported in a timely fashion. From a data storage point of view, one may think of a sensor network as a distributed database that is able to conduct query processing. This has led to the design and implementation of query processing a very popular topic. However, query processing in sensor networks works different from that in a traditional DBMS given that the former is resource constrained (e.g. limited energy, memory and computation ability). Furthermore, the volume of the sensing date can be very variable (e.g. different sampling rate) within a time period and the access to this data may be delayed arbitrarily due to the motion of sensors and the network latency. These add new challenges to the not yet fully solved query processing problem and more needs to be done to enable the success of sensor networks applications.Project outcomes: In this project, we will investigate what the ³semantics´ can help to improve query processing - other than just conventional development of a schema for metadata about (semantic) transformation. Prototypes will be based on a real world problem. The Fleck platform (*Fleck* - the sensors and sensor operating system developed by CSIRO ICT Centre) is available for prototyping work. The outcome will include a working prototype for the future work to be built on. Requirements: Students will need to be familiar with C programming language. Student's gain: Student will be working on one of the most interesting sensor networks issues with other research scientists from CSIRO. At the end of this project, students will gain insights into R&D from both industry and academia and are ready to explore other issues arising from sensor network applications. Supervisor: Dr. Lily Li Email: lily.li@csiro.au Using semantics to better understand the WebBackground: The Semantic Web means different things to different people because it is very broad. It may mean:
Project outcomes: A number of algorithms will be developed, and eventually leading to easy access to the information on the Web other than just key words search. Requirements: Students will need to have knowledge/skills about (1) web services technologies; (2) good understanding of RDF, OWL and reasoning on OWL; and (3) C++/Java. Student's gain: By keeping up with the latest developments in the Semantic Web, students will not only gain insights into the vision of the Semantic Web but also have some hand-on experience about how to realise the semantics on the Web. Algorithms developed in this project may have potential to be used by popular search engines to deal with unstructured information on the Web more efficiently. Supervisor: Dr. Lily Li Email: lily.li@csiro.au Scalable Work Queues for Parallel Garbage CollectionPhysical limits mean that future performance improvements in processor design will mostly come from increased application-level parallelism. Thus the ability for systems to scale well on parallel architectures is an extremely important issue. Since managed languages such as Java and C# have become dominant in the application development domain, the scalability of managed runtimes has become a critical research issue. In this project we explore the scalability of a fundamental data structure at the heart of a Java runtime system---a load balancing parallel work queue. Supervisor: A. Prof Steve Blackburn, ANUGoals for this projectThis project will examine the scalability and pathologies of the load balancing parallel work queue used by MMTk, which is the memory management toolkit used by Jikes RVM. Hopefully the project will either confirm the scalability of the existing system or identify (and implement) a more scalable work queue design.
Requirements Students will need to be proficient programmers with a good basic understanding of computer architecture and parallel algorithms. The work will all be done in a slightly restrictive subset of Java, so experience with Java is essential. Experimental design and data analysis is key to the first phase of the project, so experience in this area will be an asset (of course there will be plenty of assistance from your supervisor, particularly in this area).Student's gain The student will gain hands-on experience with an increasingly important problem in the context of the leading research infrastructure for managed runtime research, in the setting of a world leading team.
Literature This is a well-focussed problem and does not require any significant background reading. The load balancing work queue is reasonably well insulated from the rest of the environment we work in, so a deep understanding of MMTk or Jikes RVM is not necessary. www.jikesrvm.org Contact Steve BlackburnArchitectural Analysis of Garbage Collection BehaviorTrends in microprocessor design are toward massively multicore chips. One direction within this burgeoning research area is that of asymmetric multicore design, where the cores within a chip are not homogeneous. In this setting some cores may serve very specific purposes. Given a) the centrality of garbage collection to managed runtimes, and b) the very limited requirements of garbage collection algorithms, it may make sense to have dedicated garbage collection cores. To this end, this project will explore the architectural "footprint" of a garbage collector with a view to ascertaining the base requirements for a dedicated garbage collecting core. Supervisor: A. Prof Steve Blackburn, ANUGoals for this project The project will use an architectural profiling tool (valgrind) to profile a range of garbage collection workloads and thus attempt to characterize the architectural requirments of garbage collectors. Requirements Students will need to be proficient programmers with a good basic understanding of computer architecture. The primary tool for this project will be valgrind, a publicly available virtualization tool written in C. The student working on this project should be an accomplished systems programmer. Student's gain The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes, at the architectural level. This project is a logical starting place for a much deeper examination of the project and eventually, realization in an FPGA. LiteratureA basic understanding of garbage collection would be helpful, but is not essential. Henessy and Patterson's standard text on computer architecture ("Computer Architecture: A Quantitative Approach", 4th edition) is an invaluable starting point for this work. Contact Steve Blackburn Profiling Java Virtual Machine PerformanceThe performance of managed runtime systems and the applications which execute on them is a hot topic of academic and industrial research. However, dynamic compilation, feedback directed optimization, and concurrency each make performance analysis significantly more complex than in traditional ahead-of-time compiled, static environments such as C programs. This project will explore techniques for profiling the performance of virtual machines and applications running over them. These techniques will include sampling via interrupts (due to timers and/or hardware performance counter triggers) and direct instrumentation of compiled code. Additionally, the project will explore dynamic generation and maintenance of code annotations so that machine code and calling contexts can be tied back to source code in the application or VM source code.
Supervisor:A. Prof Steve Blackburn, ANU Requirements Students will need to be proficient systems programmers with a good C and Java skills. Student's gain The student will work on a problem of great industrial and academic importance and will gain a solid insight into the behavior of managed runtimes. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes. Literature and Background Familiarity with a simple profiling tool such as gprof would be an asset. Some interesting related work can be found at the following links: http://portal.acm.org/citation.cfm?id=1134012 http://www.cs.utexas.edu/users/mikebond/pcc-tr-2007.pdf Contact Steve Blackburn Bubble VM: the VM within the VM within the VM....Surprisingly, and despite the stated advantages of managed languages (such as Java and C#), no commercial managed runtime systems are written in the language they implement. However, there are two successful research runtimes which take this approach, Jikes RVM (an open source Java-in-Java VM), and Bartok (an ahead-of-time C# system on which the Singularity OS project is built). Both of these projects have demonstrated considerable promise, and both have demonstrated that performance need not be a roadblock to such VM implementations. One of the most vexing issues faced by such VMs is the potential for blurring the distinction between the application and the host on which it runs. In this project we will explore how to better understand the boundary between the application and the supporting VM, using a rough analogy to the isolation that exists between user mode and kernel mode in a modern operating system. A good understanding to this problem could fundamentally change the design of such JVMs, increasing their reliability, improving their debugability, and improving security. Such a VM could run on any other VM and could run any application---thus could be self hosting and in principle could host an arbitrary nesting of instances of itself within itself!Supervisor: A. Prof Steve Blackburn, ANU Goals for this project This project will start by trying to clearly identify the application-VM boundary in Jikes RVM, a widely used research JVM. The project will try to apply the OS metaphor of trapping from one context or mode to another. Depending on the student's ability and ambitions, the project could lead to much more ambitious goals such as enabling Jikes RVM to self host.
Requirements Student's gain The student will gain a deep insight into the design and implementation of managed runtime systems. The work will be carried out in a world-leading research group and will be conducted within the leading academic research infrastructure for managed runtimes.
Literature and Background Projects offered by Dr Eric. McCreathInstance Based MethodsSupervisors: Dr Peter Baumgartner, Prof John Slaney E-Mail: Peter.Baumgartner AT nicta.com.au Formal logic provides a mathematical foundation for many areas of computer science. Logical languages are used as specification language within, e.g., program development and verification, hardware design and verification, relational databases, and many subfields of Artificial Intelligence. Automated Deduction is concerned with the design and implementation of algorithms based on logical deduction for solving problems in these areas. Instance Based Methods are a relatively new family of methods for Automated Deduction; they share the principle of carrying out proof search by maintaining a set of instances of input formulas and analyzing it for satisfiability until completion. I offer several research-oriented honours projects centered around Instance Based Methods, in particular in connection (experimentation/extension) with our own Darwin system (http://goedel.cs.uiowa.edu/Darwin/). A circuit-based algorithm for solving quantified Boolean formulaeSupervisor: Dr Jussi Rintanen E-Mail: rintanen-AT-informatik.uni-freiburg.de Dr Rintanen will join NICTA's Knowledge Representation and Reasoning Program (KRR) in January 2006. He is currently with the University of Freiburg, Germany. The local contact for this project before Dr Rintanen's arrival is Dr Sylvie Thiébaux. Background Following the success of satisfiability algorithms in solving a wide range of practically important problems in computer aided verification, artificial intelligence and other areas, the idea of using algorithms for quantified Boolean formulae (QBF) for similar but more general problems has become increasingly attractive. One of the problems present in the current QBF algorithms is that they are defined only for QBF that have their matrices in conjunctive normal form (CNF), and the most natural representations of many problems in QBF involve arbitrary Boolean formulae or circuits and do not have simple translations to CNF.Honours project The goal of the research project is to develop a proof system for QBF that are represented as arbitrary Boolean circuits, not restricted to CNF, and to develop an efficient QBF algorithm based on the proof system. In comparison to circuit-based proof systems for the classical propositional logic, this involves identifying proof rules for quantifiers and universally and existentially quantified variables, as well as rules for efficient scoping of variables. The QBF algorithm will be compared to CNF-based QBF algorithms which are used in connection with translations of circuit-QBF into QBF in CNF. T-anu.edu.auCompact representation of schematic operators in planning as satisfiabilitySupervisor: Dr Jussi Rintanen E-Mail: rintanen-AT-informatik.uni-freiburg.de Dr Rintanen will join NICTA's Knowledge Representation and Reasoning Program (KRR) in January 2006. He is currently with the University of Freiburg, Germany. The local contact for this project before Dr Rintanen's arrival is Dr Sylvie Thiébaux. Background Translation into the classical propositional logic has been proposed as a powerful and general solution technique for classical AI planning. The actions of many planning problems are best represented in terms of schemata that are instantiated with given objects to obtain the actual actions. For example, a generic schema for the movement of vehicles between locations is IN(V,X) => -IN(V,X), IN(V,Y) in which V can be instantiated with any vehicle and X and Y with any locations. Kautz and Selman [1996] and Ernst et al. [IJCAI'1997] have considered compact representations of schematic actions in the propositional logic. However, these works restrict to sequential plans, and there is a lot of potential to improve the efficiency of planning further by considering some notion of parallel plans which allows the execution of several actions simultaneously.Honours project The goal of the research project is to develop compact representations of schematic actions for parallel plans in the propositional logic. If the regularities in the actions described by an action schema are recognized, a more compact representation of the planning problem is possible, potentially leading to much more efficient planning. The work includes extending the ideas presented by Kautz and Selman and Ernst et al. to parallel plans as well as developing new techniques for compact action representation, and then implementing these techniques as a part of a planning system.Supercom
Supervisor:
Dr Sylvie Thiébaux,
Dr Anbulagan,
Dr Olivier Buffet,
Dr Jinbo Huang,
Dr Yannick Pencolé
Background Composite systems feature simple components organised into a highly reconfigurable and modular architecture, due to which they can appropriately assemble to achieve complex operations. Examples include web and grid services, energy distribution systems, water management systems, chemical plant control systems, telecommunication and computer networks, to name but a few. In the "Supercom" project, we are interested in building Artificial Intelligence tools to supervise such composite systems. These will need to confer the system the ability to self-monitor (diagnose) and self-organise (assemble, reconfigure) to achieve optimal performance. In this framework, various projects are available on topics such as decentralised diagnosis [1], decentralised planning [2] or diagnosis using satisfiability and knowledge compilation methods [3]. The main supervisor will depend on the project, but interactions with other researcher should appear naturally. A strong background in computer science, software engineering, or mathematics, and an interest in artificial intelligence are required, as well as good programming skills. References:
Word Processing Structure AnalyserSupervisors: Dr Ian Barnes and Dr Peter Raftos Background An important part of current research being done into the long- term preservation and accessibility of word processor (WP) documents is the conversion of (primarily) Word and OpenOffice files to some structured XML (usually DocBook-XML). This work assumes that the WP documents have already been structured in a systematic manner using the WP software's inbuilt style functions. However, most users do not use styles; generally, they are not even aware they exist. And even if every user were to start using them from today, there remains a legacy of unstyled (and therefore unstructured) WP documents. Most users rely on the look of document content to deduce structure: reasonably arbitrary choices of font size and weighting. It would be very useful if algorithms could be developed that might analyse (or \u201cguess\u201d) at the structure of a WP document. This should then lead to the development of a working tool to perform the structuration of unstructured WP documents. The Work Using a series of unstructured trial documents of varying complexity develop:
There may be a scholarship attached to this project. A compiler for program synthesis
Supervisors: Dr Michael Comton, Dr Mark Cameron (CSIRO) A/Prof Rajeev Gore, Dr Alwen Tiu (RSISE, ANU) Background This project is about writing a compiler for automatic synthesis of 'programs' from specifications. Given the specifications for services S1,...,Sn and the specification of goal G the compiler will generate a 'program' satisfying G by using the services S1,...,Sn. This compiler could be targeted to a number of possible settings. For example, the specifications could describe web services and the goal in this case is automatic generation of composite web services from other web services. Alternitively, the specifications could describe components of a security system and the goal some larger securty property thus the compiler automatically generates the larger security framework. The Information Engineering Lab at CSIRO's ICT centre already has such a compiler for web services [1], in which a logic mapping language relates resources, such as data and web-services, to relevant concepts and properties [2], and the composition compiler automatically translates a high-level declarative composition goal into an executable workflow responsible for orchestrating the data and web-service calls necessary to materialize the specified data product. But, we are currently restricted to composing over services that have stateless, functional and deterministic operations. We would like to extend this composition environment to work with web-services that have a stateful interaction style (i.e. the operation/message that the service can perform depends on the current state of the service and the service state machine). A published method for web Service composition [3,4,5] does so, by using a modal logic and tableaux theorem proving. But it concentrates on message flow, not data. We would like to integrate the two. The project would involve investigating how to integrate both data and state into the specification of services and a composition goal. The outcome would be a logic for describing the specification of each and a compiler that can automatically generate a 'program' satisfying the goal by using the provided services. The project would suit a student with an interest in logic and specification. Refrences [1] Cameron, M. A., Taylor, K. L., Baxter, R. Web-Service Composition and Record Linking VLDB Workshop on Information Integration on the Web (IIWeb-2004) Toronto, Canada, August 2004. URL: http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf [2] Cameron, M.A. and Taylor, K. (2005), First order patterns for information integration, Proceedings of the 5th International Conference on Web Engineering (ICWE2005), 25-29 July, Sydney, Australia. URL: http://dx.doi.org/10.1007/11531371_25 [3] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), Synthesis of Underspecified Composite e-Services based on Automated Reasoning in Proc. of the 2nd Int. Conf. on Service Oriented Computing (ICSOC 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html [4] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Massimo Mecella (2004), ESC : A Tool for Automatic Composition of e-Services based on Logics of Programs in Proc. of the 5th VLDB Workshop on Technologies for E-Services (TES 2004). URL: http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html [5] Daniela Berardi, Diego Calvanese, Giuseppe De Giacomo, Rick Hull, and Massimo Mecella (2005), Automatic Composition of Transition-based Semantic Web Services with Messaging in Proc. of the 31st Int. Conf. on Very Large Data Bases (VLDB 2005). URL: http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html Tableaux reasoning for web service composition
Supervisors: Dr Mark Cameron (CSIRO) A/Prof Rajeev Gore (RSISE, ANU) Background The Information Engineering Lab has a logic-based service composition framework that is being applied across a number of domains, including Health Data Integration (Cameron et al) http://cips.eas.asu.edu/iiwebfinalproceedings/31.pdf and environmental management http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/304_cameron. pdf . A published method for web Service composition (Berardi et al 2004) http://www.inf.unibz.it/~calvanese/papers-html/ICSOC-2004.html http://www.inf.unibz.it/~calvanese/papers-html/TES-2004.html (Berardi et al 2005) http://www.inf.unibz.it/~calvanese/papers-html/VLDB-2005.html relies on a translation of finite-state machine representations of a composition goal and stateful Web services into a modal/description logic, PDL. Tableaux theorem proving can then be applied to find a web service composition that statisfies the goal. This project would aim to replicate and implement that basic approach, but considering alternative logics to improve the scope of the compositions that may be achieved. Eye-gaze tracking as an interface for virtual environments
Supervisors: Chris Gunn (CSIRO) Henry Gardner (DCS ANU) Background This project will seek to characterise the effectiveness of eye-gaze tracking as a human-computer interface for virtual environments. The concept is that participants should be able to interact with projected images using a combination of keyboard controls and eye gaze. By focussing their visual attention on icons or images on the screen, users should be able to select these icons or activate other interaction. Two application areas of interest are the Mind Attention Interface for the Wedge virtual reality theatre and a something-or-other system in which participants stand in front of a half-spherical shell screen. In the Mind Attention Interface, eye-gaze tracking equipment will be integrated with electroencephalogram (EEG) equipment which measures electric signals from the brain of a participant. Correlating eye-gaze measurements of attention with EEG data would be a possible extension of this project. Projects offered by Peter Strazdins
Energy-Efficient Data Gathering in Wireless Sensor Networks
Supervisor: Dr Weifa Liang Background Wireless sensor networks have received significant recent attention due to their potential applications from civil to military domains including military and civilian surveillance, fine-grain monitoring of natural habitats, measuring variations in local salinity levels in riparian environment, etc. The sensors in sensor networks periodically collect the sensed data as they monitor their vicinities for a specific interest. The sensed data can be transmitted directly to the base station where the client queries are posted, the base station then processes the collected data centrally and responds to users queries. However, due to the fact that each sensor is equipped with energy-limited battery, the energy efficiency rather than the response time for each query is paramount of importance in sensor networks. The energy consumption for the above centralized processing is very expensive. Instead, we adopt energy-efficient in-network processing, i.e., the data gathering is implemented through a tree rooted at the base station, and the data at each sensor has to be filtered or combined locally before it is transmitted to its parent. Unlike most existing assumptions that each sensor only transmits a fixed length of message, we consider another data gathering, in which the length of the message that each sensor transmits to its parent is proportional to the sum of the message lengths of its children and itself. Thus, the total energy consumption at a node is proportional to the length of the transmitted message and the distance between the node and its parent in the tree. Specifically, on one hand, we aim to find such an energy efficient spanning tree rooted at the sink node that the network lifetime is maximized. On the other hand, we prefer the approximate result instead of the exact result in traditional data models for a given query, since the communications among the sensors are fail-prone. Under this circumstance, how close the approximate result is to the exact result becomes the essential issue. In summary, this project will be in pursue of energy efficiency on the basis of database modeling in sensor network environments. A student working on this project would be involved:
Energy-Efficient Aggregate Node Placement for Data Gathering in Wireless Sensor Networks
Supervisor: Dr Weifa Liang Recent advances in microelectronic technology have made it possible to construct compact and inexpensive wireless sensors. Networks formed by such sensors, termed as wireless sensor networks, have been receiving significant attention due to their potential applications in environmental monitoring, surveillance, military operations and other domains. While these applications revealed tremendous potential for capturing important environment phenomenon using sensor networks, they have also posed certain associated limitations. One of the major limitations of sensor nodes that they are powered by low power batteries, which limit the network lifetime and impact on the quality of the network. Energy conversation in sensor network operations is thus of paramount importance. As most currently deployed sensor networks are used for monitoring and surveillance purposes, the sensed data generated by the sensor network must be available for users at the base station, data gathering that extracts and collects sensed data from sensors thus is one fundamental operation in sensor networks. The energy conservation on data gathering leads to prolongation of network lifetime. The study of energy-efficient data gathering poses great challenges due to the unique characteristics and physical constraints imposed on sensor networks. In this project we study the {\it aggregate node placement problem for data gathering}, which aims to place a few powerful aggregate nodes to a dense sensor network such that the network performance can be significantly improved. Assume that a heterogeneous sensor network consists of two types of sensor nodes: cheap sensor nodes that have fixed identical transmission range and expensive aggregate sensor nodes that have various transmission ranges. Aggregate nodes have more energies, higher data transmission rate, and better processing and storage capabilities than sensor nodes. The main task of an aggregate node is to process and relay data for sensor nodes and the other aggregate nodes. Due to expensive cost associated with aggregate nodes, the number of them in a sensor network is very limited, thus, they need to be placed in the network carefully in order to increase the network connectivity, reduce the data gathering delay, and prolong the network lifetime. Specifically, the aggregate node placement problem for data gathering is as follows. Given a sensor network consisting of $n$ dense sensor nodes that are randomly deployed in a region of interest and a few number of expensive aggregate nodes $K$ ($K<< n$), the problem is to place the $K$ aggregate nodes in the region so that the network lifetime can be further maximised by answering data gathering queries. Energy-Efficient Skyline Computation and Maintenance in Sensor Networks
Supervisor: Dr Weifa Liang Background Technological advances in recent years have enabled the deployment of large-scalable wireless sensor networks (WSNs) consisting of hundreds or thousands of inexpensive sensors in an ad-hoc fashion for a variety of environmental monitoring and surveillance purposes. In these applications, a large volume of continuous sensed data generated by sensors needs to be either collected at the base station or aggregated within the network. Thus, WSNs usually are treated as a virtual database. The skyline query, as an important operator in modern databases for multi-preference analysis and decision making, has received much attention recently due to its wide applications. In this project we consider the skyline query problem in WSNs. We aim to devise novel, distributed evaluation algorithms for skyline query evaluation from the scratch. We also consider the skyline maintenance within sliding window environments by developing distributed maintenance algorithms for it. We will conduct extensive experiments by simulation to evaluate the performance of the proposed algorithms in terms of multiple performance metrics including the total energy consumption, the maximum energy consumption among the network nodes, and the network lifetime. A student working on this project would be involved:
Neural networks theory and applications
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as extracting rules from neural networks, information retrieval using neural networks, data mining and feature selection, cascade neural network structures, hierarchical neural network structures, and neural network applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with neural networks is necessary. Most projects will use the very popular backpropagation neural network training algorithm. If you are interested in neural networks and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me. Example projects: i. Cascade neural network stuctures can be built automatically without making decisions as to the number of neurons required to solve a problem by adding single neurons. This project would investigate the use of larger chunks such as feature maps as cascade components, which would be useful for recognising images (including faces). ii. Rule extraction: Neural networks can learn complex tasks but face the problem that human beings do not trust them as they can not understand _why_ a particular decision is made. This project focuses on rule extraction for explanation.
Fuzzy logic theory and applications
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as automated construction of fuzzy rule bases from data, hierarchical fuzzy systems, fuzzy interpolation, information retrieval using fuzzy logic,, universal approximators, and fuzzy logic applications. I have published papers in all of these areas with former students so there is plenty of earlier work to build on. No previous experience with fuzzy logic is necessary. If you are interested in fuzzy logic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me. Example projects: i. Fuzzy document filtering: Web search engines return many documents which are not relevant to queries. Fuzzy techniques to enhance the results from search engines will be very useful. This project will investigate a number of fuzzy techniques for this purpose. ii. Combining uncertainty: Fuzzy logic is a technique which deals with uncertainty well. Sometimes data contains multiple kinds or sources of uncertainty. For example, a pedestrian could report that he saw someone he was quite sure was stealing something. He was not certain and his own reliability is another but different kind of uncertainty. This project is to develop some methods to combine different kinds of uncertainty in intelligence led investigation. (Data from a company specialising in this area will be available.) Bacterial evolutionary algorithms
Supervisor: Prof Tom Gedeon Background There are several optimisation methods inspired by natural processes, It has been shown that evolutionary algorithms are efficient tools for solving non-linear, multi-objective and constrained optimizations. The principle is a search for a population of solutions, where tuning is done using mechanisms similar to biological recombination. The operations the bacterial evolutionary algorithm were inspired by microbial evolution. Bacteria can transfer genes to other bacteria, so the rules are optimised in each bacterium.
Example projects: ii. Making pictures: If bacteria encoded the components of an abstract computer generated picture, an individual could identify nice and not nice images repeatedly to generate some 'art' which is tuned for their esthetic sense. iii. Scheduling: A previous student has worked on a GA for University exam scheduling. A comparison with bacterial algorithms would be an interesting project.
Face Recognition
Supervisor: Prof Tom Gedeon Background A number of projects are possible in this area. I am interested in a number of topics, such as image processing for face recognition, HCI research tool to collect facial images, biologically plausible architectures for face recognition, building and modifying computer models of real faces. No previous work on face recognition is necessary. For the image processing topic, some experience with computer graphics would be useful. If you are interested in face recognition or some similar topic and would like to see if you might want to do a project in this area, please e-mail me tom@cs.anu.edu.au or come and see me.
Example projects: ii. Video annotation: Techniques such as PCA can recognise the same face so long as the rotation/expression/illumination is not too different. This can be used to recognise one person throughout a segment of video. iii. 3-D Face model: An average face 3D model can be constructed from a single face image and animated via rotation, a simple face expression model, or illumination. This can then be used to recognise faces in multiple poses, expressions or lighting.
Data Cleaning Via Logic
Supervisor: A/Prof Rajeev Gore Background The Australian Bureau of Statistics stores its census information as a collection of tables with one row (record) for every individual with many columns like age, sex, marital status, current occupation etc. Such tables invariably contain errors, either introduced during data collection, or during data entry. For example, the census data may contain a record for a person named X who is aged 6, is married and in high school. Such data usually has an associated collection of extra conditions which capture important inter-relationships between the columns of the table. For example, the census table may have constraints like "the legal age of marriage is 16" and "a high-school student is aged between 11 and 18". According to these constraints, the entry for X is incorrect since the entry breaks both conditions, so the obvious question is to find a correction which is as close to the original as possible. The problem of finding an error in a data table is known as error-localisation, and the problem of changing an incorrect record so that it passes all the constraints is known as error-correction. The standard method for solving both of these problems is called the Fellegi-Holt method, but it is highly inefficient. Recently, Agnes Boskovitz, a phd student with the RSISE, has shown a deep connection between the Fellgi-Holt method and a well-known method from propositional logic called resolution. Efficient automated tools for performing resolution and associated logical deductions are readily available. The project is to implement the Fellegi-Holt method using these logical tools to gauge the efficacy of this logical approach to data cleaning. If successful, the prototype has huge potential in real world applications and will almost certainly lead to a publication in an international conference or journal. Feel free to email me if you are at all interested in this project. Interval Arithmetic for Computational ScienceSupervisor: Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au An interval is a continuum of real numbers bounded by its end-points. Interval arithmetic performs operations on intervals rather than using discrete floating point representations. As a consequence of this errors associated with numerical rounding are automatically propagated and accounted for. Moreover the mathematical properties of intervals can be exploited to develop entirely new algorithms. The concept of intervals is actually not new, but was considered by Moore in 1965. What is relatively new is the fact that some hardware vendors are now providing direct support for interval arithmetic within their compilers/hardware. In this project we wish to develop an interval code for evaluating the type of integrals that are used in quantum calculations on atoms and molecules. Evaluations of these integrals involves truncation of an infinite series, followed by the application of a variety of recursion relations. As a consequence the value of the final integrals reflect both a series truncation error, and a rounding error which can vary according to the recursive path taken. This project will study the balance between these two sources of errors and the implication of this in large scale quantum calculations. This project will build on work undertaken in 2005 by honours student Josh Milthorpe who studied the basic performance of interval operations and use of intervals in fast multipole calculations. You will also be part of a team working on interval related methods funded from an Australian Research Council Discovery grant. Automatic DifferentiationSupervisor: Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au Every function, no matter how complicated, is executed on a computer as a series of elementary operations such as additions, multiplications, divisions and elementary functions like exp(). By repeated application of the chain rule to those elementary operations it is possible to obtain the derivative of any computed function in a completely mechanical fashion. This technique is applicable to computer programs of arbitrary length containing loops, branches and procedures. In contrast to explicit evaluation and coding of a derivative, automatic differentiation enable derivatives to be easily updated when the original code is changed. A number of programs are now available to perform automatic differentiation for code written in a variety of languages The aim of this project would be to use one of these tools to perform derivative evaluations for some computational science applications. Specifically in the modeling of systems at the atomic level derivatives with respect to atomic positions are hugely important determining things like the frequencies at which molecules vibrate. Myself and others have spent large quantities of time and effort deriving expressions for derivatives and then coding them. To automate this process and for it to be relatively efficient would be very significant. Your objective would be to explore this issue for some relatively simple computational molecular science codes. Numerical Simulation on Graphics Processing Units (GPU)Supervisor: Dr Alistair RendellE-Mail: Alistair.Rendell@anu.edu.au Background Over the past 10 years graphics processing units (GPU) have become ubiquitous in desktop computers. Modern GPUs now allow substantial user programability that lets them be used for more general-pupose computation. Indeed in a recent paper that was presented at SC04, Fan et al detail their efforts to construct a "GPU Cluster for High Performance Computing". In this work the authors plugged in 32 GPUs to a Xeon cluster system and in so doing increased the total theoretical peak performance of the machine by 512Gflops - for a cost just under US$13,000!! Tracking this development there has been a number of groups developing programming environments that can exploit the GPU for general computations. Details of many of these are available at the GPGPU web site. Two of particular interest to this project are Brook and Sh. The former is built on top of C while the latter is built on top of C++. Brook for example extends standard ANSI C with the concept of a "stream", which is a new data type representing data that can be operated on in parallel. The aim of this project is to investigate the potential for using the GPU in molecular science computations. The initial target application is likely to be N-body simulations, as typified by the simulation of atom motions. Careful consideration will be given to accuracy, since currently the GPU is limited to single precision computations. Thus the project will address two issues, i) the in principle feasibility of using the GPU for these types of problems, and ii) the scientific usefulness given the current single precision limitation.
MPI and Global File System I/O over InfiniBand using Virtual Lanes
Supervisors: Richard Alexander,
Alistair Rendell,
Peter Strazdins InfiniBand networks provide for multiple, parallel system area networks using the same physical hardware, using a concept called Virtual Lanes. Virtual Lanes allow the network to be split for a variety of purposes including the separation of HPC cluster management, message passing and I/O traffic. There are a number of parallel filesystems used in heterogeneous Linux clusters, with the most widely used being PVFS2, GPFS, Lustre in addition to the legacy NFS system. One possibility with large SMP nodes in a cluster is to use InfiniBand virtual lanes and operating system virtual machines in a way that allows nodes of a cluster to be used as both compute nodes and filesystem storage nodes. This project will implement the Lustre filesystem on a small test cluster with large SMP nodes and the InfiniBand interconnect, using virtual machines in a novel way to segregate compute functions from I/O. References:
Declarative theorem-proving
Supervisor: Dr Michael Norrish This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities. When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic, and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”; important bugs that are easy to find. The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well. Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up. The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions; to escape the use of files that include (SML) code, and move to formats that are easy to parse and manipulate as data. Finding Models of Unsolvable First Order Goals
Supervisors: Dr Michael Norrish When using an interactive proof assistant, a powerful tactic for finishing goals is the use of first order proof procedures. Unfortunately, these procedures are all-or-nothing affairs: if they succeed, they succeed completely, but if they fail to prove the goal, the user knows only that the provided lemmas and assumptions were not sufficient to allow the procedure to find a proof. The advent of model generation tools promises a way out of this painful situation. These tools can be used to generate what is effectively a counter-example to false goals. This is much more helpful for the user. Either the tool's output really is a counter-example and their proof will never succeed, or the counter-example will be impossible because the user has forgotten to tell the tools some relevant lemma from a database of known facts. The student's task in this project would be to implement a connection between two existing tools: the interactive system HOL4, and the model generation tool Darwin. This connection would provide a user-friendly implementation of the scheme described above: when an invocation of the first order prover in HOL failed, the connection implementation would translate the HOL goal into terms that Darwin could understand, call Darwin, and then finally translate Darwin's provided counter-example back into a human-readable form for the user's benefit. This project would suit a student familiar with functional programming (HOL is written in SML), and happy with the concepts of first order logic (terms, formulas, quantifiers). The student would be jointly supervised by Michael Norrish and Peter Baumgartner. Projects from NICTA researchersProject I
Project Title: Knowledge
Discovery from Internet Malicious Data Collected by Honeypots
Supervisor: Dr Warren JinKeywords: Data Mining, Internet security, intrusionThe Internet security has been
becoming more and more important as Cyber attacks have significantly increased
in volume, coordination and sophistication due to the spread of worms and
botnets. A botnet is a large collection of compromised machines under a common
Command-and-Control infrastructure, typically used for nefarious purposes. A
honeypot is a computer trap specifically designed to attract and detect
malicious attacks, in particular botnet attacks. For example, the Leurre.com
and the SGNet honeypots are distributed over 30 countries and they have
collected hundreds of gigabytes of Internet malicious traffics. This Honours project, as a part of DSTO-NICTA joint honeypot project, is to improve or develop techniques for discovering actionable knowledge from Internet malicious data collected by the Leurre.com and the SGNet honeypots. The project may concentrate on following topics, each of which could be developed into a quality Honours thesis.
The new technique would be very useful for proactive security, such as Internet intrusion prevention. Participants will have the opportunity to get hands-on experience with some of the latest technologies. The participants will access large real-world databases collected by honeypots. They can experience both academic and industrial research work environments, under supervision of Dr Warren Jin. Each student participant may receive stipend for living expenses up to $1500 per semester depending on the contribution to the honeypot project. Student participant should either have
excellent programming skills in Java, C++, or Python, or have strong
mathematical background. Experience of networking or Internet security or
database is useful. Knowledge of data mining, machine learning or statistics
is an advantage, but not necessary. Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information. Project II
Project Title: Large
Complicated Data Stream Mining
Supervisor: Dr Warren JinKeywords: Data stream mining, exploratory analysis
Owing to the rapid progress on data collection and advanced database system technologies and World Wide Web, various complicated data have been explosively growing. There are lot of examples of data streams. For example, a satellite-mounted remote sensor is generating data constantly. World-wide distributed honeypots are collecting malicious Internet traffic data every second in order to monitor the spread of worms and botnets. These data streams are massive (e.g., the Leurre.com and the SGNet honeypots are collecting hundreds of megabytes data over 30 countries every day), temporally ordered, rapidly evolving, various types of attributes (e.g., continuous, categorical, string) and potentially infinite. Mining complicated data streams is one of very important real-world data mining tasks. This Honours project is to develop or improve a data mining technique on complicated data streams. The data mining technique could be, but not limited to, data visualisation, temporal association analysis, clustering, time series analysis, outlier detection and classification. The technique can be used to find interesting events or hot spots within data streams. The project will consider mixed data attributes such as continuous (e.g., packet length), categorical (e.g., TCP or not) and sequential (e.g., payload strings) attributes. The project may also take into account of scalability for large amount of streaming data. Participants will have the opportunity to get hands-on experience on up-to-dated stream mining techniques. The participants may be engaged in using these techniques to handle real-world challenges. They can experience both academic and industrial research work environments, under supervision of Dr Warren Jin. Each student participant may receive stipend for living expenses up to $1500 per semester depending on the contribution to the use-inspired research. Interested students should have good
understanding of data mining, visualisation or time series analysis.
Programming skills in C++, MatLab or R are useful. Experience of databases or
data analysis or distributed computing is a plus. Please feel free to contact Dr Warren Jin (Huidong.Jin@nicta.com.au) for further information. Project III
Project Title: Population
Projection for New
|
|||||||||||||||||||||||||||||||||||||||||||||||||||
|
Page last updated: 12 June 2008 Please direct all enquiries to: webmaster@cs.anu.edu.au Page authorised by: Head of Department, DCS |
| The Australian National University — CRICOS Provider Number 00120C |