Honours topics
Half of your time as an honours student is spent working on a project. But first you have to find a project topic.
The "official" reference for projects proposed by potential supervisors is the CECS projects database.
Projects are available there for all levels of research, including short projects, summer scholarship projects, Honours projects, and Masters and PhD projects. All potential research students at any level are urged to browse it.
If you see a project that looks interesting, email the potential supervisor about it. Don't be afraid to discuss possible variations to the listed project: what appears on the web site is generally more of a suggestion than a rigid specification.
You don't have to be constrained by what you see on the project website. If you have something that you would like to work on as a project, then feel free to discuss it with the honours convener to see if it could form the basis of an honours project and to identify a possible supervisor. Look at the web pages of Computer Science staff to find out their research interests. Remember that projects may also be supervised by people outside the College, or even outside the University: from CSIRO or NICTA, for instance.
Former Project Topics
For your interest, here is an archive of Honours project proposals from previous years. Some of them are quite ancient now, of course, but they may help to give you ideas of the kind of thing which is suitable.

2003 Honours project proposals
Induction in a large temporal domain - the intelligent file pre-fetcher for Linux
Supervisor: Eric McCreath
Machine learning may be defined as learning a description of a concept from a set of training examples. This description may then be used for prediction. This project involves investigating the application of machine learning within an operating system to pre-fetch files. The central outcome of this research will be an analysis of machine learning approaches for undertaking such a task. This would also involve forming a theoretical model of the domain in question, which would be vital for the analysis.
Details:
Operating Systems often employ read-ahead windowing approaches for sequentially accessed files. Basically the next part of a file is pre-fetched into memory even before it is requested. This greatly improves performance of the system.
Often when we interact with computers, the same files are required during our interaction. For example, we may always sit down and start up our mail reader. If the computer could learn this pattern of usage and pre-fetch these files, the responsiveness of the system could be improved. This project involves developing such a system and investigating its potential.
Within machine learning there are a number of issues that make this project both challenging and interesting. These include the temporal nature of the data, the large number of examples, and how the learnt hypothesis could be applied to recommend which files to pre-fetch. Within Operating Systems there are also a number of challenges, such as how the system hooks into the kernel and how to measure the performance of the system when it is functioning within the operating system.
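To make the flavour of the learning task concrete, here is a minimal sketch (an illustration only, not part of the project, and ignoring the kernel-interfacing issues above) of a first-order model that counts which files tend to follow which, and suggests candidates to pre-fetch. All names and thresholds are invented for the example.

    # Illustrative sketch: a first-order model of file accesses that could drive
    # a prefetch decision. Paths and thresholds are hypothetical.
    from collections import defaultdict, Counter

    class PrefetchModel:
        def __init__(self):
            # counts[prev_file][next_file] = how often next_file followed prev_file
            self.counts = defaultdict(Counter)
            self.prev = None

        def observe(self, path):
            """Record one file-access event (e.g. taken from a kernel trace)."""
            if self.prev is not None:
                self.counts[self.prev][path] += 1
            self.prev = path

        def predict(self, path, k=3, min_support=5):
            """Suggest up to k files to pre-fetch once 'path' is accessed."""
            followers = self.counts[path]
            return [f for f, n in followers.most_common(k) if n >= min_support]

    model = PrefetchModel()
    for p in ["/home/u/.muttrc", "/var/mail/u", "/home/u/.sig",
              "/home/u/.muttrc", "/var/mail/u", "/home/u/.sig"]:
        model.observe(p)
    print(model.predict("/home/u/.muttrc", min_support=1))  # ['/var/mail/u']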
The results of this research have the potential to be published in both Operating Systems and Machine Learning venues. The project would also form a good starting point for graduate study.
Students will need to be able to reason logically and should have done well in theory subjects (either within Mathematics or Computer Science). Successful completion will require both determination and imagination. Finally, students considering embarking on this project must have good programming skills and will need to gain an understanding of the Linux kernel.
Parallelisation, Machine Learning and a Graphical User Interface for a Data Cleaning and Matching System
Supervisor: Peter Christen
E-Mail: Peter.Christen@anu.edu.au
Many organisations today collect massive amounts of data in their daily businesses. Examples include credit card and insurance companies, the health sector (e.g. Medicare), police/intelligence or telecommunications. Data mining techniques are used to analyse such large data sets in order to find new, unexpected and potentially valuable information (like patterns and rules, or outliers).
Often several data sets have to be linked to obtain more detailed information. As most data is not primarily collected for data analysis purposes, a common unique identifier (like a patient or customer identification number) is missing in many cases, and so probabilistic techniques have to be applied to link data sets. Data or record linkage - also called data matching, data integration, data cleansing or data scrubbing - is a rapidly growing field with applications in many areas (business mailing lists, health and biomedical research, fraud and crime detection, census statistics, etc.) and it is an important initial step in many data mining projects.
The ANU Data Mining Group is currently working in collaboration with the NSW Department of Health, Centre for Epidemiology and Research, on the improvement of probabilistic techniques for record linkage. We are mainly interested in developing high-performance techniques for linkage that can be run on parallel computers like the APAC National Facility (a Compaq supercomputer with 480 processors), or clusters of workstations or PCs. Prototype software has been developed and published under an open source licence. See http://sourceforge.net/projects/febrl and the project web page at http://datamining.anu.edu.au/linkage.html for more details.
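For a feel for the underlying idea, the classical probabilistic (Fellegi-Sunter style) approach compares two records field by field and sums agreement/disagreement weights. The toy sketch below only illustrates that idea; it is not the febrl API, and the field weights and threshold are invented for the example.

    # Toy illustration (not the febrl API): Fellegi-Sunter style record comparison.
    import math

    def field_weight(a, b, m, u):
        """Log-likelihood weight for one field: m = P(agree | match), u = P(agree | non-match)."""
        if a == b:
            return math.log2(m / u)          # agreement weight (positive)
        return math.log2((1 - m) / (1 - u))  # disagreement weight (negative)

    # (m, u) probabilities per field -- invented values for the example
    WEIGHTS = {"surname": (0.95, 0.01), "postcode": (0.9, 0.05), "dob": (0.98, 0.001)}

    def match_score(rec_a, rec_b):
        return sum(field_weight(rec_a[f], rec_b[f], m, u) for f, (m, u) in WEIGHTS.items())

    a = {"surname": "smith", "postcode": "2600", "dob": "1970-01-01"}
    b = {"surname": "smith", "postcode": "2601", "dob": "1970-01-01"}
    score = match_score(a, b)
    print(score, "match" if score > 5 else "possible" if score > 0 else "non-match")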
Student research in this project would include:
- Exploration of parallelisation and development of techniques for distributed data access, (dynamic) load balancing, fault tolerance and recovery within the framework of the developed prototype software.
- Exploration of machine learning and data mining techniques to improve the linkage quality. One of the disadvantages of most machine learning techniques is their need for training data, which is hardly available in the given application area. While some initial work has been done by various researchers, there are still many open questions and issues.
- As users of a record linkage system will most likely not be computer experts, but rather biomedical researchers, statisticians, etc., a graphical user interface needs to be developed. Research issues involved in this area include, for example, how to efficiently support a user when human intervention is needed on the linkage status of a record pair.
A student working on this project would be involved in:
- participation in an exciting research project
- object-oriented cross-platform programming (using Python, Tkinter, wxPython, etc.)
- exploration and development of machine learning and data mining techniques
- distributed parallel computing on SMPs, COWs and supercomputers (APAC)
- open source software development (see http://sourceforge.net/projects/febrl).
For more information, publications, and further links please visit the project web page at http://datamining.anu.edu.au/linkage.html.
Large Scale Data Grid projects (Internet Futures)
Contact: Markus.Buchhorn@anu.edu.au, CSIT-N244, 6125-8810
The Internet Futures group of the ANU is working with high-speed networks, advanced network services, visualisation and telecollaboration technologies, and large scale data facilities, to support a diverse range of user communities. Many of its activities have a national focus, e.g. through the GrangeNet project.
In the large-scale-data area, the users include Physics, Astronomy, BioInformatics and Cultural Archives (e.g. audio, spoken and music, or video). Archives can easily be many Terabytes (thousands of Gigabytes) in size, need to be delivered to users on networks ranging from multi-Gigabit down to slow dial-up modem speeds, and may need to interface to large-scale computing systems.
The IF group has several potential projects suitable for an honours project. These include:
- development of new Bio-Mirror (www.bio-mirror.net) archive services, to greatly reduce their bandwidth needs while increasing their performance and usability for users. This project would have international impact, as a good solution would be adopted by many Bio-Mirror sites.
- development of Virtual Observatory elements, to support astronomers' real-time access to large scale observational data sets
- and others, depending on your interests...
Technologies in these fields include grid toolkits (Globus for example), metadata standards, various network protocols and performance issues, large scale databases and database/directory access methods, and web services and SOAP. Programming skills in C, C++, Java, Perl, Python, are all useful.
You would get to work with a range of users across the ANU, on real-world problems. You'd also get access to the APAC Mass Store facility (a 1.2 Petabyte (1200 Terabyte) tape robot) and associated computing hardware, including possibly the APAC National Facility supercomputer, the ANU Supercomputer Facility's Visualisation Laboratory (VizLab), and the Internet Futures' Access Grid (AG) (telecollaboration) nodes in various locations on campus.
You can go as deep into these areas as you like (although we have some ideas), and maybe even come back for a PhD if you're really keen :-)
TeleCollaboration (videoconferencing on steroids) projects (Internet Futures)
Contact: Markus.Buchhorn@anu.edu.au, CSIT-N244, 6125-8810
The Internet Futures group of the ANU is working with high-speed networks, advanced network services, visualisation and telecollaboration technologies, and large scale data facilities, to support a diverse range of user communities. Many of its activities have a national focus, e.g. through the GrangeNet project.
TeleCollaboration can be seen as very high-end videoconferencing, with many additional features. Some of these features require the user's space to be intelligent or flexible, others require advanced network or application services. Any kind of collaboration, for research and for education, can be supported. This could include music teaching (masterclasses), architecture, history or medical training with visualisation or even virtual reality interfaces, and large-scale data browsers and instrument consoles (e.g. in Astronomy or TeleMicroscopy).
The Internet Futures' telecollaboration activities revolve mainly around Access Grid (AG) nodes. There are currently (Dec/Jan 2003) 3 full-size nodes being built at the ANU (including one in the CS&IT building), and several additional nodes are being discussed. There are also plans for smaller-scale (even desktop) nodes, and ports to a variety of systems and other technology spaces (e.g. the Wedge virtual reality/visualisation facility). Around Australia 15 nodes are currently in planning or construction, and there are over 130 nodes around the world at the moment (Jan 2003). ANU users will include Astronomers at Mt Stromlo Observatory (20km west of Canberra), and the new ANU Medical School (on campus).
The underlying architecture of these systems is open and flexible, and the ANU has the potential to become a worldwide focus of AG development, with strong international collaboration.
The IF group has several potential projects suitable for an honours project. These include:
- high-quality Audio/Video capture and delivery e.g.
- tiled (multi-camera) video capture for higher resolutions,
- broadcast quality video capture,
- low latency Audio/Video codecs
- intelligent A/V capture (using the steerable cameras, and possibly audio-capture interfaces). Some applications, e.g. music teaching, require very specific views of students as they play music. For general workshop/meeting interaction, the steerable cameras could be used to track a speaker as they wander around the room, or to focus on specific audience members when they ask questions.
- integration of other inputs, e.g. laptops and dedicated applications (which aren't multicast aware, or are part of the "standard" AG toolkit), as well as intelligent/electronic whiteboards, wireless PDAs, etc.
- cluster systems, for codec performance and robustness
- layered encodings to support differential quality audio/video (e.g. to support high-bandwidth and low-bandwidth clients simultaneously).
- integration with H323/SIP (the two major standard videoconferencing-over-IP technologies) and streaming systems (for one-way delivery to non-AG clients). This will involve some interesting design issues, e.g. H.323 is designed for essentially one-view-at-a-time, where the AG is designed for multi-site viewing in parallel.
- User Interface design; the AG user interface is pretty ordinary and unfriendly to new users. There is a lot of untapped information in AG streams that could be used to greatly enhance the user experience. These include audio localisation, visual feedback of audio activity, and audio-video synchronisation ("lip-sync").
- physical security of AG spaces; protecting the (expensive) hardware in AG spaces when it is not being used. Given multiple steerable cameras, sensitive microphones, and possibly other sensors, it should be possible to create a space that secures itself (and screams if attacked :-) ).
- and others...
Technologies in these fields include IP multicast, hardware interfaces, various network protocols and performance issues, user-interface design, and audio/video codecs. Programming skills in C, C++, Java, Perl, Python, are all useful.
You can go as deep into these areas as you like (although we have some ideas), and maybe even come back for a PhD if you're really keen :-) Possibly more than one area could be looked at during the project.
Extending the Sparc-Sulima Simulator for Clusters
Supervisor: Peter Strazdins
Email: Peter.Strazdins@anu.edu.au
Phone: (02) 6125 5140
For a description of this project please see the web page titled: Extending the Sparc-Sulima Simulator for Clusters.
Checkpointing in a complex object oriented software system
Supervisor: Bill Clarke
Email: Bill.Clarke@anu.edu.au
Phone: (02) 6125 5687
For a description of this project please see the web page titled: Checkpointing in a complex object oriented software system.
Hyperlinks, PageRank and Page Quality
Potential supervisors: Trystan Upstill, Nick Craswell, David Hawking
E-Mail: nick.craswell@csiro.au
This project would suit a student interested in the evaluation of Web search engine technologies.
The "link recommendation" assumption is that each hyperlink is a vote for the target page being of high quality. The most famous example is Google's PageRank [1], but most search engines probably use some form of link recommendation to boost the ranking of pages with high indegree.
While the use of link recommendation is widespread, there has been no published evidence to support its use. For example, nobody has tested whether, given the home pages of two things (products, companies, CS departments), the one with the highest PageRank tends to be the best or most important one. Yet for us to be convinced that link recommendation is useful in search engines, something like this must be true.
The project is to compare PageRank to "real world" notions of quality such as company reputation (using S&P [2]), movie popularity (using IMDB [3]) or sales ranks (using bestseller listings). The question is whether PageRank as calculated by Google predicts such rankings, or which rankings are best predicted by PageRank. There would also be room for the student to implement their own link recommendation measure and evaluate it in the same way, depending on their interest.
- [1] http://www.google.com/technology/index.html
- [2] http://www.standardandpoors.com
- [3] http://www.imdb.org
- A paper describing PageRank by the founders of Google: http://www7.scu.edu.au/programme/fullpapers/1921/com1921.htm
- To see a simple use of PageRank in ranking, see the Google Directory e.g. this list of CS Departments is in PageRank order.
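For reference, PageRank itself can be computed by simple power iteration over the link graph. The sketch below is a minimal, unoptimised illustration on a toy graph; it is not Google's implementation, and the toy links are invented.

    # Minimal PageRank power-iteration sketch on a toy link graph.
    def pagerank(links, d=0.85, iters=50):
        """links: dict page -> list of pages it links to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iters):
            new = {p: (1 - d) / n for p in pages}
            for p, outs in links.items():
                if not outs:                       # dangling page: spread its rank evenly
                    for q in pages:
                        new[q] += d * rank[p] / n
                else:
                    for q in outs:
                        new[q] += d * rank[p] / len(outs)
            rank = new
        return rank

    toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    print(sorted(pagerank(toy).items(), key=lambda kv: -kv[1]))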
Data Cleaning on the Web
Potential supervisors: Francis Crimmins, Nick Craswell, David Hawking
E-Mail: nick.craswell@csiro.au
This project would suit a student interested in studying Web search technology.
Many Web search systems rely on a crawler to gather pages, for example Google crawling billions of pages per month and search.anu.edu.au crawling hundreds of thousands of ANU pages per week. Some information on crawlers is in the Web Robots FAQ.
Unfortunately, when crawling, all kinds of redundant pages tend to be gathered, lessening the efficiency and effectiveness of the search system. Types of redundancy include:
- Mirroring: The same pages on two hosts
- Aliasing: Two pages or directories might deliberately appear in different locations on a host.
- Case insensitivity: The same page might be available at both upper and lower case URLs (while on other servers maintaining correct case is crucial).
- Parameterised URLs: By varying the order of CGI parameters in the URL, or adding ones which do not affect content (e.g. a user id), the same content can appear at hundreds of URLs in a crawl.
- Crawler traps: The simplest form of crawler trap leads to URLs such as: http://xyz.anu.edu.au/hons/hons/hons/hons/hons/topics.html. It occurs when topics.html links to the subdirectory "hons/" and that is a valid URL due to some soft link or server misconfiguration.
- Applications: For example, a calendar application if completely crawled might give monthly calendar pages for a hundred years, even if only the current year has any calendar entries.
- Error pages: Error pages which have an incorrect HTTP status code (200) also lead to redundant pages being included in the crawl.
Crawling these redundant pages wastes network and machine resources. Also, since crawls are usually halted after a certain number of URLs or time limit, good pages can be crowded out of the crawl by these redundant pages. Reducing redundancy also frees up disk space on the search machine, improves index/query processing time and can improve retrieval effectiveness (user satisfaction with results).
The project is to investigate the sort of redundancy which occurs in crawls and methods for eliminating it, perhaps starting with search.anu.edu.au. Possible approaches include pre-GET, post-GET or post crawl redundancy elimination, based on content [Broder et al] or on URLs [Bharat and Broder]. The project could measure the impact of such detection, study its implementation in a real search system and/or investigate improved URL-based detection methods, particularly in the area of parameterised (CGI) URLs.
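As one concrete starting point, much URL-based redundancy can be removed by canonicalising URLs before comparing them. The sketch below illustrates the idea only; the set of parameters treated as content-free, and the collapsing of repeated path segments, are assumptions for the example rather than a tested policy.

    # Sketch of URL-based duplicate detection: canonicalise URLs so that many
    # redundant forms map to the same key. Parameter names below are hypothetical.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    IGNORED_PARAMS = {"sessionid", "userid"}   # assumed content-free parameters

    def canonical(url):
        scheme, host, path, query, _frag = urlsplit(url)
        host = host.lower()
        # collapse repeated path segments such as /hons/hons/hons/ from crawler traps
        segs, out = path.split("/"), []
        for s in segs:
            if not out or s != out[-1]:
                out.append(s)
        path = "/".join(out) or "/"
        # drop ignored parameters and sort the rest so parameter order does not matter
        params = sorted((k, v) for k, v in parse_qsl(query) if k.lower() not in IGNORED_PARAMS)
        return urlunsplit((scheme.lower(), host, path, urlencode(params), ""))

    print(canonical("http://XYZ.anu.edu.au/hons/hons/hons/topics.html"))
    print(canonical("http://xyz.anu.edu.au/cgi/view?b=2&sessionid=99&a=1"))
    print(canonical("http://xyz.anu.edu.au/cgi/view?a=1&b=2"))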
References:
- http://www.google.com/
- http://panoptic2.anu.edu.au/search/search.cgi?collection=anu_search_external
- http://www.robotstxt.org/wc/faq.html
- http://www.w3.org/Protocols/HTTP/HTRESP.html
- http://www.scope.gmd.de/info/www6/technical/paper205/paper205.html
- http://www8.org/w8-papers/4c-server/mirror/mirror.html
XML search system
Potential supervisors: Nick Craswell, David Hawking
E-Mail: nick.craswell@csiro.au
A project for students interested in XML technologies, Web technologies, information management and/or coding.
XML is a simple technology which has far-reaching implications and an increasing number of real-world applications. One way of accessing XML documents, particularly complex or large datasets, is via search engine technology. However, supporting rich queries over arbitrarily complex XML raises a number of research questions. In particular, restricting the results to a particular DOM subtree is something that is difficult to do without an additional index structure.
This project could be a flat-out engineering and measurement exercise, adding DOM subtree support to a simple inverted index. Alternatively it could involve studying alternative solutions and overall XML search requirements. It could be based on Panoptic (http://www.panopticsearch.com/) or an open source inverted indexing system.
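One simple way to support subtree restriction, sketched below purely as an illustration (it is not based on Panoptic), is to store the element path of each term occurrence in the postings and filter postings by a path prefix at query time.

    # Sketch: an inverted index whose postings record DOM element paths, so a
    # query can be restricted to a subtree by path prefix. Example XML is invented.
    import xml.etree.ElementTree as ET
    from collections import defaultdict

    def build_index(xml_text):
        index = defaultdict(list)            # term -> list of element paths
        def walk(elem, path):
            here = f"{path}/{elem.tag}"
            for word in (elem.text or "").split():
                index[word.lower()].append(here)
            for child in elem:
                walk(child, here)
        walk(ET.fromstring(xml_text), "")
        return index

    def search(index, term, subtree_prefix=""):
        return [p for p in index.get(term.lower(), []) if p.startswith(subtree_prefix)]

    doc = "<thesis><abstract>xml search</abstract><body>search engines</body></thesis>"
    idx = build_index(doc)
    print(search(idx, "search"))                       # both occurrences
    print(search(idx, "search", "/thesis/abstract"))   # restricted to the abstract subtree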
Site and Search Engine Summarisation
Potential supervisors: Nick Craswell, David Hawking
E-Mail: nick.craswell@csiro.au
Write and evaluate a site summariser. Optionally apply the same principles to subject-specific search services.
Yahoo! [1] is an example of a successful company which performs manual site summarisation: all their listed sites have hand-written summaries. InvisibleWeb [2] do the same job as Yahoo!, but summarising subject-specific search services. Either type of company might find automatic summarisation techniques useful.
There are many document summarisation algorithms, including those based on the content of one or many documents, and those based on hyper-links [3]. The project is to find out which summarisation methods are best at describing the content of a site (and/or subject-specific search service).
- [1] http://www.yahoo.com
- [2] http://www.invisibleweb.com
- [3] http://www.mri.mq.edu.au/~einat/incommonsense/
Peer to peer postcards: multimedia delivery for wireless handset devices
Supervisor: Chris Johnson
Details:
Mobile phone networks are a very expensive way to deliver pictures to mobile phone handsets, but the handsets are now being made with cameras and display screens. We can also see devices that combine remote phone network communications with local Bluetooth communications.
Voice communications over the phone network are still needed, because voice over IP is not yet widely available with sufficient quality and convenience - and the phone network is much more extensive in geographical reach and number of customers; but where two people have compatible devices and have some access to short-range wireless networks, they should be able to exchange pictures over IP.
It is proposed to make the two networks complement each other: to send short messages and interactive voice communications over the instant, mobile phone network, and larger multimedia objects with some delay over the cheap, local Bluetooth or WiFi network. Bluetooth networks are short range, and the devices will need to form ad hoc, peer to peer connections to hand the multimedia chunks along the chain to make the distance. (WiFi networks are more centralised but the same peer to peer behaviour is also possible).
The problem is to discover, design, install and implement the appropriate protocols, taking into account users' social and geographical requirements. Experimental demonstration using mobile wireless devices such as laptops and PDAs in conjunction with normal mobile phones is expected - without deep integration of the two modes of communication. Some extension work would be desirable towards analysis of network reach, and the density of partygoers needed to make the service work in practice.
References
The problem is mentioned in a report on a talk by Henning Schulzrinne at Mobicom 2002, as reported in IEEE Internet Computing, Nov/Dec 2002, p. 12.
Peer to peer networks, Bluetooth, and WiFi are hot topics at present, and the student will have to find their way into the gossip and separate hard facts from hyped fancy.
Student skills required
Interest in distributed computing application protocols; interest and ability to search out existing software systems; analytical awareness of user needs and social realities as well as technologies. Self-starting person, at home with finding, installing and experimenting with real world software. An ability to analyse and write about the implications of the technical work is desirable.
Project support will come from the Smart Internet Technology Intelligent Environments group, in both technology and conceptual buzz.
Interval Arithmetic: The New Floating-Point Paradigm?
Supervisor: Alistair Rendell
From using a calculator we are all familiar with the fact that many numbers must, at some level, be truncated and rounded. Thus 2/3 may appear as 0.67, 0.66667, or 0.6666667 etc. depending on the precision of your calculator. More generally, representing the continuum of floating point numbers by the discrete finite representations used in all modern computers gives rise to rounding errors, overflows, underflows etc. Failure to account for these limitations can give rise to spectacular (expensive!!) results, e.g. the failure of some Patriot missiles to hit their targets during the Gulf War or the explosion of the Ariane 5 space rocket (see disasters).
Interval arithmetic is a fundamental paradigm shift that attempts to address these issues. It defines an "interval" as a continuum of real numbers bounded by its end-points and performs all arithmetic operations on intervals rather than on discrete floating point numbers. As a consequence, errors are automatically propagated and accounted for. Moreover, the mathematical properties of intervals can be exploited to develop entirely new algorithms.
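As a small illustration of the idea (a sketch only, not any vendor's interval library, and ignoring the directed rounding that a real implementation must apply at each end-point):

    # Minimal interval sketch: end-points bound the true value, and arithmetic
    # propagates the bounds automatically.
    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            products = [self.lo * other.lo, self.lo * other.hi,
                        self.hi * other.lo, self.hi * other.hi]
            return Interval(min(products), max(products))

        def __repr__(self):
            return f"[{self.lo}, {self.hi}]"

    two_thirds = Interval(0.66666, 0.66667)   # 2/3 with its rounding error made explicit
    x = Interval(-1.0, 2.0)
    print(two_thirds + x)    # roughly [-0.33334, 2.66667]
    print(two_thirds * x)    # the width of the result reflects the propagated uncertainty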
The concept of intervals is actually not new, but was considered by Moore in 1965. What is relatively new is the fact that some hardware vendors are now providing direct support for interval arithmetic within their compilers/hardware.
The aim of this project will be to investigate the use of interval arithmetic for computational science applications. The initial goal will be to use intervals to study error propagation for some "problematic" algorithms found within the Gaussian computational chemistry code. A second goal will be to investigate the use of intervals for nonlinear global optimization, e.g. determining the most stable conformation (geometry) of a large molecular system (e.g. a protein).
For more information on interval arithmetic see the interval home page.
Performance Analysis of Cache Oblivious Algorithms
Supervisors: Alistair Rendell, Peter Christen and Peter Strazdins
Although clock speed is important, on modern computers it is vital to maximize cache re-use in order to obtain a high percentage of the theoretical peak performance. As a consequence, much effort has been directed to developing algorithms that are, in some way, blocked so that they spend the majority of their time working on data that is cache resident. The classic example of this is a matrix multiplication algorithm in which large matrices are broken down into sub-matrices small enough to reside in cache. These sub-matrices are then multiplied together and the results used to construct the overall result. A disadvantage of these approaches, however, is that they usually contain explicit parameters relating to the cache size of the computer being used, limiting the portability of the code, possibly even between CPUs from the same manufacturer.
An alternative to explicit cache blocking is to construct your algorithm from the outset in such a way that, at some point, it naturally becomes cache resident. Such algorithms are referred to as "cache-oblivious" and have been strongly advocated by Matteo Frigo in a series of recently published papers. Ideal candidates for cache-oblivious algorithms are recursive algorithms, where frequently the data set is successively divided in half, and thus at some stage becomes small enough to reside in cache.
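The following sketch illustrates the idea for matrix multiplication: the recursion carries no cache-size parameter, and the sub-problems eventually fit in whatever cache the machine happens to have. It is an illustration only; matrix sizes are assumed to be powers of two to keep it short.

    # Cache-oblivious matrix multiply sketch: recurse on quadrants until the
    # blocks are small, with no explicit cache-size parameter anywhere.
    def matmul(A, B, C, ai, aj, bi, bj, ci, cj, n, base=16):
        """C[ci:ci+n, cj:cj+n] += A[ai:ai+n, aj:aj+n] @ B[bi:bi+n, bj:bj+n]."""
        if n <= base:
            for i in range(n):
                for k in range(n):
                    a = A[ai + i][aj + k]
                    for j in range(n):
                        C[ci + i][cj + j] += a * B[bi + k][bj + j]
            return
        h = n // 2
        # C[di][dj] += A[di][dk] * B[dk][dj] for all quadrant combinations
        for di, dj, dk in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
                           (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]:
            matmul(A, B, C, ai + di * h, aj + dk * h,
                            bi + dk * h, bj + dj * h,
                            ci + di * h, cj + dj * h, h, base)

    n = 64
    A = [[1.0] * n for _ in range(n)]
    B = [[2.0] * n for _ in range(n)]
    C = [[0.0] * n for _ in range(n)]
    matmul(A, B, C, 0, 0, 0, 0, 0, 0, n)
    print(C[0][0])   # 128.0 = sum over k of 1*2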
Unfortunately to date most of the work on cache oblivious algorithms has been theoretical rather than practical. The aim of this project is to implement some cache oblivious algorithms and use hardware performance counters both to compare their performance characteristics with more traditional algorithms, and to study their performance on a range of different hardware platforms.
OpenMP Futures
Supervisors: Alistair Rendell and Peter Strazdins
OpenMP is an Application Program Interface (API) for multi-platform shared-memory parallel programming in C/C++ and Fortran. It is jointly defined by a group of hardware and software vendors and, via cOMPunity, the wider community. The first OpenMP specification was released in 1997, while version 2.0 for C/C++ was released in 2002. Since its initial release it has gained increasing use, particularly in the scientific community.
Currently there is much debate within the OpenMP community as to what should be included in future OpenMP specifications. We are very interested in these issues as part of the CC-NUMA project. In particular we are keen to explore possible:
- Support for Non-Uniform Memory Access (NUMA) architectures
- Task queue models
A further opportunity exists to consider OpenMP in cluster environments. Although OpenMP is a shared memory programming paradigm, there has been some work to enable OpenMP on clusters (similar to software distributed shared memory models). Of particular note is the work of the PC cluster consortium in Japan. Work in this area would involve using their SCore cluster development kit on the Bunyip Cluster and the APAC SC system.
Data Warehousing and OLAP Project (Small ARC funded project)
Supervisor: Weifa Liang
Email: wliang@cs.anu.edu.au
Phone: 6125 3019
One of the fundamental problems in the design of data warehouses is the formation of effective methods for selecting which views to materialize, and which materialized views to replace. In other words, given the resource constraint and the query load to be supported in a data warehouse, the selection problem is usually to select a set of views to materialize such that the sum of query response time and view refreshment (or maintenance) time is minimized. Though previous work has offered significant insight into the nature of the problem, all of the existing work assumes that the initial data warehouse is empty. In normal operation, the data warehouse grows incrementally. Also, the query and update frequencies of materialized views are subject to change over time.
Another related challenging problem in data warehousing is the view refreshment (view maintenance) problem under the constraint of a given refreshment time window. In order to keep the content of a materialized view in the warehouse consistent with the contents of the remote sources, the materialized view must be updated immediately or periodically in response to the source updates. Usually such updates are done incrementally. In summary, the following work will be conducted in this project:
- design of new algorithms for view selection and replacement in a dynamic data warehouse environment, and evaluation of new strategies for such applications through implementation and performance evaluation;
- examination of the existing view refreshment policies, and development of new view refreshment algorithms that shorten view refreshment time for the data warehouse.
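As a point of reference for the static case, the well-known greedy heuristic of Harinarayan, Rajaraman and Ullman selects views one at a time by their estimated benefit. The sketch below only illustrates that baseline; the view lattice and sizes are invented, and the dynamic selection and replacement questions above go well beyond it.

    # Sketch of the classic greedy heuristic for static view selection.
    # Views are named by the dimensions they group by; sizes are invented row counts.
    def answerable_from(w, v):
        """True if view w can be computed from view v (v groups by a superset of w's dimensions)."""
        return set(w) <= set(v)

    def greedy_select(sizes, top_view, k):
        """Pick k views to materialise; a query on w costs the size of the smallest materialised ancestor."""
        materialised = {top_view}                       # the raw fact table is always available
        for _ in range(k):
            def benefit(v):
                total = 0
                for w in sizes:
                    current = min(sizes[m] for m in materialised if answerable_from(w, m))
                    if answerable_from(w, v):
                        total += max(0, current - sizes[v])
                return total
            best = max((v for v in sizes if v not in materialised), key=benefit)
            materialised.add(best)
        return materialised

    sizes = {"pcd": 6_000_000, "pc": 6_000_000, "pd": 800_000, "cd": 6_000_000,
             "p": 200_000, "c": 100, "d": 1_000, "": 1}
    print(greedy_select(sizes, "pcd", 2))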
Routing Algorithms for WDM Optical Networks (ARC Discovery Funded Project)
Supervisor: Weifa Liang
Email: wliang@cs.anu.edu.au
Phone: 6125 3019
Optical networking technology plays a key role in next-generation networks. In particular, wavelength-division-multiplexing (WDM) networks have emerged as a promising candidate for next-generation networks, providing large available bandwidth and connectivity. The routing and wavelength assignment problem is one of the fundamental problems in WDM optical networks.
Designing efficient routing protocols is a fundamental problem for any network, and WDM optical networks are no exception. By the nature of WDM networks, traditional routing algorithms are ill-suited to this setting. It is thus necessary to devise new algorithms and algorithmic methodology for them.
In this project we primarily focus on devising efficient routing algorithms for two fundamental routing problems in WDM networks: the all-to-all routing and robust routing problems. Our objectives are to:
- devise efficient routing algorithms for these problems through the use of a collection of algorithmic and graph techniques, including path colouring, network flow and maximum matching, and analyse the performance of the developed algorithms through experimental simulation;
- develop a generic methodology for the design of "efficient" routing protocols for WDM networks; to effectively utilize the network resources, an efficient routing protocol should be able to optimize multiple objectives simultaneously in terms of different kinds of network resource consumption;
- investigate various known techniques for achieving efficient routing in WDM optical networks, and develop new concepts and techniques to solve these routing problems. Practical algorithms to solve some real problems will be developed, implemented and tested.
Power-Aware Routing for Ad Hoc Mobile Networks
Supervisor: Weifa Liang
Email: wliang@cs.anu.edu.au
Phone: 6125 3019
Mobile multihop radio networks, also termed peer-to-peer, all-wireless, or wireless ad hoc networks, are expected to fulfill a critical role in applications in which wired backbone networks are not available or not economical to build. These span the commercial, public and tactical communications sectors. Such networks provide the only solution in situations where instant infrastructure is needed and no central system backbone and administration (like base stations and a wired backbone in a cellular system) exist. Applications of these networks include mobile computing in areas where other infrastructure is unavailable, law enforcement operations, disaster recovery situations, as well as ad hoc networks for large events such as sporting events or congresses when it is not economical to build a fixed infrastructure for short, temporary usage. In tactical battlefield communications the hostility of the environment prevents the deployment of a fixed backbone network. We consider a peer-to-peer mobile network consisting of a large number of mobile nodes that create a network on demand and may communicate with each other via intermediate nodes in a multihop manner, i.e., every node can be a router.
In this project we focus on new power-aware routing protocols that extend the life of the nodes and the network. Based on homogeneous and heterogeneous mobility models, a new power-aware routing protocol will be proposed that takes the capacities of different types of mobile devices into consideration.
Memory Management Toolkit
Supervisor: Steve Blackburn
Email: Steve.Blackburn@anu.edu.au
Phone: +61 2 6125 4821
JMTk is a new memory management toolkit for Java, which I have co-developed with Perry Cheng of IBM Research and Kathryn McKinley of the University of Texas. JMTk has proved very popular and there is significant interest in allowing it to work in other language environments including Microsoft's CLI, and the Haskell runtime. This project will explore the issues associated with interfacing JMTk to these environments. For exceptional students, the possibility exists of an internship at Microsoft Research as part of the project.
Performance analysis of Java
Supervisor: Steve Blackburn
Email: Steve.Blackburn@anu.edu.au
Phone: +61 2 6125 4821
Java is very important commercially, and yet the runtime systems required to support it and other languages such as C# are relatively immature. This project will involve measurement and analysis of the performance of Java programs, with a view to better understanding where there are opportunities for performance improvement. This work will involve using and modifying JMTk, a memory management toolkit for JikesRVM.
Emergency Web
Supervisor: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
Phone: 0419 496 150
The World Wide Web provides a useful way to deliver information to the general public and organization staff. However, in an emergency, web sites can be overloaded by demand. Investigate how web based systems have coped with emergencies and suggest techniques to allow web based systems to work better. Techniques might include the use of accessibility guidelines and design to enhance the use of caching. Recommend changes, including prototypes, for the ACT Government and the Sentinel Fire Mapping System, as used in the ACT bush fire emergency of January 2003.
Simplifying XML for E-commerce
Supervisor: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
Phone: 0419 496 150
Web Services provide a way to define a set of transactions and the format of the data on which those transactions can be performed. In theory the XML format used makes e-commerce simple to implement, but in practice take-up has been slow. The Australian Taxation Office provides free XML based e-commerce software for small businesses to submit electronic business activity statements (eBAS) on-line, but most still use paper forms. It has been demonstrated that it is possible to transform the eBAS format using XSLT so that it can be displayed by a web browser without the need for additional software. Investigate enhancements to the eBAS format to allow processing by a web browser without additional software. Propose changes to the eBAS XML format, and if needed to the relevant web services and XML standards, to make this possible.
Standardising Office Document Formats
Supervisor: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
Phone: 0419 496 150
An XML based standard format for office documents containing text, spreadsheets, charts, and graphics has been proposed. The standard is to be based on OpenOffice.Org's file format. However, the OOO format is complex, contains errors, is not compatible with available web browsers and does not make full use of the latest XML standards. Investigate options for a file format which is simpler than OOO, uses existing standards and is compatible with web browsers. Implement the proposed standard as an extension to the OOO package.
Making Movies with Metadata
Supervisor: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
Phone: 0419 496 150
The Australian Creative Resources Archives (ACRA) is an ARC funded pilot project to make “waste” materials (such as video, animation, audio and text) from Australian cultural production processes available to researchers. The pilot infrastructure will consist of commercial digital video equipment, interfaced to GrangeNet. Some possible areas for research:
Multimedia design principles: Extend the OpenOffice.org package to work with XML encoded film scripts and interface it to the archive. Investigate user interfaces for displaying the scripts (text, video timeline, or a tree of objects). Implement a prototype allowing the archive's content to be manipulated by OOO as a film script and played back as high resolution video over GrangeNet.
Database and interface design: Investigate the metadata definitions built into the archive's commercial software, compare these with available standards and the project's requirements, and suggest what needs to be added. Implement an interface to the archive to allow metadata (including additional items) to be entered, searched and extracted.
Automatic Web Page Layout
Supervisor: Tom Worthington, Ian Barnes
Email: Tom.Worthington@tomw.net.au
Phone: 0419 496 150
Investigate artificial intelligence algorithms for automatically laying out web pages and produce open source prototype software. Document layout "hints" for different renderings of the document (print, web, slideshow and AV) would be explicitly encoded in the document (using XML or a similar format) or would be inferred from an existing screen layout. Documents would be rendered to suit the user's requirements and the capabilities of their display device and communications link, through features in the display device and/or in a server (for low capability display devices). As an example, multiple frames would be used on large screens and one frame with links on small screens. The software would generate documents incorporating accessibility features for the disabled as described in the W3C Web Content Accessibility Guidelines. Multiple renderings of information objects (for example multiple language versions for text, or text captions for images) would be available.
Declarative theorem-proving
Supervisor: Michael Norrish
Email: Michael.Norrish@cs.anu.edu.au
This project would suit a student keen to work in an area involving logic and proof, and the development of infrastructure to support these activities.
When specifying systems formally, a significant error-finding benefit can accrue simply from writing everything down in an expressive logic and getting it all to type-check. Working in this way allows users to quickly deal with the “low-hanging fruit”: important bugs that are easy to find.
The HOL system currently doesn’t allow for this model of work very well. Part of the reason for this is that script files for HOL inter-mingle SML (which is the theorem-prover’s implementation language) and embedded bits of logic. We would like to check the logic without having to be able to parse (and execute) SML as well.
Therefore, the first aim of this project is to design a simple file format in which users could write “SML-less HOL”, and to then implement at least two tools for manipulating this format. The first tool would be the type-checker. It would make sure that the definitions and other logical material in the file actually made sense at the level of types. The second tool would do all this and also produce a genuine HOL theory, using the various definitional facilities already present in HOL. This file format would need to include mechanisms for referencing other files, where previous logical developments had been built up.
The final stage of the project would be to implement a simple declarative language for writing proofs. The issues here are similar to those for definitions: to escape the use of files that include (SML) code, and to move to formats that are easy to parse and manipulate as data.
Implementing proof kernels
Supervisor: Michael Norrish
Email: Michael.Norrish@cs.anu.edu.au
At the core of LCF systems like HOL, there is a small “kernel” that implements the primitive rules of inference of the logic. There are a number of interesting issues relating to the design of these kernels, for example:
- How to represent free and bound variables?
- Optimising common calculations, such as calculating the set of free variables (see the sketch below)
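As a tiny illustration of the second point (a sketch only, not the HOL kernel, and omitting types), one common design is to cache the free-variable set in each term when it is built, so the "free variables" calculation costs nothing on reuse:

    # Sketch: terms that carry their free-variable set, computed once at construction.
    class Term:
        pass

    class Var(Term):
        def __init__(self, name):
            self.name, self.free = name, frozenset([name])

    class App(Term):
        def __init__(self, fn, arg):
            self.fn, self.arg = fn, arg
            self.free = fn.free | arg.free          # computed once, shared thereafter

    class Abs(Term):
        def __init__(self, var, body):
            self.var, self.body = var, body
            self.free = body.free - {var.name}      # the bound variable is removed

    x, y = Var("x"), Var("y")
    t = Abs(x, App(x, y))        # \x. x y
    print(sorted(t.free))        # ['y'] -- only y is free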
Using Model Checkers to Automatically Verify the Correctness of Complex B2B protocols
Supervisors: Dean Kuo and Sylvie Thiebaux
E-Mails: Dean.Kuo@csiro.au, Sylvie.Thiebaux@anu.edu.au
Phones: (02) 6216 7066, (02) 6125 8678
Suitability: This project would suit a student interested in Software Engineering research. It is best conducted as a 4th year BSE individual research project or as a BInfTech honours thesis. If necessary, it may be possible to approach it as a 4th year BSE team project, but this would require negotiations as to how to appropriately modify its scope given below.
Prerequisites: Knowledge and interest in distributed computing, logic and formal specification and verification.
Project Description:
Application integration technologies have been needed ever since the earliest days of commercial computing, since no single application could ever hope to meet the complete needs of an organisation. Integration can mean many things, from simple file based exchanges (e.g. FTP) to tightly coupled systems using remote procedure calls. The need for application integration has given rise to technologies such as EAI (Enterprise Application Integration) and B2Bi (Business to Business integration) and, most recently, has been the driving force behind the development and adoption of Web Services. These technologies provide solutions to the low-level issues of connectivity, data interchange and cross-application invocation. They make it possible to integrate autonomous and independent applications with some degree of ease.
The interactions between autonomous applications often require quite complex B2B protocols - that is, the interaction defines a complex stateful conversation between applications that crosses organisational boundaries. An example of such a protocol is e-procurement: after the customer sends a purchase order to the merchant, the merchant notifies the customer whether the order has been accepted or rejected; if the merchant accepts the order, it will later send a shipping notice, and after the customer has received the goods, the customer sends an acknowledgement that the goods have been received. In addition, the merchant sends an invoice to the customer, the customer is then expected to send payment before its due date, and the merchant, on receipt of payment, needs to send a receipt to the customer. The protocol becomes extremely complex when the customer or merchant cancels an order, since goods may need to be returned and payment refunded. In the real world, we do not have the luxury of a global architecture and design when integrating autonomous applications, as these have typically been designed and implemented separately. Thus, one crucial question that needs to be answered before we can reliably integrate applications is whether they are compatible. An example of two applications that are not compatible is one in which the customer is unwilling to send payment before goods have been received, while the merchant refuses to deliver the goods until payment has been received.
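As a toy illustration of this compatibility question (in Python rather than SPIN/Promela, and with the protocol reduced to two messages), the sketch below explores the joint state space of two such parties and reports the deadlock in which each side waits for the other:

    # Toy illustration only: each party is a small state machine where '!m' sends
    # message m and '?m' waits for it; the state spaces below are invented.
    customer = {"start": [("?goods", "got_goods")], "got_goods": [("!payment", "done")], "done": []}
    merchant = {"start": [("?payment", "paid")],    "paid":      [("!goods", "done")],   "done": []}

    def deadlocks(a, b):
        seen, stack, stuck = set(), [("start", "start")], []
        while stack:
            sa, sb = stack.pop()
            if (sa, sb) in seen:
                continue
            seen.add((sa, sb))
            moves = []
            for (act_a, na) in a[sa]:
                for (act_b, nb) in b[sb]:
                    # a joint step happens when one side sends exactly what the other awaits
                    matched = (act_a == "!" + act_b[1:] and act_b.startswith("?")) or \
                              (act_b == "!" + act_a[1:] and act_a.startswith("?"))
                    if matched:
                        moves.append((na, nb))
            if not moves and (a[sa] or b[sb]):   # nobody can move, yet someone still expects something
                stuck.append((sa, sb))
            stack.extend(moves)
        return stuck

    print(deadlocks(customer, merchant))   # [('start', 'start')]: each side waits for the other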
Model-checking is an area of computer science which aims at developing tools for verifying that a system (or a set of communicating systems) satisfies a specification usually given in temporal logic, by showing that this (set of) system(s) is a logical model for the temporal logic formulae. Model-checking techniques have proven extremely successful in verifying hardware, and their applicability to other areas, including distributed protocols, has been the subject of much investigation for the past 10 years.
This research project is to investigate the following:
- Can we use the model checker SPIN to automate the process of verifying that two or more applications are compatible to perform a given business process? The B2B protocol that we will verify is a protocol, derived from one of our consultancies, for e-procurement. For simple cases, we are confident that it is possible. However, it is not clear whether more complex cases can be handled by SPIN. If not, we want to identify why not and what sort of extensions are required to be able to handle these cases.
- Identify the strengths and weaknesses of using SPIN to verify that applications are compatible.
References:
- [1] B. Bérard, M. Bidoit, A. Finkel, F. Laroussinie, A. Petit, L. Petrucci, and Ph. Schnoebelen. Systems and Software Verification. Model-Checking Techniques and Tools. Springer, 2001.
- [2] SPIN website: http://spinroot.com/spin/whatispin.html
