Faculty of Engineering & Information Technology ANU Crest
The Australian National University
Department of Computer Science, FEIT  
 

Previous honours topics

These projects topics were offered in 2000. While these - or similar - topics may not offered this year, they do serve as a guide to the diverse range of topics that will be available.

If you wish you could work on a topic like one of these, then why not write to the person who offered it. They might be persuaded to let you pursue a similar project under their supervision this year.

2000 Proposed Project Topics


Keyword-Based Approaches To Text Comparison

Supervisor: Peter Strazdins
E-mail: Peter.Strazdins@anu.edu.au
Phone: (02) 6279 5041

Text documents, which include computer programs, reports and essays, can often be characterised by the types and distribution of the (key) words that they contain. In the case of computer programs, these words can include identifiers, which have no predefined meaning. Key word characteristics can thus be used to define some metric of similarity between documents. Practical uses of this include determining whether the two documents have the same main subject, and/or whether they have a common origin.

Most existing means of computing the similarity between such documents rely purely in the spelling of the words, without utilising any predefined meaning that a word may have. Another method (so far only partially developed) which is also reasonably successful uses the frequency counts of key words, which have some predefined meaning. In this case, the concepts of `synonyms' of the key words may be useful in certain contexts.

However, much more information can be extracted from the words in a document. One is applying the frequency counts idea to words without predefined meaning, ie program identifiers, which introduces some new challenges. Others include the occurrences of small groups of words (with some reasonable definition of locality), and the distribution of particular words throughout the document.

This project seeks to investigate simple but useful methods to analyse and compare documents based on their composition of (key) words, and to prove their usefulness over collections of documents from a variety of different areas. Its relation to text retrieval methods, which are also key word based, and have concepts of locality, frequency and synonyms, should also be investigated. As such it has a strong research component, and the diversity of its application means that there will be a strong interest in the academic community in any significant results.


Selection of Materialized Views in the Dynamic Data Warehousing Environments

Supervisor: Weifa Liang
E-mail: Weifa.Liang@anu.edu.au
Phone: (02) 6249 3019

The major aims of this project are the design, implementation and evaluation of algorithms for view selection for materialisation, view replacement, and view refreshment in data warehouses. Thus our objectives are:

  • Design of new algorithms for view selection and replacements in the dynamic data warehouse environment, and evaluation of new strategies for such applications through implementation and performance evaluation.
  • Examination of the existing view refreshment policies, development of new view refreshment algorithms that shorten view refreshment time for the data warehouse.

The query response time and the view refreshment time are very crucial factors in determining data warehouse applicability. These factors in turn depend heavily on which views are selected for materialisation, which views will be replaced, and how fast the materialised views will be refreshed, given limited storage space and limited refreshment time window. Therefore, any improvement in handling view selection, refreshment and replacement will lead to a substantial improvement of the entire performance of the data warehouses.


Mining Generalized Associated Rules in Distributed Computing Environments

Supervisor: Weifa Liang
E-mail: Weifa.Liang@anu.edu.au
Phone: (02) 6249 3019
A number of efficient sequential algorithms have been suggested for mining the generalized association rules. In this project, we will investigate different strategies to mine generalized association rules in parallel and distributed computing environments. Particularly we will focus on designing and implementing efficient mining algorithms where the dimension tables and the fact table are decoupled and distributed in different remote sites.

Agent Negotiation

Supervisor: Roger Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472

Investigate the practicability of implementing agents, and negotiations between agents, in particular by measuring the increase in the complexity of code as the complexity of the interactions between the two agents increases.

The suggested implementation context is the W3C's Platform for Privacy Preferences (P3P) specification. This defines how client-side software can store people's privacy preferences, and server-side software can store privacy policy statements by corporations and government agencies. The two agents can then negotiate with one another in order to permit a transaction to be entered into (such as the provision of shoe-size and credit-card details), to prevent the transaction, or to refer a mismatch to the consumer for a decision.

It is envisaged that a succession of prototypes of increasing completeness would be implemented, and key process and product factors would be measured. This would depend on a thorough appreciation of theories relating to agents, P3P, software development and maintenance, software complexity, and development and maintenance productivity.

Some background reading is at:


Automated Location of Internet Nodes

Supervisor: Roger Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472

Investigate the extent to which it is feasible to devise and implement a tool that provides the location in physical space of Internet nodes.

Netizens assert that cyberspace is different from the real world. Important among the tenets of netizenship is the belief that the machines that people use cannot be readily found by other parties, even by multiple law enforcement agencies working in concert, especially if quite basic countermeasures are adopted. The increasing incidence of wireless communications channels to mobile nodes adds to the difficulties involved.

A number of sources of relevant data are publicly available (such as IP-address registers and the DNS), and others are available to law enforcement agencies (such as registers maintained by telecommunications companies and Internet services providers). The question is whether such sources of information can be combined into a tool that provides real-time (or even retrospective) computation of the node's geophysical coordinates at a point in time.

It is envisaged that this would involve an analysis of a range of relevant literature (including IETF and probably other engineering standards), study of relevant tools, establishment of a model of the relevant characteristics of nodes, manual investigation of the locatability of nodes in a sub-set of the circumstances defined by the model, and prototyping and testing of tools and tool enhancements. Because the task is likely to be partly dependent on data schemas that are not in the public domain, it is probably necessary to conduct interviews with telecommunications companies and ISPs.

Some background reading is at:


Secure Storage of Private Keys

Supervisor: Roger Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472

Investigate the risks to private signature keys, and the extent to which security regimes and products are able to satisfy the need for protection against access to the key by any other person.

Digital signature technology is entirely dependent on the protection of the private signature key. There are difficulties in designing a secure environment for that key. This is the case in corporate settings, and especially so in the case of personal work-and-playstations. Throughout its life-cycle, the key must only ever be accessible to the owner, commencing with the generation of the key-pair, and through the phases of usage, storage, recovery from storage, backup, and recovery from backup. Artefacts designed to provide protections of this kind are sometimes referred to as 'secure wallets', and the most obvious implementation platform is a smartcard.

It is envisaged that a theoretical analysis of requirements, risks, security designs, and countermeasures would be undertaken; and that existing proposals for, and implementations of, 'secure wallets' would be evaluated. It may be appropriate to perform prototyping of some aspects of such products, and/or to devise and implement attacks on secure wallet products or prototypes. Because it is possible that some relevant information may not be in the public domain, it may be necessary to conduct interviews with smartcard technology providers and consultants.

Some background reading is at:


Centralised Storage of Biometrics

Supervisor: Roger Clarke
E-mail: Roger.Clarke@anu.edu.au
Phone: (02) 6288 1472

Investigate the extent to which the centralised storage of biometric measures of humans creates the risk of masquerade.

Biometrics is a generic term encompassing a wide range of measures of human physiography and behaviour. Measures of relatively stable aspects of the body include fingerprints, thumb geometry, aspects of the iris and ear-lobes, and DNA. Dynamic measures of behaviour include the process (as distinct from the product) of creating a hand-written signature, and the process of keying a password.

Schemes can be devised that apply biometrics in such a manner that the measure is only ever known to a chip held by the individual, and the device currently measuring the person concerned. (This is analogous to the mechanism used for protecting secure PINs input on ATM and EFT/POS keyboards).

It is very common, however, for proposals for biometric schemes to involve central storage of the biometrics, as police fingerprint records do now, and as current proposals by the Australian Government would do in relation to DNA records. This raises the question as to whether a person who gains access to the store could masquerade as that individual. Possible uses would be to gain access to buildings, software or data, to digitally sign messages and transactions, to capture the person's identity, to harm the person's reputation, or to `frame' the person.

It is envisaged that this would involve an analysis of a range of literature relating to identification and biometrics, and investigation of the extent to which abstract representations of biometric data can be used to produce artefacts that will satisfy biometric measurement devices. Because the task is likely to be partly dependent on information that is not in the public domain, it is probably necessary to conduct interviews with biometrics technology providers, and perhaps also physicists, biologists and engineers.

Some background reading is at:


Vertical Portal Web Crawler

Supervisor: Nick Craswell, Peter Bailey, David Hawking
E-mail: Nick.Craswell@cs.anu.edu.au
Phone: (02) 6249 4001

Write a prototype next-generation Web crawler, capable of both Web browsing and searching. Test its ability to automatically collate subject-specific Web resources for use in a "vertical portal".

Vertical portals [1] are a hot area in the Web industry, acting as a gathering place for a particular industry or interest group. They can include a complete and very up-to-date search facility for pages in the portal's area, as opposed to a general search engine which covers 16% of every area [2]. A hard task, however, is to efficiently find pages which a portal should search. Current portals rely on manually identified sites in the portal's subject area and crawling their pages using a dumb Web crawler [3]. Automating this process would find more pages with less human effort.

The project involves building a better crawler, that given the portal's area e.g. Linux, Chinese language or SCUBA industry, would automatically find matching pages. It would find pages using: 1) targeted browsing (crawling), building on the work of a 1999 honours student, and 2) automated searches of existing search engines. The crawler would accept or reject pages based on a string match (e.g. "Linux" or "SCUBA") or a language test (e.g. software for detecting Chinese). Evaluation would involve measuring how many pages the crawler can find with only browsing, only searching and then browsing plus searching. The set of pages found could also be compared with the coverage of an existing service such as [4]. This project suits students who want a deep understanding of HTTP, Web crawling and Web search for use in their later career.


Visual "Information Foraging"

Supervisor: Nick Craswell, Raj Nagappan
E-mail: Nick.Craswell@cs.anu.edu.au
Phone: (02) 6249 4001

Part of human nature is searching and foraging behaviour, even if today it is most often used at the supermarket or looking for car keys. Current text search applications, such as search engines, are not much like real-world searches. They require a user to enter a text query and view a textual list of results. The only indication of the "lay of the land" presented by current Web search engines is the 10 results and a number indicating e.g. 750,000 other "hits".

The structural nature of the Web and of relations between documents in a given search forms a network or graph. This dynamic network could be explored visually, giving the user many paths that they could choose to follow as they "forage" for information. The project involves development and assessment of a system that displays a visual map of document properties and relations. The project focuses on understanding existing and developing new ways of exploring and navigating through a complex information space. This project suits a student who wants to learn about the Web and interactive graphics.

Initial information:


Site and Search Engine Summarisation

Supervisor: Nick Craswell, Peter Bailey, David Hawking
E-mail: Nick.Craswell@cs.anu.edu.au
Phone: (02) 6249 4001

Write and evaluate a site summariser. Time permitting, apply the same principles to subject-specific search services.

Yahoo! [1] is an example of a successful company which performs manual site summarisation: all their listed sites have hand-written summaries. InvisibleWeb [2] do the same job as Yahoo!, but with subject-specific search services. Either type of company might find automatic summarisation techniques useful.

There are many document summarisation algorithms, such as those referenced in [3,4]. This project first involves implementing one of them and trying it out on pages from a site or search service. The ideal would be to modify the algorithm to build a single summary based on several pages from a single source, then evaluate whether having more pages improves the summary. Evaluation would involve listing several summaries and asking a human judge (a volunteer) which is best. This project suits a student who is interesting in conducting really original research with commercial applications, performing an experiment and analysing the results.


Searching the Web: Less is More

Supervisor: Nick Craswell, Peter Bailey, David Hawking
E-mail: Nick.Craswell@cs.anu.edu.au
Phone: (02) 6249 4001

Analyse the Web search problem to see whether searching fewer documents can give better results.

IR since the 60's has asked "How to search?" a single set of documents. Today, any Web search engine could attempt to index a billion documents, and the question has become "What to search?" or "How much do I need to search?". Cutting down on what you index saves time, money, bandwidth and other resources. The question is whether doing so helps or hinders people in finding useful information.

The project is to evaluate three approaches used on the Web when deciding what to index. One is to build the biggest index possible. If you cover every document you ensure that every document is findable. Another is to only index "good" documents, based on link information. This is what large search engines do, and some (AltaVista, Excite and Google) give hints of their criteria for "goodness". The third approach is to only index documents that are within the right subject area. This is the vertical portal approach listed in one of the other honours topics. We now have ways of evaluating these different approaches, using Web-data test collections and over up to 100 gigabytes of Web pages (the size of a small Web search engine). This project would suit a student who wants a deep understanding of Web search, for use either in the Web search industry or Web research.

Information on limited search coverage:


Concurrent Application Spaces in Virtual Environments

Supervisor: Raj Nagappan
E-mail: Raj.Nagappan@cs.anu.edu.au
Phone: (02) 6279 8179

Users of desktop computers possess the ability to interact with multiple applications concurrently through the advent of the WIMP (Windows, Icons, Menus, Pointer) interface. Currently there is no such similar system for Virtual Environment (VE) applications. In 1999 Justin Waddell carried out an exploration of this concept, resulting in the development of a small prototype system that used rectangular prisms to contain varied applications in a single VE workspace.

This project continues work on the concept of a "window manager" for VE applications. There are a number of avenues that may be explored, possibilities include development of a different space metaphor (with comparison to the prism metaphor), or extensions to the prism-based system and detailed user analysis thereof. This project would suit a student who is interested in user interface design, task analysis and VE applications development.


Efficient Task Scheduling on Cluster Computers

Supervisor: Peter Strazdins
E-mail: Peter.Strazdins@anu.edu.au
Phone: (02) 6249 5041

Details are here.


Extending the SimICS SPARC Simulator

Supervisor: Peter Strazdins
E-mail: Peter.Strazdins@anu.edu.au
Phone: (02) 6249 5041

Details are here.


Emulating a Symmetric multiprocessor on the Fujitsu AP+

Supervisor: Peter Strazdins
E-mail: Peter.Strazdins@anu.edu.au
Phone: (02) 6249 5041

Details are here.


Trace and performance visualisation tools for Beowulf parallel computers

Supervisor: Chris Johnson
E-mail: Chris.Johnson@anu.edu.au
Phone: (02) 6249 4509

The Beowulf is a class of parallel computers which are built from commodityhardware rather than highly specialised, expensive hardware. The Beowulf system being constructed at ANU Computer Sciences Laboratory (RSISE) and Dept of Computer Science (FEIT) will have at least 128, possibly 192, processors running variants of the Linux operating system, interconnected by 100 Mbps Ethernet networks. Application programs include machine learning.

Effective use of parallel computers requires debugging and performance analysis tools. The standard approach is off-line visualisation of large event trace files; numbers of tools exist on many different systems. Until recently most parallel computers had proprietory operating systems and there are few standardised tools.

An investigation of existing tools is needed, with the practical part of the project involving implementation or porting of one or more tools onto the Beowulf. Issues include:

  • design of trace collection and management utilities (large amounts of data are involved)
  • design and implementation of intelligent analysis assistant software such as recognition of patterns of events, discovering and displaying anomalies and deviations from the intended communication patterns, identification of bottlenecks and hotspots (some of this possibly relates to data mining, but simpler approaches are more common)
  • design and implementation of visualisation and navigation tools
  • elimination or minimisation of probe effects (unintended modification of program behaviour caused by tracing the behaviour)
  • analysing event streams from processors with unsynchronised clocks

Background required

  • an interest in programming utilities and programming development environments
  • COMP3037 Operating Systems Implementation, or similar
  • an interest in concurrency, parallel computing, message passing programs.

Further reading

P.B. Thistlewaite and C.W. Johnson
Towards Debugging and Analysis Tools for Kilo-Processor Computers, Fujitsu Scientific and Technical Journal, vol 29 no. 1 (Spring 1993), pp. 32-40 (available in hard copy from Chris Johnson)
Debugging parallel programs using incomplete information
current, related work on debugging program topologies at University of Western Australia
Debugging Concurrent Programs
(an earlier survey article): C. E. McDowell and D. P. Helmbold, ACM Computing Surveys, vol. 21, no. 4, December 1989, pp. 593-622.
UPSHOT
the UPSHOT program visualisation system from Argonne (the latest version is Tcl/tk based).
Beowulf systems
NASA Beowulf project home page from NASA Jet Propulsion Lab (the origin of Beowulf systems)

Optimizing proofs for Proof Carrying Code

Supervisor: Malcolm Newey
E-mail: Malcolm.Newey@anu.edu.au
Phone: (02) 6249 4506

When a user gets a program from across the web (an applet, for example) there is need for some guarrantee that the code does what it says it does and does not do nasty things. The Java sandbox is a way of addressing the second of these concerns. The ultimate way to give confidence to a user that appropriate correctness and safety properties hold is to provide a machine checkable proof of those properties. This is the basis for the notion of PCC (proof carrying code).

For any particular property of some given code there will be many possible versions of many different proofs; some more suitable than others for transmission and checking, For example, a base level proof in a primitive logic can be shown to be valid by a simple checker in which the user can have great confidence; on the other hand, such a proof is likely to be so bulky that its transmission will be too expensive and its parsing will be time-consuming. On the other hand, some properties need just a one-line proof which is simply the name of a powerful decision procedure; in this case, transmission is not bottleneck but decision procedures are often exponential in execution time and the user may find the delay in executing an applet unbearable.

This project is about exploring how one can select an optimal version of a family of proofs - an appropriate compromise between bulk of proof to transmit and speed of checking at the receiving end. For the sake of the study, the properties will be expressed in higher order logic using propositional logic, integer arithmetic and finite set theory. The suggested mechanical prover/checker is the HOL system (which is used in the honours course on mechanical verification).


Genetic algorithms

Supervisor: Ramesh Sankaranarayana and Georg Weiller
E-mail: Ramesh@cs.anu.edu.au
Phone: (02) 6249 4281

Involves developing aspects of an algorithm to simulate genetic recombination. Need to test the algorithm on genetic data and compare results with those generated by existing algorithms. Background in algorithms needed.


Optimization of Java programs by bytecode transformation

Supervisor: Alonso Marquez
E-mail: Alonso.Marquez@cs.anu.edu.au
Phone: (02) 6279 8193

Java's high level bytecode makes very easy to implement interesting program transformations like specialization, partial evaluation, inlining, cloning, etc. This bytecode modification allows to implement many sort of optimization and semantic transformation in a transparent way. Our group have been applying this technique as a way to add orthogonal persistence and object versioning to Java programs. We also have developed a first prototype of Object Inlining in Java. In this project we would like to explore several bytecode transformations for optimization of Java code. In particular, we want to complete the Object inlining prototype and evaluate different folding strategies.


Garbage Collection in RDBMS

Supervisor: Ramesh Sankaranarayana
Alonso Marquez
E-mail: Ramesh@cs.anu.edu.au
Phone: (02) 6249 4281

Making Java programs orthogonally persistence requires that persistent objects be stored in a long term storage medium like a database. One of the different types of databases that can be used for this purpose is a relational database. In order to use the database efficiently, persistent objects that are no longer required need to be removed from the database. This project looks at developing and implementing a garbage collector that does the above.


Exploring Multivariate Data Fields with a Haptic Device

Supervisor: Matthew Hutchins
E-mail: Matthew.Hutchins@cmis.csiro.au
Phone: (02) 6216 7088

This project would look at ways of using the PHANToM haptic device to explore or "visualise" scientific data sets. The PHANToM is an excellent 3D input device for probing a 3D space. Further, force feedback can be used to constrain the exploration to surfaces or areas of interest. The probe can be connected to other forms of feedback, such as visual or auditory, to provide detailed information about values at a point. Further, the haptic properties of the device could be used to convey data values through force, vibration, etc. The project could tackle one, some, or all of these challenges, and more:

  • integrating haptics with existing visualisation or scenegraph systems
  • using haptic constraints to facilitate probing
  • integrating probing with visual and/or audio feedback
  • design of techniques for conveying data through forces
  • investigation of whether the haptics improves appreciation of the data set by the user
  • determining what sort of data sets are amenable to this sort of display, and finding some good examples

Constrained Navigation of 3D Scenes using a Haptic Device

Supervisor: Matthew Hutchins
E-mail: Matthew.Hutchins@cmis.csiro.au
Phone: (02) 6216 7088

This project would extend work done by Hanson and Wernert (Vis '97, Vis '99) on constrained navigation of 3D scenes. Navigation can be seen as selecting a sequence of viewing parameters from a fairly high-dimensional space. The task can be made easier by defining a lower-dimensional constraint surface and navigating that instead. This could be an ideal situation to use haptically constrained input techniques. The approach also has natural extensions to collaborative exploration of a VE by multiple participants. The project could tackle one, some or all of these challenges, and more:

  • combining haptic input with 3D display and navigation software
  • design of interaction techniques to empower navigation
  • implementation of constraint surfaces
  • extensions to collaborative exploration
  • extensions to general exploration of multi-dimensional spaces


[DCS Home] [News & Events] [What is CS & IT] [Education] [People] [Mailing Lists] [Research] [StReaMS] [Contact Details] [FEIT Home] [ANU Home]


Authorised by the Dean, FEIT
© The Australian National University
ANU CRICOS Provider Code - 00120C
Last modified Dec 14, 2001
General enquiries: 6125 4043
For more information or to make comments please email webmaster@cs.anu.edu.au