PhD project proposal:
Efficient Application Performance Modelling and Prediction for Cluster Computers

supervisors: Dr Peter Strazdins, Dr Paul Coddington (University of Adelaide)

A Beowulf-style cluster computer is a parallel computer that uses a Commercial-Off-The-Shelf (COTS) switch-based network for communication between its processors. The ANU Beowulf cluster Bunyip is one such cluster, based on Fast (100 Mb/s) Ethernet switches. Clusters have proved to be a highly cost-effective model for high-performance computing, and have largely displaced traditional massively parallel computers built with expensive vendor-supplied networks.

With the dominance of cluster computers, a large range of commodity networks has become available. As well as various forms of Ethernet (notably Gigabit Ethernet), a cluster can be assembled using interconnects such as Myrinet, InfiniBand or Quadrics; the ANU Jabberwocky multicluster is one example. Furthermore, bandwidth can be increased by using multiple network connections per node. There is thus a huge range of possible cluster configurations, and determining the configuration that best meets a computational group's needs is a significant problem.

While it is infeasible to have a variety of medium to large clusters available in advance to evaluate performance against a particular group's needs, it is feasible to make a variety of cluster components (network and node types) available and to build a series of small-scale clusters for this purpose. The issue is then how to reliably extrapolate benchmark results on these small systems to a cluster of significantly larger size.
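As a minimal illustration of what such extrapolation involves, the sketch below fits a simple model t(p) = a + b*log2(p) to hypothetical timings of a collective operation measured on 2 to 16 processes, then extrapolates it to 128 processes. The model form, the function names (fit_log_model, predict) and the timings are all assumptions chosen for illustration; this is not the MPIBench/PEVPM method.

```python
import math


def fit_log_model(procs, times):
    """Ordinary least-squares fit of t(p) = a + b*log2(p).

    Illustrative model only (an assumption, not MPIBench/PEVPM).
    Returns the intercept a and slope b.
    """
    xs = [math.log2(p) for p in procs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_t = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    b = sxt / sxx
    a = mean_t - b * mean_x
    return a, b


def predict(a, b, p):
    """Extrapolate the fitted model to p processes."""
    return a + b * math.log2(p)


if __name__ == "__main__":
    # Hypothetical broadcast times (microseconds) on small test clusters.
    procs = [2, 4, 8, 16]
    times = [60.0, 110.0, 160.0, 210.0]
    a, b = fit_log_model(procs, times)
    print(f"t(p) ~ {a:.1f} + {b:.1f} log2(p)")
    print(f"predicted t(128) ~ {predict(a, b, 128):.1f} us")
```

A naive fit like this ignores effects such as network contention that may only appear at larger scale, which is precisely why reliable extrapolation remains a research problem.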

This project aims to develop tools and methodologies to accurately predict the performance of MPI applications running on medium to large scale clusters. It will do this by enhancing and extending the MPIBench/PEVPM methodology, making it easier to use by automating steps that are currently performed manually, and by developing tools to efficiently derive models of applications executing on the clusters.

This project will be undertaken in partnership with the University of Adelaide and the local company Alexander Technology, which has provided a range of clusters for experimentation (other DCS clusters, such as Bunyip and Hydra, will also be available).

Last Modified: Peter Strazdins, Fri Jun 8 10:18:42 EST 2007