MSc/PhD project proposal:
Performance Instrumentation Infrastructure for Cluster Computers

supervisor: Dr Peter Strazdins

A Beowulf-style cluster computer is a parallel computer that uses a commercial-off-the-shelf (COTS), switch-based network for communication between its processors. The ANU Beowulf cluster Bunyip is such a cluster, based on Fast (100 Mb/s) Ethernet switches. Clusters have proved a highly cost-effective model for high performance computing, and have largely displaced the traditional massively parallel computers built with expensive vendor-supplied networks. In some applications they have even competed with the traditional symmetric multiprocessor (SMP) server, and threaded programming models are also supported on them through distributed shared memory (DSM) middleware.

SMPs have a highly mature performance instrumentation infrastructure, based on performance event counting libraries, and as a result performance evaluation methodologies have been largely successful in improving both application performance and architectural design on SMPs. However, no infrastructure of comparable maturity exists for clusters. This gap is all the more serious because clusters are much more reconfigurable than SMPs, which makes the evaluation of cluster design an even more critical issue.

This project will develop and evaluate a performance instrumentation infrastructure for cluster computers, aiming for a degree of functionality, usability and portability similar to that available on modern SMPs. This will permit an application running on a cluster to use library calls to obtain information such as the CPU cycles spent on sending and receiving data, both directly (on message processing) and indirectly (waiting for messages). To do this, data must be gathered from the CPU, the PCI bus (or HyperTransport, in the case of Opterons) and the network interface card chipsets, and some information must also be recorded in, and gathered from, the Linux kernel. Issues such as determining the effect of network contention, and identifying which events were on the 'critical path', add interesting challenges to this project.
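To make the idea of attributing CPU time to communication more concrete, the following is a minimal sketch of the kind of per-process interface such a library might offer. Everything in it is an assumption for illustration: the names perf_init, perf_enter and perf_get and the three categories are hypothetical, and wall-clock nanoseconds from CLOCK_MONOTONIC stand in for the hardware cycle counts the project actually targets. An application (or a wrapper around its message-passing calls) would call perf_enter(PERF_MSG_PROCESSING) before packing and sending a message, perf_enter(PERF_MSG_WAIT) while blocked in a receive, and perf_enter(PERF_COMPUTE) on return.

```c
/* Hypothetical sketch: charge elapsed time on one process to compute,
 * message processing (direct cost) or message waiting (indirect cost).
 * CLOCK_MONOTONIC nanoseconds stand in for CPU cycle counts here. */
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

enum perf_category {
    PERF_COMPUTE,        /* application work */
    PERF_MSG_PROCESSING, /* direct cost: packing, copying, protocol handling */
    PERF_MSG_WAIT,       /* indirect cost: blocked or polling for a message */
    PERF_NCATEGORIES
};

static long long perf_ns[PERF_NCATEGORIES]; /* accumulated time per category */
static enum perf_category perf_current;      /* category currently being charged */
static long long perf_last;                  /* timestamp of the last switch */

static long long perf_now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Reset all counters and start charging time to PERF_COMPUTE. */
void perf_init(void) {
    for (int i = 0; i < PERF_NCATEGORIES; i++)
        perf_ns[i] = 0;
    perf_current = PERF_COMPUTE;
    perf_last = perf_now_ns();
}

/* Charge the time since the last switch to the current category,
 * then charge subsequent time to category c. */
void perf_enter(enum perf_category c) {
    assert((int)c >= 0 && (int)c < PERF_NCATEGORIES);
    long long t = perf_now_ns();
    perf_ns[perf_current] += t - perf_last;
    perf_last = t;
    perf_current = c;
}

/* Time charged so far to category c; the interval still running is
 * only included after the next perf_enter call. */
long long perf_get(enum perf_category c) {
    assert((int)c >= 0 && (int)c < PERF_NCATEGORIES);
    return perf_ns[c];
}
```

This sketch is single-threaded and sees only what the application itself can observe. The NIC, bus and kernel data described above, and any notion of the critical path across processes, would have to come from other sources.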

This project will be undertaken in partnership with the local company Alexander Technology, which is providing (or will provide) a range of clusters for experimentation; other DCS clusters, such as Bunyip, will also be available. There may be opportunities for financial support and internships for suitably qualified applicants.

Last Modified: Peter Strazdins, 07 Dec 2005