Student research opportunities
Fault-tolerant Algorithms for Petascale (and beyond) Supercomputers
Project Code: CECS_645
This project is available at the following levels:
CS single semester, Honours, Masters, PhD
Keywords:
parallel computing, multicore-computing, fault-tolerant algorithms, petascale computing, partial differential equations
Supervisors:
Emeritus Professor Richard Brent
Dr Peter Strazdins
Professor Alistair Rendell
Outline:
The ANU (Mathematical Sciences Institute and Research School of Computer Science) has entered into an exciting collaborative research project with Fujitsu Labs Europe. It is part of the worldwide Open Petascale Libraries Project. Our focus will be on the robust and accurate solution of partial differential equations, with applications to tsunami modelling (ANUGA) and plasma physics (GENE).
Using new mathematical techniques, the resulting algorithms will be naturally fault-tolerant and will require minimal synchronization. These properties will be essential for petascale simulations on hundreds of thousands of processor cores (the sheer size of these systems means that the average time between processor failures will be of the order of tens of minutes!).
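As a rough illustration of the failure-rate figure above: if one assumes (this is not stated in the project description) that components fail independently at a constant rate, the mean time between failures of the whole system is approximately the per-component value divided by the number of components. The five-year per-component figure and the function name below are hypothetical, chosen only to show the scale.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def system_mtbf_minutes(component_mtbf_years: float, n_components: int) -> float:
    """Approximate system-wide mean time between failures, in minutes.

    Assumes independent components with constant (exponential) failure
    rates, so the failure rates simply add up across components.
    """
    return component_mtbf_years * MINUTES_PER_YEAR / n_components


# Hypothetical inputs: 100,000 components, each failing once in 5 years on average.
print(f"{system_mtbf_minutes(5, 100_000):.1f} minutes")  # prints "26.3 minutes"
```

Under these assumed numbers the system as a whole would see a failure roughly every half hour, which is consistent with the "tens of minutes" estimate.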
The PhD project in Computer Science will help investigate and develop the computational infrastructure to support these new algorithms. There is also a PhD project in Mathematics, which will develop suitable mathematical techniques and investigate their robustness and scalability. Scholarships are available for both projects.
Goals of this project
The CS project will formulate and evaluate policies and techniques for the detection of, and recovery from, faults in petascale computations. Techniques for optimizing communication and computational performance will also be investigated, targeting state-of-the-art highly multicore processors and communication networks (in particular, the Japanese K supercomputer, which became operational in 2011).
For the Maths project, the research will cover fundamental mathematical questions, including a study of the models and of the existence, uniqueness and stability of solutions. The main component of the PhD work will be the investigation of various algorithms for the approximation and solution of these equations. Questions recently studied in the literature for both application areas include, in particular, the comparison of various integrators and different discretisation methods. The project will also study and develop the scalability of the approach in these codes to very large numbers of multicore processors, with respect both to the time and energy used for the computations and to the reliability of the results.
Sub-projects can be defined for Honours and project students.
Requirements/Prerequisites
(PhD scholars) First-class or upper second-class Honours degree in computer science or mathematics, or equivalent. Experience in high performance computing or computational science would be an advantage.
Student Gain
This project is a unique opportunity to work on computational challenges on state-of-the-art systems. It is also part of an international project.
Local members of the project team include Markus Hegland (MSI), Peter Strazdins (CECS), Richard Brent (MSI/CECS), Alistair Rendell (CECS), Steve Roberts (MSI) and a post-doctoral fellow (MSI - to be appointed).
An APA scholarship will be available to successful PhD applicants, with top-ups of up to $5000 for highly qualified applicants. For domestic students who are independently awarded an APA or other scholarship by the ANU, the top-ups may be extended to up to $15000.
There are also travel and internship opportunities (Daresbury, UK). These are available from October 2011.
Scholarships/travel opportunities may be available for suitably qualified Honours students.
Background Literature
Jack Dongarra and Pete Beckman et al. International exascale software project – roadmap 1.0. CS Technical Report ut-cs-10-654, University of Tennessee, 2010.
T. Maruyama, T. Yoshida, R. Kan, I. Yamazaki, S. Yamamura, N. Takahashi, M. Hondou, and H. Okano. Sparc64 VIIIfx: a New-Generation octocore processor for petascale computing. Micro, IEEE, 30(2):30–40, 2010.
Y. Ajima, S. Sumimoto, and T. Shimizu. Tofu: A 6D Mesh/Torus interconnect for exascale computers. Computer, pages 36–40, 2009.
Links
Open Petascale Libraries project
The K Supercomputer
ANUGA



