Student research opportunities
Fault-tolerant Data Services for High Performance Computing on Clouds
Project Code: CECS_631
This project is available at the following levels:
Honours, Masters, PhD
Keywords:
Service-oriented architecture, high performance computing, cloud computing, data fabrics
Supervisor:
Dr Peter Strazdins
Outline:
The service-oriented architecture (SOA) approach has been successfully applied to large-scale, 'embarrassingly parallel' grid applications, such as those in computational finance. Exploiting the embarrassingly parallel nature of these applications yields a friendly parallel programming paradigm which is naturally fault-tolerant and efficient under (dynamically) heterogeneous processing conditions. The HPNumSOA project at ANU has extended this approach to general parallel applications through the use of a 'data service' (see the paper in the links below). However, efficiently supported fault-tolerance in this data service, a desired goal, has not yet been addressed in that work.
Goals of this project
This project will investigate approaches to introducing fault-tolerance into such an extended SOA framework, under the constraints of memory scalability and minimal impact on performance. Issues particular to implementing such a framework on a public cloud, such as data transfer time and variability in latency, will also be investigated.
Requirements/Prerequisites
An Honours degree in Computer Science or equivalent.
Student Gain
This project is an extension of an existing project with Platform (TM) Computing, with opportunities to continue the collaboration. An internship with Platform Computing (Beijing or Toronto) may be offered. There will also be access to a moderate-sized scientific cloud infrastructure through the adjoining NCI National Facility.
Background Literature
See the links below.
Links
HPNumSOA project
An SOA Approach to HPC (HiPC'10 paper)



