Parallel Tiled Algorithm Development

Description

The tiled algorithm approach to parallel programme has gained popularity  over recent years, often in the context of task-DAG (directed acyclic graph) programming models such as PLASMA and Intel CnC.  Based on these ideas, a programming model was introduced in the COMP4300/8300 Parallel Systems course that requires only minimal extra infrastructure and is far simpler and less error-prone than traditional approaches. That year, right-looking Cholesky factorization was used as an example, and the approach was proved on the example of right-looking Cholesky factorization using the MPI (message passing), OpenMP and CUDA programming models.

As well as being simpler, the approach has the advantage of being data distribution independent, and lends itself to 'lazy' (simpler) vs 'eager' (faster) strategies, which enabled the students.

This project will look at applying the approach to one or more other promising algorithms, developing them to a similar extent so these could also be used for future teaching assignments. These include block-sparse Cholesky, All-pairs Shortest Paths, stencils, N-body and Smith-Waterman. As well as the development of algorithms, the performance tradeoffs of various strategies and data distributions will be examined. Supporting  infrastructure for the generation of meaningful data,  convenient and ideally efficient result checking and result visualization will also be required. This requires developing some domain specific knowledge of the algorithm (particularly challenging for Smith-Waterman)

 

Requirements

Exoperinet in parallel programming (MPI, OpenMP and CUDA)

Background Literature

G.~Bosilca et~al., ``Flexible development of dense linear algebra algorithms on massively parallel architectures with dplasma,'' in  2011  IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum}.IEEE, 2011, pp. 1432--1441.

F.~Schlimbach, J.~C. Brodman, and K.~Knobe, ``Concurrent collections on distributed memory theory put into practice,'' in 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. IEEE, 2013, pp. 225--232

https://cs.anu.edu.au/courses/comp4300/2020/assign...

 

Gain

This is potentially publishable and could have impact on ANU future parallel computing curriculum.

Updated:  1 June 2019/Responsible Officer:  Head of School/Page Contact:  CECS Webmaster