A Beowulf-style cluster computer is a parallel computer that uses a commercial off-the-shelf (COTS) switch-based network for communication between its processors. Clusters have proved to be a highly cost-effective model for high performance computing, and have largely displaced traditional massively parallel computers built with expensive vendor-supplied networks. Cluster networks such as Myrinet, InfiniBand and Quadrics support remote direct memory access (RDMA) in addition to the traditional message passing communication model. RDMA permits a process running on one node of the cluster to access the memory of a remote node directly, without involving the remote CPU.
However, improving the usability of cluster systems remains a major challenge, both to cover a wider range of applications and to make them easier for non-experts to use. This is largely because clusters are still mostly programmed using the distributed memory message passing paradigm, typically through the Message Passing Interface (MPI). Message passing, often dubbed the ``assembly language of parallel computing'', typically requires a major rewrite of the application code, which can be costly in both time and expertise.
The OpenMP shared memory programming model has enjoyed increasing popularity, as it permits high-level specification of parallelism and synchronisation, supports a global address space, and allows applications to be parallelised incrementally. While it is supported efficiently on shared memory multiprocessors, it can also be implemented on clusters through Software Distributed Shared Memory (SDSM). An example is the Omni OpenMP compiler used together with the SCore-based SCASH SDSM.
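To illustrate what incremental parallelisation means in practice, the following is a minimal sketch in C. The function name `dot` is chosen here purely for illustration. A serial loop becomes parallel by adding a single directive, and a compiler without OpenMP support simply ignores the directive and produces the serial program.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two vectors of length n.
   The loop is ordinary serial C; the pragma is the only parallel annotation. */
double dot(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double sum = 0.0;

    /* Iterations are divided among the threads of the team.
       reduction(+:sum) gives each thread a private partial sum and
       combines the partial sums when the loop finishes. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += x[i] * y[i];

    return sum;
}
```

Both arrays are addressed as if they lived in one global address space. On a cluster it is the task of the SDSM, or of the compiler, to make that appear to be true.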
However, the performance obtained in this way is often unsatisfactory: an SDSM cannot perform optimally for every access pattern, and its coherency protocols often require more communication than is necessary. Parallel compiler technology, in particular dependency analysis, has improved to the point where the compilation of OpenMP programs may bypass the SDSM entirely in many situations. This project will investigate how shared memory programs may be compiled into distributed memory programs that communicate using only remote memory operations. Where this is successful, it may be possible to approach, or even surpass, the performance achieved by hand-written MPI programs.
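As a rough illustration of the idea, and not of the project's actual compilation scheme, the sketch below takes a simple shared memory stencil loop and shows the kind of distributed program that dependency analysis could justify. Everything here is an assumption made for the example: the nodes are simulated inside one process, `remote_get` is a hypothetical stand-in for a one-sided RDMA read, and the array is block-distributed with a fixed `NODES` and `BLOCK`. The point is that each node needs only two boundary elements from its neighbours, so two remote reads per interior boundary replace a general coherency protocol.

```c
#include <assert.h>
#include <string.h>

#define NODES 4                 /* simulated cluster nodes (assumption) */
#define BLOCK 4                 /* elements of b owned by each node */
#define N (NODES * BLOCK)

/* Private memory of one simulated node: its block of b with one
   halo cell on each side (b[0] and b[BLOCK + 1]), and its block of a. */
typedef struct {
    double b[BLOCK + 2];
    double a[BLOCK];
} node_mem;

static node_mem cluster[NODES];
static int remote_gets;         /* number of simulated remote reads */

/* Hypothetical stand-in for a one-sided RDMA read: data is copied out of
   the owner's memory without the owner executing any code. */
static void remote_get(double *dst, int owner, int local_index)
{
    memcpy(dst, &cluster[owner].b[local_index], sizeof *dst);
    remote_gets++;
}

/* The original shared memory loop: a[i] = b[i-1] + b[i+1]. */
void stencil_shared(const double *b, double *a, int n)
{
    for (int i = 1; i < n - 1; i++)
        a[i] = b[i - 1] + b[i + 1];
}

/* The same computation on block-distributed data.
   Returns the number of remote reads that were needed. */
int stencil_distributed(const double *b_global, double *a_global)
{
    remote_gets = 0;

    /* Step 1: each node holds only its own block of b. */
    for (int p = 0; p < NODES; p++)
        memcpy(&cluster[p].b[1], &b_global[p * BLOCK], BLOCK * sizeof(double));

    /* Step 2: each node fetches the two neighbouring elements it reads
       but does not own. In this simulation the sequential loops act as
       the synchronisation a real cluster would need between steps. */
    for (int p = 0; p < NODES; p++) {
        if (p > 0)
            remote_get(&cluster[p].b[0], p - 1, BLOCK);
        if (p < NODES - 1)
            remote_get(&cluster[p].b[BLOCK + 1], p + 1, 1);
    }

    /* Step 3: purely local computation, then copy results out for checking. */
    for (int p = 0; p < NODES; p++) {
        for (int j = 0; j < BLOCK; j++) {
            int g = p * BLOCK + j;          /* global index of element j */
            if (g >= 1 && g <= N - 2) {
                cluster[p].a[j] = cluster[p].b[j] + cluster[p].b[j + 2];
                a_global[g] = cluster[p].a[j];
            }
        }
    }
    return remote_gets;
}
```

With the sizes assumed above, the distributed version produces the same result as `stencil_shared` while issuing `2 * (NODES - 1)` remote reads in total, independent of `BLOCK`.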