Student research opportunities
Super Job Scheduling for Supercomputers
Project Code: CECS_939
This project is available at the following levels:
Honours, Masters, PhD
Keywords:
supercomputing, scheduling, cluster computing
Supervisor:
Dr Peter Strazdins
Outline:
Supercomputers, such as the newly-installed Raijin supercomputer at the NCI National Facility, are precious resources. Achieving and sustaining high utilization of their compute nodes is important both in order to get the best use of the resources and to save on the very significant ongoing energy costs.
The problem is non-trivial because at any time there are a large number of job requests with various requirements for the number of nodes and total memory needed, and differing job queues and priorities. It has been observed that commercial job schedulers, which are based on PBS (Portable Batch System), achieve unsatisfactory utilization. Extensive in-house experience at the NCI National Facility has indicated that better algorithms and supporting techniques exist.
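A toy model illustrates how much the scheduling policy alone can affect utilization. The sketch below is a minimal illustration under simplifying assumptions, not the algorithm of PBS or of NCI NF: all jobs are submitted at time 0, runtimes are known exactly, and only node counts (not memory, queues or priorities) are modelled. The names `Job`, `simulate` and `demo_jobs` are hypothetical. It compares strict first-come-first-served with a policy that lets later jobs skip ahead whenever they fit.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # number of nodes requested
    runtime: int  # exact runtime in arbitrary time units (an assumption)


def simulate(jobs, total_nodes, allow_skip):
    """Return (makespan, utilization) for jobs all submitted at time 0.

    allow_skip=False: strict FCFS, the queue head blocks everything behind it.
    allow_skip=True:  any queued job that fits on the free nodes is started.
    """
    if not jobs:
        return 0, 0.0
    for job in jobs:
        if job.nodes > total_nodes:
            raise ValueError(f"job {job.name} needs more nodes than exist")

    queue = list(jobs)
    running = []  # min-heap of (finish_time, nodes)
    free = total_nodes
    now = 0
    while queue or running:
        # Start every job the policy allows at the current time.
        i = 0
        while i < len(queue):
            job = queue[i]
            if job.nodes <= free:
                free -= job.nodes
                heapq.heappush(running, (now + job.runtime, job.nodes))
                queue.pop(i)
            elif allow_skip:
                i += 1   # leave this job queued and look further back
            else:
                break    # strict FCFS: nothing may overtake the head job
        # Advance time to the next job completion and release its nodes.
        finish, nodes = heapq.heappop(running)
        now = finish
        free += nodes

    busy = sum(job.nodes * job.runtime for job in jobs)  # node-time used
    return now, busy / (total_nodes * now)


demo_jobs = [Job("A", 2, 4), Job("B", 4, 2), Job("C", 2, 4)]
fcfs = simulate(demo_jobs, total_nodes=4, allow_skip=False)
skip = simulate(demo_jobs, total_nodes=4, allow_skip=True)
print("FCFS:", fcfs)
print("skip-ahead:", skip)
```

On this four-node example, strict FCFS leaves two nodes idle while job B waits, whereas letting job C skip ahead keeps all nodes busy. Unrestricted skipping can starve large jobs indefinitely, which is one reason practical backfilling schemes add reservations for the job at the head of the queue.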
This project will undertake an in-depth study and analysis of existing scheduling algorithms that are in the open literature and reflected in commercial products, and of the algorithms developed at NCI NF. Enabling techniques such as pre-emption and migration will also be studied.
Goals of this project
The goals of this project are to develop a scheduler that can significantly outperform existing products, and to disseminate the resulting insights to the wider community.
Requirements/Prerequisites
For PhD level, an Honours degree in Computer Science at level I or IIa (or equivalent), or other credentials showing promise for such a research project.
Student Gain
Collaboration with, and access to the facilities of, the NCI NF. This project has very strong potential impact.
Background Literature
Scheduling HPC workflows for responsiveness and fairness with networking delays and inaccurate estimates of execution times.
Andrew Burkimsher, Iain Bate, Leandro Soares Indrusiak.
Europar 2011



