This trend has coincided with the resurgence of virtualization technology. Virtualization is being widely adopted mainly because of its potential benefits: server consolidation and isolation, flexibility, security, and fault tolerance. It also supports application development and testing, live migration, and load balancing. Virtualization has also generated considerable interest in the High Performance Computing (HPC) community working with COTS clusters, mainly because of high availability, fault tolerance, cluster partitioning, and the ability to balance conflicting user requirements.
We believe that one can leverage the virtualization environment to reduce job turnaround times, especially on COTS clusters. These clusters are highly heterogeneous: the participating compute nodes differ in CPU architecture, memory capacity, and communication and I/O interfaces. This heterogeneity presents many challenges to job schedulers and often results in under- or over-utilization of the compute resources. To cope with it, application programmers have to write the application specifically for a given set of compute nodes, which is time consuming and does not scale.
In this research, we have investigated resource scheduling in compute clusters from the perspective of dynamic resource remapping. Our approach is to profile each job in the compute farm at runtime and arrive at a near-optimal resource map for each job. We then migrate the jobs to the best-suited compute nodes to improve the overall throughput of the compute farm. For this, we have developed a novel heterogeneity- and virtualization-aware profiling framework that is able to predict the CPU and communication characteristics of high performance scientific applications.
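The profile-then-remap idea can be illustrated with a minimal sketch. It is not the framework developed in this work: the cost model (CPU work divided by node speed plus communication volume divided by bandwidth), the greedy placement heuristic, and every name below (`JobProfile`, `Node`, `estimated_runtime`, `remap`, `migrations`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobProfile:
    """Hypothetical runtime profile of one job (illustrative units)."""
    name: str
    cpu_work: float      # amount of computation, e.g. GFLOP
    comm_volume: float   # amount of data exchanged, e.g. GB
    current_node: str    # node the job is running on now


@dataclass(frozen=True)
class Node:
    """Hypothetical description of one heterogeneous compute node."""
    name: str
    cpu_speed: float     # e.g. GFLOP/s
    bandwidth: float     # e.g. GB/s


def estimated_runtime(job: JobProfile, node: Node) -> float:
    # Assumed cost model: compute time plus communication time.
    return job.cpu_work / node.cpu_speed + job.comm_volume / node.bandwidth


def remap(jobs: list[JobProfile], nodes: list[Node]) -> dict[str, str]:
    """Greedy heuristic: place the most expensive jobs first, each on the
    node where it would finish earliest given the load assigned so far."""
    load = {n.name: 0.0 for n in nodes}
    mapping: dict[str, str] = {}
    ordered = sorted(
        jobs,
        key=lambda j: min(estimated_runtime(j, n) for n in nodes),
        reverse=True,
    )
    for job in ordered:
        best = min(nodes, key=lambda n: load[n.name] + estimated_runtime(job, n))
        load[best.name] += estimated_runtime(job, best)
        mapping[job.name] = best.name
    return mapping


def migrations(jobs: list[JobProfile], mapping: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (job, from_node, to_node) for jobs whose target node differs
    from the node they currently run on."""
    return [
        (j.name, j.current_node, mapping[j.name])
        for j in jobs
        if mapping[j.name] != j.current_node
    ]
```

Under this assumed model, a compute-bound job placed on a slow-CPU node and a communication-bound job placed on a low-bandwidth node would each be flagged for migration to the node that better matches its profile. A real scheduler would also have to account for migration cost and for profiles that change over time.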