############################################################
Seminar Announcement
School of Computer Science, CECS
The Australian National University
############################################################

Date: Thursday, 19 Feb 2009
Time: 4:00 pm to 5:00 pm
Venue: Room N101, CSIT Building [108]

Speaker: Danny Robson

Title: Efficient and Accurate Performance Evaluation of Multi-core NUMA Clusters (Thesis Proposal Review seminar)

Abstract:

In recent years there has been increasing investment in large parallel compute clusters, particularly those with more capable, parallel processor technology. These systems have proved useful in executing highly numerical workloads in various scientific fields.

As hardware becomes more complex, it becomes increasingly difficult to accurately predict system performance and discover attributes that contribute significantly to workload execution. This is especially apparent when utilising Chip Multi-Processing (CMP) and Non-Uniform Memory Access (NUMA) architectures. Due to this complexity, optimal use of these systems requires performance analysis tools, on both individual nodes and clusters as a whole.

Existing analysis tools employ a variety of techniques with a range of efficiency, accuracy and precision characteristics: analytical methods, software and hardware profiling, and emulation and simulation are all widely applied in many contexts. Dynamic Binary Translation (DBT) and Dynamic Binary Instrumentation (DBI) techniques provide an efficient means of adding instrumentation code to arbitrary executables at runtime. These allow a tool developer to insert arbitrary code which drives performance analysis models during guest execution.

By investigating simulation techniques for workload and system analysis, we will create an accessible means for performance evaluation.
The focus will be on assisting developer analysis of highly parallel, numerical applications through precise and easily accessible feedback on their application's execution.

This talk focuses on our existing and proposed performance analysis methods. We will discuss the tools we are developing with the Valgrind DBI framework, including modifications which allow parallel execution of guest threads with an associated execution speedup. To enable efficient analysis of a single cluster node, we are constructing an improved cache simulator and NUMA emulation layer within our analysis tool, NUMAgrind.

We will outline future work in the addition of timing accuracy to the NUMAgrind tool, for single nodes and full clusters. In the single node case, we outline methods for coupling existing processor models and simulators to the memory system for timing estimation. In the cluster case, we describe coupling timing estimators for network communications and how the results will moderate the timing of guest execution.

Biography:

Danny received a Bachelor of Engineering (Software) from the University of New South Wales in 2007. He is currently a PhD student at the School of Computer Science associated with the CC-NUMA Project.

URL: http://cs.anu.edu.au/lib/seminars/seminars09/dept20090219

############################################################
Seminars homepage: http://cs.anu.edu.au/seminars/
If you would like to give a seminar, please contact:
seminars-owner [at] cs.anu.edu.au
############################################################