Machine Learning workloads are becoming increasingly prevalent and compute-intensive. They are run on standard multicore processors and accelerators such as GPUs, as well as custom or semi-custom devices such as Tensor Processing Units and Qualcomm's Snapdragon DSP core.
This project will involve the benchmarking and performance analysis of various ML workloads, with an emphasis on Deep Learning, on a selection of processors, including standard x86-64 processors, GPUs and custom devices. The goals will be to identify the dominant functions in the workloads and their characteristics (e.g. memory vs compute intensity), and to evaluate the effectiveness of the different classes of architectures in processing them.
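One common way to make "memory vs compute intensity" concrete is arithmetic intensity: floating-point operations per byte moved to or from memory, compared against a machine's ratio of peak compute to peak memory bandwidth (the roofline model). The sketch below is illustrative only and is not a method prescribed by the project. The function names, the idealized traffic model (each matrix read or written exactly once) and the machine figures are all assumptions chosen for the example.

```python
# Minimal sketch: classify a kernel as memory-bound or compute-bound using
# arithmetic intensity (FLOPs per byte). All names and numbers here are
# illustrative assumptions, not measurements of any real device.

def matmul_arithmetic_intensity(m, n, k, bytes_per_element=4):
    """Arithmetic intensity of C[m,n] = A[m,k] @ B[k,n].

    Assumes an idealized traffic model: A and B are each read once and
    C is written once, with no extra cache misses.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


def classify(intensity, peak_flops, peak_bandwidth):
    """Compare intensity against the machine balance point (FLOPs per byte)."""
    ridge = peak_flops / peak_bandwidth
    return "compute-bound" if intensity >= ridge else "memory-bound"


# Hypothetical machine: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
PEAK_FLOPS = 1e12
PEAK_BANDWIDTH = 100e9

# A large square matrix multiply and a matrix-vector product (n = 1).
for name, shape in [("matmul", (1024, 1024, 1024)), ("matvec", (1024, 1, 1024))]:
    ai = matmul_arithmetic_intensity(*shape)
    print(name, round(ai, 2), classify(ai, PEAK_FLOPS, PEAK_BANDWIDTH))
```

Under these assumptions the square matrix multiply lands well above the balance point and the matrix-vector product well below it, which is the kind of distinction the workload characterisation aims to draw. Measured behaviour on real hardware depends on caching, data layout and the achieved (rather than peak) throughput.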