The Australian National University
ANU College of Engineering and Computer Science
School of Computer Science

COMP8320 Tutorial 05 -- week 7, 2011

Graphics Processing Units

Please read the articles mentioned below before the tutorial. Afterwards, discuss the following questions in small groups.

  1. Why is CPU + GPU a useful programming model?

  2. In OpenMP, it is possible to incrementally parallelize an existing serial application. Is this the case with GPUs? Discuss.

  3. What are the main similarities and differences between the concepts of threading on chip multiprocessors (e.g. the Niagara-2) and on GPUs?

  4. When programming GPUs, one aims to use a number of threads that is much greater than the number of cores. Give (at least) two reasons for this.

  5. On the S2050 GPU, one can have up to 1,536 threads organized into warps of 32. How long a latency (in cycles) can be hidden?

  6. The Fermi SM actually has two warp dispatchers, each of which dispatches an instruction to 16 cores (as opposed to a single dispatcher that dispatches all 32 threads of a warp at once). Why is this? (I would like to know!)

  7. Consider the matrix multiply calculation C += A*B, where C is n x n, A is n x k and B is k x n; all of these are double precision. Assume k << n. A CPU-GPU system such as Xe has a transfer speed of 8 Gbytes/s (via PCI-e Gen2). Assume that the CPU can perform a (large) matrix multiply at 6 GFLOP/s (note that the multiply has 2n^2k FLOPs). How large does n need to be before it becomes worthwhile to offload this computation onto the GPU (assuming that the GPU can calculate infinitely fast)? Repeat the above for a CPU speed of 40 GFLOP/s.

  8. Table 3 of the first article gives some `speedups' for various applications on a GPU. What precisely do the authors mean by this term? How meaningful are these figures?

  9. Critically assess the claim that the future of GPUs such as the Tesla is secure because of their market in the gaming industry. That is, what kinds of games really require such an accelerator?

  10. How valid is the objection `I have N codes but one IT budget'? In other words, what (kinds of) applications can be accelerated effectively and what cannot?

  11. In terms of heterogeneous processing, where do GPUs fit in the spectrum?
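
As a starting point for the arithmetic in question 7, here is a minimal sketch of one possible cost model; it is not a model answer. It assumes (these assumptions are not stated in the question) that A, B and C are all copied to the GPU, that only C is copied back, that transfers do not overlap with computation, and that 8 Gbytes/s means 8e9 bytes per second. The function names are hypothetical.

```python
BYTES_PER_DOUBLE = 8


def transfer_seconds(n, k, bandwidth_bytes_per_s=8e9):
    """Time to send A (n x k), B (k x n) and C (n x n) to the GPU and bring C back.

    Assumption: C crosses the bus twice, A and B once each, with no overlap.
    """
    doubles_moved = 2 * n * n + 2 * n * k
    return doubles_moved * BYTES_PER_DOUBLE / bandwidth_bytes_per_s


def cpu_seconds(n, k, cpu_flops_per_s=6e9):
    """Time for the CPU to perform the 2*n^2*k FLOPs of C += A*B."""
    return 2 * n * n * k / cpu_flops_per_s


def offload_worthwhile(n, k, cpu_flops_per_s=6e9, bandwidth_bytes_per_s=8e9):
    """True if the transfer alone is faster than computing on the CPU.

    Assumption (from the question): the GPU computes infinitely fast,
    so the offload cost is purely the transfer time.
    """
    return transfer_seconds(n, k, bandwidth_bytes_per_s) < cpu_seconds(n, k, cpu_flops_per_s)


if __name__ == "__main__":
    # Try a few sizes at both CPU speeds mentioned in the question.
    for flops in (6e9, 40e9):
        for n, k in ((1000, 4), (1000, 10), (10000, 64)):
            print(flops, n, k, offload_worthwhile(n, k, cpu_flops_per_s=flops))
```

Because both the CPU time and the dominant transfer term grow as n^2 in this model, it is worth experimenting with k as well as n when discussing the answer.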

Last modified: 6/09/2011, 16:49
