
COMP8320 Tutorial 02 -- week 2, 2011

T2 Design and Architecture

In addition to completing (or revising) Lab 01 and revising Lecture 2, please read the articles mentioned below before the tutorial (handouts of all of them will be distributed in Lecture 2). Compile a list of architectural (or other) concepts and ideas that you would like your tutor to explain.

Afterwards, discuss the following questions in small groups:

  1. Using your knowledge of the T2 architecture, why might you expect hardware threading (4 to 32 threads in Lab 01) to give better speedup for (i) sum than for dsum, and (ii) dsum than for dcopy?
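
    For reference when discussing this question, here is a minimal sketch of what the three Lab 01 inner loops are assumed to look like (the actual lab code may differ): sum is integer-only, dsum adds floating-point latency, and dcopy does no arithmetic at all, only memory traffic.

        /* assumed shapes of the Lab 01 kernels (per-thread portion) */
        long sum(const int *a, long n) {
            long s = 0;
            for (long i = 0; i < n; i++)
                s += a[i];              /* one integer add per load */
            return s;
        }

        double dsum(const double *a, long n) {
            double s = 0.0;
            for (long i = 0; i < n; i++)
                s += a[i];              /* one FP add per load */
            return s;
        }

        void dcopy(double *b, const double *a, long n) {
            for (long i = 0; i < n; i++)
                b[i] = a[i];            /* load + store, no arithmetic */
        }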

  2. The dcopy 4000000 program with 32 threads performed the copy on wallaman (with 32 CPUs allocated) in about 0.014 s. Calculate both the copy speed in MB/s and the size of the two arrays relative to the L2$. Compare this with the T2's bandwidth between the (4) cores and the L2$, and between the L2$ and main memory. What reasons could account for the performance gap? (A sketch of the bandwidth arithmetic is given after the note below.)

    Note: when the program was modified to initialize both arrays, and also to access each element of both (by a sum loop) before the copy, it performed slightly better (0.0057 s with 16 threads).
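
    As a sketch of the bandwidth arithmetic, assuming dcopy 4000000 copies 4,000,000 8-byte doubles from one array to another (so the copy reads one array and writes the other), and taking the T2's L2$ as 4 MB:

        #include <stdio.h>

        int main(void) {
            long   n     = 4000000;             /* elements per array */
            double bytes = 2.0 * n * 8.0;       /* read 32 MB + write 32 MB */
            double t     = 0.014;               /* measured time in seconds */
            double l2_mb = 4.0;                 /* assumed T2 L2$ size in MB */
            printf("copy rate  : %.0f MB/s\n", bytes / t / 1e6);
            printf("array / L2$: %.1f x\n", (n * 8.0 / 1e6) / l2_mb);
            return 0;
        }

    Compare the printed rate against the documented core-to-L2$ and L2$-to-memory bandwidths of the T2.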

  3. The program dsum 1000000 ran best (on wallaman, 1.2 GHz, 56 virtual CPUs allocated) at 0.0014 s with 16 threads. Calculate the floating-point operation rate and compare it with the peak theoretical rate. Considering that there must be at least one load operation per floating-point addition, and considering the T2's pipeline, what maximum speed could you expect for this application?
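
    As a sketch of the rate arithmetic, assuming dsum 1000000 performs 1,000,000 floating-point additions, and assuming the T2 can retire at most one FP operation per core per cycle (one FPU per core; 56 virtual CPUs correspond to 7 cores of 8 threads):

        #include <stdio.h>

        int main(void) {
            double flops = 1e6;                 /* FP adds performed */
            double t     = 0.0014;              /* measured time in seconds */
            double clk   = 1.2e9;               /* wallaman clock in Hz */
            int    cores = 7;                   /* 56 virtual CPUs / 8 threads */
            printf("achieved: %.0f MFLOP/s\n", flops / t / 1e6);
            printf("peak    : %.0f MFLOP/s\n", cores * clk / 1e6);
            return 0;
        }
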
  4. Describe the sequence of events when a load instruction at virtual address x is executed that misses both the level 1 and level 2 caches. Assume that the DTLB has a mapping for x to the physical address x', whose lowest 16 bits are 0x02a4. Your answer should cover all aspects of the T2 pipeline and memory system. Repeat the above for a store instruction instead of a load.
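
    To get started on the address arithmetic, the following sketch decomposes the low 16 bits of x' under assumed T2 cache parameters (L1 D$: 8 KB, 4-way, 16-byte lines; L2$: 4 MB, 16-way, 64-byte lines, 8 banks; check these against the T2 documentation):

        #include <stdio.h>

        int main(void) {
            unsigned pa = 0x02a4;               /* low 16 bits of x' */
            /* L1 D$: 16-byte lines; 8 KB / 4 ways / 16 B = 128 sets */
            printf("L1 offset %u, L1 set %u\n", pa & 0xf, (pa >> 4) & 0x7f);
            /* L2$: 64-byte lines; bank assumed selected by the bits just above the offset */
            printf("L2 offset %u, L2 bank %u\n", pa & 0x3f, (pa >> 6) & 0x7);
            return 0;
        }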

  5. What is the main difference between instruction-level parallelism and thread-level parallelism? What is their common objective?

  6. What is the difference in design approach between Intel's Hyper-Threading and the hardware multithreading in the Niagara chip?

  7. Why was the Niagara-1 design (multicore combined with hardware multithreading, with minimal floating-point support) a cost-effective solution for typical commercial server applications? (Find at least 5 reasons.)

    Why would it be better than a traditional SMP architecture (with the same number of CPUs and same clock speed)?

Last modified: 18/08/2011, 12:56
