COMP8320 Tutorial 02 -- week 2, 2011
T2 Design and Architecture
As well as completing/revising Lab 01
and revising Lecture 2,
please read the articles mentioned below before the tutorial
(handouts of all will be distributed in Lecture 2).
Compile a list of architectural (or other) concepts and ideas that you
would like your tutor to explain.
Afterwards, discuss in small groups the following questions:
- Using your knowledge of the T2 architecture, why might you expect
to get better performance from hardware threading (4 to 32 threads
in Lab 01) for (i) sum over dsum and (ii)
dsum over dcopy? (A sketch of what these kernels might look
like is given after the questions.)
- The dcopy 4000000 program with 32 threads performed the copy
on wallaman (with 32 CPUs allocated) in about 0.014 s.
Calculate both the copy speed in MB/s and the size of the
two arrays relative to the L2$. Compare this with the T2's bandwidth
between the (4) cores and the L2$, and between the L2$ and main memory.
What reasons could account for the performance gap? (A back-of-envelope
calculation is sketched after the questions.)
Note: when the program was modified to initialize both arrays,
and also to access each element of both (by a sum loop) before the copy,
it performed slightly better (0.0057 s with 16 threads).
- The program dsum 1000000 ran best (on wallaman, 1.2 GHz,
56 virtual CPUs allocated) at 0.0014 s with 16 threads. Calculate the
floating-point operation rate and compare it with the theoretical peak rate.
Considering that there must be at least one load operation per floating-point
addition, and considering the T2's pipeline, what would be the
maximum speed you could expect for this application? (This rate
calculation is sketched after the questions.)
- Describe the sequence of events when a load instruction at virtual
address x is executed that misses both the level 1 and level 2
caches. Assume that the DTLB has a mapping for x to the
physical address x', whose lowest 16 bits are 0x02a4. Your answer
should cover all aspects of the T2 pipeline and memory system. (A breakdown
of those address bits is sketched after the questions.) Repeat the
above for a store instruction instead of a load.
- What is the main difference between the terms instruction-level
parallelism and thread-level parallelism? What is their
common objective?
- What is the difference (in design approach) between Intel's
Hyperthreading and the hardware multithreading in the Niagara chip?
- Why was the Niagara-1 design (multicore combined with hardware
multithreading, with minimal floating-point support) a cost-effective
solution for typical commercial server applications? (Find at least 5
reasons.)
Why would it be better than a traditional SMP architecture
(with the same number of CPUs and the same clock speed)?
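
For the first question, the following sketch (in C) shows what the three
Lab 01 kernels might look like. The actual programs may differ; what matters
is the contrast between integer adds, floating-point adds, and pure copying.

    /* Hypothetical sketches of the Lab 01 kernels; check the lab source. */
    long sum(long *a, int n) {
        long s = 0;
        for (int i = 0; i < n; i++)   /* integer adds: short latency, so    */
            s += a[i];                /* hardware threads hide stalls well  */
        return s;
    }

    double dsum(double *a, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)   /* FP adds: longer latency, and the   */
            s += a[i];                /* threads of a core share one FPU    */
        return s;
    }

    void dcopy(double *src, double *dst, int n) {
        for (int i = 0; i < n; i++)   /* pure load/store traffic: bounded   */
            dst[i] = src[i];          /* by memory bandwidth, not the CPU   */
    }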
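
For the dcopy question, a back-of-envelope calculation along these lines
may help. It assumes 4,000,000 double (8-byte) elements per array and a
4 MB L2$; check both against the Lab 01 source and the T2 documentation.

    #include <stdio.h>

    int main(void) {
        double n = 4e6, elem = 8.0, t = 0.014;    /* elements, bytes, seconds */
        double array_mb = n * elem / 1e6;         /* one array, in MB         */
        double traffic_mb = 2.0 * array_mb;       /* read source + write dest */
        double l2_mb = 4.0;                       /* assumed L2$ size         */
        printf("copy rate : %.0f MB/s\n", traffic_mb / t);
        printf("array size: %.0f MB (%.0f x the L2$)\n", array_mb, array_mb / l2_mb);
        return 0;
    }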
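
For the dsum question, the achieved rate can be estimated as below. The peak
figure is an assumption: one floating-point result per cycle per core FPU,
with 7 cores (56/8 virtual CPUs) at 1.2 GHz; verify it against the T2
documentation before relying on it.

    #include <stdio.h>

    int main(void) {
        double flops = 1e6, t = 0.0014;           /* one add per element      */
        double cores = 56.0 / 8.0, ghz = 1.2;     /* allocated cores, clock   */
        printf("achieved: %.2f GFLOP/s\n", flops / t / 1e9);
        printf("peak    : %.2f GFLOP/s\n", cores * ghz);
        return 0;
    }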
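
For the load/store question, the low 16 bits of x' can be split into cache
fields roughly as follows. The geometry used (8 KB, 4-way, 16 B-line L1 D$;
4 MB, 16-way, 64 B-line, 8-bank L2$; 8 KB pages) is an assumption to be
checked against the T2 manual.

    #include <stdio.h>

    int main(void) {
        unsigned x = 0x02a4;                      /* low 16 bits of x'        */
        printf("page offset (8 KB page): 0x%x\n", x & 0x1fff);
        printf("L1 line offset (16 B)  : 0x%x\n", x & 0xf);
        printf("L1 set index (128 sets): %u\n", (x >> 4) & 0x7f);
        printf("L2 line offset (64 B)  : 0x%x\n", x & 0x3f);
        printf("L2 bank (bits 8:6)     : %u\n", (x >> 6) & 0x7);
        return 0;
    }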
Last modified: 18/08/2011, 12:56