Sun UltraSPARC T2 (Niagara) Processor Summary
COMP3320 - March 2010
The following table briefly summarises the Sun UltraSPARC T2 processor, which will
be used for Lab 2 (Hardware Architecture and Code Performance). Note
that our T2 system (wallaman.anu.edu.au) can be accessed via SSH.
The T2 is a processor consisting of 8 cores, each supporting 8 hardware
threads (64 threads in total). Released in 2007, it is an unusual
example of a Post-RISC multi-core architecture, aimed more at server
and cryptographic workloads than at floating-point performance. Unlike
contemporary Intel processors, which emphasise high clock speeds,
branch prediction, deep pipelines and out-of-order/speculative
execution, the T2 adopts a highly threaded approach using a set of
relatively simple processing cores.
While the performance of each individual core is modest, the design
places heavy emphasis on the memory subsystem, with features such as a
high-bandwidth crossbar switch between the cores and the L2 cache, and
on-chip memory controllers and network interfaces. As a result, the T2
has set several single/dual-CPU SPEC benchmark records, including
SPECweb2005, SPECint_rate2006 and SPECfp_rate2006.
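The benefit of running many simple threads per core can be illustrated with an idealized latency-hiding model. This model is an assumption made for illustration and does not come from Sun's documentation. In it, each thread does a fixed number of cycles of useful work and then stalls on a memory access, and the core switches to another ready thread at no cost. The sketch estimates how busy a single pipeline stays as the number of threads grows. The function name `pipeline_utilization` and the cycle counts are hypothetical. The real T2 core, which shares several pipelines among its 8 threads, is more complex.

```python
def pipeline_utilization(threads: int, compute_cycles: int, stall_cycles: int) -> float:
    """Fraction of cycles a single pipeline issues useful work.

    Idealized (hypothetical) model: each thread runs `compute_cycles` of work,
    then stalls for `stall_cycles` (e.g. a cache miss) before it can run again.
    The pipeline switches to any ready thread at zero cost.
    """
    if threads < 1 or compute_cycles < 1 or stall_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles >= 1, stall_cycles >= 0")
    # One thread keeps the pipeline busy compute/(compute + stall) of the time;
    # N threads multiply that demand, but utilization cannot exceed 100%.
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work between 100-cycle stalls.
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {pipeline_utilization(n, 20, 100):.0%} busy")
```

With these made-up numbers the loop prints 17%, 33%, 67% and 100% for 1, 2, 4 and 8 threads. In this model, eight threads are enough to keep the pipeline saturated, even though a single thread alone would leave it idle most of the time.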
[Diagram from David Kanter's article]
The T2 also supports virtualization: a single T2 system can be
partitioned into up to 64 virtualized domains, potentially with a
different operating system running in each domain. For Lab 2, however,
we will only be concerned with the memory subsystem, specifically L1
and L2 cache performance. Since we are not using threads, the
performance results obtained in Lab 2 should be reasonably stable,
apart from possible contention when twenty students run their programs
on the T2 at once.
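The kind of cache behaviour measured in Lab 2 can be previewed with a small simulator. This is a minimal sketch built on several assumptions. It models true LRU replacement and loads only, with no stores, write policy or prefetching. It uses the 8 KB, 16-byte-line, 4-way L1 data cache from the summary table. The T2's actual replacement policy may differ, and the class and function names are hypothetical.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Minimal set-associative cache model with (assumed) LRU replacement, loads only."""

    def __init__(self, size_bytes: int, line_bytes: int, ways: int) -> None:
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Simulate a load from byte address `addr`; return True on a hit."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) == self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = None
        return False

    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def sweep(array_bytes: int, stride_bytes: int, passes: int) -> list[int]:
    """Byte addresses touched by `passes` sequential sweeps over an array."""
    return [a for _ in range(passes) for a in range(0, array_bytes, stride_bytes)]


def simulate(addresses: list[int]) -> float:
    """Miss rate of an address trace on an 8 KB, 16 B-line, 4-way cache."""
    cache = SetAssociativeCache(8 * 1024, 16, 4)
    for a in addresses:
        cache.access(a)
    return cache.miss_rate()


if __name__ == "__main__":
    # 8-byte (double) stride: a 4 KB array fits in the cache, a 16 KB array does not.
    print("4 KB sweep :", simulate(sweep(4 * 1024, 8, passes=2)))
    print("16 KB sweep:", simulate(sweep(16 * 1024, 8, passes=2)))
    # Addresses 2 KB apart map to the same set; 5 of them overflow 4 ways.
    print("4 conflicting lines:", simulate([k * 2048 for k in range(4)] * 10))
    print("5 conflicting lines:", simulate([k * 2048 for k in range(5)] * 10))
```

The four runs give miss rates of 0.25, 0.5, 0.1 and 1.0. The 4 KB array fits in the cache, so only the first touch of each 16-byte line misses. The 16 KB array cycles eight lines through each 4-way set, so LRU evicts every line before it is reused. The last two traces show associativity at work: addresses 2 KB apart share one set, so four of them fit while five thrash.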
| Feature | Specification |
|---|---|
| Architecture | SPARC v9 |
| Technology | 65 nm |
| Year | 2007 |
| Basic Description | Post-RISC |
| Execution Cores | |
| Cores | 8 x 64-bit SPARC cores at 1.2-1.6 GHz |
| Pipelines (per core) | Relatively short pipeline |
| On-Chip Memory Subsystem | |
| L1 Instruction Cache (per core) | 16 KB, 32 B cache line, 8-way set associative |
| L1 Data Cache (per core) | 8 KB, 16 B cache line, 4-way set associative |
| L2 Cache (shared) | 4 MB, 64 B cache line, 16-way set associative |
| TLB | 128-entry data TLB |
| Memory Controller | 4 x dual-channel FB-DIMM controllers, 667 MHz |
| I/O | DMA support |
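The cache parameters in the table determine how a byte address is split into tag, set index and line offset. The number of sets is size / (line size x ways). The sketch computes this split for the three T2 caches and also the reach of the 128-entry data TLB. It makes two assumptions. First, it uses textbook direct bit-slice indexing, and because the real L2 is banked, its mapping may differ. Second, it assumes an 8 KB page size, the Solaris/SPARC base page. All names in the code are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    line_bytes: int
    ways: int

    @property
    def num_sets(self) -> int:
        return self.size_bytes // (self.line_bytes * self.ways)

    @property
    def way_bytes(self) -> int:
        # Addresses that differ by a multiple of this map to the same set.
        return self.size_bytes // self.ways

    @property
    def offset_bits(self) -> int:
        return (self.line_bytes - 1).bit_length()  # log2(line size), power of two

    @property
    def index_bits(self) -> int:
        return (self.num_sets - 1).bit_length()  # log2(number of sets), power of two

    def split(self, addr: int) -> tuple[int, int, int]:
        """Split a byte address into (tag, set index, line offset), textbook-style."""
        offset = addr & (self.line_bytes - 1)
        index = (addr >> self.offset_bits) & (self.num_sets - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory mapped by a fully loaded TLB using one page size."""
    return entries * page_bytes


KB, MB = 1024, 1024 * 1024
T2_CACHES = {
    "L1I": CacheGeometry(16 * KB, 32, 8),
    "L1D": CacheGeometry(8 * KB, 16, 4),
    "L2": CacheGeometry(4 * MB, 64, 16),
}

if __name__ == "__main__":
    for name, c in T2_CACHES.items():
        print(f"{name}: {c.num_sets} sets, way size {c.way_bytes} B, "
              f"{c.offset_bits} offset bits, {c.index_bits} index bits")
    print("L1D split of 0x12345:", T2_CACHES["L1D"].split(0x12345))
    # Assumed 8 KB base page size.
    print("D-TLB reach, 8 KB pages:", tlb_reach(128, 8 * KB) // KB, "KB")
```

For the L1 data cache this gives 128 sets, a 2 KB way size, 4 offset bits and 7 index bits. With 8 KB pages the data TLB covers only 1 MB, well below the 4 MB L2, so large array sweeps may start incurring TLB misses before they run out of L2 capacity.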
FPU = Floating Point Unit
LSU = Load Store Unit
ALU = Arithmetic Logic Unit
SPU = Security Processing Unit
References:
- David Kanter, "Niagara II: The Hydra Returns"
- COMP8320 Lecture 3: T2