ANU Computer Science Technical Reports
TR-CS-06-03
Stephen M. Blackburn and Kathryn S. McKinley.
Transient Caches and Object Streams.
October 2006.
[POSTSCRIPT (363525 bytes)] [PDF (409926 bytes)]
Abstract: Memory latency limits program performance.
Object-oriented languages such as C# and Java exacerbate this problem, but
their software engineering benefits make them increasingly popular. We show
that current memory hierarchies are not particularly well suited to Java, in
which object streams write and read a window of short-lived objects
that pollute the cache. These observations motivate the exploration of
transient caches, which assist a parent cache. For an L1 parent cache,
transient caches are positioned similarly to a classic L0, providing
one-cycle access time. Their distinguishing features are that (1) they are
tiny (4 to 8 lines), (2) they are highly associative, and (3) the processor
may probe them in parallel with their parent. They can assist any cache
level.
To address object stream behavior, we explore policies for read and write
instantiation, promotion, filtering, and valid bits to implement no-fetch on
write. Good design points include a parallel L0 (PL0), which improves Java
programs by 3% on average, and C programs by 2%, in cycle-accurate
simulation over a two-cycle 32KB, 128B-line, 2-way L1. A transient
qualifying cache (TQ) improves performance further by (a) minimizing
pollution in the parent by filtering out short-lived lines without temporal
reuse, and (b) using a write no-fetch policy with per-byte valid bits to
eliminate wasted fetch bandwidth. TQs at L1 and L2 improve Java programs by
5% on average and up to 15%. The TQ achieves improvements even when the
parent has half the capacity or associativity of the original, larger L1.
The one-cycle access time, the write no-fetch policy, and filtering bestow
these benefits. Java motivates this approach, but it also improves
performance for C programs.
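
To make the cache policies above concrete, the following C sketch models a
TQ-style line store with write no-fetch, per-byte valid bits, and
promotion-on-reuse filtering. It is written from the abstract alone; the
round-robin replacement, the single-byte access granularity, and the
parent_read_line/parent_write_line hooks are assumptions made for
illustration, not details of the report's simulator.

    /*
     * Minimal model of a transient qualifying (TQ) cache, written from the
     * abstract above.  The replacement policy, hook names, and single-byte
     * access granularity are assumptions for this sketch.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define TQ_LINES   8      /* tiny: 4 to 8 fully associative lines     */
    #define LINE_BYTES 128    /* matches the 128B parent line in the text */

    typedef struct {
        bool     in_use;
        uint64_t tag;                /* line-aligned address                 */
        uint8_t  data[LINE_BYTES];
        uint8_t  valid[LINE_BYTES];  /* per-byte valid bits (write no-fetch) */
        bool     reused;             /* saw temporal reuse after insertion?  */
    } tq_line_t;

    typedef struct {
        tq_line_t line[TQ_LINES];
        unsigned  victim;            /* round-robin replacement (assumed)    */
    } tq_cache_t;

    /* Stand-ins for the parent cache level; a real model forwards these. */
    static void parent_write_line(uint64_t tag, const uint8_t *d,
                                  const uint8_t *v)
    { (void)tag; (void)d; (void)v; }
    static void parent_read_line(uint64_t tag, uint8_t *d)
    { (void)tag; memset(d, 0, LINE_BYTES); }

    static tq_line_t *tq_lookup(tq_cache_t *tq, uint64_t tag)
    {
        for (unsigned i = 0; i < TQ_LINES; i++)
            if (tq->line[i].in_use && tq->line[i].tag == tag)
                return &tq->line[i];
        return NULL;
    }

    /* Eviction filters: only lines that saw temporal reuse are promoted to
     * the parent, so short-lived lines never pollute it. */
    static void tq_evict(tq_cache_t *tq, unsigned i)
    {
        tq_line_t *l = &tq->line[i];
        if (l->in_use && l->reused)
            parent_write_line(l->tag, l->data, l->valid);
        l->in_use = false;
    }

    static tq_line_t *tq_allocate(tq_cache_t *tq, uint64_t tag, bool fetch)
    {
        unsigned i = tq->victim;
        tq->victim = (tq->victim + 1) % TQ_LINES;
        tq_evict(tq, i);

        tq_line_t *l = &tq->line[i];
        l->in_use = true;
        l->tag    = tag;
        l->reused = false;
        memset(l->valid, 0, LINE_BYTES);
        if (fetch) {                      /* reads still fill from the parent */
            parent_read_line(tag, l->data);
            memset(l->valid, 1, LINE_BYTES);
        }
        return l;
    }

    /* Write instantiation with no-fetch: a write miss allocates a line
     * without reading the parent; per-byte valid bits record which bytes
     * are defined, so no fetch bandwidth is wasted. */
    void tq_write(tq_cache_t *tq, uint64_t addr, uint8_t byte)
    {
        uint64_t tag = addr & ~(uint64_t)(LINE_BYTES - 1);
        unsigned off = (unsigned)(addr & (LINE_BYTES - 1));
        tq_line_t *l = tq_lookup(tq, tag);
        if (l) l->reused = true;          /* temporal reuse observed */
        else   l = tq_allocate(tq, tag, /*fetch=*/false);
        l->data[off]  = byte;
        l->valid[off] = 1;
    }

    /* Read instantiation: a read miss fetches the whole line from the
     * parent so every byte is valid. */
    uint8_t tq_read(tq_cache_t *tq, uint64_t addr)
    {
        uint64_t tag = addr & ~(uint64_t)(LINE_BYTES - 1);
        unsigned off = (unsigned)(addr & (LINE_BYTES - 1));
        tq_line_t *l = tq_lookup(tq, tag);
        if (l) l->reused = true;
        else   l = tq_allocate(tq, tag, /*fetch=*/true);
        return l->data[off];
    }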