Z-Rays: Divide Arrays and Conquer Speed and Flexibility. Jennifer B. Sartor, Stephen M. Blackburn, Daniel Frampton, Martin Hirzel, and Kathryn S. McKinley. In PLDI 2010: The 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation. DOI 10.1145/1809028.1806649
Abstract:
Arrays are the ubiquitous organization for indexed data. Throughout programming language evolution, implementations have laid out arrays contiguously in memory. This layout is problematic in space and time. It causes heap fragmentation, garbage collection pauses in proportion to array size, and wasted memory for sparse and over-provisioned arrays. Because of array virtualization in managed languages, an array layout that consists of indirection pointers to fixed-size discontiguous memory blocks can mitigate these problems transparently. This design however incurs significant overhead, but is justified when real-time deadlines and space constraints trump performance.
This paper proposes z-rays, a discontiguous array design with flexibility and efficiency. A z-ray has a spine with indirection pointers to fixed-size memory blocks called arraylets, and uses five optimizations: (1) inlining the first N array bytes into the spine, (2) lazy allocation, (3) zero compression, (4) fast array copy, and (5) arraylet copy-on-write. Whereas discontiguous arrays in prior work improve responsiveness and space efficiency, z-rays combine time efficiency and flexibility. On average, the best z-ray configuration performs within 12.7% of an unmodified Java Virtual Machine on 19 benchmarks, whereas previous designs have two to three times higher overheads. Furthermore, language implementers can configure z-ray optimizations for various design goals. This combination of performance and flexibility creates a better building block for past and future array optimization.
Demystifying Magic: High-level Low-level Programming. Daniel Frampton, Stephen M. Blackburn, Perry Cheng, Robin Garner, David P. Grove, J. Eliot B. Moss and Sergey I. Salishev. In VEE 2009: The 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments. DOI 10.1145/1508293.1508305
Abstract:
The power of high-level languages lies in their abstraction over hardware and software complexity, leading to greater security, better reliability, and lower development costs. However, opaque abstractions are often show-stoppers for systems programmers, forcing them to either break the abstraction, or more often, simply give up and use a different language. This paper addresses the challenge of opening up a high-level language to allow practical low-level programming without forsaking integrity or performance.
The contribution of this paper is three-fold: 1) we draw together common threads in a diverse literature, 2) we identify a framework for extending high-level languages for low-level programming, and 3) we show the power of this approach through concrete case studies. Our framework leverages just three core ideas: extending semantics via intrinsic methods, extending types via unboxing and architectural-width primitives, and controlling semantics via scoped semantic regimes. We develop these ideas through the context of a rich literature and substantial practical experience. We show that they provide the power necessary to implement substantial artifacts such as a high-performance virtual machine, while preserving the software engineering benefits of the host language.
The time has come for high-level low-level programming to be taken more seriously: 1) more projects now use high-level languages for systems programming, 2) increasing architectural heterogeneity and parallelism heighten the need for abstraction, and 3) a new generation of high-level languages are under development and ripe to be influenced.
Wake Up and Smell the Coffee: Evaluation Methodology for the 21st Century. Stephen M. Blackburn, Kathryn S. McKinley, Robin Garner, Chris Hoffmann, Asjad M. Khan, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage and Ben Wiedermann. In Communications of the ACM, August 2008, Volume 51, No. 8. DOI 10.1145/1378704.1378723
Abstract:
Evaluation methodology underpins all innovation in experimental computer science. It requires relevant workloads, appropriate experimental design, and rigorous analysis. Unfortunately, methodology is not keeping pace with the changes in our field. The rise of managed languages such as Java, C#, and Ruby in the past decade and the imminent rise of commodity multicore architectures for the next decade pose new methodological challenges that are not yet widely understood. This paper explores the consequences of our collective inattention to methodology on innovation, makes recommendations for addressing this problem in one domain, and provides guidelines for other domains. We describe benchmark suite design, experimental design, and analysis for evaluating Java applications. For example, we introduce new criteria for measuring and selecting diverse applications for a benchmark suite. We show that the complexity and nondeterminism of the Java runtime system make experimental design a first-order consideration, and we recommend mechanisms for addressing complexity and nondeterminism. Drawing on these results, we suggest how to adapt methodology more broadly. To continue to deliver innovations, our field needs to significantly increase participation in and funding for developing sound methodological foundations.
Generational Real-Time Garbage Collection. Daniel Frampton, David F. Bacon, Perry Cheng and David Grove. In ECOOP 2007: The 21st European Conference on Object-Oriented Programming. DOI 10.1007/978-3-540-73589-2_6
Abstract:
While real-time garbage collection is now available in production virtual machines, the lack of generational capability means applications with high allocation rates are subject to reduced throughput and high space overheads. Since frequent allocation is often correlated with a high-level, object-oriented style of programming, this can force builders of real-time systems to compromise on software engineering.
We have developed a fully incremental, real-time generational collector based on a tri-partite nursery, which partitions the nursery into regions that are being allocated, collected, and promoted. Nursery collections are incremental, and can occur within any phase of a mature collection.
We present the design, mathematical model, and implementation of our collector in IBM's production Real-time Java virtual machine, and show both analytically and experimentally that the collector achieves real-time bounds comparable to a non-generational Metronome-style collector, while cutting memory consumption and total execution times by as much as 44% and 24% respectively.
Stopless: A Real-time Garbage Collector for Multiprocessors. Filip Pizlo, Daniel Frampton, Erez Petrank and Bjarne Steensgaard. In ISMM 2007: The 2007 International Symposium on Memory Management. DOI: 10.1145/1296907.1296927
Abstract:
We present Stopless: a concurrent real-time garbage collector suitable for modern multiprocessors running parallel multithreaded applications. Creating a garbage-collected environment that supports real-time on modern platforms is notoriously hard,especially if real-time implies lock-freedom. Known real-time collectors either restrict the real-time guarantees to uniprocessors only, rely on special hardware, or just give up supporting atomic operations (which are crucial for lock-free software). Stopless is the first collector that provides real-time responsiveness while preserving lock-freedom, supporting atomic operations, controlling fragmentation by compaction, and supporting modern parallel platforms. Stopless is adequate for modern languages such as C# or Java. It was implemented on top of the Bartok compiler and runtime for C# and measurements demonstrate high responsiveness (a factor of a 100 better than previously published systems), virtually no pause times, good mutator utilization, and acceptable overheads.
Effective Prefetch for Mark-Sweep Garbage Collection. Robin Garner, Stephen M. Blackburn and Daniel Frampton. In ISMM 2007: The 2007 International Symposium on Memory Management. DOI: 10.1145/1296907.1296915
Abstract:
Garbage collection is a performance-critical feature of most modern object oriented languages, and is characterized by poor locality since it must traverse the heap. In this paperwe show that by combining two very simple ideas wecan significantly improve the performance of the canonical mark-sweep collector, resulting in improvements in application performance. We make three main contributions: 1) we develop a methodology and framework for accurately and deterministically analyzing the tracing loop at the heart ofthe collector, 2) we offer a number of insights and improvements over conventional design choices for mark-sweep collectors, and 3) we find that two simple ideas: edge order traversal and software prefetch. combine to greatly improve garbage collection performance although each is unproductive in isolation. We perform a thorough analysis in the context of MMTk and Jikes RVM on a wide range of benchmarks and four different architectures. Our baseline system (which includes a number of our improvements) is very competitive with highly tuned alternatives. We show a simple marking mechanism which offers modest but consistent improvements over conventional choices. Finally, we show that enqueuing the edges pointers) of the object graph rather than the nodes (objects) significantly increases opportunities for software prefetch, despite increasing the total number of queue operations. Combining edge ordered enqueuing with software prefetching yields average performance improvements over a large suite of benchmarks of 20-30% in garbage collection time and 4-6% of total application performance in moderate heaps, across four architectures.
The DaCapo Benchmarks: Java Benchmarking Development and Analysis. Stephen M. Blackburn, Robin Garner, Chris Hoffmann, Asjad M. Khan, Kathryn S. McKinley, Rotem Bentzur, Amer Diwan, Daniel Feinberg, Daniel Frampton, Samuel Z. Guyer, Martin Hirzel, Antony L. Hosking, Maria Jump, Han Bok Lee, J. Eliot B. Moss, Aashish Phansalkar, Darko Stefanovic, Thomas VanDrunen, Daniel von Dincklage and Ben Wiedermann. In OOPSLA 2006: The 21st ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications. DOI: 10.1145/1167473.1167488
Abstract:
Since benchmarks drive computer science research and industry product development, which ones we use and how we evaluate them are key questions for the community. Despite complex runtime tradeoffs due to dynamic compilation and garbage collection required for Java programs, many evaluations still use methodologies developed for C, C++, and Fortran. SPEC, the dominant purveyor of benchmarks, compounded this problem by institutionalizing these methodologies for their Java benchmark suite. This paper recommends benchmarking selection and evaluation methodologies, and introduces the DaCapo benchmarks, a set of open source, client-side Java benchmarks. We demonstrate that the complex interactions of (1) architecture, (2) compiler, (3) virtual machine, (4) memory management, and (5) application require more extensive evaluation than C, C++, and Fortran which stress (4) much less, and do not require (3). We use and introduce new value, time-series, and statistical metrics for static and dynamic properties such as code complexity, code size, heap composition, and pointer mutations. No benchmark suite is definitive, but these metrics show that DaCapo improves over SPEC Java in a variety of ways, including more complex code, richer object behaviors, and more demanding memory system requirements. This paper takes a step towards improving methodologies for choosing and evaluating benchmarks to foster innovation in system design and implementation for Java and other managed languages.
Extended version available as a tech report here.
Free-Me: A Static Analysis for Automatic Individual Object Reclamation. Samuel Z. Guyer, Kathryn S. McKinley and Daniel Frampton. In PLDI 2006: The ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation. DOI: 10.1145/1133981.1134024
Abstract:
Garbage collection has proven benefits, including fewer memory related errors and reduced programmer effort. Garbage collection, however, trades space for time. It reclaims memory only when it is invoked: invoking it more frequently reclaims memory quickly, but incurs a significant cost; invoking it less frequently fills memory with dead objects. In contrast, explicit memory management provides prompt low cost reclamation, but at the expense of programmer effort.This work comes closer to the best of both worlds by adding novel compiler and runtime support for compiler inserted frees to a garbage-collected system. The compiler's free-me analysis identifies when objects become unreachable and inserts calls to free. It combines a lightweight pointer analysis with liveness information that detects when short-lived objects die. Our approach differs from stack and region allocation in two crucial ways. First, it frees objects incrementally exactly when they become unreachable, instead of based on program scope. Second, our system does not require allocation-site lifetime homogeneity, and thus frees objects on some paths and not on others. It also handles common patterns: it can free objects in loops and objects created by factory methods.We evaluate free() variations for free-list and bump-pointer allocators. Explicit freeing improves performance by promptly reclaiming objects and reducing collection load. Compared to marksweep alone, free-me cuts total time by 22% on average, collector time by 50% to 70%, and allows programs to run in 17% less memory. This combination retains the software engineering benefits of garbage collection while increasing space efficiency and improving performance, and thus is especially appealing for real-time and space constrained systems.
Demonstration: On-Line Visualization and Analysis of Real-Time Systems with TuningFork. David F. Bacon, Perry Cheng, Daniel Frampton, David Grove, Matthias Hauswirth and V. T. Rajan. In CC 2006: The 15th International Conference on Compiler Construction. DOI: 10.1007/11688839_8
Efficient Concurrent Cycle Detection. Daniel Frampton, Stephen M. Blackburn, Luke N. Quinane and John Zigman. Presented at WOSSA'2006: The Second International Workshop on Object Systems and Software Architectures
An Investigation into Automatic Dynamic Memory Management Strategies using Compacting Collection. Daniel Frampton. Presented at WOSSA'2004: The First International Workshop on Object Systems and Software Architectures
An Investigation into Automatic Dynamic Memory Management Strategies using Compacting Collection. Daniel Frampton. Honours Thesis for the degree of Bachelor of Software Engineering, Australian National University, 2003. Available here.