Myths and Realities: The Performance Impact of Garbage Collection

Similar documents
Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

Opera&ng Systems CMPSCI 377 Garbage Collec&on. Emery Berger and Mark Corner University of Massachuse9s Amherst

Garbage Collection Techniques

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection

Myths and Realities: The Performance Impact of Garbage Collection

A new Mono GC. Paolo Molaro October 25, 2006

Run-Time Environments/Garbage Collection

Dynamic Selection of Application-Specific Garbage Collectors

Exploiting the Behavior of Generational Garbage Collector

Run-time Environments -Part 3

Ulterior Reference Counting: Fast Garbage Collection without a Long Wait

Heap Compression for Memory-Constrained Java

Lecture 15 Garbage Collection

Implementation Garbage Collection

Garbage Collection. Weiyuan Li

Memory Management in JikesNode Operating System

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors

Garbage Collection. Hwansoo Han

Garbage Collection Algorithms. Ganesh Bikshandi

IBM Research Report. Efficient Memory Management for Long-Lived Objects

Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance

Java Performance Evaluation through Rigorous Replay Compilation

CRAMM: Virtual Memory Support for Garbage-Collected Applications

Progress & Challenges for Virtual Memory Management Kathryn S. McKinley

Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

Free-Me: A Static Analysis for Automatic Individual Object Reclamation

Reducing Generational Copy Reserve Overhead with Fallback Compaction

Older-First Garbage Collection in Practice: Evaluation in a Java Virtual Machine

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

CMSC 330: Organization of Programming Languages

Using Transparent Compression to Improve SSD-based I/O Caches

Cycle Tracing. Presented by: Siddharth Tiwary

Short-term Memory for Self-collecting Mutators. Martin Aigner, Andreas Haas, Christoph Kirsch, Ana Sokolova Universität Salzburg

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Garbage Collection (2) Advanced Operating Systems Lecture 9

Hierarchical PLABs, CLABs, TLABs in Hotspot

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

CMSC 330: Organization of Programming Languages

MC 2 : High-Performance Garbage Collection for Memory-Constrained Environments

CS399 New Beginnings. Jonathan Walpole

JVM Performance Tuning with respect to Garbage Collection(GC) policies for WebSphere Application Server V6.1 - Part 1

6.172 Performance Engineering of Software Systems Spring Lecture 9. P after. Figure 1: A diagram of the stack (Image by MIT OpenCourseWare.

New Java performance developments: compilation and garbage collection

CS2210: Compiler Construction. Runtime Environment

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

CS842: Automatic Memory Management and Garbage Collection. Mark and sweep

Adaptive Optimization using Hardware Performance Monitors. Master Thesis by Mathias Payer

Computer Memory. Data Structures and Algorithms CSE 373 SP 18 - KASEY CHAMPION 1

CS 345. Garbage Collection. Vitaly Shmatikov. slide 1

Java Garbage Collector Performance Measurements

ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency

Generation scavenging: A non-disruptive high-performance storage reclamation algorithm By David Ungar

Run-time Environments - 3

The Garbage Collection Advantage: Improving Program Locality

Automatic Heap Sizing: Taking Real Memory Into Account

A Comparative Evaluation of Parallel Garbage Collector Implementations

Optimizing Translation Information Management in NAND Flash Memory Storage Systems

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

An Examination of Reference Counter Costs and Implementation Approaches

Manual Allocation. CS 1622: Garbage Collection. Example 1. Memory Leaks. Example 3. Example 2 11/26/2012. Jonathan Misurda

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Fiji VM Safety Critical Java

Robust Memory Management Schemes

Towards Garbage Collection Modeling

QUANTIFYING AND IMPROVING THE PERFORMANCE OF GARBAGE COLLECTION

Name, Scope, and Binding. Outline [1]

Garbage Collector Refinement for New Dynamic Multimedia Applications on Embedded Systems

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Lecture 15 Advanced Garbage Collection

Garbage collection: Java application servers Achilles heel

EECS 482 Introduction to Operating Systems

CSE P 501 Compilers. Memory Management and Garbage Collec<on Hal Perkins Winter UW CSE P 501 Winter 2016 W-1

Method-Level Phase Behavior in Java Workloads

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Mark-Sweep and Mark-Compact GC

Garbage Collection (1)

Concurrent Garbage Collection

OS-caused Long JVM Pauses - Deep Dive and Solutions

Harmony GC Source Code

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

THE TROUBLE WITH MEMORY

PERFORMANCE ANALYSIS AND OPTIMIZATION OF SKIP LISTS FOR MODERN MULTI-CORE ARCHITECTURES

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008

Effective Prefetch for Mark-Sweep Garbage Collection

Dynamic Storage Allocation

Garbage Collection. Steven R. Bagley

CSE 4/521 Introduction to Operating Systems. Lecture 23 File System Implementation II (Allocation Methods, Free-Space Management) Summer 2018

Java Performance Tuning

Jazz: A Tool for Demand-Driven Structural Testing

The DaCapo Benchmarks: Java Benchmarking Development and Analysis

CSE 373 OCTOBER 23 RD MEMORY AND HARDWARE

CS 241 Honors Memory

High Performance Managed Languages. Martin Thompson

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

The Memory Management Unit. Operating Systems. Autumn CS4023

Transcription:

Myths and Realities: The Performance Impact of Garbage Collection Tapasya Patki February 17, 2011 1 Motivation Automatic memory management has numerous software engineering benefits from the developer s point of view. Garbage collection frees the developer from the burden of tracking memory, prevents security violations and facilitates modularity. However, it incurs a space and runtime performance penalty, and the performance impact of garbage collection remains a matter of debate. This paper analyzes the behavior of whole heap and generational garbage collection algorithms and attempts to answer the following questions: Is GC a good idea? How often should we garbage collect? How should we garbage collect (Whole-heap or generational)? Does GC affect instruction throughput and locality? Does the underlying processor architecture influence GC performance? How sensitive are collectors to heap size? 2 Background Each GC algorithm is comprised of an allocation strategy and a collection strategy. There are two possible allocation strategies: Contiguous allocation, in which we allocate a new object by appending a bump pointer by the size of the object, and Free-list allocation, which divides memory into size classes, and allocates the object in the smallest size class. This is similar to binning. Contiguous allocation is faster as it does not have to scan multiple lists before allocating. Also, it has higher spatial locality of reference. Free-list allocation can lead to internal fragmentation, but has the advantage that it can be compacted faster. Collection strategies describe how allocated memory is reclaimed. The standard implementations are: Tracing collection, in which we identify the live objects by taking a transitive close of the roots and the remembered-set. Memory is reclaimed by using copying live data out of the space. This is similar to the idea of copying collection. 1

Reference Counting, in which the number of references each object are counted, and the objects with no references are freed. The tables below illustrate how various GC algorithms (whole-heap and generational) can be implemented using the aforementioned allocation and collection strategies. Algorithm Allocator Collector SemiSpace Contiguous Tracing MarkSweep Free-list Tracing RefCount Free-list Reference Counting Algorithm Nursery Mature GenSS SemiSpace SemiSpace GenMS SemiSpace MarkSweep GenRC SemiSpace RefCount In case of generational garbage collectors, a write-barrier needs to be inserted for every pointer store to keep track of pointers that go from mature-space to the nursery. This incurs an extra overhead. 3 Methodology Each program is divided into two phases- a mutator phase which runs the application code, and a garbage collection phase. The application code may consist of some memory management activities like allocation sequences and write barriers. Authors use the Java Memory Management Toolkit (MMTk) in the IBM Jikes RVM environment. IBM Jikes RVM supports high-performance pseudo-adaptive compilation and is a Java-in-Java design. MMTk implements garbage collection policies using the allocation and collection strategies described in the last section, and supports inlining of write-barriers. This environment provides for fair experimentation and an apples-toapples comparison of garbage collection algorithms. Authors measure the performance impact of the six algorithms described in the last section on the three architectures shown in the table below. They use the SPEC JVM benchmarks for the experiments. Arch CPU Freq RAM L1 L2 Athlon 1.9Ghz 1GB 64K both 512K P4 2.6GHz 1GB 8K D, 12K I 512K PPC 1.6GHz 768MB 32K D, 64K I 512K 2

4 Results and Observations Contiguous is better than Free-list Allocation is 11% faster, total improvement 1% Locality improves mutator performance by 5-15% (SS vs MS) Tracing is usually better than Reference Counting Only live objects are touched Locality improves RC can be useful for mature objects, when most of the heap consists of live objects Generational better than Whole-heap 3

Write-barrier over head is 1-14%, 3.2% average Collection time benefits outweigh the write-barrier overhead Locality of nursery: spatial Locality of mature objects: temporal Nursery Size Fixed overhead of scanning roots (about 64KB) Need not be matched to L2 cache size (512KB) Should depend on fixed overhead and not L2 (4-8MB is ideal for SPEC JVM) If large, collection cost degrades performance (>8MB) Sensitivity to Heap Size Determines collection frequency Small heaps: MarkSweep, Modest to large heaps: SemiSpace 4

Influence on Architecture Crossover point: When SemiSpace outperforms MarkSweep Limited space versus Locality tradeoff 5