Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

David F. Bacon, IBM T.J. Watson Research Center
Joint work with C.R. Attanasio, Han Lee, V.T. Rajan, and Steve Smith
ACM Conference on Programming Language Design and Implementation, Snowbird, Utah, June 20, 2001

Overview
- Two types of Garbage Collection
  - Tracing [McCarthy 1960]
  - Reference Counting [Collins 1960]
- Contributions of this work:
  - Improved concurrent reference counting algorithm
  - Practical cycle collection algorithm
  - Both stop-the-world and concurrent
  - Implementation in collector w/very low pause times

Features of the Recycler
- Concurrent
- Multiprocessor
- Reference-counted
- Cycle collecting
- Low-latency: under 10 ms
- High-performance: 90% of stop-the-world

Why do it?
- RC has good locality properties
- Will tracing collection scale to multi-GB heaps?
  - Effect of rising memory latency
- RC is easily decoupled from mutator
  - Promises lower synchronization costs
- Java makes good GC a commercial requirement
- Makes sense to have genetic diversity
- People said it couldn't be done

Why is it hard?
- Must defer reference counting of the stack
  - Complicates algorithm
- Reference counts are shared variables
  - Synchronization required
- Write barriers for RC more expensive
  - Affect two objects
- RC doesn't handle cycles
  - Other systems use backup tracing collector

Outline
- Introduction
- Motivation
- System Overview
- Concurrent Reference Counting
- Cycle Collection
- Measurements
- Conclusions

System Overview
- Producer/Consumer System
  - Mutators: emit inc/dec, allocate
  - Collector: free memory, process inc/dec, collect cycles
- Similar to Deutsch-Bobrow DRC
  - As implemented by DeTreville on SRC Firefly
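The producer/consumer split above can be sketched as follows. This is a minimal single-threaded illustration in Python, not the Recycler's code (which is written in Java inside Jalapeño). The names `Obj`, `MutatorBuffer`, and `collector_drain` are hypothetical. The sketch ignores epochs and does not recursively decrement the children of a freed object.

```python
from collections import deque


class Obj:
    """Hypothetical heap object: a name, a reference count, and pointer fields."""

    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.fields = {}


class MutatorBuffer:
    """Per-mutator log of reference count operations (the producer side)."""

    def __init__(self):
        self.ops = deque()

    def write(self, holder, field, new_target):
        # Write barrier: a heap store affects two objects, so it logs an
        # increment for the new target and a decrement for the old one.
        old_target = holder.fields.get(field)
        holder.fields[field] = new_target
        if new_target is not None:
            self.ops.append(("inc", new_target))
        if old_target is not None:
            self.ops.append(("dec", old_target))


def collector_drain(buffer, freed):
    """Consumer side: apply logged operations; record objects that reach zero."""
    while buffer.ops:
        kind, obj = buffer.ops.popleft()
        if kind == "inc":
            obj.rc += 1
        else:
            obj.rc -= 1
            if obj.rc == 0:
                freed.append(obj.name)
```

The mutator never touches a reference count directly. It only appends to its own buffer, which is one way to read the claim that reference counting is easily decoupled from the mutator.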

Implementation Implemented in Jalapeño JVM at IBM TJW All of VM, JIT, and GC are written in Java Extended with unsafe primitives Multiple GC implementations GC is machine-independent except for barriers Targets IBM RS/6000 multiprocessors (optimized) Linux/Intel (unoptimized; optimized coming)

Outline
- Introduction
- Motivation
- System Overview
- Concurrent Reference Counting (no cycles)
- Cycle Collection
- Measurements
- Conclusions

Concurrent Reference Counting
- Time divided into epochs
  - All CPUs must participate before epoch advances
- Write barrier on heap updates
  - inc/dec operations placed in buffer
  - Objects allocated with RC=1, dec enqueued
- Decrements processed one epoch behind increments
- Stack references are deferred
  - Snapshot stacks at epoch boundary
  - First increment; decrement at next epoch
- Simpler invariant than Deutsch-Bobrow; no ZCT required
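The epoch rule can be illustrated with a small sketch. It assumes a single collector that receives each epoch's increments, decrements, and stack snapshot in one call. `Obj`, `allocate`, and `Collector.end_epoch` are hypothetical names, and the requirement that every CPU joins before the epoch advances is not modeled.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 1  # objects are allocated with RC = 1


def allocate(name, dec_buffer):
    """Allocate with RC = 1 and enqueue the matching decrement."""
    obj = Obj(name)
    dec_buffer.append(obj)
    return obj


class Collector:
    def __init__(self):
        self.pending_decs = []  # decrements held back from the previous epoch
        self.freed = []

    def end_epoch(self, incs, decs, stack_snapshot):
        # Stack references are deferred: each object in the snapshot gets an
        # increment now and a matching decrement one epoch later.
        incs = list(incs) + list(stack_snapshot)
        decs = list(decs) + list(stack_snapshot)
        for obj in incs:
            obj.rc += 1
        # Only decrements from the previous epoch are applied here, so any
        # increment that could precede them has already been processed.
        for obj in self.pending_decs:
            obj.rc -= 1
            if obj.rc == 0:
                self.freed.append(obj.name)
        self.pending_decs = decs
```

For example, an object that is allocated, held only on the stack at one boundary, and then stored into the heap stays alive across epochs. It is freed only after the decrement for the overwritten heap slot has aged one epoch.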

[Figure: Ragged Barriers. Timeline of CPU 1 (mutator), CPU 2 (mutator), and CPU 3 (collector) across epochs 7, 8, and 9.]

Synchronous Cycle Collection
- Class loader identifies acyclic classes
  - Arrays, String, and such are marked green
- Two key observations:
  - Most reference counts are 1
  - Garbage cycles created by decrement to non-0
  - Use those objects as starting points
- DFS-based algorithm subtracts internal RC's
  - If resulting count is 0, collect cyclic garbage
- Based on algorithm by Lins, but O(n) instead of O(n^2)

[Figure: Root Buffer and Live Data graph annotated with reference counts, animated through the steps below.]
1. Process Decrements and Accumulate Roots
2. Mark Gray: Subtract Internal Reference Counts
3. Scan: Restore Live, Mark Dead White
4. Collect White
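The mark gray, scan, and collect white steps can be sketched as below. This is a simplified reading of the slides, not the Recycler's implementation. `Node`, `link`, and `collect_cycles` are hypothetical names, green (acyclic) objects and root-buffer bookkeeping are omitted, and recursion stands in for the DFS. Each phase runs over all roots before the next begins, which is assumed here to be what keeps the work linear.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.color = "black"  # black = presumed live
        self.children = []


def link(src, dst):
    """Add a pointer src -> dst and count it."""
    src.children.append(dst)
    dst.rc += 1


def mark_gray(n):
    # Subtract internal reference counts: every pointer found inside the
    # candidate subgraph is removed from the count of its target.
    if n.color != "gray":
        n.color = "gray"
        for c in n.children:
            c.rc -= 1
            mark_gray(c)


def scan_black(n):
    # Restore live data: re-add the counts subtracted by mark_gray.
    n.color = "black"
    for c in n.children:
        c.rc += 1
        if c.color != "black":
            scan_black(c)


def scan(n):
    if n.color == "gray":
        if n.rc > 0:
            scan_black(n)  # still referenced from outside the subgraph
        else:
            n.color = "white"  # provisionally dead
            for c in n.children:
                scan(c)


def collect_white(n, freed):
    if n.color == "white":
        n.color = "black"  # mark as visited so each node is freed once
        for c in n.children:
            collect_white(c, freed)
        freed.append(n.name)


def collect_cycles(roots):
    """Run the three phases over candidate roots; return names of freed nodes."""
    freed = []
    for r in roots:
        mark_gray(r)
    for r in roots:
        scan(r)
    for r in roots:
        collect_white(r, freed)
    return freed
```

A two-node cycle with no outside references is reclaimed. The same cycle with one extra external count on either node is restored to its original counts and left alone.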

Concurrent Cycle Collection
- Based on synchronous algorithm
  - But second reference count field is used
- Relies on stability property of garbage
  - If no mutation, synchronous algorithm will work
  - Detect when mutation occurs and avoid collecting
- When cycle is found
  - Place in a buffer, wait for next epoch
  - Inc/dec stream will provide necessary information

Detecting Concurrent Mutation
- Delta test
  - Detects local changes
  - Observes recolorings of buffered cycle objects
- Sigma test
  - Detects non-local changes
  - Checks sum of RC's = number of pointers
- For details, see ECOOP '01 paper
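A rough sketch of the two tests, based only on the one-line descriptions above; the ECOOP paper has the actual definitions. It assumes that members of a buffered candidate cycle are colored orange and that any later increment or decrement recolors the object it touches. `Node`, `link`, `on_mutation`, `delta_test`, and `sigma_test` are hypothetical names.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.color = "orange"  # assumed color of buffered candidate-cycle members
        self.children = []


def link(src, dst):
    """Add a pointer src -> dst and count it."""
    src.children.append(dst)
    dst.rc += 1


def on_mutation(n):
    # Assumed hook: an inc/dec applied to a buffered object recolors it.
    n.color = "black"


def delta_test(cycle):
    """Local check: no member was recolored since the cycle was buffered."""
    return all(n.color == "orange" for n in cycle)


def sigma_test(cycle):
    """Non-local check: the sum of reference counts equals the number of
    pointers between members, so no pointer enters from outside."""
    members = set(cycle)
    internal = sum(1 for n in cycle for c in n.children if c in members)
    return sum(n.rc for n in cycle) == internal
```

Under these assumptions a candidate cycle would be freed only when both tests pass after the next epoch. Otherwise its objects are restored.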

Measurements
- Compared to parallel mark&sweep collector
  - High-performance, throughput-oriented
  - Stop-the-world
- Benchmarks: SPEC, SPECjbb, Jalapeño compiler
  - Synthetic cycle garbage generator (ggauss)
- In MP mode, more CPUs than threads

[Figure: App. Speed vs. Parallel Mark&Sweep. y-axis: Relative Speed (0 to 1.4); series: Multiprocessing, Uniprocessing; benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Pause Time vs. Parallel Mark&Sweep. y-axis: Max. Pause Time (us), log scale; series: RC Pause, M&S Pause; benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Cycle Collection: Buffering Roots. y-axis: Potential Roots (0% to 100%); series: Roots, Unbuffered, Freed, Repeat, Acyclic; benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Reference Tracing vs. Mark&Sweep. y-axis: Refs. Traced (millions), 0 to 180; series: RC, M&S; benchmarks: compress, jess*, raytrace, db, javac*, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

Related Work
- Dijkstra et al [1976], Steele [1975, 1976], Lamport [1976]
- DeTreville [1990]
- Doligez, Leroy, Gonthier [1993, 1994]
- Domani, Kolodner, Petrank [2000]
- Huelsbergen et al [1993, 1999]
- Martínez et al [1990], Lins [1992]
- Plakal & Fischer [2001], Levanoni & Petrank [2001]

Conclusions
- Recycler sets new benchmark for concurrent GC
  - 2.6 ms max. pause time for general-purpose programs
  - End-to-end execution times comparable
- RC can perform very well in concurrent system
  - No synchronization required in common case
- Standard RC problems can be overcome
  - Concurrent cycle collection works
  - Viable alternative to backup mark & sweep
- More study needed to reduce memory tracing costs

More Information
- The Recycler (including this presentation), Cycle Collection Algorithm/Proof (ECOOP '01)
  - www.research.ibm.com/people/d/dfb
- Jalapeño
  - www.research.ibm.com/jalapeno
- Jalapeño BoF Thursday 7pm

Addendum
- Algorithm Animations
  - Reference counting across multiple CPUs
  - Successful acyclic cyclic collection
  - Aborted acyclic cycle collection
- Additional Measurements
  - Collection cost breakdown
  - Objects freed by cycle collection
  - Allocation and mutation rates

[Figure: animation of reference counting across multiple CPUs: CPU 1, CPU 2, and the Collector CPU.]

[Figure: Root Buffer, Cycle Buffer, and Live Data graph annotated with reference counts, animated through the steps below.]
1. Process Decrements
2. Mark Gray
3. Scan
4. Collect White
5. Await next epoch
6. If still orange, GC
7. If changed, restore

[Figure: Root Buffer, Cycle Buffer, and Live Data graph annotated with reference counts, animated through the steps below.]
1. Process Decrements
2. Mark Gray
3. Scan
4. Collect White
5. Calculate external in-degree
6. Await next epoch
7. Compute in-degree sum
8. If 0, GC/decrement neighbors
9. If non-0, restore

[Figure: Collection Cost Breakdown. y-axis: Collection Time (0% to 100%); legend: Free Cycles, Validate, Collect, Scan, Mark, Purge, Increment, Decrement; benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Objects Freed by Cycle Collection. y-axis: Objects Freed (0% to 40%); benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Mutation Rate (Barrier Frequency). y-axis: K Mutations/sec (0 to 450); series: K Inc/s, K Dec/s; benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]

[Figure: Allocation Rate. y-axis: K Allocations/sec (0 to 200); benchmarks: compress, jess, raytrace, db, javac, mpegaudio, mtrt, jack, specjbb, jalapeno, ggauss.]