Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Principal Software Engineer Red Hat

Similar documents
Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

G1 Garbage Collector Details and Tuning. Simone Bordet

Lecture 15 Garbage Collection

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

The Garbage-First Garbage Collector

Concurrent Garbage Collection

One-Slide Summary. Lecture Outine. Automatic Memory Management #1. Why Automatic Memory Management? Garbage Collection.

CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers

Shenandoah GC...and how it looks like in September Aleksey

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Lecture 15 Advanced Garbage Collection

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Garbage-First Garbage Collection

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

Robust Memory Management Schemes

New Java performance developments: compilation and garbage collection

The Z Garbage Collector An Introduction

Reference Object Processing in On-The-Fly Garbage Collection

Automatic Garbage Collection

The Z Garbage Collector Low Latency GC for OpenJDK

Reference Counting. Reference counting: a way to know whether a record has other users

Boosting the Priority of Garbage: Scheduling Collection on Heterogeneous Multicore Processors

Automatic Memory Management

Dynamic Selection of Application-Specific Garbage Collectors

Reference Counting. Reference counting: a way to know whether a record has other users

Garbage Collection Algorithms. Ganesh Bikshandi

Go GC: Prioritizing Low Latency and Simplicity

Implementation Garbage Collection

Run-Time Environments/Garbage Collection

Java Without the Jitter

Lecture 13: Garbage Collection

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Java Performance Tuning

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Opera&ng Systems CMPSCI 377 Garbage Collec&on. Emery Berger and Mark Corner University of Massachuse9s Amherst

Garbage Collection. Hwansoo Han

High Performance Managed Languages. Martin Thompson

CMSC 330: Organization of Programming Languages

CMSC 330: Organization of Programming Languages

Tick: Concurrent GC in Apache Harmony

ACM Trivia Bowl. Thursday April 3 rd (two days from now) 7pm OLS 001 Snacks and drinks provided All are welcome! I will be there.

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

Azul Systems, Inc.

Garbage Collection. Vyacheslav Egorov

Hard Real-Time Garbage Collection in Java Virtual Machines

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

Garbage Collection (2) Advanced Operating Systems Lecture 9

Heap Compression for Memory-Constrained Java

THE TROUBLE WITH MEMORY

(notes modified from Noah Snavely, Spring 2009) Memory allocation

A new Mono GC. Paolo Molaro October 25, 2006

Structure of Programming Languages Lecture 10

Memory Management. Memory Management... Memory Management... Interface to Dynamic allocation

Inside the Garbage Collector

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

Older-First Garbage Collection in Practice: Evaluation in a Java Virtual Machine

Habanero Extreme Scale Software Research Project

Harmony GC Source Code

Myths and Realities: The Performance Impact of Garbage Collection

Lecture Notes on Advanced Garbage Collection

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

Operating Systems CMPSCI 377, Lec 2 Intro to C/C++ Prashant Shenoy University of Massachusetts Amherst

The Application Memory Wall

Understanding Garbage Collection

Garbage Collection. Steven R. Bagley

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014

Simple Garbage Collection and Fast Allocation Andrew W. Appel

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Garbage Collection (aka Automatic Memory Management) Douglas Q. Hawkins. Why?

Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you

Garbage Collection. CS 351: Systems Programming Michael Saelee

Name, Scope, and Binding. Outline [1]

CSE P 501 Compilers. Memory Management and Garbage Collec<on Hal Perkins Winter UW CSE P 501 Winter 2016 W-1

Garbage Collection Basics

A Parallel, Incremental and Concurrent GC for Servers

Managed runtimes & garbage collection

The Fundamentals of JVM Tuning

CA31-1K DIS. Pointers. TA: You Lu

Java Memory Management. Märt Bakhoff Java Fundamentals

Mark-Sweep and Mark-Compact GC

Manual Allocation. CS 1622: Garbage Collection. Example 1. Memory Leaks. Example 3. Example 2 11/26/2012. Jonathan Misurda

Lecture 7: Binding Time and Storage

Exploiting the Behavior of Generational Garbage Collector

CS 241 Honors Memory

Software Speculative Multithreading for Java

Review. Partitioning: Divide heap, use different strategies per heap Generational GC: Partition by age Most objects die young

Contents. Created by: Raúl Castillo

Dynamic Memory Management

Evaluating and improving remembered sets in the HotSpot G1 garbage collector

Cycle Tracing. Presented by: Siddharth Tiwary

Transcription:

Shenandoah: An ultra-low pause time garbage collector for OpenJDK Christine Flood Principal Software Engineer Red Hat 1

Why do we need another Garbage Collector? OpenJDK currently has: SerialGC ParallelGC ParNew/Concurrent Mark Sweep(CMS) Garbage First (G1) 2

Why do we need another Garbage Collector? OpenJDK currently has: SerialGC ParallelGC ParNew/Concurrent Mark Sweep(CMS) Garbage First (G1) Shenandoah Pause times similar to CMS. Region based like G1. Concurrent Compaction 3

Why is concurrent compaction difficult? Reference Stack Frame Method Foo Heap Value Reference Reference 42 B Reference Stack Frame Method Bar Value 6.847 Reference Reference Object A C 4

Pause to update the roots, and have heap object accesses go through a forwarding pointer. Reference Stack Frame Method Foo Heap Value Reference Reference 42 Forwarding ptr Forwarding ptr B Reference Stack Frame Method Bar Value 6.847 Reference Reference C' A Forwarding ptr Forwarding ptr C 5

Forwarding Pointers Foo indirection pointer Foo Bar indirection pointer You can still walk the heap. You can still choose your GC at runtime. Software only solution. Bar 6

Shenandoah divides the heap into regions. Region 1 Region 2 Region 3 Region 4 Region 5 7

We use concurrent marking to keep track of the live data in each region. Regions Region 1 Live Data 20k Regions Region 6 Live Data 200k Region 2 100k Region 7 100k Region 3 500k Region 8 empty Region 4 10k Region 9 empty Region 5 70k Region 10 empty 8

We pick the most garbage-y regions for evacuation. Regions Region 1 Region 2 Live Data 20k Fromregion 100k Regions Region 6 Region 7 Live Data 20k 100k Region 3 500k Region 8 to-region Region 4 Region 5 10k Fromregion 70k Region 9 Region 10 empty empty 9

Concurrent Marking tells us which objects need to be copied. Reference Stack Frame Method Foo Heap Value Reference 42 Object Array Reference Stack Frame Method Bar Value 6.847 Reference Object Object Reference Reference Object Object 10

We evacuate the live objects while the Java threads are running. FOO FOO' BAR' BAR 11

Reclaiming free space Eagerly Run another pass over the data to update references. Lazily Wait for the next concurrent mark to update references. Once the references have been updated we can free the now empty regions. 12

Shenandoah Java Init Mark Java Final Mark Java Concurrent Mark Concurrent Evacuation We use as many threads as are available to do concurrent phases. 13

From-Region To-Region FOO A B A' 14

Added benefit to forwarding pointers Object We can be lazy about updating references to objects in from regions. We no longer need remembered sets. 15

Concurrent Marking SATB Snapshot At The Beginning Anything live at Initial Marking is considered live. Anything allocated since Initial Marking is considered live. Used to update references, keep track of amount of live data for each region, and tell us which objects are live and need to be evacuated. 16

What's tricky about SATB? Start of Concurrent Marking X Y Sometime during marking Requires a write barrier to ensure overwritten values get marked. z y X w 17

How to move an object while the program is running. Read the forwarding pointer to the from-region. Allocate a temporary copy of the object in a toregion. Speculatively copy the data. CAS the forwarding pointer to point to the new copy. If the CAS fails, another thread already copied the object and you can roll back your speculative copy. 18

Forwarding Pointers - reads From-Region To-Region A B Reading an object in a From-region doesn't trigger an evacuation. 19 Note: If reads were to cause copying we might have a read storm where every operation required copying an object. Our intention is that since we are only copying on writes we will have less bursty behavior.

Forwarding Pointers - writes From-Region To-Region A A' B Writing an object in a From-Region will trigger an evacuation of that object to a To-Region and the write will occur in there. Invariant: Writes never occur in from-regions. 20

Forwarding Pointers writes of references From-Region To-Region A A' B B We resolve all references before we write them. Invariant: Never write a reference to a fromregion object into a to-region object. 21

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region To-Region A Foo Bar 22

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region A To-Region A' Foo Bar 23

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region A To-Region A' Foo Foo' Bar 24

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region A To-Region A' Foo Foo' Bar 25

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region A To-Region A' Foo Foo' Bar Bar' 26 Now we can CAS.

Forwarding Pointers - CAS CompareAndSwap(A.X, Foo, Bar) From-Region A To-Region A' Foo Foo' Bar Bar' 27

Does that mean a write could keep a Java thread from making progress for an unbounded amount of time? No Copies are bounded by region size. Objects that are larger than a region are treated specially. 28

Read Barriers? Unconditional Read Barrier movq R10, [R10 + #-8 (8-bit)] # ptr movq RBX, [R10 + #16 (8-bit)] # ptr! Field: java/lang/classloader.parent 29

Write Barriers? We need the SATB write barrier which adds previous reference values onto a queue to be scanned. We also need write barriers on all writes (even base types) to ensure we copy objects in targeted regions before we write to them. 30

Costs and Benefits Costs Space An extra word / object can be expensive if you have a lot of small objects. Time Benefits Ultra-low pause times which can be important to interactive and SLA applications. Reads and Writes require barriers 31

Current status We have something working. We can pass small tests specjvm, specjbb We have passed the smoke test for Eclipse, Thermostat We are working on performance tuning Radargun, Elastic Search/Lucene 32

Performance Tuning One very important area for application specific performance tuning will be Shenandoah heuristics. Lazy Heuristics GC as little as possible Aggressive Heuristics... GC as frequently as possible to maintain minimum heap size. 33

Encouraging preliminary results SpecJVM2008 compiler Not our target application. We are just starting performance tuning. Shenandoah Initial Mark (avg=5.65ms,max=9.53ms, total=1.40s) Final Mark (avg=8.74ms,max=15.43ms,total=2.17s) As compared to G1 (avg=31.38ms, max=75.48ms, total=6.84s) 34

References Trading Data Space for Reduced Time and Code Space in Real-Time Garbage Collection on Stock Hardware - Brooks Garbage-first garbage collection - Detlefs, Flood, Heller, Printezis. 35

Future Work Finish big application testing. Move the barriers to right before code generation. Heuristics tuning. Round Robbin Thread Stopping? NUMA Aware? 36

More information Download the code and try it. http://icedtea.classpath.org/wiki/shenandoah Blogs http://christineflood.wordpress.com/ http://rkennke.wordpress.com/ Email chf@redhat.com rkennke@redhat.com 37