Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Similar documents
Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Principal Software Engineer Red Hat

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Garbage Collection. Hwansoo Han

Garbage Collection Algorithms. Ganesh Bikshandi

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Lecture 15 Garbage Collection

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Garbage Collection (2) Advanced Operating Systems Lecture 9

Concurrent Garbage Collection

CSE P 501 Compilers. Memory Management and Garbage Collec<on Hal Perkins Winter UW CSE P 501 Winter 2016 W-1

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

New Java performance developments: compilation and garbage collection

JVM Troubleshooting MOOC: Troubleshooting Memory Issues in Java Applications

Managed runtimes & garbage collection

The Garbage-First Garbage Collector

Programming Language Implementation

CMSC 330: Organization of Programming Languages

Implementation Garbage Collection

CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers

Java Without the Jitter

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

A new Mono GC. Paolo Molaro October 25, 2006

Hard Real-Time Garbage Collection in Java Virtual Machines

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

CMSC 330: Organization of Programming Languages

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Exploiting the Behavior of Generational Garbage Collector

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

JVM Memory Model and GC

Garbage Collection (aka Automatic Memory Management) Douglas Q. Hawkins. Why?

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

CS 345. Garbage Collection. Vitaly Shmatikov. slide 1

Robust Memory Management Schemes

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

Java Performance Tuning

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

Run-Time Environments/Garbage Collection

Lecture 13: Garbage Collection

Automatic Memory Management

Automatic Garbage Collection

Garbage Collection. Vyacheslav Egorov

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Myths and Realities: The Performance Impact of Garbage Collection

Heap Compression for Memory-Constrained Java

Run-time Environments

Run-time Environments

High Performance Managed Languages. Martin Thompson

One-Slide Summary. Lecture Outine. Automatic Memory Management #1. Why Automatic Memory Management? Garbage Collection.

Mark-Sweep and Mark-Compact GC

Shenandoah GC...and how it looks like in September Aleksey

Garbage Collection. Steven R. Bagley

Garbage Collection. Weiyuan Li

CA341 - Comparative Programming Languages

Simple Garbage Collection and Fast Allocation Andrew W. Appel

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

Attila Szegedi, Software

Reference Counting. Reference counting: a way to know whether a record has other users

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014

THE TROUBLE WITH MEMORY

G1 Garbage Collector Details and Tuning. Simone Bordet

Garbage Collection (1)

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

Azul Systems, Inc.

CS577 Modern Language Processors. Spring 2018 Lecture Garbage Collection

Lecture Conservative Garbage Collection. 3.2 Precise Garbage Collectors. 3.3 Other Garbage Collection Techniques

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008

Reference Object Processing in On-The-Fly Garbage Collection

Reference Counting. Reference counting: a way to know whether a record has other users

Tick: Concurrent GC in Apache Harmony

Lecture 15 Advanced Garbage Collection

Last week, David Terei lectured about the compilation pipeline which is responsible for producing the executable binaries of the Haskell code you

The Z Garbage Collector Low Latency GC for OpenJDK

Garbage Collection Techniques

ACM Trivia Bowl. Thursday April 3 rd (two days from now) 7pm OLS 001 Snacks and drinks provided All are welcome! I will be there.

TECHNOLOGY WHITE PAPER. Azul Pauseless Garbage Collection. Providing continuous, pauseless operation for Java applications

Google File System. Arun Sundaram Operating Systems

Proceedings of the Java Virtual Machine Research and Technology Symposium (JVM '01)

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

big picture parallel db (one data center) mix of OLTP and batch analysis lots of data, high r/w rates, 1000s of cheap boxes thus many failures

Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide. Release 10

Opera&ng Systems CMPSCI 377 Garbage Collec&on. Emery Berger and Mark Corner University of Massachuse9s Amherst

The Z Garbage Collector An Introduction

Understanding Garbage Collection

SQL Server 2014: In-Memory OLTP for Database Administrators

High Performance Managed Languages. Martin Thompson

Heap Management. Heap Allocation

Task-Aware Garbage Collection in a Multi-Tasking Virtual Machine

JVM Performance Tuning with respect to Garbage Collection(GC) policies for WebSphere Application Server V6.1 - Part 1

Java Garbage Collector Performance Measurements

Q1: /8 Q2: /30 Q3: /30 Q4: /32. Total: /100

Garbage-First Garbage Collection

Fundamentals of GC Tuning. Charlie Hunt JVM & Performance Junkie

Lecture 7: Binding Time and Storage

Compiler Construction D7011E

Replicating Real-Time Garbage Collector

Transcription:

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK Christine H. Flood Roman Kennke 1

What does ultra-low pause time mean? It means that the pause time is proportional to the size of the root set, not the size of the heap. Our goal is to have < 10ms GC pause times for 100gb+ heaps. 2

Why not pause-less? Our long term goal is to have an entirely pause-less collector. Shenandoah is a giant step in the right direction. 3

Shenandoah Features Pauses only long enough to scan root set. Concurrent and parallel marking. Concurrent and parallel evacuation. No card tables or remembered sets. 4

Sequential GC Parallel GC Concurrent Mark and Sweep No compaction Shenandoah Compaction 5

Why is Compaction Important? Allocation by pointer bumping is much faster than scanning a free list. Mark and Sweep eventually leads to fragmentation which results in an inefficient VM. 6

How do we get there? We need to have a compacting GC do the evacuation work as well as marking work while the Java threads are running. All the existing OpenJDK GC algorithms stop the Java threads during compaction. 7

Why is it hard? What if the GC moves an object that the Java threads are modifying? We can't have two active copies of an object. There may be multiple heap locations that refer to a single object. When that object moves they all have to start using the new copy of the object. 8

How do we do that? There are options: We could have memory protection traps on objects which are scheduled to be moved. We could add a level of indirection. Accesses to objects go through a forwarding pointer which allows us to update all references to an object with a single atomic instruction. 9

We chose forwarding pointers. No more remembered sets. Can be concurrency bottleneck. Can grow large. In fact some claim that the benefit of generational collectors is smaller remembered sets, not better modeling of object lifetimes. No assumptions about object lifetimes. Not all applications obey the generational hypothesis that most objects die young. No dependence on user or kernel level traps. Software only solution. No read storms to update references. 10

Digression on remembered Sets Remembered sets allow you to collect part of your heap without collecting all of your heap. Generational garbage collectors usually use card tables. G1 uses into remembered sets. 11

From Generational GC Old Card Table Card table keeps track of old to young references so you can GC the young generation independently. To The old generation is broken into cards, and each card is represented by a single bit in the card table. If that bit is set then the whole card must be scanned at young generation gc time. 12

Why is this a problem? Large multi-threaded applications which are carefully crafted to scale with padded data structures sometimes end up thrashing over the cache lines making up the card table. 13

Remembered Sets Pointers into R1 Pointers into R2 Pointers into R3 Pointers into R4 Pointers into R5 Original G1 Regions Live Data Region 1 20k Region 2 100k Region 3 500k Region 4 10k 70k Region 5 G1 can independently collect whichever regions have the least live data. Unfortunately into remembered sets can grow large. This was made better by generational G1. They no longer needed to keep track of into pointers from young regions since they were guaranteed to be a part of the next collection. Unfortunately that forced G1 into a generational paradigm. 14

Shenandoah Regions Region 1 Region 2 Region 3 Live Data 20k 100k 500k Forwarding pointers enable Shenandoah to collect each region independently without remembered sets. We truly are garbage first Region 4 10k Region 5 70k 15

Forwarding pointers based on Brooks Pointers Rodney A. Brooks Trading Data Space for Reduced Time and Code Space in Real-Time Garbage Collection on Stock Hardware 1984 Symposium on Lisp and Functional Programing 16

Forwarding Pointer in an Indirection Object Foo Indirection Object Foo Object Format inside the JVM remains the same. Third party tools can still walk the heap. Can choose GC algorithm at run time. We hope to one day be able to take advantage of unused space in double word aligned objects when possible. 17

Adapted Brooks Pointers for Regions targeted for evacuation regions. Foo Free regions Bar Foo' Bar' GC work is not tied to allocation work, instead dedicated GC threads evacuate regions. 18

Forwarding Pointers From-Region To-Region A B A' Any reads or writes of A will now be redirected to A' 19

Forwarding Pointers From-Region A To-Region B Reading an object in a From-region doesn't trigger an evacuation. Note: If reads were to cause copying we might have a read storm where every operation required copying an object. Our intention is that since we are only copying on writes we will have less bursty behavior. 20

Forwarding Pointers From-Region To-Region A B A' Writing an object in a From-Region will trigger an evacuation of that object to a To-Region and the write will occur in there. This is necessary to keep one consistent copy of the object. 21

Shenandoah Algorithm Heap divided into regions. Concurrent marking keeps track of live data in each region. GC threads pick the regions with the most garbage to join the collection set. GC threads evacuate live objects in those regions. Subsequent concurrent marking updates all references to evacuated regions. Evacuated regions reclaimed. 22

References are updated lazily Region 2 Region 1 Region 3 A C B Region 2 is chosen to be part of the collection set. 23

After Evacuation Region 2 Region 1 Region 3 A C B Region 4 B' After evacuation is completed you still have references to B which need to be resolved when read. 24

After the next concurrent marking Region 2 Region 1 Region 3 A C B Region4 B' After the next concurrent marking all references to B will have been updated and Region 2 may be reclaimed. 25

How does the GC interact with the Java Threads Reading is straightforward you simply indirect through the forwarding pointer. If the GC moved the object you see the new copy. Writing requires the Java thread to move the object. We need to ensure that writes only occur in to regions. 26

Problem with writes in from-regions. GC thread copies Foo Java Thread updates Foo GC thread updates forwarding pointer to now obsolete copy of Foo. All writes must occur in to-regions. 27

What if a Java Thread was writing an object when the GC moved it? Forwarding PTR GC thread makes a copy Forwarding PTR The GC thread and the Java thread both attempt to CAS the Forwarding pointer to point to their copy of the object. Only one can succeed. The other thread must rollback their allocation. Java thread makes a copy Forwarding PTR 28

What if a Java Thread was reading that object when the GC moved it? The Java thread either sees the old A or the new A' depending on whether the read barrier is executed before or after the move. 29

What could go wrong? Race windows get larger Before Thread 1 read(a) Thread 2 write(a). After Thread 1 Resolve(a) Read a Thread 2 ResolveAndMaybeCopy(a) Write(a) 30

Read Barriers Read (x) X has moved Read X's forwarding pointer to get the current address for X. X X' Y has not moved Y 31

Write Barriers Is X in the collection set? Copy X to a new location. CAS new address of X in X's Forwarding Pointer. The CAS is to protect against other threads (Java or GC) attempting to move the object. If the CAS fails, rollback the copy, and use the new value of the Forwarding Pointer as the address. 32

OK, Write barriers a little more complicated... We use Snapshot at the Beginning, so write barriers also need to keep track of previous values on writes to make sure everything that was live at the beginning of GC is still live. 33

Start of Concurrent Marking Issue with SATB X Y Sometime during marking When a write occurs we keep track of the previous value to ensure that it gets marked. X w z y 34

Compare and Swap Object is complicated too. From Space Foo BAR Compare and swap object is both a read and a write so it presents a special problem. If we want to CAS BAR to BAZ we first need to ensure that both FOO and BAR are in a to-region. If BAR is in a from-region than the CAS could fail because the GC updated BAR which isn't what we want. 35

Why is Shenandoah worth it? Concurrent Evacuation Greatly reduced pause times No more remembered sets Card table marking can be a concurrency bottleneck. Into remembered sets as in G1 can grow large and cumbersome Truly adaptive If your application doesn't behave generationally you aren't saddled with a generational collector. 36

What we aren't telling you. There's a time overhead. Read and write barriers aren't free. There's a space overhead, especially when we are allocating an entire object (4 words) for our forwarding pointers. We are only just now at a point where we can start measuring those overheads. 37

What's left to do? Compiler support (C1 is almost done) Performance heuristics (when do we start a concurrent mark? How to select collection sets?) Round robin thread stopping instead of stop the world root scanning. Moving forwarding pointers into unused slots in the previous object if possible. Compressed oops? 38

More information http://icedtea.classpath.org/shenandoah/ Until it becomes proper OpenJDK project http://openjdk.java.net/jeps/189 Blogs http://rkennke.wordpress.com/ http://christineflood.wordpress.com/ Email chf@redhat.com rkennke@redhat.com 39