Concurrent Garbage Collection

Deepak Sreedhar, JVM engineer, Azul Systems
Java User Group Bangalore
@azulsystems | azulsystems.com

About me
- Deepak Sreedhar, JVM student at Azul Systems
- Currently working on enhancing the C4 garbage collector implementation in the Azul Zing JVM
- Prior experience with dynamic binary translation and server migration tools

Introduction

Quiz
- Does the Java specification mandate automatic GC?
- Is GC efficient?
- Can GC collect all dead objects?
- Can GC impact application throughput?
- Can GC impact application latency?
- Does a larger heap imply poorer performance?
- Does increasing -Xmx (more free space) improve GC efficiency?

Terminology
- The Java heap memory
- Objects and references
- Live, reachable and dead objects
- Fragmentation and headroom wastage
- Virtual and physical memory
- Mutators
- Allocation and mutation rates

GC Safepoint
- A point in a thread's execution at which the GC can identify all references correctly and the thread performs no mutation
- Global safepoint (STW): all threads are at a safepoint
- Safepointing is not the same as halting: a thread running native code (JNI) is at a safepoint
- Time to safepoint is as crucial for low latency as the GC operation time itself. Try -XX:+PrintGCApplicationStoppedTime (on JDK 9 and later, -Xlog:safepoint)
- Safepoints may also be needed for non-GC reasons, such as deoptimization and JVMTI heap iteration
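To make "time to safepoint" and "native code is already at a safepoint" concrete, here is a minimal Python simulation (not JVM code) of a global safepoint: mutator threads poll a request flag at a loop back-edge, a thread marked `in_native` is counted as already safe, and the time from the request to the last arrival is the time-to-safepoint. `VM`, `Mutator` and `demo` are hypothetical names; a real JVM emits a far cheaper poll than an event check.

```python
import threading
import time


class VM:
    """Coordinates a global (stop-the-world) safepoint between a GC and mutator threads."""

    def __init__(self):
        self.requested = threading.Event()   # GC wants all Java-mode threads to stop
        self.released = threading.Event()    # GC work done, threads may resume
        self.shutdown = threading.Event()
        self.cond = threading.Condition()
        self.arrived = 0

    def arrive(self):
        with self.cond:
            self.arrived += 1
            self.cond.notify_all()

    def global_safepoint(self, mutators, gc_op):
        """Request a safepoint, wait for every Java-mode thread, run gc_op, then release.
        Returns the measured time-to-safepoint in seconds."""
        java_threads = [m for m in mutators if not m.in_native]  # native threads are already safe
        start = time.perf_counter()
        self.released.clear()
        self.requested.set()
        with self.cond:
            if not self.cond.wait_for(lambda: self.arrived == len(java_threads), timeout=5):
                raise TimeoutError("threads did not reach the safepoint")
        time_to_safepoint = time.perf_counter() - start
        gc_op()                              # no Java-mode thread mutates during this call
        with self.cond:
            self.arrived = 0
        self.requested.clear()
        self.released.set()
        return time_to_safepoint


class Mutator(threading.Thread):
    def __init__(self, vm, in_native=False):
        super().__init__(daemon=True)
        self.vm = vm
        self.in_native = in_native           # simulates a thread running JNI code
        self.work = 0

    def run(self):
        while not self.vm.shutdown.is_set():
            if self.in_native:
                time.sleep(0.001)            # native code does not touch the Java heap
                continue
            self.work += 1                   # stands in for heap mutation
            if self.vm.requested.is_set():   # safepoint poll at a loop back-edge
                self.vm.arrive()
                self.vm.released.wait()


def demo():
    vm = VM()
    threads = [Mutator(vm) for _ in range(3)] + [Mutator(vm, in_native=True)]
    for t in threads:
        t.start()
    time.sleep(0.01)
    seen = {}

    def gc_op():
        before = sum(t.work for t in threads)
        time.sleep(0.01)
        seen["mutations_during_pause"] = sum(t.work for t in threads) - before

    tts = vm.global_safepoint(threads, gc_op)
    vm.shutdown.set()
    return tts, seen["mutations_during_pause"]
```

The native thread never stops, yet the pause is safe because it cannot touch heap references; the GC only waits on the threads still executing Java code.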

GC classification
- Precise vs. conservative
- Incremental vs. monolithic
- Parallel vs. serial
- Concurrent vs. stop-the-world
- Multi-generational collectors
  - Weak generational hypothesis
  - Young (new) and old (tenured) generations
  - Promotion (tenuring)
  - Shorter pauses, usually, in the new gen (smaller set of live objects)
  - Remembered sets and card tables for cross-generational references
  - Can delay, but not avoid, old gen collections
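A card table lets a minor GC find old-to-young references without scanning the whole old generation. The Python sketch below assumes a tiny hypothetical heap with fixed young and old address ranges and 512-byte cards (the card size HotSpot uses); every reference store dirties the card containing the field, and a minor GC scans only dirty old-gen cards. The names `CardTableHeap`, `write_ref` and `old_to_young_roots` are illustrative, not a JVM API.

```python
CARD_SIZE = 512                  # bytes covered by one card
HEAP_SIZE = 16 * 1024            # hypothetical tiny heap
YOUNG = range(0, 4 * 1024)       # hypothetical young-gen address range
OLD = range(4 * 1024, HEAP_SIZE) # hypothetical old-gen address range


class CardTableHeap:
    def __init__(self):
        self.fields = {}                                # field address -> target address
        self.cards = bytearray(HEAP_SIZE // CARD_SIZE)  # this sketch: 0 = clean, 1 = dirty

    def write_ref(self, field_addr, target):
        """Mutator reference store followed by a post-write card-marking barrier."""
        self.fields[field_addr] = target
        self.cards[field_addr // CARD_SIZE] = 1         # unconditional mark: cheap, no branch

    def old_to_young_roots(self):
        """At a minor GC, scan only dirty old-gen cards for pointers into the young gen."""
        roots = []
        for card, dirty in enumerate(self.cards):
            start = card * CARD_SIZE
            if not dirty or start not in OLD:
                continue
            for addr in range(start, start + CARD_SIZE):
                target = self.fields.get(addr)
                if target is not None and target in YOUNG:
                    roots.append(addr)
        return roots
```

For example, storing a young address into old field 5000 makes that field an extra root for the next minor GC, while an old-to-old store dirties a card but contributes no root.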

Copying collector
- Copies objects and fixes up references as the objects are discovered
- From and To spaces
- Used for the young (new) gen in many collectors
- Usually implemented as monolithic and stop-the-world
- Complexity of the order of live objects
- Theoretically, requires double the memory
  - In practice, many objects may be dead
  - Eden and survivor spaces
  - Early promotion to old gen when more memory is needed
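A classic way to "copy and fix up as objects are discovered" is Cheney's breadth-first algorithm, sketched here in Python under the assumption that the from-space is a list of objects whose `refs` are indices into that list. Each live object is copied once and dead objects are never touched, which is why the cost tracks the live set rather than the heap size.

```python
def cheney_collect(from_space, roots):
    """Copy everything reachable from roots into a new to-space (Cheney's algorithm)."""
    to_space = []
    forwarding = {}                      # from-space index -> to-space index

    def evacuate(idx):
        if idx not in forwarding:        # first discovery: copy and leave a forwarding entry
            obj = from_space[idx]
            forwarding[idx] = len(to_space)
            to_space.append({"name": obj["name"], "refs": list(obj["refs"])})
        return forwarding[idx]

    new_roots = [evacuate(r) for r in roots]
    scan = 0                             # objects before `scan` already have fixed-up refs
    while scan < len(to_space):
        obj = to_space[scan]
        obj["refs"] = [evacuate(r) for r in obj["refs"]]
        scan += 1
    return to_space, new_roots
```

With A and B referencing each other from a root, and C and D unreachable (even though D points at B), only A and B end up in the to-space.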

Mark Compact
- Separate mark and compact phases
  - Mark (trace): identify live objects
  - Compact: move objects to reduce fragmentation
- Compact to To space
- Complexity of the order of live objects
- Can be implemented incrementally
- Full compaction can be delayed
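The separation of the two phases can be sketched in Python as follows, assuming a heap modeled as a dict from address to `{"size", "refs"}`. Marking traces from the roots; compaction then lays the marked objects out contiguously in a fresh to-space in address order and rewrites references through a forwarding table. Apart from the sort, the work is proportional to the live objects.

```python
def mark(heap, roots):
    """Mark (trace) phase: find every object reachable from the roots."""
    marked = set()
    stack = list(roots)                  # explicit stack avoids deep recursion
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        stack.extend(heap[addr]["refs"])
    return marked


def compact(heap, roots, marked):
    """Compact phase: place live objects contiguously, in address order, in a to-space."""
    forwarding, free = {}, 0
    live = sorted(marked)
    for addr in live:                    # pass 1: compute each object's new address
        forwarding[addr] = free
        free += heap[addr]["size"]
    to_space = {}
    for addr in live:                    # pass 2: copy and fix up references
        obj = heap[addr]
        to_space[forwarding[addr]] = {"size": obj["size"],
                                      "refs": [forwarding[r] for r in obj["refs"]]}
    return to_space, [forwarding[r] for r in roots]
```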

Mark Sweep Compact
- Mark: identify live objects
- Sweep: iterate over the heap and find free space
- Compact: move objects to reduce fragmentation
- Used for old gen in many collectors
- Complexity of the order of heap size
- In place; does not need more memory
- Can be implemented incrementally
- Can delay compaction to reduce pauses, but cannot eliminate it
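The sweep step, and why compaction can only be postponed, shows up in a small Python sketch. It assumes the heap is a list of `(address, size)` cells and that marking has already produced the set of live addresses. The sweep must visit every cell, live or dead, so its cost grows with heap size; the resulting free list can hold plenty of total space and still fail a request that no single chunk can satisfy.

```python
def sweep(cells, marked):
    """Walk every cell in address order and build a coalesced free list.
    cells: list of (address, size); marked: addresses found live by marking."""
    free_list = []
    for addr, size in cells:                        # touches every cell: O(heap size)
        if addr in marked:
            continue
        if free_list and free_list[-1][0] + free_list[-1][1] == addr:
            start, length = free_list[-1]
            free_list[-1] = (start, length + size)  # coalesce with the adjacent free chunk
        else:
            free_list.append((addr, size))
    return free_list


def can_allocate(free_list, size):
    """Without compaction, a request succeeds only if one chunk is big enough."""
    return any(length >= size for _, length in free_list)
```

Here 32 bytes are free in total, but split across two 16-byte holes, so a 32-byte allocation fails until the live objects are compacted.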

Object allocation
- Memory availability on servers is increasing into the terabyte range
- Efficient allocation using Thread Local Allocation Buffers (TLABs) and a simple "advance the top" (bump-the-pointer) algorithm
- Not many Java applications are able to fully utilize this memory, because of:
  - GC pauses (including in the new gen)
  - Difficulty in arriving at the right tuning
- Object pools and off-heap memory are used to get around this problem, but they are not perfect solutions, since a memory management layer then needs to be coded by the application
- Can we have a continuously concurrent garbage collector?
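The "advance the top" scheme is cheap because only handing out a whole TLAB needs synchronization; allocation inside a TLAB is a bounds check and an add. The Python classes below are a hypothetical model of that split, not HotSpot's implementation, and they leave out what happens on exhaustion (retiring the TLAB, triggering a young GC).

```python
import threading


class TLAB:
    """A thread-local allocation buffer [top, end) owned by a single thread."""

    def __init__(self, start, end):
        self.top = start
        self.end = end

    def allocate(self, size):
        """Bump-the-pointer allocation: no lock, just a bounds check and an add."""
        if self.top + size > self.end:
            return None                  # full: retire this TLAB and request a new one
        addr = self.top
        self.top += size
        return addr


class Eden:
    """Shared young-gen space; handing out a whole TLAB is the only synchronized step."""

    def __init__(self, size):
        self.top = 0
        self.end = size
        self.lock = threading.Lock()

    def new_tlab(self, size):
        with self.lock:
            if self.top + size > self.end:
                return None              # eden exhausted: a real JVM would start a young GC
            start = self.top
            self.top += size
            return TLAB(start, start + size)
```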

Challenges and approaches

Concurrent Marking
- Marking starts from the roots and traverses the object graph through discovered references
- Mutators can modify the object graph while the GC is marking:
  - Move a reference into an already visited portion of the graph
  - Remove references to an object from the heap and keep a single reference in a register, hiding it from the GC marker
- Approaches
  - Incremental update: revisit the root set and modified portions of the graph iteratively, ending with a re-mark pause
  - SATB (snapshot at the beginning): intercept writes and store the old contents into buffers
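The "move a reference into an already visited portion" race can be reproduced with a small incremental marker in Python. The mutator copies B's reference to C into A after the marker has finished A, then clears B's field, so without a barrier C is never marked even though it is still reachable. The SATB pre-write barrier logs the overwritten value (C), and draining that buffer keeps C alive. `SATBMarker` and `hide_object` are hypothetical names for this sketch.

```python
class SATBMarker:
    """Incremental tri-colour marker with an optional SATB pre-write barrier."""

    def __init__(self, heap, roots, use_barrier):
        self.heap = heap              # object name -> {field name: referenced name or None}
        self.marked = set(roots)      # grey or black objects
        self.grey = list(roots)       # marked but not yet scanned
        self.satb_buffer = []         # old values logged by the write barrier
        self.use_barrier = use_barrier

    def step(self):
        """Scan one grey object, turning it black."""
        obj = self.grey.pop()
        for child in self.heap[obj].values():
            if child is not None and child not in self.marked:
                self.marked.add(child)
                self.grey.append(child)

    def write(self, obj, field, value):
        """Mutator store; the SATB barrier logs the value about to be overwritten."""
        if self.use_barrier:
            old = self.heap[obj].get(field)
            if old is not None:
                self.satb_buffer.append(old)
        self.heap[obj][field] = value

    def finish(self):
        """Drain the SATB buffer and the remaining grey objects."""
        while self.grey or self.satb_buffer:
            while self.satb_buffer:
                ref = self.satb_buffer.pop()
                if ref not in self.marked:
                    self.marked.add(ref)
                    self.grey.append(ref)
            if self.grey:
                self.step()
        return self.marked


def hide_object(use_barrier):
    heap = {"A": {}, "B": {"g": "C"}, "C": {}}
    marker = SATBMarker(heap, roots=["B", "A"], use_barrier=use_barrier)
    marker.step()                     # marker scans A: A is now black
    c = heap["B"]["g"]                # mutator loads the reference to C ...
    marker.write("A", "f", c)         # ... stores it into the already-scanned A ...
    marker.write("B", "g", None)      # ... and erases the path the marker would have followed
    return marker.finish()
```

An incremental-update barrier would instead record the *new* value stored into A and rescan it before marking completes; both close the same hole from opposite ends.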

Concurrent Compaction
- Mutators can modify an object while it is being copied
- Mutators can read an object through stale pointers after it has been copied
- Incremental compaction: the G1GC approach
  - Divide the heap into regions; maintain inter-region references using remembered sets (RSets)
  - Minor collections use a copying collector
  - Some minor collections also do incremental compaction of the old gen
  - After concurrent marking, estimate the efficiency of collecting each region; regions with no or smaller RSets are easier to collect, so they are prioritized for upcoming minor collections
  - Source regions are updated while copying; RSet updates for the new regions follow the copying
  - Mark sweep compact for STW major collections
- Read barriers
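Region prioritization can be illustrated with a greedy selection under a pause budget. The cost model here is invented for illustration and is not G1's actual heuristic: evacuating a region costs one unit per live byte plus a fixed hypothetical cost per RSet entry, and regions are ranked by reclaimable bytes per unit of cost. Two regions with the same live data then differ only by their RSet size, and the one with the smaller RSet wins.

```python
from dataclasses import dataclass

REGION_SIZE = 1024 * 1024      # hypothetical 1 MB regions
COPY_COST_PER_BYTE = 1.0       # hypothetical cost of evacuating one live byte
RSET_COST_PER_ENTRY = 50.0     # hypothetical cost of processing one RSet entry


@dataclass
class Region:
    rid: int
    live_bytes: int     # estimated by concurrent marking
    rset_entries: int   # incoming references recorded in the region's remembered set


def cost(r):
    """Estimated pause cost of evacuating region r."""
    return r.live_bytes * COPY_COST_PER_BYTE + r.rset_entries * RSET_COST_PER_ENTRY


def efficiency(r):
    """Reclaimable bytes per unit of pause cost."""
    return (REGION_SIZE - r.live_bytes) / max(cost(r), 1.0)


def choose_collection_set(regions, pause_budget):
    """Greedily pick the most efficient regions that still fit in the pause budget."""
    chosen, spent = [], 0.0
    for r in sorted(regions, key=efficiency, reverse=True):
        if spent + cost(r) <= pause_budget:
            chosen.append(r.rid)
            spent += cost(r)
    return chosen
```

With an empty region, two regions holding 100 KB of live data (one with 10 RSet entries, one with 5,000) and a mostly live region, a budget of 200,000 units selects only the empty region and the small-RSet region.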

GC Barriers
- Instructions executed by mutators that aid garbage collection
  - Help maintain metadata
  - Impose invariants
- Write barriers
  - Record cross-generation or cross-region references
  - SATB barrier, to ensure the snapshot is fully marked
  - Incremental update barriers that store new references
- Read barriers
  - Baker-style barrier
  - Brooks-style forwarding pointer
  - C4 Loaded Value Barrier (LVB)
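Of the read barriers listed, the Brooks-style forwarding pointer is the simplest to sketch: every object carries a pointer that initially refers to itself, every access goes through it, and evacuation redirects the old copy's pointer to the new copy. Reads and writes through a stale reference therefore reach the current copy. This Python model is illustrative only; a real collector installs the forwarding pointer with an atomic compare-and-swap to race safely with mutator writes.

```python
class Obj:
    def __init__(self, **fields):
        self.forward = self           # Brooks pointer: refers to itself until the object is copied
        self.fields = dict(fields)


def resolve(ref):
    """Brooks-style read barrier: one unconditional hop through the forwarding pointer."""
    return ref.forward


def load_field(ref, name):
    return resolve(ref).fields[name]


def store_field(ref, name, value):
    resolve(ref).fields[name] = value  # stores must also land in the current copy


def evacuate(obj):
    """Collector copies the object and redirects the old copy's forwarding pointer."""
    copy = Obj(**obj.fields)
    obj.forward = copy
    return copy
```

The cost of this design is an extra word per object and an indirection on every access, even when nothing has moved.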

The Continuously Concurrent Compacting Collector (C4)

Loaded Value Barrier (LVB)
- A read barrier that ensures, at the time of the load, that the following invariants are met before the reference becomes visible to the application:
  - If the GC cycle is in the marking phase, the reference will be marked through
  - If the GC cycle is in the relocation phase, or has completed relocation but not fixup, the reference will be updated to point to the relocated object
- Simultaneously guarantees that:
  - No reference misses GC attention during marking
  - There is no stale access to a compacted page
  - The result of the load is always a valid reference to a valid object
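The two LVB invariants can be written as two checks on a freshly loaded reference. In this Python sketch a reference is an address plus an NMT ("not marked through") bit, pages are 2 MB, and a forwarding table maps old to new addresses; `Ref`, `GCState` and `lvb` are hypothetical names, and the real barrier is a few machine instructions with a trap on the slow path.

```python
from dataclasses import dataclass

PAGE_SIZE = 2 * 1024 * 1024   # 2 MB pages


@dataclass(frozen=True)
class Ref:
    addr: int
    nmt: int                  # "not marked through" metadata bit carried in the reference


class GCState:
    def __init__(self):
        self.marking = False
        self.expected_nmt = 0
        self.mark_queue = []          # references handed to the marker by the barrier
        self.relocated_pages = set()  # pages being (or already) compacted, not yet fixed up
        self.forwarding = {}          # old address -> new address


def lvb(gc, ref):
    """Return a reference satisfying both LVB invariants before the application sees it."""
    if ref is None:
        return None
    # Invariant 1: during marking, every loaded reference is marked through.
    if gc.marking and ref.nmt != gc.expected_nmt:
        gc.mark_queue.append(ref.addr)
        ref = Ref(ref.addr, gc.expected_nmt)
    # Invariant 2: never hand out a pointer into a compacted page.
    if ref.addr // PAGE_SIZE in gc.relocated_pages:
        # This sketch assumes the object was already copied; in C4 the mutator
        # copies it itself in the trap if the GC has not done so yet.
        ref = Ref(gc.forwarding[ref.addr], ref.nmt)
    return ref
```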

Self Healing
- The contents of the source location are overwritten with the result of the LVB
- Loading from the same source cannot trigger the barrier again
- A critical property that ensures a finite and predictable amount of work
- There may be trap storms at phase shifts, but they settle down as healing progresses and completes
- Unique to the C4 barrier (LVB)
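Why healing bounds the work: a barrier that only fixes the value it returns pays the slow path on every load from a stale location, while a healing barrier writes the fixed value back so that location traps at most once. The Python sketch below counts traps for both variants; `Slot` and `HealingBarrier` are hypothetical names, and C4 would write the healed value back atomically (for example with a compare-and-swap) so a concurrent store is not lost.

```python
class Slot:
    """A memory location (heap field, stack slot, ...) holding a reference."""

    def __init__(self, ref):
        self.ref = ref


class HealingBarrier:
    def __init__(self, forwarding, heal=True):
        self.forwarding = forwarding   # stale address -> relocated address
        self.heal = heal
        self.traps = 0                 # number of slow-path executions

    def load(self, slot):
        ref = slot.ref
        if ref in self.forwarding:     # fast-path check fails: take the trap
            self.traps += 1
            ref = self.forwarding[ref]
            if self.heal:
                slot.ref = ref         # write the fixed value back to the source location
        return ref
```

Five loads from the same stale slot cost one trap with healing and five without it.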

Mark phase
- Like other collectors, starts from the root set and traverses the object graph
- NMT (not marked through) LVB check: does the reference metadata match the expected GC state for the generation?
- Trap handling: fix the NMT state of the reference, heal the source location and add the reference to the collector's work queue
- Checkpoints to clean stacks and transfer reference buffers
- Marking is followed by a concurrent weak reference processing phase
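The NMT check can be modeled with one bit per stored reference and an expected value that flips at the start of each cycle, so every existing reference becomes "not marked through" without the collector touching it. The flip follows the C4 paper's description; the slide itself only says the check compares reference metadata with the expected GC state. This Python sketch uses a single generation, and `NMTMarker` is a hypothetical name.

```python
class NMTMarker:
    """Marking with a per-reference NMT bit, a per-cycle flip, and self-healing traps."""

    def __init__(self, objects, roots):
        # objects: name -> list of slots; each slot is a mutable [target_name, nmt_bit]
        self.objects = objects
        self.roots = roots
        self.expected_nmt = 0
        self.marked = set()
        self.work_queue = []

    def start_cycle(self):
        self.expected_nmt ^= 1          # every existing reference is now "not marked through"
        self.marked = set()
        self.work_queue = list(self.roots)

    def lvb_load(self, obj, i):
        """Mutator load of field i of obj, with the NMT check of the LVB."""
        slot = self.objects[obj][i]
        if slot[1] != self.expected_nmt:    # trap
            slot[1] = self.expected_nmt     # fix the NMT state and heal the source slot
            self.work_queue.append(slot[0]) # hand the reference to the collector
        return slot[0]

    def drain(self):
        """Collector threads trace from the work queue, healing slots as they go."""
        while self.work_queue:
            name = self.work_queue.pop()
            if name in self.marked:
                continue
            self.marked.add(name)
            for slot in self.objects[name]:
                slot[1] = self.expected_nmt
                self.work_queue.append(slot[0])
        return self.marked
```

The second load of the same field does not trap, because the first one healed it; the unreachable object X is never visited and its slot keeps the old bit.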

Relocation phase
- Forwarding information is kept outside of heap pages
- Virtual memory of compacted pages remains reserved until fixup is complete
- Physical memory can be released immediately (Quick Release) and recycled
- Hand-over-hand relocation: each GC thread can complete with just one seed page
- Compacted pages are protected to catch accesses performed without the LVB
- Mutators cooperate in the relocation: if the GC hasn't moved the object yet, the mutator relocates it at the relocate LVB trap
  - The mutator also heals the source memory with the new address of the object
- Large objects are just remapped to new virtual addresses, not physically copied
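The interplay of off-heap forwarding, mutator cooperation and Quick Release can be traced in a small Python model. Memory is a dict from address to payload, objects have a hypothetical fixed size of 16 bytes, and `Relocator` is a hypothetical name. Because the forwarding table lives outside the page, the page's physical memory can be dropped as soon as its live objects are copied, and a stale reference into it is still resolved.

```python
PAGE = 2 * 1024 * 1024   # 2 MB pages
OBJ_SIZE = 16            # hypothetical fixed object size for this sketch


class Relocator:
    def __init__(self, memory, dest_page):
        self.memory = memory           # address -> payload; stands in for physical memory
        self.forwarding = {}           # old address -> new address, kept off the heap pages
        self.protected = set()         # compacted pages: accesses must go through the LVB
        self.next_free = dest_page * PAGE

    def protect(self, page):
        self.protected.add(page)

    def relocate(self, old):
        """Copy one object. Called by GC threads, or by a mutator from its LVB trap."""
        if old in self.forwarding:     # already moved (a real collector races via CAS)
            return self.forwarding[old]
        new = self.next_free
        self.next_free += OBJ_SIZE
        self.memory[new] = self.memory[old]
        self.forwarding[old] = new
        return new

    def relocate_page(self, page, live):
        for addr in live:
            self.relocate(addr)

    def quick_release(self, page):
        """Free the page's physical memory now; its virtual range stays reserved and
        the forwarding entries survive until fixup completes."""
        for addr in [a for a in self.memory if a // PAGE == page]:
            del self.memory[addr]

    def lvb_load(self, slot):
        """slot is a one-element list holding a reference (an address)."""
        ref = slot[0]
        if ref // PAGE in self.protected:
            ref = self.relocate(ref)   # mutator cooperates if the GC hasn't copied it yet
            slot[0] = ref              # and heals the source memory
        return ref
```

In the scenario below, a mutator touches object b before the GC reaches it and copies b itself; the GC then moves a, skips b, and releases the page, after which a stale reference to a is still resolved through the forwarding table.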

Fixup phase
- Traverse the object graph and heal memory locations not already healed by mutators
- At the end of the fixup phase, the virtual memory corresponding to compacted pages can be freed
- Can be combined with the marking phase of the next GC cycle, helping reduce GC cycle duration
- Mutators do fixup as part of the LVB
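Fixup is a graph traversal that rewrites any remaining stale references through the forwarding table; since it visits every reachable object anyway, the same walk can serve as the next cycle's mark. In this Python sketch objects are keyed by their current addresses and fields hold addresses; `fixup_and_mark` is a hypothetical name, and clearing the forwarding table stands in for releasing the compacted pages' virtual memory.

```python
def fixup_and_mark(objects, roots, forwarding):
    """Walk the graph once, healing stale references and recording reachable objects.
    objects: address -> list of referenced addresses; forwarding: old -> new address."""
    marked = set()
    stack = [forwarding.get(r, r) for r in roots]
    healed = 0
    while stack:
        addr = stack.pop()
        if addr in marked:
            continue
        marked.add(addr)
        fields = objects[addr]
        for i, ref in enumerate(fields):
            if ref in forwarding:          # not yet healed by a mutator's LVB
                fields[i] = forwarding[ref]
                healed += 1
            stack.append(fields[i])
    forwarding.clear()                     # no stale references remain anywhere reachable
    return marked, healed
```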

Generational features
- New and old collections can proceed simultaneously and almost independently, unlike in most collectors
- Perm gen is processed by the old collector
- Old and new collectors use the same algorithm
- Synchronization using simple interlocks and limited suspension at phase changes
- Precise card marks for inter-generational references, updated by Store Value Barriers (SVB)
- Can be extended to N generations

Heap management
- Allocation in 2 MB pages
- Quick Release allows physical pages to be recycled to satisfy allocation requests before fixup is complete
- New, old and perm gen pages are interleaved in virtual space
- Tiered allocation: objects are divided into small, mid and large spaces based on size, which helps limit the maximum headroom wastage (currently 12.5%)
- TLABs for small-space allocation, bump-the-pointer
- Relocation uses a different mechanism for each space, to limit the maximum copy a mutator needs to do
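The slides do not state Zing's size boundaries, so the thresholds below are invented to make the arithmetic concrete. If small-space objects were capped at 256 KB and allocated in 2 MB pages, an object that does not fit in the rest of a page starts a new one, so fewer than 256 KB can be stranded per page: a bound of 256 KB / 2 MB = 12.5%. Whether Zing's 12.5% figure is derived this way is an assumption of this Python sketch.

```python
PAGE_SIZE = 2 * 1024 * 1024   # 2 MB allocation pages
SMALL_MAX = 256 * 1024        # hypothetical upper bound for small-space objects
MID_MAX = 16 * 1024 * 1024    # hypothetical upper bound for mid-space objects


def space_for(size):
    """Route an allocation request to a space by object size."""
    if size <= SMALL_MAX:
        return "small"
    if size <= MID_MAX:
        return "mid"
    return "large"


def tail_waste_bound(page_size, max_object):
    """An object that does not fit in the rest of a page starts a new page, so fewer
    than max_object bytes can be stranded at the end of each page."""
    return max_object / page_size
```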

Zing Safepoints
- The C4 algorithm is pauseless, but the current implementation has a few short pauses, mostly at collector phase transitions (for ease and efficiency)
- Pause times are independent of heap size, live object size, object lifetime, allocation rate, mutation rate, and the count of weak/soft/phantom references
- Provides sufficient safepoint opportunities to reduce the time to bring threads to a safepoint
- Pause times remain consistent
- Employs thread checkpoints when there is a specific action to be performed for/by that thread, or when the thread needs to observe a GC state change

More on Zing
- GC scheduled by heuristics; in most cases no tuning is required
- Elastic memory: helps reduce occurrences of OOM
- Linux kernel module to improve performance of virtual memory operations

Keywords for reference search
- Talks by Gil Tene, CTO, Azul Systems
- The Garbage Collection Handbook
- C4: The Continuously Concurrent Compacting Collector
- Garbage-First Garbage Collection
- Azul Zing JVM

Where Zing shines
- Low latency
  - Eliminate behaviour blips down to the sub-millisecond-units level
  - Machine-to-machine stuff
  - Support higher *sustainable* throughput (one that meets SLAs)
  - Messaging, queues, market data feeds, fraud detection, analytics
- Human response times
  - Eliminate user-annoying response time blips. Multi-second and even fraction-of-a-second blips will be completely gone.
  - Support larger memory JVMs *if needed* (e.g. larger virtual user counts, or larger cache, in-memory state, or consolidating multiple instances)
- Large data and in-memory analytics
  - Make batch stuff business real time. Gain super-efficiencies.
  - Cassandra, Spark, Solr, DataGrid, any large dataset in fast motion

Q & A