G1 Garbage Collector Details and Tuning. Simone Bordet

Similar documents
Fundamentals of GC Tuning. Charlie Hunt JVM & Performance Junkie

HBase Practice At Xiaomi.

10/26/2017 Universal Java GC analysis tool - Java Garbage collection log analysis made easy

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

JVM Memory Model and GC

Java Performance Tuning

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Concurrent Garbage Collection

The Garbage-First Garbage Collector

Garbage Collection. Hwansoo Han

The Fundamentals of JVM Tuning

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Do Your GC Logs Speak To You

Attila Szegedi, Software

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Principal Software Engineer Red Hat

Contents. Created by: Raúl Castillo

JVM Troubleshooting MOOC: Troubleshooting Memory Issues in Java Applications

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

Java Memory Management. Märt Bakhoff Java Fundamentals

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

Garbage Collection Algorithms. Ganesh Bikshandi

THE TROUBLE WITH MEMORY

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Java Performance Tuning From A Garbage Collection Perspective. Nagendra Nagarajayya MDE

2011 Oracle Corporation and Affiliates. Do not re-distribute!

Java Without the Jitter

Garbage Collection (aka Automatic Memory Management) Douglas Q. Hawkins. Why?

Exploiting the Behavior of Generational Garbage Collector

Kodewerk. Java Performance Services. The War on Latency. Reducing Dead Time Kirk Pepperdine Principle Kodewerk Ltd.

CA341 - Comparative Programming Languages

Towards High Performance Processing in Modern Java-based Control Systems. Marek Misiowiec Wojciech Buczak, Mark Buttner CERN ICalepcs 2011

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

Garbage Collection (2) Advanced Operating Systems Lecture 9

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Reference Object Processing in On-The-Fly Garbage Collection

It s Good to Have (JVM) Options

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Lesson 2 Dissecting Memory Problems

Low latency & Mechanical Sympathy: Issues and solutions

Tick: Concurrent GC in Apache Harmony

New Java performance developments: compilation and garbage collection

The Z Garbage Collector Low Latency GC for OpenJDK

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

A new Mono GC. Paolo Molaro October 25, 2006

CS 241 Honors Memory

Typical Issues with Middleware

Hard Real-Time Garbage Collection in Java Virtual Machines

Java Application Performance Tuning for AMD EPYC Processors

Java Performance Tuning and Optimization Student Guide

Lecture 13: Garbage Collection

The Z Garbage Collector An Introduction

OS-caused Long JVM Pauses - Deep Dive and Solutions

Garbage Collection. Vyacheslav Egorov

Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide. Release 10

Lecture 15 Advanced Garbage Collection

Shenandoah GC...and how it looks like in September Aleksey

Garbage-First Garbage Collection

CS577 Modern Language Processors. Spring 2018 Lecture Garbage Collection

Garbage Collection. Weiyuan Li

Apple. Massive Scale Deployment / Connectivity. This is not a contribution

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

HTTP, WebSocket, SPDY, HTTP/2.0

Managed runtimes & garbage collection

CMSC 330: Organization of Programming Languages

WHO AM I.

Reference Counting. Reference counting: a way to know whether a record has other users

Troubleshooting Memory Problems in Java Applications

TECHNOLOGY WHITE PAPER. Azul Pauseless Garbage Collection. Providing continuous, pauseless operation for Java applications

Habanero Extreme Scale Software Research Project

Insight Case Studies. Tuning the Beloved DB-Engines. Presented By Nithya Koka and Michael Arnold

Reference Counting. Reference counting: a way to know whether a record has other users

Configuring the Heap and Garbage Collector for Real- Time Programming.

Generational Garbage Collection Theory and Best Practices

JDK 9/10/11 and Garbage Collection

Understanding Garbage Collection

Automatic Memory Management

Azul Pauseless Garbage Collection

Present but unreachable

CMSC 330: Organization of Programming Languages

Part I Garbage Collection Modeling: Progress Report

JVM Performance Tuning with respect to Garbage Collection(GC) policies for WebSphere Application Server V6.1 - Part 1

Understanding Java Garbage Collection

Java Performance: The Definitive Guide

CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers

Robust Memory Management Schemes

Lecture 15 Garbage Collection

JVM Performance Study Comparing Java HotSpot to Azul Zing Using Red Hat JBoss Data Grid

High Performance Managed Languages. Martin Thompson

Garbage collection. The Old Way. Manual labor. JVM and other platforms. By: Timo Jantunen

Java Garbage Collection Best Practices For Tuning GC

Cycle Tracing. Presented by: Siddharth Tiwary

Enabling Java-based VoIP backend platforms through JVM performance tuning

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

Understanding Java Garbage Collection

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

Finally! Real Java for low latency and low jitter

One-Slide Summary. Lecture Outine. Automatic Memory Management #1. Why Automatic Memory Management? Garbage Collection.

Transcription:

G1 Garbage Collector Details and Tuning

Who Am I - @simonebordet Lead Architect at Intalio/Webtide Jetty's HTTP/2, SPDY and HTTP client maintainer Open Source Contributor Jetty, CometD, MX4J, Foxtrot, LiveTribe, JBoss, Larex CometD project leader Web messaging framework JVM tuning expert

G1 Overview

G1 Overview G1 is the HotSpot low-pause collector First papers date back to 2004 Available and supported since JDK 7u4 (April 2012) Long term replacement for CMS Scheduled to be the default GC for JDK 9 JEP 248: http://openjdk.java.net/jeps/248 Low pauses valued more than max throughput For majority of Java Apps For the others, ParallelGC will still be available

G1 Overview G1 is designed to be really easy to tune java -Xmx32G -XX:MaxGCPauseMillis=100... Tuning based on max Stop-The-World pause -XX:MaxGCPauseMillis=<> By default 250 ms

G1 Overview G1 is a generational collector G1 implements 2 GC algorithms Young Generation GC Stop-The-World, Parallel, Copying Old Generation GC Mostly-concurrent marking Incremental compaction Piggybacked on Young Generation GC

G1 Logging G1 has a very detailed logging Keep it ALWAYS enabled! -XX:+PrintGCDateStamps Prints date and uptime -XX:+PrintGCDetails Prints G1 Phases -XX:+PrintAdaptiveSizePolicy Prints ergonomic decisions -XX:+PrintTenuringDistribution Print aging information of survivor regions

G1 Memory Layout

G1 Memory Layout Familiar with this? Eden Young Generation Survivor Tenured Old Generation FORGET IT! Eden Survivor Tenured

G1 Memory Layout G1 divides the heap into small regions Targets 2048 regions Tuned via -XX:G1HeapRegionSize=<> Eden, Survivor, Old regions Humongous regions When a single object occupies > 50% of the region Typically byte[] or char[]

G1 Memory Layout G1 Memory Layout O E H S E E Eden O O S E S Survivor H H H O O O Old S E O O E H Humongous

G1 Young GC JVM starts, G1 prepares Eden regions Application runs and allocates into Eden regions Eden regions fill up When all Eden regions are full Young GC

G1 Young GC The application does not only allocates Application modifies pointers of existing objects An old object may point to an eden object E.g. an old Map has just been put() a new entry G1 must track these inter-generation pointers (Old Humongous) (Eden Survivor) pointers

G1 Young GC Inter-generation references OLD EDEN A D C F B E Card Table Remembered Set (RS)

G1 Remembered Set A write barrier tracks pointer updates object.field = <reference> Triggers every time a pointer is written Records the write information in the card Cards are stored in a queue (dirty card queue) The queue is divided in 4 zones: white, green, yellow, red

G1 Remembered Set White zone Nothing happens, buffers are left unprocessed Green zone (-XX:G1ConcRefinementGreenZone=<>) Refinements threads are activated Buffers are processed and the queue drained Yellow zone (-XX:G1ConcRefinementYellowZone=<>) All available refinement threads are active Red zone (-XX:G1ConcRefinementRedZone=<>) Application threads process the buffers

G1 Young GC Phases

G1 Young GC Phases G1 Stops The World G1 builds a collection set The regions that will be subject to collection In a Young GC, the collection set contains: Eden regions Survivor regions

G1 Young GC Phases First phase: Root Scanning Static and local objects are scanned Second phase: Update RS Drains the dirty card queue to update the RS Third phase: Process RS Detect the Eden objects pointed by Old objects

G1 Young GC Phases Fourth phase: Object Copy The object graph is traversed Live objects copied to Survivor/Old regions Fifth phase: Reference Processing Soft, Weak, Phantom, Final, JNI Weak references Always enable -XX:+ParallelRefProcEnabled More details with -XX:+PrintReferenceGC

G1 Young GC Phases G1 tracks phase times to autotune Phase timing used to change the # of regions Eden region count Survivor region count Updating the # of regions Respect of max pause target Typically, the shorter the pause target, the smaller the # of Eden regions

G1 Young GC Phases G1 Young GC E and S regions evacuated to new S and O regions O E H S E E Eden O O S E S Survivor H H H O O O Old S E O O E H Humongous

G1 Young GC Phases G1 Young GC O S E H E E Eden O O E S Survivor H H H S O O O Old E O O O E H Humongous

G1 Old GC

G1 Old GC G1 schedules an Old GC based on heap usage By default when the entire heap is 45% full Checked after a Young GC or a humongous allocation Tunable via -XX:InitiatingHeapOccupancyPercent=<> The Old GC consists of old region marking Finds all the live objects in the old regions Old region marking is concurrent

G1 Old GC Concurrent marking Tri-color marking

G1 Old GC Concurrent marking Roots marked black children marked gray

G1 Old GC Concurrent marking Gray wavefront advances

G1 Old GC Concurrent marking Gray wavefront advances

G1 Old GC Concurrent marking Gray wavefront advances

G1 Old GC Concurrent marking Marking complete white objects are garbage

G1 Old GC Concurrent marking: Lost Object Problem Marking in progress A B C

G1 Old GC Concurrent marking: Lost Object Problem Application write: A.c = C; A B C

G1 Old GC Concurrent marking: Lost Object Problem Application write: B.c = null; A B C

G1 Old GC Concurrent marking: Lost Object Problem Marking completes A B C

G1 Old GC G1 uses a write barrier to detect: B.c = null; More precisely that a pointer to C has been deleted G1 now knows about object C Speculates that object C will remain alive Snapshot-At-The-Beginning (SATB) Preserves the object graph that was live at marking start C is queued and processed during remark May retain floating garbage, collected the next cycle

G1 Old GC Phases

G1 Old GC Phases G1 Stops The World Performs a Young GC Piggybacks Old region roots detection (initial-mark) G1 resumes application threads Concurrent Old region marking proceeds Keeps track of references (soft, weak, etc.) Computes per-region liveness information

G1 Old GC Phases G1 Stops The World Remark phase SATB queue processing Reference processing Cleanup phase Empty old regions are immediately recycled Application threads are resumed

G1 Old GC Phases [GC pause (G1 Evacuation Pause) (young) (initial-mark) [GC concurrent-root-region-scan-start] [GC concurrent-root-region-scan-end, 0.0566056 secs] [GC concurrent-mark-start] [GC concurrent-mark-end, 1.6776461 secs] [GC remark, 0.0488021 secs] [GC cleanup 16G->14G(32G), 0.0557430 secs]

G1 Old GC Phases Cleanup phase recycles empty old regions What about non-empty old regions? How is fragmentation resolved? Non-empty Old regions processing Happens during the next Young GC cycle No rush to clean the garbage in Old regions

G1 Mixed GC Mixed GC piggybacked on Young GCs By default G1 performs 8 mixed GC -XX:G1MixedGCCountTarget=<> The collection set includes Part (1/8) of the remaining Old regions to collect Eden regions Survivor regions Algorithm is identical to Young GC Stop-The-World, Parallel, Copying

Old regions with most garbage are chosen first -XX:G1MixedGCLiveThresholdPercent=<> Defaults to 85% G1 wastes some heap space (waste threshold) -XX:G1HeapWastePercent=<> Defaults to 5% Mixed GCs are stopped When old region garbage <= waste threshold Therefore, mixed GC count may be less than 8

G1 Mixed GC G1 Mixed GC O S E H E E Eden O O E S Survivor H H H S O O O Old E O O O E H Humongous

G1 Mixed GC G1 Mixed GC O E H O E E Eden O S S E S Survivor H H H O O O Old E O O O E H Humongous

G1 General Advices

G1 General Advices Avoid at all costs Full GCs The Full GC is single threaded and REALLY slow Also because G1 likes BIG heaps! Grep the GC logs for Full GC Use -XX:+PrintAdaptiveSizePolicy to know what caused it

G1 Mixed GC Avoid to-space exhausted Not enough space to move objects to Increase max heap size G1 works better with more room to maneuver O O E H O E E Eden E O S E E S E S Survivor H H H E E O O S O E O O O E O H Old Humongous

G1 General Advices Avoid too many humongous allocations -XX:+PrintAdaptiveSizePolicy prints the GC reason Increase max heap size Increase region size: -XX:G1HeapRegionSize=<> Example Max heap size 32 GiB region size = 16 MiB Humongous limit 8 MiB Allocations of 12 MiB arrays Set region size to 32 MiB Humongous limit is now 16 MiB 12 MiB arrays are not humongous anymore

G1 General Advices Avoid lengthy reference processing Always enable -XX:+ParallelRefProcEnabled More details with -XX:+PrintReferenceGC Find the cause for WeakReferences ThreadLocals RMI Third party libraries

Real World Example

Real World Example Online Chess Game Application 20k requests/s Jetty server 1 server, 64 GiB RAM, 2x24 cores Allocation rate: 0.5-1.2 GiB/s CMS to G1 Migration

Real World Example Issue #1: MetaSpace [Full GC (Metadata GC Threshold) [Times: user=19.58 sys=0.00, real=13.72 secs] 13.72 secs Full GC! Easy fix: -XX:MetaspaceSize=<> Cannot guess how much MetaSpace you need? Start big (e.g. 250 MiB) and monitor it

Real World Example Issue #2: Max target pause We set -XX:MaxGCPauseMillis=250 Percentiles after GC log processing (24h run): 50.00% 250 ms 90.00% 300 ms 95.00% 360 ms 99.00% 500 ms 99.90% 623 ms 99.99% 748 ms 100.0% 760 ms

Issue #3: Mixed GCs G1 tries to respect max pause target Must account for old regions, not only young Currently G1 shrinks the young generation [GC pause (G1 Evacuation Pause) (young) [Eden: 12.4G(12.4G)->0.0B(608.0M)... [GC pause (G1 Evacuation Pause) (mixed) Eden: 12.4 GiB 0.6 GiB

Issue #3: Mixed GC Young Generation shrunk by a factor 20x But the allocation rate does not change! Young GCs become more frequent Application throughput suffers Minimum Mutator Utilization (MMU) drops

Young/Mixed GCs: Events X axis: event time 18370 18380 18390 18400 18410 18420 18430 18440 18450 18460 18470

Young/Mixed GCs: Eden Size 14000000 12000000 10000000 8000000 6000000 4000000 2000000 0 18370 18380 18390 18400 18410 18420 18430 18440 18450 18460 18470

Young/Mixed GCs: MMU (2 s window) 120 100 80 60 40 20 0 18370 18380 18390 18400 18410 18420 18430 18440 18450 18460 18470

Young/Mixed GCs: MMU (5 s window) 120 100 80 60 40 20 0 18370 18380 18390 18400 18410 18420 18430 18440 18450 18460 18470

Conclusions

Conclusions G1 is the future Good chances that it just works Easier to tune than CMS Yet, you must know exactly how it works to tune it better Not yet that respectful of GC pause target At least in our case, YMMV

Conclusions Still based on Stop-The-World pauses For extremely low latencies you need other solutions Always use the most recent JDK You'll benefit from continuous improvements to G1

References

References Search SlideShare for G1 GC Beckwith, Hunt, and others Numerous articles on Oracle's site Yu Zhang Poonam Bajaj Monica Beckwith OpenJDK HotSpot GC Mailing List hotspot-gc-use@openjdk.java.net

Questions & Answers