Tick: Concurrent GC in Apache Harmony

Similar documents
Harmony GC Source Code

Parallel Garbage Collection

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

Managed runtimes & garbage collection

Concurrent Garbage Collection

Garbage Collection Algorithms. Ganesh Bikshandi

Lecture 15 Garbage Collection

Programming Language Implementation

A new Mono GC. Paolo Molaro October 25, 2006

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Parallel GC. (Chapter 14) Eleanor Ainy December 16 th 2014

Garbage Collection Basics

G1 Garbage Collector Details and Tuning. Simone Bordet

Simple Garbage Collection and Fast Allocation Andrew W. Appel

Algorithms for Dynamic Memory Management (236780) Lecture 4. Lecturer: Erez Petrank

Shenandoah An ultra-low pause time Garbage Collector for OpenJDK. Christine H. Flood Roman Kennke

Run-Time Environments/Garbage Collection

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Implementation Garbage Collection

A Parallel, Incremental and Concurrent GC for Servers

Lecture 15 Advanced Garbage Collection

Java without the Coffee Breaks: A Nonintrusive Multiprocessor Garbage Collector

The C4 Collector. Or: the Application memory wall will remain until compaction is solved. Gil Tene Balaji Iyengar Michael Wolf

Garbage Collection. Hwansoo Han

Mark-Sweep and Mark-Compact GC

Garbage Collection. Vyacheslav Egorov

CSE P 501 Compilers. Memory Management and Garbage Collec<on Hal Perkins Winter UW CSE P 501 Winter 2016 W-1

Exploiting the Behavior of Generational Garbage Collector

Complementary Garbage Collector.

A Fully Concurrent Garbage Collector for Functional Programs on Multicore Processors

CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers

CS 241 Honors Memory

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Principal Software Engineer Red Hat

Garbage-First Garbage Collection by David Detlefs, Christine Flood, Steve Heller & Tony Printezis. Presented by Edward Raff

Freeing memory is a pain. Need to decide on a protocol (who frees what?) Pollutes interfaces Errors hard to track down

Freeing memory is a pain. Need to decide on a protocol (who frees what?) Pollutes interfaces Errors hard to track down

Do Your GC Logs Speak To You

How do we mark reachable objects?

Lecture 13: Garbage Collection

CMSC 330: Organization of Programming Languages

The Generational Hypothesis

Verifying a Concurrent Garbage Collector using a Rely-Guarantee Methodology

CS842: Automatic Memory Management and Garbage Collection. Mark and sweep

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Pause-Less GC for Improving Java Responsiveness. Charlie Gracie IBM Senior Software charliegracie

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

Opera&ng Systems CMPSCI 377 Garbage Collec&on. Emery Berger and Mark Corner University of Massachuse9s Amherst

Heap storage. Dynamic allocation and stacks are generally incompatible.

Reference Object Processing in On-The-Fly Garbage Collection

Reference Counting. Reference counting: a way to know whether a record has other users

Robust Memory Management Schemes

CMSC 330: Organization of Programming Languages

Garbage Collection Techniques

JVM Memory Model and GC

CS577 Modern Language Processors. Spring 2018 Lecture Garbage Collection

Go GC: Prioritizing Low Latency and Simplicity

Shenandoah GC...and how it looks like in September Aleksey

One-Slide Summary. Lecture Outine. Automatic Memory Management #1. Why Automatic Memory Management? Garbage Collection.

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

Memory management has always involved tradeoffs between numerous optimization possibilities: Schemes to manage problem fall into roughly two camps

CS 345. Garbage Collection. Vitaly Shmatikov. slide 1

Cycle Tracing. Presented by: Siddharth Tiwary

Mostly Concurrent Garbage Collection Revisited

Performance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs

Memory Management. Memory Management... Memory Management... Interface to Dynamic allocation

The HALFWORD HEAP EMULATOR

Inside the Garbage Collector

Incremental GC for Ruby interpreter

Heap Management. Heap Allocation

An On-the-Fly Reference-Counting Garbage Collector for Java

Hazard Pointers. Number of threads unbounded time to check hazard pointers also unbounded! difficult dynamic bookkeeping! thread B - hp1 - hp2

Automatic Memory Management

Reference Counting. Reference counting: a way to know whether a record has other users

An On-the-Fly Reference-Counting Garbage Collector for Java

Multicore OCaml. Stephen Dolan Leo White Anil Madhavapeddy. University of Cambridge. OCaml 2014 September 5, Multicore OCaml

RincGC: Incremental GC for CRuby/MRI interpreter. Koichi Sasada Heroku, Inc.

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

New Java performance developments: compilation and garbage collection

6.172 Performance Engineering of Software Systems Spring Lecture 9. P after. Figure 1: A diagram of the stack (Image by MIT OpenCourseWare.

Schism: Fragmentation-Tolerant Real-Time Garbage Collection. Fil Pizlo *

ACM Trivia Bowl. Thursday April 3 rd (two days from now) 7pm OLS 001 Snacks and drinks provided All are welcome! I will be there.

Understanding Garbage Collection

Hard Real-Time Garbage Collection in Java Virtual Machines

Software Prefetching for Mark-Sweep Garbage Collection: Hardware Analysis and Software Redesign

Partial Marking GC. A traditional parallel mark and sweep GC algorithm has, however,

Structure of Programming Languages Lecture 10

THE TROUBLE WITH MEMORY

Garbage Collection. CS 351: Systems Programming Michael Saelee

Shenandoah: Theory and Practice. Christine Flood Roman Kennke Principal Software Engineers Red Hat

High-Level Language VMs

Attila Szegedi, Software

Memory Management: The Details

Zing Vision. Answering your toughest production Java performance questions

Space-and-Time Efficient Parallel Garbage Collector for Data-Intensive Applications

Memory management. Knut Omang Ifi/Oracle 10 Oct, 2012

The Garbage-First Garbage Collector

The Z Garbage Collector Scalable Low-Latency GC in JDK 11

Transcription:

Tick: Concurrent GC in Apache Harmony Xiao-Feng Li 2009-4-12 Acknowledgement: Yunan He, Simon Zhou

Agenda Concurrent GC phases and transition Concurrent marking scheduling Concurrent GC algorithms Tick code in Apache Harmony

Concurrent GC Options At the moment, Tick supports only mark-sweep algorithm #define USE_UNIQUE_MARK_SWEEP_GC Command line options -XXgc.concurrent_gc = TRUE Use concurrent GC (and default concurrent algorithm) You can also specify concurrent phases separately -XXgc.concurrent_enumeration -XXgc.concurrent_mark -XXgc.concurrent_sweep Write barrier (generate_barrier) is set TRUE for any concurrent collection

Mutator Object Allocation gc_alloc uses mark-sweep gc_ms_alloc Thread local allocation As normal, except that It checks if a concurrent collection should start

Collection Triggering Case 1: heap is full Has to trigger STW collection Case 2: trigger collection at proper time so as to avoid STW Check at allocation site Actually also trigger collection phase transition at allocation site Then schedule proper phase

Collection Phases One collection cycle Mutator Collector 1 2 3 4 5 1 Time enum mark collect Tick uses a state graph to guide the phase transitions 2, Enum: suspend and enumerate rootset 3, Mark: trace and mark the live objects 4, Collect: recycle the dead objects Note: All of the three phases can be executed in a STW manner When the heap is full, the collection transitions to STW manner

Phase Transition 1: No collection 5: Wrap up collection 6: STW collection 1 2 Time to collect 2 3 Finish enumerating 3 4 Finish marking 4 5 Finish collecting 1,2,3,4 6 STW if heap is full 5,6, 1 1 Finish concurrent/stw collection, or keep no collection Global gc->gc_concurrent_status tracks the states

State Transition Scheduler Entry point: gc_sched_collection() Invoked in every gc_alloc() Calls gc_con_perform_collection() when command line option specifies concurrent GC Every mutator invokes the scheduler Use atomic operation so that only one mutator makes the state transition Protocol between mutator and collector Mutator drives the state transition between phases Collector reports its state back to mutator Mutator Collector

Phase Interactions 1 2 3 4 5 1 NIL ENUM START_MARKERS SWEEPING RESET NIL TRACING TRACE_DONE SWEEP_DONE BEFORE_SWEEP BEFORE_FINISH State transitioned by mutator State transitioned by collector Mutator Collector Time

States Definition enum GC_CONCURRENT_STATUS { GC_CON_NIL = 0x00, GC_CON_STW_ENUM = 0x01, GC_CON_START_MARKERS = 0x02, GC_CON_TRACING = 0x03, GC_CON_TRACE_DONE = 0x04, GC_CON_BEFORE_SWEEP = 0x05, GC_CON_SWEEPING = 0x06, GC_CON_SWEEP_DONE = 0x07, GC_CON_BEFORE_FINISH = 0x08, GC_CON_RESET = 0x09, GC_CON_DISABLE = 0x0A //STW collection };

gc_con_perform_collection() in src/common/concurrent_collection_scheduler.cpp switch( gc->gc_concurrent_status ) { case GC_CON_NIL : if(!gc_con_start_condition(gc) ) return FALSE; state_transformation( gc, NIL, ENUM ); gc->num_collections++; gc_start_con_enumeration(gc); //now it is a stw enumeration state_transformation( gc, ENUM, START_MARKERS ); gc_start_con_marking(gc); break; 1 2 3 (continued in next slide)

gc_con_perform_collection() case GC_CON_BEFORE_SWEEP : state_transformation( gc, BEFORE_SWEEP, SWEEPING ); gc_ms_start_con_sweep(gc); 4 break; case GC_CON_BEFORE_FINISH : state_transformation( gc, BEFORE_FINISH, RESET ); gc_reset_after_con_collection(gc); 5 state_transformation( gc, RESET, NIL ); break; 1 default : return FALSE;

Agenda Concurrent GC phases and transition Concurrent marking scheduling Concurrent GC algorithms Tick code in Apache Harmony

Write Barrier Tick uses write barrier to track heap modifications Remember set (or called dirty set sometimes) Write barrier is instrumented by JIT, so it is enabled at Harmony startup NULL method if no collection (state 1) NULL method if STW collection (state 6) In other states, a specified method for its respective algorithm Write barrier also catches object_clone and array_copy

Remember Set Mutator holds a local RemSet (or dirty_set) Implemented as an array (Vector_Block) Mutator grabs an empty set from global free pool Mutator puts a full set to global rem set pool During marking, root set and rem set are processed together At the same time, global rem set pool is growing So, concurrent access to the pool by mutators and markers

mutators Global Rem Set Pool global free pool global rem set pool markers

Global Mark Task Pool 1. Shared pool for task sharing 2. One reference is a task 3. Collector grabs task block from pool 4. Pop one task from task block, push into mark stack 5. Scan object in mark stack in DFS order 6. If stack is full, grow into another mark stack, put the full one into pool 7. If stack is empty, take another task from task block Mark Stack ❻ ❺ Markers ❹ Task Block ❼ Global Task Pool ❸ ❷ ❶ global free pool

Marking Termination In some algorithm, it requires all the data sets are consumed to terminate Root set Rem set (mutator local, global pool) Task set (marker local stack, global pool) Since marking and remembering are concurrent, how to ensure mutators local rem sets are empty? Tick s solution: 1. Marker copies mutators local sets to global pool gc_copy_local_dirty_set_to_global() 2. Marker scans the global pool 3. Check if any mutator local set is non-empty, goto 1; or terminate. dirty_set_is_empty()

Trigger Collection Collection consumes system resource Avoid collection if possible Trigger strategy Can not be too late Otherwise has to STW, no concurrent Can not be too early Otherwise waste system resource Ideally, trigger at a time point T, so that When marking finishes, heap becomes full

Trigger Point T At time T, heap has free size S Assuming mutators allocation rate is AllocRate, markers tracing rate is TraceRate, Then ideally, S = AllocRate * MarkingTime H = TraceRate * MarkingTime S = H * AllocRate/TraceRate I.e., when heap has free size S, collection is triggered I.e., it takes time S/AllocRate to conduct a full marking If current free size is S now, collection after time T delay = (S now - S)/AllocRate Tick does not check heap free size in every allocation It checks after time T delay /2. (I.e., binary approximation)

Agenda Concurrent GC phases and transition Concurrent marking scheduling Concurrent GC algorithms Tick code in Apache Harmony

Tick Concurrent Algorithms Known concurrent mark-sweep GCs Mostly concurrent (Boehm, Demers, Shenker PLDI 91 ) Snapshot-at-the-beginning DLG on-the-fly (Doligez, Leroy, Gonthier, POPL93, POPL94) Sliding-view on-the-fly (Azatchi, et al. OOPSLA03 ) Harmony Tick implemented three algorithms, similar to those listed above XXgc.concurrent_algorithm MOSTLY_CON: similar to mostly concurrent OTF_SLOT: similar to DLG on-the-fly OTF_OBJ: similar to sliding-view on-the-fly

Tick 1: MOSTLY_CON Steps 1. Rootset enumeration (optionally STW). WB turn on. 2. WB rem updated objects; Con marking from roots. 3. WB keep remembering; Con rescan from rem set; Repeat 3. 4. WB turn off. STW re-enumeration and marking 5. Concurrent sweeping Write barrier { *slot = new_ref; if( obj is marked && obj is clean){ dirty obj; remember(obj); } } New object handling: nothing

Tick 2: OTF_SLOT Steps 1. Root set enumeration (optionally STW). WB turn on. 2. WB rem overwritten ref; Con marking from root set and rem set 3. WB turn off. Con sweeping Write barrier { old_ref = *slot; if( *old_ref is unmarked){ remember(old_ref); } *slot = new_ref; } New object handling: created marked (black)

Tick 3: OTF_OBJ Steps 1. Root set enumeration (optionally STW). WB turn on. 2. WB rem overwritten ref; Con marking from root set and rem set 3. WB turn off. Con sweeping Write barrier { if( obj is unmarked && obj is clean){ remember(obj snapshot); dirty obj; } *slot = new_ref; } New object handling: created marked (black)

MOSTLY_CON vs. SATB Mostly-concurrent Root Set A A A dirty New Object A B B B D B D C C C E C E Root Set Root Set Snapshot-at-the-beginning New Object Root Set A B C A B C A B C A D B D Rem Set C

MOSTLY_CON vs. SATB Marking termination New objects handing Write barrier SATB Terminate once no gray objs Pro: deterministic Con: snapshot may keep more floating garbage Born marked Con: floating garbage (newborns potentially die soon) Care old values in snapshot Pro: sliding-view remembers small # objects Mostly-concurrent Need STW final tracing Con: STW pause No special handling Con: rescans (new-borns normally active) Care new values in execution Con: remembered # depends on marker thread priority

OTF_SLOT vs. OTF_OBJ OTF_SLOT Root Set A x yz B C D A S x yz B C D T A S x yz B C D Rem Set Rem Set OTF_OBJ Root Set A x yz B C D A S x yz B C D T A S x yz B C D Rem Set Rem Set

OTF_SLOT vs. OTF_OBJ OTF_SLOT Pro: remember only the updated slots Con: check the old slot value if it is marked before remembering, causing cache misses OTF_OBJ Con: log the entire updated object Pro: need log only once per object

Differences Between Tick GCs Implementation level They share the same infrastructure Collection scheduler Rem set data structure Mark-sweep algorithm Differences are not big Write barrier (already shown) Termination condition (shown next)

MOSTLY_CON Marking Termination MOSTLY_CON marking process does not converge Dirty objects are consumed and produced concurrently Should terminate voluntarily or when heap is full No strict termination condition. Try to mark most live objects Conditions for termination 1. Root sets and mark stacks are empty 2. A termination request from a mutator whose allocation fails Markers then quit, and the requesting mutator triggers STW GC STW re-enumeration and marking Only scan unmarked objects, shorter pause time 3. Marker also terminates voluntarily when not many tasks remain

SATB Marking Termination SATB marking process converges Live object quantity is fixed at snapshot Conditions for termination 1. Global rootset pool and mark stacks are empty Implemented as all markers finish marking 2. Mutator local remsets are empty 3. Global remset pool are empty Check 3 must precede 4, because mutator might put local remset to global pool Conditions must be satisfied In order not to lose any live objects, i.e., to find the snapshot

Agenda Concurrent GC phases and transition Concurrent marking scheduling Concurrent GC algorithms Tick code in Apache Harmony

Tick Entry Point gc_alloc gc_ms_alloc wspace_alloc wspace: space managed by mark-sweep gc_gen\src\mark_sweep\wspace_alloc.cpp wspace_alloc gc_sched_collection gc_con_perform_collection gc_gen\src\common\collection_scheduler.cpp gc_con_perform_collection -> gc_con_start_condition Check if a collection should trigger Perform all the collection phases transition gc_start_con_enumeration gc_start_con_marking gc_ms_start_con_sweep

Collectors Scheduling gc_start_con_marking gc_ms_start_con_mark or gc_ms_start_mostly_con_mark From general control to specific algorithms gc_gen\src\common\gc_concurrent.cpp gc_ms_start_con_mark conclctor_execute_task_con notify_conclctor_to_work() triggers the idle concurrent collectors gc_gen\src\mark_sweep\gc_ms.cpp conclctor_thread_func conclctor_wait_for_task task_func Concurrent collectors are waken up and work on assigned task gc_gen\src\thread\conclctor.cpp Conclctors were initialized by conclctor_initialize in gc_init()

Summary Harmony Tick has implemented three concurrent GC algorithms MOSTLY_CON, OTF_SLOT, OTF_OBJ Tick has a common design infrastructure for phase control and collection scheduling Experimental measurement showed very short pause time Next step to optimize and polish the code More documents on the design and code to lower the entry barrier