Programming Language Implementation

A Practical Introduction to Programming Language Implementation
2014: Week 10
Garbage Collection
College of Information Science and Engineering, Ritsumeikan University

review of last week's topics
- dynamic and static (lexical) scoping
- functions
  - actual arguments
  - formal parameters
  - body
- lexical scoping
  - free variables
- higher-order functions
- functions as closures
- user-defined functions
- project: add user-defined functions (closures) to your language

this week's topics
- memory management
  - the need for garbage collection
- precise collectors
  - tracing, reference counting
  - copying, compacting and fragmenting
  - generation scavenging
- non-precise (conservative) collectors
  - program memory image
  - segments: text, data, stack
- implementation
  - complications due to GC
  - tagged immediate values to reduce GC overhead
- project: add garbage collection to your language

memory management
- the execution of a program generates data structures dynamically
  - the program uses malloc() to allocate some memory
  - which is filled with data according to some program type
- we call these structs/arrays/etc. by the generic term "memory object", or usually just "object"
- objects can contain pointers to other objects
  - thereby forming a tree or graph of objects
- memory management is the process of reclaiming their storage
  - by calling free() when an object is no longer needed
- therefore, in a correct program: every malloc() must eventually have a corresponding free()

garbage collection
- without memory management, memory would become exhausted
  - your program will be terminated by the OS, or...
  - malloc() returns NULL instead of memory
  - which, of course, you forget to check for, and your program crashes
- manual memory management relies on program(mer) logic
  - this is tedious and error-prone
  - failing to free() an object leads to memory leaks
  - prematurely free()ing an object leads to dangling pointers
  - free()ing the same object twice leads to run-time errors (sometimes called "double-free" bugs)
- garbage collection is the process of automatically calling free() when, and only when, it is safe to do so
  (GC was invented by John McCarthy in 1959 for his Lisp language)

detecting garbage
- a variable that stores a pointer to an object is called a root
- objects can contain pointers to other objects
  - we have a graph of pointers to objects
  - we can ignore cycles, so the graph becomes a tree
  - the top of a tree is called its root
- objects that are reachable from a root are considered live
  - by following (transitively) one or more pointers
- all other objects are unreachable and therefore garbage
  - they can never again be used in the computation
  - and must be collected so their memory can be reclaimed
  - to prevent memory exhaustion

roots, live objects and garbage
[Figure: roots (global variables, local variables, function arguments, etc.; here foo, bar and baz) point into the memory objects, which are labelled either live or garbage]

two kinds of garbage collection
- precise collectors
  - precisely identify all objects that are garbage
  - most efficient use of memory
  - can be intrusive
  - require cooperation from the programmer and/or the language implementation
  - two approaches: tracing and reference counting
- non-precise (conservative) collectors
  - conservatively estimate which objects are garbage
  - less efficient use of memory
  - least intrusive
  - often a drop-in replacement for malloc()
  - only one approach: scan the whole of memory for potential roots

precise collection: tracing
- determine which objects should be collected by removing live objects from the set of all objects
  - what remains are the dead objects
- recursively follow object references, starting from the root objects
  - until all live objects have been visited
- any objects not visited are unreachable (or inaccessible)
  - they can be collected and their storage reclaimed
- many algorithms and implementations
  - from very simple to very complex
- one of the simplest is mark-and-sweep
  - effective for small numbers of objects (thousands rather than millions)
  - and relatively low rates of allocation

precise collection: mark and sweep
[Figure: root objects and object memory before collection, then after each step below; marked objects are labelled M]
- mark phase
  - starting from a root, mark accessible object
  - following pointers in that object, mark transitively-accessible objects
  - repeat until all live objects are marked; remaining objects are unreachable garbage
- sweep phase
  - storage for dead objects can be reclaimed and reused

precise collection: mark and sweep
1. initialisation:
     for each object p
       set is_marked(p) = false
2. mark phase: calculate the transitive closure of reachable objects
     for each root r
       call mark_and_trace(r)
   where mark_and_trace(p) =
     if is_marked(p) then return
     set is_marked(p) = true
     for each pointer q in p
       call mark_and_trace(q)
   any objects still not marked are unreachable (or inaccessible)
3. sweep phase:
     for each object p
       if is_not_marked(p) then free(p)
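The three steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions, not the course's implementation: the Object layout, the fixed MAX_FIELDS pointer fields, the all_objects list used to enumerate objects, and the names gc_alloc() and gc_collect() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_FIELDS 2                    /* assumption: every object has two pointer fields */

typedef struct Object {
    bool           is_marked;           /* the "is-marked" bit kept inside each object */
    struct Object *fields[MAX_FIELDS];  /* pointers to other objects, or NULL */
    struct Object *next;                /* links all objects so the sweep can enumerate them */
} Object;

static Object *all_objects = NULL;      /* every object that is currently allocated */

/* Allocate a zeroed object and remember it in the list of all objects. */
Object *gc_alloc(void)
{
    Object *p = calloc(1, sizeof *p);
    assert(p != NULL);
    p->next = all_objects;
    all_objects = p;
    return p;
}

/* Mark p and everything transitively reachable from it. */
static void mark_and_trace(Object *p)
{
    if (p == NULL || p->is_marked) return;
    p->is_marked = true;
    for (size_t i = 0; i < MAX_FIELDS; i++)
        mark_and_trace(p->fields[i]);
}

/* Run one collection; returns the number of objects that were freed. */
size_t gc_collect(Object **roots, size_t nroots)
{
    size_t freed = 0;

    /* 1. initialisation: clear every mark */
    for (Object *p = all_objects; p != NULL; p = p->next)
        p->is_marked = false;

    /* 2. mark phase: trace from each root */
    for (size_t i = 0; i < nroots; i++)
        mark_and_trace(roots[i]);

    /* 3. sweep phase: unlink and free every unmarked object */
    Object **link = &all_objects;
    while (*link != NULL) {
        Object *p = *link;
        if (!p->is_marked) {
            *link = p->next;
            free(p);
            freed += 1;
        } else {
            link = &p->next;
        }
    }
    return freed;
}
```

Because marked objects are skipped on entry to mark_and_trace(), cycles in the object graph terminate. The recursion depth is proportional to the longest pointer chain, which is one reason real collectors tend to use an explicit mark stack instead.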

precise collection: mark and sweep
- advantages:
  - simple (a few tens of lines of C)
  - does not require any special data structures
  - can be a wrapper around malloc()
  - fast, for small object populations
- disadvantages:
  - does not scale: cost is linear in the size of the object memory
  - requires additional information ("is-marked" bit) within objects
  - must be able to enumerate (iterate over) all objects
  - requires precise knowledge of object layout (location of pointers)
  - requires precise knowledge of roots
    - explicit declaration of global root variables
    - explicit protection of local roots: function arguments, local variables, intermediate results

tracing collectors: programming complications
- all roots must be identified to the collector
  - global variables containing roots
  - function arguments containing temporary references
  - local variables containing temporary references
- tedious and very error-prone

    object *foo(object *arg)
    {
        GC_protect(arg);                /* temporary root */
        object *bar = twiddle(arg);
        GC_protect(bar);                /* temporary root */
        run_fermat_solver();            /* potential GC here */
        GC_unprotect(bar);
        GC_unprotect(arg);
        return bar;
    }

- bugs can take months to find and days to diagnose

fragmentation
- consider the situation at the end of our mark-sweep collection
[Figure: total available memory for allocation, showing the recently-reclaimed objects]
- four units of memory are unused, but the largest object we can allocate is two units
  - after which we can only allocate two more objects of one unit
- the four units of available memory are fragmented
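The difference between "total free memory" and "largest allocatable object" can be made concrete with a small sketch. It assumes a heap described as an array of unit-sized cells with an in-use flag each; the FreeStats type and free_stats() function are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t total_free;    /* number of unused units in the whole heap */
    size_t largest_run;   /* longest contiguous run of unused units */
} FreeStats;

/* Scan a unit-granular heap map and report how fragmented the free space is. */
FreeStats free_stats(const bool *in_use, size_t n)
{
    FreeStats s = { 0, 0 };
    size_t run = 0;                       /* length of the current free run */

    assert(in_use != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (in_use[i]) {
            run = 0;                      /* a live object ends the run */
        } else {
            s.total_free += 1;
            run += 1;
            if (run > s.largest_run) s.largest_run = run;
        }
    }
    return s;
}
```

For a heap map such as used, free, free, used, free, used, free, used, the function reports four free units but a largest run of two, which is the situation the slide describes.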

memory compaction
- a compacting collector removes fragmentation
  - sweep phase groups live objects together
  - available space is left in one contiguous region
- more complex than simple mark-sweep
  - all pointers must be updated with new locations of objects
[Figure: recently-reclaimed objects; live objects compacted to start of memory; available memory is contiguous]
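A sliding compaction can be sketched as three passes: compute each live object's new location, update every pointer, then move the objects. The sketch below assumes a toy heap of HEAP_SLOTS equal-sized slots whose "pointers" are slot indices; Slot, NO_REF and compact() are hypothetical names, and the live flag is assumed to have been set by an earlier mark phase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SLOTS 8
#define NO_REF ((size_t)-1)      /* marks "no pointer" in this toy heap */

typedef struct {
    bool   live;                 /* assumed to be set by a preceding mark phase */
    int    value;                /* payload */
    size_t ref;                  /* index of another slot, or NO_REF */
} Slot;

/* Slide live slots to the start of the heap. Returns the number of live
   slots, which is also the index where the contiguous free region begins. */
size_t compact(Slot heap[HEAP_SLOTS])
{
    size_t forward[HEAP_SLOTS];  /* new index of each live slot */
    size_t next = 0;

    /* pass 1: decide where every live slot will end up */
    for (size_t i = 0; i < HEAP_SLOTS; i++)
        forward[i] = heap[i].live ? next++ : NO_REF;

    /* pass 2: rewrite pointers so they name the new locations */
    for (size_t i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].live && heap[i].ref != NO_REF) {
            assert(heap[heap[i].ref].live);   /* live slots only refer to live slots */
            heap[i].ref = forward[heap[i].ref];
        }
    }

    /* pass 3: move the slots; destinations are always at lower indices */
    for (size_t i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].live && forward[i] != i) {
            heap[forward[i]] = heap[i];
            heap[i].live = false;
        }
    }
    return next;
}
```

The second pass is the extra cost the slide refers to: every pointer held by a live object has to be rewritten. In a real collector the roots would need the same treatment.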

precise collection: reference counting
- the opposite of tracing
- while the application is executing...
  - record the total number of references that exist to each object
  - when the count reaches zero, the object is unreachable
[Figure: an object with incoming references; its header holds reference_count = 3, followed by the application fields]

precise collection: reference counting
- reference counting relies on a write barrier
- when an existing pointer is overwritten
  - decrement the reference count of the object it refers to
  - if the count has reached zero, collect the object
- when a pointer is stored in a variable or other memory location
  - increment the reference count of the object it refers to
[Figure: the effect of obj.pointer_member := q]
  before: obj (reference_count = 1, pointer_member = &p), p (reference_count = 1), q (reference_count = 1)
  after:  obj (reference_count = 1, pointer_member = &q), p (reference_count = 0, garbage), q (reference_count = 2)

precise collection: reference counting
- a typical write barrier might be implemented like this:

    object *store_pointer(object **location, object *pointer)
    {
        object *old = *location;
        /* increment first, so that storing a pointer over */
        /* itself cannot drop its count to zero            */
        if (pointer) pointer->reference_count += 1;
        *location = pointer;
        if (old) {
            old->reference_count -= 1;
            if (old->reference_count == 0) {
                /* decrement the ref counts of all pointer */
                /* fields in the old object and then       */
                /* reclaim its storage                     */
                collect(old);
            }
        }
        return pointer;
    }

- all stores to pointer variables must use this function
  - programmer discipline has to enforce this
  - special naming and macros for setters can help

precise collection: reference counting
- advantages
  - objects are collected as soon as they become unreachable
  - reference counting has good locality with computation
    - reference_count word tends to be in the CPU cache
- disadvantages
  - decrementing one reference count can lead to "mass extinction"
    - avalanche of reference counts reaching zero
    - potential pause in program execution
    - can be mitigated by (e.g.) incremental sweeping
      - round-robin check for zero ref counts during allocation
      - one object checked/deallocated per object allocated
  - cycles must be broken explicitly
    - programmer must predict cycles, store NULL to break them
    - or have a secondary tracing collector to periodically find them

precise collection: reference counting
- a cycle of unreachable garbage with non-zero reference count
[Figure: a cycle of four objects, each with ref_count = 1]
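The cycle problem can be reproduced with a few lines of C. The sketch below is an illustration only: Node, new_node(), release(), store_node() and cycle_demo() are hypothetical names, each object has a single pointer field, and live_nodes exists purely to make the leak observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Node {
    int          reference_count;
    struct Node *next;                  /* the only pointer field */
} Node;

static int live_nodes = 0;              /* allocated but not yet reclaimed */

static Node *new_node(void)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    live_nodes += 1;
    return n;
}

/* Drop one reference to n; reclaim it, and anything only it kept alive, at zero. */
static void release(Node *n)
{
    while (n != NULL && --n->reference_count == 0) {
        Node *next = n->next;           /* the dead node's own reference is dropped next */
        free(n);
        live_nodes -= 1;
        n = next;
    }
}

/* Write barrier: every pointer store goes through here. */
static void store_node(Node **location, Node *pointer)
{
    Node *old = *location;
    if (pointer != NULL) pointer->reference_count += 1;
    *location = pointer;
    release(old);
}

/* Build the cycle a -> b -> a, drop both roots, and return how many
   nodes are still allocated afterwards. */
int cycle_demo(int break_cycle)
{
    int   before = live_nodes;
    Node *a = NULL, *b = NULL;          /* the two roots */

    store_node(&a, new_node());
    store_node(&b, new_node());
    store_node(&a->next, b);
    store_node(&b->next, a);

    if (break_cycle)
        store_node(&a->next, NULL);     /* the programmer breaks the cycle by hand */

    store_node(&a, NULL);
    store_node(&b, NULL);
    return live_nodes - before;
}
```

With the cycle left intact, both nodes keep a count of one after the roots are cleared, so neither is ever reclaimed. Storing NULL into one link first lets the counts fall to zero in turn.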

precise collection: reference counting
- an avalanche of collections waiting to happen
[Figure: a pointer about to be overwritten refers to the first of one million objects, each with ref_count = 1]

non-precise (conservative) collectors
- all data values in a program can be enumerated
- those that appear to be pointers to memory blocks can be identified
  - if a word looks like a pointer, it is assumed to be a pointer
  - some will not be pointers: integers, floats, strings, etc.
  - some objects that are not live will be considered live
  - hence "non-precise" and "conservative" collector
- any objects with no such pointers are unreachable
  - the program can never refer to them again
  - they can be collected and recycled

non-precise collection: program memory image
- text segment contains the program
- data and stack segments contain variables
  - variables can contain roots
- heap contains objects
- scanning the data and stack segments provides potential root pointers to heap objects
- pointers to addresses outside of heap can be ignored
[Figure: memory layout, from start of memory to end of memory]
  text segment:  program (machine code instructions)
  data segment:  data (initialized variables, uninitialized variables), then heap (malloc()-allocated objects)
  stack segment: run-time stack

non-precise collection
- conservative collector advantages:
  - almost transparent to the programmer
    - replace malloc with GC_malloc, remove calls to free
    - no need to declare root variables, etc.
- conservative collector disadvantages:
  - false pointers look like roots, preventing reclamation
  - unpredictable run-time memory requirements (almost always more than is really in use)

    actual value              pointer value
    (void *) malloc(32)       (void *) 0x00100130
    (int) 1000000             (void *) 0x000f4240
    (char []) "ptr"           (void *) 0x00727470
    (float) 1.0               (void *) 0x3f800000
    (void *) &printf          (void *) 0x90377270
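The "looks like a pointer" test can be sketched as a range check over a block of words. This is a simplified illustration, not how any particular collector does it: count_potential_roots() is a hypothetical name, the heap is assumed to be one contiguous address range, and real collectors typically apply further checks such as alignment and whether the address falls inside an allocated block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scan n words (for example, a copy of the data or stack segment) and count
   those whose value falls inside the heap, [heap_start, heap_end). Each hit
   is treated as a potential root, whatever the word really is. */
size_t count_potential_roots(const uintptr_t *words, size_t n,
                             uintptr_t heap_start, uintptr_t heap_end)
{
    size_t hits = 0;

    assert(heap_start <= heap_end);
    for (size_t i = 0; i < n; i++) {
        if (words[i] >= heap_start && words[i] < heap_end)
            hits += 1;                  /* looks like a pointer into the heap */
    }
    return hits;
}
```

Applied to the five word values in the table above with a heap spanning 0x00100000 to 0x00200000, only the malloc() result is a hit. If the heap instead started at 0x000f0000, the integer 1000000 (0x000f4240) would also be counted, which is exactly the false-pointer problem.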

generation scavenging
- objects tend to do one of two things:
  - die very quickly (e.g., arithmetic intermediate results)
  - last a long time (e.g., data structures, global environments, etc.)
- generational collectors exploit this by having two memory areas
- new space is where objects are born
  - new space is small, and traced/collected frequently
- objects that outlive several new space collections are tenured
  - they are moved to old space
  - old space is large, and traced/collected rarely
- new space therefore contains only young objects
  - most of which will die during collections
  - and therefore cost very little to trace, mark and compact
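The tenuring policy can be sketched as follows. This is a deliberately simplified model under several assumptions: objects live in a flat array, the reachable flag has already been computed by a tracing pass, TENURE_AGE is an arbitrary threshold, and the names Obj, Space, MinorStats and minor_collection() are hypothetical. It also omits the bookkeeping a real generational collector needs for pointers from old space into new space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TENURE_AGE 3      /* assumption: tenure after surviving three collections */

typedef enum { FREE_SLOT, NEW_SPACE, OLD_SPACE } Space;

typedef struct {
    Space    space;       /* which memory area the object currently occupies */
    bool     reachable;   /* assumed to be set by a preceding trace */
    unsigned age;         /* number of new-space collections survived */
} Obj;

typedef struct {
    size_t collected;     /* young objects reclaimed by this collection */
    size_t tenured;       /* young objects promoted to old space */
} MinorStats;

/* Collect new space only: reclaim dead young objects, age the survivors,
   and tenure those that have survived long enough. */
MinorStats minor_collection(Obj *objs, size_t n)
{
    MinorStats s = { 0, 0 };

    assert(objs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        Obj *o = &objs[i];
        if (o->space != NEW_SPACE) continue;   /* old space is left alone */
        if (!o->reachable) {
            o->space = FREE_SLOT;              /* died young: reclaim */
            s.collected += 1;
        } else if (++o->age >= TENURE_AGE) {
            o->space = OLD_SPACE;              /* survived often enough: tenure */
            s.tenured += 1;
        }
    }
    return s;
}
```

Note that an unreachable object already in old space is not reclaimed here; it waits for one of the rarer old-space collections.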

generation scavenging
[Figure: new objects are allocated in new space; garbage collection and compaction take place in new space and in old space; survivors are tenured from new space to old space]

tagged immediate values to reduce GC overhead
- alignment rules mean pointers to objects are always even
  - actually, multiples of 4 or even 8
- we can use an odd "pointer" to encode 31 bits of information
  - LS bit is set to 0 for pointers, 1 for non-pointers

    bits 31 ... 1                       bit 0
    pointer to object in memory         0
    31 bits of non-pointer value        1

tagged immediate values to reduce GC overhead
- in our language, integer arithmetic allocates objects

    Token *primitive_add(Token *operands)
    {
        return make_integer(operands->first.integer + operands->second.integer);
    }

- if 31-bit integers are sufficient, we can encode them within a pointer

    /* intptr_t is declared in <stdint.h> */
    void *make_integer(int i) { return (void *)(((intptr_t)i << 1) | 1); }
    int   get_integer(void *p) { return (int)((intptr_t)p >> 1); }

    Token *primitive_add(Token *operands)
    {
        return make_integer(get_integer(operands->first) + get_integer(operands->second));
    }

- these are called tagged integers
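A self-contained variant of the tagging scheme, with a predicate that distinguishes tagged integers from real pointers, might look like the sketch below. The names Value, is_tagged_integer() and tagged_add() are hypothetical. The sketch assumes that objects are at least 2-byte aligned and that right-shifting a negative signed value is an arithmetic shift, which the C standard leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

typedef void *Value;      /* either a real object pointer or a tagged integer */

/* Non-zero if v carries an integer (least significant bit set). */
static inline int is_tagged_integer(Value v)
{
    return ((uintptr_t)v & 1u) != 0;
}

/* Encode i in the upper bits of a pointer-sized word and set the tag bit.
   The shift is done on an unsigned value so that negative i is well defined. */
static inline Value make_integer(intptr_t i)
{
    return (Value)(((uintptr_t)i << 1) | 1u);
}

/* Decode a tagged integer. Assumes an arithmetic right shift for negatives. */
static inline intptr_t get_integer(Value v)
{
    assert(is_tagged_integer(v));
    return (intptr_t)v >> 1;
}

/* Add two tagged integers without allocating any memory object. */
Value tagged_add(Value a, Value b)
{
    return make_integer(get_integer(a) + get_integer(b));
}
```

A collector that traces a Value would check is_tagged_integer() first and skip such words, since they do not refer to any object. That skipped allocation and tracing work is the overhead the slide aims to reduce.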

garbage collection summary
- application allocates objects until some threshold is crossed; e.g.:
  - number of objects allocated
  - total size of objects allocated
- GC identifies live objects and garbage (dead objects)
  - object graph tracing
  - conservative scan for pointers
- GC collects dead objects, reclaims their unused storage
  - possibly only on objects younger than a certain generation
- GC tidies up memory to reduce fragmentation
  - compaction (copy or move objects to make them contiguous)
- application continues
- reference counting is an alternative approach with different trade-offs

project: add garbage collection to your language
- check memory usage for your interpreter running this program

    (define x 0)
    (while 1
      (print (assign x (add x 1))))

- Boehm-Demers-Weiser Garbage Collector
  - popular conservative collector for C and C++
  - home page: http://www.hboehm.info/gc/
  - Linux (Debian/Ubuntu): apt-get install libgc-dev
  - MacOS X (via Homebrew): brew install boehmgc
  - Windows (Cygwin): install libgc-devel from setup.exe

project: add garbage collection to your language
- drop-in replacement for malloc()
  - install libgc
  - add #include <gc/gc.h> to your program
  - replace calloc() with GC_malloc() in make_token()
  - call GC_INIT() at the start of main()
  - link your interpreter with -lgc

    Token *make_token(int type)
    {
        Token *token = GC_malloc(sizeof(Token));
        /*... */
    }

    int main(int argc, char **argv)
    {
        GC_INIT();
        /*... */
    }

- the program should now run without consuming all available memory

glossary

allocate: obtain memory from the operating system and give it to the program for use in storing data.

available space: the amount of memory that is available for allocation.

collect (an object): perform any final actions on the object (such as collecting any objects it references) and then reclaim the storage allocated to it.

compact: move live objects together, eliminating spaces between them, to reduce memory fragmentation.

conservative: cautious. In garbage collection, a non-precise collector that makes a cautious estimation of which objects are unreachable. If there is any doubt at all, an object will be considered reachable. This results in safe collection, but less than optimal use of memory.

contiguous: having no gaps.

cycle: a series of pointer references which eventually lead back to the same object or location. In a reference-counting GC, a cycle can artificially maintain a non-zero reference count in all of its objects even though those objects are unreachable and therefore garbage.

dangling pointer: a pointer that can be used by the program in future computation but whose referent object has been collected prematurely. When the pointer is dereferenced unexpected things will happen, including but not limited to the program crashing.

data segment: the area of a program's memory space in which the variables and malloc()-allocated memory reside. Grows as needed towards higher addresses.

deallocate: return memory that is being used to store data from the program to the operating system.

decrement: reduce by one.

double-free: call free() on a pointer twice. The results are unpredictable, including but not limited to the program crashing.

exhaustion (memory or other resource): running out, leaving none available for use by the program.

fragmentation: breaking up of some quantity (such as available memory) into smaller pieces, making it impossible to use all the available resource or any amount of it larger than the largest of those pieces.

garbage: unreachable objects that can never be used by a program for future computation.

garbage collection: deallocation of objects that have become unreachable.

generational collector: a collector that uses two or more distinct spaces, in which objects of different ages are stored and garbage collected. Generational collectors exploit the observation that objects tend to either last a very short time or a very long time.

graph: a structure containing objects (nodes) and references (edges) from objects to other objects.

inaccessible: something that can no longer be accessed. Synonymous with unreachable.

incremental: doing a little work at a time, rather than all the work at once. An incremental collector typically performs a little collection work every time an object is allocated. This prevents long pauses due to the collection of many objects at the same time.

increment: increase by one.

live: not dead; still reachable by the program.

mark phase: the phase of a mark-sweep collector in which the object graph is traced and all reachable objects are marked as live.

mark-and-sweep: a collector in which a mark phase identifies live objects and then a sweep phase deallocates dead objects, possibly compacting the live objects in the process to eliminate fragmentation.

memory leak: failure to deallocate a garbage object and reclaim its storage, preventing that storage from being used again by the program.

memory management: ensuring the efficient use of memory. For example, ensuring that all memory objects are deallocated when they are no longer needed by the program.

new space: the region in a generational collector into which new objects are born. If they have not been collected after several generations (garbage collections) have elapsed, they will be tenured into old space. New space is traced and collected much more frequently than old space.

non-precise: a garbage collector that makes a conservative estimate of which objects are live, rather than attempting to make a precise determination based on tracing.

object: a region of memory that was allocated to the program by malloc().

old space: a region in a generational collector into which objects that have survived for several generations (garbage collections) are tenured. Old space is traced and collected far less frequently than new space.

precise: a collector that traces the object graph to determine exactly which objects are live and which objects are garbage.

reachable: something that the program can access by following a chain of one or more pointers.

reclaim: return memory to the system, e.g., by calling free().

reference counting: identifying garbage by keeping a count of the total number of references that exist to each object. When the count drops to zero, the object is garbage.

root: a variable or other memory location that contains a pointer to an object and which the program can access by name, or by some other mechanism that does not involve following another reference. Objects referenced from roots can therefore be used in future computation and are always considered live.

round-robin: performing a sequence of tasks, or processing pieces of data, one after the other and wrapping around from the last back to the first.

stack segment: the region of memory in which the program's call stack is located. Usually grows towards lower memory addresses.

sweep phase: a phase in a mark-sweep collector during which dead objects are deallocated and their storage reclaimed. Live objects can also be compacted during this phase to eliminate fragmentation.

tagged integer: an integer whose value is encoded in an object pointer. Tagged integers are distinguished from pointers by having some property that is impossible for a pointer, such as the least significant bit being set (an odd address) in a system where the minimum alignment of memory objects is 4 or 8.

tenure: to move an object from new space to old space when it has survived a certain number of generations and is therefore likely to survive for many more.

text segment: the region of a program's memory in which the machine code instructions (the text of the program, written in machine language) are stored. This region is usually of a fixed size and often located at the start of memory.

trace: to follow a graph of object references, starting from one or more roots, in order to calculate the transitive closure of reachable objects in a program's memory.

transitive: a mathematical relation having the property that whenever A is related to B and B is related to C, then A is also related to C. In computer memory, object reachability (by following pointers) is a transitive relationship.

transitive closure: the set of all items that can be related, by a transitive relation, to some initial item. In computer memory, the transitive closure of reachable objects starting from the roots is the set of all reachable objects.

tree: a graph in which cycles have been removed.

unreachable: an object that is inaccessible to the program. No reference to the object exists either in the program's variables or in any other object that is reachable by the program.

write barrier: a piece of code that is executed before and/or after performing a write operation on a location of importance. In garbage collection, a write barrier can be used to detect a significant mutation of a pointer variable for purposes such as reference counting.