Lecture 22: Garbage Collection 10:00 AM, Mar 16, 2018

Similar documents
Lecture 14: Exceptions 10:00 AM, Feb 26, 2018

Lecture 23: Priority Queues 10:00 AM, Mar 19, 2018

Lecture 24: Implementing Heaps 10:00 AM, Mar 21, 2018

1: Introduction to Object (1)

Lecture 23: Priority Queues, Part 2 10:00 AM, Mar 19, 2018

Lecture 21: The Many Hats of Scala: OOP 10:00 AM, Mar 14, 2018

Lecture 5: Implementing Lists, Version 1

Lecture 23: Priority Queues, Part 2 10:00 AM, Mar 19, 2018

Garbage Collection. Steven R. Bagley

Lecture 16: HashTables 10:00 AM, Mar 2, 2018

Lab 9: More Sorting Algorithms 12:00 PM, Mar 21, 2018

Burning CDs in Windows XP

CMSC 330: Organization of Programming Languages

The Dynamic Typing Interlude

CMSC 330: Organization of Programming Languages

Lecture 11: In-Place Sorting and Loop Invariants 10:00 AM, Feb 16, 2018

Lecture Notes on Garbage Collection

Lecture 17: Implementing HashTables 10:00 AM, Mar 5, 2018

Lecture 7: Implementing Lists, Version 2

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

Homework 2: Imperative Due: 5:00 PM, Feb 15, 2019

Lab 10: Sockets 12:00 PM, Apr 4, 2018

CS 167 Final Exam Solutions

Lecture 8: Iterators and More Mutation

EXCEL + POWERPOINT. Analyzing, Visualizing, and Presenting Data-Rich Insights to Any Audience KNACK TRAINING

Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi.

Lecture 12: Dynamic Programming Part 1 10:00 AM, Feb 21, 2018

Unit 9 Tech savvy? Tech support. 1 I have no idea why... Lesson A. A Unscramble the questions. Do you know which battery I should buy?

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

Heap Management. Heap Allocation

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

Homework 4: Hash Tables Due: 5:00 PM, Mar 9, 2018

(Provisional) Lecture 08: List recursion and recursive diagrams 10:00 AM, Sep 22, 2017

Total Score /15 /20 /30 /10 /5 /20 Grader

Intro. Scheme Basics. scm> 5 5. scm>

A few important patterns and their connections

Plan. A few important patterns and their connections. Singleton. Singleton: class diagram. Singleton Factory method Facade

Chapter 3 Memory Management: Virtual Memory

1.7 Limit of a Function

CS 241 Honors Memory

Lecture 1: Overview

Data Structure Series

Crash Course in Modernization. A whitepaper from mrc

XP: Backup Your Important Files for Safety

Lecture 27: (PROVISIONAL) Insertion Sort, Selection Sort, and Merge Sort 10:00 AM, Nov 8, 2017

Ext3/4 file systems. Don Porter CSE 506

1 Dynamic Programming

(Python) Chapter 3: Repetition

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

CS 31: Intro to Systems Virtual Memory. Kevin Webb Swarthmore College November 15, 2018

Introduction. In this preliminary chapter, we introduce a couple of topics we ll be using DEVELOPING CLASSES

CS1 Lecture 3 Jan. 22, 2018

In fact, as your program grows, you might imagine it organized by class and superclass, creating a kind of giant tree structure. At the base is the

CS162 Operating Systems and Systems Programming Lecture 11 Page Allocation and Replacement"

CS 4349 Lecture October 18th, 2017

Computer Science 210 Data Structures Siena College Fall Topic Notes: Recursive Methods

COPYRIGHTED MATERIAL. An Introduction to Computers That Will Actually Help You in Life. Chapter 1. Memory: Not Exactly 0s and 1s. Memory Organization

Thread Safety. Review. Today o Confinement o Threadsafe datatypes Required reading. Concurrency Wrapper Collections

Lecture Notes on Garbage Collection

Memory Allocation. Static Allocation. Dynamic Allocation. Dynamic Storage Allocation. CS 414: Operating Systems Spring 2008

CS 4120 Lecture 37 Memory Management 28 November 2011 Lecturer: Andrew Myers

CS 31: Introduction to Computer Systems : Threads & Synchronization April 16-18, 2019


Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23

Week - 01 Lecture - 04 Downloading and installing Python

Animations involving numbers

Adding content to your Blackboard 9.1 class

How to Get Your Inbox to Zero Every Day

Reference Counting. Reference counting: a way to know whether a record has other users

CSCI341. Lecture 22, MIPS Programming: Directives, Linkers, Loaders, Memory

Advanced Programming & C++ Language

Reliable programming

CS558 Programming Languages

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 15: Caching: Demand Paged Virtual Memory

Last week: Breadth-First Search

Engine Support System. asyrani.com

LOOPS. Repetition using the while statement

PROCESS VIRTUAL MEMORY. CS124 Operating Systems Winter , Lecture 18

CS1 Lecture 3 Jan. 18, 2019

It s possible to get your inbox to zero and keep it there, even if you get hundreds of s a day.

Text Input and Conditionals

Rescuing Lost Files from CDs and DVDs

Racket Style Guide Fall 2017

CS 553 Compiler Construction Fall 2006 Project #4 Garbage Collection Due November 27, 2005

Speed Up Windows by Disabling Startup Programs

CS143 Final Spring 2016

ò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now

CS1 Lecture 5 Jan. 26, 2018

Lecture Notes on Advanced Garbage Collection

CS2 Algorithms and Data Structures Note 10. Depth-First Search and Topological Sorting

Lecture 14: Cache & Virtual Memory

(Refer Slide Time: 02.06)

CS558 Programming Languages

(Refer Slide Time: 01:25)

COSC 2P91. Bringing it all together... Week 4b. Brock University. Brock University (Week 4b) Bringing it all together... 1 / 22

(Provisional) Lecture 20: OCaml Fun!

How to fill out forms, forms, & add your own pictures / notes to forms (Mobile 2.0)

the NXT-G programming environment

Transcription:

CS18 Integrated Introduction to Computer Science Fisler, Nelson Lecture 22: Garbage Collection 10:00 AM, Mar 16, 2018 Contents 1 What is Garbage Collection? 1 1.1 A Concrete Example.................................... 1 1.2 A Second Example..................................... 2 1.3 Defining Garbage Formally................................ 3 2 How Does Garbage Collection Work? 3 2.1 Mark-and-sweep Collection................................ 4 3 How Much Does Mark-and-Sweep Cost? 4 4 Creating More Garbage 5 Objectives By the end of these notes, you will know: ˆ What garbage collection is and how it works By the end of these notes, you will be able to: ˆ Organize your code to enable more garbage to be collected 1 What is Garbage Collection? As programs run, they allocate objects on the heap. Over time, memory can fill up with objects that programs are no longer using. Thie becomes a problem if memory gets sufficiently full that new data cannot be created because there is no place to put it. When memory starts to get full, the system that runs programs either has to find some unused memory or halt the program. Garbage collection is the process of finding objects (or any data, if you are in a non-oo language) that aren t being used and freeing up the memory they occupied for allocating new objects. In most modern languages, the execution system performs garbage collection automatically, though there are still some languages in regular use (most notably C) in which programmers must remove garbage themselves. Today, we focus on understanding, and working with, automatic garbage collection.

1.1 A Concrete Example Assume that we have a Book class and a Main class that creates a couple of Book objects: class Book (val title: String, val author: String) { object Main extends App { new Book("The Hobbit", "Tolkien") val b = new Book("The Cat in the Hat", "Seuss") println(''end of main'') What does the heap look like just before executing the println? It has two Book objects. One of them is associated with a known name, but the other is not. If we wanted to access the fields of the Hobbit object, we wouldn t have a way to do it, because there is no way to get to the object from within the program. That suggests that the Hobbit object is garbage because it isn t associated with a known name. 1.2 A Second Example Let s consider a slightly more complicated example to dig deeper into the nature of garbage: object Library { def ordercopy(b : Book) = { val testcopy = new Book(b.title, b.author) // point A new Book(b.title, b.author) object Main extends App { val b1 = new Book("A Brief History of Time", "Hawking") val holdings = List(b1, Library.orderCopy(b1), Library.orderCopy(new Book("The Martian", "Weir"))) // point B println(holdings) Take a moment and draw the heap and stack at the first time execution reaches Point A. What is garbage at this point? Nothing, because both Book objects are associated with names. Notice that one object has two names (b1 from Main and b from ordercopy). Now, update the heap and stack to how they are when execution has finished the first call to ordercopy (while constructing the list in Main): there are more objects now, but fewer names: 2

Finally, what does memory look like when we reach Point B? There are even more objects, and a list that refers to some objects. The ones with green circles are reachable through names; the ones without green circles are garbage. This example illustrates ways in which garbage arises in programs. At point A, execution is inside the ordercopy function, so the stack (i.e., the known names space) contains names b and testcopy. At point B, the function has finished executing, so neither of those names is still active (on the stack). 1.3 Defining Garbage Formally With these two examples in mind, we can now define when a piece of data is considered garbage: An object or datum on the heap is garbage if there is no way to reach it from the stack. 1 Our examples have illustrated two concrete ways that data become garbage: ˆ they were never given names or stored in other named data within a program ˆ they were referenced solely inside a function which has finished executing Question to think about: Our second example calls ordercopy twice. Does that mean that the testcopy object is only garbage after the second call to testcopy? 1 under the hood, there is actually one more place that can have references to heap data. Computers also store some data in quick-access locations called registers; you ll learn about those if you take cs33. 3

2 How Does Garbage Collection Work? We now know what garbage is, and we have a sense of how to identify it (follow references from the stack). But how does a language implementation track garbage, and more interestingly, how does it take out the trash, as it were? There are several different algorithms for taking out the trash: we ll look at one (called mark and sweep), to give you a sense of what is involved. 2.1 Mark-and-sweep Collection In a mark-and-sweep collector, when the runtime system is out of room for creating new objects, it temporarily stops executing the program, marks all the non-garbage (a.k.a, the live data), then walks across memory looking for garbage (non-marked data). It builds a list of memory locations that can be reused, which gets searched every time you create new data (with new, List, Map, etc). Let s see this in action on our second example program. Assume that execution is at line point B when garbage collection gets triggered. The garbage collector will traverse the stack (it s just a data structure that can be iterated over, after all), marking live data. If it finds a reference to a datum that itself references data, such as a list, it marks those referenced data as well. The garbage collector will then walk all of memory (well, all that is set aside for the heap) to find the memory that can be reused. To see this in action, watch this portion of the lecture capture video. 3 How Much Does Mark-and-Sweep Cost? So how expensive is mark-and-sweep? Well, we have to walk the stack and touch all data that you can reach from the stack to mark it as live. The marking phase potentially reaches every item in memory, so the worst-case time of marking is proportional to the amount of memory that the runtime system uses for the stack and heap combined. The sweeping phase always visits the entire heap, so its runtime is proportional to the amount of memory used for the heap. On a modern machine, that usually amounts to several GB. What were the inventors of mark-and-sweep thinking??? Mark-and-sweep was invented in 1962. Back then, memory was on the order of a couple of hundred kilobytes (roughly the space needed to store your.java files for showdown). Walking all of memory isn t that bad when you don t have a lot of memory. On today s systems, mark-and-sweep is prohibitively expensive. Fortunately, we ve made a lot of progress in 50+ years, and there are much faster algorithms that effectively balance space and time complexity. To learn more, take cs173 (Programming Languages), which devotes a couple of lectures to modern garbage collection algorithms. Modern garbage collection uses sophisticated algorithms to achieve a strong balance between time and space complexity. Garbage collection is efficient and powerful. Unless you are in a highly specialized context with very critical space or time requirements, garbage collection is a good solution to managing allocation and reclaimation of memory. 4

The Cost of Acquisition Despite its time overhead, mark-and-sweep is also good for illustrating another very important point about garbage collection (which is also true in modern systems). We don t only incur costs of collecting garbage the runtime also has to spend effort finding an open memory location when you create new data. Remember that mark-and-sweep created this list of open memory locations? The runtime walks this list to find a space that is large enough for the datum you are trying to store. The upshot is that you don t just pay to take out garbage, you also incur costs to acquire new data (a point reiterated in many self-improvement articles, but that s another issue...). Summary The core idea of garbage collection is straightforward: when an object on the heap can no longer be reached through a name (on the stack), the space for that object can be reused for something else. Garbage collection and memory allocation go hand in hand, as the allocation of space needs to know what space is free, and the garbage collector determines which spaces are free. Modern collectors do NOT walk all of memory as mark-and-sweep does. We showed that algorithm because it is a simple initial presentation of the interplay between collection and allocation though. 4 Creating More Garbage Understanding how data becomes garbage is critical to you as a programmer, because the way you structure your code determines when a datum becomes garbage. If you are working with limited memory or large data, this can be the difference between your program finishing execution or crashing with an out-of-memory exception. So now, let s turn to the programmer side of things. Here is a sample program for managing catalogs of items (this is for an online store, but it could just as well be a music library, a collection of webpages, etc). This program reads a catalog in from a csv file, filters out a subset of the catalog, then performs multiple computations to analyze the subset. After filtering out the subset, the program doesn t consult the complete catalog again. Look at the code in figure 1. Ask yourself: ˆ When do each of the inventory and subset of games become garbage according to our technical definition of garbage? ˆ When are each of the inventory and subset of games effectively garbage relative to the computation we are trying to perform (meaning the point at which we know we won t use it again)? To program well against garbage collection, you want the answers to these two questions to align for large pieces of data (like the inventory). As this code is written, the inventory never becomes garbage in the eyes of Scala, even though it is effectively garbage after the games list is created. Could we have written it differently to align these two answers better? Think back to earlier in the lecture: what are ways that programs create garbage? With our second example, we saw garbage created when we exited the ordercopy function. Can we move the inventory creation into a function, so that it becomes garbage when the function finishes? We already create the inventory inside the readcatalog function, so simply creating data inside a function isn t the whole picture. We have to finish using large data inside a function to get it to appear as garbage to the runtime system. 5

import scala.io.source class Item(descr : String, format : String, timesordered : Integer) { def informat(f: String): Boolean = { this.format == f override def tostring(): String = { "Item[" + descr + ", " + format + ", " + timesordered + "]" class Catalog { // turn a three - element csv row (comma - separated) into a string // in an ideal world, this would have error checking that csv row has three elements private def csvstringtoitem(cs : String): Item = { val s = cs.split(",").map(_.trim) new Item(s(0), s(1), s(2).toint) // Convert a catalog stored as a csv file into a list of items // For example, one row of the csv file might contain // Bread, Food, 4 def readcatalog(filename : String): List[Item] = { val bufferedsource = io.source.fromfile(filename) val items = bufferedsource.getlines.map(csvstringtoitem).tolist bufferedsource.close items // summarize info on a collection of items def summarize(items : List[Item]) = { println("the collection has " + items.length + " items") object Main extends App { val c = new Catalog() // read in the catalog (assume it could have millions of entries) val inventory: List[Item] = c.readcatalog("itemdata.csv") c.summarize(inventory) // focus on a subset of the catalog val games = inventory.filter(i => i.informat("game")) c.summarize(games) // now do all the remaining computation with this subset of the catalog // - compute best three - selling games // - compute various other analytics on games println("catalog Analysis Complete") Figure 1: An initial program to analyze a subset of a catalog 6

// one function to perform the read and the filter def readcatalogset(filename : String, format : String): List[Item] = { val inventory = readcatalog(filename) inventory.filter(i => i.informat(format)) With this code organization, which runs the same computations in the same order as our original program, the inventory stops being referenced once readcatalogset finishes. The value returned from readcatalogset is just the filtered list, not the inventory. So inventory isn t held onto in the main program, and hence can be reclaimed as garbage as needed once the list of games has been filtered out. Please let us know if you find any mistakes, inconsistencies, or confusing language in this or any other CS18 document by filling out the anonymous feedback form: http://cs.brown.edu/ courses/cs018/feedback. 7