Worksheet #4. Foundations of Programming Languages, WS 2014/15. December 4, 2014

In this exercise we will re-examine the techniques used for automatic memory management and implement a Cheney-style garbage collector for AttoVM.

0.1 Collaboration and Submission

The maximum number of points you can get for this exercise is 100; if your points for individual tasks sum up to a greater amount, your score will be capped at 100. You can omit some questions and still get full credit. Specifically, sections labelled [warm-up] don't give you any credit and don't need answering, though they are strongly recommended as a means to prepare you for the actual worksheet challenges. Also, questions marked with a do not need answering.

Some questions ask you for code changes. Make sure to submit code changes for different sections separately. Other questions ask you for textual answers. Put all of your textual answers into a single text file (ANSWERS). Submit this file SEPARATELY from your source code, and do not commit it to the revision control system.

You may work alone or as a team of two. If you are working in a team, make sure to write down the name of your partner in the ANSWERS file.

Initial submissions are due at 20:00 on Wednesday, December 10th. This is for code only. Your solution need not be complete, but it should show progress. After your submission, we will send you two (partial) solutions from other students to look at. You can write feedback on those solutions, which we will distribute back to your peers. The TA will discuss the details of this process with you.

Final submissions are due at 20:00 on Wednesday, December 17th. This is for code and the ANSWERS file. If you changed your code after the initial submission, you must explain the reasoning behind your changes in the ANSWERS file. You can cite code that you reviewed, feedback that you got during the review process, additional tests that you ran, or suitable documentation as sources. You must give a technical explanation, in your own words, of why you chose to change your code (e.g., "I had to add a separate check here to catch whether $v0 overflows during the addition, since that is what the specification asks for. In that case, my code jumps to label foo, which sets $v0 to zero.").

0.2 AttoVM installation

Your copy of AttoVM is again stored in a git revision control system. The repositories for the exercises will be:

git@sepl.cs.uni-frankfurt.de:mps/teamx/3-1
git@sepl.cs.uni-frankfurt.de:mps/teamx/3-2
git@sepl.cs.uni-frankfurt.de:mps/teamx/3-3

(Substitute your own team name for teamx.)

This copy of AttoVM has been extended to generate helper information, such as stack maps, to help you implement garbage collection. Appendix A describes information you may need for this exercise and points you to the relevant modules. To simplify your implementation, we have already provided a skeleton implementation for you to use. You MAY use alternative implementation strategies, as long as you implement a fully operational Cheney-style copying garbage collector. This exercise makes heavy use of pointers. You may find it helpful to review the semantics of pointers in Appendix B.

Valgrind: Use valgrind to check your code for memory errors. When handling raw pointers, it is easy to cause a segmentation fault; valgrind not only gives you meaningful error messages but also helps you detect memory errors early.

1 Finding the Root Set [30 points]

Copying collection consists of several steps, as discussed in the slides and the lecture module on Cheney's Copying Collector. We begin with the search for the root set. AttoVM has a number of useful properties:

- No heap references are kept in registers during heap allocation.
- No part of the run-time system may maintain a reference to any object on the AttoVM heap.

Thus, the root set consists exclusively of references on the stack and in the global variables.

a. To get started, compile AttoVM and download the benchmark program suite from http://www.sepl.cs.uni-frankfurt.de/2014-ws/m-ps/heap-samples.tar.gz. Each of the atl programs in the archive requires more RAM than AttoVM pre-allocates. Run each program to make sure that AttoVM crashes with an out-of-memory message.

b. In this exercise, you will be modifying heap.c exclusively. (You may modify other parts of the system if this helps you, e.g., to print out debug messages, but it is definitely not necessary.) Locate the code in heap.c that prints the offending error message.

c. Take a moment to familiarise yourself with the operations in heap.c and heap.h. The purpose of this module is to facilitate dynamic heap allocation. heap_init() and heap_free() initialise and free the heap, respectively. Heap allocation uses special operating system calls that map heap memory to a particular memory address (0x10000000000 and above). The allocated memory is then split in half, with each half assigned to a semi-space. Appendix A.2 explains semi-spaces in more detail.

d. For testing, set up function gc_move() so that it prints out something. (As a start, a simple string might suffice, but later you may find it helpful to print out the parameter address, e.g., printf("move(%p) points to %p\n", memref, *memref).)
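Step c and Appendix A.2 describe a heap split into two equal semi-spaces, where allocation bumps a free pointer through to-space until it reaches the end. The following is a minimal sketch of that scheme, not AttoVM's heap.c: the struct and the functions heap_setup(), heap_bump_alloc() and heap_flip() are hypothetical names, and malloc stands in for the operating-system mapping call that AttoVM uses to place its heap at 0x10000000000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical heap state; AttoVM's heap.c keeps its own variables. */
typedef struct {
    unsigned char *base;       /* start of the whole heap block */
    size_t half;               /* size of one semi-space in bytes */
    unsigned char *to_start;   /* to-space: new objects are allocated here */
    unsigned char *to_end;
    unsigned char *from_start; /* from-space: only read during collection */
    unsigned char *free_ptr;   /* bump pointer, always in [to_start, to_end] */
} semispace_heap;

/* Reserve `total` bytes and split them into two equal semi-spaces.
   malloc replaces the OS mapping call that AttoVM uses (assumption). */
static int heap_setup(semispace_heap *h, size_t total) {
    h->base = malloc(total);
    if (h->base == NULL) return -1;
    h->half = total / 2;
    h->to_start = h->base;
    h->to_end = h->base + h->half;
    h->from_start = h->base + h->half;
    h->free_ptr = h->to_start;
    return 0;
}

/* Bump allocation: return the next `bytes` of to-space (rounded up to
   whole 8-byte words), or NULL when to-space is exhausted, which is the
   point where a collection would be triggered. */
static void *heap_bump_alloc(semispace_heap *h, size_t bytes) {
    assert(h->free_ptr <= h->to_end);
    bytes = (bytes + 7) & ~(size_t)7;
    if ((size_t)(h->to_end - h->free_ptr) < bytes) return NULL;
    void *p = h->free_ptr;
    h->free_ptr += bytes;
    return p;
}

/* Swap the roles of the two halves at the start of a collection: the full
   to-space becomes from-space, and allocation restarts in the other half. */
static void heap_flip(semispace_heap *h) {
    unsigned char *old_to = h->to_start;
    h->to_start = h->from_start;
    h->to_end = h->to_start + h->half;
    h->from_start = old_to;
    h->free_ptr = h->to_start;
}
```

A NULL result from heap_bump_alloc() marks where a real allocator would run the collector; heap_flip() only performs the role swap, after which the copying phase refills the new to-space with the surviving objects.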

e. Implement gc_rootset_static(): it should iterate over all global variables and call gc_move() to relocate all variables that are of type TYPE_OBJ (of course, right now gc_move() will only print out debug information). You can use information from the variable img to help you find the globals. Specifically, img is the runtime image, containing all information needed to start up the compiled program. When garbage collection is invoked, you are of course already within that compiled program, but most of the information is still accurate. Check runtime.h for a full description. For this exercise, you will only need:

- img->globals_nr (number of global variables)
- img->globals (symbols for each global variable)
- img->static_memory (memory containing the global variables)

The symbols in the int array img->globals reference the symbol table. Using symtab_lookup() (symbol-table.h) you can look up each symbol's definition. You can then use SYMTAB_TYPE() to check whether the symbol has type TYPE_OBJ. If so, the corresponding global variable contains either an object or NULL, and if it contains an object, the object must be moved and the global variable updated to point to the object's new location. Make the system invoke your gc_rootset_static(), and make sure that the output matches your expectations.

f. Implement gc_rootset_stack(). For this operation, you may find it helpful to recall calling conventions on x86-64, especially Section 4.3 in http://www.sepl.cs.uni-frankfurt.de/2014-ws/m-ps/asm-docs.pdf. Note that frame pointers always point to backups of their parent frame pointers, so they form a linked list in RAM. You may find it helpful to break down the process into the following steps:

(i) Find all stack frames (activation records) from the stack frame that invoked garbage collection down to heap_root_frame_pointer, which is the frame pointer of the loader program that started AttoVM execution.
(ii) For each stack frame, determine the subroutine that called this function and obtain its stack map. You can use stackmap_get() from stackmap.h to access AttoVM's pre-computed stack maps. Note that stackmap_get()'s first parameter takes a return address (not a frame pointer). Hint: Since stackmap_get() gives you the stack map for the code containing the return address, make sure that you associate its stack map with the correct stack frame!

Make the system invoke your gc_rootset_stack().

g. Validate that gc_rootset_stack() produces the expected output. In particular, most of your calls to stackmap_get() should be successful (exceptions being non-attol stack frames, such as the loader or the heap allocation function), and all addresses that the stack frame indicates to be objects should be either NULL or point into from-space.

2 Move [30 points]

a. Implement the gc_move() operation, as described in the slides for Cheney's collector. Note the following resources:

- object_size() in heap.c
- memcpy(d, s, n), which copies n bytes from the address that s points to into the memory that d points to.

Hint: heap.c already contains some useful helper functions. You may want to search for functions that contain the term "forwarding pointer".
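Below is a minimal sketch of the move step on a toy heap, assuming the object layout from Appendix A.4 (one classref word followed by 8-byte fields). How AttoVM actually marks forwarded objects is defined by the helpers in heap.c; here, as an assumption, a forwarded object has its classref word overwritten with the new address plus a set low bit, which is free because class descriptors are 8-byte aligned. The names toy_class, toy_object, toy_gc and toy_gc_move() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical class descriptor: only the field count matters here. */
typedef struct { size_t nfields; } toy_class;

/* Layout as in Appendix A.4: a classref word, then 8-byte fields. */
typedef struct {
    uintptr_t classref;  /* toy_class *, or a tagged forwarding pointer */
    uint64_t field[];
} toy_object;

/* Assumption: a set low bit in classref means "already forwarded". */
#define FORWARD_TAG ((uintptr_t)1)

typedef struct {
    uintptr_t from_start, from_end; /* bounds of the space being evacuated */
    unsigned char *free_ptr;        /* next free byte in to-space */
} toy_gc;

static size_t toy_object_size(const toy_object *o) {
    const toy_class *c = (const toy_class *)o->classref;
    return sizeof(uintptr_t) + c->nfields * sizeof(uint64_t);
}

/* Cheney-style move. `memref` is the address of a slot (global, stack
   slot or field) that holds an object reference; the slot is updated. */
static void toy_gc_move(toy_gc *gc, toy_object **memref) {
    toy_object *obj = *memref;
    uintptr_t addr = (uintptr_t)obj;
    if (obj == NULL || addr < gc->from_start || addr >= gc->from_end)
        return;                                 /* NULL or not in from-space */

    if (obj->classref & FORWARD_TAG) {          /* copied earlier */
        *memref = (toy_object *)(obj->classref & ~FORWARD_TAG);
        return;
    }
    size_t size = toy_object_size(obj);
    toy_object *copy = (toy_object *)gc->free_ptr;
    assert(((uintptr_t)copy & FORWARD_TAG) == 0); /* tag bit must be free */
    memcpy(copy, obj, size);                    /* classref and all fields */
    gc->free_ptr += size;
    obj->classref = (uintptr_t)copy | FORWARD_TAG; /* leave forwarding pointer */
    *memref = copy;
}
```

Moving the same object through a second slot only follows the forwarding pointer, so every reference ends up at the single copy. The copy's own fields still point into from-space at this stage; fixing them up is the job of the scan step in Section 3.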

b. Use valgrind to ensure that your code does not introduce any memory errors yet.

3 Copying Collection [30 points]

a. Implement the DoScan operation from the slides as the gc_scan() operation in heap.c. To determine the layout of an object in memory (cf. object_t in object.h), you can use the object maps stored in their type descriptors (Appendix A.4).

b. Confirm that your extended version of AttoVM is able to finish executing all benchmarks in the archive.

A AttoVM Background

A.1 Variables, Memory and AttoVM

In most VMs, fields (on the heap), stack-dynamic variables and static globals can be one of two things: values, such as integers or floating-point numbers, and objects, meaning that they are either NULL or point to a valid heap address. The heap memory that objects point to follows the usual idea of a homogeneous memory layout (Appendix A.4). In your version of AttoVM, objects are identified by the type TYPE_OBJ. All other fields, globals, local variables etc. are integers (type TYPE_INT). Arrays, strings, and other objects are stored as objects. Both objects and integers are stored in 64-bit words (8 bytes).

A.2 Semi-spaces

Cheney-style copying garbage collection splits the heap into two semi-spaces, called to-space (the space that we allocate to) and from-space (the space that we copy from during garbage collection). AttoVM currently allocates memory by increasing the heap free pointer. The heap free pointer always points between the beginning and end of to-space, and as soon as it hits the end of to-space, garbage collection is triggered.

A.3 AttoVM Bit Vectors

AttoVM bit vectors store a sequence of bits (0 or 1). Their API is described in bitvector.h. For this exercise, you will only need two operations:

- bitvector_size(bitvector), which returns the number of bits stored in the bit vector, and
- the is-set check, as in BITVECTOR_IS_SET(bitvector, bitnr), which returns zero if the bit is not set and one otherwise.
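The DoScan step of Section 3 and the bit-vector check from A.3 fit together roughly as sketched below: the scan pointer walks to-space object by object, and for each field whose object-map bit is set it hands that field's slot to move. Everything here is a hypothetical stand-in: toy_bitvector mimics bitvector_size() and BITVECTOR_IS_SET, the class descriptor holds only an object map with one bit per field, and the move step is passed in as a callback so that the loop does not depend on a particular gc_move(). Irregular objects (arrays and strings) are not handled; they would need the rules from A.4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an AttoVM bit vector. */
typedef struct {
    size_t nbits;
    const uint8_t *bits;   /* bit i lives in bits[i / 8], position i % 8 */
} toy_bitvector;

static size_t toy_bitvector_size(const toy_bitvector *bv) { return bv->nbits; }

/* Returns 1 if bit i is set and 0 otherwise, like BITVECTOR_IS_SET. */
static int toy_bitvector_is_set(const toy_bitvector *bv, size_t i) {
    return (bv->bits[i / 8] >> (i % 8)) & 1;
}

/* Hypothetical class descriptor: object map with one bit per field. */
typedef struct { toy_bitvector object_map; } toy_class;

/* Signature of the move step; `slot` holds a (possibly NULL) reference. */
typedef void (*move_fn)(void *ctx, uint64_t *slot);

/* DoScan over regular objects: walk to-space from `scan` to *free_ptr and
   move every field that the object map marks as an object. */
static void toy_gc_scan(unsigned char *scan, unsigned char **free_ptr,
                        move_fn move, void *ctx) {
    while (scan < *free_ptr) {              /* move may advance *free_ptr */
        uint64_t *obj = (uint64_t *)scan;   /* obj[0] is classref */
        const toy_class *c = (const toy_class *)(uintptr_t)obj[0];
        size_t n = toy_bitvector_size(&c->object_map);
        for (size_t i = 0; i < n; i++)
            if (toy_bitvector_is_set(&c->object_map, i))
                move(ctx, &obj[1 + i]);     /* field[i] is at word 1 + i */
        scan += (1 + n) * sizeof(uint64_t);
    }
    assert(scan == *free_ptr);              /* objects must tile to-space */
}

/* Demo callback: counts the slots that the scan hands to move. */
static void count_slot(void *ctx, uint64_t *slot) {
    (void)slot;
    (*(size_t *)ctx)++;
}
```

Because move may copy further objects and thereby advance the free pointer, the loop re-reads *free_ptr on every iteration; the collection is complete once the scan pointer catches up with the free pointer.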

Figure 1: Example of arrays, pointers, and pointer arithmetic in C. The grid represents individual bytes. Assumptions: sizeof(int) = 4, sizeof(char) = 1.

    int a[4] = ...;
    int *a0p = &a[0];
    int *a1p = a0p + 1;
    int *a2p = a0p + 2;
    char *cp = (char *)a2p;
    char *cp3 = cp + 3;

A.4 AttoVM Objects and Object Maps

AttoVM objects fall into two categories:

- Regular objects (most objects)
- Irregular objects (only arrays and strings)

Both regular and irregular objects use the following layout:

    class_t classref   Dynamic type descriptor (8 bytes)
    field[0]           First field, if allocated (8 bytes)
    field[1]           Second field, if allocated (8 bytes)
    ...

The dynamic type descriptor classref is described in class.h. It maintains a hashtable for selector lookups, the virtual method table, and the object map. For regular objects, this object map describes for each field whether that field stores an object (1) or not (0). The object map is again a bit vector (see above).

For irregular objects, the fields follow these rules:

- strings: Strings contain no sub-objects. Their first field contains the number of characters; the string body is encoded in subsequent fields. Given the string length, we can compute how much space the string uses up in memory (cf. object_size()).
- arrays: The first field (index 0) of an array stores how many entries there are in the array. All subsequent fields store those exact entries. All array entries are objects.[1]

[1] This is actually configurable, but for this exercise we assume the default, i.e., objects.

B Pointers

Garbage collection makes heavy use of pointers. Keeping pointers and pointees apart can take some practice, and even experienced programmers occasionally mix up the various levels of abstraction involved in pointer handling. Thus, take care to think about what pointers you are dealing with. Recall the C primitives:

int* p; declares p to have the type of a pointer to an int. This means that the variable p has one binding (the storage binding) that is able to store arbitrary memory addresses, and another binding (the value binding) that represents the current address stored in p.
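The layout in A.4 determines how many bytes an object occupies, which is what object_size() in heap.c computes for the move step. The functions below are a hypothetical sketch of that computation, not AttoVM's code. Regular and array sizes follow directly from the layout; for strings, the encoding of the body is an assumption (one byte per character, packed and padded to a whole word, no terminator), so check object_size() before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORD ((size_t)8)  /* classref and every field take one 64-bit word */

/* Regular object: classref plus one word per field. */
static size_t regular_size(size_t nfields) { return WORD * (1 + nfields); }

/* Array: classref, the length in field[0], then one word per entry. */
static size_t array_size(uint64_t length) { return WORD * (2 + (size_t)length); }

/* String: classref, the length in field[0], then the characters.
   ASSUMPTION: one byte per character, packed and padded to whole words,
   no terminator; object_size() in heap.c is authoritative. */
static size_t string_size(uint64_t nchars) {
    size_t body_words = ((size_t)nchars + WORD - 1) / WORD;
    return WORD * (2 + body_words);
}

/* Size of an irregular object whose length is stored in field[0]. */
static size_t irregular_size(const uint64_t *obj, int is_string) {
    assert(obj != NULL);
    uint64_t n = obj[1];   /* obj[0] is classref, obj[1] is field[0] */
    return is_string ? string_size(n) : array_size(n);
}
```

For example, under these assumptions a regular object with two fields takes 24 bytes and an array with three entries takes 40 bytes.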

In C, assigning to or reading from p accesses its storage binding. However, there are also ways to access the value binding, described below.

*p accesses a pointer's value binding. If p is int*, then *p is an int. We can assign to and read from *p, thereby accessing the variable that p points to. Note that pointers may be pointers to pointers etc., in which case expressions such as **p may arise.

Let i be an int. Then &i is of type int * and represents the address at which the variable i is stored in memory. For example, if p is int*, as above, then &p is of type int**.

The actual in-memory size of a C data type τ can be computed with sizeof(τ). On x86-64 machines, sizeof(τ *) = 8 for any pointer type, and sizeof(char) = sizeof(unsigned char) = 1 (the latter holds on every platform). For that reason, unsigned char * and char * are often used as pointers when we want to operate byte-wise.

We can cast between pointer types freely. If you have a void *z and want to read a byte from it, you can write *(unsigned char *)z.

Pointer arithmetic takes place when you add an integer to a pointer. In that case, the memory address changes by that integer multiplied by the size of the type the pointer points to. So if void **p points to 0x1000, then p + 1 points to 0x1008, but if char *q points to 0x1000, then q + 1 points to 0x1001.

Figure 1 illustrates a brief C example and the state of all pointers at the end. You can find tutorials on C pointers in various places,[2] including a discussion in Kernighan & Ritchie's language manual.

[2] Such as http://pw1.netcom.com/~tjensen/ptr/pointers.htm, especially Chapters 1 and 5.
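The helpers below replay the pointer arithmetic from Appendix B and Figure 1 so the byte offsets can be checked directly; their names are illustrative, not part of AttoVM. byte_offset() measures distances through char pointers, so its results are in bytes regardless of the original pointer type.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes from `base` to `p`, measured through char pointers
   (sizeof(char) == 1). Both must point into the same object. */
static ptrdiff_t byte_offset(const void *base, const void *p) {
    return (const char *)p - (const char *)base;
}

/* Figure 1 replayed (a1p omitted; zeros replace the elided initializer):
   returns the byte offset of cp3 from the start of a. */
static ptrdiff_t figure1_cp3_offset(void) {
    int a[4] = { 0, 0, 0, 0 };
    int *a0p = &a[0];
    int *a2p = a0p + 2;        /* advances 2 * sizeof(int) bytes */
    char *cp = (char *)a2p;    /* same address, now viewed byte-wise */
    char *cp3 = cp + 3;        /* advances 3 * sizeof(char) = 3 bytes */
    return byte_offset(a, cp3);
}

/* Reads one byte through an unsigned char pointer: *(unsigned char *)z. */
static unsigned read_byte(const void *z) {
    assert(z != NULL);
    return *(const unsigned char *)z;
}
```

On x86-64, where sizeof(int) = 4, figure1_cp3_offset() returns 2 * 4 + 3 = 11; likewise, advancing a void ** by one moves 8 bytes, while advancing a char * moves 1 byte.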