Project 2. Assigned: 02/20/2015 Due Date: 03/06/2015

CSE 539 Project 2
Assigned: 02/20/2015 Due Date: 03/06/2015

Building a thread-safe memory allocator

In this project, you will implement a thread-safe malloc library. The provided codebase includes a simple serial (i.e., NOT thread-safe) implementation of a malloc library. Your job is to make it thread safe and, if possible, to optimize it for better performance (in terms of both utilization and speed). Remember that correctness comes first, then performance.

This document describes the provided infrastructure for testing your serial memory allocator first and your parallel memory allocator second, but that does not necessarily mean you should work on the project in that order; the right order depends on what you plan to do and how you structure your code. Please read through the entire document carefully before you start. The information in this document will help you navigate the codebase. If you organize your code well, you can likely make it thread safe first, and then optimize the baseline malloc implementation without too much disruption to the thread-safe part.

Logistics for this project

As in Project 1, you will obtain a copy of the base code via your svn repository (if you svn up, you should see a proj2 directory). You are encouraged, but not required, to work in pairs for this project. If you decide to work in a group of two, you only need to turn in one copy of the code; please designate one person's svn repository for turning in code. You will still have to turn in separate writeups, however. Since this project is due on Friday, you will turn in your writeup by committing the pdf file of your writeup to your own repository (even if you are in a group of two and your repo is not the designated code repo). The deadline is just before midnight, at 11:59:59 pm.

Heap memory allocator interface

Your dynamic storage allocator will consist of the following four functions, which (among other functions) are declared in allocator_interface.h and should be defined in allocator.cpp. The contents of allocator_interface.h and allocator.cpp are encapsulated in the namespace called my to prevent name collisions with the malloc library in libc.

int allocator::init(void);

Before calling any functions relating to memory allocation, the application we use to evaluate your implementation will call allocator::init. You may use this function to perform any necessary initialization of your library, such as allocating the initial heap area. The return value should be 1 if there was a problem in performing the initialization and 0 if everything went smoothly.

void* allocator::malloc(size_t size);

This call must return a pointer to a contiguous block of newly allocated memory that is at least size bytes long. The entire block must lie within the heap region and must not overlap any other currently allocated chunk. The pointers returned by allocator::malloc must always be aligned to 8-byte boundaries and lie within the heap boundary (i.e., between the values returned by mem_heap_lo() and mem_heap_hi() from memlib.c); you'll notice that the libc implementation of malloc does the same. If the requested size is zero, or if an error occurs and the requested block cannot be allocated, a NULL pointer must be returned.

void allocator::free(void* ptr);

This call notifies your storage allocator that a currently allocated block of memory should be deallocated. The argument must be a pointer previously returned by allocator::malloc or allocator::realloc, and not previously freed. You are not required to detect or handle violations of either of these conditions. However, you should handle freeing a NULL pointer, which is defined to have no effect.

void* allocator::realloc(void* ptr, size_t size);

This call returns a pointer to an allocated region, similarly to how allocator::malloc behaves. There are two special cases you should be aware of. If ptr is NULL, the call is equivalent to allocator::malloc(size). If size is equal to zero, the call is equivalent to allocator::free(ptr).

Otherwise, ptr must meet the same constraints as the argument to allocator::free: it must point to a previously allocated block, and it must have been previously returned by either allocator::malloc or allocator::realloc. You do not need to defend against calls with invalid pointers. The return value of allocator::realloc must meet all of the same constraints as the return value of allocator::malloc; namely, it must be 8-byte aligned, must point to a block of memory of at least size bytes, and must lie within the heap boundary.

There is one additional constraint on the behavior of allocator::realloc: any data in the old block must be copied over to the new block. If the new block is smaller, the old values are truncated; if the new block is larger, the value of each of the bytes at the end of the block is undefined.
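
The alignment and special-case rules above can be sketched in a few lines. This is only an illustration under stated assumptions: the names kAlignment, align_up, is_aligned, ReallocAction, and classify_realloc are hypothetical and do not appear in the provided codebase.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helpers for illustration; not part of the provided codebase.
constexpr std::size_t kAlignment = 8;

// Round a requested size up to the next multiple of 8 bytes.
constexpr std::size_t align_up(std::size_t size) {
    return (size + (kAlignment - 1)) & ~(kAlignment - 1);
}

// True if p lies on an 8-byte boundary.
inline bool is_aligned(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0;
}

// The three cases a realloc call can fall into, per the description above.
enum class ReallocAction { ActAsMalloc, ActAsFree, Resize };

inline ReallocAction classify_realloc(const void* ptr, std::size_t size) {
    if (ptr == nullptr) return ReallocAction::ActAsMalloc;  // realloc(NULL, size)
    if (size == 0) return ReallocAction::ActAsFree;         // realloc(ptr, 0)
    return ReallocAction::Resize;                           // copy, truncating if smaller
}
```

A debugging assertion such as assert(is_aligned(p)) on every pointer you return is a cheap way to catch alignment bugs early.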

A naive implementation of allocator::realloc might consist of nothing more than a call to allocator::malloc, a memory copy, and a call to allocator::free. This is, in fact, how the reference implementation works; leaving this solution in place is probably a good way to get started. Once you've made progress on allocator::malloc and allocator::free, you will want to consider ways of improving the performance of allocator::realloc.

All of this behavior matches the semantics of the corresponding libc routines. Type man malloc at the shell to see additional documentation, if you're curious.

The allocator.cpp file we have given you currently just makes calls to a fairly simple, serial (i.e., NOT thread-safe) malloc library that uses a freelist without binning (code in mm-implicit.c). Each block in the freelist has a header and a footer to allow coalescing. (A high-level description of the implicit free-list implementation can be found here: http://www.cs.cmu.edu/afs/cs/academic/class/15213-f11/www/lectures/18-allocation-basic.pdf)

Support routines

The code in memlib.c simulates the memory system for your dynamic memory allocator. You can invoke the following functions in memlib.c:

void* mem_sbrk(int incr);
Expands the heap by incr bytes, where incr is a positive non-zero integer, and returns a generic pointer to the first byte of the newly allocated heap area. The semantics are identical to the Unix sbrk function, except that mem_sbrk accepts only a positive non-zero integer argument.

void* mem_heap_lo(void);
Returns a generic pointer to the first byte in the heap.

void* mem_heap_hi(void);
Returns a generic pointer to the last byte in the heap.

size_t mem_heapsize(void);
Returns the current size of the heap in bytes.

size_t mem_pagesize(void);
Returns the system page size in bytes (4 KB on Linux systems). It is unlikely that you will need this.
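
To make the relationship between these routines concrete, here is a minimal sketch of a simulated heap backed by a fixed array. It is not the code in memlib.c: the sim_ names, the 64 KB capacity, and the choice to return nullptr on failure are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a simulated heap; NOT the code in memlib.c.
namespace sketch {

constexpr std::size_t kMaxHeap = 1 << 16;  // assumed capacity: 64 KB
alignas(8) static char heap[kMaxHeap];
static std::size_t brk_offset = 0;         // bytes handed out so far

// Grow the heap by incr bytes and return the start of the new area.
// Returning nullptr on failure is an assumption of this sketch.
void* sim_sbrk(int incr) {
    if (incr <= 0 || static_cast<std::size_t>(incr) > kMaxHeap - brk_offset)
        return nullptr;
    void* old_brk = heap + brk_offset;
    brk_offset += static_cast<std::size_t>(incr);
    return old_brk;
}

// First byte of the heap.
void* sim_heap_lo() { return heap; }

// Last byte of the heap (nullptr while the heap is still empty).
void* sim_heap_hi() {
    return brk_offset == 0 ? nullptr : heap + brk_offset - 1;
}

// Current heap size in bytes.
std::size_t sim_heapsize() { return brk_offset; }

}  // namespace sketch
```

Note that the "hi" pointer refers to the last valid byte rather than one past the end, so a bounds check on a block should compare the block's last byte against it, not its one-past-the-end address.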

Improving and testing the serial malloc library implementation

You are encouraged to improve the serial implementation of the malloc library. The simple implementation in mm-implicit functions correctly, but it is slow, and its utilization could be improved. We have provided infrastructure for you to improve and test the serial malloc library.

The serial implementation can be tested by running the trace-based driver program mdriver. The mdriver program tests your allocator.cpp package for correctness, space utilization, and throughput. Be sure to test that your serial implementation works correctly before you try to make it thread safe; otherwise you will be in debugging hell for the rest of the week.

The driver program is controlled by a trace file. Each trace file contains a sequence of allocate, reallocate, and free commands that instruct the driver to call your allocator::malloc, allocator::realloc, and allocator::free routines in some sequence.

The driver mdriver accepts the following command line arguments:

-t <tracedir>: Look for the default trace files in directory <tracedir> instead of the default directory (../traces).

-f <tracefile>: Use one particular tracefile for testing instead of the default set of tracefiles.

-h: Print a summary of the command line arguments.

-l: Run and measure libc malloc in addition to the student's allocator::malloc package. Note that there is no utilization measure for the libc runs.

-v: Verbose output. Print a performance breakdown for each tracefile in a compact table.

-V: More verbose output. Prints additional diagnostic information as each trace file is processed. Useful during debugging for determining which trace file is causing your malloc package to fail.

-c: Invoke the allocator::check method after each call to the malloc library. This is extremely useful for debugging. Right now it simply calls the mm_checkheap function in mm-implicit.c, which checks the heap validity based on how mm-implicit.c manages the heap space. As you modify the malloc library, you should customize allocator::check to check the validity of your own heap layout.
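
As a starting point for a customized checker, the sketch below walks a toy heap in which every block carries a 4-byte header and a matching 4-byte footer holding the block size with the allocated flag in the low bit. This layout is an assumption chosen for illustration (there is no prologue or epilogue block, and payload alignment is not checked); it is not necessarily how mm-implicit.c lays out its heap, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical heap layout for illustration; not the layout of mm-implicit.c.
namespace sketch {

using word_t = std::uint32_t;            // header/footer word: size | alloc bit
constexpr std::size_t kWord = sizeof(word_t);

word_t pack(std::size_t size, bool alloc) {
    return static_cast<word_t>(size) | (alloc ? 1u : 0u);
}
word_t load(const char* p) { word_t w; std::memcpy(&w, p, kWord); return w; }
void store(char* p, word_t w) { std::memcpy(p, &w, kWord); }

// Write the header and footer of a block of `size` bytes (tags included).
void write_block(char* p, std::size_t size, bool alloc) {
    store(p, pack(size, alloc));
    store(p + size - kWord, pack(size, alloc));
}

// Walk every block in [lo, hi) and report whether the heap is consistent.
bool check_heap(const char* lo, const char* hi) {
    bool prev_free = false;
    const char* p = lo;
    while (p < hi) {
        std::size_t room = static_cast<std::size_t>(hi - p);
        if (room < 2 * kWord) return false;                  // no room for tags
        word_t header = load(p);
        std::size_t size = header & ~static_cast<word_t>(7);
        bool alloc = (header & 1u) != 0;
        if (size < 2 * kWord || size % 8 != 0) return false; // malformed size
        if (size > room) return false;                       // runs past the heap
        if (load(p + size - kWord) != header) return false;  // footer mismatch
        if (!alloc && prev_free) return false;               // missed coalescing
        prev_free = !alloc;
        p += size;
    }
    return p == hi;
}

}  // namespace sketch
```

A checker for your own allocator would add whatever invariants your design relies on, for example that every block on a free list is actually marked free.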

Making the malloc library thread safe

Even though the malloc library in mm-implicit is a fine starting point, you need to think about how to restructure the library so that you can make it thread safe. In particular, mm-implicit uses static variables liberally, and you probably don't want that. It is best to organize the code so that you can easily encapsulate the variables for each thread-local heap; anything that is not thread-local will need to be protected by some form of synchronization. If you add any new *.c or *.cpp files, be sure to edit your Makefile to reflect that (add the corresponding *.o files under OBJS, or the benchmarks won't compile).

For testing the thread-safe version of your malloc library, we will use the following additional files:

benchmarks/ contains benchmark programs.

wrapper.[cpp,h] contains wrapper functions for building benchmark programs. You should not modify these files.

validate.py validates the correctness and computes a utilization score of an allocator for multi-threaded programs. You should not modify this file.

To build the testing benchmarks, run make benchmark. This command builds 3 versions of each benchmark program. For example, benchmarks/cache-thrash.cpp generates cache-thrash, cache-thrash-validate, and cache-thrash-libc. cache-thrash uses your allocator to allocate memory, while cache-thrash-libc uses a standard allocator. cache-thrash-validate is used with the script validate.py to test the correctness of your allocator; you should not use it for performance testing.

Benchmarks for testing the thread-safe library

We are using concurrent benchmarks from the paper Hoard: A Scalable Memory Allocator for Multithreaded Applications [1].

cache-thrash tests resilience against active false sharing. Active false sharing occurs when malloc satisfies memory requests by different threads from the same cache line.
Parameters: <threads> <inner-loop> <object-size> <iterations>
Example: ./cache-thrash P 100 8 1000000

cache-scratch tests resilience against passive false sharing. Passive false sharing occurs when free allows a future malloc to produce false sharing.
Parameters: <threads> <inner-loop> <object-size> <iterations>
Example: ./cache-scratch P 100 8 1000000

larson simulates a server: each thread allocates and deallocates objects, and then transfers some objects (randomly selected) to other threads to be freed. This producer-consumer pattern might lead to a blowup in the allocator, where some threads keep allocating bigger and bigger heaps.
Parameters: <seconds> <min-obj-size> <max-obj-size> <objects> <iterations> <rng seed> <num-threads>
Example: ./larson 10 7 8 1000 10000 RAND P

linux-scalability tests allocator throughput.
Parameters: <object-size> <iterations> <number-of-threads>
Example: ./linux-scalability 8 10000000 P

You can also write your own benchmarks. The program should include ../wrapper.cpp, call end_thread() at the end of each thread, and call end_program() at the end of the program. Then you can add the new program to the Makefile. Although we will not run your benchmarks to evaluate your program, they are useful for your regression testing.

Correctness tests

You can use validate.py to test the correctness of the allocator:

./validate.py ./cache-scratch-validate 12 100 8 100000

This will print VALIDATION SUCCESS if there is no error in the program. Note that you need to use a *-validate binary. validate.py runs the program and logs all memory operations to multiple files in tmp/. After that, it reads all the logs, verifies that all memory operations are legal, and calculates a space utilization score.

There is also a script testbenchmarks.sh already in your directory. It shows a few parameters that you can use for testing your program. During grading, we will run your program with slightly different parameters. Note that right now your program succeeds in some of the runs specified in testbenchmarks.sh, either because they are configured to run with a single thread or because the thread interleaving happens not to trigger an error. The runs with multiple threads will more likely fail with a validation error or a segmentation fault. This should not occur once you have a correctly implemented thread-safe library.

Hints and tips

Use __thread for thread-local storage.

Use the volatile keyword for variables that can change without any action by the code reading them. Note that volatile is not a memory fence; it is a compiler hint telling the compiler that it should always fetch the value from memory instead of keeping it in a register. Since we haven't covered memory fences and the hardware memory model, you probably should not write code that requires memory fences unless you know what you are doing.

Your allocator should allocate memory such that programs run fast. Think about how the programs access the memory, not just how to allocate memory as fast as possible. Keep in mind that TLB misses can also affect the performance of the program.

Rules and reminders

You should not change any of the sources in the distribution except for the Makefile, allocator.cpp, validator.h, allocator_interface.h, and bad_allocator.cpp. You are free to add new files and update the Makefile appropriately if you wish. All of the other files will be overwritten with fresh copies during grading.

You should not invoke any memory-management related library calls or system calls. This rules out the use of malloc, calloc, free, realloc, sbrk, brk, mmap, or any variants of these calls in your code.

Use of C++ Standard Template Library containers is NOT allowed (since they use heap memory internally, bypassing our allocator heap interface). All data structures that allocate memory on the heap MUST use our allocator heap interface. Consequently, all heap memory used by your data structures will be counted under space utilization.

The total size of all defined global and static scalar variables and compound data structures should be small, ideally not exceeding 256 bytes per thread, but we will not put a strict upper bound on it. The spirit is that you should also consider space overhead, and any space you use for bookkeeping is counted towards the space utilization.

Evaluation

You will receive zero points if you break any of the rules or if your code is buggy or won't compile. Otherwise, your grade will be calculated based on both correctness and performance (utilization and speed). The library should work correctly for both serial and parallel code (as evaluated by invoking mdriver and the parallel benchmarks).
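
One simple way to reach a correct thread-safe baseline is to guard a serial allocator with a single lock. The sketch below illustrates the idea with a toy bump allocator standing in for the serial library; serial_malloc, locked_malloc, run_threads, and the 1 MB arena are hypothetical names and sizes, and the use of std::mutex and std::thread is an assumption (the project may expect pthreads instead).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical illustration; none of these names come from the codebase.
namespace sketch {

constexpr std::size_t kArena = 1 << 20;   // assumed arena size: 1 MB
alignas(8) static char arena[kArena];
static std::size_t used = 0;              // shared state: NOT thread safe alone

// Toy serial allocator: bump a pointer, never reuse memory.
void* serial_malloc(std::size_t size) {
    if (size == 0 || size > kArena) return nullptr;
    std::size_t need = (size + 7) & ~static_cast<std::size_t>(7);
    if (need > kArena - used) return nullptr;
    void* p = arena + used;
    used += need;
    return p;
}

static std::mutex heap_lock;              // one lock for the whole heap

// Thread-safe wrapper: the lock serializes every call into the allocator.
void* locked_malloc(std::size_t size) {
    std::lock_guard<std::mutex> guard(heap_lock);
    return serial_malloc(size);
}

// Run 4 threads that each allocate per_thread 8-byte objects, then
// report how many bytes of the arena are in use.
std::size_t run_threads(int per_thread) {
    std::thread workers[4];
    for (auto& w : workers) {
        w = std::thread([per_thread] {
            for (int i = 0; i < per_thread; ++i) locked_malloc(8);
        });
    }
    for (auto& w : workers) w.join();
    std::lock_guard<std::mutex> guard(heap_lock);
    return used;
}

}  // namespace sketch
```

With the lock in place, every allocation is accounted for exactly once; without it, concurrent updates to the shared counter could be lost. The cost is that all threads contend on one lock, which is why per-thread heaps are worth considering once the baseline is correct.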

You will get partial credit if you have a correct working version of a thread-safe malloc library that simply uses the mm-implicit implementation protected by locks. The more efficient your implementation is (in both utilization and throughput), the more credit you will get. That said, since this project is mainly about a thread-safe malloc library, your first goal should be to make the malloc library thread safe. Once that's done, go back and optimize the baseline malloc library that you use to implement the per-thread heap.

Writeup

In your writeup, which you will turn in by committing a pdf to your svn repo, please include:

1. a description of your strategy for making the library thread safe, including a justification of why it correctly synchronizes accesses from multiple threads and why it is deadlock free.

2. a description of any changes you made to the serial base code to speed it up (or a description of the new memory allocator if you re-implemented the serial version).

3. a description of any optimizations that you made to speed up the parallel malloc library implementation.

4. a description of how you divided up the work, if you worked in a group of two. Note that even if you work in a group of two, you still need to turn in your own writeup.

References

[1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. Hoard: A scalable memory allocator for multithreaded applications. ACM SIGPLAN Notices, 35(11):117-128, Nov. 2000.