COSC345 2013 Software Engineering Lecture 16: Managing Memory Managers
Outline
- Typical problems (from previous lectures)
- Memory leaks aren't just for (Objective) C
- Tracking malloc() calls
- Catching calls to zero length malloc()
- Eradicating un-initialized memory
- Bulletproofing
- Catching leaks
- Catching buffer over-runs and under-runs
Assumption for this lecture: we do not have access to the internals of the memory management routines: malloc(), free(), realloc(), etc.
A memory leak in Java
    public class Stack<E> {
        private int capacity, size;
        private Object[] array;
        // constructor, size(), push(), &c
        public E pop() {
            assert size > 0;
            Object r = array[--size];
            return (E)r;
        }
    }
How does that leak?
    Stack<Document> s = new Stack<Document>();
    s.push(a_large_document);
    s.pop();
    // Now you think the document has gone, and
    // you certainly can't get it back, but
    // s.array[0] still points to it, so
    // the Java garbage collector will not reclaim it.
How do you stop that?
    public E pop() {
        assert size > 0;
        Object r = array[--size];
        array[size] = null;   // drop the stale reference so it can be collected
        return (E)r;
    }
Vector.setSize, removeElementAt, removeAllElements, remove, clear; ArrayList.remove, clear; all take care to do this.
Typical Problems in C
- Allocate zero length blocks
- Allocate a block and use its uninitialized contents
- Free a block then use it
- Call realloc() then use the old pointer
- Allocate a block then lose the pointer to it
- Read and write beyond the boundaries of a block
- Fail to notice any of these
malloc() Tracking
Put a wrapper around malloc()

aspt_malloc.h
    #ifdef MALLOC_DEBUG
    #define aspt_malloc(s) do_aspt_malloc(__LINE__, __FILE__, s)
    extern void *do_aspt_malloc(long, char const *, size_t);
    #else
    #define aspt_malloc(s) malloc(s)
    #endif

aspt_malloc.c
    #include <stdio.h>
    #include <stdlib.h>

    void *do_aspt_malloc(long line, char const *file, size_t size)
    {
        void *r = malloc(size);
        /* %ld matches the long line number, %zu matches size_t */
        printf("%s(%ld): log, malloc(%zu) => %p\n", file, line, size, r);
        return r;
    }
malloc() Tracking
main.c
    #include <stdlib.h>
    #define MALLOC_DEBUG
    #include "aspt_malloc.h"

    int main(void) {
        aspt_malloc(100);
        return 0;
    }
Output
    main.c(6): log, malloc(100)
Why use #ifdef MALLOC_DEBUG?
Zero Length Blocks
- The result of a zero-size request is implementation-defined in the ISO C standards
- We catch all cases and return NULL (ISO allows this)
aspt_malloc.c
    void *do_aspt_malloc(long line, char const *file, size_t size)
    {
        if (size == 0) {
            printf("%s(%ld): error, zero length malloc\n", file, line);
            return NULL;
        } else {
    #ifdef MALLOC_DEBUG_ALL
            printf("%s(%ld): log, malloc(%zu)\n", file, line, size);
    #endif
            return malloc(size);
        }
    }
Uninitialized Memory
Leads to unpredictable behavior, so initialize it!
aspt_malloc.c
    #define BAD_MEM 0xCC

    void *do_aspt_malloc(long line, char const *file, size_t size)
    {
        char *mem;

        if (size == 0) {
            printf("%s(%ld): error, zero length malloc\n", file, line);
            return NULL;
        } else {
    #ifdef MALLOC_DEBUG_ALL
            printf("%s(%ld): log, malloc(%zu)\n", file, line, size);
    #endif
            mem = malloc(size);
            if (mem == NULL) {
                printf("%s(%ld): error, malloc failure (NULL returned)\n", file, line);
                return NULL;
            }
            /* fill the new block so uninitialized reads are conspicuous */
            return memset(mem, BAD_MEM, size);
        }
    }
BAD_MEM, why 0xCC?
S. Maguire, Writing Solid Code
- Must look like garbage but not be garbage
- 0x00, 0xFF, 0x01 all look like valid values
- Maguire recommends 0xCC on PC (Macs too now)
- If word-alignment is enforced then make it odd
      p = malloc(sizeof (int)); **p = 5; /* will crash */
- As an index into an array it is large (and noticeable)
      p = malloc(sizeof (int)); a[*p] = 5; /* *p is very large */
- If ever called it will crash
      p = malloc(sizeof (int)); p(); /* will crash */
- Easy to spot long sequences in a debugger
- Easy to spot the value in a printf()
All done to make it crash at the first opportunity
Increase instability when the program goes wrong
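A minimal sketch of why the fill value is easy to spot: a freshly allocated block is filled with BAD_MEM and then read back as if it held real data. The helper names filled_malloc() and fresh_unsigned() are hypothetical and not part of the lecture's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BAD_MEM 0xCC

/* Hypothetical helper: allocate size bytes and fill them with BAD_MEM. */
static void *filled_malloc(size_t size)
{
    void *mem;

    assert(size > 0);               /* zero-length requests are forbidden */
    mem = malloc(size);
    if (mem != NULL)
        memset(mem, BAD_MEM, size);
    return mem;
}

/* Hypothetical demo: read a value out of "uninitialized" memory.
   Returns 0 only if the allocation itself failed. */
static unsigned fresh_unsigned(void)
{
    unsigned value = 0;
    unsigned *p = filled_malloc(sizeof *p);

    if (p != NULL) {
        value = *p;                 /* every byte of *p is 0xCC */
        free(p);
    }
    return value;
}
```

Where unsigned is 32 bits wide the value read back is 0xCCCCCCCC, which stands out immediately in a debugger or in printf() output.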
Bulletproofing
- Bulletproofing: making a program (module / class / method) robust to bad, incorrect, or unexpected input
- Bulletproofing malloc()
    - Forbid zero-length calls
    - Forbid the return of uninitialised memory
- Why should we bulletproof?
- Why should we not bulletproof?
- How else can we bulletproof malloc() and free()?
Leak Detection
- Keep a set (e.g., in a tree) of in-use memory
- For brevity source code not included here
- Four routines needed:
    - Add a node to the tree
          void *aspt_mem_add(void *mem, long size, char const *file, long line)
    - Delete a node from the tree
          void *aspt_mem_delete(void *mem)
    - Find a node in the tree and return its size
          long aspt_mem_size(void *mem)
    - List all nodes in the tree
          void aspt_mem_leaks(void)
- This set is called the in-use set.
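The slide leaves the source out, so the following is only a sketch of the four routines. It assumes a singly linked list in place of the tree, and it assumes return conventions the slide does not state: aspt_mem_add() and aspt_mem_delete() return mem on success and NULL on failure, and aspt_mem_size() returns 0 for an unknown pointer. The struct aspt_node type is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type: one node per live allocation. */
struct aspt_node {
    void *mem;
    long size;
    char const *file;
    long line;
    struct aspt_node *next;
};

static struct aspt_node *in_use = NULL;     /* head of the in-use set */

/* Record a block; returns mem, or NULL if the record cannot be created. */
void *aspt_mem_add(void *mem, long size, char const *file, long line)
{
    struct aspt_node *node;

    assert(mem != NULL);
    node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->mem = mem;
    node->size = size;
    node->file = file;
    node->line = line;
    node->next = in_use;
    in_use = node;
    return mem;
}

/* Forget a block; returns mem if it was in the set, NULL otherwise. */
void *aspt_mem_delete(void *mem)
{
    struct aspt_node **link;

    for (link = &in_use; *link != NULL; link = &(*link)->next) {
        if ((*link)->mem == mem) {
            struct aspt_node *dead = *link;

            *link = dead->next;
            free(dead);
            return mem;
        }
    }
    return NULL;
}

/* Size of a recorded block, or 0 if the pointer is not in the set. */
long aspt_mem_size(void *mem)
{
    struct aspt_node *node;

    for (node = in_use; node != NULL; node = node->next)
        if (node->mem == mem)
            return node->size;
    return 0;
}

/* Report every block that is still in the set. */
void aspt_mem_leaks(void)
{
    struct aspt_node *node;

    for (node = in_use; node != NULL; node = node->next)
        printf("%s(%ld): warning leaked %ld bytes\n",
               node->file, node->line, node->size);
}
```

A list makes every lookup linear in the number of live blocks; a balanced tree keyed on the address, as the slide suggests, would scale better for programs with many allocations.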
Changes to malloc()
    void *do_aspt_malloc(long line, char const *file, size_t size)
    {
        void *mem;

        if (size == 0) {
            printf("%s(%ld): error, zero length malloc\n", file, line);
            return NULL;
        } else {
    #ifdef MALLOC_DEBUG_ALL
            printf("%s(%ld): log, malloc(%zu)\n", file, line, size);
    #endif
            mem = malloc(size);
            if (mem == NULL) {
                printf("%s(%ld): error, malloc failure (NULL returned)\n", file, line);
                return NULL;
            }
            memset(mem, BAD_MEM, size);
            /* record the block in the in-use set; give it back if that fails */
            if (aspt_mem_add(mem, size, file, line)) {
                return mem;
            } else {
                free(mem);
                return NULL;
            }
        }
    }
Changes to free()
From aspt_malloc.h
    #define aspt_free(p) do_aspt_free(__LINE__, __FILE__, p)
    void do_aspt_free(long line, char const *file, void *p);
From aspt_malloc.c
    void do_aspt_free(long line, char const *file, void *mem)
    {
        size_t const size = aspt_mem_size(mem);

        /* only blocks found in the in-use set are cleared and freed */
        if (size > 0) {
            aspt_mem_delete(mem);
            memset(mem, BAD_MEM, size);
            free(mem);
        }
    }
do_aspt_free()
- Clears the unused memory (set equal to BAD_MEM)
- Removes it from the in-use set
Can we fit p = NULL into the macro somehow?
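One possible answer to the closing question, offered as a sketch rather than the lecture's own solution: wrap the call and the assignment in a do { ... } while (0) block so the macro still behaves as a single statement. The do_aspt_free() below is a stand-in that only calls free(), and pointer_cleared_after_free() is a hypothetical demo function.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the lecture's do_aspt_free(); here it only calls free(). */
static void do_aspt_free(long line, char const *file, void *mem)
{
    assert(file != NULL);       /* __FILE__ always expands to a string */
    (void)line;                 /* unused in this stand-in */
    free(mem);
}

/* Free the block, then clear the caller's pointer.
   p must be a modifiable lvalue and is evaluated twice,
   so it must not have side effects. */
#define aspt_free(p)                            \
    do {                                        \
        do_aspt_free(__LINE__, __FILE__, (p));  \
        (p) = NULL;                             \
    } while (0)

/* Hypothetical demo: returns 1 if the pointer is NULL after aspt_free(). */
static int pointer_cleared_after_free(void)
{
    char *p = malloc(16);       /* may be NULL; free(NULL) is harmless */

    aspt_free(p);
    return p == NULL;
}
```

The macro only clears the pointer it is handed; any other copies of the same address elsewhere in the program still dangle.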
Example
From main.c
    int main(int argc, char **argv)
    {
        char *g;
        int index;
        int const n = 10;

        atexit(aspt_mem_leaks);
        g = aspt_malloc(n);
        for (index = 0; index <= n; index++)
            g[index] = 0;
        return 0;
    }
Output
    main.c(7): warning leaked 10 bytes
A program cannot possibly tell where it should have been freed, only where it was allocated
What is the mistake in this slide?
The Special-Case of realloc()
Problem
- realloc() sometimes moves memory
- Leads to irreproducible behavior
- Example: dangling pointers in a tree (node->str)
Solution
- Debug version: always move the block
- Release version: avoid moving the block if possible
Remember to debug both versions
- Always test both debug and release versions
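A sketch of the debug behaviour described above, under stated assumptions: the wrapper name debug_realloc() and its old_size parameter are hypothetical (in the lecture's scheme the old size would presumably come from aspt_mem_size()), and grow_moves_block() exists only to demonstrate the effect. Because the new block is allocated while the old one is still live, the two addresses always differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BAD_MEM 0xCC

/* Hypothetical debug realloc: always moves the block to a new address. */
static void *debug_realloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh;

    assert(new_size > 0);
    fresh = malloc(new_size);               /* old is still live, so fresh != old */
    if (fresh == NULL)
        return NULL;                        /* like realloc(), old is left intact */
    memset(fresh, BAD_MEM, new_size);
    if (old != NULL) {
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
        memset(old, BAD_MEM, old_size);     /* make stale reads conspicuous */
        free(old);
    }
    return fresh;
}

/* Hypothetical demo: returns 1 if growing a block moved it and kept its contents. */
static int grow_moves_block(void)
{
    char *a = malloc(4);
    char *b;
    uintptr_t before;
    int ok;

    if (a == NULL)
        return 0;
    memcpy(a, "abc", 4);
    before = (uintptr_t)a;                  /* remember the old address as a number */
    b = debug_realloc(a, 4, 8);
    if (b == NULL) {
        free(a);
        return 0;
    }
    ok = (uintptr_t)b != before && strcmp(b, "abc") == 0;
    free(b);
    return ok;
}
```

Any code that keeps using the old pointer after the call now reads BAD_MEM bytes on every run, instead of only on the runs where realloc() happened to move the block.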
Uses of the In-Use List
- At termination of program list all in-use memory
    - These are program leaks
          atexit(aspt_mem_leaks);
- Does a pointer point to valid memory?
    - Check the size is greater than zero
- Validate pointer before calling system routines
    - Prevent memory overwrites
    - Is destination a valid pointer in call to strcpy()?
- Validate pointer-size before calling system routines
    - Prevent memory overwrites
    - Is destination large enough in call to strncpy()?
- Pair free() and malloc() calls
    - The allocation location is in the in-use list
- This technique can be extended to the stack too
    - Requires manual manipulation of stack objects (or does it?)
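The pointer and size checks above can be sketched as a guarded copy. The name checked_strcpy() and its dst_size parameter are hypothetical: in the lecture's scheme dst_size would presumably be the result of aspt_mem_size(dst), which this sketch assumes is 0 for a pointer that is not in the in-use list.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical guarded strcpy().
   dst_size is the recorded size of the destination block (0 if unknown).
   Returns dst on success, NULL if the copy was refused. */
static char *checked_strcpy(char *dst, long dst_size, char const *src)
{
    assert(src != NULL);
    if (dst_size <= 0)
        return NULL;                            /* not a live block */
    if (strlen(src) + 1 > (size_t)dst_size)
        return NULL;                            /* would write past the end */
    return strcpy(dst, src);
}
```

Refusing the copy turns a silent overwrite into a failure the caller can log at the point where it happens.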
Write Outside Buffer
Write past end of buffer
    char *mem = aspt_malloc(10);
    for (index = 0; index <= 10; index++)
        mem[index] = 0;
    [Figure: mem and User Memory]
Write before beginning of buffer
    char *mem = g = aspt_malloc(10);
    *mem-- = 0;
    *mem-- = 0;
    [Figure: two writes, mem and User Memory]
Catching Outside Buffer Writes
In do_aspt_malloc() insert padding at each end
    [Figure: Padding, User Memory, Padding; pointers base and mem]
Similar to last lecture
- Allocate size + 2 * sizeof (Padding) bytes
- Word align everything correctly
- Put one at each end of the block
- Initialize the whole block (BAD_MEM)
- Return mem + sizeof (Padding)
    - This hides the existence of the padding
Catching Outside Buffer Writes
In do_aspt_free() check padding is BAD_MEM
    [Figure: Padding, User Memory, Padding; pointer mem]
In do_aspt_free()
Check under-writes (underflow)
    if (!aspt_pad_check(mem - sizeof (Padding), sizeof (Padding)))
        printf("%s(%ld): error, memory under-run\n", file, line);
Check over-writes (overflow)
    if (!aspt_pad_check(mem + aspt_mem_size(mem), sizeof (Padding)))
        printf("%s(%ld): error, memory over-run\n", file, line);
aspt_pad_check()
    long aspt_pad_check(char *where, long size)
    {
        char *ch;

        for (ch = where; ch < where + size; ch++)
            if (*ch != (char)BAD_MEM)
                return 0;
        return 1;
    }
How (and when) can this check fail?
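Putting the padding and the check together, as a self-contained sketch under stated assumptions: PAD_SIZE stands in for sizeof (Padding), padded_malloc() and padded_free() are hypothetical names, and the caller passes the block size explicitly where the lecture's version would look it up with aspt_mem_size(). The check compares bytes as unsigned char, a small departure from the slide's (char) cast.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BAD_MEM  0xCC
#define PAD_SIZE 16     /* hypothetical stand-in for sizeof (Padding) */

/* Allocate size bytes with BAD_MEM padding on both sides. */
static char *padded_malloc(size_t size)
{
    char *base;

    assert(size > 0);
    base = malloc(size + 2 * PAD_SIZE);
    if (base == NULL)
        return NULL;
    memset(base, BAD_MEM, size + 2 * PAD_SIZE);
    return base + PAD_SIZE;                 /* hide the leading padding */
}

/* Returns 1 if all size bytes starting at where still hold BAD_MEM. */
static long aspt_pad_check(char const *where, long size)
{
    unsigned char const *ch = (unsigned char const *)where;
    long i;

    for (i = 0; i < size; i++)
        if (ch[i] != BAD_MEM)
            return 0;
    return 1;
}

/* Check both pads, then free the block.
   Result: bit 0 set means under-run, bit 1 set means over-run. */
static int padded_free(char *mem, size_t size)
{
    int damage = 0;

    if (!aspt_pad_check(mem - PAD_SIZE, PAD_SIZE))
        damage |= 1;
    if (!aspt_pad_check(mem + size, PAD_SIZE))
        damage |= 2;
    free(mem - PAD_SIZE);                   /* recover the real base address */
    return damage;
}
```

The check misses a stray write that happens to store 0xCC, and it misses writes that jump clean over the padding, which is one way into the slide's closing question.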
More Besides
These techniques do not catch everything
- Dangling pointers to re-allocated memory
- Memory trashes outside the Padding
Thought experiment: is it possible to
- Check the padding during runtime?
- Keep a list of all used and unused memory?
- Verify the heap is OK?
- Verify the free-list and the in-use list?
- Verify every single memory access in the program?
- Set p = NULL after each call to free(p)?
How would we do all this?
Conclusions
- Memory leaks are easily caught
- Memory over-runs and under-runs are easily caught
- Memory size mismatches are easily caught
- Uninitialized memory blocks are easily spotted in a debugger
- Always manage the memory manager
Wait a second. Shouldn't there be a library that does all this?
References
- S. Maguire, Writing Solid Code, Chapter 3
- Electric Fence; see man 3 efence in Linux. Forked as DUMA (Detect Unintended Memory Access) at http://sourceforge.net/projects/duma/
- mtrace(3), muntrace(3), malloc_hook(3) in Linux
- Valgrind, basically emulates your program. It runs on Mac OS X but not iOS (although it does handle Android). See valgrind.org
- http://www.youtube.com/watch?v=lqtpr8bkb3g