Pointer Ownership Model

Similar documents
Pointer Ownership Model

CS201 - Introduction to Programming Glossary By

Short Notes of CS201

Type Checking and Type Equality

CS 241 Honors Memory

In Java we have the keyword null, which is the value of an uninitialized reference type

CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays

Lecture Notes on Memory Layout

Motivation was to facilitate development of systems software, especially OS development.

Data Structure Series

Frequently Asked Questions. AUTOSAR C++14 Coding Guidelines

Limitations of the stack

Run-Time Environments/Garbage Collection

Programming Languages Third Edition. Chapter 10 Control II Procedures and Environments

The issues. Programming in C++ Common storage modes. Static storage in C++ Session 8 Memory Management

Motivation was to facilitate development of systems software, especially OS development.

Pointers, Dynamic Data, and Reference Types

CS 161 Computer Security. Security Throughout the Software Development Process

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

Pointers and References

Lecture Notes on Memory Management

Programming Languages Third Edition. Chapter 7 Basic Semantics

Verification & Validation of Open Source

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Heap Management. Heap Allocation

Compiler Construction

Lecture Notes on Arrays

Dynamic Allocation in C

Lecture Notes on Memory Management

377 Student Guide to C++

CS 536 Introduction to Programming Languages and Compilers Charles N. Fischer Lecture 11

CS 11 C track: lecture 5

A Short Summary of Javali

CS558 Programming Languages

Lecture 3 Notes Arrays

Pointers. A pointer is simply a reference to a variable/object. Compilers automatically generate code to store/retrieve variables from memory

A Capacity: 10 Usage: 4 Data:

Lecture 2, September 4

Last week. Data on the stack is allocated automatically when we do a function call, and removed when we return

Secure Programming I. Steven M. Bellovin September 28,

Guidelines for Writing C Code

Dynamic Allocation in C

Dynamic Memory Management

Homework 3 CS161 Computer Security, Fall 2008 Assigned 10/07/08 Due 10/13/08

Compiler Construction

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

CS 6353 Compiler Construction Project Assignments

Weeks 6&7: Procedures and Parameter Passing

C Review. MaxMSP Developers Workshop Summer 2009 CNMAT

G Programming Languages - Fall 2012

Introduction. Background. Document: WG 14/N1619. Text for comment WFW-1 of N1618

Lecture 8 Dynamic Memory Allocation

nptr = new int; // assigns valid address_of_int value to nptr std::cin >> n; // assigns valid int value to n

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Creating a String Data Type in C

Lectures 5-6: Introduction to C

My malloc: mylloc and mhysa. Johan Montelius HT2016

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

CS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016

CS558 Programming Languages

Kurt Schmidt. October 30, 2018

Understanding Undefined Behavior

CE221 Programming in C++ Part 2 References and Pointers, Arrays and Strings

Lecture 8 Data Structures

Principle of Complier Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

Review of the C Programming Language

Undefined Behaviour in C

(13-2) Dynamic Data Structures I H&K Chapter 13. Instructor - Andrew S. O Fallon CptS 121 (November 17, 2017) Washington State University

Machine-Independent Optimizations

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Buffer overflow background

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

Dynamic Memory Management! Goals of this Lecture!

Single-pass Static Semantic Check for Efficient Translation in YAPL

A brief introduction to C programming for Java programmers

[0569] p 0318 garbage

Lecture 10 Notes Linked Lists

Types. Type checking. Why Do We Need Type Systems? Types and Operations. What is a type? Consensus

QUIZ on Ch.5. Why is it sometimes not a good idea to place the private part of the interface in a header file?

EL2310 Scientific Programming

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

Advances in Programming Languages: Regions

CS164: Programming Assignment 5 Decaf Semantic Analysis and Code Generation

EMBEDDED SYSTEMS PROGRAMMING Language Basics

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

Wrapping a complex C++ library for Eiffel. FINAL REPORT July 1 st, 2005

C Introduction. Comparison w/ Java, Memory Model, and Pointers

Object-Oriented Programming for Scientific Computing

Summary: Issues / Open Questions:

1/29/2011 AUTO POINTER (AUTO_PTR) INTERMEDIATE SOFTWARE DESIGN SPRING delete ptr might not happen memory leak!

Examining the Code. [Reading assignment: Chapter 6, pp ]

Programming Project 4: COOL Code Generation

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

Quiz 0 Review Session. October 13th, 2014

Exception Namespaces C Interoperability Templates. More C++ David Chisnall. March 17, 2011

Programming Assignment IV Due Thursday, November 18th, 2010 at 11:59 PM

Lecture 4 September Required reading materials for this class

ECE 449 OOP and Computer Simulation Lecture 12 Resource Management II

Language Security. Lecture 40

Transcription:

2014 47th Hawaii International Conference on System Science Pointer Ownership Model David Svoboda Carnegie Mellon svoboda@cert.org Lutz Wrage Carnegie Mellon lwrage@sei.cmu.edu Abstract The incorrect use of pointers in the C and C++ programming languages is a common source of bugs and vulnerabilities. Most languages that are newer than C eliminate pointers or severely restrict their capabilities. Nonetheless, many C and C++ programmers work with pointers safely by maintaining a mental model of when memory accessed through pointers should be allocated and subsequently freed. This mental model is seldom documented outside of the evidence of its application in the source code. The Pointer Ownership Model (POM) improves the ability of developers to statically analyze C programs for errors involving dynamic memory. To make a program comply with POM, a developer must identify the program s responsible pointers, whose objects must be explicitly freed before the pointers themselves may be destroyed. Any program that complies with POM can be statically analyzed to ensure that the design is consistent and secure and that the code correctly implements the design. Consequently, POM can be used to diagnose and eliminate many dynamic memory errors from C programs. 1. Introduction Dynamically allocated memory and pointers can be misused in many ways. A program can neglect to free allocated memory (memory leak). Conversely, a program may free allocated memory twice. A program may also try to read or write to memory that it has already freed. Finally, a pointer could be uninitialized or have a null value; dereferencing such a pointer is undefined behavior. [1] The severity of these errors varies from memory leaks to more serious errors, such as double-frees, that can be exploited by attackers to run malicious code. [2] Writing safe programs in C requires the developer to ensure that each dynamically allocated chunk of memory is freed exactly once and never read or written afterwards. To work with pointers safely, many developers employ a policy or model of how to manage dynamic memory. Commonly applied models include garbage collection, Resource Acquisition Is Initialization (RAII), and smart pointers. The chosen model is frequently not documented in the program, because C and C++ lack any mechanism to document these models. The pointer ownership model (POM) adds information to a program, allowing automated analysis tools to statically identify certain classes of errors involving dynamic memory in C programs. The model works by requiring the developer to identify responsible pointers; these are the pointers whose objects must be explicitly freed before the pointers themselves may be destroyed. A program enhanced by POM can be statically analyzed by a verifier to ensure that the program complies with its design and that the design with regard to responsible pointers is correct. Static analysis of programs that comply with POM can identify several types of errors, including memory leaks, double-frees, and null-pointer dereferences. Outof-bounds memory reads and writes are not detectable by POM. Static analysis can identify some, but not all, use-after-free errors. Finally, the model provides an escape clause for the developer: any pointer may be marked as out-of-scope (OOS), which means that no checking is performed on it. This is useful for pointers to dynamic memory that are handled by other ownership models, such as garbage collection. 2. Impact POM aims to improve software security at the coding level, which not only improves software security in general but also improves productivity. It has been shown that eliminating bugs during the coding phase is less expensive than eliminating them during the testing phase or, even worse, after the bugs appear in production code. [2] Because POM could be considered an extension to the C language, it would enhance the expressiveness of the C language to address coding challenges. Also, it would achieve effective evaluation for a degree of memory safety. Ultimately, we believe that adding a small, judicious amount of annotations to a C program enables us to statically determine whether the program is safe with regard to memory management. This means that the program need not be built or run. Such a 978-1-4799-2504-9/14 $31.00 2014 IEEE DOI 10.1109/HICSS.2014.625 5090

strong guarantee will improve the maintainability of the program and consequently speed up software development. 3. Related Work POM is related to and partially inspired by several earlier research projects. Many software tools, both open source and proprietary (including both Fortify SCA and Coverity Prevent), employ pure static analysis to find memory-related errors. These tools suffer from false positives; because they analyze code with no extra annotations or semantic information, they must make heuristic guesses about code correctness, and sometimes they guess wrongly. Many other tools, such as Valgrind, provide dynamic analysis. They do not suffer from false positives, but they do not always find problems. To analyze a program, they require the developer to execute the program, and if the program does not misbehave while being analyzed, dynamic analysis tools may not realize that it might misbehave when given a different set of inputs. Dynamic analysis also impedes performance, so dynamic analysis tools are useless on programs that are being used in production. Consequently, dynamic analyzers are typically used only by QA testers. One possible approach to verifying a POMcompliant program is to add POM verification to a compiler. In this case, the verifier, as part of the compiler, could perform static analysis during compilation. It could also add checks to bits of code that it could not verify statically; these checks would be performed at runtime, so technically, they would be dynamic analysis. The Safe/Secure C and C++ system [3] and AIR Integers, [4] jointly developed by Plum Hall and SEI, [5] both rely primarily on static analysis, with dynamic analysis as a backup. Likewise, the Compiler Enhanced Buffer Overflow Elimination [3] proposal is an ongoing research project at SEI that again relies primarily on static analysis with dynamic analysis as a backup. There are several attempts to improve memory management using models. The most well-known model for pointer management is reference counting, which is implemented in the C++11 standard library as the shared_pointer<> template. Garbage collection is an alternative model, which is used by the Java language, and there are several C implementations; however, garbage collection is not part of standard C or C++. Finally, the standard C++ auto_ptr<> template provides a model similar to our pointer ownership model, but it suffers from a serious technical deficiency. The saga [6] of the auto_ptr serves as an important cautionary tale when considering pointer ownership. In C++, the auto_ptr class represented a pointer that automatically freed its object upon its own destruction. It also supported assignment, with the stipulation that assigning one auto_ptr to another caused the first auto_ptr to be set to a null pointer. This is an unusual form of assignment that prevents perfect duplication of owning pointers. Consequently, the use of auto_ptrs is problematic, especially in containers, because containers assume that their objects are fully copyable, but auto_ptrs do not support this constraint. The common implementation of the quicksort algorithm often fails with vectors containing auto_ptrs, because it typically involves assigning one single auto_ptr as a pivot and then partitioning the vector s remaining elements around the pivot. Today the auto_ptr template has been deprecated in favor of the new unique_ptr<> template, which specifically forbids assignment or copying. We believe that POM can prevent these problems for two reasons. First, unlike auto_ptrs, POM is not required to strictly conform to the C and C++ type system. Assignment can be performed from one responsible pointer to another or from a responsible pointer to an irresponsible pointer based on the discretion of the developer. In contrast, auto_ptrs are constrained by the type system. For example, assigning a variable to an element in a vector of auto_ptrs requires that the variable also be an auto_ptr. If it is not, a temporary auto_ptr is constructed around the variable s type. In either case, the subsequent assignment would cause the vector s corresponding auto_ptr to be set to null. A vector of responsible pointers would appear to the compiler to be no different from a vector of raw pointers. Second, POM adds semantic information to working code but does not modify the code itself. As previously noted, copying a single element from a vector of auto_ptrs causes the vector to be modified (because the corresponding vector item is set to NULL). Under POM, a vector of responsible pointers would not be modified, although the element that was copied might be considered a zombie pointer. A quick-sort algorithm that operates on a vector of responsible pointers could use an irresponsible pointer as a pivot, thus requiring no changes to the quick-sort implementation code while still allowing pointer responsibility to be correctly monitored. Pointer models have been used to address other problems, too. The theory of pointer sharing has been explored by Bierhoff. [7] The CCured system [8] used a model of pointers, partially specified by the developer, to identify insecure pointer arithmetic and catch buffer overflows. Likewise, the Safe/Secure C and C++ 5091

system [3] uses both static and dynamic analysis to guarantee the safety of pointer arithmetic and array I/O. This enables the system to identify safe pointer usage at compile time and to add runtime checks when safety cannot be verified statically. POM is similar in that it performs safety checks at compile time and could add runtime checks when necessary. Unlike traditional static analysis, POM imposes on the developer a requirement of providing some additional semantic information about the pointers used by the program. This extra information makes POM accurate; any error it finds will be a true positive. POM will also be comprehensive, finding all of the problems that are exposed by the model. To be more precise, POM requires developers to specify a model for how memory is managed. Once the model is provided, a tool can determine if the model is incomplete or inconsistent, and it can also determine whether the program complies with the model. 4. Design POM seeks to provide accurate static analysis by using additional semantic information. Some of this information can be inferred from the source code, but some information must be explicitly provided by the developer. This model focuses on pointer types, as defined by C11, [1] as opposed to specific bytes in memory. In particular, the model categorizes pointers into responsible pointers and irresponsible pointers. A responsible pointer is responsible for freeing its pointed-to object, whereas an irresponsible pointer is not responsible for freeing whatever object it points to. Every object on the heap must be referenced by exactly one responsible pointer. Consequently, responsible pointers form trees of heap objects, with the tree roots living outside the heap. Irresponsible pointers may point anywhere but cannot free anything. A pointer variable is always responsible or irresponsible; it cannot change responsibility throughout its lifetime. Irresponsible pointers can be used to point inside heap objects, including structures and arrays. All pointers to nonheap objects, such as FILE pointers and function pointers, are irresponsible. Ownership of an object can be transferred between responsible pointers, but when ownership is transferred, one of the pointers must relinquish responsibility for the object, as only one pointer may be responsible for freeing any object. Any responsible pointer that relinquishes responsibility for an object becomes a zombie pointer, and a POM-compliant program will never use the value of any zombie pointer. Ideally, a responsible pointer s lifetime ends shortly after it becomes a zombie pointer. Responsible pointers do not always point to objects they must free. Any responsible pointer can be in one of four states: good, null, uninitialized, and zombie. A good pointer is one that must be freed. A null pointer has the null value. It may be freed but need not be; the C Standard guarantees that explicitly freeing a null pointer does nothing. An uninitialized pointer is one that does not hold a valid value, often because it has not been set to one. Finally, a zombie pointer is either a formerly good pointer that has relinquished responsibility for an object to another good pointer or a pointer that was passed to free() and therefore no longer points to valid memory. Only good pointers may be dereferenced. Responsible pointers that are good or null are said to be consistent, whereas zombie pointers and uninitialized pointers are inconsistent. POM tool support consists of an advisor and a verifier, as shown in Figure 1. The advisor examines a source code file and builds a model of pointer ownership, possibly with the help of the developer. (A sufficiently advanced developer could bypass the advisor entirely and build the model from scratch, but the purpose of the advisor is to save the developer from this work.) The verifier takes a compilation unit and a model and indicates whether the model is complete and the code complies with it. The verifier can output errors if the model is incomplete (for example, it does not indicate any responsibility for some pointer), if the model is inconsistent (it may say that a certain pointer is both responsible and irresponsible), or if the program violates the model (by trying to free an irresponsible pointer, for example). Figure 1. POM Implementation To analyze a compilation unit, the verifier requires that the model indicate the responsibility status of all pointers referenced by the compilation unit. However, pointers in the program that are not referenced by that compilation unit need not be specified. The model can be built alongside the verification of individual 5092

compilation units. As such, POM performs analysis on the level of the compilation unit rather than whole program analysis. In POM analysis, functions can be identified as producing new responsible pointers or consuming good responsible pointers. The malloc() function is the simplest example of a producer, producing a responsible pointer as its return value. But any function could be recognized to produce a responsible pointer as its return value, and some functions can produce a responsible pointer as an argument. One such example is the getline() function, defined in the GNU C library, whose first argument is a char**. If this argument is a pointer to a null pointer, the function assigns it to point to memory allocated on the heap. The developer is expected to free this pointer later. Likewise, any function could be recognized as consuming an argument, with free() being the simplest example. A function could consume multiple arguments, perhaps by passing each to free(). Some function arguments, such as the first argument to the realloc() function, could be both produced and consumed by the same function. 4.1 Specification All declared pointers can be in one of several subtypes, as listed in Table 1. Table 1 Pointer Subtypes Type Definition s Unknown Not yet assimilated into POM Out of scope (OOS) POM does not apply to this pointer; it is handled some other way Reference counting Garbage collection Circular linked list FILE pointers (could be considered distinct resources under a model similar to POM) Irresponsible Responsible Not responsible for cleaning up memory it points to Responsible for cleaning up memory it points to Any pointer pointing into the stack or data segment Any pointer that gets the result of a call to malloc() These pointer subtypes are subject to the following rules. This list of rules is illustrative rather than exhaustive. A POM-compliant program shall not have any unknown pointers. An OOS pointer shall not be assigned the value of a responsible pointer. An irresponsible pointer should never be a producer argument or producer return value. char* irp = INITIALIZE_IRRESPONSIBLE; irp = malloc(5); // bad, malloc s return is a producer An irresponsible pointer should never by assigned to a responsible pointer. A responsible pointer should never be assigned a value from pointer arithmetic. char *rp1 =...; // responsible char *rp2; // responsible rp1 = rp2 + 1; // bad A responsible pointer can be assigned only the following: null, a producer return value (or be supplied as a producer argument), another responsible pointer. Responsible pointers can be in one or more of the following states: uninitialized, null, good, or zombie. The state may not always be known, in which case the pointer can occupy multiple states, and be resolved later. An uninitialized responsible pointer shall never be consumed or dereferenced. char *rp; // responsible free(rp); // bad A responsible null pointer shall never be dereferenced. A good pointer shall never be overwritten. char *rp = new char[5]; // responsible rp = new char[7]; // bad 5093

A responsible pointer shall never go out of scope while good. { char *rp = malloc(5); // consistent // bad, memory might be leaked struct foo_s { char* c; // responsible *s; s = new struct foo_s; s->c = new char[3]; //... delete s; // bad, s->c's memory lost A zombie pointer s value shall never be dereferenced, read or consumed. char *rp = malloc( 5); // consistent free(rp); // zombie free(rp); // bad (double free) 4.1.1 Responsible pointer state changes. The subtype of a pointer variable never changes: once a responsible pointer, always a responsible pointer. However, the state of responsible pointers can change in the following ways: Assignment: This includes any read of one responsible pointer and write of the value into another responsible pointer. The pointer written assumes the other pointer s state, and the other pointer becomes a zombie. Besides the assignment operator, assignment could be accomplished with the memcpy() function. Producer return value: This makes a pointer good or null under some circumstances. char *rp1; // responsible rp1 = malloc(5); // now good or null) char *rp2; // responsible rp2 = new char[5]; // now good Producer argument: This makes a pointer good, although sometimes only if pointer was null. char *rp = NULL; // responsible, null getline( &rp, 80, stdin); // now good Consumer argument: This makes a pointer a zombie if and only if it was good. Null check: A pointer that may be in several states can be disambiguated by comparing it to 0. char *rp = malloc(5); // consistent if (rp == NULL) { // rp is now null abort(); else { // rp is now good // rp is now good Like pointers, functions can be annotated when they take pointer arguments or when they return a pointer. Functions can be annotated with the following: Nonreturning function: Nonreturning functions never return to their caller. They always exit via abort(), exit(), _Exit(), longjmp(), a goto statement, or by invoking another noreturn function. The C11 _Noreturn keyword denotes a nonreturning function. void panic(const char *s) { printf("error: %s\n", s); abort(); Producer return function: This is a function that returns a responsible pointer. Any producer return function must have its return value assigned to a responsible pointer Function consumer argument: This is an argument to certain functions. These functions either consume the pointer s data or give it to some other responsible pointer. A consumer argument must always be provided by a responsible pointer. Function producer argument: This is an argument to certain functions. They are of type pointer to pointer to x, and the functions generally allocate memory and fill the argument with a pointer to the newly allocated memory. Any producer argument must be a pointer to a responsible pointer object. The getline() function is a good example; its first argument is a pointer to a responsible pointer, and if the responsible pointer is null, it allocates space, making the responsible pointer good. Irresponsible pointer argument: A function that does not produce or consume a pointer argument effectively treats that argument as irresponsible. It is permissible to pass any pointer (responsible or not) to such a function. The strcpy() function takes two char* arguments, and both can be considered irresponsible. Forced-irresponsible pointer argument: This is a function that takes a pointer-to-pointer argument and modifies it without modifying the memory or consuming it. A responsible pointer must not be passed as a forced-irresponsible pointer argument. void fn(char** x) { // irresponsible 5094

// x points into a string (*x)++; 4.1.2 Structs and classes. Because pointers can also be used in other C objects, we must extend our notions of responsibility to other objects. Following are the rules for C structs and C++ classes: A struct is responsible if it contains any responsible pointers or if it contains any other responsible objects. A responsible object is consistent if all of its responsible pointers are consistent (good or null). A responsible object is a zombie if none of its responsible pointers are good. (Unlike pointers, objects can first exist in a zombie state when their responsible pointers are uninitialized). A responsible object is inconsistent if it is neither good nor a zombie. Any function that modifies the state of an object must not exit with the object in an inconsistent state. typedef struct foo_s { char *rptr1; // responsible char *rptr2; // responsible foo_t; void f1(foo_t* foo) { // consistent free( foo->rptr1); // inconsistent, rptr2 still good void f2(foo_t* foo) { // consistent free( rptr1); free( rptr2); // OK 4.1.3 Unions. A union that contains a pointer may have that pointer be out-of-scope, irresponsible, or responsible. The rules for pointers in unions are the same as for structs. However, because all elements in a union share memory, reading or writing to any union member also reads or writes to the pointer. A union with a responsible pointer thus has two extra rules. These rules depend on the following example datatype: #define SIZE (sizeof(char*)) typedef struct foo_u { char *rptr; // responsible char other[size]; // any other data foo_t; In any union with a good responsible pointer, reading any element besides the pointer automatically converts the pointer to a zombie. (This rule presumes that the pointer s value was read and assigned elsewhere.) char string[size]; foo_t foo; memcpy(&foo.string, foo.other, SIZE); // foo.rptr: good -> zombie In any union with a good responsible pointer, writing any element besides the pointer automatically sets the pointer state to uninitialized (its pointer value is invalid). foo_t foo; foo.rptr = NULL; foo.other[0] = 'a'; // uninitialized 4.1.4 Arrays. An array of irresponsible pointers requires no monitoring except for the standard rules about irresponsible pointers. An array of responsible pointers is like any other responsible object, but with one wrinkle: a size value. The size value indicates the number of responsible pointers in the array. It might be set to the index of the last element or one past the last element. It might also be an irresponsible pointer that points to the high element or one past the array. This model implicitly assumes that elements before the size are good or null and that elements after the size are not good. That is, only the elements before size must be freed; elements after size must not be freed. It is acceptable for this number to be less than the array bounds; the array may have several unused and uninitialized values beyond the size. (This means we do not prevent buffer overflow on the array.) void f() { int *array=malloc(10*sizeof(int)); // responsible int size = 8; for (int i = 0; i < size; i++) { array[i] = malloc(sizeof( int)); // OK 4.1.4 Typecasts. Typecasts are easily handled by POM. There is no problem with typecasting a pointer (responsible or irresponsible) to a pointer of different type. Likewise, there is no issue with converting an irresponsible or out-of-scope pointer to an integer or some other representation (such as a char array), or vice versa. A responsible pointer can be converted to an integer, which could be considered an implicit assignment to an irresponsible pointer and subsequent conversion to int. 4.2 Producers and consumers 5095

A function that produces a responsible object (which may just be a responsible pointer) is a producer function. s of producers include malloc(), calloc() and various special functions such as strdup(). Some functions take as an argument a pointer to a pointer, and if they are not null, the function allocates space for an object and sets the pointer to the produced space (an example is getline() from GNU glibc.) In other words, a producer function can produce a pointer in one of its arguments, not just in its return value. A function that frees a responsible object is a consumer function. The most well-known example of such a function is free(). It is possible for a function to consume multiple responsible objects. It is also possible for a function to produce one argument (or return value) and consume an argument. A function can produce an object via its return value, but it can consume an object only via arguments, not via its return value. A producer function need not even invoke malloc(). For example, a producer function could split a responsible pointer from a responsible object and return the pointer. The following is a producer function because it returns a pointer that must be subsequently freed. typedef struct foo_s { char *a; // responsible char *b; // irresponsible foo_t; // producer function char *function(struct foo_t* foo) { char *tmp = foo->a; foo->a = NULL; return tmp; // OK The following function consumes its second argument because its second argument no longer needs to be explicitly freed; it is incorporated into a responsible object. // consumer function void function2(struct foo_t* foo, char *a) { if (foo->a!= NULL) { // get rid of foo->a somehow foo->a = a; The behavior of the realloc() function depends on its size argument. It requires a responsible pointer (that is good or null), so the pointer argument is consumed. It also returns the pointer successfully reallocated, or a pointer to the new internal copy (if it had to copy), or null if it failed to copy. So it also has a producer return value. Some implementations treat a call to realloc() with a size of 0 as a call to free(), but this is nonstandard behavior. The C11 Standard declares that specifying 0 as the size is an obsolete feature, so we will ignore this case. Many functions will have no producer or consumer arguments. The main() function must not produce or consume any arguments. 4.3 Analysis For each function declaration, we need to know if the function returns a responsible pointer; that is, is it a producer? Also, we need to know if the function produces any argument pointer-to-pointer. Finally, we need to know if the function consumes any argument pointer. We can use static analysis to answer these questions for a function whose definition is available. But if a function s definition is not available, we require some notation to answer these questions. To analyze a function body properly, we must inspect its definition, as well as all of the variables available to the function, including arguments, local variables, and static variables. Does the function produce or consume any of them? A responsible pointer can be in multiple states at once. We always assume that the states a pointer is in can be determined statically. For any two states, it is easy to use branching to create a pointer that could be in both states. Consider this example: char *rptr; // uninitialized bool flag = //... if (flag) { rptr = new char[...]; // rptr now good or uninitialized This code can cause trouble later on; there is no way to distinguish good responsible pointers from uninitialized pointers. To be safe, this code must use the flag variable to determine if the pointer is good, and determining flag s value statically might be impossible. We can handle this code statically by allowing rptr to be either good or initialized and indicating an error if either state causes an inconsistency later in the code. This means that this code will always yield an error: either when rptr is freed (because it may have been uninitialized) or when rptr leaks (it may have been good). The code can always be brought to compliance with POM by initializing rptr to NULL: char *rptr = NULL; bool flag = //... 5096

if (flag) { rptr = new char[...]; //... free(rptr); // was good or NULL Besides using control flow to create multiple states, malloc() and other producers can make pointers that are either good or null. No other combinations of states are typically possible without control flow. But because of control flow, we must assume any combination of states is possible. So a responsible pointer s state should be expressed as a list of Booleans (or a bit vector), one for each possible state. However, this does mean that this safe code snippet does not abide by POM: char *rptr; // uninitialized bool flag = //... if (flag) { rptr = new char[...]; //... if (flag) { free( rptr); 4.4 Remaining issues POM has several unresolved issues. These can be tackled in future implementations using a waterfall model of development. That is, future designs for POM can tackle these issues with only minor extensions to the current design: How should POM handle a circular linked list? By POM s model, either every pointer in the list is responsible, or every pointer is irresponsible. Irresponsible pointers renders the list useless for managing the elements, but responsible pointers would prevent any responsible pointer from pointing to the list from outside it. One solution would be to have one pointer be explicitly irresponsible, but this would be difficult to enforce. If the list contains one special sentinel element, then POM could simply declare that the pointer to the sentinel element is irresponsible but all other pointers are responsible. Such a list is not representable under the current ownership model specified in this paper; it would require an extension. Can a responsible pointer be created from an integer or other nonpointer representation? An initial implementation can simply disallow this possibility; later we can investigate programs that require this capability. How should C++ code be handled in POM? (The current design ignores C++, focusing on C. C++ is more complex, but no new fundamental approaches are required.) Is there any way to limit the lifetime of an irresponsible pointer? Irresponsible pointers that refer to freed memory are still an unsolved problem. Would the model be damaged if the zombie and uninitialized responsible pointer states were merged into one state? How should POM handle programs that can benefit from its model but that also use an alternative ownership model, such as reference counting, for some memory? A simple extension would be to mark some pointers as out-of-scope, indicating that the tools should ignore them. How should POM handle programs that use nonstandard extensions to the C language or that rely on undefined or implementation-defined behavior? In all likelihood, such programs will still be amenable to POM analysis, but accuracy will be lower than for standards-compliant programs. How should POM handle programs with potential memory management bugs in nonanalyzable code, such as assembly code? This could be rectified by some kind of trust-me notation for the verifier to use when it cannot verify code safety. The developer must assume responsibility that any such code is secure. How should POM handle programs that intentionally leak memory (most likely because the program relies on its platform to free the memory when the program exits)? This could be mitigated by having the verifier tool provide an option to ignore memory leaks. 5. Prototype A POM prototype has been developed as a proofof-concept. The prototype is built using the C Intermediate Language (CIL). [9] CIL differs from the parsers used by typical compilers such as GCC because it simplifies the C grammar. This makes syntax trees produced by CIL easier to analyze than trees built using traditional compiler front-ends while simultaneously limiting its applicability. In particular, CIL has no support for analyzing C++ programs. Eventually, we wish to build an implementation using mainstream static analysis and compiler technology such as Clang/LLVM. This implementation will have the advantage over CIL of being able to analyze a broad code base as well as support future extension of POM to handle C++ programs. For example, virtually all of the code in Mac OS X 10.7 Lion and ios 5 was built with Clang and llvm-gcc. 5.1. Lessons learned The purpose of a prototype is to identify design flaws as well as refine future extensions, and this 5097

prototype served its purpose quite well. Following are some of the changes we made to the design as a result of the prototype. The current design does not address functions that may duplicate a responsible pointer. One such function is memcpy() if it duplicates memory that contains a pointer. In the prototype, we hardwired the behavior for memcpy(), but this feature must be provided for other functions, too. Our annotations can identify function arguments that must be irresponsible, must be responsible, or may be either, and this is adequate for pointers. But structs that contain multiple pointers often have more complex invariants when passed as function arguments. Consider a struct to represent a doubly linked list: struct node_t { char* data; // responsible struct node_t* prev; // irresponsible struct node_t* next; // responsible ; A function that removed an element from a linked list would require that any struct node_t* arguments it receives have consistent next pointers. But it does not require that the data pointers are consistent; they could be zombies or uninitialized. For this prototype, we assumed that all responsible pointers must be consistent (good or null), but this clearly makes POM complain when given valid code, such as struct node_t* element =...; free(element->data); delete_element( element); // fails, argument has zombie data Properly cleaning up zombies can be a tricky process and incorrect cleanup may not be properly detected by POM. Suppose we have a linked list with three elements: struct node_t* start = // first struct node_t* end = // last free(start->next->data); free(end->prev->data); // oops, double-free! POM fails to realize that start->next->data and end->prev->data represent the same pointer, so it does not detect the double-free error. Our prototype already catches double-free errors when they are encapsulated in single pointers. It also can detect such errors when they occur in nonrecursive structs. In this case, the hierarchy of pointers in runtime memory is mirrored by the hierarchy of structs at compile time. But a recursive struct, including a linked list, can have a much more complex runtime image, and so cannot be detected by our current rules. One solution would be to mandate that if a struct member becomes a zombie, we disallow any reads or writes to that struct member from any other pointers in the case that any such member might refer to the same data. The developer could mitigate this by setting the freed memory to NULL, thus killing the zombie: free(start->next->data); // now cannot read any data member! start->next->data = NULL; // now reading any data member is OK free(end->prev->data); // OK, no-op But this would work only if POM can detect when the zombie pointer is cleared. free(start->next->data); end->prev->data = NULL; // zombie dead // but POM doesn t detect this! 6. Summary POM can be used as a static analysis tool to verify the memory safety of C (and eventually C++) programs. It is based on a general strategy that states that judicious use of semantics added by a developer can greatly aid static analysis tools. Unaided static analysis runs in to many limitations, such as the inability to recognize null pointer dereferences with high accuracy. We believe that requiring developers to provide a small amount of carefully chosen semantic information can be a great aid to static analysis, and POM is an example of this strategy applied to dynamic memory management. Proper memory management is currently obtainable using various techniques, such as garbage collection or reference counting. All such strategies impose a performance penalty, and none are provided with most implementations of C. In contrast, C++ offers reference counting via the shared_ptr<> template, and many newer languages require the developer to use some high-level form of memory management. POM imposes no performance overhead at runtime and only a minor penalty at compile time to validate pointer usage. POM is designed to work with existing C code, and one of our goals is to demonstrate its effectiveness by using it on a large open-source program. It may require some additional annotations from a developer, but it requires no code changes (unless it detects memory bugs). Consequently, the cost for developers to test a program that uses POM should be minimal. 7. References 5098

[1] International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC). Programming Languages C, 3rd ed. (ISO/IEC 9899:2011). International Organization for Standardization, Geneva, Switzerland, 2011. [2] Seacord, R. Secure Coding in C and C++, 2nd ed. Addison-Wesley Professional, Boston, 2013. [3] Plum, T., and D. Keaton, Eliminating Buffer Overflows, Using the Compiler or a Standalone Tool. Workshop on Software Security Assurance Tools, Techniques, and Metrics (SSATTM), Long Beach, CA, Nov. 7 8, 2005. www.plumhall.com/ase-ssattm-plum+keatonproceedings.pdf. [4] Dannenberg, R. B., W. Dormann, D. Keaton, R. C. Seacord, D. Svoboda, A. Volkovitsky, T. Wilson, and T. Plum. As-If Infinitely Ranged Integer Model. In Proceedings of the 2010 IEEE 21st International Symposium on Software Reliability Engineering (ISSRE 10), Washington, DC, pp. 91 100. IEEE Computer Society, Los Alamitos, CA, 2010. [5] Software Engineering Institute at Carnegie Mellon University, Pittsburgh, PA. [6] Dewhurst, S. C., C++ Gotchas: Avoiding Common Problems in Coding and Design. Addison-Wesley Professional, Boston, 2002. [7] Bierhoff, K., and J. Aldrich. Modular Typestate Checking of Aliased Objects. Proceedings of the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications (OOPSLA 07), pp. 301 320, ACM Press, New York,, 2007. [8] Necula, G. C., S. McPeak, and W. Weimer. CCured: Type- Proceedings of the ACM Symposium on Principles of Programming Languages, London, pp. 128 139, ACM Press, New York, 2002. [9] C Intermediate Language, freely available at http://kerneis.github.com/cil/. 5099