CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

Similar documents
CCured in the Real World

Today Program Analysis for finding bugs, especially security bugs problem specification motivation approaches remaining issues

Type Checking Systems Code

CCured: Type-Safe Retrofitting of Legacy Software

One-Slide Summary. Lecture Outline. Language Security

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

Short Notes of CS201

CS201 - Introduction to Programming Glossary By

In Java we have the keyword null, which is the value of an uninitialized reference type

Dependant Type Systems (saying what you are) Data Abstraction (hiding what you are) Review. Dependent Types. Dependent Type Notation

CCured: Type-Safe Retrofitting of Legacy Code

Ironclad C++ A Library-Augmented Type-Safe Subset of C++

HW1 due Monday by 9:30am Assignment online, submission details to come

CMPSC 497 Other Memory Vulnerabilities

Hacking in C. Pointers. Radboud University, Nijmegen, The Netherlands. Spring 2019

Pointer Basics. Lecture 13 COP 3014 Spring March 28, 2018

CS527 Software Security

Memory Safety for Embedded Devices with nescheck

Language Security. Lecture 40

Lectures 5-6: Introduction to C

A program execution is memory safe so long as memory access errors never occur:

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

CS 161 Computer Security

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

CS61C Machine Structures. Lecture 3 Introduction to the C Programming Language. 1/23/2006 John Wawrzynek. www-inst.eecs.berkeley.

COS 320. Compiling Techniques

An Efficient and Backwards-Compatible Transformation to Ensure Memory Safety of C Programs

Simply-Typed Lambda Calculus

The role of semantic analysis in a compiler

Runtime Defenses against Memory Corruption

QUIZ. Write the following for the class Bar: Default constructor Constructor Copy-constructor Overloaded assignment oper. Is a destructor needed?

Class Information ANNOUCEMENTS

Information Flow Analysis and Type Systems for Secure C Language (VITC Project) Jun FURUSE. The University of Tokyo

CSolve: Verifying C With Liquid Types

Software Security: Vulnerability Analysis

CS 330 Lecture 18. Symbol table. C scope rules. Declarations. Chapter 5 Louden Outline

CS61C : Machine Structures

Topics Covered Thus Far CMSC 330: Organization of Programming Languages

Dynamic Memory Allocation: Advanced Concepts

Operational Semantics. One-Slide Summary. Lecture Outline

QUIZ How do we implement run-time constants and. compile-time constants inside classes?

Lecture 27. Pros and Cons of Pointers. Basics Design Options Pointer Analysis Algorithms Pointer Analysis Using BDDs Probabilistic Pointer Analysis

CMSC 313 COMPUTER ORGANIZATION & ASSEMBLY LANGUAGE PROGRAMMING

Lecture 20 Pointer Analysis

Lecture 14 Pointer Analysis

CS61C : Machine Structures

Region-Based Memory Management in Cyclone

CPSC 3740 Programming Languages University of Lethbridge. Data Types

CSE 303 Concepts and Tools for Software Development. Magdalena Balazinska Winter 2010 Lecture 27 Final Exam Revision

Java and C CSE 351 Spring

Dynamic Memory Allocation

QUIZ. What is wrong with this code that uses default arguments?

Types and Type Inference

Types and Type Inference

Lecture 16 Pointer Analysis

CS-527 Software Security

Contents. I. Classes, Superclasses, and Subclasses. Topic 04 - Inheritance

CS152: Programming Languages. Lecture 23 Advanced Concepts in Object-Oriented Programming. Dan Grossman Spring 2011

The Compiler So Far. CSC 4181 Compiler Construction. Semantic Analysis. Beyond Syntax. Goals of a Semantic Analyzer.

The Java Language Implementation

The Mercury project. Zoltan Somogyi

A brief introduction to C programming for Java programmers

Topic 9: Type Checking

Topic 9: Type Checking

Inheritance, Polymorphism and the Object Memory Model

CMSC 330: Organization of Programming Languages. Ownership, References, and Lifetimes in Rust

CSE 351, Spring 2010 Lab 7: Writing a Dynamic Storage Allocator Due: Thursday May 27, 11:59PM

CS 314 Principles of Programming Languages. Lecture 11

Memory Safety (cont d) Software Security

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/

CSE 307: Principles of Programming Languages

Lectures 5-6: Introduction to C

Java and C I. CSE 351 Spring Instructor: Ruth Anderson

Organization of Programming Languages CS320/520N. Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Types. What is a type?


CSE 431S Type Checking. Washington University Spring 2013

CS 31: Intro to Systems Pointers and Memory. Martin Gagne Swarthmore College February 16, 2016

G52CPP C++ Programming Lecture 3. Dr Jason Atkin

CS558 Programming Languages

Lecture 14. No in-class files today. Homework 7 (due on Wednesday) and Project 3 (due in 10 days) posted. Questions?

Functions in C C Programming and Software Tools

High Performance Computing and Programming, Lecture 3

CS6202: Advanced Topics in Programming Languages and Systems Lecture 0 : Overview

Simple Overflow. #include <stdio.h> int main(void){ unsigned int num = 0xffffffff;

Introduction Primitive Data Types Character String Types User-Defined Ordinal Types Array Types. Record Types. Pointer and Reference Types

Ch. 3: The C in C++ - Continued -

Announcements. assign0 due tonight. Labs start this week. No late submissions. Very helpful for assign1

Agenda CS121/IS223. Reminder. Object Declaration, Creation, Assignment. What is Going On? Variables in Java

377 Student Guide to C++

Organization of Programming Languages CS3200 / 5200N. Lecture 06

Secure Software Development: Theory and Practice

Chapter 10. Improving the Runtime Type Checker Type-Flow Analysis

Limitations of the stack

Week 7. Statically-typed OO languages: C++ Closer look at subtyping

QUIZ. What are 3 differences between C and C++ const variables?

Identifying Memory Corruption Bugs with Compiler Instrumentations. 이병영 ( 조지아공과대학교

CMSC 330: Organization of Programming Languages

Pierce Ch. 3, 8, 11, 15. Type Systems

Memory Allocation III

Transcription:

CCured Type-Safe Retrofitting of C Programs [Necula, McPeak,, Weimer, Condit, Harren] #1 One-Slide Summary CCured enforces memory safety and type safety in legacy C programs. CCured analyzes how you use pointers and either proves the usage safe statically or inserts run-time checks. Along the way we ll see cameo appearances by just about every CS 415 topic. #2 Lecture Outline Type and Memory Safety CCured Motivation SAFE Pointers SEQuence Pointers WILD Pointers Experimental Results Analysis #3 1

Why Now, Brown Cow? Type Systems [ Type Safety ] Language Security [ Memory Safety ] Static and Dynamic Types [ Run-Time Type Info ] Runtime Organization [ Pointer Layout ] Subtyping [ Convertibility ] Garbage Collection [ Tag Bits ] Dataflow Analysis [ Pointer-Kind Inference ] Object-Oriented Programming [ OOP in C ] Aspect-Oriented Programming [ Checks Everywhere ] Libraries [ Library Compatibility ] Debuggers and Profilers [ Purify and Valgrind ] #4 Two Kinds of Safety Type safety is a property of a programming language that prevents certain errors (type errors) that result from attempts to perform an operation on a value of the wrong type. Type safety prevents: hello + 3 Not Type safe: 3 + (int) hello Some languages allow unsafe casts between types. Memory safety: if a value of type T 1 is read from address A, then the most recent store to A had type T 2 with T 2 T 1 Store a Dog in memory, read an Animal later #5 We Need Them Both You cannot have true Type Safety without Memory Safety Why? Hint: an unsafe cast defeats type safety #6 2

We Need Them Both You cannot have true Type Safety without Memory Safety Why? Example: table * t = new table(); char buf[10]; buf[10] = buf[11] = buf[12] = buf[13] = 55; t->countlegs(); Buffer overrun! #7 Memory Safety Essential component of a security infrastructure Prevents interference 50% of reported attacks are due to buffer overruns Software engineering advantages Memory bugs are hard to find (why?) Memory safety ensures component isolation Required for soundness of many program analyses (why? hint: aliasing) Does not even need an explicit specification #8 C and Memory Safety C was designed for flexibility and efficiency Many operators can be used unsafely Memory safety is sacrificed! In practice, many C programs use those operators safely Only a small portion of the pointers and operators are responsible for the unsafe behavior #9 3

CCured Idea 1. Devise a sound type system and a type inference algorithm that handles most C programs Combination of static and dynamic types 2. Insert run-time checks (e.g. array-bounds checks and dynamic type checking) in those places where safety cannot be verified statically This way we sacrifice performance instead of safety Makes sense for more and more applications every day Hardware progress improves performance but not safety #10 CCured Goals Compatibility: support existing C code Source-to-source transformation Handle GCC/MSVC source, Makefiles All that is needed is a recompilation: make CC=ccured Efficiency: 0-50% overhead rather than 1000% Other research: 10x, Purify: 20x, BoundsChecker: 150x More effective and more efficient than Purify Because it leverages existing type information in source Use for production code not just during testing #11 Diseases We Want to Cure Focus on pointer usage Is it lupus? Dereferencing a non-pointer (or NULL) Invoking a non-function Complicated by casts and union types It s never lupus! Dereferencing outside of object bounds Buffer overruns Complicated by pointer arithmetic Not always caught by Purify Freeing non-pointers, using freed memory #12 4

Example Pointer Usage in C Consider an implementation of a hash table struct list {void * data; struct list * next} * * hash; Cast allowed Arith allowed Yes Yes No No No Yes #13 SAFE Pointer Invariants and Representation T * safe Can be 0 or a pointer to storage containing a T All aliases agree on the type of the referenced storage Must do null-check before dereference Inexpensive to store and to use Prototypical example: FILE * p T statically typed home area #14 Quiz How many pointers in an average C program are SAFE according to the previous definition? Answer: between 40-95% of the declared pointers #15 5

Typing Rules for SAFE Pointers e : T * safe 0 : T * safe * e : T e 1 : T * safe e 2 : T * e 1 = e 2 : T e : T * safe T * safe T 1 * safe (T 1 * safe) e : T 1 * safe T * safe T 1 * safe means that a pointer of the first kind is convertible to a pointer of second kind #16 Convertibility of Pointers The convertibility relation is based on the physical layout of types (flatten structures and arrays) Examples: struct { int x, y; int *p;} * int * struct { int x; struct { int y; int *p;} s; } * struct { int x, y; } * It s like an upcast in OOP #17 SEQ Pointer Invariants and Representation T * seq Can be 0 Can be involved in pointer arithmetic Null check and bounds check before use Carries the bounds of a home area consisting of a sequence of T s All SAFE or SEQ aliases agree on the type of the referenced area base p end T T T statically typed home area #18 6

Typing Rules for SEQUENCE Pointers e : T * seq e : T * seq e + e : T * seq e : int T * seq T 1 * seq (T 1 * seq) e : T 1 * seq Before dereferencing a SEQ pointer it must be converted to SAFE (with a bounds check and with dropping the base and the end fields) e : T * seq (T * safe) e : T * safe #19 Forward SEQUENCE Pointers Often a sequence pointer is only advanced We call it a Forward Sequence (FSEQ) Needs to carry only the end and needs only an upper bound check p end T T Pointer arithmetic must be checked to advance A SEQ is converted to FSEQ via a lower-bound check and dropping the lower-bound field #20 Quiz How many forward-thinking Cylon characters are in a typical Battlestar episode? Answer: Six or Eight #21 7

Quiz How many forward sequence and sequence pointers are in a typical C program? Answer: about 25% FSEQ, 1% SEQ #22 The WILD West So far we have not faced the really ugly pointers Those that are cast to incompatible types We call them WILD pointers For these we cannot count on the static type! We must keep run-time type tags (cf. Python, Ruby) Operations allowed: read/write assign an integer pointer arithmetic cast to other WILD pointers #23 WILD Pointer Invariants and Representation T * wild Can be a non-pointer (any integer) Carries the bounds of a dynamic home area (containing only integers and dynamic pointers) Has only WILD pointer aliases Must do a non-pointer and a bounds check Must maintain the tags when writing (1 bit per word) base end p dynamically typed home area Just like in garbage collection! tag bits #24 8

WILD Complications What if we have a SAFE alias for a WILD? T * safe s; T * wild w = s; // w is an alias for s base w s T *(T * wild)w = random_stuff_of_type_t ; For Safety: WILD pointers can only alias other WILD pointers! #25 WILD Complications What if we have a WILD pointer to a SAFE ptr? base p base q T * safe The type system cannot ensure that all writes through the WILD pointer will write a compatible SAFE pointer For Safety: WILD pointers must point to areas containing only WILD pointers! #26 WILD Pointers are Highly Contagious A WILD pointer will force other pointers to be WILD as well: All pointers to which it is assigned All pointers from which it is assigned All pointers that it points to We need ways to reduce the number of WILD pointers Better understanding of casts #27 9

Handling Downcasts 10-20% of the pointers are void* Cast from void* to T* is unsafe! This is a special case of a downcast Downcasts are frequent in C programs (50-90% of bad casts) We introduce pointers that carry run-time type info Each downcast is checked at run-time (like in Java) T 1 * RTTI p; p rtti(t) T Invariant: T is a subtype of T 1 #28 Programming OO-Style in C With RTTI pointers we can program safely in a object oriented style (e.g. dynamic dispatch): struct Figure { double (*area)(struct Figure * RTTI ); } struct Circle { double (*area)(struct Figure * RTTI ); double radius; } double circle_area(struct Figure* RTTI fig) { struct Circle *circ = (struct Circle*)fig; return PI * circ->radius * circ->radius; } ; struct Figure * RTTI fig; fig->area(fig) Other places where RTTI helps Heterogeneous data structures Polymorphic code and data structures #29 Pointer-Kind Inference For every pointer in the program Try to infer the fastest sound representation We construct a whole-program data flow graph We collect constraints about pointer kinds Then linear-time constraint solving Analysis can be modularized if the interfaces are annotated with pointer kind Extremely simple, fast and predictable #30 10

Example: SAFE/SEQ int *a =... ; int *b = a ; int *c = b ; print(*c) ; b = b + 1 ; so a is FSEQ too bounds check here but c can be SAFE arithmetic: b must be FSEQ #31 Example: WILD pointer to a pointer int foo(int **p) { int *q = (int *)p; return *q; } cast to incompatible type: both p and q are WILD (so is the caller's argument!) #32 Experience Using CCured CCured handles all of C: vararg, function pointers, union types, GCC extensions CCured works on low-level code Apache modules, Linux device drivers CCured scales to large programs sendmail, openssl, ssh, bind (>= 100K lines) ACE infrastructure (>= 1M lines) CCured often requires manual intervention must change between 1/100 to 1/300 lines of code #33 11

CCured Purify Experimental Results Slowdown of CCured and Purify vs. C (low numbers = good) ijpeg 1.6 30 compress 1.2 28 go 1.1 51 li 1.8 50 bh 1.5 94 tsp 1.1 42 0-80% slowdown 60-100% of pointers are statically known to be type safe Found bugs in SPEC95 benchmarks #34 The GOOD, the UGLY, and the BAD Standard techniques from type theory can be used to understand the type safety of existing C programs CCured works automatically in most cases Most pointers are SAFE and some are SEQUENCE The slowdown is minimal in many cases The uglier your program the slower it will be #35 The GOOD, the UGLY, and the BAD Occasional significant slowdown Typically due to either large number of WILD or SEQUENCE pointers Increased memory footprint Larger code size Some pointers take 64-bits and some even 96-bits CCured is confused by custom-memory allocators Forced to treat them WILD Or to trust the allocator (as in the experiments) #36 12

The GOOD, the UGLY, and the BAD Incompatibilities with some libraries Due to different layout of data structures Solved by writing wrappers Some programs require changes Those that store addresses of locals in the heap Those that cast pointers to integers and then back Some (non-portable) programs are terminally-ill Self-modifying programs Those that depend on the size of pointers Those that intentionally skip from one field to another #37 Future Work Allow the programmer to define new pointer kinds Derived from the existing ones Maybe even brand new ones? Open the door to type-safe interoperability with C Type-safe Java native methods Type-safe inline C in C# programs Check it out at http://hal.cs.berkeley.edu/ccured/ #38 Homework PA5 Due Friday April 27 (tomorrow) Final Examination Block 4 Thursday May 10 1400-1700 MEC 214 #39 13