Fast, precise dynamic checking of types and bounds in C

Similar documents
Process-wide type and bounds checking

Fast, precise dynamic checking of types and bounds in C

Run-time type checking of whole programs and other stories.

Run-time type checking of whole programs and other stories

What run-time services could help scientific programming?

Enhanced Operating System Security Through Efficient and Fine-grained Address Space Randomization

CS527 Software Security

Lightweight Memory Tracing

EffectiveSan: Type and Memory Error Detection using Dynamically Typed C/C++

A program execution is memory safe so long as memory access errors never occur:

Recitation #11 Malloc Lab. November 7th, 2017

CS-527 Software Security

Dynamic Data Structures. CSCI 112: Programming in C

Kruiser: Semi-synchronized Nonblocking Concurrent Kernel Heap Buffer Overflow Monitoring

Programming in C and C++

o Code, executable, and process o Main memory vs. virtual memory

Data Types. Every program uses data, either explicitly or implicitly to arrive at a result.

Improving Error Checking and Unsafe Optimizations using Software Speculation. Kirk Kelsey and Chen Ding University of Rochester

Thread-Level Speculation on Off-the-Shelf Hardware Transactional Memory

Towards a dynamic object model within Unix processes

UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages

secubt Hacking the Hackers with User Space Virtualization

Runtime Defenses against Memory Corruption

Cling: A Memory Allocator to Mitigate Dangling Pointers. Periklis Akritidis

High System-Code Security with Low Overhead

CSE 374 Programming Concepts & Tools. Hal Perkins Spring 2010

HA2lloc: Hardware-Assisted Secure Allocator

Dynamic Allocation in C

Week 9 Part 1. Kyle Dewey. Tuesday, August 28, 12

Kurt Schmidt. October 30, 2018

CS107 Handout 08 Spring 2007 April 9, 2007 The Ins and Outs of C Arrays

Memory Allocation in C C Programming and Software Tools. N.C. State Department of Computer Science

Protecting Dynamic Code by Modular Control-Flow Integrity

SVF: Static Value-Flow Analysis in LLVM

Call Paths for Pin Tools

Heap Arrays and Linked Lists. Steven R. Bagley

In Java we have the keyword null, which is the value of an uninitialized reference type

Stack Bounds Protection with Low Fat Pointers

System Assertions. Andreas Zeller

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Heap Arrays. Steven R. Bagley

CS11001/CS11002 Programming and Data Structures (PDS) (Theory: 3-1-0) Allocating Space

Heriot-Watt University

Connecting the EDG front-end to LLVM. Renato Golin, Evzen Muller, Jim MacArthur, Al Grant ARM Ltd.

Baggy bounds with LLVM

A Fast Instruction Set Simulator for RISC-V

Data Representation and Storage. Some definitions (in C)

SoftBound: Highly Compatible and Complete Spatial Safety for C

Practical Memory Safety with REST

Chapter 10. Improving the Runtime Type Checker Type-Flow Analysis

Outline. Classic races: files in /tmp. Race conditions. TOCTTOU example. TOCTTOU gaps. Vulnerabilities in OS interaction

advanced data types (2) typedef. today advanced data types (3) enum. mon 23 sep 2002 defining your own types using typedef

Loop-Oriented Array- and Field-Sensitive Pointer Analysis for Automatic SIMD Vectorization

CS 610: Intermediate Programming: C/C++ Making Programs General An Introduction to Linked Lists

Dynamic Allocation in C

Important From Last Time

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

Security-Aware Processor Architecture Design. CS 6501 Fall 2018 Ashish Venkat

MISRA-C. Subset of the C language for critical systems

Java and C CSE 351 Spring

CS 31: Intro to Systems Pointers and Memory. Kevin Webb Swarthmore College October 2, 2018

System Assertions. Your Submissions. Oral Exams FEBRUARY FEBRUARY FEBRUARY

Tutorial 1: Introduction to C Computer Architecture and Systems Programming ( )

Precise Garbage Collection for C. Jon Rafkind * Adam Wick + John Regehr * Matthew Flatt *

LLVM performance optimization for z Systems

Compiler Construction: LLVMlite

Run-time Environments - 3

CS377P Programming for Performance Single Thread Performance Out-of-order Superscalar Pipelines

SoftBound: Highly Compatible and Complete Spatial Memory Safety for C

My malloc: mylloc and mhysa. Johan Montelius HT2016

CPSC 3740 Programming Languages University of Lethbridge. Data Types

CS61, Fall 2012 Section 2 Notes

Understanding Pointers

CSCI-243 Exam 1 Review February 22, 2015 Presented by the RIT Computer Science Community

Lecture 8 Dynamic Memory Allocation

Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors

CSci 4061 Introduction to Operating Systems. Programs in C/Unix

When you add a number to a pointer, that number is added, but first it is multiplied by the sizeof the type the pointer points to.

Transparent Pointer Compression for Linked Data Structures

CS 61C: Great Ideas in Computer Architecture C Pointers. Instructors: Vladimir Stojanovic & Nicholas Weaver

Lecture 03 Bits, Bytes and Data Types

Final CSE 131B Spring 2004

1d: tests knowing about bitwise fields and union/struct differences.

CS107 Handout 13 Spring 2008 April 18, 2008 Computer Architecture: Take II

valgrind overview: runtime memory checker and a bit more What can we do with it?

Types. What is a type?

Intermediate Programming, Spring 2017*

CPSC 213, Winter 2016, Term 2 Final Exam Solution Date: April 19, 2017; Instructor: Mike Feeley and Alan Wagner

Lab 2: More Advanced C

Real-World Buffer Overflow Protection in User & Kernel Space

Programming in C and C++

Introduce C# as Object Oriented programming language. Explain, tokens,

Introduction to Computer Systems. Exam 2. April 11, Notes and calculators are permitted, but not computers.

What the CPU Sees Basic Flow Control Conditional Flow Control Structured Flow Control Functions and Scope. C Flow Control.

Declaring Pointers. Declaration of pointers <type> *variable <type> *variable = initial-value Examples:

Motivation for Dynamic Memory. Dynamic Memory Allocation. Stack Organization. Stack Discussion. Questions answered in this lecture:

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

22c:111 Programming Language Concepts. Fall Types I

CSE409, Rob Johnson, Alin Tomescu, November 11 th, 2011 Buffer overflow defenses

Data Representation and Storage

Transcription:

Fast, precise dynamic checking of types and bounds in C Stephen Kell stephen.kell@cl.cam.ac.uk Computer Laboratory University of Cambridge p.1

Tool wanted if (obj >type == OBJ COMMIT) { if (process commit(walker, (struct commit )obj)) return 1; return 0; } p.2

Tool wanted if (obj >type == OBJ COMMIT) { if (process commit(walker, (struct commit )obj)) return 1; տ ր return 0; CHECK this } (at run time) p.2

Tool wanted if (obj >type == OBJ COMMIT) { if (process commit(walker, (struct commit )obj)) return 1; տ ր return 0; CHECK this } (at run time) But also wanted: binary-compatible source-compatible... for real, idiomatic code in (say) C reasonable performance p.2

Tool wanted if (obj >type == OBJ COMMIT) { if (process commit(walker, (struct commit )obj)) return 1; տ ր return 0; CHECK this } (at run time) But also wanted: binary-compatible source-compatible... for real, idiomatic code in (say) C reasonable performance Enter libcrunch, which does the above. p.2

The user s-eye view $ crunchcc -o myprog... # + other front-ends p.3

The user s-eye view $ crunchcc -o myprog... $./myprog # + other front-ends # runs normally p.3

The user s-eye view $ crunchcc -o myprog... $./myprog # + other front-ends # runs normally $ LD PRELOAD=libcrunch.so./myprog # does checks p.3

The user s-eye view $ crunchcc -o myprog... $./myprog # + other front-ends # runs normally $ LD PRELOAD=libcrunch.so./myprog # does checks myprog: Failed is a internal(0x5a1220, 0x413560 a.k.a. "uint$32") at 0x40dade, allocation was a heap block of int$32 originating at 0x40daa1 Reminiscent of Valgrind (Memcheck), but different... not checking memory definedness, in-boundsness, etc..... in fact, assume correct w.r.t. these! provide & exploit run-time type information p.3

Sketch of the instrumentation for C if (obj >type == OBJ COMMIT) { if (process commit(walker, } return 1; return 0; (struct commit )obj)) p.4

Sketch of the instrumentation for C if (obj >type == OBJ COMMIT) { if (process commit(walker, (CHECK( is a(obj, struct commit )), ( struct commit )obj))) return 1; return 0; } p.4

Sketch of the instrumentation for C if (obj >type == OBJ COMMIT) { if (process commit(walker, (CHECK( is a(obj, struct commit )), ( struct commit )obj))) return 1; return 0; } Need a runtime which provides a fast is a() function... and a few other flavours of check by efficiently tracking allocations... and attaching reified type info p.4

Reified, unique data types struct ellipse { double maj, min; struct point { double x, y; } ctr ; }; _uniqtype int int 4 _uniqtype double double 8 _uniqtype point 0 16 _uniqtype ellipse ellipse 32 0 0 2 3 0 8 0 8 16 also model: stack frames, functions, pointers, arrays,... unique exact type test is a pointer comparison is a() is a short search over containment edges p.5

Is it really that simple? What about...? untyped malloc() et al. opaque pointers, a.k.a. void* conversion of pointers to integers and back function pointers pointers to pointers simulated subtyping {custom, nested} heap allocators alloca() sloppy (non-standard-compliant) code unions, varargs, memcpy() p.6

Is it really that simple? What about...? untyped malloc() et al. opaque pointers, a.k.a. void* conversion of pointers to integers and back function pointers pointers to pointers simulated subtyping {custom, nested} heap allocators alloca() sloppy (non-standard-compliant) code unions, varargs, memcpy() p.6

What data type is being malloc() d? Use intraprocedural sizeofness analysis size t sz = sizeof (struct Foo); /*... */ malloc(sz); Sizeofness propagates, a bit like dimensional analysis. p.7

What data type is being malloc() d? Use intraprocedural sizeofness analysis size t sz = sizeof (struct Foo); /*... */ malloc(sz); Sizeofness propagates, a bit like dimensional analysis. malloc(sizeof (Blah) + n * sizeof (struct Foo)) p.7

What data type is being malloc() d? Use intraprocedural sizeofness analysis size t sz = sizeof (struct Foo); /*... */ malloc(sz); Sizeofness propagates, a bit like dimensional analysis. malloc(sizeof (Blah) + n * sizeof (struct Foo)) Dump typed allocation sites from compiler, for later pick-up source tree main.c widget.c util.c... main.i.allocs widget.i.allocs util.i.allocs... p.7

Polymorphism via multiply-indirected void void sort eight special (void pt){ void tt [8]; register int i ; for( i=0;i<8;i++)tt [ i]=pt[ i ]; for( i =XUP;i<=TUP;i++){pt[i]=tt[2 i]; pt[opp DIR(i)]=tt[2 i+1];} } neighbor = (int )calloc(ndirs, sizeof(int )); sort eight special (( void ) neighbor ); // < must allow! solution: tolerate casts from T** to void**... and check writes through void**... against the underlying object type (here int *[]) p.8

Performance data: C-language SPEC CPU2006 benchmarks bench normal/s crunch % nopreload bzip2 4.95 +6.8% +1.4% gcc 0.983 +160 % % gobmk 14.6 +11 % +2.0% h264ref 10.1 +3.9% +2.9% hmmer 2.16 +8.3% +3.7% lbm 3.42 +9.6% +1.7% mcf 2.48 +12 % ( 0.5%) milc 8.78 +38 % +5.4% sjeng 3.33 +1.5% ( 1.3%) sphinx3 1.60 +13 % +0.0% perlbench p.9

Experience on correct code run-time false positives benchmark compile fixes instances unique (of which... ) total unhelpful bzip2 0 48 3 3 gcc 1 3 10 5 14 3 gobmk 0 0 0 0 h264ref 2 27 2 0 hmmer 0 0 0 0 lbm 0 5 10 7 8 0 mcf 0 0 0 0 milc 0 0 0 0 sjeng 0 0 0 0 sphinx3 0 0 0 0 p.10

A helpful false positive? typedef double LBM Grid[SIZE Z SIZE Y SIZE X N CELL ENTRIES]; typedef LBM Grid LBM GridPtr; #define MAGIC CAST(v) ((unsigned int ) ((void ) (&(v)))) #define FLAG VAR(v) unsigned int const aux = MAGIC CAST(v) //... #define TEST FLAG(g,x,y,z,f) \ (( MAGIC CAST(GRID ENTRY(g, x, y, z, FLAGS))) & (f)) #define SET FLAG(g,x,y,z,f) \ {FLAG VAR(GRID ENTRY(g, x, y, z, FLAGS)); ( aux ) = (f);} p.11

Future work: shopping list for a safe implementation of C ǫ check memcpy(), realloc(), etc.. check syscalls (e.g. read()) add a bounds checker (improve on SoftBound) add a GC (precise! improve on Boehm) check unions and varargs always initialize pointers check unsafe writes through char* safely address-takeable union members (!) Good prospects for all of the above! (ask me) p.12

Future work: shopping list for a safe implementation of C ǫ check memcpy(), realloc(), etc.. check syscalls (e.g. read()) add a bounds checker (improve on SoftBound) add a GC (precise! improve on Boehm) check unions and varargs always initialize pointers check unsafe writes through char* safely address-takeable union members (!) Good prospects for all of the above! (ask me) p.12

Plenty of existing tools do bounds checking Memcheck (coarse), ASan (fine-ish), SoftBound (fine)... detect out-of-bounds pointer/array use first two also catch some temporal errors Problems remaining: overhead at best 50 100% (ASan & SoftBound) problems mixing uninstrumented code (libraries) false positives and negatives abort on false positives p.13

Difficult-to-check bounds errors and non-errors typedef struct {int x [2]; char y[2];} blah; blah z = { {0, 0},! }; (z.x + 2); // error : subobject overflow (( int ) &z)[2]; // error : after bounds narrowing cast ((z.x + 42) 42); // non error: via invalid (OOB) intermediate (( blah ) z.x) >y; // non error: after bounds widening cast ( int )( intptr t )z.x; // non error: via integer strfry (z.y); // non error: after uninstrumented code p.14

Existing checkers using per-pointer metadata p.15

Existing checkers using per-pointer metadata!" p.15

Without type information, pointer bounds lose precision! " p.16

Given allocation type and pointer type, bounds are implicit ellipse[3] ctr maj min ctr maj min ctr maj min x y x y x y 3.5 8.0 2 7 1.0 1.5 5 8 6.5-2.0 4 4 ellipse p_e = &my_ellipses[1] struct ellipse { struct point { double x, y; } ctr; double maj; double min; } my_ellipses[3]; p.17

Given allocation type and pointer type, bounds are implicit double ellipse[3] ctr maj min ctr maj min ctr maj min x y x y x y 3.5 8.0 2 7 1.0 1.5 5 8 6.5-2.0 4 4 double p_d = &p_e->ctr.x struct ellipse { struct point { double x, y; } ctr; double maj; double min; } my_ellipses[3]; p.17

Given allocation type and pointer type, bounds are implicit ellipse[3] ctr maj min ctr maj min ctr maj min x y x y x y 3.5 8.0 2 7 1.0 1.5 5 8 6.5-2.0 4 4 ellipse p_f = (ellipse*) p_d struct ellipse { struct point { double x, y; } ctr; double maj; double min; } my_ellipses[3]; p.17

The importance of being type-aware (when bounds-checking) struct driver { /... / } d = /... / ; struct i2c driver { /... / struct driver driver ; /... / }; #define container of( ptr, type, member) \ ((type )( (char )(ptr ) offsetof(type,member) )) i2c drv = container of(d, struct i2c driver, driver ); p.18

The importance of being type-aware (when bounds-checking) struct driver { /... / } d = /... / ; struct i2c driver { /... / struct driver driver ; /... / }; #define container of( ptr, type, member) \ ((type )( (char )(ptr ) offsetof(type,member) )) i2c drv = container of(d, struct i2c driver, driver ); SoftBound is oblivious to casts, even though they matter: bounds of d: just the smaller struct bounds of the char*: the whole allocation bounds of i2c drv: the bigger struct If only we knew the type of the storage! p.18

Idea Write a bounds-checker consuming per-allocation metadata avoid these false positives avoid libc wrappers,... robust to uninstrumented callers/callees performance? Making it fast: cache bounds: make pointers locally fat, globally thin only check derivation, not use p.19

Handling one-past pointers (diagram: Vladsinger, CC-BY-SA 3.0) On x86-64, use noncanonical addresses as trap reps p.20

No more deref checking int ret = 0; for (int i = 0; i < n; ++i) { struct list node p = malloc(sizeof (struct list node )); p >next = head; head = p; } for (int i = 0; i < m; ++i) { unsigned out = 0; for (struct list node p = head; p; p = p >next) { out += p >x; } ret += out; } return ret ; p.21

Status of the bounds checking It works! still turning the handle... Early indications array-based programs: SoftBound-like perf mostly +50 100% linked structures: much faster! as expected building my own SoftBound-alike to compare... p.22

Conclusions Run-time type info enables efficient and helpful checking source- and binary-compatible low overhead type checking precise & fast bounds checking good prospects for extension Code is here: http://github.com/stephenrkell/libcrunch/ Thanks for your attention. Questions? p.23