C Source Code Analysis for Memory Safety

Similar documents
Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU

Software Security: Vulnerability Analysis

Lecture 9 Assertions and Error Handling CS240

Sendmail crackaddr - Static Analysis strikes back

CS 161 Computer Security

Verification & Validation of Open Source

Today Program Analysis for finding bugs, especially security bugs problem specification motivation approaches remaining issues

A Context-Sensitive Memory Model for Verification of C/C++ Programs

18-642: Code Style for Compilers

Shuntaint: Emulation-based Security Testing for Formal Verification

Lecture 3 Notes Arrays

C Programming. Course Outline. C Programming. Code: MBD101. Duration: 10 Hours. Prerequisites:

Brave New 64-Bit World. An MWR InfoSecurity Whitepaper. 2 nd June Page 1 of 12 MWR InfoSecurity Brave New 64-Bit World

18-642: Code Style for Compilers

The Apron Library. Bertrand Jeannet and Antoine Miné. CAV 09 conference 02/07/2009 INRIA, CNRS/ENS

Structuring an Abstract Interpreter through Value and State Abstractions: EVA, an Evolved Value Analysis for Frama C

Static Code Analysis - CERT C Secure Code Checking

Pierce Ch. 3, 8, 11, 15. Type Systems

Lecture 1 Contracts. 1 A Mysterious Program : Principles of Imperative Computation (Spring 2018) Frank Pfenning

Checking Program Properties with ESC/Java

Dynamic Allocation in C

Program Static Analysis. Overview

SafeRiver SME. Added Value Solutions for Embedded Systems. Tools for FuSa and Software Security. Packaged Services. CIR agreed

Simulink 모델과 C/C++ 코드에대한매스웍스의정형검증툴소개 The MathWorks, Inc. 1

Lecture 8 Data Structures

Overview AEG Conclusion CS 6V Automatic Exploit Generation (AEG) Matthew Stephen. Department of Computer Science University of Texas at Dallas

Pointers (continued), arrays and strings

A Gentle Introduction to Program Analysis

Hoare Logic: Proving Programs Correct

Lecture 1 Contracts : Principles of Imperative Computation (Fall 2018) Frank Pfenning

Outline. Classic races: files in /tmp. Race conditions. TOCTTOU example. TOCTTOU gaps. Vulnerabilities in OS interaction

Hacking in C. Pointers. Radboud University, Nijmegen, The Netherlands. Spring 2019

CS527 Software Security

Program Verification (6EC version only)

Pointers (continued), arrays and strings

Hoare logic. A proof system for separation logic. Introduction. Separation logic

Static Program Analysis Part 1 the TIP language

Program Verification. Aarti Gupta

CS 261 Fall C Introduction. Variables, Memory Model, Pointers, and Debugging. Mike Lam, Professor

Static Analysis in C/C++ code with Polyspace

Simple Overflow. #include <stdio.h> int main(void){ unsigned int num = 0xffffffff;

Safety Checks and Semantic Understanding via Program Analysis Techniques

CS 430 Spring Mike Lam, Professor. Data Types and Type Checking

Static Analysis in Practice

Algebraic Program Analysis

Homework #3 CS2255 Fall 2012

Lab 03 - x86-64: atoi

Hybrid Verification in SPARK 2014: Combining Formal Methods with Testing

Lecture 2: C Programm

Declaring Pointers. Declaration of pointers <type> *variable <type> *variable = initial-value Examples:

Software security, secure programming

Static Analysis by A. I. of Embedded Critical Software

C: Pointers. C: Pointers. Department of Computer Science College of Engineering Boise State University. September 11, /21

Adam Chlipala University of California, Berkeley ICFP 2006

The Rule of Constancy(Derived Frame Rule)

A brief introduction to C programming for Java programmers

CCured. One-Slide Summary. Lecture Outline. Type-Safe Retrofitting of C Programs

Technical presentation

Lecture Notes on Arrays

Memory and Addresses. Pointers in C. Memory is just a sequence of byte-sized storage devices.

Pointers and Arrays CS 201. This slide set covers pointers and arrays in C++. You should read Chapter 8 from your Deitel & Deitel book.

Contents of Lecture 3

Proof Carrying Code(PCC)

Dynamic Allocation in C

C Introduction. Comparison w/ Java, Memory Model, and Pointers

Code Contracts. Pavel Parízek. CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics

Advanced Programming Methods. Introduction in program analysis

Buffer overflow prevention, and other attacks

Gaps in Static Analysis Tool Capabilities. Providing World-Class Services for World-Class Competitiveness

Static program checking and verification

School of Computer & Communication Sciences École Polytechnique Fédérale de Lausanne

In Java we have the keyword null, which is the value of an uninitialized reference type

CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C

Statically Detecting Likely Buffer Overflow Vulnerabilities

Lecture Notes on Linear Search

Outline. Introduction. 2 Proof of Correctness. 3 Final Notes. Precondition P 1 : Inputs include

Principles of Program Analysis. Lecture 1 Harry Xu Spring 2013

Important From Last Time

Synopsys Static Analysis Support for SEI CERT C Coding Standard

COS 320. Compiling Techniques

Learning from Executions

AstréeA From Research To Industry

Important From Last Time

Principles of Software Construction: Objects, Design, and Concurrency (Part 2: Designing (Sub )Systems)

C and C++ Secure Coding 4-day course. Syllabus

Limitations of the stack

An Operational and Axiomatic Semantics for Non-determinism and Sequence Points in C

Agenda. CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language 8/29/17. Recap: Binary Number Conversion

CS 61C: Great Ideas in Computer Architecture. Lecture 2: Numbers & C Language. Krste Asanović & Randy Katz

Kurt Schmidt. October 30, 2018

Verification and Test with Model-Based Design

Computer Security. Robust and secure programming in C. Marius Minea. 12 October 2017

Lecture Notes on Contracts

Static Analysis: Overview, Syntactic Analysis and Abstract Interpretation TDDC90: Software Security

Software Development. Modular Design and Algorithm Analysis

Lectures 20, 21: Axiomatic Semantics

Undefined Behaviour in C

Modular and Verified Automatic Program Repairs

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Lectures 5-6: Introduction to C

Transcription:

C Source Code Analysis for Memory Safety Sound Static Analysis for Security Workshop NIST, June 26-27 Henny Sipma Kestrel Technology

Kestrel Technology Founded: Location: Core activity: Languages supported: Underlying technology: 2000 Palo Alto, CA Sound Static Analysis of Software C source, Java bytecode, x86 executables Abstract interpretation (Cousot & Cousot, 1977) Properties: C: Memory safety analysis Java: x86: Information flow analysis, complexity analysis Memory safety, information extraction, malware analysis, reverse engineering

Kestrel Technology CodeHawk Tool Suite Java byte code front end.class.jar.war sound abstraction from Java byte code into CHIF.c.exe CIL C source code front end sound abstraction from preprocessed CIL code into CHIF x86 binary front end disassembly abstraction from x86 binary code into CHIF abstract interpretation engine Iterators Abstract domains: constants intervals strided intervals linear equalities polyhedra symbolic sets value sets taint

CodeHawk C Analyzer.c CIL C source code front end sound abstraction from preprocessed CIL code into CHIF abstract interpretation engine Iterators Abstract domains: constants intervals strided intervals linear equalities polyhedra symbolic sets value sets taint

Sound Static Memory Safety Analysis for C Goal: Mathematically prove absence of memory safety vulnerabilities (covering more than 50 CWEs) for real-world applications Approach: Specification: C Standard specification of undefined behavior Translate into preconditions on instructions and library functions Prove that all preconditions are valid Advantages: If successful: full assurance of memory safety Exhaustive: no false negatives Evidence: results can be independently audited Metrics: progress and success

Sound Static Memory Safety Analysis for C Challenges Not automatic May involve significant effort Approach: ü Specification: C Standard specification of undefined behavior ü Translate into preconditions on instructions and library functions Ø Prove that all preconditions are valid Analysis or Open Primary proof obligations (ppo s) Closed ppo s

Test Applications application LOC Cairo-1.14.12 227,818 Cleanflight-CLFL-v2.3.2 118,758 Dnsmasq-2.76 29,922 Dovecot-2.0.beta6 (SATE 2010) 208,636 File 14,379 Git-2.17.0 205,636 Hping 11,336 Irssi-0.8.14 (SATE 2009) 61,972 Lighttpd-1.4.18 (SATE 2008) 49,747 Nagios-2.10 (SATE 2008) 47,652 Naim-0.11.8.3.1 (SATE 2008) 25,759 Nginx-1.14.0 103,388 Nginx-1.2.9 102,151 Openssl-1.0.1.f 275,060 Pvm3.4.6 (SATE 2009) 60,029 Wpa_supplicant-2.6 96,554 Total 1,638,797

Creating Primary Proof Obligations ppo s CodeHawk C Analyzer File/function semantics CIL Gcc - preprocessor bear.c Makefile

Primary Proof Obligations: How Many? 5,545,304

Primary Proof Obligations: How Many? Ppo s per lines of code 5,545,304 4.1 4.1 2.8 2.8 5.3 1.2 3.7 4.3 4.1 3.7 6.5 3.3 2.3 2.9 3.2 3.3

Primary Proof Obligations: What are they? First-order atomic predicates: allocation-base(p) cast(x,t1,t2) common-base(p1,p2) common-base-type(p1,p2) format-string(p) global-memory(p) index-lower-bound(a) index-upper-bound(a,s) initialized(v) initialized-range(p,s) int-overflow(op,a,b,t) int-underflow(op,a,b,t) lower-bound(p) no-overlap(p1,p2) non-negative(a) not-null(p) not-zero(a) null(p) null-terminated(p) pointer-cast(p,t1,t2) ptr-lower-bound(op,p,a) ptr-upper-bound(op,p,a) ptr-upper-bound-deref(op,p,a) signed-to-unsigned-cast(a,t1,t2) unsigned-to-signed-cast(a,t1,t2) upper-bound(p) valid-memory(p) value-constraint(x) width-overflow(a)

Primary Proof Obligations: Analysis Simple Things First A. Check validity based on individual statement and declarations int a[10];... a[3] = 0; index-lower-bound(3) index-upper-bound(3,10) null-terminated( string ) not-null( string ) strcpy(dst, string ) lower-bound( string ) upper-bound( string ) valid-memory( string )

Primary Proof Obligations Discharge ppo s at the statement level 3,389,371 2,155,365

Primary Proof Obligations Discharge ppo s at the statement level (as a percent of total) 3,389,371 2,155,365

Primary Proof Obligations: Analysis Generating Invariants B. Check validity based on invariants generated int a[10];... for (int i=0; i < 10; i++) { a[i] = 0; } index-lower-bound(i) index-upper-bound(i) i = [ 0.. 9 ] 1. int x;... 10. x =...... 20. x = x + 1;... initialized (x)... proof obligations x:initialized@10 invariant

Analysis: Generating Local Invariants (Context-insensitive) Abstract Interpretation (Cousot,Cousot, 1977) Domains: Intervals (Cousot, Cousot) Linear Equalities (Karr, 1976) Value Sets (Reps, 2004) Symbolic Sets Parametric Ranges Flow-sensitive, Path-insensitive abstract interpretation engine Iterators Abstract domains: constants intervals strided intervals linear equalities polyhedra symbolic sets value sets taint

Analysis: Generating Local Invariants ppo s CodeHawk C Analyzer File/function semantics CIL Gcc - preprocessor bear.c Makefile

Analysis: Generating Local Invariants invariants ppo s File/function semantics CodeHawk C Analyzer

Primary Proof Obligations Discharge ppo s using local function invariants 1,568,639 3,976,097

Primary Proof Obligations Discharge ppo s using local function invariants 1,568,639 3,976,097

Analysis: Delegating Proof Obligations (Context sensitivity) C. Lift responsibility to api int f (int *p) { int *q; int x;... q = p; x = *q + 5; 1. not-null(q) 2. valid-memory(q) 3. lower-bound(q) 4. upper-bound(q) 5. initialized(*q) 6. Int-overflow(*q+5) proof obligations p = q invariant 1. not-null(p) 2. valid-memory(p) 3. lower-bound(p) 4. upper-bound(p) 5. initialized(*p) 6. Int-overflow(*p + 5) API requirements on f

Analysis: Delegating Proof Obligations Impose Preconditions on Callers

Analysis: Delegating Proof Obligations Create Supporting Proof Obligations python API spo s api requirements invariants ppo s linker File/function semantics CodeHawk C Analyzer

Analysis: Delegating Proof Obligations Impose Postconditions on Callers? Ø Global variables? Ø Heap-allocated data structures?

Analysis: Delegating Proof Obligations File/Function Contracts python API spo s ppo s api requirements invariants linker File/function semantics CodeHawk C Analyzer File/Function Contracts

Analysis: Delegating Proof Obligations File/Function Contracts python API spo s ppo s api requirements invariants linker File/function semantics CodeHawk C Analyzer File/Function Contracts

Primary Proof Obligations Discharge ppo s using context sensitivity 1,097,471 4,191,788

Primary Proof Obligations Discharge ppo s using context sensitivity 1,097,471 4,191,788

Primary + Supporting Proof Obligations 1,097,471 415,211 4,191,788 + + 378,013 = = 1,512,682 4,569,801

Primary + Supporting Proof Obligations 1,097,471 415,211 4,191,788 + + 378,013 = = 1,512,682 4,569,801

Bugs, False Positives? Our perspective: Anything that cannot be proven safe needs work: Ø Additional user input (in the form of contract conditions), and/or Ø Additional analysis capabilities, and/or Ø Modifications to the program A proof obligation is marked violated (and closed) if the reason it cannot be proven safe is known, and no additional information can make it safe Violations can indicate A bug, (primary or supporting proof obligation), or A contract condition that is too strong (supporting proof obligation), or A potential violation outside the analysis realm

Violations Mostly not very likely or not very interesting A proof obligation is violated for all behaviors (universal) An existential condition is identified that violates a proof obligation Use of return value from malloc, calloc, realloc without null check Use of return value from fopen, getenv, etc., without null check Cast -1 to unsigned integer Unchecked user input values Volatile values, random values An existential condition outside the realm of reasoning is identified that may violate a proof obligation unchecked return value from strchr, strrchr, strtol, strtoll, etc. But, 3 potentially serious memory vulnerabilities found in one of the test applications

Comparison between applications and Juliet Test Suite ~ 150 test cases

Collaborative Analysis of Open Source Software Tools and Communities python API spo s ppo s api requirement s invariants linker File/function semantics CodeHawk C Analyzer File/Function Contracts

Conclusions Our goal was: Analysis or Open Primary proof obligations (ppo s) Closed ppo s

Conclusions Our goal was: Analysis or Open Primary proof obligations (ppo s) Where we are Closed ppo s Supporting proof obligations Analysis Primary proof obligations

Conclusions Where we are Community effort Analysis other tools Full semantics of application in accessible form Exhaustive set of proof obligations + evidence Function api conditions Invariants generated Programmable api in python enables Specialized analyses (academic tools) Incremental analysis Every result is subject to verification Modular analysis (function/file level) Clear measure of success

Conclusions Where we are Analysis other tools Full semantics of application in accessible form Exhaustive set of proof obligations +evidence Function api conditions Invariants generated Programmable api in python Most analysis results can be reused across versions; Assumptions can be rechecked enables Specialized analyses (academic tools) Incremental analysis Every result is subject to verification Modular analysis (function/file level) Clear measure of success

Conclusions: What s next? Extend with other properties, specified by state machines Extend expressiveness of contract specifications Continuous improvement of the analyzer, increase automation, C++ Make C Analyzer available on the SWAMP.. and eventually (wishful thinking) For every (many) important open-source C applications: Create an open-source community-owned exhaustive set of proof obligations with (partial) analysis results, full set of assumptions (represented as api requirements and contract conditions) that evolves with new versions created and (more wishful thinking) Make sound static analysis an integral part of the opensource software development process

Conclusions: What s next? Currently available on private GitHub repository If you want to contribute contact us: sipma@kestreltechnology.com THANK YOU!

Property Specification Property: Absence of memory vulnerabilities C Standard lists 37 memory-related conditions that lead to undefined behavior Property: Absence of memory-related undefined behavior Primary Proof Obligations on language constructs on standard library functions

Proof by Structural Induction Prove, by structural induction on the program, that every state of every computation is well-defined Initially: the starting state of every computation is well-defined Inductive step: For every operation in the program: Assume the inductive hypothesis: the starting state of the operation is well-defined Prove: the resulting state after the operation is well-defined, according the C semantics Conclude: Ø every state of every computation is well-defined Ø absence of memory access violations

Proof by Structural Induction Prove, by structural induction on the program, that every state of every computation is well-defined Initially: the starting state of every computation is well-defined Inductive step: For every operation in the program: Assume the inductive hypothesis: the starting state of the operation is well-defined Prove: the resulting state after the operation is well-defined, according the C semantics Problem: Inductive hypothesis is not strong enough to prove the inductive step

Proof by Structural Induction Solution: Use abstract interpretation to generate invariants to strengthen the inductive hypothesis Inductive step: For every operation in the program: Assume: the inductive hypothesis (state is well-defined), and invariants generated for the starting state of the operation Prove: the resulting state after the operation is well-defined, according to the C semantics

Proof by Structural Induction Solution: Use abstract interpretation to generate invariants to strengthen the inductive hypothesis Domains: Intervals (Cousot & Halbwachs) Linear Equalities (Karr) Symbolic Sets Value sets (Balakrishnan, Reps)

Proof by Structural Induction Prove, by structural induction on the program, that every state of every computation is well-defined Approach is Ø Sound: if all proof obligations can be proven valid, no memory access violations are possible Ø Complete: if no memory access violations are possible then an inductive invariant exists to prove it but (since undecidable) not complete for demonstrating the existence of counter examples

CWE s covered 118 Improper access of indexed resource (range error) 119 improper restriction of operations within the bound 120 Buffer copy without checking size of input (classic buffer overflow) 121 Stack-based buffer overflow 122 Heap-based buffer overflow 123 Write-what-where condition 124 Buffer underwrite 125 Out-of-bounds read 126 Buffer over-read 127 Buffer under-read 128 Wrap-around error 129 Improper validation of array index 130 Improper handling of length parameter inconsistency 131 Incorrect calculation of buffer size 135 Incorrect calculation of multi-byte string length 170 Improper null termination

CWE s covered 190 Integer Overflow or wrap-around 191 Integer Underflow or wrap-around 193 Off-by-one error 195 Signed to unsigned conversion error 196 Unsigned to signed conversion error 242 Use of inherently dangerous function (as related to memory safety) 415 Double free 416 Use after free 456 Missing initialization of variable 466 Return of pointer value outside of expected range 467 Use of sizeof() on pointer type 469 Use of pointer subtraction to determine size 476 Null pointer dereference 588 Attempt to access child of non-structure pointer 590 Free of memory not on the heap 785 Use of path manipulation function without maximum-sized buffer

CWE s covered 786 Access of memory location before start of buffer 787 Out-of-bounds write 788 Access of memory location after start of buffer 805 Buffer access with incorrect length value 822 Untrusted pointer dereference 823 Use of out-of-range pointer offset 824 Use of uninitialized pointer 825 Expired pointer dereference 839 Numeric range comparison check without maximum check 843 Access of reource using incompatible type (type confusion) 369 Divide by zero 134 Uncontrolled format string 197 Numeric truncation

Abstracting C into CHIF Some constructs are representable precisely: int x;... x = x + 1; x := x + 1 Some constructs are not (yet) supported: int x, y;... x = y; abstract(x)

Abstracting C into CHIF CHIF is a register language: no aliasing, no pointers int x;... f(&x); abstract(x) int x; int *p;... p = &x; *p = *p + 1; remove x from analysis

Abstracting C into CHIF CHIF analysis is intra-procedural use assume-guarantee reasoning for interprocedural relationships int f(...) {... if (...) {... return 0; } }... return 1; int x;... x = f(...); x in [ 0..1 ] Many more constructs in C that require specialized abstraction

Test Applications application LOC PPO s Cairo-1.14.12 227,818 628,808 Cleanflight-CLFL-v2.3.2 118,758 143,015 Dnsmasq-2.76 29,922 110,743 Dovecot-2.0.beta6 (SATE 2010) 208,636 856,210 File 14,379 46,209 Git-2.17.0 205,636 851,087 Hping 11,336 37,079 Irssi-0.8.14 (SATE 2009) 61,972 265,345 Lighttpd-1.4.18 (SATE 2008) 49,747 202,157 Nagios-2.10 (SATE 2008) 47,652 173,868 Naim-0.11.8.3.1 (SATE 2008) 25,759 167,533 Nginx-1.14.0 103,388 343,759 Nginx-1.2.9 102,151 542,697 Openssl-1.0.1.f 275,060 762,621 Pvm3.4.6 (SATE 2009) 60,029 136,320 Wpa_supplicant-2.6 96,554 277,853 Total 1,638,797 5,545,304