CMPSC 497: Static Analysis

Similar documents
Static Analysis. Systems and Internet Infrastructure Security

Module: Operating System Security. Professor Trent Jaeger. CSE543 - Introduction to Computer and Network Security

CMPSC 497: Static Analysis

Module: Future of Secure Programming

Module: Future of Secure Programming

Proceedings of the 11 th USENIX Security Symposium

CMPSC 497 Other Memory Vulnerabilities

Advanced Systems Security: Symbolic Execution

Advanced Systems Security: Control-Flow Integrity

5) Attacker causes damage Different to gaining control. For example, the attacker might quit after gaining control.

Static Analysis Basics II

One-Slide Summary. Lecture Outline. Language Security

Seminar in Software Engineering Presented by Dima Pavlov, November 2010

Security Analyses For The Lazy Superhero

Pointers and References

finding vulnerabilities

Automatic Software Verification

Static Vulnerability Analysis

Software Security: Vulnerability Analysis

Language Security. Lecture 40

Arrays and Pointers in C. Alan L. Cox

CMPSC 497 Execution Integrity

A Gentle Introduction to Program Analysis

Advanced Systems Security: Ordinary Operating Systems

Memory Corruption Vulnerabilities, Part II

Compiler Structure. Data Flow Analysis. Control-Flow Graph. Available Expressions. Data Flow Facts

CSE 501 Midterm Exam: Sketch of Some Plausible Solutions Winter 1997

Tracking Pointers with Path and Context Sensitivity for Bug Detection in C Programs. {livshits,

Advanced Systems Security: Multics

Topics in Systems and Program Security

Interprocedural Analysis. Motivation. Interprocedural Analysis. Function Calls and Pointers

Program Static Analysis. Overview

A brief introduction to C programming for Java programmers

CMPSC 497 Buffer Overflow Vulnerabilities

Secure Software Development: Theory and Practice

CS 0449 Sample Midterm

Control Flow Hijacking Attacks. Prof. Dr. Michael Backes

CYSE 411/AIT681 Secure Software Engineering Topic #10. Secure Coding: Integer Security

2/9/18. Readings. CYSE 411/AIT681 Secure Software Engineering. Introductory Example. Secure Coding. Vulnerability. Introductory Example.

2/9/18. CYSE 411/AIT681 Secure Software Engineering. Readings. Secure Coding. This lecture: String management Pointer Subterfuge

STING: Finding Name Resolution Vulnerabilities in Programs

UNDEFINED BEHAVIOR IS AWESOME

Computer Organization & Systems Exam I Example Questions

Static Analysis methods and tools An industrial study. Pär Emanuelsson Ericsson AB and LiU Prof Ulf Nilsson LiU

Advanced C Programming

Harvard School of Engineering and Applied Sciences CS 152: Programming Languages

CSCE 548 Building Secure Software Integers & Integer-related Attacks & Format String Attacks. Professor Lisa Luo Spring 2018

Using static analysis to detect use-after-free on binary code

Finding User/Kernel Pointer Bugs with Type Inference p.1

Scaling CQUAL to millions of lines of code and millions of users p.1

CYSE 411/AIT681 Secure Software Engineering Topic #12. Secure Coding: Formatted Output

2/9/18. CYSE 411/AIT681 Secure Software Engineering. Readings. Secure Coding. This lecture: String management Pointer Subterfuge

Simple Overflow. #include <stdio.h> int main(void){ unsigned int num = 0xffffffff;

Pointers. Pointers. Pointers (cont) CS 217

Statically Detecting Likely Buffer Overflow Vulnerabilities

CS 61C: Great Ideas in Computer Architecture. C Arrays, Strings, More Pointers

Homework 3 CS161 Computer Security, Fall 2008 Assigned 10/07/08 Due 10/13/08

CPSC 213. Introduction to Computer Systems. Procedures and the Stack. Unit 1e Feb 11, 13, 15, and 25. Winter Session 2018, Term 2

Functions in C C Programming and Software Tools. N.C. State Department of Computer Science

Advanced Systems Security: Principles

Calvin Lin The University of Texas at Austin

CS711 Advanced Programming Languages Pointer Analysis Overview and Flow-Sensitive Analysis

Agenda. Peer Instruction Question 1. Peer Instruction Answer 1. Peer Instruction Question 2 6/22/2011

Sendmail crackaddr - Static Analysis strikes back

Advanced Systems Security: Principles

Module: Return-oriented Programming. Professor Trent Jaeger. CSE543 - Introduction to Computer and Network Security

Static Program Analysis Part 9 pointer analysis. Anders Møller & Michael I. Schwartzbach Computer Science, Aarhus University

Advanced Programming Methods. Introduction in program analysis

EURECOM 6/2/2012 SYSTEM SECURITY Σ

Arrays and Memory Management

Advanced Systems Security: Ordinary Operating Systems

QUIZ. What is wrong with this code that uses default arguments?

Flow Analysis. Data-flow analysis, Control-flow analysis, Abstract interpretation, AAM

Practical Malware Analysis

CSC 438 Systems and Software Security, Spring 2014 Instructor: Dr. Natarajan Meghanathan Question Bank for Module 6: Software Security Attacks

CS 61C: Great Ideas in Computer Architecture. Lecture 3: Pointers. Bernhard Boser & Randy Katz

Static Analysis and Bugfinding

DART: Directed Automated Random Testing

Bounded Model Checking Of C Programs: CBMC Tool Overview

Programming Languages and Compilers Qualifying Examination. Answer 4 of 6 questions.

4/24/18. Overview. Program Static Analysis. Has anyone done static analysis? What is static analysis? Why static analysis?

just a ((somewhat) safer) dialect.

Agenda. Components of a Computer. Computer Memory Type Name Addr Value. Pointer Type. Pointers. CS 61C: Great Ideas in Computer Architecture

Profile-Guided Program Simplification for Effective Testing and Analysis

CS 61C: Great Ideas in Computer Architecture. Lecture 3: Pointers. Krste Asanović & Randy Katz

Testing, code coverage and static analysis. COSC345 Software Engineering

Review Topics. Midterm Exam Review Slides

Algorithm Design and Analysis. What is an Algorithm? Do Algorithms Matter?

Page 1. Today. Last Time. Is the assembly code right? Is the assembly code right? Which compiler is right? Compiler requirements CPP Volatile

Extensible Lightweight Static Checking

Conditioned Slicing. David Jong-hoon An. May 23, Abstract

Binghamton University. CS-211 Fall Variable Scope

Automatic Source Audit with GCC. CoolQ Redpig Fedora

Compiler Optimization and Code Generation

CS 161 Computer Security

Static Analysis and Dataflow Analysis

Processor. Lecture #2 Number Rep & Intro to C classic components of all computers Control Datapath Memory Input Output

Binghamton University. CS-211 Fall Variable Scope

HW1 due Monday by 9:30am Assignment online, submission details to come

Important From Last Time

Transcription:

CMPSC 497: Static Analysis Trent Jaeger Systems and Internet Infrastructure Security (SIIS) Lab Computer Science and Engineering Department Pennsylvania State University Page 1

Our Goal In this course, we want to develop techniques to detect vulnerabilities before they are exploited automatically What s a vulnerability? How to find them? Page 2

Dynamic Analysis Limits Major advantage When we produce a crash, it is a real crash Issue However, may not be exploitable (i.e., not a vulnerability) On the other hand, want to fix memory errors But often not assertion failures Major limitation We cannot find all vulnerabilities in a program Why not? Page 3

Goal Can we build a technique that identifies *all* vulnerabilities? Page 4

Goal Can we build a technique that identifies *all* vulnerabilities? Turns out that we can: static analysis But, it has its own major limitation Can identify many false positives (not actual vulnerabilitiies) Can be effective when used carefully Page 5

Static Analysis Explore all possible executions of a program All possible inputs All possible states Page 6

A Form of Testing Static analysis is an alternative to dynamic testing Dynamic Select concrete inputs Obtain a sequence of states given those inputs Apply many concrete inputs (i.e., run many tests) Static Select abstract inputs with common properties Obtain sets of states created by executing abstract inputs One run Page 7

Static Analysis Provides an approximation of behavior Run in the aggregate Rather than executing on ordinary states Finite-sized descriptors representing a collection of states Run in non-standard way Run in fragments Stitch them together to cover all paths Runtime testing is inherently incomplete, but static analysis can cover all paths Page 8

Static Analysis Various properties of programs can be tracked Control flow Constants Data flow Types Values Which ones will expose which vulnerabilities accurately (not too many false positives) requires some finesse Page 10

No Return Check Can we detect code with no return check? From original Miller fuzzing paper format.c (line 276): while (lastc!= \n ) { rdc(); } input.c (line 27): rdc() { do { readchar(); } while (lastc == lastc == \t ); return (lastc); } Page 11

Control Flow Analysis Compute the control flow of a program I.e., possible execution paths To find an execution path that does not check the return value of a function That is actually run by the program How do we do this? Page 12

Intraprocedural CFG Statements Nodes One successor and one predecessor Basic Blocks Multiple successors (multiple predecessors) Unique Enter and Exit All start nodes are successors of enter All return nodes are predecessors of exit Page 13

Control Flow Analysis Compute Control Flow Function by function intraprocedural Program statements of interest Sequences basic blocks Conditionals transitions between basic blocks in function Loops transitions that connect to prior basic blocks Calls transition to another function Return transition that completes the function Page 14

Control Flow Analysis Compute Intraprocedural Control Flow Page 15

No Return Check Can we detect code with no return check? Detect function calls in CFG reachable that have no return assignment format.c (line 276): while (lastc!= \n ) { rdc(); } input.c (line 27): rdc() { do { readchar(); } while (lastc == lastc == \t ); return (lastc); } Page 16

Double Free Can we detect double frees? CFG shows a flow from free(buf2r1) to free(buf2r1) main(int argc, char **argv) { buf1r1 = (char *) malloc(bufsize2); buf2r1 = (char *) malloc(bufsize2); free(buf1r1); free(buf2r1); buf1r2 = (char *) malloc(bufsize1); strncpy(buf1r2, argv[1], BUFSIZE1-1); free(buf2r1); free(buf1r2); } Page 17

Double Free Can we detect double frees? CFG shows a flow from free(buf2r1) to free(buf2r1) More complex if Free occurs in a different function Interprocedural CFG Free is performed on a different variable Track assignments and aliases Page 18

Partial Evaluation Can we detect double frees? One advantage of static analysis is we can start our analysis from anywhere just analyze this function foo(int x, char **y) // need not be main { buf1r1 = (char *) malloc(bufsize2); buf2r1 = (char *) malloc(bufsize2); free(buf1r1); free(buf2r1); buf1r2 = (char *) malloc(bufsize1); strncpy(buf1r2, argv[1], BUFSIZE1-1); free(buf2r1); free(buf1r2); } Page 19

Interprocedural CFG The basics are easy Call connect to CFG of callee function Return create an edge back to the caller function Challenges May not know targets of CFG edges Function pointers, indirect jumps, and returns without call The same function may be called from multiple places Printk is called 18,000 times in FreeBSD copy it? Exponential growth caused by branching Page 20

Constant Propagation Substitute the values of known constants in expressions Propagate the values among variables assigned those constants Example assignments resulting from propagation to detect problems Page 21

Detect Buffer Overflow What are the constant values below? 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = sizeof(text); 5 for(i=0;i<n;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[n]); Page 22

Detect Buffer Overflow Where can they be propagated? 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = sizeof(text); 5 for(i=0;i<n;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[n]); Page 23

Detect Buffer Overflow Where are the memory errors? 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = 20; 5 for(i=0;i<20;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[20]); Page 24

Detect Buffer Overflow Where are the memory errors? 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = 20; 5 for(i=0;i<20;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[20]); Page 25

Constant Propagation Typically, constant propagation is a start, but need more to detect an error For the buffer overflow we need to know that access to buffer2[4-19] and text[20] are memory errors Page 26

Type-based Analysis Maybe we want to check for certain properties about variables in our program Can use type information associated with variables to perform such checks Page 27

Type-based Analysis Maybe we want to check for certain properties about variables in our program Suppose we want to know if a variable s value has been checked such as for input validation We can use type-based analysis to do that Page 28

Type-based Analysis Maybe we want to check for certain properties about variables in our program Suppose we want to know if a variable s value has been checked such as for input validation We can use type-based analysis to do that Controlled Object Security Check Controlled Object Controlled Object Controlled Operation Figure 2: The complete mediation problem. Page 29

Type-based Analysis Maybe we want to check for certain properties about variables in our program Suppose we want to know if a variable s value has been checked such as for input validation Using type qualifiers, can extend basic types void func_a(struct file * $checked filp); U U U U void func_b( void ) { struct file * $unchecked filp; func_a(filp); } C: $checked U: $unchecked C Security Check C <- U C C Page 30

Type-based Analysis Maybe we want to check for certain properties about variables in our program Suppose we want to know if a variable s value has been checked such as for input validation To find missing mediation (e.g., input validation) Initialize untrusted inputs to unchecked Initialize security-sensitive operation to use checked Identify mediation (create checked version) Detect type error from unchecked to checked Page 31

Type-based Analysis Vulnerability in the code to the right Can you see it? /* from fs/fcntl.c */ long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg) { struct file * filp; filp = fget(fd); err = security ops->file ops ->fcntl(filp, cmd, arg); err = do fcntl(fd, cmd, arg, filp); } static long do_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg, struct file * filp) { switch(cmd){ case F_SETLK: } } err = fcntl setlk(fd, ); /* from fs/locks.c */ fcntl_getlk(fd, ) { struct file * filp; filp = fget(fd); /* operate on filp */ } Page 32

Type-based Analysis Vulnerability in the code to the right fd is unchecked as is filp initially in sys_fnctl However, filp would be reassigned to a checked variable after security_op So what s the problem? /* from fs/fcntl.c */ long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg) { struct file * filp; filp = fget(fd); err = security ops->file ops ->fcntl(filp, cmd, arg); err = do fcntl(fd, cmd, arg, filp); } static long do_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg, struct file * filp) { switch(cmd){ case F_SETLK: } } err = fcntl setlk(fd, ); /* from fs/locks.c */ fcntl_getlk(fd, ) { struct file * filp; filp = fget(fd); /* operate on filp */ } Page 33

Type-based Analysis Vulnerability in the code to the right fd is unchecked as is filp initially in sys_fnctl However, filp would be reassigned to a checked variable after security_op fd, not the checked filp is passed to do_fcntl and then to fcntl_getlk /* from fs/fcntl.c */ long sys_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg) { struct file * filp; filp = fget(fd); err = security ops->file ops ->fcntl(filp, cmd, arg); err = do fcntl(fd, cmd, arg, filp); } static long do_fcntl(unsigned int fd, unsigned int cmd, unsigned long arg, struct file * filp) { switch(cmd){ case F_SETLK: } } err = fcntl setlk(fd, ); /* from fs/locks.c */ fcntl_getlk(fd, ) { struct file * filp; filp = fget(fd); /* operate on filp */ } Page 34

Data Flow Track dependency among variable values Given a statement y = a * b a data flow dependence is created from a and b to y Data flow can be used for a lot of purposes For security, an important property to compute using data flow is taint Is a public statement/variable tainted by secret data? Is a high integrity statement/variable tainted by low integrity inputs? Page 35

Data Flow Taints from text input 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = 20; 5 for(i=0;i<20;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[20]); Page 36

Data Flow Tells us that buffer2 may be affected by untrusted input in text as well as printf output 1 char text[] = "Foo Bar ; 2 char buffer1[4], buffer2[4]; 3 4 int i, n = 20; 5 for(i=0;i<20;++i) 6 buffer2[i] = text[i]; 7 printf("last char of text is: %c", text[20]); Page 37

Data Flow Statement Create node for output variable Connect inputs to output Sequence (Basic Block) Process data flows for each statement Inter-block Connect data flows from any predecessor Call and return Connect data flows from input parameters and to outputs Page 38

Data Flow This simple view of data flow mainly tells use about reachability More detailed analyses later Page 39

Take Away Static analysis evaluates all the ways that a program may execute in one pass Can be sound (no false negatives) But, often produces false positives Examined some building blocks of static analysis and how they could be used Constant propagation, control flow, type analysis, data flow We ll examine static analysis for more complex programs next time try to track values Page 40

Abstract Interpretation Descriptors represent the sign of a value Positive, negative, zero, unknown For an expression, c = a * b If a has a descriptor pos And b has a descriptor neg What is the descriptor for c after that instruction? How might this help? Page 41

Abstract Interpretation Descriptors represent the sign of a value Positive, negative, zero, unknown For an expression, c = a * b If a has a descriptor pos And b has a descriptor neg What is the descriptor for c after that instruction? How might this help? Page 42

Descriptors Choose a set of descriptors that Abstracts away details to make analysis tractable Preserves enough information that key properties hold Can determine interesting results Using sign as a descriptor Abstracts away specific integer values (billions to four) Guarantees when a*b = 0 it will be zero in all executions Choosing descriptors is one key step in static analysis Page 43

Precision Abstraction loses some precision Enables run in aggregate, but may result in executions that are not possible in the program (a <= b) when both are pos If b is equal to a at that point, then false branch is never possible in concrete executions Results in false positives Page 44

Soundness The use of descriptors over-approximates a program s possible executions Abstraction must include all possible legal values May include some values that are not actually possible The run-in-aggregate must preserve such abstractions Thus, must propagate values that are not really possible Page 45

Implications of Soundness Enables proof that a class of vulnerabilities are completely absent No false negatives in a sound analysis Comes at a price Ensuring soundness can be complex, expensive, cautious Thus, unsound analyses have gained in popularity Find bugs quickly and simply Such analyses have both false positives and false negatives Page 46

What Is Static Analysis? Abstract Interpretation Execute the system on a simpler data domain Descriptors of the abstract domain Rather than the concrete domain Elements in an abstract domain represent sets of concrete states Execution mimics all concrete states at once Abstract domain provides an over-approximation of the concrete domain Page 47

Abstract Domain Example Use interval as abstract domain b = [40, 41] a = 2*b a = [x, y]? What are the possible concrete values represented? Which concrete states are possible? Page 48

Joins A join combines states from multiple paths Approximates set-union as either path is possible Use Interval as abstract domain a = [36, 39], b = [40, 41] If (a >= 38) a=2*b; /* join */ a = [x, y], b=[40, 41] what are x and y? What s the impact of over-approximation? Page 49

Impact of Abstract Domain The choice of abstract domain must preserve the over-approximation to be sound (no false negatives) Integer arithmetic vs 2 s-complement arithmetic a = [253, 254], b = [10, 12] What is c = a+b in an 32-bit machine? What is c = a+b in an 8-bit machine? Page 50

Use AI to find bug Find bug from Parfait? Constant propagation Partial execution Example for buffer overflow What do you need in general Data Flow To do Challenges Illegal paths 1 st paper CFG contexts Page 51

Intraprocedural CFG Statements Nodes One successor and one predecessor Basic Blocks Multiple successors to the join (multiple predecessors) Examples? Unique Enter and Exit All start nodes are successors of enter All return nodes are predecessors of exit Page 52