Static Vulnerability Analysis


Static vulnerability detection helps find vulnerabilities in code that can be exploited by malicious input. Different static analysis tools target different kinds of bugs, such as format string vulnerabilities, buffer overflows, and other such vulnerabilities. The following block diagram shows where a static analyzer sits, what it takes as input, and what it is expected to produce.

Fig.1 Block diagram of a generalized static analysis tool: buggy C program -> static analysis -> warnings generated -> programmer removes the warnings and repeats the same procedure.

Static analysis has an advantage over runtime analysis: runtime analysis requires checks that reduce the performance of the program, whereas a fix made in the code itself in response to a static warning is part of the program and is not considered overhead.

Let us first introduce a few terms:

1. Flow insensitive: A technique that does not consider the order of execution of statements while analyzing code is called flow insensitive.
2. Path insensitive: A technique that considers all possible paths, and not just the semantically feasible paths in the code, is called path insensitive. Here a path means a control flow path in the code.
3. False Positives: A false positive is a warning about a specific vulnerability that the program does not in fact have.
4. False Negatives: A false negative, as the name suggests, is the opposite of a false positive. Here a real vulnerability stays hidden and is not identified.
5. Complete: A technique that gives no false positives is called complete.
6. Sound: A technique that gives no false negatives is called sound. There might be many false positive warnings, but we can rest assured that all the vulnerabilities will be flagged.
7. Monomorphic: A monomorphic analysis does not consider the call site of a function. It merges all calls to a function and generalizes the resulting graph. This approach might create many false positives, but it is easier to implement.
8. Polymorphic: A polymorphic analysis considers the call site of a function and computes different results for different calls of the same function. This gives a more precise but also more expensive analysis.

For more information on these terms, consider reading the following paper: "Polymorphic versus Monomorphic Flow-insensitive Points-to Analysis for C" by Jeffrey S. Foster, Manuel Fähndrich, and Alexander Aiken.

Two types of static analysis tools are explained in the lecture:

1. Tools that use data flow graphs.
2. Tools that use control flow automata (similar to control flow graphs, except that the code is on the edges).

Data Flow Graphs:

We use static analysis of format string attacks to study data flow graphs.

Ex 1:

    #include <stdio.h>
    #include <string.h>

    void log_use(char *username) {
        char buf[512];
        strcpy(buf, "Logging from: ");
        strncat(buf, username, 400);  /* append at most 400 bytes to avoid a buffer overflow */
        printf(buf);                  /* buf is used as the format string */
    }

The above code has a format string vulnerability. Let us see how to detect it using static analysis. We introduce two keywords, $tainted and $untainted, in the compiler. They are assigned to variables depending on the source of the value in the variable. All variables that get their values from an untrusted source are tainted. Any untrusted value eventually enters the program through a read(). (We are not considering recv and other variants that can receive untrusted values over the network.) So let us redefine the header containing the declarations of read() and printf():

    int read(int fd, $tainted char *buf, int len);
    int printf($untainted const char *fmt, ...);

Here we have added the required keyword to each parameter depending on what the function assumes about it. If $tainted data is passed where $untainted data is expected, the compiler gives a type mismatch warning, but the code still compiles. The opposite direction is always safe: $untainted data can be passed wherever a function expects $tainted data. Hence $untainted data is a subclass of $tainted data (just as Vector is a subclass of Object in Java):

    $untainted char <: $tainted char
    $untainted int  <: $tainted int

where <: means "is a subclass of".
Let us now formulate a data flow graph for the following piece of code. Assume that data obtained through read() is passed in as the username argument to log_use, so we can treat this argument as tainted.

Ex 2:

    #include <stdio.h>
    #include <string.h>

    void log_use(char *username) {
        char *p = username;
        char buf[512];
        strcpy(buf, "Logging from: ");
        strncat(buf, p, 400);  /* append at most 400 bytes to avoid a buffer overflow */
        printf(buf);
    }

Fig.2 Data Flow Graph for code in example 2.

The above graph is built from how data flows between variables. There are two graphs: the first, with no double edges, shows the data assignments; the second, with tainted and untainted nodes, is more useful for analysis. Here is why a single edge in graph 1 becomes a double edge in graph 2. Consider the following line of code:

    char *p = username;

After this assignment p and username hold the same address, so *p and *username denote the same value. No matter which pointer is used to change that value, the change is visible through both. Therefore, every pointer assignment produces a double edge (one in each direction) between the two pointers that share the address.

In the above graph there is a path from tainted to untainted. The static analysis gives a warning for this path and thus recognizes a format string vulnerability.

Consider the same example with the following modifications:

Ex 3:

    #include <stdio.h>
    #include <string.h>

    void log_use(char *username) {
        char *p;
        char buf[512];
        strcpy(buf, "Logging from: ");
        strncat(buf, username, 400);  /* append at most 400 bytes to avoid a buffer overflow */
        printf(p);
        p = username;
    }

The purpose of the program has changed, but that is not what interests us here. Note that this new piece of code still gives a data flow graph with a path from tainted to untainted, because the edges for p = username and printf(p) are added regardless of the order of the two statements. The vulnerability, however, is no longer there: the tainted value is assigned to p only after printf(p) has run. This is an example of flow-insensitive analysis: the order of statements is not considered while building or traversing the graph. Hence this analysis can give false positives.

Let us now return to Ex 1 and form a data flow graph for it. Here src and dst come from the declaration of strncat:

    char *strncat(char *dst, const char *src, size_t n);

Fig.3 Data Flow Graph for code in example 1.

This graph, like the previous one, is a data flow graph. It also has a path from tainted to untainted data. Hence, it accurately detects a vulnerability.

Now consider an extension to this graph. Suppose there is another call to the function strncat() from somewhere else in the same program. Let us consider how the graph changes.

    strncat(b, a, n);

Fig.4 Data Flow Graph for code in example 1 and considering another call to strncat.

Here we assume that strncat(b, a, n) is not a vulnerability. That is, both the source and the destination strings are untainted. However, during graph traversal there can be a path as follows:

    *a -> *src -> *dst -> *buf -> *fmt -> untainted

This path is not semantically right: it enters strncat through the call strncat(b, a, n) but leaves it through the other call. Hence calls to strncat will give false positive warnings. This graph is an example of monomorphic analysis. That is, the call site of strncat was not considered during traversal of the graph.

To make this analysis polymorphic we need to form a context-free language that is associated with the site of each function call. Let (1 and )1 be the identifiers for data flowing into and out of the 1st instance of strncat, (2 and )2 for the 2nd instance, and so on. Using this approach, the graph would look as follows:

Fig.5 Data Flow Graph for code in example 1 using polymorphic analysis.

Here, while traversing the graph, we take note that if the function was entered through a certain identifier, the path taken to leave the function must carry the matching identifier. For example, if the function is entered from an edge with identifier (1, then only the path with identifier )1 should be considered. This approach gives a sound analysis of the code, which covers all possible vulnerabilities and has no false negatives. CQual and Oink are examples of tools for static analysis of C.

Control Flow Automata:

Control flow automata are important for analyzing control flow bugs, that is, for finding vulnerabilities that cause the control of the code to flow in a direction it was not designed to take. A control flow automaton differs from a control flow graph in that it carries the code statements on the edges instead of the nodes.

Consider the following piece of code:

Ex 4:

    if (...)
        setuid(0);
    if (...)
        system(...);
    if (...)
        setuid(!0);

Here there will be a control flow bug if the first if condition is satisfied and control then flows to system(...) after doing a setuid(0), that is, while the process still has root privilege. Here setuid(!0) denotes a setuid call with a nonzero user id. Let us see how to detect such vulnerabilities statically using control flow automata.

Fig.5 Control Flow Automata for code in example 4.

Let this automaton be called C, and let its language L(C) be a superset of all the control flows the original program would have. That is, in this graph we consider paths that might not be semantically possible in the code, because we do not consider the conditions in the if statements. Such an analysis is called path-insensitive.

Now, consider a DFA M over the following input alphabet:

    Σ = {setuid(0), system(...), setuid(!0)}

Here L(M) is the set of input sequences for which a vulnerability is found.

Fig.6 Deterministic Finite Automaton for code in example 4.

In this DFA, all other edges are self edges, which do not change the state and are therefore not shown, for simplicity. We now have to combine the two state machines into a single state machine. If the combined machine has a path to the ERR state, then the analysis has found a vulnerability. In terms of the two languages, if L(C) ∩ L(M) is empty, then we have no vulnerabilities. However, this analysis is not free from false positives, since a superset of the actual paths in the code is considered for analysis. It might therefore show vulnerabilities that cannot exist due to the semantics of the code.