Outline. 1 Background. 2 Symbolic Execution. 3 Whitebox Fuzzing. 4 Summary.


CS 6V81-05: System Security and Malicious Code Analysis
Zhiqiang Lin
Department of Computer Science, University of Texas at Dallas
April 9th

Background
Software security bugs can be very expensive
1 Cost of each Microsoft Security Bulletin: $Millions
2 Cost due to worms (Slammer, CodeRed, Blaster, etc.): $Billions
3 Many security exploits are initiated via files or packets. Ex: MS Windows includes parsers for hundreds of file formats
4 A 0-day vulnerability means money/weapon
Security testing: hunting for million-dollar bugs

Hunting for Security Bugs
Black hat techniques:
1 Code inspection (of binaries)
2 Blackbox fuzz testing

Blackbox fuzz testing
1 A form of blackbox random testing [Miller+90]
2 Randomly fuzz (=modify) a well-formed input
3 Grammar-based fuzzing: rules that encode well-formedness + heuristics about how to fuzz (e.g., using probabilistic weights)
Blackbox fuzzing has been heavily used in security testing
Simple yet effective: many bugs found this way

Blackbox Fuzzing
Examples: Peach, Protos, Spike, Autodafe, etc.
Why so many blackbox fuzzers? Because anyone can write (a simple) one in a weekend!
Conceptually simple, yet effective
Sophistication is in the add-ons:
  Test harnesses (e.g., for packet fuzzing)
  Grammars (for specific input formats)
No principled test generation:
  No attempt to cover each state/rule in the grammar
  When probabilities are used, no global optimization (simply random walks)

Idea: mix fuzz testing with dynamic test generation
1 Symbolic execution
2 Collect constraints on inputs
3 Negate those, solve with a constraint solver, generate new inputs
4 Do systematic dynamic test generation (=DART)
= DART meets Fuzz
Foundation: DART (Directed Automated Random Testing)
Key extensions, implemented in SAGE [NDSS 08]
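To make "randomly fuzz (=modify) a well-formed input" concrete, here is a minimal sketch of a mutation step in C. It is not taken from any of the fuzzers named above; the function names `fuzz_mutate` and `lcg_next` and the choice of a small linear congruential generator are assumptions made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Tiny linear congruential generator so that runs are reproducible from a
   seed. Hypothetical helper; any PRNG would do. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Blackbox mutation: overwrite up to `count` randomly chosen bytes of a
   well-formed input with random values. Nothing about the target program
   is used to pick positions or values. */
void fuzz_mutate(unsigned char *buf, size_t len, unsigned count, uint32_t seed)
{
    uint32_t state = seed;
    if (len == 0)
        return;
    for (unsigned i = 0; i < count; i++) {
        size_t pos = (lcg_next(&state) >> 8) % len;
        buf[pos] = (unsigned char)(lcg_next(&state) >> 16);
    }
}
```

A harness would call `fuzz_mutate` on a copy of a valid file, feed the result to the target, and watch for crashes. The weakness noted above is visible here: if the target compares four input bytes with a fixed 32-bit constant, a uniformly random choice of those bytes passes the check with probability 2^-32.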

What is symbolic execution
Symbolic execution and program testing, King [Comm. ACM 1976], cited by 960
Analysis of programs with unspecified inputs
Execute a program on symbolic inputs
Symbolic states represent sets of concrete states
Insight: code can generate its own test cases

A Complete Code Example/Demo with BitBlaze

    #include <stdio.h>

    FILE *fp;

    int main ()
    {
      char buffer[10];
      char a, b;
      scanf ("%s", buffer);
      fp = fopen ("/boot/input", "r");
      fscanf (fp, "%c%c", &a, &b);
      fclose (fp);
      if (a == 'x')
        {
          printf ("WE ARE IN X\n");
          if (b == 'y')
            printf ("WE ARE IN Y\n");
        }
      return 0;
    }

Assembly

    f4 <main>:
    80481f4: 55 push %ebp
    80481f5: 89 e5 mov %esp,%ebp
    80481f7: 83 ec 38 sub $0x38,%esp
    80481fa: 83 e4 f0 and $0xfffffff0,%esp
    : 8d 45 e7 lea -0x19(%ebp),%eax
    : mov %eax,0x8(%esp)
    a: c f9 5f 0a movl $0x80a5ff9,0x4(%esp)
    : a c 08 mov 0x80c5018,%eax
    : mov %eax,(%esp)
    a: e8 71 0c call 8048ed0 < fscanf>
    f: a c 08 mov 0x80c5018,%eax
    : mov %eax,(%esp)
    : e8 64 0d call 8048fd0 <_IO_fclose>
    c: 80 7d e7 78 cmpb $0x78,-0x19(%ebp)
    : 75 1e jne <main+0x9c>
    : c fe 5f 0a 08 movl $0x80a5ffe,(%esp)
    : e8 02 0b call 8048d80 <_IO_printf>
    e: 80 7d e6 79 cmpb $0x79,-0x1a(%ebp)
    : 75 0c jne <main+0x9c>
    : c b 60 0a 08 movl $0x80a600b,(%esp)
    b: e8 f0 0a call 8048d80 <_IO_printf>
    : b mov $0x0,%eax
    : c9 leave
    : c3 ret

Goal
The system needs to automatically generate the input for /boot/input, with the content below.

    /boot/input: xy000

SAT
In computer science, satisfiability (often written in all capitals or abbreviated SAT) is the problem of determining if the variables of a given Boolean formula can be assigned in such a way as to make the formula evaluate to TRUE. In complexity theory, the satisfiability problem (SAT) is a decision problem, whose instance is a Boolean expression written using only AND, OR, NOT, variables, and parentheses. The question is: given the expression, is there some assignment of TRUE and FALSE values to the variables that will make the entire expression true?
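The question posed by SAT can be answered naively by trying every assignment. The sketch below does this in C for a formula given as a conjunction of OR-clauses. The integer encoding of literals (+k for variable k, -k for its negation, 0 to end a clause) and the names `cnf_holds` and `sat_brute_force` are assumptions of this example; real solvers such as the ones discussed later are far more sophisticated.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Literal encoding (an assumption of this sketch):
   +k means variable k, -k means NOT variable k, 0 ends a clause. */

/* Evaluate the formula under `assign`, where bit (k-1) of `assign`
   holds the truth value of variable k. */
static bool cnf_holds(const int *lits, size_t n_clauses, unsigned long assign)
{
    size_t i = 0;
    for (size_t c = 0; c < n_clauses; c++) {
        bool clause_true = false;
        for (; lits[i] != 0; i++) {
            int var = abs(lits[i]);
            bool val = (assign >> (var - 1)) & 1ul;
            if ((lits[i] > 0) == val)
                clause_true = true;   /* this literal is satisfied */
        }
        i++;                          /* skip the terminating 0 */
        if (!clause_true)
            return false;             /* one false clause falsifies the formula */
    }
    return true;
}

/* Try all 2^n_vars assignments (n_vars must be small). Returns true and
   stores the first satisfying assignment in *model if one exists. */
bool sat_brute_force(const int *lits, size_t n_clauses, unsigned n_vars,
                     unsigned long *model)
{
    for (unsigned long a = 0; a < (1ul << n_vars); a++) {
        if (cnf_holds(lits, n_clauses, a)) {
            *model = a;
            return true;
        }
    }
    return false;
}
```

For example, the array `{1, -2, 0, 2, 0, -1, 3, 0}` encodes (x1 OR NOT x2) AND (x2) AND (NOT x1 OR x3). The loop is exponential in the number of variables, which is why practical tools hand the formula to a dedicated solver instead.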

Decision Problem
Definition: In computability theory and computational complexity theory, a decision problem is a question in some formal system with a yes-or-no answer, depending on the values of some input parameters.

Basic Concepts
Literal: a literal p is a variable x or its negation ¬x.
Clause: a clause C is a disjunction of literals, e.g. x1 ∨ x2 ∨ x3.
CNF: a CNF formula is a conjunction of clauses, e.g. (x2 ∨ x41 ∨ x15) ∧ (x6 ∨ x2) ∧ (x31 ∨ x41 ∨ x6 ∨ x156).

SAT Problem
The SAT problem is:
1 Find a Boolean assignment
2 such that each clause has a true literal
SAT is an NP-complete problem, and was the first problem shown to be NP-complete (1971).

Yices Example/Demo

    #include <stdio.h>
    #include "yices_c.h"

    int main() {
      yices_context ctx = yices_mk_context();
      yices_type ty = yices_mk_type(ctx, "int");
      yices_var_decl xdecl = yices_mk_var_decl(ctx, "x", ty);
      yices_var_decl ydecl = yices_mk_var_decl(ctx, "y", ty);
      yices_expr x = yices_mk_var_from_decl(ctx, xdecl);
      yices_expr y = yices_mk_var_from_decl(ctx, ydecl);
      yices_expr n1 = yices_mk_num(ctx, 2);
      yices_expr n2 = yices_mk_num(ctx, 1);
      yices_expr args[2];
      args[0] = x; args[1] = n1;
      yices_expr e1 = yices_mk_sum(ctx, args, 2); /* x + 2 */
      args[0] = y; args[1] = n2;
      yices_expr e2 = yices_mk_sub(ctx, args, 2); /* y - 1 */
      yices_expr c1 = yices_mk_le(ctx, e1, e2);   /* x + 2 <= y - 1 */
      yices_assert(ctx, c1);
      switch (yices_check(ctx)) {
      case l_true: {
        printf("satisfiable\n");
        yices_model m = yices_get_model(ctx);
        yices_display_model(m);
        break;
      }
      case l_false:
        printf("unsatisfiable\n");
        break;
      default:
        /* the solver could not decide */
        printf("unknown\n");
        break;
      }
      return 0;
    }

The same constraint in the Yices input language:

    (define x::int)
    (define y::int)
    (assert (<= (+ x 2) (- y 1)))
    (check)

Result

    satisfiable
    (= x -3)
    (= y 0)


STP Example

    x0 : BITVECTOR(8);
    x1 : BITVECTOR(8);
    x2 : BITVECTOR(8);
    x3 : BITVECTOR(8);
    QUERY(NOT(NOT((~((IF (((x3@(x2@(x1@x0))) = 0h64616221)) THEN (0b1) ELSE (0b0) ENDIF)) = 0b1))));

Result

    Invalid.
    ASSERT( x3 = 0hex64 );
    ASSERT( x0 = 0hex21 );
    ASSERT( x2 = 0hex61 );
    ASSERT( x1 = 0hex62 );

STP answers Invalid when the queried formula does not hold for all values. The ASSERT lines are the counterexample it found, which is exactly the input we are looking for.

STP Example

    char x, y;
    if (x * y == 16)

Path Constraint

    x : BITVECTOR(8);
    y : BITVECTOR(8);
    QUERY(NOT(BVMULT(8, x, y) = 0h10));

Result

    Invalid.
    ASSERT( y = 0hex05 );
    ASSERT( x = 0hexD0 );

Commonly used SMT Solvers
Z3: A high-performance theorem prover being developed at Microsoft Research. Z3 supports linear real and integer arithmetic, fixed-size bit-vectors, extensional arrays, uninterpreted functions, and quantifiers.
Yices: An efficient SMT solver that decides the satisfiability of arbitrary formulas containing uninterpreted function symbols with equality, linear real and integer arithmetic, scalar types, recursive datatypes, tuples, records, extensional arrays, fixed-size bit-vectors, quantifiers, and lambda expressions.
MiniSmt: A simple SMT solver for non-linear arithmetic based on MiniSat and Yices.
CVC3: An automatic theorem prover for Satisfiability Modulo Theories (SMT) problems. It can be used to prove the validity (or, dually, the satisfiability) of first-order formulas in a large number of built-in logical theories and their combination.
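The counterexample x = 0xD0, y = 0x05 for the multiplication query may look wrong at first, since 208 * 5 = 1040. It is correct for 8-bit vectors because BVMULT(8, x, y) keeps only the low 8 bits, and 1040 mod 256 = 16. The C sketch below checks this and enumerates solutions exhaustively as a stand-in for the solver; the function names are hypothetical. Note that the truncation is a property of the bit-vector model: in C itself, `char` operands are promoted to `int` before multiplying.

```c
#include <stdbool.h>
#include <stdint.h>

/* The path constraint BVMULT(8, x, y) = 0h10: the product is truncated
   to 8 bits before the comparison. */
bool bvmult8_is_16(uint8_t x, uint8_t y)
{
    return (uint8_t)(x * y) == 0x10;
}

/* Exhaustive stand-in for the solver: scan all 65536 pairs, return the
   number of solutions, and store the first one found in *x_out, *y_out. */
unsigned solve_mul16(uint8_t *x_out, uint8_t *y_out)
{
    unsigned count = 0;
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            if (bvmult8_is_16((uint8_t)x, (uint8_t)y)) {
                if (count == 0) {
                    *x_out = (uint8_t)x;
                    *y_out = (uint8_t)y;
                }
                count++;
            }
        }
    }
    return count;
}
```

Enumeration works here only because the search space is two bytes. A solver such as STP reasons about the bit-vector constraint directly and returns one satisfying pair, which need not be the first one in enumeration order.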

STP: A constraint solver (also referred to as a decision procedure or automated prover) aimed at solving constraints generated by program analysis tools, theorem provers, automated bug finders, biology, cryptography, intelligent fuzzers and model checkers. STP has been used in many research projects at Stanford, Berkeley, MIT, CMU and other universities.

Symbolic Execution
For each path, build a path condition: a condition on inputs for the execution to follow that path
Check path condition satisfiability (SAT problem), explore only feasible paths
When the execution path diverges, fork, adding constraints on symbolic values
When we terminate (or crash), use a constraint solver to generate a concrete input
Symbolic state:
  Symbolic values/expressions for variables
  Path condition
  Program counter

Symbolic execution: example
[Figure: symbolic execution example with input = "\x06\x00\x00\x00\x0f\x00\x00\x00". Courtesy of Gabriel Campana, from "Fuzzgrind: an automatic fuzzing tool".]

Fuzzing
Basic Idea: Search for software implementation errors by injecting invalid data
Test generation:
  Random mutation
  Model-based
How it works: Make fuzzing completely automatic. Give a target program and an input, generate new inputs automatically, and wait for crashes.

Tools for fuzzing
Open Source: Sulley, SPIKE, Peach
Academia: SAGE [NDSS 2008], IntScope [NDSS 2009], SmartFuzz [USENIX Security 2009], BuzzFuzz [ICSE 2009], Checksum-aware Fuzz [Oakland 2010]

Whitebox Fuzzing
Insight: Use algebraic expressions to represent the variable values throughout the execution of the program.
Basic Idea:
  Symbolically execute the target program on a given input
  Analyze the execution path and extract the path conditions that depend on the input
  Negate each path condition
  Solve the constraints and generate new test inputs
This algorithm is repeated until all execution paths are (ideally) covered.
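The loop described above (execute, record path conditions, negate, solve, rerun) can be sketched on a toy target in C. Everything here is a simplification made for illustration: the target `run_target`, the `struct branch` record, and the "solver", which only handles conditions of the form `input[i] == constant`, are hypothetical and much simpler than what a real tool does.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_CHECKS 4

/* One recorded branch: the condition "input[index] == value" and whether
   the concrete run satisfied it. */
struct branch {
    size_t index;
    unsigned char value;
    bool taken;
};

/* Toy target (hypothetical): four nested byte checks. It records every
   branch condition it evaluates and returns how many were recorded. */
size_t run_target(const unsigned char *in, struct branch *trace)
{
    static const unsigned char magic[N_CHECKS] = { '!', 'b', 'a', 'd' };
    size_t n = 0;
    for (size_t i = 0; i < N_CHECKS; i++) {
        trace[n].index = i;
        trace[n].value = magic[i];
        trace[n].taken = (in[i] == magic[i]);
        n++;
        if (!trace[n - 1].taken)
            break;              /* execution leaves the nested checks here */
    }
    return n;
}

/* Directed search: after each run, negate the last path condition, solve
   it, and rerun with the new input. Returns the number of executions
   needed to pass all checks, or 0 if max_runs is exhausted. */
unsigned directed_search(unsigned char *in, unsigned max_runs)
{
    struct branch trace[N_CHECKS];
    for (unsigned runs = 1; runs <= max_runs; runs++) {
        size_t n = run_target(in, trace);
        const struct branch *last = &trace[n - 1];
        if (n == N_CHECKS && last->taken)
            return runs;        /* innermost block reached */
        /* The last path condition is NOT(in[i] == v). Its negation is
           in[i] == v, which this trivial "solver" satisfies by storing v. */
        in[last->index] = last->value;
    }
    return 0;
}
```

Starting from the input "0000", the search passes one more check per execution and reaches the innermost block on the fifth run, whereas random mutation would need on the order of 2^32 attempts to guess the same four bytes. Real tools differ mainly in scale: conditions are arbitrary bit-vector formulas and the solving step is delegated to an SMT solver.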

A Complete Code Example with Fuzzgrind

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* No trailing semicolon, so the macro is safe inside if/else. */
    #define ERROR(x) do { perror(x); exit(-1); } while (0)

    int main(int argc, char *argv[]) {
      char buffer[5] = { 0 };
      int fd;
      if (argc != 2) {
        printf("usage: %s <file>\n", argv[0]);
        exit(-1);
      }
      if ((fd = open(argv[1], O_RDONLY)) == -1) {
        ERROR("open");
      }
      if (read(fd, buffer, 4) != 4) {
        ERROR("read");
      }
      /* Compare the first four input bytes with a magic value. */
      if (*(int *)buffer == 0x64616221) {
        printf("ok, vulnerability\n");
      }
      return 0;
    }

Path Constraint

    x0 : BITVECTOR(8);
    x1 : BITVECTOR(8);
    x2 : BITVECTOR(8);
    x3 : BITVECTOR(8);
    QUERY(NOT(NOT((~((IF (((x3@(x2@(x1@x0))) = 0h64616221)) THEN (0b1) ELSE (0b0) ENDIF)) = 0b1))));

Results

    Invalid.
    ASSERT( x3 = 0hex64 );
    ASSERT( x0 = 0hex21 );
    ASSERT( x2 = 0hex61 );
    ASSERT( x1 = 0hex62 );

Internals
1 Dynamic Binary Instrumentation: at run time, disassemble instructions and capture their semantics and constraints
2 Data Flow (Taint) Capturing and Analysis: associate constraints with the input
3 Constraint Solving: query and solve the constraints to generate new input
4 System events, control flow handler (optional): run the program with the new state

Summary
Advantages
1 Symbolic execution is promising for vulnerability discovery
2 It can drive the program down a desired path
Research Problems
1 Symbolic execution cannot handle complicated constraints
2 It doesn't provide clues on how to fuzz and reach the vulnerability
3 Vulnerable code identification is still needed
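The solver's result still has to be turned into a file. In the path constraint, x0 is the first byte read and x3@x2@x1@x0 places x3 in the most significant position, which matches how a little-endian machine (such as the x86 used in this lecture's examples) interprets `*(int *)buffer`. The following C sketch makes that mapping explicit; the helper names are hypothetical and the little-endian layout is an assumption of the example.

```c
#include <stdint.h>

/* Rebuild the 32-bit value x3@x2@x1@x0, where x[0] is the first byte read
   from the file and @ is bit-vector concatenation (x[3] most significant). */
uint32_t concat_bytes(const unsigned char x[4])
{
    return ((uint32_t)x[3] << 24) | ((uint32_t)x[2] << 16) |
           ((uint32_t)x[1] << 8)  |  (uint32_t)x[0];
}

/* Turn the solver's counterexample into file content: byte i of the file
   is the value assigned to xi. */
void model_to_file_bytes(uint8_t x0, uint8_t x1, uint8_t x2, uint8_t x3,
                         unsigned char out[4])
{
    out[0] = x0;
    out[1] = x1;
    out[2] = x2;
    out[3] = x3;
}
```

With the values reported above (x0 = 0x21, x1 = 0x62, x2 = 0x61, x3 = 0x64), the generated file starts with the ASCII bytes `!bad`, and reading them back as a little-endian 32-bit integer gives the value tested by the target program.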

References
James C. King, Symbolic execution and program testing, Communications of the ACM, volume 19, number 7, 1976
DART: Directed Automated Random Testing, PLDI 2005
Automated Whitebox Fuzz Testing, with Levin and Molnar, NDSS 2008
Grammar-Based, PLDI
