Program Testing via Symbolic Execution Daniel Dunbar Program Testing via Symbolic Execution p. 1/26
Introduction Motivation Manual testing is difficult Program Testing via Symbolic Execution p. 2/26
Introduction Motivation Manual testing is difficult Requires knowledge of code Program Testing via Symbolic Execution p. 2/26
Introduction Motivation Manual testing is difficult Requires knowledge of code Requires constant maintenance Program Testing via Symbolic Execution p. 2/26
Introduction Motivation Manual testing is difficult Requires knowledge of code Requires constant maintenance Random testing is ineffective Hard to deeply probe programs Program Testing via Symbolic Execution p. 2/26
Introduction Motivation Manual testing is difficult Requires knowledge of code Requires constant maintenance Random testing is ineffective Hard to deeply probe programs Symbolic execution? Automatic Can check for error conditions Can reach deep code paths Program Testing via Symbolic Execution p. 2/26
Overview KLEE: A Symbolic Virtual Machine Basic Architecture Constraint Solver Optimizations Program Testing via Symbolic Execution p. 3/26
Overview KLEE: A Symbolic Virtual Machine Basic Architecture Constraint Solver Optimizations A Modeling Environment for UNIX Program Testing via Symbolic Execution p. 3/26
Overview KLEE: A Symbolic Virtual Machine Basic Architecture Constraint Solver Optimizations A Modeling Environment for UNIX Case Study: Coreutils Over 80% coverage on majority of tools Overall coverage better than developer tests Program Testing via Symbolic Execution p. 3/26
What is Symbolic Execution? Program Testing via Symbolic Execution p. 4/26
What is Symbolic Execution? Symbolic Replace concrete program values with symbolic variables Program Testing via Symbolic Execution p. 4/26
What is Symbolic Execution? Symbolic Replace concrete program values with symbolic variables Execution Program instructions become operations on expressions Program Testing via Symbolic Execution p. 4/26
What is Symbolic Execution? Symbolic Replace concrete program values with symbolic variables Execution Program instructions become operations on expressions Also: Record branch choices in a path condition Program Testing via Symbolic Execution p. 4/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: A 0 s: c:? Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: path:? Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: path:? Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: path:? s 0 0 Input: s 0 0 s: c: path:? s 0 = 0 Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: s 0 path: s 0 0 Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: s 0 path: s 0 = \ Input: s 0 0 s: c: s 0 path: s 0 0 s 0 \ Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: path: 0 s 0 = \ Program Testing via Symbolic Execution p. 5/26
What is Symbolic Execution? void escape(char *s, char *out) { while (*s!= 0) { char c = *s++; if (c == \\ ) c = *s++; *out++ = c; } } Input: s 0 0 s: c: path: 0 s 0 = \ Program Testing via Symbolic Execution p. 5/26
The KLEE Architecture Based on experience with EXE system Goals Simplicity Scalability Speed Program Testing via Symbolic Execution p. 6/26
The KLEE Architecture: Simplicity Interpreter for LLVM Assembly Language Precise semantics Multiple language support Mature binary tools Program Testing via Symbolic Execution p. 7/26
The KLEE Architecture: Simplicity Interpreter for LLVM Assembly Language Precise semantics Multiple language support Mature binary tools Easy to understand Simple interpreter loop Explicit process representation Modular search & constraint solving Program Testing via Symbolic Execution p. 7/26
The KLEE Architecture: Scalability Fine-grained heap memory Object level copy-on-write Program Testing via Symbolic Execution p. 8/26
The KLEE Architecture: Scalability Fine-grained heap memory Object level copy-on-write Careful data structure selection Persistent heap, expressions, path data Compact expression representation Program Testing via Symbolic Execution p. 8/26
The KLEE Architecture: Scalability Fine-grained heap memory Object level copy-on-write Careful data structure selection Persistent heap, expressions, path data Compact expression representation Handle over 100k processes for small programs Program Testing via Symbolic Execution p. 8/26
The KLEE Architecture: Speed Constraint solving dominates run-time Program Testing via Symbolic Execution p. 9/26
The KLEE Architecture: Speed Constraint solving dominates run-time Keep constraints simple Canonicalize simple forms Cache concrete values and indices Dynamically rewrite path constraints Program Testing via Symbolic Execution p. 9/26
The KLEE Architecture: Speed Constraint solving dominates run-time Keep constraints simple Canonicalize simple forms Cache concrete values and indices Dynamically rewrite path constraints Optimize constraint solving Program Testing via Symbolic Execution p. 9/26
Solver Optimization Take advantage of the nature of symbolic execution Program Testing via Symbolic Execution p. 10/26
Solver Optimization Take advantage of the nature of symbolic execution Program decomposition Queries deal with distinct input variables Program Testing via Symbolic Execution p. 10/26
Solver Optimization Take advantage of the nature of symbolic execution Program decomposition Queries deal with distinct input variables Considerable redundancy Processes accrue constraints Static set of branches Program Testing via Symbolic Execution p. 10/26
Independent Constraint Opt. Path constraints may refer to many variables Path: s 0 = A s 1 = 0 y 0 < 100 Program Testing via Symbolic Execution p. 11/26
Independent Constraint Opt. Path constraints may refer to many variables Path: s 0 = A s 1 = 0 y 0 < 100 Individual expressions only refer to a few Branch condition: y = 10 Program Testing via Symbolic Execution p. 11/26
Independent Constraint Opt. Path constraints may refer to many variables Path: s 0 = A s 1 = 0 y 0 < 100 Individual expressions only refer to a few Branch condition: y = 10 Independent constraint optimization: Group constraints into disjoint subsets based on referenced variables Only pass dependent constraints to solver Query: y < 100 y = 10 Program Testing via Symbolic Execution p. 11/26
Counterexample Cache Take advantage of free counterexamples Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment When performing a query with constraints C: Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment When performing a query with constraints C: 1. If (C, x) in cache, return x Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment When performing a query with constraints C: 1. If (C, x) in cache, return x 2. If (C, ), C C in cache, return Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment When performing a query with constraints C: 1. If (C, x) in cache, return x 2. If (C, ), C C in cache, return 3. If (C, A), C C in cache, return A Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Take advantage of free counterexamples Cache C {A, }, where C is a set of constraints and A a satisfying assignment When performing a query with constraints C: 1. If (C, x) in cache, return x 2. If (C, ), C C in cache, return 3. If (C, A), C C in cache, return A 4. Otherwise, for each (C, A), C C in cache, speculatively evaluate C using A Program Testing via Symbolic Execution p. 12/26
Counterexample Cache Example void foo(int x) { if (x < 100)... if (x!= 50)... } Program Testing via Symbolic Execution p. 13/26
Counterexample Cache Example void foo(int x) { if (x < 100)... if (x!= 50)... } Cache: {} Program Testing via Symbolic Execution p. 13/26
Counterexample Cache Example void foo(int x) { if (x < 100)... if (x!= 50)... } Cache: {} Query: x < 100, Cache: {(x < 100, {x : 0})} Program Testing via Symbolic Execution p. 13/26
Counterexample Cache Example void foo(int x) { if (x < 100)... if (x!= 50)... } Cache: {} Query: x < 100, Cache: {(x < 100, {x : 0})} Query: x < 100 x 50 Program Testing via Symbolic Execution p. 13/26
Counterexample Cache Example void foo(int x) { if (x < 100)... if (x!= 50)... } Cache: {} Query: x < 100, Cache: {(x < 100, {x : 0})} Query: x < 100 x 50 Evaluate with {x : 0} 0 < 100 0 50 Found assignment Program Testing via Symbolic Execution p. 13/26
Optimization Results Average Time (s) 300 250 200 150 100 Base Independence Cex Cache All 50 0 0 0.2 0.4 0.6 0.8 1 Normalized Num. Instructions Program Testing via Symbolic Execution p. 14/26
Environment Modeling Testing real applications requires providing a realistic environment system libraries operating system Program Testing via Symbolic Execution p. 15/26
Environment Modeling Testing real applications requires providing a realistic environment system libraries operating system Our approach: Compile and execute using µlibc Implement UNIX systems calls in separate modeling library Program Testing via Symbolic Execution p. 15/26
Environment Modeling (2) Idea: Overlay symbolic environment onto operating system Program sees virtual symbolic resources and actual OS resources Program Testing via Symbolic Execution p. 16/26
Environment Modeling (2) Idea: Overlay symbolic environment onto operating system Program sees virtual symbolic resources and actual OS resources Implemented by routing system calls through model routines Symbolic resources are handled in model Actual resources are forwarded to OS Program Testing via Symbolic Execution p. 16/26
Environment Modeling (3) Currently model supports symbolic files and program arguments Program Testing via Symbolic Execution p. 17/26
Environment Modeling (3) Currently model supports symbolic files and program arguments Supports open(), close(), read(), write(), lseek(), stat(), fstat() as well as simple stubs for several more Program Testing via Symbolic Execution p. 17/26
Environment Modeling (3) Currently model supports symbolic files and program arguments Supports open(), close(), read(), write(), lseek(), stat(), fstat() as well as simple stubs for several more Implementation is about 800 lines of C code Program Testing via Symbolic Execution p. 17/26
Testing Process Test generation: The application is linked with µclibc and the model and run with KLEE Program Testing via Symbolic Execution p. 18/26
Testing Process Test generation: The application is linked with µclibc and the model and run with KLEE Additional arguments specify what parts of environment to make symbolic Program Testing via Symbolic Execution p. 18/26
Testing Process Test generation: The application is linked with µclibc and the model and run with KLEE Additional arguments specify what parts of environment to make symbolic KLEE generates test cases for any errors and for coverage Program Testing via Symbolic Execution p. 18/26
Testing Process (2) Test verification: Programs are linked with gcc using µlibc and stripped down model Program Testing via Symbolic Execution p. 19/26
Testing Process (2) Test verification: Programs are linked with gcc using µlibc and stripped down model Tests are replayed natively Program Testing via Symbolic Execution p. 19/26
Testing Process (2) Test verification: Programs are linked with gcc using µlibc and stripped down model Tests are replayed natively Protects against modeling and system errors Program Testing via Symbolic Execution p. 19/26
Coreutils Background GNU implementation of standard UNIX utilities Program Testing via Symbolic Execution p. 20/26
Coreutils Background GNU implementation of standard UNIX utilities Core of most Linux installations Program Testing via Symbolic Execution p. 20/26
Coreutils Background GNU implementation of standard UNIX utilities Core of most Linux installations Encompasses 90 applications of varying size and focus File system: ls, dd, chmod Numeric: factor, seq System: printenv, hostname Text Processing: sort, head, od Program Testing via Symbolic Execution p. 20/26
Coreutils Case Study All 90 applications were tested on latest public release Program Testing via Symbolic Execution p. 21/26
Coreutils Case Study All 90 applications were tested on latest public release Focused on tool front-ends Program Testing via Symbolic Execution p. 21/26
Coreutils Case Study All 90 applications were tested on latest public release Focused on tool front-ends Covers 20k combined executable lines of code Program Testing via Symbolic Execution p. 21/26
Coreutils Case Study All 90 applications were tested on latest public release Focused on tool front-ends Covers 20k combined executable lines of code Mostly automatic One source code modification required Most programs tested using a generic configuration Program Testing via Symbolic Execution p. 21/26
Results: Bugs Six bugs found One fixed in developer repository Three resulted in segmentation fault Two allowed arbitrary heap smashing ptx -F\\ abcdefghijklmnopqrstuvwxyz paste -d\\ abcdefghijklmnopqrstuvwxyz seq -f %0 1 mkdir -Z a b mkfifo -Z a b mknod -Z a b p Program Testing via Symbolic Execution p. 22/26
Results: Coverage Overall coverage: 70.6% Over 80% on 61/90 30 29 100 25 24 80 87% (37) 80% (21) 80% (14) 71% (8) 77% (3) 82% (1) 20 60 57% (1) 56% (1) 15 47% (4) 40 10 8 9 9 8 5 20 0 1 10-20% 2 20-30% 50-60% 60-70% 70-80% 80-90% 90-100% 100% 0 0-100 100-200 200-300 300-400 500-600 600-700 800-900 1100-1200 1300-1400 Executable Lines of Code Program Testing via Symbolic Execution p. 23/26
Results: Vs. Manual Developers tests overall coverage: 67.4%. Generated tests outperform on majority of tools. 100 80 Generated Tests Developer Tests Shared 60 40 20 0 Program Testing via Symbolic Execution p. 24/26
Conclusion Symbolic execution can provide an effective testing tool Program Testing via Symbolic Execution p. 25/26
Conclusion Symbolic execution can provide an effective testing tool Can scale to real applications Careful attention to scalability Take advantage of specialized domain to optimize constraint solving Program Testing via Symbolic Execution p. 25/26
Conclusion Symbolic execution can provide an effective testing tool Can scale to real applications Careful attention to scalability Take advantage of specialized domain to optimize constraint solving Requires only modest amount of effort to construct an adequate model Program Testing via Symbolic Execution p. 25/26
Conclusion Symbolic execution can provide an effective testing tool Can scale to real applications Careful attention to scalability Take advantage of specialized domain to optimize constraint solving Requires only modest amount of effort to construct an adequate model Can beat manual testing out-of-the-box Program Testing via Symbolic Execution p. 25/26
Questions? Program Testing via Symbolic Execution p. 26/26