Testing & Symbolic Execution

Software Testing The most common way of measuring & ensuring correctness Input 2

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior 3

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle 4

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle Outcome 5

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle Outcome Test Suite Test 1 Test 2 Test 3 Test 4 Test 5 Test 6 Test 7 6

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle Outcome Key Issues: Are the tests adequate? Test Suite Test 1 Test 2 Test 3 Test 4 Test 5 Test 6 Test 7 7

Software Testing The most common way of measuring & ensuring correctness Input? Observed Behavior Oracle Outcome Key Issues: Are the tests adequate? Automated input generation Test Suite Test 1 Test 2 Test 3 Test 4 Test 5 Test 6 Test 7 8

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle Outcome Key Issues: Are the tests adequate? Automated input generation Automated oracles Test Suite Test 1 Test 2 Test 3 Test 4 Test 5 Test 6 Test 7 9

Software Testing The most common way of measuring & ensuring correctness Input Observed Behavior Oracle Outcome Test 1 Are the tests adequate? Test 2 Test 3 Automated input generation Test 4 Automated oracles Test 5 Test 6 Robustness / flakiness / maintainability Test 7 Key Issues: Test Suite 10

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? 12

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? Metrics Statement Coverage def my_lovely_fun(a,b,c): if (a && b) c: Is each statement executed by at least one test in the test suite? else: print('awesome') score= # covered # statements 13

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? Metrics Statement Coverage Branch Coverage score= # covered # branches def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') b T a c F 14

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? Metrics Statement Coverage Branch Coverage MC/DC Coverage More common in safety critical systems where full coverage may be required. def my_lovely_fun(a,b,c): if (a & b) c: else: print('awesome') 15

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? Metrics Statement Coverage Branch Coverage MC/DC Coverage Mutation Coverage def my_lovely_fun(a,b,c): if (a & b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') 16

Test Suite Adequacy Questions Is a test suite good enough? What parts of software need to be tested better? Metrics Statement Coverage Branch Coverage MC/DC Coverage Mutation Coverage # covered/killed score= # non-equivalent mutants def my_lovely_fun(a,b,c): if (a & b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') def my_lovely_fun(a,b,c): if (a && b) c: else: print('awesome') 17

Generating Inputs Sample all possible inputs for test in allpossibleinputs: check_test(test) 22

Generating Inputs Sample all possible inputs for test in allpossibleinputs: check_test(test) Target specific goals or other coverage criteria for s in statements: test = findtestthrough(s) check_test(test) 23

Generating Inputs Usually broken into black box & white box approaches 25

Generating Inputs Usually broken into black box & white box approaches Black Box Treat the program as opaque / unknown e.g. specification based, naive fuzzing, boundary value analysis,... 26

Generating Inputs Usually broken into black box & white box approaches Black Box Treat the program as opaque / unknown White Box Program structure & semantics can be used e.g. symbolic execution, call chain synthesis, white box fuzzing, boundary value analysis,... 27

Generating Oracles? 28

Generating Oracles Likely invariants? Careful variable selection & monitoring? Differential Testing Metamorphic Testing A very open (hard) problem. 29

Interesting Problems Random Testing Test Suite Adequacy Test Suite Minimization Test Generation Oracle Generation Test Maintenance Performance Testing Field Testing Power Testing Test Prioritization... 30

Symbolic Execution An approach for generating test inputs. Cadar & Sen, 2013 x input() y input() if x == 2*y if x > y+10 32

Symbolic Execution An approach for generating test inputs. Replace the concrete inputs of a program with symbolic values Cadar & Sen, 2013 x symbolic() y symbolic() if x == 2*y if x > y+10 33

Symbolic Execution An approach for generating test inputs. Replace the concrete inputs of a program with symbolic values Execute along a path using the symbolic values to build a formula over the input symbols. Cadar & Sen, 2013 x symbolic() y symbolic() if x == 2*y if x > y+10 34

Symbolic Execution An approach for generating test inputs. Replace the concrete inputs of a program with symbolic values Execute along a path using the symbolic values to build a formula over the input symbols. Solve for the symbolic symbols to find inputs that yield the path. Cadar & Sen, 2013 x symbolic() y symbolic() if x == 2*y if x > y+10 x=30 y=15 37

Symbolic Execution An approach for generating test inputs. Replace the concrete inputs of a program with symbolic values Execute along a path using the symbolic values to build a formula over the input symbols. Solve for the symbolic symbols to find inputs that yield the path. Cadar & Sen, 2013 x symbolic() y symbolic() if x == 2*y if x > y+10 x=30 y=15 x=2 y=1 38

Symbolic Execution An approach for generating test inputs. Replace the concrete inputs of a program with symbolic values Execute along a path using the symbolic values to build a formula over the input symbols. Solve for the symbolic symbols to find inputs that yield the path. Cadar & Sen, 2013 x symbolic() y symbolic() if x == 2*y if x > y+10 x=30 y=15 x=2 y=1 x=0 y=1 39

How Can We Solve Constraints? SMT Solvers Satisfiability Modulo Theories SAT with extra logic Standard interfaces through SMTLIB2 40

How Can We Solve Constraints? SMT Solvers Satisfiability Modulo Theories SAT with extra logic Standard interfaces through SMTLIB2 x = 2*y y > 10 (declare-const x Int) (declare-const y Int) (assert (= x (* 2 y))) (assert (> y 10)) (check-sat) (get-model) x=22 y=11 Z3 sat (model (define-fun y () Int 11) (define-fun x () Int 22) ) 43

Useful Questions If ɸ holds after a statement, what must have been true at the point before? weakest precondition 45

Useful Questions If ɸ holds after a statement, what must have been true at the point before? weakest precondition If ɸ holds before a statement, what can we guarantee to be true after? strongest poscondition e.g. Given two versions of a program v1,v2 and assertions on output ɸi in each from an input I What is wp(ɸ1) wp(ɸ2)? 47

Useful Questions If ɸ holds after a statement, what must have been true at the before? weakest precondition If ɸ holds before a statement, what can we guarantee to be true after? strongest poscondition e.g. Given two versions of a program v1,v2 and assertions on output ɸi in each from an input I What is wp(ɸ1) wp(ɸ2)? wp(ɸ1) wp(ɸ2) 48

Exploring the Execution Tree The possible paths of a program form an execution tree. x input() y input() Cadar & Sen, 2013 if x == 2*y if x > y+10 50

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. x input() y input() if x == 2*y Cadar & Sen, 2013 if x > y+10 51

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. Mechanizing the traversal yields two main approaches Concolic (dynamic symbolic) x input() y input() if x == 2*y if x > y+10 Cadar & Sen, 2013 53

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. Mechanizing the traversal yields two main approaches Concolic (dynamic symbolic) Execution Generated Testing x input() y input() if x == 2*y if x > y+10 Cadar & Sen, 2013 58

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. Mechanizing the traversal yields two main approaches Concolic (dynamic symbolic) Execution Generated Testing x input() y input() if x == 2*y if x > y+10 Cadar & Sen, 2013 X=? y=10 59

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. Mechanizing the traversal yields two main approaches X=20 Concolic (dynamic symbolic) y=10 Execution Generated Testing x input() y input() if x == 2*y if x > y+10 Cadar & Sen, 2013 X=? 20 y=10 60

Exploring the Execution Tree The possible paths of a program form an execution tree. Traversing the tree will yield tests for all paths. Mechanizing the traversal yields two main approaches X=20 Concolic (dynamic symbolic) y=10 Execution Generated Testing Execution on this side is concrete from this point on. x input() y input() if x == 2*y if x > y+10 Cadar & Sen, 2013 X=? 20 y=10 61

Symbolic Execution Increasingly scalable every year 62

Symbolic Execution Increasingly scalable every year Can automatically generate test inputs from constraints 63

Symbolic Execution Increasingly scalable every year Can automatically generate test inputs from constraints The resulting symbolic formulae have many uses beyond just testing. 64

Symbolic Execution Increasingly scalable every year Can automatically generate test inputs from constraints The resulting symbolic formulae have many uses beyond just testing. Try it out: 1) https://github.com/klee/klee 2) Symbolic PathFinder 3) http://research.microsoft.com/pex/ 4) http://angr.io/ 65