Dynamic Symbolic Database Application Testing

Similar documents
Dynamic Symbolic Database Application Testing

Christoph Csallner, University of Texas at Arlington (UTA)

Symbolic Execution. Wei Le April

Symbolic and Concolic Execution of Programs

Symbolic Execution, Dynamic Analysis

CMSC 430 Introduction to Compilers. Fall Symbolic Execution

HAMPI A Solver for String Theories

From Symbolic Execution to Concolic Testing. Daniel Paqué

DART: Directed Automated Random Testing

Software Architecture and Engineering: Part II

CYSE 411/AIT681 Secure Software Engineering Topic #17: Symbolic Execution

Software has bugs. Static analysis 4/9/18. CYSE 411/AIT681 Secure Software Engineering. To find them, we use testing and code reviews

Automated Software Testing in the Absence of Specifications

References: Thomas A. Henzinger (1996): The theory of hybrid automata In: Annual IEEE Symposium on Logic in Computer Science

n HW7 due in about ten days n HW8 will be optional n No CLASS or office hours on Tuesday n I will catch up on grading next week!

A Backtracking Symbolic Execution Engine with Sound Path Merging

Symbolic Execution. Joe Hendrix Galois, Inc SMT Summer School galois

Fitness-Guided Path Exploration in Automated Test Generation

Testing & Symbolic Execution

Research on Fuzz Testing Framework based on Concolic Execution

CMSC 631 Program Analysis and Understanding. Spring Symbolic Execution

Symbolic Execu.on. Suman Jana

Automated Testing of Cloud Applications

CUTE: A Concolic Unit Testing Engine for C

KLEE: Effective Testing of Systems Programs Cristian Cadar

assertion-driven analyses from compile-time checking to runtime error recovery

A Generating Test Cases for Programs that Are Coded Against Interfaces and Annotations

Symbolic Execution for Bug Detection and Automated Exploit Generation

Dynamic Test Generation to Find Bugs in Web Application

Emblematic Execution for Software Testing based on DART AND CUTE

Environment Modeling for Automated Testing of Cloud Applications

Automated Test-Input Generation

Microsoft SAGE and LLVM KLEE. Julian Cohen Manual and Automatic Program Analysis

Targeted Test Input Generation Using Symbolic Concrete Backward Execution

Dynamic Symbolic Execution using Eclipse CDT

Kent Academic Repository

Program Testing via Symbolic Execution

An Empirical Comparison of Automated Generation and Classification Techniques for Object-Oriented Unit Testing

Testing, Fuzzing, & Symbolic Execution

IC-Cut: A Compositional Search Strategy for Dynamic Test Generation

Test Automation. 20 December 2017

Query Optimization Overview

A Gentle Introduction to Program Analysis

Automated Whitebox Fuzz Testing

Testing Error Handling Code in Device Drivers Using Characteristic Fault Injection

CS317 File and Database Systems

Database Systems External Sorting and Query Optimization. A.R. Hurson 323 CS Building

Symbolic Execution with Interval Constraint Solving and Meta-Heuristic Search

JMLCUTE: Automated JML-Based Unit Test Case Generation

CS317 File and Database Systems

ADVANCED COMBINATORIAL TESTING ALGORITHMS AND APPLICATIONS LINBIN YU. Presented to the Faculty of the Graduate School of

Efficient Symbolic Execution for Software Testing

Query Optimization. Shuigeng Zhou. December 9, 2009 School of Computer Science Fudan University

Test Generation Using Symbolic Execution

IAENG International Journal of Computer Science, 41:3, IJCS_41_3_06. Automated Test Generation for Object-Oriented Programs with Multiple Targets

Mini Project Report One

Industrial Application of Concolic Testing Approach: A Case Study on libexif by Using CREST-BV and KLEE

JPF SE: A Symbolic Execution Extension to Java PathFinder

An Eclipse Plug-in for Model Checking

Test-Case Generation for Runtime Analysis and Vice-Versa: Verification of Aircraft Separation Assurance

Reading Assignment. Symbolic Evaluation/Execution. Move from Dynamic Analysis to Static Analysis. Move from Dynamic Analysis to Static Analysis

Symbolic Evaluation/Execution

Introduction Alternative ways of evaluating a given query using

Demand-Driven Compositional Symbolic Execution

Database Applications (15-415)

Evaluation Strategies for Functional Logic Programming

CHAPTER. Oracle Database 11g Architecture Options

Arnab Bhattacharya, Shrikant Awate. Dept. of Computer Science and Engineering, Indian Institute of Technology, Kanpur

Enabling Variability-Aware Software Tools. Paul Gazzillo New York University

Implementation Techniques

Advanced Programming Methods. Introduction in program analysis

Available online at ScienceDirect. Procedia Computer Science 62 (2015 )

Contents Contents Introduction Basic Steps in Query Processing Introduction Transformation of Relational Expressions...

Joint Entity Resolution

Query Evaluation Strategies

Software Testing CS 408. Lecture 6: Dynamic Symbolic Execution and Concolic Testing 1/30/18

Fitness-Guided Path Exploration in Dynamic Symbolic Execution

Block-wise abstract interpretation by combining abstract domains with SMT

Advanced Test Coverage Criteria: Specify and Measure, Cover and Unmask

Scalable Test Generation by Interleaving Concrete and Symbolic Execution

340 Review Fall Midterm 1 Review

µz An Efficient Engine for Fixed Points with Constraints

Parser: SQL parse tree

SDD-1 Algorithm Implementation

Symbolic Execution. Michael Hicks. for finding bugs. CMSC 631, Fall 2017

Xuandong Li. BACH: Path-oriented Reachability Checker of Linear Hybrid Automata

Improving Program Testing and Understanding via Symbolic Execution

Dynamic Software Model Checking

Introduction to Wireless Sensor Network. Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University

HECTOR: Formal System-Level to RTL Equivalence Checking

VS 3 : SMT Solvers for Program Verification

Alleviating Patch Overfitting with Automatic Test Generation: A Study of Feasibility and Effectiveness for the Nopol Repair System

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

ABC basics (compilation from different articles)

Parallel SMT-Constrained Symbolic Execution for Eclipse CDT/Codan

LOWELL WEEKLY JOURNAL

CSE 190D Spring 2017 Final Exam Answers

Data Analytics and Boolean Algebras

Compositional Symbolic Execution through Program Specialization

From Whence It Came: Detecting Source Code Clones by Analyzing Assembler

Transcription:

Dynamic Symbolic Database Application Testing Chengkai Li, Christoph Csallner University of Texas at Arlington June 7, 2010 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 1/30

Motivation Overview Maximizing code coverage is an important goal in testing. Database applications: input can be user-supplied queries. Query results will be used as program values in program logic. Different queries thus result in different execution paths. To maximize code coverage: we need to enumerate queries in an effective way. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 2/30

Our Method Overview Generate queries dynamically by inverting branching conditions in existing program execution paths. 1 Monitor the program s execution paths by dynamic symbolic execution (e.g., Dart, Pex). 2 Invert a branching condition on some covered path a new test query. 3 Execute the query, bring in new tuples. 4 The new tuples will cover new paths. 5 Do 1-4 iteratively. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 3/30

Illustration of the Idea After the initial query q 1 =c 1 c 2 Execution tree (maintained by dynamic symbolic engine): each path to a leaf node represents an execution path, encountered for tuples satisfying the branching conditions on the path. true c1 c2 c3 (q1) if (z > 0) { if (z < 100) //.. } // c1 // c2 c4!c4 c5!c5 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 4/30

Illustration of the Idea After the initial query, the candidate queries Each dashed edge represents an inversed branching condition, thus a candidate query. true c1!c1 c2 (q1)!c2 c3!c3 c4!c4 c5!c5 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 5/30

Illustration of the Idea The second test query q 2 =!c 1 c1 true!c1 (q2) c2 (q1)!c2 c3!c3 c4!c4 c5!c5 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 6/30

Illustration of the Idea After the second test query q 2 =!c 1 candidate queries are again dashed. c1 true!c1 (q2) c2 (q1)!c2 c6!c6 c3!c3 c7!c7 c4 c5!c5!c4 c11!c11 c12!c12 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 7/30

Illustration of the Idea The third test query q 3 =!c 1 c 6 c 7 c1 true!c1 (q2) c2 (q1)!c2 c6!c6 c3!c3 c7 (q3)!c7 c4 c5!c5!c4 c11!c11 c12!c12 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 8/30

Illustration of the Idea After the third test query q 3 =!c 1 c 6 c 7 c1 true!c1 (q2) c2 (q1)!c2 c6!c6 c3!c3 c7 (q3)!c7 c4!c4 c8!c8 c11!c11 c5!c5 c9!c9c10!c10 c12!c12 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 9/30

Illustration of the Idea The fourth test query q 4 =!c 1 c 6!c 7!c 11!c 12 c1 true!c1 (q2) c2 (q1)!c2 c6!c6 c3!c3 c7 (q3)!c7 c4!c4 c8!c8 c11!c11 c5!c5 c9!c9c10!c10 c12!c12 (q4) DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 10/30

Illustration of the Idea After the fourth test query q 4 =!c 1 c 6!c 7!c 11!c 12 c1 true!c1 (q2) c2 (q1)!c2 c6!c6 c3!c3 c7 (q3)!c7 c4!c4 c8!c8 c11!c11 c5!c5 c9!c9c10!c10 c12!c12 (q4) DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 11/30

Advantages of the Proposed Method Real data, no mock database (which can be hard to generate). No need to worry about if the mock database is representative. Given large space of possible program paths, we only test those that can be encountered for real data. This is especially useful for applications that only read existing data. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 12/30

Alternative Method 1: Brute force Test for every tuple in database. Too costly Limited resources in testing. Many tuples result in the same execution path. Thus efforts wasted. May not be possible to get all the tuples Security constraint. Query capability constraint. (e.g., deep-web databases) DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 13/30

Alternative Method 2: Sample the existing database Do sampling first, then test for every tuple in the sample. A presentative database sample may not trigger a set of program execution paths that is representative of the paths encountered in production use. E.g., a column with 1 million distinct values; several particular values will trigger some paths. Ours can be viewed as a sampling technique that is aware of the program structure. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 14/30

Alternative Method 3: Generate custom mock databases Generate a mock database such that its data will expose a bug in the program Will expose potential program bugs. But users may not care about them. Because many bugs will never occur in practice. Because the mock database generator typically cannot generate fully realistic databases. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 15/30

Alternative Method 4: Static Analysis Static program analysis is typically: (+) Fast (-) Imprecise: misses bugs and gives false alarms Our approach: Test = execute the program (dynamic analysis) (+) Fully precise: no false alarms (-) Resource-hungry, will still miss bugs Our (dynamic) analysis reasons about program + existing database contents. We are not aware of any static analysis that does that. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 16/30

Assumptions/Limitations Queries single-relation conjunctive selection query. Each conjunct is a v, where a is an attribute, v is a constant value, and can be <,, >,, =, or =. no grouping, aggregation, join, insertion, deletion, updates. Programs follow tuple-wise semantics. if a branching condition depends on a database tuple, the condition can be rewritten to the same form of the query conjuncts: a v. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 17/30

Iterative Testing Method 1: q define an initial test query; Q {q} 2: repeat 3: T run q and get the first n q result tuples 4: for each tuple t in T do 5: run the program over t and update the execution tree tree Q with encountered new execution paths 6: tree Q the complement tree of tree Q 7: Q c get the candidate queries based on tree Q 8: q select a query from Q c 9: Q Q {q} 10: until stopping criteria satisfied DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 18/30

Challenges Overview How to decide how many tuples to retrieve for a query? choose the next test query? design stopping condition for testing? DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 19/30

Optimization Goals Given program P and a set of test queries Q={q i } maximize coverage Path(P,R,Q) = {Path t t T i }, where T i is the first n i tuples for query q i. minimize cost cost(q) = i cost(q i) cost(q i ) = q_cost(q i )+t_cost(q i ) = w + c n i + t n i t_cost: t is test cost per tuple. q_cost: w is query cost to get first result tuple, c is query cost to get each additional tuple. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 20/30

Why only n i tuples for a query q i? Multiple tuples will result in the same program execution path. After a certain number of initial tuples, most or all distinct paths may have been encountered. Less retrieved/tested tuples means both less testing cost and less query execution cost. DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 21/30

How to choose next q and n Greedy Approach Given candidate query q, score(q) = cost (q) Path (P,R,M,Q {q}) Path(P,R,M,Q) Path (P,R,M,Q {q}) : estimate of Path(P,R,M,Q {q}) cost (q): estimate of cost(q) (both are functions of n) find q that minimizes score(q) DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 22/30

Estimating the Coverage and Cost Estimating the Coverage Estimate the query result size of leaf node (query). The result sizes for intermediate nodes are accumulated. c1 (100) c2!c2 (20) (80) c3!c3 c4 c5!c5!c4 (10) (10) (40) (40) Estimating the Cost both initial tuple cost and total cost. EXPLAIN (supported by major DBMSs) (30) (10) DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 23/30

Stopping Condition for Testing testing resource limit reached no more candidate queries no candidate query can return non-empty result total number of encountered tuples (associated with distinct paths) equals the table size DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 24/30

Overview Overview Fully automated tool Analyze Java bytecode programs (any Java program, no need for source code) Rewrite application bytecode at load-time: after each application bytecode instruction, insert a call to our dynamic symbolic engine Use inserted calls to maintain an accurate symbolic representation of program state Treat calls to database (e.g., Jdbc) differently: Represent returned values as symbolic variables and track how the program uses them, i.e., in path conditions DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 25/30

Overview Details Use Java 5 instrumentation facilities Use third-party open source bytecode instrumentation framework ASM Implement on top of new dynamic symbolic engine Dsc: Allows handling of regular (non-query) program inputs Solve constraints on regular program inputs with powerful third-party satisfiability modulo theories (SMT) constraint solver Z3 DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 26/30

Several directions Finish prototype implementation Evaluate on realistic applications Compare with mock-database generation techniques + compare with traditional database sampling techniques: Can we achieve higher coverage of the application code that is reachable with the existing database contents? How to deal with database insert, update, delete? DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 27/30

Thank you! Overview Contact cli@uta.edu, csallner@uta.edu DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 28/30

References Overview Dynamic Symbolic Execution Systems Dart: C programs, by Godefroid et al. [PLDI 05] jcute: Java programs, by Sen et al. [CAV 06] Klee: C programs, by Cadar et al. [OSDI 08] Pex:.Net programs (C#, etc.), by Tillmann et al. [TAP 08] Database application testing via mock database generation jcute extension: Java programs, by Emmi et al. [ISSTA 07] Qex (Pex extension):.net programs (C#, etc.), by Veanes et al. [ICFEM 09] DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 29/30

References Overview Main tools used by our prototype implementation ASM: http://asm.ow2.org/ Z3: http://research.microsoft.com/en-us/um/redmond/projects/z3/ DBTest 2010 Chengkai Li, Christoph Csallner Dynamic Symbolic Database Application Testing: 30/30