Feedback-directed Random Test Generation


Feedback-directed Random Test Generation (ICSE 2007, with minor edits)
Carlos Pacheco, Shuvendu Lahiri, Michael Ernst, Thomas Ball
MIT / Microsoft Research

Random testing
Select inputs at random from a program's input space, and check that the program behaves correctly on each input.
An attractive error-detection technique: easy to implement and use, yields lots of test inputs, and finds errors.
Miller et al. 1990: Unix utilities
Kropp et al. 1998: OS services
Forrester et al. 2000: GUI applications
Claessen et al. 2000: functional programs
Csallner et al. 2005, Pacheco et al. 2005: object-oriented programs
Groce et al. 2007: flash memory, file systems

Evaluations of random testing
Theoretical work suggests that random testing is as effective as more systematic input generation techniques (Duran 1984, Hamlet 1990).
Some empirical studies suggest that systematic generation is more effective than random:
Ferguson et al. 1996: compared with chaining
Marinov et al. 2003: compared with bounded exhaustive testing
Visser et al. 2006: compared with model checking and symbolic execution
These studies were performed on small benchmarks, did not measure error-revealing effectiveness, and used completely undirected random test generation.

Contributions
We propose feedback-directed random test generation: the randomized creation of new test inputs is guided by feedback about the execution of previous inputs. The goal is to avoid redundant and illegal inputs.
Empirical evaluation:
Evaluate coverage and error-detection ability on a large number of widely used, well-tested libraries (780 KLOC)
Compare against systematic input generation
Compare against undirected random input generation

Outline
Feedback-directed random test generation
Evaluation with Randoop, a tool for Java and .NET: coverage, error detection
Current and future directions for Randoop

Random testing: pitfalls

1. Useful test:
Set s = new HashSet();
s.add("hi");
assertTrue(s.equals(s));

2. Redundant test (do not output):
Set s = new HashSet();
s.add("hi");
s.isEmpty();
assertTrue(s.equals(s));

3. Useful test:
Date d = new Date(2006, 2, 14);
assertTrue(d.equals(d));

4. Illegal test (do not output):
Date d = new Date(2006, 2, 14);
d.setMonth(-1); // pre: argument >= 0
assertTrue(d.equals(d));

5. Illegal test (do not even create):
Date d = new Date(2006, 2, 14);
d.setMonth(-1);
d.setDay(5);
assertTrue(d.equals(d));

Feedback-directed random test generation
Build test inputs incrementally; new test inputs extend previous ones. In our context, a test input is a method sequence.
As soon as a test input is created, execute it. Use the execution results to guide generation away from redundant or illegal method sequences, and towards sequences that create new object states.

Technique input/output
Input: classes under test, a time limit, and a set of contracts:
Method contracts (e.g. "o.hashCode() throws no exception")
Object invariants (e.g. "o.equals(o) == true")
Output: contract-violating test cases. Example (no contracts violated up to the last method call):

HashMap h = new HashMap();
Collection c = h.values();
Object[] a = c.toArray();
LinkedList l = new LinkedList();
l.addFirst(a);
TreeSet t = new TreeSet(l);
Set u = Collections.unmodifiableSet(t);
assertTrue(u.equals(u)); // fails when executed

Technique
1. Seed components: components = { int i = 0; boolean b = false; ... }
2. Do until the time limit expires:
   a. Create a new sequence:
      i. Randomly pick a method call m(T1 ... Tk) / Tret
      ii. For each input parameter of type Ti, randomly pick a sequence Si from the components that constructs an object vi of type Ti
      iii. Create the new sequence Snew = S1; ...; Sk; Tret vnew = m(v1 ... vk);
      iv. If Snew was previously created (lexically), go to step i
   b. Classify the new sequence Snew: discard it, output it as a test case, or add it to the components

Classifying a sequence: execute it and check the contracts. If a contract is violated, minimize the sequence and output it as a contract-violating test case. Otherwise, if the sequence is redundant, discard it; if not, add it to the components.
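The loop above can be sketched concretely. The following is a minimal, self-contained illustration, not Randoop itself: the class under test (TinyStack) and the single contract checked are assumptions made for this example, but the structure (extend a kept sequence, execute it, discard illegal ones, check contracts, keep only sequences that reach new object states) follows the technique described above.

```java
import java.util.*;

// Minimal sketch of feedback-directed random generation over one tiny class.
public class FeedbackDirectedSketch {

    // Hypothetical class under test (an assumption for this example).
    static class TinyStack {
        private final List<Integer> elems = new ArrayList<>();
        void push(int x) { elems.add(x); }
        int pop() { return elems.remove(elems.size() - 1); }
        @Override public boolean equals(Object o) {
            return o instanceof TinyStack && ((TinyStack) o).elems.equals(elems);
        }
        @Override public int hashCode() { return elems.hashCode(); }
    }

    // A test input is a method sequence; here, a list of opcodes over one stack.
    static List<List<Integer>> generate(int budget, long seed) {
        Random rnd = new Random(seed);
        Set<TinyStack> seenStates = new HashSet<>();   // feedback: states already created
        List<List<Integer>> components = new ArrayList<>();
        components.add(new ArrayList<>());             // seed: the empty sequence
        List<List<Integer>> violating = new ArrayList<>();

        for (int i = 0; i < budget; i++) {
            // Extend a previously kept sequence with one random call.
            List<Integer> base = components.get(rnd.nextInt(components.size()));
            List<Integer> s = new ArrayList<>(base);
            s.add(rnd.nextInt(3)); // opcodes 0,1 = push constant; 2 = pop

            TinyStack stack = new TinyStack();
            boolean illegal = false;
            for (int op : s) {
                try {
                    if (op == 2) stack.pop(); else stack.push(op);
                } catch (IndexOutOfBoundsException e) {
                    illegal = true; break;   // illegal sequence: discard, never extend
                }
            }
            if (illegal) continue;

            // Contract check (object invariant): o.equals(o) must hold.
            if (!stack.equals(stack)) { violating.add(s); continue; }

            // Redundant if the final state was already created by a kept sequence.
            if (seenStates.add(stack)) components.add(s);
        }
        return violating; // contract-violating sequences found
    }

    public static void main(String[] args) {
        // TinyStack satisfies its contract, so no violations are expected.
        System.out.println(generate(1000, 42).size());
    }
}
```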

Redundant sequences
During generation, maintain a set of all objects created. A sequence is redundant if all the objects created during its execution are already members of that set (using equals to compare). More sophisticated state-equivalence methods could also be used, e.g. heap canonicalization.

Outline
Feedback-directed random test generation
Evaluation with Randoop, a tool for Java and .NET: coverage, error detection
Current and future directions for Randoop

Randoop
Implements feedback-directed random test generation.
Input: an assembly (for .NET) or a list of classes (for Java); a generation time limit; optionally, a set of contracts to augment the default contracts.
Output: a test suite (JUnit or NUnit) containing contract-violating test cases and normal-behavior test cases.

Randoop outputs oracles.
Oracle for a contract-violating test case:
Object o = new Object();
LinkedList l = new LinkedList();
l.addFirst(o);
TreeSet t = new TreeSet(l);
Set u = Collections.unmodifiableSet(t);
assertTrue(u.equals(u)); // expected to fail

Oracle for a normal-behavior test case:
Object o = new Object();
LinkedList l = new LinkedList();
l.addFirst(o);
l.add(o);
assertEquals(2, l.size()); // expected to pass
assertEquals(false, l.isEmpty()); // expected to pass

Randoop uses observer methods to capture object state.
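The observer-method idea can be sketched as follows. This is an illustrative assumption, not Randoop's implementation: zero-argument observer methods are invoked reflectively on the final object, and each observed value is recorded as an expected-to-pass assertion.

```java
import java.lang.reflect.Method;
import java.util.*;

// Sketch: turning observer-method results into regression-style oracles,
// in the spirit of Randoop's normal-behavior test cases.
public class OracleSketch {
    static List<String> captureOracles(Object obj, String varName, String... observers) {
        List<String> assertions = new ArrayList<>();
        for (String name : observers) {
            try {
                Method m = obj.getClass().getMethod(name); // zero-arg observer
                Object value = m.invoke(obj);
                // Record the observed value as an expected-to-pass assertion.
                assertions.add("assertEquals(" + value + ", " + varName + "." + name + "());");
            } catch (ReflectiveOperationException e) {
                // Observer missing or threw: skip it rather than emit a bogus oracle.
            }
        }
        return assertions;
    }

    public static void main(String[] args) {
        LinkedList<Object> l = new LinkedList<>();
        l.add("x");
        l.add("y");
        for (String a : captureOracles(l, "l", "size", "isEmpty")) System.out.println(a);
        // Prints:
        //   assertEquals(2, l.size());
        //   assertEquals(false, l.isEmpty());
    }
}
```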

Some Randoop options
Avoid use of null, statically:
Object o = new Object();
LinkedList l = new LinkedList();
l.add(null);
and dynamically:
Object o = returnNull();
LinkedList l = new LinkedList();
l.add(o);
Bias random selection:
Favor smaller sequences
Favor methods that have been covered less
Use constants mined from the source code

Outline
Feedback-directed random test generation
Evaluation with Randoop, a tool for Java and .NET: coverage, error detection
Current and future directions for Randoop
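One such bias, favoring smaller sequences, can be sketched as weighted random selection. The weighting scheme below (1 / (length + 1)) is an illustrative assumption, not Randoop's exact heuristic; the point is only that shorter component sequences are picked more often.

```java
import java.util.*;

// Sketch of length-biased selection of a component sequence to extend.
public class BiasedPick {
    static <T> List<T> pick(List<List<T>> components, Random rnd) {
        // Assumed weighting: shorter sequences get larger weight.
        double total = 0;
        for (List<T> s : components) total += 1.0 / (s.size() + 1);
        double r = rnd.nextDouble() * total;
        for (List<T> s : components) {
            r -= 1.0 / (s.size() + 1);
            if (r <= 0) return s;
        }
        return components.get(components.size() - 1); // numerical fallback
    }

    public static void main(String[] args) {
        List<List<Integer>> comps = Arrays.asList(
            Arrays.asList(),            // empty sequence: weight 1
            Arrays.asList(1, 2, 3, 4)); // longer sequence: weight 1/5
        Random rnd = new Random(7);
        int shortPicks = 0;
        for (int i = 0; i < 10000; i++) if (pick(comps, rnd).isEmpty()) shortPicks++;
        // The short sequence should be picked about 5/6 of the time.
        System.out.println(shortPicks > 7000); // prints true
    }
}
```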

Coverage
Seven data structures (stack, bounded stack, list, binary search tree, heap, red-black tree, binomial heap), used in previous research:
Bounded exhaustive testing [Marinov 2003]
Symbolic execution [Xie 2005]
Exhaustive method sequence generation [Xie 2004]
All of the above techniques achieve high coverage in seconds; the tools are not publicly available.

Coverage achieved by Randoop, comparable with the exhaustive/symbolic techniques:

data structure              time (s)   branch cov.
Bounded stack (30 LOC)      1          100%
Unbounded stack (59 LOC)    1          100%
BS tree (91 LOC)            1          96%
Binomial heap (309 LOC)     1          84%
Linked list (253 LOC)       1          100%
Tree map (370 LOC)          1          81%
Heap array (71 LOC)         1          100%

Visser containers
Visser et al. (2006) compare several input generation techniques:
Model checking with state matching
Model checking with abstract state matching
Symbolic execution
Symbolic execution with abstract state matching
Undirected random testing
The comparison is in terms of branch and predicate coverage, on four nontrivial container data structures. The experimental framework and tool are available.

[Figure: predicate coverage on the four containers, comparing feedback-directed generation, the best systematic technique, and undirected random generation.]

Outline
Feedback-directed random test generation
Evaluation with Randoop, a tool for Java and .NET: coverage, error detection
Current and future directions for Randoop

Subjects:

subject                         LOC    classes
JDK (2 libraries)               53K    272
  (java.util, javax.xml)
Apache Commons (5 libraries)    114K   974
  (logging, primitives, chain, jelly, math, collections)
.NET framework (5 libraries)    582K   3330

Methodology
Ran Randoop on each library with the default time limit (2 minutes).
Contracts checked:
o.equals(o) == true
o.equals(o) throws no exception
o.hashCode() throws no exception
o.toString() throws no exception
No null inputs, and:
Java: no NPEs
.NET: no NPEs, out-of-bounds, or illegal-state exceptions

Results:

                 test cases output   error-revealing test cases   distinct errors
JDK              32                  29                           8
Apache Commons   187                 29                           6
.NET framework   192                 192                          192
Total            411                 250                          206
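The default contract checks listed above can be sketched as a small checker. This is a minimal illustration of the idea, not Randoop's actual checker: it applies the reflexive-equals, hashCode, and toString contracts to one object and reports any violations.

```java
import java.util.*;

// Sketch of the default contract checks: o.equals(o) is true, and
// equals/hashCode/toString throw no exception.
public class ContractChecker {
    static List<String> check(Object o) {
        List<String> violations = new ArrayList<>();
        try {
            if (!o.equals(o)) violations.add("o.equals(o) returned false");
        } catch (RuntimeException e) {
            violations.add("o.equals(o) threw " + e.getClass().getSimpleName());
        }
        try { o.hashCode(); }
        catch (RuntimeException e) { violations.add("o.hashCode() threw " + e.getClass().getSimpleName()); }
        try { o.toString(); }
        catch (RuntimeException e) { violations.add("o.toString() threw " + e.getClass().getSimpleName()); }
        return violations;
    }

    public static void main(String[] args) {
        // A deliberately broken object, to show a violation being reported.
        Object broken = new Object() {
            @Override public boolean equals(Object other) { return false; } // not reflexive
        };
        System.out.println(check(broken));          // prints [o.equals(o) returned false]
        System.out.println(check("ok").isEmpty());  // prints true
    }
}
```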

Errors found: examples
JDK: the Collections classes have 4 methods that create objects violating the o.equals(o) contract. javax.xml creates objects that cause hashCode and toString to crash, even though the objects are well-formed XML constructs.
Apache libraries have constructors that leave fields unset, leading to NPEs in calls to equals, hashCode, and toString (this counts as only one bug). Many Apache classes require a call to an init() method before an object is legal, which led to many false positives.
.NET framework: at least 175 methods throw an exception forbidden by the library specification (NPE, out-of-bounds, or illegal-state exception); 8 methods violate o.equals(o); and it loops forever on a legal but unexpected input.

JPF
Used JPF to generate test inputs for the Java libraries (JDK and Apache), with breadth-first search (the suggested strategy) and a maximum sequence length of 10.
JPF ran out of memory without finding any errors, after 32 seconds on average. It spent most of its time systematically exploring a very localized portion of the space. For large libraries, random, sparse sampling seems to be more effective.

Undirected random testing
JCrasher implements undirected random test generation: it creates random method call sequences, does not use feedback from execution, and reports sequences that throw exceptions.
It found 1 error on the Java libraries and reported 595 false positives.

Regression testing
Object o = new Object();
LinkedList l = new LinkedList();
l.addFirst(o);
l.add(o);
assertEquals(2, l.size()); // expected to pass
assertEquals(false, l.isEmpty()); // expected to pass

Randoop can create regression oracles. We generated test cases using JDK 1.5; Randoop generated 41K regression test cases. We ran the resulting test cases on:
JDK 1.6 Beta: 25 test cases failed
Sun's implementation of the JDK: 73 test cases failed
The failing test cases pointed to 12 distinct errors. These errors were not found by the extensive compliance test suite that Sun provides to JDK developers.

Evaluation: summary
Feedback-directed random test generation:
Is effective at finding errors: it discovered several errors in real code (e.g. the JDK and .NET framework core libraries)
Can outperform systematic input generation, both on previous benchmarks and metrics (coverage) and on a new, larger corpus of subjects, measuring error detection
Can outperform undirected random test generation

Outline
Feedback-directed random test generation
Evaluation with Randoop, a tool for Java and .NET: coverage, error detection
Current and future directions for Randoop

Tech transfer
Randoop is currently being maintained by a product group at Microsoft; I spent an internship doing the tech transfer.
How would a test team actually use the tool? Push-button at first, with a desire for more control later.
Would the tool be cost-effective? Yes: it immediately found a few errors, found more errors when given more control, and pointed to blind spots in existing test suites and existing automated testing tools.
Which heuristics would be most useful? The simplest ones (e.g. uniform selection); more sophisticated guidance was best left to the users of the tool.

Future directions
Combining random and systematic generation: DART (Godefroid 2005) combines random and systematic generation of test data. How can random and systematic generation of sequences be combined?
Using Randoop for reliability estimation: random sampling is amenable to statistical analysis. Are programs in which Randoop finds more problems more error-prone?
Better oracles: to date, we have used a very basic set of contracts. Will better contracts lead to more errors? Incorporate techniques that create oracles automatically.

Conclusion
Feedback-directed random test generation:
Finds errors in widely used, well-tested libraries
Can outperform systematic test generation
Can outperform undirected test generation
Randoop:
Easy to use: just point it at a set of classes
Has real clients: used by product groups at Microsoft
A mid-point in the systematic-random space of input generation techniques