IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs

Similar documents
BoxPlot++ Zeina Azmeh, Fady Hamoui, Marianne Huchard. To cite this version: HAL Id: lirmm

Linked data from your pocket: The Android RDFContentProvider

Fault-Tolerant Storage Servers for the Databases of Redundant Web Servers in a Computing Grid

Comparison of spatial indexes

Mokka, main guidelines and future

QuickRanking: Fast Algorithm For Sorting And Ranking Data

Blind Browsing on Hand-Held Devices: Touching the Web... to Understand it Better

Reverse-engineering of UML 2.0 Sequence Diagrams from Execution Traces

An FCA Framework for Knowledge Discovery in SPARQL Query Answers

Multimedia CTI Services for Telecommunication Systems

Teaching Encapsulation and Modularity in Object-Oriented Languages with Access Graphs

Relabeling nodes according to the structure of the graph

Service Reconfiguration in the DANAH Assistive System

Modularity for Java and How OSGi Can Help

Open Digital Forms. Hiep Le, Thomas Rebele, Fabian Suchanek. HAL Id: hal

Setup of epiphytic assistance systems with SEPIA

X-Kaapi C programming interface

B-Refactoring: Automatic Test Code Refactoring to Improve Dynamic Analysis

DynaMoth: Dynamic Code Synthesis for Automatic Program Repair

Linux: Understanding Process-Level Power Consumption

Study on Feebly Open Set with Respect to an Ideal Topological Spaces

Tacked Link List - An Improved Linked List for Advance Resource Reservation

A Voronoi-Based Hybrid Meshing Method

Natural Language Based User Interface for On-Demand Service Composition

Simulations of VANET Scenarios with OPNET and SUMO

How to simulate a volume-controlled flooding with mathematical morphology operators?

HySCaS: Hybrid Stereoscopic Calibration Software

Structuring the First Steps of Requirements Elicitation

Catalogue of architectural patterns characterized by constraint components, Version 1.0

Syrtis: New Perspectives for Semantic Web Adoption

Every 3-connected, essentially 11-connected line graph is hamiltonian

The Connectivity Order of Links

Comparator: A Tool for Quantifying Behavioural Compatibility

The optimal routing of augmented cubes.

KeyGlasses : Semi-transparent keys to optimize text input on virtual keyboard

Branch-and-price algorithms for the Bi-Objective Vehicle Routing Problem with Time Windows

Self-optimisation using runtime code generation for Wireless Sensor Networks Internet-of-Things

Fuzzy sensor for the perception of colour

FIT IoT-LAB: The Largest IoT Open Experimental Testbed

Multi-atlas labeling with population-specific template and non-local patch-based label fusion

BugMaps-Granger: A Tool for Causality Analysis between Source Code Metrics and Bugs

SIM-Mee - Mobilizing your social network

Very Tight Coupling between LTE and WiFi: a Practical Analysis

Moveability and Collision Analysis for Fully-Parallel Manipulators

DANCer: Dynamic Attributed Network with Community Structure Generator

Real-Time and Resilient Intrusion Detection: A Flow-Based Approach

NP versus PSPACE. Frank Vega. To cite this version: HAL Id: hal

The Athena data dictionary and description language

YAM++ : A multi-strategy based approach for Ontology matching task

Implementing an Automatic Functional Test Pattern Generation for Mixed-Signal Boards in a Maintenance Context

Robust IP and UDP-lite header recovery for packetized multimedia transmission

Deformetrica: a software for statistical analysis of anatomical shapes

Real-Time Collision Detection for Dynamic Virtual Environments

Scalewelis: a Scalable Query-based Faceted Search System on Top of SPARQL Endpoints

A Resource Discovery Algorithm in Mobile Grid Computing based on IP-paging Scheme

Emerging and scripted roles in computer-supported collaborative learning

SDLS: a Matlab package for solving conic least-squares problems

The SANTE Tool: Value Analysis, Program Slicing and Test Generation for C Program Debugging

Sliding HyperLogLog: Estimating cardinality in a data stream

YANG-Based Configuration Modeling - The SecSIP IPS Case Study

THE COVERING OF ANCHORED RECTANGLES UP TO FIVE POINTS

Taking Benefit from the User Density in Large Cities for Delivering SMS

Scan chain encryption in Test Standards

A 64-Kbytes ITTAGE indirect branch predictor

Regularization parameter estimation for non-negative hyperspectral image deconvolution:supplementary material

Generative Programming from a Domain-Specific Language Viewpoint

Representation of Finite Games as Network Congestion Games

Comparison of radiosity and ray-tracing methods for coupled rooms

Using a Medical Thesaurus to Predict Query Difficulty

lambda-min Decoding Algorithm of Regular and Irregular LDPC Codes

Dictionary Building with the Jibiki Platform: the GDEF case

Quality of Service Enhancement by Using an Integer Bloom Filter Based Data Deduplication Mechanism in the Cloud Storage Environment

Mapping classifications and linking related classes through SciGator, a DDC-based browsing library interface

LaHC at CLEF 2015 SBS Lab

ASAP.V2 and ASAP.V3: Sequential optimization of an Algorithm Selector and a Scheduler

Application of RMAN Backup Technology in the Agricultural Products Wholesale Market System

Type Feedback for Bytecode Interpreters

Cloud My Task - A Peer-to-Peer Distributed Python Script Execution Service

Constraint Programming and Combinatorial Optimisation in Numberjack

A Short SPAN+AVISPA Tutorial

Change Detection System for the Maintenance of Automated Testing

The Proportional Colouring Problem: Optimizing Buffers in Radio Mesh Networks

Assisted Policy Management for SPARQL Endpoints Access Control

From medical imaging to numerical simulations

On Code Coverage of Extended FSM Based Test Suites: An Initial Assessment

Synthesis of fixed-point programs: the case of matrix multiplication

REPAIRING PROGRAMS WITH SEMANTIC CODE SEARCH. Yalin Ke Kathryn T. Stolee Claire Le Goues Yuriy Brun

The New Territory of Lightweight Security in a Cloud Computing Environment

MUTE: A Peer-to-Peer Web-based Real-time Collaborative Editor

DSM GENERATION FROM STEREOSCOPIC IMAGERY FOR DAMAGE MAPPING, APPLICATION ON THE TOHOKU TSUNAMI

Formal modelling of ontologies within Event-B

Malware models for network and service management

Prototype Selection Methods for On-line HWR

Querying Source Code with Natural Language

Modelling approaches for anion-exclusion in compacted Na-bentonite

Light field video dataset captured by a R8 Raytrix camera (with disparity maps)

Privacy-preserving carpooling

XBenchMatch: a Benchmark for XML Schema Matching Tools

XML Document Classification using SVM

UsiXML Extension for Awareness Support

Transcription:

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs Thomas Durieux, Martin Monperrus To cite this version: Thomas Durieux, Martin Monperrus. IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs. [Research Report] hal-01272126, Universite Lille 1. 2016. <hal-01272126> HAL Id: hal-01272126 https://hal.archives-ouvertes.fr/hal-01272126 Submitted on 11 Feb 2016 HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers. L archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

IntroClassJava: A Benchmark of 297 Small and Buggy Java Programs Thomas Durieux, Martin Monperrus February 11, 2016 To refer to this document: Thomas Durieux and Martin Monperrus. IntroClassJava: A Benchmark of 297 Small Java Programs for Automatic Repair, Technical Report #hal-01272126, University of Lille. 2016. Abstract Reproducible and comparative research requires well-designed and publicly available benchmarks. We present IntroClassJava, a benchmark of 297 small Java programs, specified by Junit test cases, and usable by any fault localization or repair system for Java. The dataset is based on the IntroClass benchmark and is publicly available on Github. 1 Introduction Context Reproducible and comparative research requires well-designed and publicly available benchmarks. In the context of research in fault localization and automatic repair, this means benchmarks of buggy programs [5], where each buggy program comes with a specification of the bug, usually as test cases. Le Goues and colleagues [3] have published IntroClass a benchmark of small C programs written by students at the University of California Davis. The programs are 5-20 lines, usually in a single method, and are specified by a set of input-output pairs. Objective We aim at providing a Java version of IntroClass, called Intro- ClassJava. IntroClassJava would allow to run and compare repair systems built for Java, such as Nopol [2]. Method We use source to source transformation for building IntroClass- Java. There are two main transformations: the first one transforms the program itself, and esp. takes care of correctly handling pointers and input parameters. The second one transforms the input-output specification as standard JUnit test cases. We discard the programs that cannot be transformed. The resulting programs are standard Maven projects. 1

Result We obtain a benchmark of 297 Java programs usable by any fault localization or repair systems for Java. The dataset is publicly available on Github [1]. 2 Methodology We present how we build the IntroClassJava benchmark. 2.1 Overview We take the IntroClass benchmark. We remove the duplicate programs, we transform the C programs into Java using an ad hoc transformation. We check that each resulting Java program has the same behavior as the original C program, by executing them using the inputs provided with IntroClass. Also, we transform the input output test cases of the C program into standard JUnit test cases. When the transformed program does not compile, or has an observable behavior that is different from the original one (according to the provided inputoutput specification), we discard it. That is the reason for which IntroClassJava contains less programs than IntroClass. 2.2 Translating Pointers in Java Many programs of IntroClass use pointer manipulation on primitive types, such as *i or &i if i is an integer. We use the following transformation to emulate this behavior in Java. The key idea of the transformation is to encapsulate every primitive value into a class. For instance, for integer this results in: class IntObj { int value ; IntObj IntObj ( int i ) { value = i ;... Then all variable declarations, assignments and expressions are transformed as follows. C code int i = 0 k = i; int* j = &i; Java code IntObj i = IntObj(0) k.value = i.value IntObj j = i We use this schema to handle C types int, float, long, double and char (using respectively IntObj, FloatObj, LongObj, DoublObj, and CharObj). 2

Listing 1: Handling of standard input and output c l a s s Median { Scanner scanner ; // f o r handling inputs both in t e s t and command l i n e S t r i n g output = ; // f o r handling t h e output s t a t i c void main ( S t r i n g [ ] a r g s ) throws Exception { Median mainclass = new Median ( ) ; S t r i n g output ; i f ( a r g s. l e n g t h > 0) { mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( a r g s [ 0 ] ) ; e l s e { // standard i n put mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( System. i n ) ; mainclass. exec ( ) ; System. out. p r i n t l n ( mainclass. output ) ; void exec ( ) throws Exception { // t h e program code 2.3 Testability In IntroClass, the program are tested using standard command line arguments. For instance, to test a program computing the median of three numbers, one calls $ median 4 0 17. However, we aim at achieving standard Java testability using JUnit test cases. Consequently, we transforms the programs and test cases as follows. 2.3.1 Backward-compatible Main Method We aim at supporting both command line arguments and JUnit invocations. This is achieved as follows. The main method of the Java programs takes either one argument or no argument. In the former case, it is meant to be used in a test case, in the latter case, it reads the arguments from the standard input as a Java program. We introduce an exec method for each program that is responsible of doing the actual computation. The exec method is used in test cases directly and in the main method when invoked in command line. Listing 2 shows an example main method. 2.3.2 Test Cases In IntroClass, the specification of the expected behavior is based on input-output pairs, where each input and each output is in a separate file. For instance, file 1.in contains the input of the first input/output pair and fail 1.out the expected output. We write a test case generator that reads those files and create Junit test classes accordingly. Consequently, there is one Junit test method per inputoutput pair. Each test case ends with the assertion corresponding to checking the actual output against the expected one. An example test case is shown in 3

Listing 2: Example of generated test case c l a s s WhiteBoxTest { @Test ( timeout = 1000) // corresponds to f i l e whitebox /1. in of IntroClass public void t e s t 1 ( ) throws Exception { Smallest mainclass = new Smallest ( ) ; S t r i n g expected = P l e a s e e n t e r 4 numbers s e p a r a t e d by s p a c e s > 0 i s the s m a l l e s t ; mainclass. s c a n n e r = new j a v a. u t i l. Scanner ( 0 0 0 0 ) ; mainclass. exec ( ) ; S t r i n g out = mainclass. output. r e p l a c e ( \n, ). trim ( ) ; a s s e r t E q u a l s ( expected. r e p l a c e (, ), out. r e p l a c e (, ) ) ; //.. o t h e r t e s t Listing 2. Note that the original output specification of IntroClass contains the command line invitations, they are kept as is in the generated test cases. IntroClass contains two kinds of tests: blackbox and whitebox. Blackbox are manually written and whitebox ones are generated ones with symbolic execution with the golden implementation as oracle. Consequently, IntroClassJava contains two test classes: WhiteBoxTest and BlackBoxTest. 2.4 Unique Naming In order to be able to analyze and repair all programs of the benchmark in the same virtual machine, we give each program and each test class an unique name based on the original identifier of IntroClass. It follows the convention subject, user, revision. For instance, class smallest 5a568359 004 is a program for problem smallest, written by user 5a568359, for which it was the fourth revision. All programs are in package introclassjava. 2.5 Implementation The transformations are implemented in Python. To analyze C code, we transform it into XML suing srcml [4]. srcml encodes the abstract syntax trees of programs as XML. We then load the resulting XML using a DOM library and generate the Java file using standard string manipulation. The printf formatting instructions are transformed to their Java counterpart which are mostly compatible (e.g. String.format ( %d is the smallest, a.value) ). All programs of IntroClassJava are regular Maven projects. 3 Results Final benchmark After applying the transformation described in Section 2, we obtain a dataset of 297 Java programs, all specified with JUnit test cases, with at least one failing test cases. Table 1 gives the main statistics of this dataset. 4

Group checksum digits grade median smallest syllables Total only black box failing tests only white box failing tests both white box and black box Total # of buggy programs 11 programs 75 programs 89 programs 57 programs 52 programs 13 programs 297 programs 33 programs 39 programs 225 programs 297 programs Table 1: Main descriptive statistics of the IntroClassJava benchmark Project # wb ok # wb ko # bb ok # bb ko # both ko checksum 0 11 4 7 7 digits 15 60 24 51 36 grade 1 88 0 89 88 median 9 48 6 51 42 smallest 7 45 5 47 40 syllables 1 12 0 13 12 Total 33 264 39 258 225 Table 2: Full breakdown by project It first indicates the breakdown according to the subject program. For instance, in checksum, there are 11 programs that are failing. It then indicates the breakdown according to the failure type. For instance, there are 33 programs that which only white tests are failing. Table 2 gives the full breakdown by project where wb means whitebox, bb means blackbox, ko means failing and ok means passing. Comparison with IntroClass In IntroClass, there are 450 failing programs, it means that the automatic transformation has failed for 450 297 = 153 programs. Either the resulting program was not syntactically correct Java and consequently does not compile, or the transformation has changed the execution behavior. Interestingly, for some programs, the transformation itself repaired the bug (the C program fails on some inputs, but the Java version is correct). This is mostly due to the different behavior of the non-initialized variable in C versus Java. In C, the non-initialized local variables have the value of the allocated memory space and in Java, all primitive types have a default value (e.g. 0 for numbers). 5

4 Conclusion We have presented how we have set up IntroClassJava, a benchmark of 297 buggy Java programs. It results from the automated transformation from C to Java of the IntroClass benchmark. This dataset can be used for future research on fault localization and automatic repair on Java programs. The benchmark is publicly available on Github [1]. References [1] The github repository of introclassjava. https://github.com/ Spirals-Team/IntroClassJava, 2015. [2] F. DeMarco, J. Xuan, D. L. Berre, and M. Monperrus. Automatic repair of buggy if conditions and missing preconditions with smt. In Proceedings of the 6th International Workshop on Constraints in Software Testing, Verification, and Analysis (CSTVA 2014), 2014. [3] C. Le Goues, N. Holtschulte, E. K. Smith, Y. Brun, P. Devanbu, S. Forrest, and W. Weimer. The manybugs and introclass benchmarks for automated repair of c programs. IEEE Transactions on Software Engineering (TSE), in press, 2015. [4] J. I. Maletic, M. L. Collard, and A. Marcus. Source code files as structured documents. In Proceedings of the 10th International Workshop on Program Comprehension (IWPC), 2002. [5] M. Monperrus. A critical review of automatic patch generation learned from human-written patches : Essay on the problem statement and the evaluation of automatic software repair. In Proceedings of the International Conference on Software Engineering, pages 234 242, 2014. 6