Jazz: A Tool for Demand-Driven Structural Testing

Jazz: A Tool for Demand-Driven Structural Testing J. Misurda, J. A. Clause, J. L. Reed, P. Gandra, B. R. Childers, and M. L. Soffa Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania 15260 USA {jmisurda,clausej,juliya,pgranda,childers}@cs.pitt.edu Department of Computer Science University of Virginia Charlottesville, Virginia 22904 USA soffa@cs.virginia.edu Abstract Software testing to produce reliable and robust software has become vitally important in recent years. Testing is a process by which software quality can be assured through the collection of information about software. While testing can improve software reliability, current tools typically are inflexible and have high overheads, making it a challenge to test large software projects. In this paper, we describe a new scalable and flexible tool, called Jazz, for testing Java programs. Jazz uses a novel demand-driven approach for node, branch, and def-use structural testing. It has a graphical user interface for specifying and running tests, a test planner to determine the most efficient way to test the program, and a dynamic instrumenter to carry out a test. Jazz has a low average overhead of 17.6% for branch testing, while in comparison, a traditional tool has an overhead of 89.9%. 1. Introduction In the last several years, the importance of producing high quality and robust software has become paramount [6]. Testing is an important process to support quality assurance by gathering information about the software being developed or modified. It is, in general, extremely labor and resource intensive, accounting for 50-60% of the total cost of software development [7]. The increased emphasis on software quality and robustness mandates improved testing methodologies. To test software, a number of techniques can be applied. One class of techniques that is widely used is structural testing, which checks that a given coverage criterion is satisfied. For example, branch testing checks that a certain percentage of branches are executed. Other structural tests include def-use testing in which pairs of variable definitions and uses are checked for coverage and node testing in which nodes in a program s control flow graph are checked. Unfortunately, structural testing approaches are often hindered by the lack of scalable and flexible tools. Current tools are not scalable in terms of both time and memory, limiting the number and scope of the tests that can be applied to large programs. These tools often modify the software binary to insert instrumentation for testing. In this case, the tested version of the application is not the same version that is shipped to customers and errors may remain. Testing tools are usually inflexible and only implement certain types of testing. For example, many tools implement branch testing, but do not implement node or def-use testing. In this paper, we describe a new tool for structural testing, called Jazz, that addresses these problems. This tool uses a novel demand-driven technique to apply different testing strategies in an efficient and automatic way. Our method relies on test plans that describe what test instrumentation should be inserted and removed ondemand in executing code to carry out testing strategies. A test plan is a recipe that describes how and where a test should be performed. The approach is path specific and uses the actual execution paths of an application to drive the instrumentation and testing. Once a test site is covered, the instrumentation is dynamically removed to avoid performance overhead, and the test plan continues. The granularity of the instrumentation is flexible and includes statement and structure level (e.g., loops, functions). Because this approach is dynamic and can insert and remove tests as a program executes, the same program that is tested can be shipped to a customer. Jazz uses a specification language to describe what to test. From the specification, a test plan can be automatically generated by a test planner. The test specification describes what tests to apply and under what conditions to apply them. The specification language has both a visual representation and textual form. The visual language is expressed through a graphical user interface (GUI). The GUI can be used to describe tests, automatically carry them out, and report results. Jazz is a complete and new implementation of our SoftTest framework [2] for structural testing. It implements a GUI, a test planner, and dynamic instrumenter for on-demand testing. Jazz is incorporated as a plug-in 1

Eclipse Plug-in Application Test Spec. JVM Test Planner Node Branch Def-Use Planner APIs Test Specification in the Eclipse integrated development environment (IDE) and the IBM Jikes Java Research Virtual Machine [1]. It supports branch, node and def-use testing over code regions in a program. Experiments show that Jazz s run-time overhead is very low in comparison to traditional testing tools that use static instrumentation. 2. Jazz: Testing Java Programs Figure 1 shows Jazz s components and how they interact. To carry out a test, a user constructs a test specification with the GUI. Next, the graphical representation of the test specification is converted into a textual form in a language called testspec. A testspec specification includes the relevant code segments to be tested and the actions needed in the testing process. If desired, testers can write specifications directly in testspec, rather than use the GUI. Once the user is ready to test the program, the specification is passed to a test planner. This step translates the specification into a test plan. In the next step, the test plan is used by the dynamic instrumenter to instrument the program and determine coverage. Finally, the test results are displayed by the GUI. 2.1. Test Specification Jikes RVM Test Plan Dynamic Instrumenter In testing a software application, a developer may wish to apply different tests to various code regions. The tests are also often applied with different coverage criteria. Jazz has a GUI for specifying the tests to apply, where to apply them, and under what conditions. A coverage criterion can also be specified for each region. As shown in Figure 2, the GUI lets an user visually create and apply a test specification. The GUI shows the complete Eclipse IDE and how Jazz is incorporated. The callouts in the figure show some of Jazz s GUI widgets for creating, running, and viewing test results. Test Results Figure 1: Jazz Tool Flow and Methodology To illustrate how the user interacts with the tool, the figure shows several steps. The figure shows that the user has selected several source lines in the Eclipse source editor (step 1). The selected lines are used to build a test specification. In this case, lines 343 356 in the file Compress.java have been selected as a test region for branch testing. When a region is selected, a test specification is created and displayed by the GUI. Test specifications are shown in a specification viewer window (step 2). A specification may be changed or deleted from this window. To run the current tests, the user clicks a button on the Eclipse toolbar (step 3). Jazz automatically invokes the test planner, Jikes and the dynamic instrumenter. When the program completes, the test results are displayed as a bar graph in the specification viewer (step 4). The GUI also highlights covered and uncovered source lines in the Eclipse editor window. 2.2. Test Planner Using the test specification, the test planner decides how to test Java methods. The test planner is invoked every time a method is loaded by Jikes Just-in-Time compiler. The planner checks whether there is a test specification for any portion of the method. If a specification exists, then the planner generates a test plan for the relevant code in the method. Thus, only methods that are actually loaded and executed are tested. The main function of the test planner is to produce a test plan that determines where and how to instrument a method to do the test actions. The test plan describes how best to dynamically instrument a method to determine coverage. To generate a test plan, the planner identifies the locations where to instrument a test region, when to insert and remove instrumentation at each loca- 2

3: Create & Run Test 1: Select Code Region 2: Test Specification 4: Test Results Figure 2: Branch Coverage GUI for Jazz tion, and what to do at each location. Typically, instrumentation locations correspond to basic blocks (in the control flow graph of a test region) where coverage information is collected. For example, in def-use testing, there are instrumentation locations for each variable definition and all uses reachable from a definition. Instrumentation is inserted and removed on-demand as the program executes. For example, in node testing, when a particular basic block is executed, instrumentation is inserted in successor blocks. Once a block is hit, its instrumentation can be removed because the block is covered. In branch and def-use testing, the planner ensures that instrumentation remains until all edges out of a block or all uses of a definition are covered. Finally, the planner determines what actions to perform at each location. The actions are encoded in a payload that is executed at each location. In node testing, the payload updates coverage information, inserts instrumentation at successor blocks, and removes the instrumentation in the current block. The payloads for branch and def-use testing are similar, except they check whether all edges or def-use pairs are covered. When multiple tests are applicable at a location, the test planner generates a customized payload for that location. For example, in def-use testing, there may be several definitions and uses within a single basic block. The custom payload for such a block inserts instrumentation at uses reachable from the definitions. It also updates the coverage for the uses in the current block. 2.3. Dynamic Instrumenter With the test plan from the planner, the dynamic instrumenter provides the functionality to insert and remove instrumentation at run-time. This interface is targeted by the test planner. The dynamic instrumenter operates on target machine code generated by Jikes. Dynamic instrumentation (that can be removed/inserted at runtime) is implemented with fast breakpoints [5]. A fast breakpoint replaces an instruction in the target machine code with a jump to a breakpoint handler that invokes the test instrumentation payload from the test planner. The dynamic instrumenter s API provides primitives, such as the placement of successor breakpoints, storing test-specific data, and removal of breakpoints, for con- 3

structing fast breakpoints with varying payloads. The instrumentation constructed with the API is highly scalable since only relevant portions of the program are instrumented for only as long as needed. 3. Experimental Results We used Jazz to measure the node, branch, and def-use coverage of the SPECjvm98 benchmarks. In this experiment, the test specification covers all loaded methods and the test inputs are the data sets from SPECjvm98. The benchmarks were run on a 2.4 GHz Pentium IV machine with 1 GB of RAM and RedHat Linux. Benchmark Branch Node Def-Use compress 58% 90.6% 89.8% jess 46.8% 80.3% 71.8% db 44.4% 76.9% 75% javac 38.9% 75% 66.9% mpegaudio 60.9% 88.7% 90.5% mtrt 50.6% 90.3% 87.3% jack 55.6% 82.2% 73.4% Table 1: Percentage coverage Table 1 shows the coverages for the three tests supported by Jazz. The coverages for branch testing are 38.9 58% and for node testing, 75 90.6%. The tool reported coverage of 66.9 90.5% for def-use testing. We also investigated Jazz s performance and compared it to a traditional approach based on static instrumentation. To ensure a fair comparison, we implemented a separate tool that uses static instrumentation in our framework. This tool instruments a method s binary code before run time and does not remove instrumentation. It is similar to Rational PurifyPlus [3] and JCover [4]. Jazz and the static tool differ only in ondemand versus static instrumentation. We measured run-time when the benchmarks were run directly in Jikes without testing, with Jazz and with the static tool. For brevity, we summarize the run-times only for branch testing. When run without testing, the benchmarks take 13.8 44.7 seconds. With the static branch testing tool, run-time is increased dramatically. It varies from 20.7 96.1 seconds and incurs an overhead of 11.7 241% (average 89.9%) over native execution. Jazz has much lower run-times than the static tool. Its run-time is 20.6 43.9 seconds and the performance overhead is only 0.3% to 7.8% (average 17.6%). Jazz has less overhead than the static tool because instrumentation is inserted and removed on-demand. Indeed, in tight loops, instrumentation is typically removed within the first few loop iterations, which keeps overhead low (e.g., compress has a tight loop). As these results show, Jazz is an effective tool, with minimal performance overhead. 4. Related Work There are a number of commercial tools that perform coverage testing on Java programs, including JCover and IBM s Rational PurifyPlus. These tools statically instrument the program to perform coverage testing. The work that is most closely related to ours is a tool developed with the ParaDyn parallel instrumentation framework [8]. This tool dynamically inserts and removes instrumentation on method invocations to do node coverage, where we take a similar approach for branch coverage. Unlike our approach, instrumentation is inserted in the whole method when it is invoked and a separate garbage collection pass removes the instrumentation. Our technique instruments only executed paths and removes instrumentation as soon as possible. 5. Summary This paper described a new tool, called Jazz, for software testing of Java programs that relies on a novel scheme for dynamically inserting and removing instrumentation on-demand. The performance results with Jazz are very encouraging: The average overhead for branch testing with Jazz was 17.6%, while a tool that used static instrumentation had an overhead of 89.9%. Jazz supports branch, node, and def-use testing. It is available for Eclipse 3.0 and the IBM Jikes Java RVM. References [1] IBM, Jikes RVM, www.ibm.com/developerworks/ oss/jikesrvm [2] B. Childers, M. L. Soffa, J. Beaver, et al. SoftTest: A framework for software testing of Java programs, Eclipse Technology Exchange, 2003. [3] IBM, Rational PurifyPlus, www.ibm.com/rational [4] JCover, www.codework.com/jcover/ [5] P. Kessler, Fast breakpoints: Design and implementation, ACM SIGPLAN Conf. on Programming Languages, Design and Implementation, 1990. [6] L. Osterweil et al., Strategic directions in software quality, ACM Computing Surveys, Vol. 4, 1996. [7] W. Perry, Effective Methods for Software Testing, John Wiley & Sons, Inc., New York, 1995. [8] M. Tikir and J. Hollingsworth, Efficient instrumentation for code coverage testing, Int l. Symp. on Software Testing and Analysis, 2002. 4

Appendix: Presentation Overview Depending on the amount of time allocated for demos and the venue (e.g., during a separate poster session or conference presentation), we will demonstrate the Jazz structural testing tool with a combination of a presentation and a tool walk-through. The presentation will describe the challenges to flexible and scalable testing and how demand-driven structural testing addresses these challenges. We will also present our framework for demand-driven structural testing and the techniques that are used in Jazz. After presenting the approach and basic techniques, we will demonstrate demand-driven branch testing with Jazz on a complete application, as described below. Finally, after the demonstration, we will present the performance of Jazz in comparison to traditional structural testing approaches. 1. Introduction This part of the presentation will motivate and introduce the need for structural software testing. We will discuss the challenges, as described in the paper, faced when testing large, complex software systems. Next, we will explain our approach: demand-driven structural testing and how this approach addresses the problems with faced when testing large software projects. Finally, we will describe Jazz and explain the various capabilities and components of the tool. 1.1 Challenges to Structural Testing Structural testing is the process of discovering run-time properties of a program specifically those pertaining to control flow. It uses control flow coverage information to help ensure software quality. Coverage information is collected by instrumenting a program. Traditionally, instrumentation is inserted statically prior to program execution. However, this approach is not scalable. Instrumentation code is left in place and continues to be executed even after coverage information is collected. 1.2 Goals and Approach Our research is developing new approaches to structural testing, and we have built a practical and useful tool that incorporates many of our approaches for structural testing of Java programs. Jazz has a number of goals: 1. The tool must be scalable and flexible. a. Scalable: Large programs can be tested as easily as small ones. b. Flexible: Easy to add new tests and modify existing tests. 2. It should be possible to apply multiple testing strategies to a program.

3. The tool should have a low overhead, which allows more tests to be run in a smaller amount of time. Importantly, with scalable and flexible testing approaches and tool, it becomes feasible to test large software projects. To achieve these goals, we use an approach that is specification-driven, path specific, and dynamic. 1.3 Tool Overview Our tool is composed of four main components which can be seen in the figure above. The first component is the test specification. The specification is generated using an Eclipse GUI. Using this GUI the tester is able to choose which regions to instrument and which tests to perform on those regions. The second component is the test planner. The planner uses the test specification to generate a test plan which specifies where, when, and how to instrument the program. The planner also creates a test table and test payloads. The test table encodes where and when to instrument and records the results of instrumentation. The test payload specifies what actions to perform at each instrumentation point. The third component is the dynamic instrumenter. It provides an API to manipulate the target machine code. The fourth component is the results viewer. The viewer presents the results of the each test to the use.

2. Demand Driven Structural Testing This section will explain how demand driven testing can reduce the amount and cost of instrumentation needed to collect coverage information. For the remainder of the presentation, we will focus our examples and analysis on Branch Coverage testing. Branch coverage provides a balance between the simplicity of Node coverage and the complexity of Def-Use coverage and allows the audience to appreciate the benefit of our tool with out becoming lost in implementation details. We will step through a branch testing example using our demand driven technique on a control flow graph, such as that shown above. We will show how demand-driven testing inserts and removes instrumentation dynamically along a path of execution and how coverage information is recorded. After demonstrating the demand driven technique for branch coverage, we will discuss how the technique is used for node and def use coverage. For node coverage a single instrumentation probe is placed at the first block in the control flow graph. As execution proceeds instrumentation is placed in successors of the current block unless a block has been covered. We don t need to worry about replacing instrumentation since we are only recording the coverage of individual basic blocks--not paths between them. For def-use testing, we need two types of instrumentation probes, definition probes and use probes. Definition probes record the start of a def-use pair and need to remain until all uses that

can be reached from the definition are also covered. Use probes record that a chain between a definition and a use has been covered. The use probes do not need to remain and can be removed as soon as they have been covered. 3. Demo Our tool demonstration will have two primary parts: an end-to-end walkthrough of the tool s operation, and an example of testing an application with static instrumentation (as used by traditional testing tools, such as IBM s Rational PurifyPlus) and dynamic instrumentation (as used by Jazz). 3.1 Walkthrough We will demonstrate the Eclipse GUI for branch coverage pictured below. Several tests will be constructed using different features of the interface to select and build code regions. The tree view on the right hand side of the interface can be used to select entire methods for testing. It is also possible to select specific line ranges by highlighting them in the central editing view. After specifying ranges to test the interface will generate a specification that is passed to the test planner. The dynamic instrumenter will then execute the plan created by the test planner and return the results to the GUI. Results for each test are displayed in the table at the bottom of the interface. 2. Click the add test button to add the selection to the test specification 3. Tests show up here 1. Select a method or a region of code

4. Click the run tests button to execute the program with the current test plan 5. Results show up here after the program runs 3.2 Application Demonstration To show the benefit of using our demand-driven testing with dynamic instrumentation over a traditional approach with static instrumentation, we will first run a program using static instrumentation and then with dynamic instrumentation. We plan on using an mp3 or video player. With static instrumentation, the playback will be choppy because the instrumentation will cause missed deadlines due to the high performance overhead of static instrumentation. When using dynamic instrumentation, playback is slightly choppy at first, but quickly smooths out as instrumentation is removed (and incurs no cost, once sufficient amount of the instrumentation is removed). 4. Results The first part of our results section will explain visually the difference in the static and demand-driven techniques demonstrated in the previous section.

Instrumentation Points Hit Over Time for mpegaudio 7000 6000 5000 Breakpoints Hit 4000 3000 2000 1000 0 1 88 175 262 349 436 523 610 697 784 871 958 1045 1132 1219 1306 1393 1480 1567 1654 1741 1828 1915 2002 2089 Sample Breakpoints/sample Static 10 per. Mov. Avg. (Breakpoints/sample) 10 per. Mov. Avg. (Static) Graph 1: Instrumentation Probes Hit Over Time (mpegaudio) Graph 1, Instrumentation Probes Hit Over Time, shows the number of instrumentation probes (called breakpoints in the graph) hit during a sample time interval when testing the program mpegaudio (with branch coverage). Static instrumentation is shown in red and our demand-driven approach with dynamic instrumentation in Jazz is shown in blue. The graph shows that the demand-driven approach nearly always executes less instrumentation. The graph also indirectly shows a comparison between the amount of overhead for each type of instrumentation. This will be explained further in the next graph. Graph 2, Jazz Branch Testing vs. Traditional Static Testing, shows a comparison between the Jazz framework and our own implementation of static instrumentation for branch coverage. The slowdown over uninstrumented code 0.98x to 1.56x (average 1.8x) which is 1.01x to 3.11x (1.63x average) times faster than the static implementation. Graph 3, Jazz Node, Branch, and Def-Use Performance, shows a comparison between the three types of structural testing the Jazz framework currently supports. The overhead associated with node testing is negligible, ranging from 0.99x to 1.04x (1.02x average). The slightly increased overhead for branch testing is due to increased complexity and the need to record more information. The slowdowns for def-use testing, ranging from 1.04x to 3.79x (2.27x average), are due to the extended lifetime of definition instrumentation. It should also be noted that these are overheads were recorded when every variable in the program was instrumented. More realistic testing, where a

smaller number of variables are instrumented would have a correspondingly lower overhead. Slowdown 4 3.5 3 2.5 2 1.5 1 0.5 0 Jazz Branch Testing vs. Traditional Static Testing compress jess db javac mpegaudio mtrt jack Jazz Static Graph 2: Jazz vs. a Traditional Testing Approach (for Branch Coverage) Jazz Branch, Node, & Def-Use Performance Node Branch Def-Use Slowdown 4 3.5 3 2.5 2 1.5 1 0.5 0 compress jess db javac mpegaudio mtrt jack Graph 3: A Comparison of Jazz s Performance on Node, Branch, & Def-Use Testing 5. Conclusion For the conclusion portion of our presentation, we will describe the contributions from implementing and using Jazz. The contributions and conclusions include:

1. We have created a structural testing tool, Jazz, that is both scalable and flexible. It was possible to add Node Coverage and Def-Use coverage in a short amount of time by reusing components written for Branch coverage. 2. Jazz has good performance and better enables the testing of large software projects and incorporating new test strategies. 3. Using the Eclipse GUI, an user is able to easily apply multiple testing strategies to programs. 4. Jazz has low overhead, in the best case causing no additional overhead and in the worst case having a slowdown of less than 4x. Jazz is a mature tool: The applications it can run are limited only by the Jikes RVM. 6. Availability Currently the testing GUI is implemented in Eclipse 3.0 and the test planner and dynamic instrumenter are implemented using the Jikes RVM version 2.1.1. We plan on releasing Jazz to the compiler and software engineering community during CC 05. For more information on Jazz and demand-driven structural testing, please see University of Pittsburgh Department of Computer Science Technical Report TR-04-118 or visit http:www.cs.pitt.edu/copa/eclipse.html.