Implementation of Customized FindBugs Detectors

Jerry Zhang
Department of Computer Science
University of British Columbia
jezhang@cs.ubc.ca

ABSTRACT
Many static code analysis tools exist to find program errors automatically. Traditional techniques usually involve formal methods and complicated computations, and thus suffer from poor extensibility and performance. FindBugs was developed to address these issues. The system is based on the concept of bug patterns, which are claimed to be easy to implement and effective at discovering real bugs. To evaluate the system on these two aspects, we created and used a custom detector with the resources provided in the FindBugs package.

1. INTRODUCTION
As software products provide more functions, their structures tend to become correspondingly more complicated, and finding program errors in them becomes harder. Although traditional manual code inspection still plays an important role in quality assurance, automated debugging tools are a welcome supplement that can greatly ease and improve this process. Quite a few techniques are widely adopted in practice, and they fall mainly into two kinds: dynamic and static approaches.

Dynamic approaches validate whether a particular module of source code works properly by running user-defined test cases whose behaviours and results are specified in advance. Errors are discovered from exceptions thrown at runtime or from the outputs generated at the end. The quality of the test cases critically affects the success of finding potential problems, but writing them puts an additional burden on programmers, who must produce extra testing code that covers all situations as completely as possible. For a large-scale software product, this is not always feasible due to time and budget constraints.
In fact, even the unit testing community, which represents the most widely adopted dynamic approach, does not suggest relying on it entirely, because testing all possible input combinations of any non-trivial software is unrealistic [1]. Static approaches, on the other hand, promise to find bugs in existing code with less manual effort, because most traditional static techniques rely on formal methods and sophisticated program analysis rather than hand-written tests. This means the developer can simply hand the code to the system, let it handle the dirty work, and see what the result is. However, such systems can be difficult to apply in practice because of their great complexity and high rate of false positives.

A newer static analysis tool, FindBugs, aims to address these issues. Unlike its predecessors, FindBugs employs a simpler yet powerful technique to conduct static analysis. The basis of the system is a set of bug patterns: code idioms that are likely to be errors. Occurrences of bug patterns are places where code does not follow the intended usage of a language feature, so each bug pattern can be turned into a detector that probes for all bugs of the same type. FindBugs is essentially a set of bug detectors for different types of bugs, plus an interface for adding detectors for new patterns. The developers of FindBugs claim that writing custom bug detectors is reasonably easy and that using them to find software errors yields a low rate of false positives [2]. The purpose of this paper is therefore to create an application-specific bug detector in order to evaluate how easy this is, and to run it against the application source code to test its effectiveness in finding occurrences of the expected bug.

2. IMPLEMENTATION
This section discusses the steps involved in implementing a customized, application-specific bug detector and building it into the FindBugs system.

    DebugObj dobj = EncapObject();
    if (isdebugging) {
        DumpObjectToScreen(dobj);
    }
Figure 1: Inefficient code

    if (isdebugging) {
        DebugObj dobj = EncapObject();
        DumpObjectToScreen(dobj);
    }
Figure 2: Efficient code

2.1 System Setup and Preparation
Installing the Windows version of FindBugs is very straightforward. The Java Development Kit and the Byte Code Engineering Library (BCEL) are also required to run FindBugs and extend it with new detectors. According to [2], BCEL is needed because FindBugs uses it to implement its detectors.

2.2 Problem and Goal Description
One of my previous projects was modified and used as the target source code.
The program visualizes biomedical data. Two static methods are used for debugging. One is EncapObject(), which gathers the required information about the visualized object and encapsulates it into an object. The other is DumpObjectToScreen(), which prints the encapsulated information on the screen. Both methods are called throughout the code to monitor program state. There is also a static flag named isdebugging whose boolean value determines whether or not information is printed on the screen. The code snippet in Figure 1 is a modified version that handles this situation, and Figure 2 is the original code in my project. In Figure 1, EncapObject() runs regardless of whether the program is in debug mode. Since EncapObject() is an expensive operation that consumes CPU time for data computation and memory for storage, we do not want it to execute if the returned object is not going to be used. It should therefore be placed inside the if clause to avoid poor performance; Figure 2 is what we would like to have after fixing the bug. Since I understand my own program, it did not take long to search the whole project workspace and replace each occurrence of the code in Figure 2 with the code in Figure 1, thereby seeding the bugs for the experiment. Now I want to create a custom bug pattern that lets FindBugs identify the places in the code that call EncapObject() outside an if clause guarded by isdebugging.

2.3 Approach
Since FindBugs already has over 200 detectors, I thought it might be possible to find an existing one similar to what I was going to create and use it as a template. After some browsing, I decided to focus on detectors of the Bytecode Scanning type, because the other type, Bytecode Pattern, requires the bug to be expressed as an equivalent sequence of bytecode pattern expressions, which was not obvious to form for our bug (and would thus have drawn the project away from one of its motivations, evaluating ease of use). Moreover, of the four implementation strategies a detector can choose from, [3] suggests that the Linear Code Scan is the easiest and the most suitable for our case. Therefore, I read through the simple FindRunInvocations example detector provided in [3] and decided to implement our customized detector in a similar way: linearly scanning through the bytecode of each method in the analyzed code, based on the visitor pattern.

2.4 Development
The FindRunInvocations detector overrides the visit(Code) and sawOpcode(int) methods it inherits from its FindBugs superclass (Code being BCEL's representation of a method's bytecode) to walk through methods and to analyze the opcodes within each method, respectively.
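The division of labour between these two callbacks can be illustrated without FindBugs at all. The sketch below is a toy model, not the FindBugs or BCEL API: LinearScanDemo, LinearScanner, InvokeCounter, Insn and MethodBody are hypothetical names, and each instruction is reduced to a byte offset and a mnemonic. The base class calls sawOpcode() once per instruction, in bytecode order, for each method passed to visit(); the subclass resets its per-method state before delegating to super.visit(), which is the structure the custom detector follows.

```java
import java.util.ArrayList;
import java.util.List;

public class LinearScanDemo {
    // Hypothetical toy types, not FindBugs/BCEL classes: an instruction is
    // reduced to its byte offset and mnemonic, a method to a name and its code.
    record Insn(int pc, String opcode) {}
    record MethodBody(String name, List<Insn> code) {}

    // Stands in for the scanning superclass: visit() is called once per
    // method and walks its instructions in order, calling sawOpcode() on each.
    abstract static class LinearScanner {
        protected int pc; // offset of the instruction being examined (cf. PC)

        public void visit(MethodBody m) {
            for (Insn insn : m.code()) {
                pc = insn.pc();
                sawOpcode(insn.opcode());
            }
        }

        protected abstract void sawOpcode(String opcode);
    }

    // A detector with the same shape as the custom one: reset per-method
    // state, delegate the walk to super.visit(), accumulate in sawOpcode().
    static class InvokeCounter extends LinearScanner {
        private int invokes;
        final List<String> report = new ArrayList<>();

        @Override
        public void visit(MethodBody m) {
            invokes = 0; // flush state left over from the previous method
            super.visit(m);
            report.add(m.name() + ": " + invokes);
        }

        @Override
        protected void sawOpcode(String opcode) {
            if (opcode.startsWith("invoke")) {
                invokes++;
            }
        }
    }

    public static void main(String[] args) {
        InvokeCounter detector = new InvokeCounter();
        detector.visit(new MethodBody("a",
                List.of(new Insn(0, "invokestatic"), new Insn(3, "return"))));
        detector.visit(new MethodBody("b", List.of(new Insn(0, "return"))));
        System.out.println(detector.report); // [a: 1, b: 0]
    }
}
```

If the reset in InvokeCounter.visit() were removed, the output would become [a: 1, b: 1], because the count from method a would leak into method b; any per-method state in a real detector needs the same flushing.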
Our custom detector does the same; Figure 3 shows the relevant parts of these two methods in its code.

2.4.1 Scanning Method
As mentioned in 2.3, the analyzed code is scanned method by method: visit(Code code) is invoked once for each method. Lines 16 to 22 in Figure 3 correspond to this method. It does nothing fancy besides resetting three variables before calling super.visit(code), the superclass implementation that actually walks the bytecode of the method being analyzed:

 5  public class UnGuardedEncapObjectCall extends BytecodeScanningDetector
 6  {
 7      private int isdebuggingat;
 8      private int ifstartat;
 9      private int ifendat;
16      public void visit(Code code)
17      {
18          isdebuggingat = -1;
19          ifstartat = -1;
20          ifendat = -1;
21          super.visit(code);  // walks the bytecode, calling sawOpcode() per instruction
22      }
24      public void sawOpcode(int seen)
25      {
26          if ("visualize/debug".equals(classConstant)
27                  && "isdebugging".equals(nameConstant))
28          {
29              isdebuggingat = PC;  // a) remember where the flag is read
30          }
31          else
32          {
33              if (seen == IFEQ && isdebuggingat > -1
34                      && (PC >= isdebuggingat + 1 && PC < isdebuggingat + 5))
35              {
36                  ifstartat = branchFallThrough;  // b) first instruction of the if body
37                  ifendat = branchTarget;         //    first instruction after the if clause
38              }
39              if ("visualize/debug".equals(classConstant)
40                      && "EncapObject".equals(nameConstant)
41                      && (PC < ifstartat || PC >= ifendat))
42              {
43                  bugReporter.reportBug(  // c) call lies outside the guarded range
44                      new BugInstance("UnGuardedEncapObjectCall",
45                          HIGH_PRIORITY).addClassAndMethod(this)
46                          .addSourceLine(this));
47              }
48          }
49      }
50  }
Figure 3: visit(Code code) and sawOpcode(int seen) methods in the custom detector class UnGuardedEncapObjectCall
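The decision logic of Figure 3 can be checked without building a plugin by replaying it over hand-written instruction lists. This is a hypothetical, self-contained sketch rather than FindBugs code: GuardCheckReplay and Insn are illustrative names, the record fields stand in for the inherited PC, classConstant, nameConstant, branchFallThrough and branchTarget, and FIGURE_1 and FIGURE_2 are hand-written approximations of what javac emits for Figures 1 and 2, assuming a static method with dobj in local variable 0 and DumpObjectToScreen() declared in visualize/debug.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardCheckReplay {
    // Hypothetical decoded instruction: byte offset, mnemonic, owner class and
    // member name of its operand, and (for branches) fall-through/target offsets.
    record Insn(int pc, String op, String owner, String name, int fallThrough, int target) {
        static Insn of(int pc, String op, String owner, String name) {
            return new Insn(pc, op, owner, name, -1, -1);
        }
        static Insn ifeq(int pc, int target) {
            return new Insn(pc, "ifeq", null, null, pc + 3, target); // ifeq is 3 bytes long
        }
    }

    static final String OWNER = "visualize/debug";

    // Replays the checks of Figure 3 over one method; returns offsets it would report.
    static List<Integer> unguardedCalls(List<Insn> method) {
        int isdebuggingat = -1, ifstartat = -1, ifendat = -1; // per-method reset (lines 18-20)
        List<Integer> reported = new ArrayList<>();
        for (Insn i : method) {
            if (OWNER.equals(i.owner()) && "isdebugging".equals(i.name())) {
                isdebuggingat = i.pc();                                   // lines 26-30
            } else {
                if (i.op().equals("ifeq") && isdebuggingat > -1
                        && i.pc() >= isdebuggingat + 1 && i.pc() < isdebuggingat + 5) {
                    ifstartat = i.fallThrough();                          // lines 33-38
                    ifendat = i.target();
                }
                if (OWNER.equals(i.owner()) && "EncapObject".equals(i.name())
                        && (i.pc() < ifstartat || i.pc() >= ifendat)) {
                    reported.add(i.pc());                                 // lines 39-47
                }
            }
        }
        return reported;
    }

    // Hand-written approximations of javac output (assumed, not disassembled).
    static final List<Insn> FIGURE_1 = List.of(
            Insn.of(0, "invokestatic", OWNER, "EncapObject"),
            Insn.of(3, "astore_0", null, null),
            Insn.of(4, "getstatic", OWNER, "isdebugging"),
            Insn.ifeq(7, 14),
            Insn.of(10, "aload_0", null, null),
            Insn.of(11, "invokestatic", OWNER, "DumpObjectToScreen"),
            Insn.of(14, "return", null, null));

    static final List<Insn> FIGURE_2 = List.of(
            Insn.of(0, "getstatic", OWNER, "isdebugging"),
            Insn.ifeq(3, 14),
            Insn.of(6, "invokestatic", OWNER, "EncapObject"),
            Insn.of(9, "astore_0", null, null),
            Insn.of(10, "aload_0", null, null),
            Insn.of(11, "invokestatic", OWNER, "DumpObjectToScreen"),
            Insn.of(14, "return", null, null));

    public static void main(String[] args) {
        System.out.println(unguardedCalls(FIGURE_1)); // [0]
        System.out.println(unguardedCalls(FIGURE_2)); // []
    }
}
```

Running main prints [0] for Figure 1, whose call at offset 0 precedes the guarded range [10, 14), and [] for Figure 2, whose call at offset 6 lies inside [6, 14). A call outside any guard, such as line 338 of Figure 5, is reported in the same way as Figure 1, which is how the false positive in Section 3 arises.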

isdebuggingat holds the bytecode offset at which isdebugging is read.
ifstartat holds the offset of the first instruction of the if clause whose condition is isdebugging.
ifendat holds the offset of the first instruction after the end of that if clause.

These variables are reset because they hold accumulated state (bytecode offsets) that sawOpcode(int) uses within the method currently being analyzed; offsets recorded for one method are meaningless in the next, so when visit(Code code) starts on a new method the state must be flushed.

2.4.2 Analyzing Method
While super.visit(code) walks a method, sawOpcode(int) is called once for each bytecode instruction in the method, in order. A program counter variable inherited from the superclass, PC, holds the offset of the instruction currently being analyzed. The analysis is based on the following reasoning:
a) If the flag isdebugging is read in the method, store its position in isdebuggingat.
b) If isdebugging is used as the condition of an if clause, store the beginning and ending offsets of the if clause in ifstartat and ifendat, respectively.
c) If a call to EncapObject() is found, determine whether it is located outside the if clause by comparing its position, PC, with ifstartat and ifendat.

Lines 26 to 30 of Figure 3 implement part a). classConstant and nameConstant are protected variables that the detector class inherits from its superclass. They hold the class name and the field or method name referenced by the current bytecode instruction. visualize/debug is the slash-separated name of the class that declares the static variable isdebugging. This piece of code locates a read of the isdebugging variable and assigns its position to isdebuggingat if there is such a read in the method. From line 33 to 38 in the same figure, the code implements part b).
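IFEQ is the BCEL constant for the JVM ifeq instruction, which branches when the int value on top of the operand stack is zero, i.e. when a boolean is false. A statement such as if (isdebugging) { ... } is therefore compiled into a read of the flag followed by an ifeq that jumps past the body of the if clause. This section can thus be read as: if isdebugging has been read in the method and an ifeq instruction is found, check whether that ifeq lies at least 1 and fewer than 5 bytes after the read of isdebugging. Because the getstatic instruction that reads a static field is 3 bytes long, an ifeq that immediately follows it sits exactly 3 bytes later, inside this window. The values 1 and 5 were taken from the sample detector. According to [3], such numbers are mainly based on bug-specific experiments, and it can sometimes take a long time to find the right range. We were therefore fortunate that the sample detector was close enough to our needs that we did not have to spend extra time learning how to conduct these trials. The branchFallThrough and branchTarget variables also come from the superclass; for the ifeq they give the offset of the first instruction of the if body and of the first instruction after the end of the if clause, respectively.

Last but not least, part c) is implemented by lines 39 to 47 of Figure 3. Similar to the code for part a), it detects the EncapObject() method by its class name and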
method name. If PC, the position of the call to EncapObject(), lies outside the if clause range bounded by ifstartat and ifendat, a bug is detected and reported with its name, priority and location (class, method and source line). When no guarding if clause has been found in the method, both bounds are still -1, so every call is reported.

2.5 Building and Installation
After the code is ready, it needs to be packaged into a JAR file so that FindBugs can recognize it. Although the building process is well documented in [3] and one can use an existing detector's files as a template, it still involves quite a bit of editing in several files:
A build script is required to specify the source and destination directories, as well as the name of the target JAR file.
FindBugs.xml is one file generated by the build script. It describes the class, speed, abbreviation, type and category of the detector. For each new detector, one needs to copy all these properties into the file of the same name used by FindBugs.
Messages.xml is another file produced by the build. It contains the details of the bug pattern displayed by the GUI. One needs to open it to add HTML descriptions and make sure the class and type information agrees with that in FindBugs.xml.

3. PERFORMANCE EVALUATION
With the custom detector in hand, I applied it to my project source code using the FindBugs GUI. Figure 4 shows the result when all other detectors were turned off.

Files Analyzed   Classes Analyzed   Methods Analyzed   Bugs Found   Original Bugs   False Positives
6                9                  56                 29           28              1
Figure 4: Evaluation of detected bugs and false positives

317  DebugObj dobj = EncapObject();
318  if (isdebugging)
319  {
320      DumpObjectToScreen(dobj);
321  }
322  }
338  DebugObj specinfoobj = EncapObject();
Figure 5: False positive

The original number of bugs was known because I had searched for each occurrence of EncapObject() and moved it out of the if clause. The new detector not only found all
the errors but also went beyond them: it reported one false positive. The code where this occurred is shown in Figure 5. The call to EncapObject() on line 338 is unguarded, but it does not need to be guarded because it is actually used for a non-debugging purpose. Our simple bug pattern did not take this into account. However, I think a more sophisticated version of the pattern should be able to distinguish such cases. On the other hand, this unexpected finding is still valuable because it points to an inappropriate practice: the debugging method is being used for a purpose other than the one it was written for. I recalled that line 338 was part of a patch I applied later on to fix some errors, and I really should have created a new method in a different class for that purpose. The misuse of EncapObject() revealed by our detector reflects a phenomenon that often occurs in the software development cycle: the structure of a program tends to degrade as more maintenance work is done. FindBugs might help delay this process if we can create bug patterns capable of detecting misused method calls.

4. CONCLUSION AND FUTURE WORK
This project experimented with writing an application-specific bug detector for FindBugs and using it to discover occurrences of the expected bug. The purpose was to evaluate how easy it is to extend FindBugs with new bug patterns and how effective these patterns are at finding bugs of interest. By walking through all the steps needed to make our custom detector work, we found that although extending FindBugs with new rules is conceptually simple, the implementation is less straightforward in several respects. We encountered the following issues during our experiment, some of which might indicate future research directions:
Analyzing bytecode instructions with the BCEL library is effective, but expressing patterns in terms of bytecode is not always easy for users.
Sometimes one has to use a third-party tool to disassemble a piece of sample code, inspect the resulting bytecode, and learn from it how to structure a pattern.
The quality of a bug pattern is somewhat uncertain. In part b) of 2.4.2, the window used to decide whether an if clause follows the read of isdebugging is based on experiments and trials. In our case we could verify that it worked well because we had created the bugs ourselves and therefore knew where they were. In reality, when we use the tool to find out where the bugs actually are, we do not know what percentage of the total errors it reports.
Building and adding a custom detector involves a little too much effort. It would be much more convenient to have a GUI that takes a set of parameters, compiles the source code, and integrates the resulting JAR into the system automatically.

In spite of its shortcomings, FindBugs is still a very adaptable static bug analysis tool thanks to its simplicity and extensibility. Furthermore, as we found in our evaluation, bug patterns might potentially help maintain software structure by detecting method misuse.

REFERENCES
[1] IEEE Standards Board. IEEE Standard for Software Unit Testing: An American National Standard, ANSI/IEEE Std 1008-1987. In IEEE Standards: Software Engineering, Volume Two: Process Standards, 1999 Edition. The Institute of Electrical and Electronics Engineers, Inc., 1999.
[2] David Hovemeyer and William Pugh. Finding Bugs is Easy. In Companion to the 19th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2004.
[3] FindBugs Manual. http://findbugs.sourceforge.net/manual/index.html