Evolutionary Object-Oriented Testing


Lucas Serpa Silva
Artificial Intelligence, University of Amsterdam
A thesis submitted for the degree of MSc Artificial Intelligence
Supervised by Dr. Maarten van Someren
July 2009

Abstract

It is estimated that 80% of software development cost is spent on detecting and fixing defects. To tackle this issue, a number of tools and testing techniques have been developed to improve the testing framework. Although techniques such as static analysis, random testing and evolutionary testing have been used to automate the testing process, it is not clear which approach is best. Previous research on evolutionary testing has mainly focused on procedural programming languages with simple test data inputs such as numbers. In this work, we present an evolutionary object-oriented testing approach that combines a genetic algorithm with static analysis to increase the number of faults found within a time frame. A total of 640 experiments were executed to evaluate the effectiveness of different genetic algorithms and parameters. The results are compared to those obtained by running a random test case generator for 15, 30 and 60 minutes. They show that a genetic algorithm combined with static analysis can considerably increase the number of faults found compared to random testing. In some cases, evolutionary testing found more faults in 15 minutes than a random testing strategy found in 60 minutes.

Acknowledgements

I would like to thank my supervisor, Maarten van Someren, for his support, guidance and constructive comments throughout this work. I would also like to thank Yi Wei for the various discussions regarding Autotest, code coverage and automated testing. Special thanks go to Olga Nikolayeva for many invaluable suggestions and the time she spent proofreading and reviewing this thesis.

Contents

List of Figures
List of Tables

Introduction
  Motivation
  Past research
  Project goals
Background
  Testing
    Black box
    White box
    Automated testing
  Eiffel & Design by Contract
  Autotest
    Faults
  Genetic Algorithm
    Chromosome
    Mutation
    Crossover
    Objective and fitness value
    Selecting individuals for reproduction
    GA Variations
Evolutionary testing
  Implementation
    Parameters
    Algorithm stages
    Allele value specification
    Initialization
    Evaluation
    Mutation and crossover
  Evolutionary Autotest
Experiments
  Introduction
  Setting
  Experiments Group A
    Experiment A1: Autotest parameters
  Experiments Group B
    Experiment B1: Mutation probability
    Experiment B2: Mutation algorithm
    Experiment B3: Crossover probability
    Experiment B4: Crossover algorithm
    Experiment B5: Selection method
  Experiments Group C
    Experiment C1: Original Autotest
    Experiment C2: Autotest with static analysis
    Experiment C3: Evolutionary testing
  Discussion
    Types of faults found
    Parameters
Conclusion
  Considerations
  Further improvement
A Primitive Values
B Chromosome specification
C Chromosome files
Bibliography

List of Figures

2.1 Example of Design by Contract™
Autotest algorithm
Autotest algorithm
Genetic Algorithm flow diagram
Examples of mutation algorithms
One and two points crossover
Order crossover examples
Four basic components of the system
Four stages of the genetic algorithm
Parallel population evaluation
Corrupted chromosome caused by crossover
Valid chromosome crossover
Evolutionary Autotest 1 - loading chromosome and evolve.conf
Evolutionary Autotest 2 - method call
Evolutionary Autotest 3 - object creation
Number of faults found using random and static analysis technique to select the initial primitive values
Number of faults for mutation algorithms for each class
Effect of mutation and crossover probability on the number of faults
Comparison of crossover algorithms
Comparison of selection algorithms
Variation on the total number of faults found
Autotest progress
Evolutionary approach on α set
4.9 Evolutionary testing on β set
Total number of faults found for all classes over time by the three approaches
Evolutionary approach time library
Distribution of the types of faults found in the metalex class
Usage frequency of each parameter

List of Tables

1.1 Previous work
Test classes α
Test classes β
Genetic algorithm setting
Autotest parameters
Mutation probability
Mutation methods
Crossover probability
Crossover methods
Population selection schema
Original Autotest
Original Autotest
Original Autotest executions
Autotest with static analysis
Autotest with static analysis
Time allocation
Execution setting
Evolutionary algorithm
Evolutionary algorithm
A.1 Autotest primitive values
B.1 Chromosome specification
C.1 Chromosome files

1 Introduction

1.1 Motivation

In the past 50 years, the growing influence of software in all areas of industry has led to an ever-increasing demand for complex and reliable software. According to a study (3) conducted by the National Institute of Standards and Technology, approximately 80% of the development cost is spent on identifying and correcting defects. The same study found that software bugs cost the United States economy around $59.5 billion a year, with one third of this value being attributed to poor software testing infrastructure. In an effort to improve the existing testing infrastructure, a number of tools, such as JUnit (1) and GoboTest (4), have been developed to automate test execution. However, the automation of test data generation is still a topic under research. Recently, a number of methods such as metaheuristic search, random test generation and static analysis have been used to completely automate the testing process, but the application of these tools to real software is still limited. Random test case generation has been used by a number of tools (Jartege (34), Autotest (33), Dart (32)) that automate the generation of test cases, but a number of studies (9; 13; 16; 18; 26) found a genetic algorithm (evolutionary testing) to be more efficient than random testing and to outperform it in terms of code coverage.

1.2 Past research

The study of genetic algorithms as a technique for automating the process of test case generation is often referred to as evolutionary testing in the literature. Since the early 90s, there have been a number of studies on evolutionary testing. The complexity and applicability of

these studies vary. In order to classify the relevance of past research to this project, a number of studies are classified according to the complexity of the test cases being generated and the optimization parameter used by the genetic algorithm. The complexity of the test cases being generated is important because generating test cases for structured programs that only take simple input, such as numbers, is simpler than generating test cases for object-oriented programs.

Reference | Year | Language type | Optimization parameter
--- | --- | --- | ---
(5) Xanthakis, S | | Procedural (C) | Branch coverage
(6) Shultz, A | | Procedural (Vehicle Simulator) | Functional
(7) Hunt, J | | Procedural (POP11[X]) | Functional (Seeded errors)
(8) Roper, M | | Procedural (C) | Branch coverage
(9) Watkins, A | | Procedural (TRITYP simulator) | Path coverage
(10) Alander, J | | Procedural (Strings) | Time
(18) Harman, M | | Procedural (Integers) | Branch coverage
(14) Jones, B | | Procedural (Integers) | Branch coverage
(11) Tracey, N | | Complex (ADA) | Functional (specification)
(12) Borgelt, K | | Procedural (TRITYP simulator) | Path coverage
(13) Pargas, R | | Procedural (TRITYP simulator) | Branch coverage
(15) Lin | | Procedural (TRITYP simulator) | Path coverage
(16) Michael, C | | Procedural (GADGET) | Branch coverage
(17) Wegener, J | | Procedural | Branch coverage
(19) Díaz, E | 2003 | Procedural | Branch coverage
(20) Berndt, D | | Procedural (TRITYP simulator) | Functional
(9) Watkins, A | 2004 | Procedural | Functional (Seeded error)
(24) Tonella, P | 2004 | Object-oriented (Java) | Branch coverage
(21) Berndt, D. J. | 2005 | Procedural (Robot simulator) | Functional (Seeded error)
(22) Alba, E | 2005 | Procedural (C) | Condition coverage
(23) McMinn, P | | Procedural (C) | Branch coverage
(27) Wappler, S | | Object-oriented (Java) | Branch, condition coverage
(28) Wappler, S | | Object-oriented (Java) | Exceptions / Branch coverage
(26) Harman, M | | Procedural | Branch coverage
(25) Mairhofer, S | | Object-oriented (Ruby) | Branch coverage

Table 1.1: Previous work.

As shown in Table 1.1, only a few projects generate test cases for object-oriented programs, and to the best of our knowledge only one project (11) generates test cases for object-oriented programs and uses the number of faults found as

the optimization parameter for the genetic algorithm. In that study, test cases were generated for ADA programs, but a formal specification had to be manually specified in a SPARK-Ada proof context. Thus, the testing process was not completely automated. Table 1.1 also shows that branch coverage was the optimization parameter used to drive the evolution of test cases in most other studies. However, there is little evidence of a correlation between branch coverage and the number of faults uncovered. Although code coverage is a useful test suite measurement, the number of faults a test suite unveils is more important. Past research has shown that evolutionary testing is a good approach to automate the generation of test cases for structured programs. To make this approach attractive to industry, however, the system must be able to generate test cases for object-oriented programs and to use the number of faults found as the main optimization parameter. To the best of our knowledge, there is currently no existing project that fulfils these two requirements.

1.3 Project goals

This project has three goals:

1. To use genetic algorithms to automatically generate test cases for object-oriented programs written in Eiffel and to use the number of faults found as the optimization parameter for the genetic algorithm.
2. To investigate the effect of different genetic algorithms on the number of faults found when generating test cases for object-oriented software.
3. To combine evolutionary testing with static analysis and evaluate whether this improves the results.

The base hypothesis for this work is that evolutionary testing finds more faults, and in less time, than random testing. This project innovates by using the number of faults as the main optimization parameter for the genetic algorithm and by combining static analysis with a genetic algorithm.
It also extends the existing research in evolutionary testing by providing a study on the effect of different genetic algorithm techniques, such as mutation and crossover algorithms, on the evolution of test cases for object-oriented software. This project is based on the Autotest (2) tool and the Design by Contract™ methodology implemented by the Eiffel programming language (29).

2 Background

2.1 Testing

Testing is one of the most used software quality assessment methods. There are two important processes when testing object-oriented software. First, the software has to be initialized with a set of values. These values are used to set a number of variables that are relevant for the test case. The values of these variables define a single state from the set of possible states. These values can be either primitive values, such as integers, or complex values, such as objects. With the software initialized, its methods can then be tested by calling them. If a method takes one or more objects as parameters, these objects also have to be initialized. To determine whether the test case passed or failed, a software specification has to be used. The software specification defines what the output of the software should be and what constitutes valid input. Because the number of possible states a piece of software may have is exponential, it is impossible to test all of them. Interesting states are normally identified by the developers according to a software specification or program structure. There are many types of testing. However, they can all be classified as either black box or white box testing.

2.1.1 Black box

Black box testing, also called functional testing (30), considers the unit under test as a black box where data is fed in and the output is verified according to a software specification. Functional testing has the advantage that it is uncoupled from the source code: given the software specification, test data can be generated even before the function has been implemented. Functional testing is also closely related to the user requirements since

it is testing a function of the program. Its main disadvantages are that it requires a software specification and that it may not explore the unit under test well, since it does not know the code structure.

2.1.2 White box

The white box testing technique, also called structural testing, takes into account the internal structure of the code. By analyzing the structure of the code, different test data can be generated to explore specific areas of it. Structural testing may also be used to measure how much of the code has been covered according to some structural criteria. By analyzing the program flow and the path an execution took, code coverage can be computed for a given criterion such as statement coverage, which counts the number of unique statements executed.

2.1.3 Automated testing

To automate the testing process, both the generation of test data and the execution of test cases have to be automated. There are already a number of tools, such as JUnit (1) and GoboTest (4), that automate test case execution, but the main problem lies in automating the test data generation. Since the number of possible input data is huge, the problem can be viewed as an optimization problem, where the optimal solution is a set of test data that triggers all faults in the software. Some tools, such as Autotest (33), DART (32) and Jartege (34), generate test data randomly, but there are many optimization algorithms that are considered better than random.

2.2 Eiffel & Design by Contract

The lack of a software specification is one of the main problems when automatically generating test cases. Without a specification it is impossible to be sure that a feature¹ has failed. Even when the test case leads the program to crash or throw an exception, it is not clear whether the software has a fault, since the program might not have been defined for the given input.
¹ Feature means either a procedure or a function. In this report, feature and method are used interchangeably to refer to a procedure or a function.

Normally, the developers will write a header as a comment for each method, describing its

behaviour. Although there are guidelines on how to write these headers, they are not formal enough to allow the derivation of the method's pre- and postcondition. This problem has been addressed by the Eiffel programming language (29), which, besides other methodologies, implements the Design by Contract™ (31) concept. The idea behind Design by Contract™ is that each method call is a contract between the caller (client) and the method (supplier). This contract is specified in terms of what the client must provide and what the supplier guarantees in return. It is normally written in the form of pre- and postcondition boolean expressions for each method. In the example illustrated in Figure 2.1, the precondition is composed of four boolean expressions and the postcondition of two boolean expressions. These expressions are evaluated sequentially upon method invocation and termination. The system will throw an exception as soon as one of the precondition or postcondition boolean expressions evaluates to false. Therefore, the caller must ensure the precondition is true before calling the method, and the method must ensure that the postcondition is true before returning. For example, the borrow_book method shown in Figure 2.1 takes the id of a borrower and the id of the book this borrower wants to borrow. The caller must ensure that the book id is a valid id, that the book has at least one copy available, that the borrower id is a valid id and that the borrower can borrow books. If these conditions are fulfilled, the method guarantees that it will add the book to the borrower's list of borrowed books and decrease the number of copies available by one. Apart from the pre- and postcondition, every class has an invariant condition that has to remain true after the execution of the constructor, and loops may have variants and invariants.
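The contract described above can be mimicked outside Eiffel. The following is a minimal Python sketch, not the code of Figure 2.1: the class `Library`, the exception `ContractViolation`, the lending limit `max_loans` and all tag names are hypothetical, chosen only to illustrate four sequentially evaluated precondition clauses and two postcondition clauses.

```python
class ContractViolation(Exception):
    """Raised as soon as one contract clause evaluates to false."""


class Library:
    def __init__(self):
        self.copies = {}      # book id -> number of copies available
        self.borrowed = {}    # borrower id -> list of borrowed book ids
        self.max_loans = 3    # hypothetical lending limit (assumption)

    def can_borrow(self, borrower_id):
        return len(self.borrowed[borrower_id]) < self.max_loans

    def borrow_book(self, borrower_id, book_id):
        # Precondition: four clauses, checked in order on invocation.
        # The lambdas delay evaluation so a later clause is never
        # evaluated once an earlier one has failed.
        preconditions = [
            ("valid_book", lambda: book_id in self.copies),
            ("copy_available", lambda: self.copies[book_id] >= 1),
            ("valid_borrower", lambda: borrower_id in self.borrowed),
            ("may_borrow", lambda: self.can_borrow(borrower_id)),
        ]
        for tag, clause in preconditions:
            if not clause():
                raise ContractViolation("precondition " + tag)

        old_copies = self.copies[book_id]   # plays the role of Eiffel's `old`
        self.borrowed[borrower_id].append(book_id)
        self.copies[book_id] -= 1

        # Postcondition: two clauses, checked before returning.
        if book_id not in self.borrowed[borrower_id]:
            raise ContractViolation("postcondition book_added")
        if self.copies[book_id] != old_copies - 1:
            raise ContractViolation("postcondition one_copy_less")
```

A caller that violates the precondition (for example by asking for a book with no copies left) gets a `ContractViolation` naming the failed clause, which is the information a test generator needs to tell an invalid test case from a fault in the supplier.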
Figure 2.1: Example of Design by Contract™

With Design by Contract™, a method has a fault if it:

1. violates another method's precondition.
2. does not fulfil its own postcondition.
3. violates the class invariant.
4. violates a loop variant or invariant.

For the automation of test case generation, Design by Contract™ can be used to determine whether the generated test data is defined for a given method by checking it against the precondition. It can also be used to check whether a method has failed by comparing the result against the postcondition. In the next section we discuss how this idea is implemented in the Autotest tool (2).

2.3 Autotest

Autotest exploits the Design by Contract™ methodology implemented in Eiffel to automatically generate random test data for Eiffel classes. Autotest works with a given timeout and a set of classes to be tested.

Figure 2.2: Autotest algorithm 1 - Method invocation

Autotest starts by loading the classes to be tested and creating a table containing all methods of those classes, including the inherited ones. As described in Figure 2.2, Autotest randomly selects methods to test until the timeout expires. Autotest chooses the method to be tested (line 4) and the creation method (line 23) randomly. Autotest

uses a probability to determine whether a new object should be created or selected from the object pool (line 11). The object pool is the set of all objects created by Autotest. The idea behind the object pool is that reusing objects that might have been modified during a previous method call increases the chance of finding more faults. When creating an object, Autotest uses different algorithms for expanded and non-expanded types. Expanded types are the primitive types such as Integer, Boolean, Character and so on. For these types, Autotest must provide an initial value, as shown in Figure 2.3. The initial values for the expanded types are randomly selected from a set of fixed values chosen by the developers. These values are listed in Appendix A.

Figure 2.3: Autotest algorithm 2 - Object creation

When instantiating objects that are not of an expanded type, Autotest randomly selects one of the type's creation procedures and invokes it. After the timeout expires, Autotest generates a report containing the number of test cases generated, the number of failures, the number of unique failures and the number of invalid test cases, and reproduces the code that triggers the faults it found.

2.3.1 Faults

Eiffel throws an exception whenever a contract is violated (precondition, postcondition, class invariant, loop invariant, loop variant). Autotest then examines the exception to find out whether it was triggered by an invalid test case or by an actual fault in the code. Invalid test cases are the test cases that violate the precondition of the feature being tested. If it

is a valid test case, Autotest checks whether the fault is unique by looking at the line of code where the exception happened and comparing it to all unique faults it has already found. Besides the faults triggered by the Design by Contract™ conditions, other exceptions, such as those triggered by calling methods on a void object or by lack of memory, are also considered valid test cases.

2.4 Genetic Algorithm

Genetic Algorithms (GA) are search algorithms based on natural selection as described by Charles Darwin. They are used to find solutions to optimization and search problems. Genetic algorithms became popular when John Holland published Adaptation in Natural and Artificial Systems (36) in 1975 and De Jong finished an analysis of the behaviour of a class of genetic adaptive systems (35) in the same year. The basic idea of a GA is to encode the values of the parameters of an optimization problem in a chromosome, which is evaluated by an objective function. As shown in Figure 2.4, the algorithm starts by initializing or randomly generating a set of chromosomes (the population). At the end of each generation, each chromosome is evaluated and modified according to a number of genetic operations in order to produce a new population. This process repeats until a predefined number of generations has been computed or until the objective value of the population has converged.

2.4.1 Chromosome

Each individual in the population is represented by a chromosome that stores the values of the optimization problem. The chromosome is normally encoded as a list of bits, but its encoding and structure can vary. Each gene of the chromosome can have a specific allele. An allele specifies the range or the possible values that the gene can have. To evaluate each chromosome, an objective function must be defined. The objective function uses the values encoded in the chromosome to check how well it performs in the optimization problem.
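To make the terms chromosome, allele and objective function concrete, here is a small Python sketch. It is an illustration only: the allele ranges in `ALLELES` and the toy `objective` are assumptions and are unrelated to the chromosome actually used later in this work.

```python
import random

# One allele per gene: the (low, high) range of values the gene may take.
# These three ranges are illustrative assumptions.
ALLELES = [(0.0, 1.0), (0.0, 600.0), (-100.0, 100.0)]


def random_chromosome(alleles, rng):
    """Create a chromosome whose genes are drawn from their allele ranges."""
    return [rng.uniform(low, high) for low, high in alleles]


def respects_alleles(chromosome, alleles):
    """Check that every gene lies inside the range given by its allele."""
    return all(low <= gene <= high
               for gene, (low, high) in zip(chromosome, alleles))


def objective(chromosome):
    """Toy objective (higher is better): best when every gene equals 0."""
    return -sum(gene * gene for gene in chromosome)


def evaluate(population):
    """Pair each chromosome with its objective value."""
    return [(objective(chromosome), chromosome) for chromosome in population]
```

For instance, `evaluate([random_chromosome(ALLELES, random.Random(1)) for _ in range(10)])` yields ten scored individuals that a genetic algorithm could then select from.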
At the end of each generation, a number of genetic operations such as mutation and crossover are applied to each chromosome to produce the population for the next generation.

2.4.2 Mutation

When a chromosome is passed on, there is a probability that some of its genes will not be copied correctly and will undergo a small mutation. Mutation ensures that the solutions of the

new generation are not identical to those of the previous one. The mutation probability controls how much of the chromosome will mutate. A small probability leads to slower convergence, while a large probability leads to instability. The mutation operator can be defined in different ways. Three basic mutation operations are described below.

Figure 2.4: Genetic Algorithm flow diagram

1. Flip mutator changes a single gene of the chromosome to a random value within the range specified by the alleles.
2. Swap mutator randomly swaps a number of genes of the chromosome.
3. Gaussian mutator picks a new value around the current value using a Gaussian distribution.

The mutation operation is defined according to the structure of the chromosome. When the chromosome is stored in a tree, one possible mutation is to swap subtrees, as shown in Figure 2.5.

2.4.3 Crossover

Crossover is the process where two or more chromosomes are combined to form one or more chromosomes. The idea behind crossover is that the offspring may be better than both

parents. Crossover is normally done between two individuals, but more can be used.

Figure 2.5: Examples of mutation algorithms

There are many crossover algorithms; some of them are described below:

1. Uniform crossover randomly selects the parent each gene should come from.
2. Even odd crossover selects the genes with even index from parent A and the genes with odd index from parent B.
3. One point crossover randomly selects a position on the chromosome; all the genes to the left come from parent A and the genes to the right come from parent B.
4. Two points crossover randomly selects two positions and picks from parent A the genes whose index is greater than the smaller position and smaller than the larger position. The remaining genes come from parent B.
5. Partial match crossover produces two children C1 and C2. It initializes C1 by copying the chromosome of parent A and C2 by copying the chromosome of parent B. It then randomly selects a number of positions and swaps the genes between C1 and C2 at those positions.
6. Order crossover produces two children C1 and C2. It initializes by copying the genes of the parents to the children and deleting n genes randomly selected from each

offspring. It then selects an interval of size n and slides the genes such that the interval is empty. It then selects the original genes in that interval from the opposite offspring. The algorithm is illustrated in Figure 2.7.
7. Cycle crossover produces two children C1 and C2. It initializes C1 and C2 by copying the chromosomes of the parents A and B respectively. Then it selects n random positions and replaces the genes of C1 with genes from parent B in those positions. The process is repeated for C2 with parent A.

Figure 2.6: One and two points crossover

Figure 2.7: Order crossover examples

2.4.4 Objective and fitness value

The objective value is the performance measurement for each chromosome. This value is used to aid the selection of chromosomes for crossover. It can be used directly to select the good chromosomes for crossover, but it is normally scaled to produce a fitness value. The scaling function is one method that can be used to minimize the elitism problem described in

section 2.4.5, where only a limited number of chromosomes is involved in producing the next generation. This fitness value is then used to compute the compatibility of each chromosome for crossover. The compatibility is used to ensure that good individuals are not combined with bad ones. Many methods exist to compute the fitness value; the most common scaling methods are described below.

1. Linear scaling:

$$\mathit{fitness} = \alpha \cdot \mathit{objective\_value} + \beta \tag{2.1}$$

2. Power law scaling:

$$\mathit{fitness} = \mathit{objective\_value}^{\alpha} \tag{2.2}$$

3. Sharing scaling computes the number of genes that two chromosomes have in common. Two individuals are considered unfit for mating when their difference is very low, meaning that they are too similar. The difference can be computed using bitwise operations (37) or another user-specified method if the chromosome is not encoded as a bit string.

2.4.5 Selecting individuals for reproduction

Elitism and diversity are two important factors when selecting individuals for reproduction. With elitism, selection is biased towards the individuals with the best objective value. Elitism is important since it removes bad solutions from the population and reproduces the good ones. However, by continuously reproducing from a small set of individuals, the population becomes very similar, which may lead to a sub-optimal solution. This effect is the elitism problem. The diversity of the population ought to be controlled to ensure that the search space is explored well. Many selection schemas have been developed to properly select the individuals for reproduction and to try to minimize the elitism problem. Some of the selection schemas include:

1. Rank schema selects the best individuals of the population every time.
2. Roulette Wheel selects individuals according to their fitness values as compared to the population. The probability of individual i being picked is:

$$p_i = \frac{\mathit{fitness}_i}{\sum_{j=0}^{\mathrm{len}(\mathit{population})-1} \mathit{fitness}_j} \tag{2.3}$$

3. Tournament sampling uses the roulette wheel method to select two individuals. Then it picks the one with the higher fitness value.
4. Uniform sampling selects an individual randomly from the population.
5. Stochastic remainder sampling first computes the probability of each individual being selected, p_i, and its expected representation, ε = p_i × len(population). The expected representation is used to create a new population of the same size. For example, if an individual has ε equal to 1.7, it will fill one position in the new population and has a probability of 0.7 of filling another position. After the new population is created, the uniform method is used to select the individuals for mating.
6. Deterministic sampling computes ε of each individual as in stochastic remainder sampling. A new population is created and filled with all individuals with ε ≥ 1, and the remaining positions are filled by sorting the individuals of the original population by the fractional part of ε and selecting the highest on the list.

2.4.6 GA Variations

There are three common types of Genetic Algorithm. They differ in how the new population is computed at the end of each generation.

1. Simple Genetic Algorithm uses a non-overlapping population between generations. At each generation the population is completely replaced.
2. Steady-state Genetic Algorithm uses an overlapping population where a percentage of the population is replaced by new individuals.
3. Incremental Genetic Algorithm has only one or two children replacing members of the current population at the end of each generation.

Compared to other optimization algorithms, the genetic algorithm is relatively simple and robust (37). In the past, it has been successfully used to automatically generate test data to optimize code coverage, as described in section 1.2. In this work, genetic algorithms are used to automatically generate a set of test cases and optimize the number of faults found.
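The scaling and selection schemas of sections 2.4.4 and 2.4.5 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the GAlib implementation: all function names are made up here, fitness values are assumed to be positive, and the stochastic remainder pool is not forced back to exactly the original population size.

```python
import random


def linear_scaling(objective_value, alpha=1.0, beta=0.0):
    """Linear scaling: fitness = alpha * objective_value + beta."""
    return alpha * objective_value + beta


def selection_probabilities(fitness):
    """Roulette wheel probability: each fitness divided by the total."""
    total = sum(fitness)
    return [f / total for f in fitness]


def roulette_wheel(population, fitness, rng):
    """Pick one individual with probability proportional to its fitness."""
    spin = rng.random()
    cumulative = 0.0
    for individual, p in zip(population, selection_probabilities(fitness)):
        cumulative += p
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off


def expected_representation(fitness):
    """Expected number of copies of each individual: p * len(population)."""
    size = len(fitness)
    return [p * size for p in selection_probabilities(fitness)]


def stochastic_remainder_pool(population, fitness, rng):
    """Copy each individual int(eps) times, plus once more with
    probability equal to the fractional part of eps."""
    pool = []
    for individual, eps in zip(population, expected_representation(fitness)):
        whole = int(eps)
        pool.extend([individual] * whole)
        if rng.random() < eps - whole:
            pool.append(individual)
    return pool
```

With fitness values 17 and 3 in a population of two, the first individual has an expected representation of 1.7, so it always receives one slot in the pool and a second one with probability 0.7, as in the example given for stochastic remainder sampling.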
One of the main reasons we believe the genetic algorithm is a good approach for automatically generating test data is that it can adapt to the code being tested. It is plausible to assume

that developers acquire bad habits over time, which lead to a pattern of mistakes. One assumption is that genetic algorithms may be able to detect some of these mistakes and tune the test data generation mechanism to exploit them.
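The pieces described in section 2.4 can be put together as a simple (non-overlapping) genetic algorithm. The Python sketch below is a simplified illustration, not the system built in this work: the function names and default parameter values are assumptions, the tournament draws its two candidates uniformly rather than with the roulette wheel, and chromosomes need at least two genes for the one point crossover.

```python
import random


def flip_mutation(chromosome, alleles, p_mutation, rng):
    """With probability p_mutation, replace one randomly chosen gene by a
    random value from that gene's allele range."""
    child = list(chromosome)
    if rng.random() < p_mutation:
        i = rng.randrange(len(child))
        low, high = alleles[i]
        child[i] = rng.uniform(low, high)
    return child


def one_point_crossover(parent_a, parent_b, rng):
    """Genes left of a random cut come from parent A, the rest from B."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def tournament(population, scores, rng):
    """Draw two individuals uniformly and keep the better one
    (a simplification of tournament sampling)."""
    i = rng.randrange(len(population))
    j = rng.randrange(len(population))
    return population[i] if scores[i] >= scores[j] else population[j]


def simple_ga(objective, alleles, pop_size=20, generations=30,
              p_crossover=0.9, p_mutation=0.1, seed=0):
    """Simple GA: the whole population is replaced at every generation."""
    rng = random.Random(seed)
    population = [[rng.uniform(low, high) for low, high in alleles]
                  for _ in range(pop_size)]
    best = max(population, key=objective)
    for _ in range(generations):
        scores = [objective(c) for c in population]
        next_population = []
        while len(next_population) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            if rng.random() < p_crossover:
                child = one_point_crossover(a, b, rng)
            else:
                child = list(a)
            next_population.append(
                flip_mutation(child, alleles, p_mutation, rng))
        population = next_population  # non-overlapping replacement
        best = max(population + [best], key=objective)
    return best
```

Calling `simple_ga(lambda c: -sum(g * g for g in c), [(-5.0, 5.0)] * 4)` returns the best chromosome seen during the run. A steady-state variant would replace only part of `population` in each generation instead of all of it.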

3 Evolutionary testing

3.1 Implementation

To link the genetic algorithm to Autotest, an evolutionary testing strategy is implemented for Autotest. This strategy generates and executes test cases according to parameters specified in a chromosome generated by a genetic algorithm. To find a good strategy (chromosome), a genetic algorithm is implemented in C++ using the GAlib (38) library. The communication between Autotest and the genetic algorithm is done through two files. The four basic components of the system are shown in Figure 3.1.

Figure 3.1: Four basic components of the system

When Evolutionary Autotest is executed, it loads the chromosome from a file containing parameter settings for the Autotest test generator and tests the classes for a given amount of time. At the end, it produces a report containing the objective value (the number of unique faults found), which is used by the genetic algorithm to evaluate how good that chromosome

is. The evolution of a testing strategy (chromosome) can be done for a single class or for a set of classes. In this work, however, the evolution of a testing strategy is performed for single classes.

3.1.1 Parameters

Genetic algorithms work by optimizing parameters for a given problem. In order to optimize the generation of test cases, five different parameters have been used. These parameters influence how the test cases are generated and how Autotest is executed.

1. primitive values: these specify a set of values for each of the five primitive types (Integer, Real, Character, Boolean and Natural). These values are used for creating objects that are used as input data.
2. method call: specifies which methods should be called and which parameters should be used for each method call. This parameter is used to set the software into different states while it is being tested.
3. creation probability: the probability of creating a new object instead of reusing one from the object pool.
4. seed: the value used to initialize the pseudorandom number generator.
5. sequential method call: calls the methods of the class under test sequentially and selects input parameters for each method at random.

As described in section 2.3, Autotest has a fixed set of primitive values, and it calls and creates objects randomly. Although a study (2) has shown that the creation probability parameter can be optimized, it is not obvious which parameters are good for evolutionary testing. When there is not enough time to optimize a parameter, it might be better to use a random strategy or a predefined value. The goal is to select the best set of parameters for each class, but because there are 2^5 = 32 possible sets of parameters, it is not feasible to test all of them for every class. The evolutionary algorithm optimizes all parameters, and a file is used to specify which parameters Autotest should use while executing the evolutionary strategy.
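The count of 2^5 = 32 follows from each of the five parameters being either used or not used. The short Python sketch below enumerates these sets; the identifier spellings in `PARAMETERS` are assumptions made for the example and are not the names used in the configuration file.

```python
from itertools import product

# The five parameters listed above (identifier spellings are assumptions).
PARAMETERS = [
    "primitive_values",
    "method_call",
    "creation_probability",
    "seed",
    "sequential_method_call",
]


def parameter_sets(parameters):
    """Enumerate every subset of parameters: each one is on or off."""
    return [
        {name for name, enabled in zip(parameters, flags) if enabled}
        for flags in product([False, True], repeat=len(parameters))
    ]
```

`len(parameter_sets(PARAMETERS))` is 32, ranging from the empty set (plain random Autotest behaviour for every parameter) to the set containing all five.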
Because finding the best set of parameters for each class is an optimization problem, the genetic algorithm can also be used to optimize the set of parameters. The chromosome has been used to specify the values for these parameters, but the chromosome can also be

used to specify which parameters should be used. Thus, the genetic algorithm can be used to optimize both the set of parameters used and their values.

Algorithm stages

The chromosome is encoded as a list of floating-point numbers because all the parameters can be represented as floating-point numbers without much conversion. The implementation of the genetic algorithm is divided into four stages.

Figure 3.2: Four stages of the genetic algorithm

1. Specification: Create the chromosome and specify the alleles.
2. Initialization: Create the initial population.
3. Evaluation: Evaluate the population.
4. Mutation and Crossover: Apply mutation and reproduction operations.

Allele value specification

As described in section 2.4.1, the alleles can be used to specify the range or a list of possible values allowed for each gene. Specifying the allele for each gene simplifies the chromosome encoding and interpretation. For example, the range of valid values for the Character data type is between 0 and 600, but by randomly selecting a floating-point number, it is likely that a number outside this range will be selected, since the set of floating-point numbers is much larger than the set of Character values. This would force the number to be rounded down to 600 or up to

0, and lead to a set of characters with similar values. By specifying the allele (0, 600), all the characters have the same probability of being picked. The chromosome is created given the number η of values that are encoded for each primitive type. The start index, end index and allele specification for each parameter are shown in Appendix B.

Initialization

The seed, method call and creation probability are initialized with random values from the range specified by the alleles. The primitive values may be initialized in three different ways:

1. Randomly: select random values from the range of values specified by the alleles of each gene.
2. Hard coded: use the original values used by Autotest as specified in Table A.1 and complete the set of values with random values.
3. Static analysis: in this approach, a simple technique is used to extract primitive values from the classes under test. The system works by scanning the classes for natural, integer, real and character values and storing these values. Because the system does not consider the structure of the code, it will even use values found in comments. These values, combined with random values, are then used to initialize the chromosome. When initializing the chromosome, each value is taken from the set obtained using static analysis with probability 0.8 and from a random value generator with probability 0.2. This avoids initializing a population that is too similar: introducing some random values guarantees a level of diversification.

Evaluation

When evaluating a chromosome, the genetic algorithm generates a set of files (shown in Appendix C) that contain the values of the parameters encoded in that chromosome for a specific class. Autotest is then executed to test a class for a fixed amount of time and the number of unique faults found is used as the objective value.
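The static-analysis initialization described above mixes extracted values with random ones. A minimal sketch of that mixing step follows, assuming the extracted values are already available as a list. The function name, its arguments and the fallback to random values when nothing was extracted are illustrative assumptions; the thesis implementation is written in C++ with GAlib.

```python
import random


def init_primitive_genes(n_genes, extracted, low, high, p_static=0.8, rng=None):
    """Initialize the genes that hold primitive values for one primitive type.

    Each gene comes from the statically extracted values with probability
    p_static (0.8 in the text) and from a uniform draw over the allele range
    [low, high] otherwise. Falling back to random values when nothing was
    extracted is an assumption of this sketch.
    """
    if rng is None:
        rng = random.Random()
    genes = []
    for _ in range(n_genes):
        if extracted and rng.random() < p_static:
            genes.append(float(rng.choice(extracted)))
        else:
            genes.append(rng.uniform(low, high))
    return genes


# Hypothetical usage: 10 Character genes with the allele range (0, 600).
example = init_primitive_genes(10, [65, 97, 113], 0.0, 600.0, rng=random.Random(42))
```

Drawing the random share uniformly from the allele range also reflects the point made about alleles: every value in (0, 600) is equally likely, with no rounding at the borders.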
Since each chromosome can be executed independently of the others, the population is evaluated in parallel. The parallel evaluation works by creating 4 instances of the code under test and calling Autotest for each one of them. As

illustrated in Figure 3.3, four individuals are evaluated in parallel. This number was chosen because there were four processors in the computer used for the experiments. For an optimal evaluation, the population size ought to be a multiple of four.

Figure 3.3: Parallel population evaluation

Mutation and crossover

To test a piece of software thoroughly, it is important to test it in many different states. A state can be reached by a particular sequence of method calls. Autotest hopes to achieve different states by randomly invoking methods. To map this behaviour onto the chromosome, the possibility of adding and removing a method call has to be considered, because some states can be reached in two method calls while others may require seven. Another problem is that each method call has a certain number of parameters of a specific type. With these requirements, the crossover operation may produce a corrupted chromosome, since the number of method calls and the parameters for each method call may differ for each chromosome. Figure 3.4 shows an example where the chromosome stores the method to be called with the parameters in the same section of the chromosome. Chromosome X will call method a, method b and method a again. The problem is that method a takes two String parameters. The combined chromosome, however, will produce a call to method a that takes one String and one Integer. One possible solution to this problem was described by Tonella (24). Tonella used a grammar to specify the syntax. This grammar was then used to drive the mutation and

crossover operations.

Figure 3.4: Corrupted chromosome caused by crossover

In this project, a simpler approach was used to solve the same problem. First, the section of the chromosome that specifies which methods should be invoked is separated from the section that specifies which parameters should be used. When a method needs three parameters, it reads three slots from the parameter section of the chromosome. If the next method requires two parameters, it reads the next two slots. To ensure that the parameters are of the right type, the chromosome does not specify the object to be used but instead specifies an index of the object, as shown in Figure 3.5. Since Autotest knows which types are needed to execute each method, the chromosome just needs to specify which object from the list of possible objects has to be used. Because the number of methods and the number of available types are not known in advance, the chromosome assumes a maximum number and the real index is computed as real index = chromosome index MOD list size, where list size is the length of the list of methods to call or of the list of available objects of a given type. With this approach, adding or removing a method call is very simple. Whenever a mutation makes the real index 0, the method call is removed, and when the real index is changed from 0 to a different number, a method call is added. With this approach different

mutation and crossover methods can be used without having to worry about the chromosome getting corrupted.

Figure 3.5: Valid chromosome crossover

3.2 Evolutionary Autotest

The evolutionary strategy is executed by specifying the E option when running Autotest. It starts by loading a file that specifies which parameters it should use and the chromosome files that store the values for these parameters. Then it checks whether the creation probability parameter is being used. If so, it sets the probability value. With Evolutionary Autotest, there are two new ways to select the methods to be tested. If the method call parameter is true, it computes the real index and selects the method with that index from the method table. If the method call parameter is false and the sequential method parameter is true, it selects the next method from the table of methods in sequence. When both the method call and sequential method parameters are true, the method call parameter is used. A random method is selected if both parameters are false. After Autotest invokes a method, it checks whether the seed parameter is being used. If so, it selects a new seed from the list of seeds, as shown in Figure 3.6.
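The method selection rules just described, together with the real index computation from section 3.1, can be sketched as follows. This is not the Autotest code (Autotest is written in Eiffel). The function names are hypothetical, and reserving slot 0 of the index range for "no call" is an assumption made here to model the rule that a real index of 0 removes the method call.

```python
import random


def real_index(chromosome_index, list_size):
    """real index = chromosome index MOD list size (genes are stored as floats)."""
    return int(chromosome_index) % list_size


def select_method(methods, use_method_call, use_sequential, gene, position, rng):
    """Pick the next method to invoke, following the priority described above.

    Returns None when the decoded index is 0, meaning the call is skipped.
    """
    if use_method_call:
        # Assumption: index 0 means "no call", so the range has one extra slot.
        index = real_index(gene, len(methods) + 1)
        return None if index == 0 else methods[index - 1]
    if use_sequential:
        return methods[position % len(methods)]
    return rng.choice(methods)  # both parameters false: random strategy


# Hypothetical feature names of a class under test.
METHODS = ["put", "remove", "item"]
CHOSEN = select_method(METHODS, True, True, 5.0, 0, random.Random(1))
```

Because any gene value decodes to a valid index, mutation and crossover can change the method section freely without producing an invalid call.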

Figure 3.6: Evolutionary Autotest 1 - loading chromosome and evolve.conf

The invoke method uses the creation probability to decide whether new objects should be created instead of reusing objects from the object pool. The pseudocode of this method is shown in Figure 3.7.

Figure 3.7: Evolutionary Autotest 2 - method call

When creating an object, Autotest checks whether the method call parameter is being used. If so, the constructor method is selected according to the real index computed using a value from the list of method calls encoded in the chromosome. If the method call parameter is not being used, Autotest selects a random constructor to instantiate the object. If it is creating a primitive type, it checks whether the primitive value parameter is true. If so, it will get

a value from the list of primitive values loaded for each primitive type. Figure 3.8 shows the pseudocode for the create new input object method.

Figure 3.8: Evolutionary Autotest 3 - object creation

When all the parameters are false, the evolutionary strategy becomes a random strategy.
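The object creation logic of this section can be summarised in a small sketch. The names `pick_input` and `create_integer`, the fallback range of -100 to 100, and the decision to add new objects to the pool are assumptions made for illustration only; the pseudocode in Figures 3.7 and 3.8 remains the reference.

```python
import random


def pick_input(pool, creation_probability, create_new, rng):
    """Create a new object with the given probability, otherwise reuse one from the pool.

    An empty pool always forces a creation. Adding the new object to the pool
    is an assumption of this sketch.
    """
    if not pool or rng.random() < creation_probability:
        obj = create_new()
        pool.append(obj)
        return obj
    return rng.choice(pool)


def create_integer(use_primitive_values, evolved_values, rng):
    """Return an Integer input: an evolved value if the parameter is on, else a random one."""
    if use_primitive_values and evolved_values:
        return rng.choice(evolved_values)
    return rng.randint(-100, 100)  # hypothetical range for the random strategy


# Hypothetical usage: the creation probability would come from the chromosome.
rng = random.Random(7)
pool = []
first = pick_input(pool, 0.25, lambda: create_integer(True, [782, 0, 600], rng), rng)
```

With every parameter switched off, both functions reduce to random choices, which matches the remark that the evolutionary strategy then becomes a random strategy.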

4 Experiments

4.1 Introduction

The experiments were divided into three groups, each concerned with a specific optimization of the system. The experiments from Group A were executed to find the best way to encode the chromosome by examining the effect of each parameter on the number of faults found. The experiments from Group B were executed to optimize the genetic algorithm by evaluating different genetic operators and probabilities. The last set of experiments, Group C, was executed to evaluate the effectiveness of evolutionary testing compared to random testing.

4.2 Setting

Twenty-two classes were randomly selected from two well-used libraries, time and lex, provided with EiffelStudio 6.3. The lex library provides a mechanism for building lexical analyzers from regular expressions and the time library provides an abstraction for date and time computation. The selected classes were divided into two sets, α and β. The α set listed in Table 4.1 was used for optimizing and validating the system and the β set listed in Table 4.2 was only used for validating the system. Tables 4.1 and 4.2 list the number of lines of code (LOC), the number of local features and the number of features including the inherited ones for each class. All experiments were executed on a single machine with an Intel Core 2 Quad Q6600 and 2 GB of RAM running Linux.

Table 4.1: Properties of the classes from the α set
Columns: Library, Class Name, LOC, Local features, Total features
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

In each experiment, a number of parameters used by the genetic algorithm have to be specified. These parameters were set according to each experiment with the goal of emphasizing the part of the algorithm being tested. For example, when evaluating the parameters for Autotest, it is important to have a bigger population size to increase the number of executions of Autotest. On the other hand, when evaluating crossover or mutation algorithms, it is important to increase the number of generations, since these operations are only performed at the end of each generation. In total six different settings were used. These settings are specified in Table 4.3.

Table 4.2: Properties of the classes from the β set
Columns: Library, Class Name, LOC, Local features, Total features
Rows: Lex (text filler, lex builder, lexical, linked automaton, metalex, ndfa, pdfa, scanning, state of dfa), with a sum

4.3 Experiments Group A

In section 3.1.1, a number of parameters that specify how test cases should be generated were identified, but not all parameters necessarily have a positive effect on the evolutionary algorithm. Some of these parameters might be very sensitive to modification or take very long to be optimized, and this may lead to an overall poorer solution. The goal of the experiments in group A is to find the best set of parameters to encode in the chromosome.

Table 4.3: Experiment settings
Columns: Configuration, Setting 1, Setting 2, Setting 3, Setting 4, Setting 5
Rows: population size, number of generations, mutation probability 0.6 α, crossover probability, replacement percentage

Experiment A1: Autotest parameters

As described in section 3.1.1, a total of 5 parameters that specify how test cases should be generated were identified. A total of 12 experiments were executed to find out how these parameters contribute to the number of unique faults found. These experiments were executed with genetic algorithm setting 1 specified in Table 4.3. The number of faults found for each class for different sets of parameters is shown in Table 4.4. The results show that the performance of the parameters depends on the class being tested. According to the results, there is no dominating parameter, as each parameter performed best for at least one class. For instance, the creation probability parameter, which had the worst performance overall, performed the best for the error list class. The method call parameter performed the best for both the dfa and date time validity checker classes. Since there are 32 possible combinations of parameters, it is infeasible to test all of them every time a class is tested. Thus the technique that optimizes the set of parameters used described in section was developed. This experiment was therefore the only experiment that did not use this technique.
This experiment also compared two methods for initializing the primitive values. The Primitives column of Table 4.4 shows the number of faults found when randomly initializing the primitive values and the Static analysis column shows the number

of faults found when initializing the primitive values by combining values extracted from the classes being tested with random values as described in section. The number of faults found using the static analysis technique increased considerably compared to random initialization. Figure 4.1 shows a comparison of the two approaches. Another interesting result was the poor performance of the creation probability parameter. The optimization of the primitive and the creation probability parameters decreased the number of faults found compared to the optimization of the primitive parameter alone. This negative effect was due to the range of values (0 to 1) used for this probability. According to (39), Autotest performs badly when the creation probability is far from the value of To improve the performance of the creation probability parameter, the range of possible values was narrowed to 0.2 to 0.3.

Table 4.4: The effect of Autotest parameters on the number of unique faults found
Columns: Library; Class name; Primitives; Primitives, Creation probability; Primitives, Sequential method call; Primitives, Seed; Primitives, Method call; Static analysis
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

4.4 Experiments Group B

As described in section 2.4, there are different mutation, crossover and population selection algorithms. In order to evaluate these genetic operators, a total of 65 experiments were executed.


Figure 4.1: Number of faults found using random and static analysis technique to select the initial primitive values

4.4.1 Experiment B1: Mutation probability

Mutation probability controls how often the mutation operator is applied to each gene. When the probability is too low, the genetic algorithm takes longer to converge, and when the probability is too high, the algorithm becomes unstable. To find the best value, 10 experiments were executed with five different mutation probabilities. The flip mutation algorithm and genetic algorithm setting 2 were used in these experiments. As shown in Table 4.5, the mutation probability does not seem to have a big impact on the overall performance as long as the probability is not too low. In this case the mutation probability of 0.4 was only slightly better than 0.8, finding two more unique faults.

Experiment B2: Mutation algorithm

The mutation algorithm has a direct impact on how the search space is explored. The three mutation algorithms described in section were evaluated. To find which mutation algorithm performed best, a total of 6 experiments were executed with setting 5. The number of unique faults found for each class is shown in Table 4.6. The results show that the flip mutation algorithm outperformed the swap mutation algorithm by 36% and the gaussian by 32%. The flip mutator performed the best for all classes except error list, as illustrated in Figure 4.2. One possible reason for the poor performance

Table 4.5: The effect of the mutation probability on the number of unique faults found
Columns: Library, Class Name
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

Table 4.6: The effect of mutation algorithms on the number of unique faults found
Columns: Library, Class Name, Flip, Swap, Gaussian
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total


of the swap mutator is that it never introduces new values into the chromosome, thus limiting the search space to the current values. One possible reason for the poor performance of the gaussian mutator compared to the flip mutator is that the gaussian mutation algorithm replaces the value of a gene by a close-by value, which leads to the exploration of states that are close to the current state.

Figure 4.2: Number of faults for mutation algorithms for each class

Experiment B3: Crossover probability

The crossover probability controls how much of the population will cross over. A low crossover probability may lead to very slow convergence, whereas a high value may lead to a high number of unfit individuals. To find a good crossover probability, a total of 14 experiments were executed using the uniform crossover algorithm. Genetic algorithm setting 5 was used. Table 4.7 shows the number of unique faults found for each class.
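The behaviour attributed to the three mutation algorithms in Experiment B2 can be illustrated with a sketch on a chromosome of floating-point genes. These are generic textbook-style operators written for this illustration, not the GAlib implementations used in the experiments; the function names and the `sigma` parameter are assumptions.

```python
import random


def flip_mutation(genes, p_mut, low, high, rng):
    """Replace a mutated gene by a fresh value drawn from the allele range."""
    return [rng.uniform(low, high) if rng.random() < p_mut else g for g in genes]


def swap_mutation(genes, p_mut, rng):
    """Exchange a mutated gene with another position; no new value is introduced."""
    out = list(genes)
    for i in range(len(out)):
        if rng.random() < p_mut:
            j = rng.randrange(len(out))
            out[i], out[j] = out[j], out[i]
    return out


def gaussian_mutation(genes, p_mut, low, high, sigma, rng):
    """Move a mutated gene to a nearby value, clipped to the allele range."""
    out = []
    for g in genes:
        if rng.random() < p_mut:
            g = min(high, max(low, rng.gauss(g, sigma)))
        out.append(g)
    return out


GENES = [10.0, 120.0, 300.0, 599.0]
SWAPPED = swap_mutation(GENES, 0.4, random.Random(3))
```

The sketch makes the explanation above concrete: `swap_mutation` only permutes existing values, `gaussian_mutation` stays near the current value when `sigma` is small, and `flip_mutation` can reach any value in the allele range in one step.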

Table 4.7: The effect of crossover probability on the number of unique faults found
Columns: Library, Class Name
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

The results show that the best crossover probability is around 0.4. Compared to the mutation probability, the crossover probability had a greater influence on the result. As shown in Figure 4.3, the crossover probability forms a curve with a peak at 0.4, whereas the mutation probability looks like a straight line. This indicates that the crossover algorithm may have a greater influence on the number of faults found than the mutation algorithm.

Figure 4.3: Effect of mutation and crossover probability on the number of faults

Experiment B4: Crossover algorithm

Combined with the mutation algorithm, the crossover algorithm specifies how the search space is explored. The crossover algorithm must be able to combine chromosomes in a way that affects all the values encoded in the chromosome. Since the chromosome is encoded by sections, where each section represents the values of a single parameter, the crossover algorithm must be able to mix all sections of the chromosome well. To find a good crossover algorithm, all algorithms described in section were evaluated. A total of fourteen experiments were executed using setting 5. The results are shown in Table 4.8.

Table 4.8: The effect of crossover algorithms on the unique number of faults found
Columns: Library, Class name, Uniform, Even Odd, One Point, Two Points, Partial Match, Order, Cycle
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

The results indicate that the crossover algorithms that modify more sections of the chromosome performed better than the ones that only modify a few. By modifying different parts of the chromosome, the algorithm has a higher chance of modifying the values of all encoded parameters instead of just one. As shown in Figure 4.4, the uniform and partial match crossovers performed much better than the order and cycle crossovers. The discrepancy between the results of the one point and two point crossover algorithms is strange. The difference in the number of unique faults found might be too high to be attributed to variation. More experiments would be required to investigate why one point crossover performed much better than two point crossover.
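The argument about sections can be made concrete with a sketch of two of the evaluated operators. These are generic versions written for illustration, not the GAlib operators, and the function names are hypothetical.

```python
import random


def one_point_crossover(a, b, rng):
    """Exchange the tails of two parents after a single random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def uniform_crossover(a, b, rng):
    """Decide independently for every gene which parent each child inherits from."""
    child1, child2 = [], []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


# Hypothetical parents: positions 0-2 stand for one section, 3-5 for another.
PARENT_A = [1, 2, 3, 4, 5, 6]
PARENT_B = [10, 20, 30, 40, 50, 60]
KIDS = uniform_crossover(PARENT_A, PARENT_B, random.Random(5))
```

With a single cut, every gene before the cut is inherited unchanged from one parent, so a whole section can pass through untouched. The uniform operator can mix genes in every section in one application.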


Figure 4.4: Comparison of crossover algorithms

4.4.5 Experiment B5: Selection method

The selection algorithm is very important to control the diversity of the population. As described in section 2.4.5, if the same individuals are constantly selected for reproduction, the population may become too similar and converge to a sub-optimal solution. To find a good selection mechanism, all algorithms described in section were evaluated in a total of 12 experiments with setting 4. The results are shown in Table 4.9.

Figure 4.5: Comparison of selection algorithms
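To illustrate how selection schemes differ in the pressure they put on the best individuals, three of the evaluated schemes can be sketched as below. The rank version follows the description in this section, where only the best individual is selected. The roulette wheel and tournament versions are generic formulations and may differ from the GAlib selectors used in the experiments. Fitness is the number of unique faults found, and all names and values are hypothetical.

```python
import random


def rank_select(population, fitness):
    """Always return the individual with the best score."""
    return max(population, key=fitness)


def roulette_select(population, fitness, rng):
    """Pick an individual with probability proportional to its score."""
    weights = [fitness(ind) for ind in population]
    if sum(weights) == 0:
        return rng.choice(population)
    return rng.choices(population, weights=weights, k=1)[0]


def tournament_select(population, fitness, rng, k=2):
    """Sample k distinct individuals and return the best of them."""
    return max(rng.sample(population, k), key=fitness)


# Hypothetical scores: unique faults found by three strategies.
FAULTS = {"strategy 1": 3, "strategy 2": 11, "strategy 3": 7}
POPULATION = list(FAULTS)
BEST = rank_select(POPULATION, FAULTS.get)
```

Rank selection as sketched here is the most elitist of the three, which is the property the results below associate with a higher number of unique faults.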

Table 4.9: The effect of different population selection schemes on the number of unique faults found
Columns: Library, Class Name, Rank, Roulette Wheel, Tournament, Uniform, Stochastic Remainder, Deterministic Sampling
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

Both the stochastic remainder and the rank method found a total of 110 unique faults. The rank algorithm does not perform any speciation technique; it only selects the best individuals, whereas the stochastic remainder will normalize the population as described in Among the six selection schemes evaluated, three had a similar result, as shown in Figure 4.5. These results indicate that elitism has a good effect on the number of unique faults found by the genetic algorithm, since the three best performing algorithms had a greater probability of selecting the individual with the best score. This holds in particular for the Rank algorithm, which constantly selects the individuals with the best score for reproduction. The crowding issue does not seem to arise for this type of genetic algorithm; one possible reason is the small number of generations. Because the computation of the objective function takes a long time, the number of generations has to be limited, which does not give crowding enough time to happen.

4.5 Experiments Group C

With the optimization of the chromosome and the genetic algorithm completed, the final system is compared to an automated random testing system represented by Autotest. The main

criterion for evaluation is the number of unique faults found within a given amount of time. To make the comparison fair, Autotest is executed for the same amount of time taken by the evolutionary algorithm to evolve a testing strategy and test the system using this strategy. Three executions of Autotest with different seeds were recorded for 15, 30 and 60 minutes and the average results are used for comparison. In addition to the set of classes used in the previous experiments, a new set of classes as described in section 4.2 was used for validating the results.

Experiment C1: Original Autotest

To determine the maximum number of unique faults Autotest can find within a certain amount of time, Autotest was executed three times for 15, 30 and 60 minutes. The average number of faults found for each class from the α set is shown in Table 4.10 and for the β set in Table 4.11.

Table 4.10: Number of unique faults found in the α set by the original Autotest
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

The variation between the three executions was not large. Classes with a higher number of faults had a greater variation than the ones with fewer faults. The variation on the

time library was very small: the total number of faults found in the classes from the time library varied by only one fault from the average result. Figure 4.6 shows the variation in the total number of faults found in all classes for the three executions of Autotest.

Table 4.11: Number of unique faults found in the β set by the original Autotest
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (text filler, lex builder, lexical, linked automaton, metalex, ndfa, pdfa, scanning, state of dfa), with a sum

Figure 4.6: Variation on the total number of faults found

The total number of faults found in each execution and the average for 15, 30 and 60 minutes are shown in Table 4.12.

Table 4.12: Total number of faults found in three executions of Autotest
Columns: Execution, 15 min, 30 min, 60 min
Rows: Average and the three executions

It can also be observed that most of the faults were found in the first few minutes of execution and that the number of new faults found decreased rapidly, as shown by Figure 4.7. This result is in accordance with the results from the random testing predictability study (39).

Figure 4.7: Autotest progress - Progress of number of faults found

Experiment C2: Autotest with static analysis

Using static analysis to select an initial set of primitive values is a technique that is independent of the evolutionary algorithm. Because this technique was used in the evolutionary testing system, Autotest was extended to use the same static analysis technique for selecting primitive values. As described in section , this approach extracts the primitive values from the classes it is testing to use as input for the data generation. This enhanced Autotest was then executed for 15, 30 and 60 minutes and the number of faults found in the α set is shown in Table 4.13 and the number of faults found in the β set in Table 4.14.

Table 4.13: Faults found in the α set by Autotest with static analysis
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

Table 4.14: Faults found in the β set by Autotest with static analysis
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (text filler, lex builder, lexical, linked automaton, metalex, ndfa, pdfa, scanning, state of dfa), with a sum

Autotest with static analysis found more faults than the original Autotest for the α set, as shown in Figure 4.8, and for the β set, as shown in Figure 4.9. This was expected, since using primitive values extracted from the source code increases the probability that test cases generated using primitive values are relevant. For example, if the source code has a boolean expression that evaluates x == 782, the probability of randomly generating the value 782 is very low. With static analysis, however, this value is immediately added to a list of relevant values.
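The extraction step can be approximated with a few regular expressions, in the spirit of the description in section 3.1: the source text is scanned without regard to its structure, so literals inside comments are collected as well. The patterns, the function name and the Eiffel-like sample below are assumptions made for illustration and do not reproduce the scanner used in this work.

```python
import re

# Integer or real literal that is not part of an identifier or a longer number.
NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?![\w.])")
# Single character between single quotes, such as 'q'.
CHARACTER = re.compile(r"'(.)'")


def extract_primitives(source_text):
    """Collect numeric and character literals from raw source text, comments included."""
    integers, reals = set(), set()
    for token in NUMBER.findall(source_text):
        if "." in token:
            reals.add(float(token))
        else:
            integers.add(int(token))
    characters = set(CHARACTER.findall(source_text))
    return {"Integer": integers, "Real": reals, "Character": characters}


# Hypothetical Eiffel-like fragment; the comment line is scanned like any other.
SAMPLE = "if x = 782 then\n    -- retry after 3.5 seconds\n    c := 'q'\nend\n"
FOUND = extract_primitives(SAMPLE)
```

In this sample the value 782 from the condition ends up in the Integer set, and 3.5 is collected although it only appears in a comment.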

Experiment C3: Evolutionary testing

With the crossover, mutation and selection algorithms and the probabilities selected, the final genetic algorithm can be composed. The final genetic algorithm uses the partial match crossover algorithm with a crossover probability of 0.4, the flip mutator with a mutation probability of 0.4 and the stochastic remainder algorithm to select the population for crossover. With the parameters chosen, the evolutionary algorithm is executed for 15, 30 and 60 minutes for both the α and the β set. This time includes both the evolution and the execution of a strategy. The time allocation used for each execution is shown in Table 4.15. These values were chosen based on a preliminary optimization (results not shown).

Table 4.15: Time allocation for each execution of the final system
Columns: Total time, Evolution, Execution
Rows: 15 min, 30 min, 60 min

Between the three executions, the population size, the number of generations and the size of the chromosome varied. For short executions it is better to have a smaller chromosome. A smaller chromosome will converge faster, but it may converge to a sub-optimal solution, so for longer executions it is better to have a larger chromosome. This effect is mainly due to the number of primitive values that are encoded in the chromosome. As described in section , this number defines the size of the chromosome. The settings used for the three executions are shown in Table 4.16.

Table 4.16: Settings for the final execution
Columns: Setting, 15 min, 30 min, 60 min
Rows: primitive set size, population size, number of generations, mutation probability, crossover probability

The evolutionary algorithm was first executed for the 13 classes from the α set. The number of unique faults found for each class is reported in Table 4.17.

Library  Class Name         15 min  30 min  60 min
Lex      automaton          9       9       9
Lex      dfa                16      17      17
Lex      error list         13      13      15
Lex      fixed automaton    17      23      31
Lex      fixed dfa          32      39      41
Lex      fixed integer set  2       2       2
Lex      high builder

Table 4.17: The number of unique faults found in the α set by the GA
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (automaton, dfa, error list, fixed automaton, fixed dfa, fixed integer set, high builder), Time (absolute, time, interval, duration, date time parser, date time validity checker), with a sum per library and a total

Figure 4.8: Evolutionary approach on α set - Number of faults found in the α set


A comparison between the original Autotest, Autotest with static analysis and the evolutionary algorithm, illustrated in Figure 4.8, shows that the evolutionary algorithm outperformed both the original Autotest and Autotest with static analysis. Because the classes from the α set were used for optimization, the system was tested against a new set of classes, the β set. The number of faults found in the β set using the evolutionary algorithm is shown in Table 4.18.

Table 4.18: The number of unique faults found in the β set by the GA
Columns: Library, Class Name, 15 min, 30 min, 60 min
Rows: Lex (text filler, lex builder, lexical, linked automaton, metalex, ndfa, pdfa, scanning, state of dfa), with a sum

Figure 4.9: Evolutionary testing on β set - Comparison of the three approaches using classes from the β set

The results for the β set also show the evolutionary algorithm as the best performing approach. However, the difference between Autotest with static analysis and the evolutionary algorithm is very small. One possible reason is that the evolutionary algorithm pays a big penalty when it is executed for short times, because it won't have enough time to evolve a good strategy. This penalty is even higher when a class has a high number of faults, since most of these faults are found within the first minutes. With longer executions, the time used to evolve a good strategy starts to pay off. While the rate at which Autotest finds new faults slows down, the evolutionary approach continues to find new faults at a faster pace. This pattern is shown in Figure 4.10.

Figure 4.10: Total number of faults found for all classes over time by the three approaches

The evolutionary algorithm performed considerably better when the classes being tested had fewer bugs. This can be seen in Figure 4.11, which shows a comparison of the three systems using only the classes from the time library.

Figure 4.11: Evolutionary approach time library - Comparison of the three approaches using classes from the time library
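The per-checkpoint counts reported in this section (unique faults after 15, 30 and 60 minutes) could be tallied from a timestamped fault log along the following lines. This is only a minimal sketch: the log format, the fault identifiers and the function name `unique_faults_at` are assumptions made for illustration and are not taken from Autotest.

```python
from typing import Dict, Iterable, Tuple


def unique_faults_at(
    log: Iterable[Tuple[float, str]],
    checkpoints_min: Tuple[int, ...] = (15, 30, 60),
) -> Dict[int, int]:
    """Count distinct faults first seen at or before each checkpoint.

    `log` holds (minutes_elapsed, fault_id) pairs. The fault_id format is
    hypothetical; any hashable key that identifies a fault would work.
    """
    first_seen: Dict[str, float] = {}
    for minute, fault_id in log:
        # Keep only the earliest time at which each fault was observed.
        if fault_id not in first_seen or minute < first_seen[fault_id]:
            first_seen[fault_id] = minute
    return {
        c: sum(1 for t in first_seen.values() if t <= c)
        for c in checkpoints_min
    }


# Made-up log, for illustration only (not data from the experiments).
log = [
    (2.0, "precondition:put"),
    (3.5, "void_call:item"),
    (3.9, "precondition:put"),   # repeat of an already known fault
    (20.0, "invariant:count"),
    (45.0, "void_call:next"),
]
counts = unique_faults_at(log)  # {15: 2, 30: 3, 60: 4}
```

Because repeated observations of the same fault are collapsed, the counts are cumulative and can only grow from one checkpoint to the next, which matches how the curves over time are read.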

5 Discussion

5.1 Types of faults found

To find out which types of faults the evolutionary algorithm discovers, the faults found by running the evolutionary algorithm for 60 minutes on the metalex class were analyzed. A total of 43 faults were found in 37 features; 8% of the metalex features contained at least one fault. There were four main types of faults:

1. Class invariant violation: One of the class invariant conditions is violated.
2. Precondition violation: The method violated the precondition of another method.
3. Call on a void object: The method tried to invoke a feature on a void object.
4. Memory / OS related: Faults such as not enough memory or file not found.

As Figure 5.1 shows, the predominant type of fault found was the precondition violation, followed by the call on a void object. Call-on-void faults are less relevant than the other types because Eiffel is becoming void-safe. That is, Eiffel will ensure at compilation time that there are no calls on void objects.

Figure 5.1: Distribution of the types of faults found in the metalex class

5.2 Parameters

The genetic algorithm was used both to find the parameter values and to determine which parameters should be used. During the initialization of the chromosome, the genes that specify whether a parameter should be used are randomly initialized to a value between -1 and 1. When this value is negative, the parameter it represents is not used. This initialization assumes that all parameters have the same importance to the system, but this is not true. By analyzing the chromosomes evolved during the 60-minute executions of the evolutionary algorithm in experiment C3, we can estimate the importance of each parameter to the system, taking a higher usage frequency to mean greater importance. Figure 5.2 shows how often each parameter was used. The primitive values parameter was used in all successful strategies, which highlights its importance. The seed parameter, on the other hand, was used only 7.69% of the time. These usage frequencies matter because they can inform chromosome initialization and thus could considerably decrease the time needed to evolve good strategies. For example, the primitive values parameter should always be evolved, whereas the seed parameter can most likely be left out. Although these values can be used to initialize the chromosome, experiment A1 showed that each class benefits differently from the optimization of each parameter. From this we can conclude that evolutionary testing is specific to a single class and might not generalize well to a set of classes.
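The initialization scheme and the usage-frequency analysis described above can be sketched as follows. This is an illustrative sketch, not the thesis implementation: the function names, the dictionary-based chromosome and the way a usage prior is turned into a gene value are assumptions. Only the two parameters named in the text are used, and the prior values 1.0 and 0.0769 simply restate the reported frequencies.

```python
import random
from typing import Dict, List, Optional


def is_used(gene: float) -> bool:
    """A parameter is used unless its enable gene is negative."""
    return gene >= 0.0


def init_enable_genes(
    names: List[str],
    rng: random.Random,
    usage_prior: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Initialize one enable gene per parameter in [-1, 1].

    Without a prior, genes are uniform in [-1, 1], as described in the text.
    With a prior (an assumption of this sketch), the gene is non-negative
    with probability usage_prior[name].
    """
    genes: Dict[str, float] = {}
    for name in names:
        if usage_prior is None:
            genes[name] = rng.uniform(-1.0, 1.0)
            continue
        p = usage_prior.get(name, 0.5)
        magnitude = rng.random()  # in [0, 1)
        if rng.random() < p:
            genes[name] = magnitude            # in [0, 1): parameter used
        else:
            genes[name] = -(1.0 - magnitude)   # in [-1, 0): parameter unused
    return genes


def usage_frequency(
    chromosomes: List[Dict[str, float]], names: List[str]
) -> Dict[str, float]:
    """Fraction of chromosomes in which each parameter is enabled."""
    total = len(chromosomes)
    return {n: sum(is_used(c[n]) for c in chromosomes) / total for n in names}


names = ["primitive_values", "seed"]
rng = random.Random(0)
prior = {"primitive_values": 1.0, "seed": 0.0769}
population = [init_enable_genes(names, rng, prior) for _ in range(1000)]
freq = usage_frequency(population, names)
```

With such a prior, the initial population already reflects the observed frequencies, so fewer generations are spent rediscovering that the primitive values parameter should be enabled. Since experiment A1 suggests that the benefit differs per class, a prior of this kind would probably need to be estimated per class rather than shared.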


Figure 5.2: Usage frequency of each parameter

5.3 Conclusion

Based on the results presented in this thesis, we draw a number of conclusions. These conclusions are grouped in three sections below.

1. Genetic operators

In this work, 3 mutation, 7 crossover and 6 population selection algorithms were evaluated. The results of experiment B2 showed that it is important to introduce new random values in the chromosome and to increase the diversity of the population. Among the three mutation algorithms evaluated, the flip mutation algorithm was the best. It outperformed the other two mutation algorithms 92% of the time. The performance of a crossover algorithm seems to depend on how the chromosome is defined. Because the chromosome, in this case, contained many parameters, the crossover algorithms that affected many sections of the chromosome performed better. One of the main purposes of the population selection algorithm is to control the crowding problem. However, the system did not seem to have a crowding problem. One possible reason is the low number of generations. Because the evaluation of the population takes some time, the number of generations has to be low, which may not leave enough time for crowding to develop.
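The two observations about operators can be illustrated with a small sketch. It assumes that flip mutation replaces a gene with a freshly drawn random value in its range (the text does not define the operator at this point), and it uses uniform crossover as a stand-in for a crossover that affects many sections of the chromosome. The thesis does not state here which of its seven crossover algorithms behaved this way, so both functions are illustrative choices rather than the evaluated implementations.

```python
import random
from typing import List, Tuple


def flip_mutation(
    chromosome: List[float],
    rate: float,
    rng: random.Random,
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Replace each gene, with probability `rate`, by a new random value.

    Assumed reading of "flip mutation": it injects new random values into
    the chromosome, which increases population diversity.
    """
    return [
        rng.uniform(low, high) if rng.random() < rate else gene
        for gene in chromosome
    ]


def uniform_crossover(
    a: List[float], b: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Swap genes between two parents independently at every position.

    Unlike one-point crossover, this can mix genes from every section of
    the chromosome in a single application.
    """
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    child1: List[float] = []
    child2: List[float] = []
    for x, y in zip(a, b):
        if rng.random() < 0.5:
            x, y = y, x
        child1.append(x)
        child2.append(y)
    return child1, child2


rng = random.Random(42)
parent_a = [0.1, -0.4, 0.9, -0.2]
parent_b = [-0.7, 0.3, -0.5, 0.8]
child_a, child_b = uniform_crossover(parent_a, parent_b, rng)
mutant = flip_mutation(child_a, rate=0.25, rng=rng)
```

Every gene of each child comes from one of the two parents at the same position, so crossover alone never creates new values. New values enter the population only through mutation, which is consistent with the importance experiment B2 attaches to introducing random values.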
