Towards Byte Code Genetic Programming 1

Brad Harvey
Compaq/Tandem Division
14231 Tandem Bl.
Austin, Texas 78728
brad.harvey@compaq.com
(512) 432-7142

James A. Foster
Computer Science Dept.
University of Idaho
Moscow, Idaho 83844-1010
foster@cs.uidaho.edu
(208) 885-7062

Deborah Frincke
Computer Science Dept.
University of Idaho
Moscow, Idaho 83844-1010
frincke@cs.uidaho.edu
(208) 885-9052

ABSTRACT

This paper uses the GP paradigm to evolve linear genotypes (individuals) that consist of Java byte code. Our prototype GP system is implemented in Java using a standard Java development kit (JDK). The evolutionary process is done completely in memory, and the fitness of individuals is determined by directly executing them in the Java Virtual Machine (JVM). We validate our approach by solving a functional regression problem with a fourth-degree polynomial, and a classification problem diagnosing thyroid disease. Our implementation provides a fast, effective means for evolving native machine code for the JVM.

1. Introduction

An increasing number of software vendors are including Java, or Java-based components, as strategic parts of their systems. For example, the leading browser vendors (Microsoft, Netscape, and Sun) support Java applets. The leading database vendors (Oracle, IBM, Tandem, Informix, etc.) are collaborating on standards for embedding SQL in Java and Java within the database engine. A leading smart card vendor (Schlumberger) offers a subset of the Java operating environment on their smart card, so that smart card application developers can develop applications using standard Java development tools. Numerous other vendors in more non-traditional computing environments (e.g., embedded devices or real-time operating systems) are in the process of supporting the Java environment. One drawback of using Java for time-critical applications is performance.
Java programs are slower than conventional compiled programs because a software virtual processor executes them. However, with just-in-time (JIT) compilers, adaptive compilers, and the possible future option of having at least part of the JVM execution engine implemented in hardware [McGhan and O'Connor, 1998], this performance gap will only shrink.

Genetic Programming (GP) has proven to be a very powerful paradigm for solving diverse problems from a variety of domains, including regression problems, robot control, hardware design, and protein segment classification. Therefore, because of Java's rapid success as an operating environment and because of the power of GP, we explore using GP to directly evolve Java byte code in the context of a standard Java environment. Our work was initially inspired by Nordin's [1994] use of GP to evolve RISC machine code. We, along with other researchers, are exploring byte code GP [Harvey, et al., 1998][Klahold, et al., 1998][Lukschandl, et al., 1998].

The remainder of this paper is structured as follows. Section 2 provides background on the JVM. Section 3 discusses related work. Section 4 discusses our work. Section 5 provides experimental results. Section 6 concludes the paper.

1 The basis for this paper appeared in the late breaking papers of GP-98.

2. Background

The JVM [Lindholm and Yellin, 1996] executes a program that is contained in a set of class files. A class file is a stream of bytes, conceptually similar to an executable a.out file in UNIX. For example, it contains a magic number, versioning information, a constant pool (similar to a symbol table), and, for each method in the class, the JVM instructions that constitute the method. There is a 1-1 relationship between a Java class in a Java source file and a binary class file: multiple Java classes can reside in a single source file, but each Java class has its own binary class file after it has been compiled. Normally class files are created using a Java compiler (e.g., javac). However, the class file format is not dependent on the Java language. In other words, tools or compilers for languages unrelated to Java could generate class files that are executed by the JVM runtime system, provided they are valid with respect to the JVM specification.

The JVM execution engine is a stack-based virtual processor with an instruction set of more than 150 instructions, most of which are low-level instructions similar to those found in an actual hardware processor. Most opcodes occupy a single byte and instructions vary in length, so the JVM does not have a fixed instruction format like a RISC processor. The JVM does not contain a set of explicit general-purpose registers like most conventional processors.

The JVM supports the notion of separate threads of execution. Each thread has its own program counter and stack. Each call in the context of a thread generates a new stack frame on the thread's stack. Each stack frame contains local variables, an operand stack, and other state-related information. In many ways, the local variables serve the same purpose as registers (e.g., they contain the method's parameter values). The operand stack is used by the JVM instructions as a stack-based scratch pad.
For example, the floating-point add instruction (fadd) pops two values from the operand stack, adds them, and pushes the result back on the operand stack. Each method's maximum operand stack size is determined at compile time.

The JVM loads and links classes dynamically at runtime using symbolic references. The JVM contains a system class loader that knows the details of loading classes from the local file system. However, applications that load classes from other sources provide their own version of the class loader. For example, a Java-enabled browser contains a class loader that knows how to load applets over the network from a URL.

3. Related Work

3.1 Automatic Induction of Machine Code with GP (AIMGP)

The main motivation for AIMGP [Nordin, 1994] (initially referred to as CGPS, for Compiling GP System) is performance and a compact representation scheme. AIMGP departs from tree-structured individuals [Koza, 1992] and instead represents them as linear bit strings of actual machine code. It takes advantage of the common von Neumann architecture, in which a program and its data reside in the same memory space, enabling a program to manipulate itself as it would data. While the individuals of the evolutionary process are represented as machine code, the GP kernel, including the genetic operators, is written in C. The system has been implemented on Sun SPARC workstations, which are register-based machines with a 32-bit instruction format. Nordin and his colleagues have recently released a commercial implementation of AIMGP technology for Pentium machine code, known as Discipulus.

The benefits of the AIMGP approach compared to the conventional GP approach are:
- Execution speed: no interpretation. An individual's fitness is computed by directly executing machine code. Indications are that this speed-up is between 1500 and 2000 times compared to the tree-structured Lisp approach.
- Compact GP kernel (30KB): a Lisp interpreter is not required.
- Compact representation: a single 32-bit SPARC instruction can represent four nodes in a conventional approach (operator, two operands, and result).
- Simple memory management: due to the linear structure of the binary machine code.

3.2 Java Bytecode GP (JBGP)

The initial objectives for JBGP [Lukschandl, et al., 1998] are to use a standard Java environment for the evolution of byte code and to master the difficulties of interacting with the Java verifier and class loader. In JBGP's representation scheme the genotype and phenotype are not the same structure. Both are linear in nature and contain byte code, but the genotype contains additional information to aid the genetic operators in producing valid byte code with respect to stack depth and branch addresses. The genotype structure contains an individual's maximum stack depth and an array of Instruction structures. An Instruction structure contains additional information per byte code instruction, such as the number of pushes and pops, the local maximum stack depth, the branch offset, and an array of bytes that represents the actual byte code for a JVM instruction. The phenotype structure consists purely of Java byte code. Initially a population consists of an array of individuals in their genotype format. In order to evaluate the fitness of a population, it is transformed into a Java class file. A class file is created with each individual in the population represented by its phenotype structure as a Java

method. This process involves copying and concatenating the byte code that resides in each Instruction data structure into the class area. Once the class has been constructed, it is loaded into the JVM using a class loader so that each individual's fitness can be computed by calling it as a Java method. JBGP supports crossover and mutation as well as fitness-proportional and ranking selection. JBGP is implemented using Symantec's Visual Café Java development environment.

3.3 Java Bytecode Program optimization with Evolutionary Techniques (JAPHET)

The motivations for JAPHET [Klahold, et al., 1998] are: the platform independence of the JVM with respect to the evolution of JVM machine code, the object-oriented aspect of the byte code, which can be used to evolve complex structures, and the fact that other high-level languages can be compiled to byte code (e.g., Ada). In JAPHET the representation scheme is a linear genotype in which an individual is represented as a set of Java classes. In this scheme, each class file has a dynamic part and a static part. The dynamic part of the class file, which contains the methods, is affected by the genetic operators; the static part, which contains fields such as version information and the constant pool, is not. The initial class file used to build up individuals during generation zero is supplied by the user. The system extends this class file as it adds byte code to its methods. JAPHET supports several crossover operators and the mutation operator, along with fitness-proportional, rank, and tournament selection. JAPHET is written in Java.

4. bcgp

Like other researchers, our initial work with byte code GP has been to understand the issues of evolving Java byte code in the context of a standard Java environment.
This is in contrast to other possible approaches, such as building a custom JVM, modifying a JVM, or using native methods to facilitate the GP process. We call our system bcgp, for byte code GP, and we have experimented with it on problems of symbolic regression and classification. The initial goals for our work are:
- Use a standard Java development environment (Sun's JDK 1.1).
- Create a prototype GP system, written in Java, that evolves individuals consisting of Java byte code.
- Support direct execution of these individuals in the context of the standard JVM.
- Enable the evolutionary process to occur completely in memory.

4.1 Representation

The representation scheme used by bcgp is as follows. Each generation of a run is represented by its own in-memory, dynamically created Java class file, with a method in it for each individual in the population. An individual under this scheme is a linear genotype consisting of Java byte code. There is no special phenotype representation in this scheme. The representation of an individual is similar to the one used in AIMGP [Nordin, 1994]. Specifically, each individual contains a header, body, footer, and buffer. The header, body, and footer consist of byte code operations which support the terminal and function sets, along with additional byte code required to support these sets.

Generation classes and methods are created using a naming scheme that logically identifies them. The naming scheme for a class representing a generation is g<run #><gen #>.class, where <run #> is a number between 0 and 9 indicating which run this is, and <gen #> is a number between 00 and 99 indicating the generation's number. For example, the generation class g000.class represents generation 0 of run 0. The naming scheme for an individual is f<individual #>, where <individual #> is a number between 00000 and 65535 (2^16-1). For example, if the population size is 100, then each generation class contains a set of functions f00000 through f00099.
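The naming scheme above can be sketched in Java. The class and method names below follow the g<run #><gen #> / f<individual #> patterns from the text; the helper class and method names themselves are illustrative assumptions, not bcgp's actual code.

```java
// Sketch of the generation/individual naming scheme described above.
// (Helper names are illustrative, not taken from the bcgp source.)
public class Naming {
    // g<run #><gen #>: run is a single digit 0-9, generation is 00-99
    static String generationClassName(int run, int gen) {
        return String.format("g%d%02d", run, gen);
    }

    // f<individual #>: individual number is 00000 through 65535
    static String individualMethodName(int index) {
        return String.format("f%05d", index);
    }

    public static void main(String[] args) {
        System.out.println(generationClassName(0, 0));  // g000 -> generation 0 of run 0
        System.out.println(individualMethodName(54));   // f00054
    }
}
```

With a population of 100, for example, the methods of each generation class would run from f00000 through f00099, matching the text.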
While each generation is represented by its own in-memory Java class file, only two generations are ever present in memory at a time: the current generation and the next generation. The current generation is the one whose members are undergoing fitness evaluation, selection, and reproduction. For a given invocation of the system, every generation class has the same uniform shape and size: each generation has the same name length and the same number of methods, each method has the same name length, and each method has the same size when its buffer area is taken into account. This yields an efficiency from a storage and manipulation perspective: two fixed-size byte buffers are used to store the current and next generations. The two buffers are represented as a 2 x n array, where n is the maximum size of an individual. Since the Java class file represents its methods as a table, the byte offset within a buffer for a given individual can be computed from the fixed sizes of the various aspects of the class file. We exploit this ability to index into the buffers representing the current and next generations, since the genetic operators directly manipulate the byte code contained in the methods of the class files.

Another aspect of our representation scheme is that the effective size of each individual, which ignores the buffer area at the end of the individual, is stored externally to the genotype (Java method). The sizes are contained in two arrays, representing the current and next generations; each array has m elements, where m is the population size. Storing such meta-data in auxiliary structures avoids a representation that requires both a genotype and a phenotype structure.

The underlying Java facilities used to support this representation scheme are:

- The dynamic creation of in-memory generation classes, with methods initialized to all no-ops, uses the JAS class builder package [Meyer and Downing, 1997]. These classes are sound with respect to the JVM specification.
- bcgp provides its own class loader that knows how to load these in-memory class files.
- During fitness evaluation, the Java class and reflection facilities (java.lang.Class and java.lang.reflect, respectively) are used to find the generation's individuals and then to invoke each individual (Java method).
- At runtime, these in-memory class files can be written to disk for later incorporation into other applications, or they can be inspected with other Java tools (e.g., javap, the class file disassembler).

4.2 Genetic Operators

bcgp supports mutation, crossover, and reproduction. In all cases only the byte code representing an individual's body (excluding its header and footer) is modified. The mutation operator randomly changes a byte code operation's opcode and/or its operand values; the mutated opcode is taken from the function set. We used single-point crossover between two individuals of different lengths to produce a new individual in the next generation. We also used restrictions (or repair) on crossover to ensure valid byte code individuals, depending upon the JVM byte code involved. For example, the conditional byte codes, such as those used in the classification experiments, require that the branch addresses be repaired after crossover. The reproduction operator simply copies an individual from the current generation to the next generation.

4.3 Fitness Evaluation

An individual's raw fitness is based on its error rate with respect to the training data: one minus the ratio of hits to test cases, a real number between 0 and 1. A hit occurs when an individual provides the correct answer to a specific training case, so the more fit an individual, the closer its raw score is to zero.
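This raw-fitness computation can be sketched as follows. The interface and names here (rawFitness, the tolerance parameter, the use of a Function to stand in for an evolved method) are our own illustrative assumptions, not bcgp's actual API.

```java
import java.util.function.Function;

// Sketch of raw fitness as an error rate: a "hit" is an answer within the
// error tolerance of the target, and a perfect individual scores 0.0.
public class Fitness {
    static double rawFitness(Function<Float, Float> individual,
                             float[] inputs, float[] targets, float tolerance) {
        int hits = 0;
        for (int i = 0; i < inputs.length; i++) {
            if (Math.abs(individual.apply(inputs[i]) - targets[i]) <= tolerance) hits++;
        }
        return 1.0 - (double) hits / inputs.length;  // 0.0 = all cases correct
    }

    public static void main(String[] args) {
        // Training points for the target polynomial x^4 + x^3 + x^2 + x
        float[] xs = {-1f, -0.5f, 0f, 0.5f};
        float[] ys = new float[xs.length];
        for (int i = 0; i < xs.length; i++) {
            float x = xs[i];
            ys[i] = x*x*x*x + x*x*x + x*x + x;
        }
        // A perfect individual (Horner form) has raw fitness 0.0
        Function<Float, Float> perfect = x -> ((x*x + x)*x + x)*x + x;
        System.out.println(rawFitness(perfect, xs, ys, 0.01f));  // 0.0
    }
}
```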
bcgp also supports, via a configurable parameter, augmenting the raw fitness score with the adaptive parsimony pressure method [Zhang and Muehlenbein, 1996][Banzhaf, et al., 1998].

4.4 Selection Method

We have been using k-tournament selection, with k a configurable parameter.

5. Experiments

In the following sections, we discuss experimental results using bcgp on the problems of symbolic regression and binary classification.

5.1 Symbolic Regression

In this experiment bcgp solves the symbolic regression problem for the function f(x) = x^4 + x^3 + x^2 + x, with a training data set consisting of twenty points in the interval [-1, 1). Table 1 summarizes the GP parameters for this experiment.

Parameter                     Value
Function set                  fadd, fsub, fmul, fdiv
Terminal set                  x (independent variable)
Error tolerance               0.01
Runs per experiment           10
Generations per run           25
Population size               64
Max individual size (bytes)   64
Selection method              Tournament (4)
Probability of crossover      0.8
Probability of mutation       0.0
Probability of reproduction   0.2

Table 1 - Symbolic Regression Parameters

During this experiment the target function is actually found, rather than merely approximated. In all but one run a successful solution is found. The best-of-all solution from the standpoint of both accuracy and efficiency is found during run 6, generation 10 (individual 54). The byte code for the best-of-all individual (54 of run 6, generation 10) is:

fload_0; fload_0; fmul; fload_0; fadd; fload_0; fmul; fload_0; fadd; fload_0; fmul; fload_0; fadd; freturn.
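Because the JVM is stack based, an evolved sequence like this can be checked by hand-simulating the operand stack. The small interpreter below is our own illustrative sketch (not part of bcgp); it replays the best-of-all individual and confirms it computes x^4 + x^3 + x^2 + x.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hand-simulation of the evolved operand-stack program above.
// Opcode mnemonics follow the JVM instruction names used in the text.
public class StackSim {
    static float run(String[] code, float x) {
        Deque<Float> stack = new ArrayDeque<>();
        for (String op : code) {
            switch (op) {
                case "fload_0": stack.push(x); break;  // push local variable 0 (the argument x)
                case "fadd": stack.push(stack.pop() + stack.pop()); break;
                case "fmul": stack.push(stack.pop() * stack.pop()); break;
                case "fsub": { float b = stack.pop(), a = stack.pop(); stack.push(a - b); break; }
                case "fdiv": { float b = stack.pop(), a = stack.pop(); stack.push(a / b); break; }
                case "freturn": return stack.pop();
            }
        }
        throw new IllegalStateException("program ended without freturn");
    }

    public static void main(String[] args) {
        String[] best = {"fload_0","fload_0","fmul","fload_0","fadd",
                         "fload_0","fmul","fload_0","fadd",
                         "fload_0","fmul","fload_0","fadd","freturn"};
        float x = 0.5f;
        System.out.println(run(best, x));               // 0.9375 (evolved program)
        System.out.println(x*x*x*x + x*x*x + x*x + x);  // 0.9375 (target polynomial)
    }
}
```

Tracing the stack by hand gives the same result: x*x, +x, *x, +x, *x, +x, i.e. the Horner evaluation of the target polynomial.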

For comparison, the byte code of the longest still-successful individual (33 of run 8, generation 8) is:

fload_0; fload_0; fmul; fload_0; fmul; fload_0; fdiv; fload_0; fmul; fload_0; fdiv; fload_0; fmul; fload_0; fdiv; fload_0; fadd; fload_0; fmul; fload_0; fdiv; fload_0; fmul; fload_0; fadd; fload_0; fmul; fload_0; fdiv; fload_0; fmul; fload_0; fdiv; fload_0; fmul; fload_0; fadd; freturn.

While not as obvious, this is again an exact solution, and upon factorization individual 33 is identical to 54. Of course, individual 54 is more efficient than 33, and it is optimal with respect to its computations (i.e., 3 multiplications and 3 additions). In this experiment, the fload_0 operation (push the float in local variable zero) is the byte code used to reference the independent variable (x), and the freturn operation (return from the method with a float result) is the footer. The best-of-all individual (54) can be interpreted as the following Java method:

public static float f00054(float x) { return ((x * x + x) * x + x) * x + x; }

5.2 Classification

In this problem bcgp predicts the class, sick (positive class) or not sick (negative class), of a thyroid patient based upon various patient features (e.g., age and thyroid hormone levels). This experiment uses a public-domain thyroid disease database [Blake, et al., 1998]. For comparison purposes, we also applied an artificial neural network (NN) to the problem. Table 2 summarizes the training and test data sets used in the experiment. Cases with missing values are not used. We encoded feature values as integers. For example, we represent the gender feature, with values 'M' and 'F', as 1 and 0, respectively. The various measurement-related features, which have floating-point values in the database, are linearly scaled into integers. We trained and tested two neural networks for this problem: one without overfitting protection and one with it (NN+).
Overfitting protection is a technique that prevents a model from memorizing the training set so that it can generalize to unseen data. Table 3 summarizes the GP parameters for this problem. We summarize two experiments with bcgp. Table 4 presents the accuracy and error rate (percentages) for the first experiment. In the first experiment, the results reported for bcgp are for the best-of-all individual (22), which is found during run 0, generation 14, and has a size of 36 bytes. The byte code for this individual (including instruction offsets) is:

 0 iload 17
 2 sipush 3493
 5 if_icmpge 34
 8 iload 17
10 sipush 10265
13 if_icmpeq 34
16 iload 17
18 sipush 3860
21 if_icmpgt 34
24 iload 5
26 sipush 1
29 if_icmpgt 34
32 iconst_1
33 ireturn
34 iconst_0
35 ireturn

In the second experiment, bcgp includes adaptive parsimony pressure (APP). The best individual (53) has the same predictive accuracy as in the first experiment, but it has a size of only 11 bytes. In the first experiment (without APP), the mean size of the best individual of each run is 26.4, while with APP it is 15.2. The byte code for individual 53 is:

 0 iload 17
 1 sipush 3331
 4 if_icmpge 9
 7 iconst_1

 8 ireturn
 9 iconst_0
10 ireturn

Set     Size   Negative (%)   Positive (%)
Train   1946   1788 (91.9)    158 (8.1)
Test     645    592 (91.8)     53 (8.2)

Table 2 - Training and Test Data Sets

Parameter                     Value
Function set                  if_icmpeq, if_icmpne, if_icmplt, if_icmpgt, if_icmple, if_icmpge
Terminal set                  21 features describing a case
Runs per experiment           10
Generations per run           25
Population size               64
Max individual size (bytes)   64
Selection method              Tournament (4)
Probability of crossover      0.8
Probability of mutation       0.5
Probability of reproduction   0.2

Table 3 - Classification Parameters

Method   Train      Test
bcgp     96.0/4.0   94.7/5.3
MLP      91.9/8.1   91.8/8.2
MLP+     97.3/2.7   95.8/4.2

Table 4 - Accuracy/Error %

A classifier that learns must do better than naive prediction, which simply always picks the higher-prevalence class. For this problem that is the negative class, giving naive predictions of 91.9% and 91.8% on the training and test sets, respectively (Table 2). Both bcgp and NN+ do better than this. The neural network with overfitting protection enabled does slightly better than bcgp; however, since bcgp is not trained with overfitting protection, this is not a completely fair comparison.

In this experiment, the iload operation (push an integer from a local variable) is the byte code used to reference the independent variables (case features), sipush is used to support the creation of partially evolved feature values, and the last four byte codes (iconst_1, ireturn, iconst_0, ireturn) are the footer used to return a boolean giving the method's predicted classification of a case. Individual 53, after un-scaling, can be represented as the rule: IF T3 < 1.12248 THEN Class = Sick ELSE Class = Negative. T3 is a patient's triiodothyronine level, with a normal value being in the range 1.2 to 2.8.

6. Conclusion

In this paper we evolve Java byte code in a standard Java environment.
Our technique generates in-memory class files to represent generations, with each class file containing methods that represent the individuals in the population for the given generation. The fitness of an individual is computed by directly executing the byte code contained in a method using the JVM in the bcgp host application. Other researchers are also independently exploring this idea. While the AIMGP system's performance evolving real machine code is encouraging, the disadvantage is that the GP system and its evolved solutions are platform-dependent. The performance gain of low-level GP and the platform independence of the JVM are, of course, motivations for Java-related

research. From the Java GP perspective, bcgp, JBGP, and JAPHET all implement Java GP. However, they differ considerably in their approaches and possibly in the problems they can solve. JBGP appears to be a reasonable approach to evolving byte code in general, considering the difficulties of handling stack depths and branch addresses in the stack-based architecture of the JVM. The idea and motivation for this scheme are a key contribution to the concept of Java GP. However, nothing is without cost: this scheme requires both a genotype and a phenotype representation of an individual, with data copying to transform one into the other. JAPHET takes a very different approach to representation than either bcgp or JBGP. While the generation-per-class-file approach used by both bcgp and JBGP is limited to 2^16 individuals, JAPHET appears to have no such limit. Nevertheless, 2^16 individuals is more than enough capacity to handle most (if not all) of the problems discussed in the literature. bcgp keeps the entire GP process in memory, including the representation of a generation as a Java class file; the references for JBGP and JAPHET do not discuss memory usage versus the file system. Finally, one motivation for byte code GP is the performance of low-level code compared to a higher-level language approach. However, none of these systems (bcgp, JBGP, and JAPHET) has undergone this performance analysis yet.

We have tested bcgp on two problems, which are representative of broad classes of practical applications: functional regression and classification. Our system performed efficiently and accurately on these problems.

7. Bibliography

[Banzhaf, et al., 1998] Banzhaf, W., P. Nordin, R. Keller, and F. Francone (1998). Genetic Programming, An Introduction: On the Automatic Evolution of Computer Programs and Its Applications. Morgan Kaufmann, San Francisco, CA.

[Blake, et al., 1998] Blake, C., E. Keogh, and C. J.
Merz (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/mlrepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

[Harvey, et al., 1998] Harvey, B., J. A. Foster, and D. Frincke (1998). Byte Code Genetic Programming. In: Late Breaking Papers at the Genetic Programming 1998 Conference, University of Wisconsin, Madison.

[Klahold, et al., 1998] Klahold, S., S. Frank, R. Keller, and W. Banzhaf (1998). Exploring the Possibilities and Restrictions of Genetic Programming in Java Bytecode. In: Late Breaking Papers at the Genetic Programming 1998 Conference, University of Wisconsin, Madison.

[Koza, 1992] Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA.

[Lindholm and Yellin, 1996] Lindholm, T. and F. Yellin (1996). The Java Virtual Machine Specification. Addison-Wesley.

[Lukschandl, et al., 1998] Lukschandl, E., M. Holmlund, E. Moden, M. Nordahl, and P. Nordin (1998). Induction of Java Bytecode with Genetic Programming. In: Late Breaking Papers at the Genetic Programming 1998 Conference, University of Wisconsin, Madison.

[McGhan and O'Connor, 1998] McGhan, H. and M. O'Connor (1998). PicoJava: A Direct Execution Engine for Java Bytecode. IEEE Computer 31, 10 (October 1998).

[Meyer and Downing, 1997] Meyer, J. and T. Downing (1997). Java Virtual Machine. O'Reilly & Associates, Sebastopol, CA.

[Nordin, 1994] Nordin, P. (1994). A Compiling Genetic Programming System that Directly Manipulates the Machine Code. In: Kinnear, Jr., K. E., editor, Advances in Genetic Programming. MIT Press, Cambridge, MA.

[Nordin and Banzhaf, 1995] Nordin, P. and W. Banzhaf (1995). Evolving Turing-Complete Programs for a Register Machine with Self-Modifying Code. In: Eshelman, L., editor, Genetic Algorithms: Proceedings of the Sixth International Conference, Pittsburgh, PA. Morgan Kaufmann, San Francisco, CA.

[Zhang and Muehlenbein, 1996] Zhang, B. and H. Muehlenbein (1996). Adaptive Fitness Functions for Dynamic Growing/Pruning of Program Trees. In: Angeline, P. J. and Kinnear, Jr., K. E., editors, Advances in Genetic Programming 2. MIT Press, Cambridge, MA.