Interaction of JVM with x86, Sparc and MIPS


Sasikanth Avancha, Dipanjan Chakraborty, Dhiral Gada, Tapan Kamdar
{savanc1, dchakr1, dgada1, kamdar}@cs.umbc.edu
Department of Computer Science and Electrical Engineering
University of Maryland Baltimore County
1000 Hilltop Circle, Baltimore, MD 21250

ABSTRACT

A Java class is a perfect example of architecture-independent code. A Java program can be compiled on a MIPS R4400-based Indy workstation running Irix 6.5 and the generated class executed on an Intel Pentium III-based Windows 95 system with no problems. This independence is possible because any system claiming to support Java implements a Java Virtual Machine (JVM). This paper presents a detailed analysis of the interaction of a JVM with three popular processor architectures: the Intel x86, the Sun UltraSparc and the SGI MIPS. The analysis shows that each architecture performs better than the other two in some respects, but no single architecture is the best for Java programs.

1 Introduction

The Java Virtual Machine (JVM) is a powerful concept that allows platform- and architecture-independent program development. This independence is achieved chiefly by compiling a Java program into JVM instructions, called bytecodes, rather than into native instructions of the underlying architecture. The JVM executes these bytecodes by first translating them into native code and then executing the native code. Certain questions immediately arise about the native code generated and executed by the JVM. In this paper, we ask the following questions and attempt to answer them through analysis of the generated data.

Q1. What is the instruction mix for different native instruction classes (ALU, data transfer and control) on different architectures?
Q2. What is the average native instruction length?
Q3. What is the bytecode complexity (i.e., the number of native instructions generated per bytecode) on different architectures?
Q4. Which architecture causes the overall native executable code size (in bytes) to be the largest?
Q5. On a particular architecture, in which JVM mode (JIT or interpreter) do Java programs execute faster?

The rest of this paper is organized as follows. Section 2 provides an overview of the JVM architecture. The components at the heart of the JVM, the JIT and the interpreter, are described in section 3. In section 4, we discuss our performance metrics in detail. Section 5 contains details of the data generation process. Section 6 describes the analysis of the data generated. We discuss the results of our analysis in section 7. Conclusions are drawn in section 8. The

graphs plotted for the various data generated with respect to the five performance metrics are shown in section 10.

2 JVM Structure

Class files, containing bytecodes and linkage information, are assembled from a variety of sources and are executed on a host machine by the implementation of the JVM. Execution speed is increased by using a verifier that performs a static examination of the code, avoiding many time-consuming run-time checks.

Features of the Architecture: The JVM is a stack-based machine that manipulates data represented as words. A JVM stack comprises a collection of frames, each associated with the execution of a single method. A frame consists of two components: a collection of local variables and an operand stack. Local variables are accessed directly by index. An operand stack contains a number of words that are accessed on a LIFO basis by the JVM bytecodes. In addition, the VM incorporates a heap that contains objects.

The JVM consists of two environments.

Compiler Environment: A Java source file is compiled using the javac compiler in optimizing mode to produce bytecodes in Java classes. Bytecodes are platform independent, and a Java program can be ported to a different platform by simply moving the classes.

Figure 1: Structure of JVM
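The frame layout described above, indexed local variables plus a LIFO operand stack, can be seen in the bytecodes javac emits for even a trivial method. In this sketch the bytecode listing appears as comments and reflects typical javap -c output for such a method:

```java
// A trivial method and, in comments, the bytecodes javac typically
// generates for it. Each JVM frame holds the method's local variables
// (indices 0 and 1 here) and an operand stack that bytecodes push to
// and pop from.
public class FrameDemo {
    static int add(int a, int b) {   // locals: 0 = a, 1 = b
        return a + b;
        // javap -c shows roughly:
        //   iload_0   // push local variable 0 onto the operand stack
        //   iload_1   // push local variable 1
        //   iadd      // pop two ints, push their sum
        //   ireturn   // pop the sum and return it
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Running javap -c on the compiled class shows the actual listing for the add method.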

Run-time Environment: The run-time environment consists of the class loaders, the Java interpreter/JIT and the runtime system, together with their interaction with the operating system and the hardware.

Class Loaders: They enable the JVM to load classes without a priori knowledge of the underlying file system semantics, and they allow applications to dynamically load Java classes as extension modules.

Java Interpreter/JIT: Bytecodes are executed either by the interpreter or by the Just-In-Time (JIT) compiler.

Runtime System: The runtime system communicates with the operating system, which in turn interacts with the underlying hardware.

3 JIT and Interpreter

3.1 JIT

A Just-In-Time (JIT) Java compiler produces native code from Java bytecode instructions during program execution. Compilation speed matters more in a JIT compiler than in a conventional compiler, which requires its optimization algorithms to be lightweight yet effective. The JIT consists of five major phases. The pre-pass phase performs a linear-time traversal of the bytecodes to collect information needed for global register allocation and for implementing garbage collection. The global register allocation phase assigns physical registers to local variables. The code generation phase generates instructions and performs optimizations such as common sub-expression elimination, array bounds check elimination and frame pointer elimination. The code emission phase copies the generated code and data sections to their final locations in memory. The patching phase fixes up relocations in the code and data sections, i.e., offsets of forward branches, addresses of code labels in switch table entries, etc. With the exception of the global register allocation phase, all phases are linear in time and space.

3.2 Interpreter

Interpreters play a crucial role as binary emulators, enabling code to be ported directly from one architecture to another. The execution time of an interpreted program depends upon the number of commands interpreted and the time to decode and execute each command.
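This per-command fetch, decode and execute cost is easiest to see in a minimal interpreter loop. The sketch below uses a made-up three-opcode stack machine for illustration; it is not Kaffe's actual code:

```java
// A minimal interpreter main loop for a hypothetical three-opcode
// stack machine. Each trip through the loop fetches one virtual
// command, decodes it, then performs the work it specifies.
public class TinyInterpreter {
    static final int PUSH = 0, ADD = 1, HALT = 2;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            int op = code[pc++];           // fetch and decode one virtual command
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];  // immediate operand follows the opcode
                    break;
                case ADD: {
                    int b = stack[--sp];       // pop right operand
                    int a = stack[--sp];       // pop left operand
                    stack[sp++] = a + b;       // push the sum
                    break;
                }
                case HALT:
                    return stack[--sp];        // result is on top of the stack
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // computes 2 + 3
        System.out.println(run(new int[]{PUSH, 2, PUSH, 3, ADD, HALT}));
    }
}
```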
The number of commands and the execution time depend directly on the complexity of the virtual machine implemented [Romer]. The virtual machine defines a set of virtual commands, which provide a portable interface between the program and the processor. The implementation of the virtual machine executes one virtual command on each trip through the main interpreter loop. The interpreter hence incurs an overhead for fetching and decoding each virtual command before performing the work specified by the command. The execution time of an interpreted command therefore depends on the number of commands interpreted, the cost of fetching and decoding each command, and the actual time spent executing the operation specified by the command.

4 Metrics

In order to study the interaction of the JVM with the underlying architectures, we selected the following metrics:

Instruction Count: The number of instructions of a particular category generated on a specific architecture. Instructions were categorized into instruction classes: ALU, data transfer, control and miscellaneous.

Average Instruction Length: The instruction sets for the platforms were analyzed to obtain the size in bytes of each instruction. The RISC platforms (Sun Sparc and MIPS) have a constant instruction length for all instructions; the x86 has variable instruction lengths.

Bytecode Complexity: The bytecodes are translated into native instructions on each architecture. A particular bytecode may translate to a different number and type of native instructions on different architectures. Hence, the complexity of translating a particular bytecode to native instructions indicates, to a certain degree, the complexity of the JVM generating the native code on that platform.

Native executable code size: For a particular program on a particular architecture, the native executable code size was computed by multiplying the number of instructions of each type by the average instruction length of that type and summing the contributions. The native executable code size denotes the size in bytes of the code generated on the three architectures. This allows us to analyze possible effects of program size on the execution time of the JIT or interpreter on each architecture.

Execution time: In order to compare the JIT and interpreter, the execution time of the programs in our test suite was determined. The source code of the programs was analyzed to understand the conditions under which the JIT and interpreter outperform each other.

5 Data Generation

Kaffe Source Code Analysis: As a first step in obtaining the required data for our analysis, the current version of the Kaffe Virtual Machine [Wilkinson] was obtained.
The source code of the Kaffe JIT engine and the Kaffe interpreter engine was analyzed to understand their functional aspects. This code analysis explained, to a certain extent, how the JIT and interpreter might perform bytecode optimizations.

Test Programs: The next step was to obtain a set of real test programs to exercise the JVM. The programs in the regression test suite designed to test the Kaffe JVM were examined and a set of 25 programs was chosen. These programs test all the capabilities of the JVM and are not intended to be a set of benchmark programs. Important features of the JVM, such as class loading, thread handling, garbage collection, exception handling, integer computations, floating point computations and loops, are tested by the programs. These programs were used to perform experiments and compare the performance of the JVM on the three architectures, based on the results of the metrics.

Software Tools for Data Generation: In order to evaluate JVM performance, analysis of the native code generated on each architecture was necessary. A software tool called Toba [Toba], developed in the CS department at the University of Arizona, was used for this purpose. Toba converts a Java program or a Java class file into C source file(s) or native code source file(s), as required. For the analysis, both C and native code source files were obtained. Toba requires JDK version 1.1.6 or greater. The Linux Red Hat 6.0, SunOS 5.6 and Irix 6.5 versions of the JDK were installed on three systems based on the x86, Sparc and MIPS architectures, respectively. Toba was then built, installed and configured on each system.
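Given the native code source files that Toba produces, counting instructions by category is a simple text analysis over mnemonics. A minimal sketch in Java follows; the mnemonic sets are an illustrative handful, not the full classification tables used in this study:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Classify native-instruction mnemonics into the four categories used
// in this paper (ALU, data transfer, control, miscellaneous) and count
// each category. The mnemonic sets below are a small x86-flavored
// sample chosen for illustration only.
public class InstrClassifier {
    static final Set<String> ALU  = Set.of("add", "sub", "imul", "and", "or");
    static final Set<String> XFER = Set.of("mov", "push", "pop", "lea");
    static final Set<String> CTRL = Set.of("jmp", "je", "jne", "call", "ret");

    static Map<String, Integer> count(List<String> mnemonics) {
        Map<String, Integer> counts = new HashMap<>();
        for (String m : mnemonics) {
            String cls = ALU.contains(m)  ? "ALU"
                       : XFER.contains(m) ? "Data transfer"
                       : CTRL.contains(m) ? "Control"
                       : "Misc";
            counts.merge(cls, 1, Integer::sum);  // increment the category tally
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(List.of("mov", "mov", "add", "jmp", "mov")));
    }
}
```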

6 Data Analysis

Instruction Count (frequency of instructions): The possible native instructions were determined, and instruction counts were generated for each architecture by examining its instruction set. Perl scripts were written to analyze the native source files generated by Toba and produce instruction counts for each file.

Average Instruction Length: RISC machines use fixed-length instructions. The Sparc and MIPS, which represent the RISC machines here, have a constant length of 4 bytes per instruction. CISC machines (e.g., the x86) generate instructions with lengths varying from 1 to 6 bytes. Instructions of 1-byte length were quite rare in our test environment (32-bit mode on Linux); instructions of lengths 2, 4 and 6 bytes were observed quite frequently.

Bytecode Complexity: Optimized class files (generated by javac with the optimization flag) were used to generate bytecodes using javap. A Java program consisting of a single occurrence of each bytecode was used for the analysis. The bytecodes generated were mapped manually to the native instructions generated on each architecture. This enabled us to calculate the number of native instructions generated for a particular bytecode on each architecture.

Native executable code size: The total contribution of all native instructions to the overall executable code is defined as the native executable code size of the program.

Execution time: The optimized class files were executed on each platform in JIT mode. The execution time was computed as an average of ten runs of each program. The programs were then executed using the interpreter; the interpreter execution times were also computed as an average of ten runs of each program. Similar tests were carried out on all the platforms. We used the java tool from JDK version 1.2 on each platform, using our test programs as inputs.
We used the UNIX time command to generate the overall execution time data per program in JIT and interpreter modes.

7 Results

Frequency of instructions: Instructions were grouped into data transfer, ALU, control and miscellaneous categories. The frequency of each instruction class was calculated as the fraction of that class over the total instruction count of the program. Graphs 1, 2 and 3 show the following information.

                 x86    MIPS    Sparc
Data transfers   60%    75%     20%
ALU              20%    10%     45%
Control          20%    15%     30%

The Sparc also shows 5% miscellaneous instructions, which do not fall into any of the above categories.

Average Instruction Length: As noted above, the average instruction length for the Sparc and MIPS architectures is 4 bytes. The average instruction length for the x86 was calculated as follows: using the frequency of each instruction type and the average instruction length for that type, the contribution of each type to the overall native executable code size was determined. These contributions were then added, and the average of the sum over the total

number of instructions in the program was calculated. The average instruction length for the x86 was thus determined to be 4.14 bytes per instruction.

Bytecode Complexity: Graphs 7 and 8 show the distribution of native instructions for a Java program consisting of one instance of every bytecode in each category. We observe that the MIPS has the highest number of native instructions generated for data transfer as well as control bytecodes. The MIPS also generates the fewest instructions for ALU bytecodes, whereas the x86 and Sparc generate the same number. The Sparc generates the most instructions for miscellaneous bytecodes. As can be observed from the table, the x86 (CISC) generates the fewest native instructions for a particular program, while the MIPS generates the most.

                 x86    MIPS    Sparc
Data transfers   119    186     166
ALU              100     86     100
Control           54     71      54
Miscellaneous     76     87      93
Total            349    430     413

Native executable code size: From Figure 2, it is evident that the MIPS generates the largest native executable code size for a particular program when compared with the same program on the x86 and Sparc. Between the x86 and Sparc, the x86 (CISC) has a larger native executable code size than the Sparc.

Figure 2: Native executable code size (bytes) on x86, MIPS and Sparc

Execution time: From Graph 4, it is evident that the interpreter on the x86 outperforms the JIT on the majority of occasions. Graphs 5 and 6 show that the JIT outperforms the interpreter on more occasions on the MIPS and Sparc than on the x86. The JIT is slower on programs that create objects or arrays (using the new operator), make synchronous calls to classes such as I/O, or contain more inline code. The JIT is faster on programs dominated by loop overhead and increment and assignment operations.
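The two computations above, native executable code size as a sum of per-type contributions and average instruction length as that sum divided by the total instruction count, can be sketched as follows. The instruction counts here are hypothetical, chosen only for illustration; the lengths reflect the 2-, 4- and 6-byte x86 encodings observed in this study:

```java
// Native executable code size as the sum over instruction types of
// (count of type i) * (length of type i), and average instruction
// length as that sum divided by the total instruction count.
public class CodeSize {
    static int codeSize(int[] counts, int[] lengths) {
        int bytes = 0;
        for (int i = 0; i < counts.length; i++) {
            bytes += counts[i] * lengths[i];   // contribution of type i
        }
        return bytes;
    }

    static double avgLength(int[] counts, int[] lengths) {
        int total = 0;
        for (int c : counts) total += c;       // total instruction count
        return (double) codeSize(counts, lengths) / total;
    }

    public static void main(String[] args) {
        int[] counts  = {10, 50, 40};  // hypothetical x86 instruction mix
        int[] lengths = {2, 4, 6};     // frequently observed x86 lengths in bytes
        System.out.println(codeSize(counts, lengths));   // 460
        System.out.println(avgLength(counts, lengths));  // 4.6
    }
}
```

Applying the same computation to the measured x86 frequencies is how the 4.14 bytes-per-instruction figure above was obtained.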
Similar observations were made when the source code of the programs was investigated to determine why the interpreter and the JIT outperform each other.

8 Conclusions

We analyzed the interaction of the JVM with the underlying architectures and arrived at the following conclusions.

Data transfer performance is critical on the MIPS and x86; improving the performance of these units is the most important way to improve execution time on those architectures. Branch performance affects execution time on the Sparc the most, since a major percentage of translated instructions fall into the control category. ALU-intensive programs execute fastest on the MIPS, since its bytecode translation for ALU instructions generates the fewest native instructions; hence ALU-intensive Java applications would run best on the MIPS, even though the overall bytecode complexity of the MIPS is the highest. Consequently, programs with a proportionate mix of instructions would be slower on the MIPS than on the other platforms. The interpreter is faster than the JIT in many cases. The x86 JIT behaves quite differently from the other JITs, which indicates that it does not perform optimizations on the x86 equivalent to those performed on the other platforms.

9 References

[Krall] Krall, A., et al. CACAO - A 64-bit JavaVM Just-In-Time Compiler. In Proc. ACM PPoPP'97 Workshop on Java for Science and Engineering Computation. http://www.complang.tuwien.ac.at/andi/javaws.ps

[Lindholm] Lindholm, T. and Yellin, F. The Java Virtual Machine Specification, Second Edition. http://java.sun.com/docs/books/vmspec

[Romer] Romer, T., et al. The Structure and Performance of Interpreters. In Proc. Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, Massachusetts), ACM, 1996.

[Tabatabai] Adl-Tabatabai, A., et al. Fast, Effective Code Generation in a Just-In-Time Java Compiler. In Proc. SIGPLAN '98 (Montreal, Canada).

[Transvirtual] http://www.transvirtual.com/products

[Toba] http://www.cs.arizona.edu/sumatra/toba/

[Yelland] Yelland, P. A Compositional Account of the Java Virtual Machine. In Proc. 26th Annual Symposium on Principles of Programming Languages (San Antonio, Texas), ACM, 1999.

10 Appendix

Figure 1. Instruction counts on x86
Figure 2. Instruction counts on MIPS
Figure 3. Instruction counts on Sparc
Figure 4. Execution time on x86 (interpreter vs. JIT)
Figure 5. Execution time on MIPS (interpreter vs. JIT)
Figure 6. Execution time on Sparc (interpreter vs. JIT)

Figure 7. Bytecode distribution by class on x86, Sparc and MIPS
Figure 8. Total bytecode distribution on x86, Sparc and MIPS