INVESTIGATING ANDROID BYTECODE EXECUTION ON JAVA VIRTUAL MACHINES


A DISSERTATION SUBMITTED TO THE UNIVERSITY OF MANCHESTER FOR THE DEGREE OF MASTER OF SCIENCE IN THE FACULTY OF ENGINEERING AND PHYSICAL SCIENCES

2016

By Salim Shaaban Salim
School of Computer Science

Table of Contents

List of Figures
List of Tables
Abstract
Declaration
Intellectual Property Statement
Acknowledgements

Chapter 1. Introduction
    Project Context
    Project Motivation
    Aims and Objectives
    Project Deliverables
    Thesis Structure

Chapter 2. Background
    Introduction
    The Android Runtime and Dalvik VM
    Android Bytecode
    Dalvik Executable (DEX) File Format
    Dalvik Bytecode Instruction Set
    Android Bytecode vs Java Bytecode
    Stack-Based Virtual Machines
    Register-Based Virtual Machines
    The Truffle Framework
    Self-Optimising Interpreter
    Partial Evaluation
    Type System and Domain-Specific Language
    Languages Interoperability (Polyglot)
    The Graal Compiler
    Speculative Optimisation
    Graal API and Dynamic Class Loader
    Graph-Based Intermediate Representation
    Scanning, Parsing and AST Interpretation with the Truffle Framework
    Related Work
    Redex Bytecode Optimiser
    LLVM IR Interpreter (Sulong)
    Summary

Chapter 3. Methodology and System Design
    Introduction
    Methodology
    Research and Familiarisation with the Truffle Framework
    Understanding the Dalvik Executable Format (DEX)
    Building DEX AST Interpreter with the Truffle Framework
    Benchmarking, Optimisation and Evaluation
    Development Environment
    System Design
    System Architecture
    Program Execution Design
    Design Assumptions
    Summary

Chapter 4. Project Implementation
    Introduction
    Bytecode Reader
    DEX Parser
    AST Interpreter Nodes Implementation
    Data Types Implementation
    Statements and Expressions Implementation
    Variables and Data Access Implementation
    Control Flow Implementation
    Arrays and Object Structures Implementation
    Methods and Classes Implementation
    Java API Calls
    Unimplemented Features of the Android Platform
    4.6.1 Incomplete Instruction Set Implementation
    Android Native Code
    Applications with User Interface
    Hardware Interaction
    Summary

Chapter 5. Testing and Evaluation
    Introduction
    Achievements
    Existing Android Java Benchmarks
    Test Suites and Benchmarks
    Experimental Setup
    Java SciMark
    Embedded CaffeineMark
    Test Suite
    Summary

Chapter 6. Conclusion and Future Work
    Summary
    Project Findings
    Limitations
    Future Work

References
Appendix A. Android Instructions (Opcodes) Implemented
Appendix B. TruffleDEX Test Cases
Appendix C. Generating DEX Files from Java Classes

Word Count:

List of Figures

Figure 1-1: Work flow of the steps involved in this project
Figure 2-1: Dalvik and ART architecture (Image Source: [14])
Figure 2-2: Bytecode representation of a simple example in hexadecimal form
Figure 2-3: A simple addition example in Java
Figure 2-4: Simple addition representation for stack-based machine
Figure 2-5: Simple addition representation for register-based machine
Figure 2-6: Implementation of addition operator for different data types
Figure 2-7: Node rewriting and partial evaluation to produce machine code (Image Source: [4])
Figure 2-8: The type system for the AST interpreter
Figure 2-9: Detailed system structure of the Truffle implementation (Image Source: [4])
Figure 2-10: LLVM IR representation of simple addition
Figure 3-1: The architecture of the project
Figure 4-1: Execution context of the program
Figure 4-2: Part of the root node class
Figure 4-3: Example of inner class implementation for related opcodes
Figure 4-4: Lexical scope implementation
Figure 4-5: Parsing and setting initial call target
Figure 4-6: Node classes definitions for top classes of node hierarchy
Figure 4-7: Implementation of integer literal node class
Figure 4-8: Multiplication operation implementation
Figure 4-9: Implementation of byte literal local write
Figure 4-10: Implementation of byte local variable read
Figure 4-11: Implementation of argument values read
Figure 4-12: Implementation of cached property read
Figure 4-13: A loop with jumps that goes outside its block (in mnemonic code)
Figure 4-14: Implementation of the jump exception
Figure 4-15: Implementation of the if node which throws a control flow exception
Figure 4-16: Implementation of the handling of the control flow exception for jumps
Figure 4-17: Implementation of the method
Figure 4-18: Method dispatcher implementation with cache
Figure 4-19: Java API call implementation using reflection
Figure 5-1: SciMark 2.0 Benchmark results
Figure 5-2: Embedded CaffeineMark 3.0 Benchmark results
Figure 5-3: TruffleDEX suite benchmark results

List of Tables

Table 2-1: Some Android bytecode opcodes and their description
Table 4-1: Unimplemented opcodes
Table 5-1: List of custom test suites and their exercising features
Table 5-2: SciMark 2.0 benchmarks description
Table 5-3: SciMark 2.0 statistical summary of results
Table 5-4: Embedded CaffeineMark 3.0 benchmarks description
Table 5-5: Embedded CaffeineMark 3.0 statistical summary of results
Table 5-6: TruffleDEX suite test cases description
Table 5-7: TruffleDEX suite statistical summary of results

Abstract

Handheld devices such as smartphones and tablets have become common among users interacting with different applications. Android is one of the operating systems used on those devices. One of the key components for running Android applications is the Android runtime, which hosts applications and executes bytecode. The speed of execution provided by the runtime influences the experience users get when using these applications.

This thesis presents the TruffleDEX project, which aims to explore how Android bytecode can be parsed, interpreted and executed on Java Virtual Machines (JVMs). The TruffleDEX project is implemented using the Truffle framework, which provides novel methods and techniques for implementing languages targeting JVMs. The thesis describes the design and implementation of TruffleDEX. It explains how TruffleDEX implements data types, data structures and other instructions of the Android bytecode instruction set. The thesis also elaborates on the techniques developed to support the unstructured flow of execution and jumping behaviour of bytecode.

Additionally, the thesis presents an evaluation of the implementation and performance benchmarks. The evaluation draws conclusions on the possibility of hosting Android applications on JVMs. Furthermore, the results of the evaluation and benchmarks presented in the thesis provide insights into the possibility of providing a better performing Android execution environment. The thesis also identifies challenges facing Truffle implementations of bytecode and unstructured execution.

Declaration

No portion of the work referred to in the dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.

Intellectual Property Statement

i. The author of this dissertation (including any appendices and/or schedules to this dissertation) owns certain copyright or related rights in it (the "Copyright") and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this dissertation, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has entered into. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trademarks and other intellectual property (the "Intellectual Property") and any reproductions of copyright works in the dissertation, for example graphs and tables ("Reproductions"), which may be described in this dissertation, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this dissertation, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see in any relevant Dissertation restriction declarations deposited in the University Library, and The University Library's regulations (see

Acknowledgements

I would like to express my sincere gratitude to my supervisor, Dr. Mikel Luján, for his invaluable guidance, support, inspiration, understanding and, most importantly, his engagement throughout the learning process of this master's project. His advice helped me gain an in-depth understanding of the subject and produce the project discussed in this thesis.

I would also like to express my gratitude to the Commonwealth Scholarship Commission in the UK and the Association of Commonwealth Universities for their financial support, which provided me with the opportunity to pursue master's studies in the UK.

Furthermore, I would like to thank Dr. Bijan Parsia for his advice and guidance during the project selection process and generally throughout the year. Many thanks also to the Advance Processor Group members who supported me throughout the project.

Last but not least, I would like to thank my family and friends for their love and support. Special thanks to my parents, sister and brothers for keeping me harmonious. Without their love this work would not have been possible.

Chapter 1. Introduction

1.1 Project Context

Android is one of the most popular platforms for handheld devices such as smartphones and tablets [1]. Most developers use the Java programming language to write Android applications. In the Android Software Development Kit (SDK), libraries and frameworks provide Application Programming Interfaces (APIs) that Java source code can call. Although Android applications are developed in Java, they are not run on standard Java Virtual Machines (JVMs). Rather, the Android operating system contains the Android Runtime (ART), which executes these applications [2]. Prior to ART and Android 4.4 (KitKat), the Dalvik Virtual Machine was the process virtual machine for Android applications [2].

The increased usage of Android applications in wearable devices and embedded systems has motivated the need for high performing applications under the limited resources available in such environments [3]. This has led some companies, such as Google and Facebook, to develop tools that produce optimized and better performing applications for mobile platforms. However, apart from optimized code, the performance of an application also depends on the runtime environment or Virtual Machine (VM) that hosts it.

Developing high performing runtime environments and VMs is a complex process [4]. Many languages developed to run on managed runtime environments, such as Python and Ruby, have shown poor performance compared to unmanaged languages [5]. The JVM is one of the best performing virtual machines and the most popular target for new language implementations [6]. For this reason, and because of the availability of the Open Java Development Kit (OpenJDK) project, which provides a free and open source implementation of the Java Platform, JVMs have been growing to support many languages other than Java [7]. As a result, language developers and designers can (re)implement their languages specifically to benefit from the performance, runtime environment and other features available in the Java platform.

To benefit from these improvements in the Java platform, this MSc thesis has investigated methods to execute and host Android applications on JVMs. The project has also investigated the performance benefits provided by JVMs compared to the default environments available in the Android system.

1.2 Project Motivation

To support and simplify the implementation of languages that target JVMs, Oracle has developed a framework, called Truffle, to help developers design and implement languages using the Java programming language [8]. The company also provides the Graal VM, a special version of the JVM that contains a dynamic optimizing compiler, called the Graal compiler, and provides APIs for interacting with this compiler. Together, the Truffle and Graal frameworks are used by language developers who target the JVM and want to benefit from the high performing compiler provided by the Graal VM [9].

Many languages that have been re-implemented using the Truffle and Graal frameworks have shown improved performance compared to their original implementations. This improvement can be clearly seen in the case of the JavaScript implementation on Truffle. The results obtained from different benchmarks [10] have shown the Truffle implementation of JavaScript performing better than or close to high performing JavaScript engines such as V8 [11], which is used in Google Chrome, and SpiderMonkey [12], which is used in Mozilla Firefox.

Despite the good performance currently provided by the Android system, there is still room for improvement. The performance benefit provided by the Truffle framework and the Graal VM motivated this investigation of the performance benefits for Android applications when run on the JVM.

1.3 Aims and Objectives

The main aim of this thesis is to investigate the execution of Android applications on the JVM by parsing and running Android bytecode in Dalvik Executable (DEX) and Executable and Linkable Format (ELF) files using the Truffle and Graal frameworks. Several specific objectives are associated with this project to produce a full Abstract Syntax Tree (AST) interpreter for the DEX format. The AST interpreter developed, called TruffleDEX [13], is then run on the Graal VM (or any other JVM) to provide an execution environment for Android applications. These objectives are as follows:

- Studying and understanding the Truffle framework for programming language implementation. This provided familiarity with the framework-specific implementation features and constructs that are used when developing an AST interpreter.
- Studying and understanding the DEX format instructions and their implementation details. The DEX format contains bytecode instructions that need to be parsed in order to produce an AST. Understanding these instructions enabled the development of the parser and the AST interpreter.
- Using the Truffle framework to develop and link together a parser and an AST interpreter for Android bytecode. As a result, Android DEX files can be hosted by the JVM.
- Generating a test suite of sample Android applications. The test suite is used to produce benchmark results. This stage involved converting Java-based benchmark test suites into Android DEX files and organising other Android-specific test suites.
- Analysing and evaluating benchmark results by identifying ways to improve Android code execution on the JVM. The benchmark evaluation helped identify possible improvements that can be applied to the AST interpreter in order to produce optimized code for scenarios where the JVM-hosted code runs slower than the code hosted by the Android runtime.

To accomplish these objectives, this project was carried out by following the steps shown in the work flow diagram in Figure 1-1.

Figure 1-1: Work flow of the steps involved in this project.

1.4 Project Deliverables

The deliverables of this thesis are the following:

- A DEX AST interpreter, called TruffleDEX, developed with the Truffle framework for the Dalvik Executable Format (DEX) of Android 6 (API level 23).
- A test suite with sample Android applications that can be executed with the DEX AST interpreter.
- A report on the design and implementation, describing the methods identified by the project for improving execution time in order to produce better AST interpreters for Android bytecode.

1.5 Thesis Structure

In addition to this introduction, the thesis includes five more chapters.

Chapter 2 provides background information on bytecode and on how Android bytecode instructions are organised and executed by their environment. It also discusses the different kinds of VMs and the differences between Android and Java bytecode. The chapter ends with a detailed literature review of the Graal and Truffle frameworks and a presentation of closely related work.

Chapter 3 discusses the methodology used for development and the development environment. The chapter also outlines the design and architecture of the system, focusing on the design choices for the parser and the interpreter.

Chapter 4 discusses the implementation of TruffleDEX. The chapter elaborates on the development choices made when working on different parts of the system. It also discusses the implementation constraints that were solved, how they were solved, and those that remained unsolved. For the unsolved issues, the chapter discusses possible approaches that were not realised in this project.

Chapter 5 covers testing and evaluation. The chapter describes the benchmarks that were used to test and evaluate the project, presents their results and discusses the analysis of those results.

Finally, Chapter 6 concludes the thesis with concluding remarks and the limitations of the project, and suggests future work.

Chapter 2. Background

2.1 Introduction

This chapter provides background on the Android Runtime, the Dalvik VM, bytecode, and how Android bytecode instructions are organized and executed by the runtime system. Sections 2.3 to 2.5 discuss bytecode and the Android DEX format in detail, while Section 2.6 compares the Android and Java bytecode formats. Sections 2.7 and 2.8 discuss different kinds of VMs and how bytecode is presented in such VMs. Sections 2.9 to 2.11 give a detailed literature review of the Graal and Truffle frameworks and discuss how a Truffle AST interpreter is developed. Finally, Section 2.12 discusses closely related work.

2.2 The Android Runtime and Dalvik VM

The Android Runtime (ART) is a managed application runtime environment used by the Android operating system [2]. Both ART and its predecessor, the Dalvik VM, execute Dalvik Executable (DEX) files. In ART, DEX files are compiled into native machine code during installation using Ahead-Of-Time (AOT) compilation [2]. The result of the AOT compilation is an Executable and Linkable Format (ELF) file, which contains both DEX and native code. The Dalvik virtual machine optimizes DEX files during installation to produce optimized DEX (ODEX) files, but no AOT compilation is done at this stage. Later, during execution, ODEX files are compiled using Just-In-Time (JIT) compilation [2]. Figure 2-1 illustrates the ART and Dalvik architecture as part of the Android system.

Figure 2-1: Dalvik and ART architecture (Image Source: [14]).

2.3 Android Bytecode

Bytecode is an intermediate representation that acts as the machine language for a specific VM. Each bytecode instruction is formed by an opcode, which indicates the action of the instruction, followed by zero or more operands. Operands normally hold information about registers or literal values to compute with. Bytecode is represented in hexadecimal format, but each opcode normally has an assembly-style mnemonic code for easier understanding. Table 2-1 lists some of the most common Android opcodes and their descriptions [15].

Opcode (Hexadecimal) | Mnemonic Code | Description
0E | return-void | Return control from a method without a return value.
0F | return vx | Return with the return value held in register vx.
10 | return-wide vx | Return a value that requires more than one register, held in the registers starting at vx. This opcode is used to return long and double values.
11 | return-object vx | Return the object reference held in vx.
13 | const/16 vx, lit16 | Store a 16-bit constant literal (lit16) in register vx.
90 | add-int vx, vy, vz | Compute vy + vz from the register values and store the result in vx.
32 | if-eq vx, vy, target | If the value in vx equals the value in vy, move to the target. A target is the instruction to which execution moves.

Table 2-1: Some Android bytecode opcodes and their description.

2.4 Dalvik Executable (DEX) File Format

The Android DEX file stores its instructions as bytecode, which is different from the bytecode executed by Java VMs. Each DEX file contains a header which, among other things, contains a special list of bytes that must appear at the beginning of each DEX file [16]. This list of bytes (also known as the "file magic") is used to identify the version of the format used by the file and to detect corrupted files. Figure 2-2 shows a simple "Hello World" example and its Android bytecode in hexadecimal format as displayed by the Bytecode Viewer (Version 2.9.8) program. The right-hand panel displays the bytecode instructions from the DEX file, while the left-hand panel shows the corresponding Java source.

Figure 2-2: Bytecode representation of a simple example in hexadecimal form.

The header is followed by a list of string identifiers containing all strings referred to by the code. Next comes a list of type identifiers, such as classes, arrays and primitive types, followed by a list of the method prototypes used in the source code. The file also contains lists of the fields, method identifiers and class definitions referred to or defined by the source code.
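The file magic check described above can be sketched in a few lines of Java. This is a minimal illustration, assuming the magic layout given in the DEX format documentation (the bytes of "dex", a newline, three ASCII version digits and a terminating zero byte). The class name DexMagic and its method are hypothetical and are not taken from TruffleDEX.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of TruffleDEX) that inspects the 8-byte DEX file magic. */
final class DexMagic {
    // The ASCII bytes of "dex" followed by a newline, as required at the start of a DEX file.
    private static final byte[] PREFIX = {0x64, 0x65, 0x78, 0x0a};

    /** Returns the three-digit format version (for example "035"), or null if the magic is invalid. */
    static String version(byte[] file) {
        if (file == null || file.length < 8) {
            return null;
        }
        for (int i = 0; i < PREFIX.length; i++) {
            if (file[i] != PREFIX[i]) {
                return null;
            }
        }
        // Bytes 4..6 must be ASCII digits and byte 7 must be the terminating zero byte.
        for (int i = 4; i < 7; i++) {
            if (file[i] < '0' || file[i] > '9') {
                return null;
            }
        }
        if (file[7] != 0) {
            return null;
        }
        return new String(file, 4, 3, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] header = {0x64, 0x65, 0x78, 0x0a, 0x30, 0x33, 0x35, 0x00};
        System.out.println(version(header)); // prints 035
    }
}
```

The embedded newline and zero byte are what make the magic useful for detecting some kinds of corruption, since a file mangled by text-mode transfer would no longer match.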

The DEX file also contains all supporting data associated with the classes, methods and other identifiers specified in the lists mentioned above. When a method is called, the bytecode instructions associated with it are loaded for execution.

2.5 Dalvik Bytecode Instruction Set

The instruction set of the Android bytecode consists of 218 instructions that are encountered in the normal flow of execution [15]. These instructions build the execution body of methods and classes in the DEX file. When executing, most of the instructions read data from and/or write data to registers, which are identified by numbers. Each method has a frame with a predefined number of registers that it requires to run. These registers are 32 bits in size, and a pair of adjacent registers is used if an instruction needs to store a 64-bit value. Instructions that operate on 64-bit values are identified by a -wide suffix.

Some other instructions operate on specific types; such instructions are also identified by a suffix telling which type they operate on. For example, the addition operator has different instructions identified by the type they operate on, such as add-int, add-double, add-float and add-long. The use of type-specific operators reduces the need to include type checking and type conversion instructions each time a value is read. This means fewer instructions are needed for such operations.

Some other instructions operate on the same types and register size but provide different options when executing. Such instructions are differentiated using a forward slash (/) followed by a suffix indicating the difference. For instance, method calling is done using the invoke operator. The operator specifies the kind of invocation, such as invoke-virtual, which is used to invoke a normal, non-static method. The invocation also lists the registers whose values are passed as arguments to the called method. However, some methods can be called with a flexible number of registers that is only known at runtime (array arguments, for example). These methods are called using the invoke-virtual/range instruction which, instead of listing the exact registers to be used as parameters, gives a starting register and the number of registers to pass as parameters. The operation performed by invoke-virtual and invoke-virtual/range is the same; the only difference is how the parameter registers are specified [15].

Most of these regular instructions are independent of each other, which means they can appear in any order depending on how the program is developed. Nevertheless, some opcodes are strictly required to appear immediately after other specific opcodes, and their appearance anywhere else is invalid [15]. Examples of such instructions are the move-result opcodes, which can only appear after method invocation opcodes or a filled-new-array opcode. This is because the main duty of the move-result opcodes is to move the result returned by the previous instruction to a specific register.

Other than the regular execution instructions, there are pseudo-instructions that are part of the Dalvik instruction set. These instructions are not part of the method execution block but are referred to by the executing instructions. The pseudo-instructions are used as data payloads for arrays and as lookup tables for switches. A switch instruction in the method body contains a reference (offset) to the switch-payload pseudo-instruction, which contains a lookup table with information about where to go for different values.

The instructions and pseudo-instructions together build the Dalvik instruction set that each Android bytecode file contains. Some of these instructions are listed in Table 2-1, while a full list of the instructions that were implemented in this project is provided in Appendix A.

2.6 Android Bytecode vs Java Bytecode

The bytecode used in DEX files is designed to be executed by register-based machines [15]. The instructions refer to registers when reading and writing variables and objects. This is different from Java bytecode, where variables are kept in a local variable list and pushed onto the stack when they need to be used. Since JVMs are stack-based machines [7], Java bytecode normally requires extra opcodes to transfer data to and from the stack. These opcodes are not available in Android bytecode and thus, among other reasons, Android bytecode cannot be hosted directly by standard JVMs.

Other than the difference in bytecode instructions, many changes have been made to improve the performance and efficiency of DEX for the Android system. Firstly, all classes in the same application are compiled into the same DEX file, unlike the individual .class files in Java. Secondly, in DEX files, 32-bit signed and unsigned quantities are encoded using LEB128 (Little Endian Base 128) to reduce byte consumption [16]. Furthermore, DEX files benefit from fewer instructions, and as a result fewer bytes are required to store a similar piece of code compared to Java. These improvements reduce the size of the program, and in return less memory is used by an application.
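The LEB128 encoding mentioned above stores seven payload bits per byte, least significant group first, and uses the top bit of each byte to signal that another byte follows. The following Java sketch shows one way to decode such values. The class name Leb128 and the method names are hypothetical, and the code is an illustration rather than the decoder used by TruffleDEX.

```java
/** Hypothetical LEB128 decoder sketch; names are illustrative and not taken from TruffleDEX. */
final class Leb128 {
    /** Decodes an unsigned LEB128 value of at most five bytes starting at the given offset. */
    static int readUnsigned(byte[] data, int offset) {
        int result = 0;
        int shift = 0;
        for (int i = 0; i < 5; i++) {
            int b = data[offset + i] & 0xff;
            result |= (b & 0x7f) << shift;   // low seven bits carry the payload
            if ((b & 0x80) == 0) {           // a clear top bit marks the last byte
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("LEB128 value longer than 5 bytes");
    }

    /** Decodes a signed LEB128 value, sign-extending from the last payload bit read. */
    static int readSigned(byte[] data, int offset) {
        int result = 0;
        int shift = 0;
        for (int i = 0; i < 5; i++) {
            int b = data[offset + i] & 0xff;
            result |= (b & 0x7f) << shift;
            shift += 7;
            if ((b & 0x80) == 0) {
                // Bit 6 of the final byte is the sign bit of the encoded value.
                if (shift < 32 && (b & 0x40) != 0) {
                    result |= -1 << shift;
                }
                return result;
            }
        }
        throw new IllegalArgumentException("LEB128 value longer than 5 bytes");
    }

    public static void main(String[] args) {
        byte[] twoBytes = {(byte) 0x80, 0x7f};
        System.out.println(readUnsigned(twoBytes, 0)); // prints 16256
        System.out.println(readSigned(twoBytes, 0));   // prints -128
    }
}
```

Small values, which are the common case for indices and sizes, occupy a single byte instead of four, which is where the space saving comes from.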

2.7 Stack-Based Virtual Machines

A stack-based virtual machine emulates a real machine by storing computational results and other data on a stack (a Last-In First-Out data structure) instead of in the individual registers available in a real machine. Stack-based machines contain a stack onto which a frame is pushed each time a method is called. When a method finishes executing and returns control to its caller, the frame is popped from the stack [17]. Each frame contains information about its method, a local variable space and an operand stack. Computation is done by pushing and popping operands to and from the operand stack, and the results of those computations are stored in the variable space as needed.

    int num1 = 15;
    int num2 = 30;
    int sum = num1 + num2;

Figure 2-3: A simple addition example in Java.

When targeting a stack-based machine, the simple addition example in Figure 2-3 can be represented in JVM bytecode (using mnemonics) as shown in Figure 2-4. As the comments explain, to store data into a variable, the value has to be pushed onto and then popped from the stack. Most of the opcodes are also prefixed by a data type indicator, such as i for int and l for long. This reduces the need for extra type checking and conversion instructions in each load or store operation, as each opcode expects its own data type [7]. For this reason, the JVM has different opcodes for the same operation that differ only in the data types they operate on. As an example, popping data from the stack into a local variable can be performed by istore, lstore, dstore or others, depending on the type [7].

    bipush 15   // push int constant onto the stack
    istore_1    // pop int from the stack into local variable position 1
    bipush 30   // push int constant onto the stack
    istore_2    // pop int from the stack into local variable position 2
    iload_1     // push int from local variable at position 1 onto the stack
    iload_2     // push int from local variable at position 2 onto the stack
    iadd        // add two integers
    istore_3    // pop the result of iadd into a local variable

Figure 2-4: Simple addition representation for stack-based machine.

2.8 Register-Based Virtual Machines

A register-based virtual machine emulates a real machine using registers to store data. In register-based machines, method frames contain as many registers as the method needs. These registers include special purpose registers, such as the Program Counter (PC), and general purpose registers, which are used for storing method data and local variables [17]. If the target is a register-based machine, the example from Figure 2-3 could be represented in Android bytecode as shown in Figure 2-5.

    const/16 v0, 0x0f
    const/16 v1, 0x1e
    add-int v3, v0, v1

Figure 2-5: Simple addition representation for register-based machine.

As shown in Figure 2-5, the instructions for pushing to and popping from the stack do not exist in register-based bytecode. Instead, the instructions use register numbers (v0, v1 and so on) to tell where their operands are read from and where the result should be stored.

2.9 The Truffle Framework

The Truffle framework was developed by Oracle to support and simplify the development of languages that target JVMs [9]. The Truffle framework provides an API for implementing AST interpreters for guest languages. This project uses Truffle to implement an AST interpreter for Android bytecode. The Truffle framework provides many features to a language developer; some of them are described in the following sub-sections.

2.9.1 Self-Optimising Interpreter

Together with other features such as frames and variable handling, the Truffle framework provides an optimization process in which nodes have the ability to replace themselves [4]. Some guest languages are dynamically typed, which means the type of a variable is only known at runtime, after it has been assigned a value. The Truffle framework allows the syntax tree to be modified during interpretation so that a node changes to a more specialized type. This allows a node to switch to a more specialized execution implementation that matches its state at runtime.

To understand type specialization, take the addition operator (+ sign) as an example. Addition can be applied to numeric types such as integer, byte and double, and the same operator is also used for string concatenation.
If the left and right operands of an addition expression are simple integers, a simple addition can be carried out and return back an integer. On the other hand, if this value happens to cause an overflow and does not fit any more on an integer, the addition node can be rewritten to be done by long or BigInteger addition. The node rewrite would also occur if a long variable carries 22

23 a very small value that can be done by integer or byte and save memory needed to store long variable. The implementation of such addition node in Truffle will look as shown in Figure 2-6. The annotations inform the Truffle framework whether the implementation is specialized to a specific type or is a generic and should be used only for types that did not match other specialization provided in this = "+") public abstract class DexAddNode extends DexBinaryNode = ArithmeticException.class) protected long add(long left, long right) { long result = ExactMath.addExact(left, right); return = ArithmeticException.class) protected int add(int left, int right) { int result = ExactMath.addExact(left, right); return result; = "isstring(left, protected String add(object left, Object right) { return left.tostring() + right.tostring(); Figure 2-6: Implementation of addition operator for different data types. Figure 2-7 shows how node rewriting during interpretation can rewrite a node to a specialized type (Integer in this case). Node rewriting uses frequency information and profiling feedback to decide for the best specialization of the node. The rewriting is very helpful when implementing dynamic languages where variables are allowed to hold values of different types and will change their types dynamically depending on which value they are currently holding. Figure 2-7: Node rewriting and partial evaluation to produce machine code (Image Source: [4]). 23
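The rewriting behaviour behind Figure 2-6 can be imitated in plain Java without the Truffle API. The following is only a minimal sketch under stated assumptions: the names RewritingAddSketch, AddNode and AddImpl are hypothetical, the node tracks its state as a string purely for illustration, and it only rewrites towards more general cases (int, then long, then a generic case), whereas Truffle generates this machinery from the annotations and uses profiling feedback.

```java
import java.math.BigInteger;

/** Plain-Java sketch of a self-rewriting addition node; it does not use the Truffle API. */
final class RewritingAddSketch {

    /** One specialisation of the addition operation. */
    interface AddImpl {
        Object execute(AddNode node, Object left, Object right);
    }

    /** The node keeps its current specialisation and can replace it while running. */
    static final class AddNode {
        private AddImpl impl = RewritingAddSketch::addInt; // start with the cheapest case
        private String state = "int";

        Object execute(Object left, Object right) {
            return impl.execute(this, left, right);
        }

        String state() {
            return state;
        }

        private void rewrite(String newState, AddImpl newImpl) {
            state = newState;
            impl = newImpl;
        }
    }

    static Object addInt(AddNode node, Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            int l = (Integer) left;
            int r = (Integer) right;
            try {
                return Math.addExact(l, r);
            } catch (ArithmeticException overflow) {
                // The result does not fit in an int: fall through and rewrite.
            }
        }
        node.rewrite("long", RewritingAddSketch::addLong);
        return node.execute(left, right); // re-execute with the new specialisation
    }

    static Object addLong(AddNode node, Object left, Object right) {
        if (isIntOrLong(left) && isIntOrLong(right)) {
            try {
                return Math.addExact(((Number) left).longValue(), ((Number) right).longValue());
            } catch (ArithmeticException overflow) {
                // The result does not fit in a long: fall through and rewrite.
            }
        }
        node.rewrite("generic", RewritingAddSketch::addGeneric);
        return node.execute(left, right);
    }

    static Object addGeneric(AddNode node, Object left, Object right) {
        if (isIntegral(left) && isIntegral(right)) {
            return toBig(left).add(toBig(right));
        }
        return String.valueOf(left) + String.valueOf(right); // string concatenation
    }

    private static boolean isIntOrLong(Object o) {
        return o instanceof Integer || o instanceof Long;
    }

    private static boolean isIntegral(Object o) {
        return isIntOrLong(o) || o instanceof BigInteger;
    }

    private static BigInteger toBig(Object o) {
        return o instanceof BigInteger ? (BigInteger) o : BigInteger.valueOf(((Number) o).longValue());
    }
}
```

Calling execute(15, 30) on a fresh node stays in the int state, while execute(Integer.MAX_VALUE, 1) overflows, rewrites the node to the long state and returns the sum as a long. A later call with two strings moves the node to the generic state.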

2.9.2 Partial Evaluation

Partial evaluation is used to specialize programs based on the currently known input, for the purpose of producing compiled code only for stable and hot code [4]. During this evaluation, the generated code is annotated with specialization information and with information on the state of the input. An input may already be known at evaluation time, based on profiling information or other means, in which case its state is marked as static. On the other hand, if the input is not yet known, it will be decided dynamically at runtime [4].

With partial evaluation, branches and blocks of code which are unlikely to execute are eliminated from the compiled code. As a result, unnecessary memory access operations and condition checks are eliminated, which produces optimized code [10]. If the omitted code is needed after all, execution is transferred back to the interpreter, which continues running the part of the code that was not compiled earlier.

2.9.3 Type System and Domain-Specific Language

To eliminate the need for developers to manually write type checking and verification code for each type available in the guest language, Truffle provides a Domain-Specific Language (DSL) that can be used to generate type-specific code for each guest type [18]. Developers are responsible for writing their type system and identifying all types by mapping them to Java primitive types or the Object class. Developers then implement an expression node for each type so that Truffle understands how to execute each type. The DSL type system uses an annotation processor to generate the source code needed for type checking and conversion. Figure 2-8 illustrates how type mapping is done using the @TypeSystem annotation.

    @TypeSystem({short.class, long.class, int.class, long.class, BigInteger.class,
            boolean.class, String.class, DLFunction.class, DLClass.class, SLNull.class})
    public abstract class DexTypes {
    }

Figure 2-8: The type system for the AST interpreter.

2.9.4 Languages Interoperability (Polyglot)

The Truffle framework provides a Polyglot engine that allows language developers to register their languages so that they can be accessed by other Truffle guest languages [19]. This allows users of such languages to combine different languages when writing source code. As a result, developers (end users of the language) can choose a suitable language for a given problem without needing to change the programming language of the whole project [19].
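The partial evaluation idea from Section 2.9.2 can be made concrete with a toy example. The following is a minimal sketch and not Truffle's or Graal's actual mechanism: the class PartialEvalSketch and its expression types are hypothetical, inputs are marked static simply by placing them in a map, and the fallback to the interpreter for pruned code is left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy partial evaluator: folds statically known inputs and prunes branches that cannot run. */
final class PartialEvalSketch {

    /** Expression tree over int values. */
    interface Expr {
        int eval(Map<String, Integer> env);           // ordinary interpretation
        Expr specialise(Map<String, Integer> known);  // partial evaluation
    }

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
        public int eval(Map<String, Integer> env) { return value; }
        public Expr specialise(Map<String, Integer> known) { return this; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) { return env.get(name); }
        public Expr specialise(Map<String, Integer> known) {
            // A static input becomes a constant; a dynamic input stays symbolic.
            return known.containsKey(name) ? new Const(known.get(name)) : this;
        }
        @Override public String toString() { return name; }
    }

    static final class Add implements Expr {
        final Expr left;
        final Expr right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
        public int eval(Map<String, Integer> env) { return left.eval(env) + right.eval(env); }
        public Expr specialise(Map<String, Integer> known) {
            Expr l = left.specialise(known);
            Expr r = right.specialise(known);
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value + ((Const) r).value); // fold ahead of time
            }
            return new Add(l, r);
        }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    /** Evaluates thenBranch when cond is non-zero, otherwise elseBranch. */
    static final class IfNonZero implements Expr {
        final Expr cond;
        final Expr thenBranch;
        final Expr elseBranch;
        IfNonZero(Expr cond, Expr thenBranch, Expr elseBranch) {
            this.cond = cond;
            this.thenBranch = thenBranch;
            this.elseBranch = elseBranch;
        }
        public int eval(Map<String, Integer> env) {
            return cond.eval(env) != 0 ? thenBranch.eval(env) : elseBranch.eval(env);
        }
        public Expr specialise(Map<String, Integer> known) {
            Expr c = cond.specialise(known);
            if (c instanceof Const) {
                // Static condition: only the branch that can run survives.
                return ((Const) c).value != 0 ? thenBranch.specialise(known) : elseBranch.specialise(known);
            }
            return new IfNonZero(c, thenBranch.specialise(known), elseBranch.specialise(known));
        }
        @Override public String toString() {
            return "if " + cond + " then " + thenBranch + " else " + elseBranch;
        }
    }

    /** Builds: if (debug) x + 1000 else x + (15 + 30). */
    static Expr sampleProgram() {
        return new IfNonZero(new Var("debug"),
                new Add(new Var("x"), new Const(1000)),
                new Add(new Var("x"), new Add(new Const(15), new Const(30))));
    }

    public static void main(String[] args) {
        Map<String, Integer> known = new HashMap<>();
        known.put("debug", 0); // treated as a static input
        System.out.println(sampleProgram().specialise(known)); // prints (x + 45)
    }
}
```

With debug marked as static and equal to 0, the condition check and the unused branch disappear and the constant sub-expression is folded, leaving only the work that depends on the dynamic input x.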

On the other hand, language developers can also benefit from the polyglot services of the framework. Languages can share implementations of features and overcome the limitations that individual languages have [4].

2.10 The Graal Compiler

The implementation of the AST interpreter on Truffle can be hosted on any JVM since it is written in pure Java. Nevertheless, standard JVMs do not expose an API to their dynamic compilers to support features such as custom compilation, static analysis or partial evaluation. To support these features, Truffle projects are normally hosted on the Graal VM. This is a modified version of a standard JVM that uses the Graal compiler as its dynamic compiler and provides an API to access that compiler [20]. The Graal compiler is a high-performance Just-in-Time (JIT) optimizing compiler written in Java for the Java HotSpot VM. The following sub-sections provide an overview of some of the features available when working with the Graal compiler.

2.10.1 Speculative Optimisation

Speculative optimisation is done by aggressively optimising a piece of code based only on assumptions and profiling information collected by the VM [20]. One such assumption arises when a class is loaded and one of its methods is called from another class. At that time the compiler assumes that the method is implemented only in that class, so it generates compiled code for the method and inlines it into the caller method's code. If the method is later overridden by an inheriting class, the compiled code is discarded and the methods (both original and overriding) are recompiled without inlining [20].

Another technique supported by the Graal compiler is partial evaluation of ASTs to produce efficient machine code [21]. During interpretation, partial evaluation is used to produce compiled code for the AST branch being interpreted. The evaluation starts by compiling the execute method of the root node and inlining the execute methods of all its children. When the Graal compiler compiles a branch, it performs automatic partial evaluation and, if rewriting is not required, produces optimised machine code. To produce optimised code, the compiler performs method inlining, escape analysis and loop optimisation, and also removes node information related to specialisation, which is no longer needed at this stage [4]. For branches that require rewriting, the Graal compiler decompiles them back to the AST for further rewriting. The node information related to profiling data, frequency details and the current state of the node is used for rewriting the node into a new, more specialized node.
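The single-implementation assumption described in Section 2.10.1 can be sketched in plain Java. This is an illustrative model only: SingleImplAssumption, CallSite and the Shape classes are hypothetical names, the "inlined" body is written by hand, and a real VM discards and recompiles machine code rather than checking a flag on every call.

```java
/** Plain-Java model of speculation on "this method has only one implementation". */
final class SpeculationSketch {

    /** Hypothetical stand-in for an assumption that compiled code depends on. */
    static final class SingleImplAssumption {
        private boolean valid = true;
        boolean isValid() { return valid; }
        void invalidate() { valid = false; }
    }

    static class Shape {
        int sides() { return 0; }
    }

    static final class Square extends Shape {
        @Override int sides() { return 4; }
    }

    /** A call site whose fast path contains the inlined body of Shape.sides(). */
    static final class CallSite {
        final SingleImplAssumption noOverride = new SingleImplAssumption();
        int fastPathHits = 0;
        int slowPathHits = 0;

        int call(Shape receiver) {
            if (noOverride.isValid()) {
                fastPathHits++;
                return 0; // inlined copy of Shape.sides(), no dispatch needed
            }
            slowPathHits++;
            return receiver.sides(); // speculation failed: ordinary virtual dispatch
        }

        /** Models loading a class that overrides sides(): the speculative code is discarded. */
        void onOverrideLoaded() {
            noOverride.invalidate();
        }
    }
}
```

The important ordering is that onOverrideLoaded() runs before any Square instance can reach the call site, which mirrors how the VM invalidates compiled code at class-loading time.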

2.10.2 Graal API and Dynamic Class Loader

The Graal API provides an interface for accessing classes, methods and their signatures, instance variables, constant pools and all other necessary data available in a class file. The API also provides an interface to access compiled-code information such as machine code bytes, cached data, garbage collection pointers and de-optimisation details [20]. The availability of the API allows Truffle applications to perform custom compilation and to support features such as adding specific optimisation phases, partial evaluation and custom inlining [20].

As in any other JVM, classes are loaded and initialised as late as possible [7]. This allows the VM to load only the classes that are needed in the current execution and to avoid loading unused classes. When Graal is used as a JIT compiler, class loading and linking are done by an interpreter. This makes de-optimisation possible, with execution returning to the interpreter [20]. On the other hand, when Graal is used for static analysis, class loading can be done by the compiler.

2.10.3 Graph-Based Intermediate Representation

The Graal compiler uses a directed graph as its Intermediate Representation (IR) [22]. The graph includes nodes representing opcodes, Java fields, the start and end of blocks, guards and de-optimisation safe points. The edges of the graph represent data and control flow between nodes. Language developers can use a graph visualizer to visualise the optimisation phases taken by the Graal compiler. Other features provided by the Graal VM include memory optimisation, register allocation and code generation [10].

2.11 Scanning, Parsing and AST Interpretation with the Truffle Framework

To host a guest language on the JVM using the Truffle framework, a developer has to provide a scanner for reading the source file. The main duty of the scanner is to perform the lexical analysis part of the compilation process. This is the process of reading a sequence of characters from a source file and organizing them into meaningful tokens [23]. These tokens are then fed to the parser for further processing.

A parser then has to be developed that checks the syntax of the tokens and, if the syntax is correct, generates an AST for the interpreter [23]. An AST provides the structure of the source code in a tree format. The Truffle framework gives developers the freedom to provide any implementation of their choice for the parser and scanner. However, the parser has to produce an AST that is understood by the Truffle implementation of the AST interpreter.

Furthermore, the developer provides the AST interpreter, which is a way to describe the semantics of the guest language [9]. For the interpreter, the developer implements semantic features such as type definitions, type checking and a node implementation for each operation available in the guest language. Each node implementation provides a mechanism to execute its operation and, for block-related nodes, a frame to store and access the local variables of that block.

Figure 2-9 shows a detailed system structure of the project implemented with the Truffle framework.

Figure 2-9: Detailed system structure of the Truffle implementation (Image source: [4]). Labels in the figure: "The DEX files", "This project implementation".

2.12 Related Work

There have been many projects focusing on optimising Android applications. Most of those projects focus on optimising bytecode to remove unnecessary instructions. Truffle, on the other hand, is a research framework. Although many languages have already been implemented on top of it, such as Ruby, Python, R and JavaScript, there is currently only one implementation of a statically typed language or of bytecode-level parsing. This section discusses two projects that are closely related to the focus of this thesis.

2.12.1 Redex Bytecode Optimiser

In 2015, Facebook Inc. developed Redex, a software framework for reading, writing, analysing and optimising Android bytecode [24]. The framework is written in C++ and was released as open source in April 2016 [25]. The framework helps developers to transform Android DEX files and produce better-performing Android applications. The following are some of the transformation techniques applied by Redex to reduce the size of DEX files [26].

Minification and Compression: This technique involves reducing the size of the strings used by identifiers (method and field names), class paths and file paths without changing the overall functionality. Since long strings take up many bytes in DEX bytecode, this technique saves bytes that were dedicated to strings in the original DEX file.

Code Inlining: Another technique used in Redex to optimise bytecode is moving the functionality of a called method into the body of its caller. This reduces the overhead introduced by method calls and could possibly reduce bytecode size.

Dead Code Elimination: Another technique applied by the Redex framework is the elimination of all unreachable code. Although this is a common technique in the optimisation process, it is still possible for original DEX files to contain dead code.

Optimised DEX files generated by the Redex framework are meant to be packaged in Android Packages (APK) and run by the Dalvik VM or ART. In contrast, this project aims to execute DEX files on JVMs. To accomplish this, this project focuses on the development of an AST interpreter built on top of the Truffle framework and the Graal VM. The project transforms Android applications to produce applications that can be hosted by JVMs.

2.12.2 LLVM IR Interpreter (Sulong)

Among the projects that were developed with the Truffle and Graal frameworks, Sulong [27] is the one most closely related to the implementation details of this project. Sulong is a Low Level Virtual Machine Intermediate Representation (LLVM IR) interpreter developed with the Truffle and Graal frameworks. Languages that can be compiled with the LLVM compiler infrastructure [28], such as C, C++ and Fortran [29], can use Sulong as the dynamic runtime for their IR when running on the JVM.

LLVM-based languages can be compiled to LLVM IR bitcode using an LLVM front-end such as Clang [30], and then be hosted on the JVM using Sulong. The bitcode instructions in LLVM IR are register-based [31], like those in Android bytecode. The example code from Figure 2-3 can be represented in LLVM IR as shown in Figure 2-10.

    %1 = alloca i32, align 4
    %num1 = alloca i32, align 4
    %num2 = alloca i32, align 4
    %sum = alloca i32, align 4
    store i32 0, i32* %1
    store i32 15, i32* %num1, align 4
    store i32 30, i32* %num2, align 4
    %2 = load i32* %num1, align 4
    %3 = load i32* %num2, align 4
    %4 = add nsw i32 %2, %3
    store i32 %4, i32* %sum, align 4

Figure 2-10: LLVM IR representation of simple addition.

Although LLVM IR is lower level and closer to machine code, the technique for parsing it is closely related to that for Android bytecode. Nevertheless, while Sulong focuses on the general LLVM instructions, this project focuses on the specialized set of instructions used by the Android platform.

2.13 Summary

This chapter presented the background literature needed for undertaking this project, in the wider context of the Android system and virtual machines in general. The chapter discussed bytecodes and how they are put together in an application. The chapter also discussed different types of virtual machines and how the bytecodes of register-based machines are organised. At the end of the chapter, closely related projects were discussed and their relation to this project was presented.

Chapter 3. Methodology and System Design

3.1 Introduction

This chapter presents a detailed explanation of the design of the proposed system and the methodology used during the development process. In particular, the chapter discusses the design decisions taken while putting together the different components of the system. Section 3.2 discusses the methodology that was used during the different phases of the project. Section 3.3 discusses the development environment, tools and libraries necessary for the development and evaluation phases of the project. Section 3.4 presents the detailed architecture of the system and discusses the design of its components. The design assumptions are also presented later in the chapter.

3.2 Methodology

To accomplish the aims and objectives presented in Section 1.3, an incremental approach was used in which the phases of the project were executed iteratively. The following sub-sections discuss the methodology used during the different phases of this project.

3.2.1 Research and Familiarisation with the Truffle Framework

The first iterations focused on researching and understanding the tools and technologies involved in the development of this project. Together with the literature survey, this phase was used to study and understand simple language construction with the Truffle framework. The objective was to understand how a simple language can be developed on Truffle and to identify issues and difficulties that could arise when working with the Truffle framework.

3.2.2 Understanding the Dalvik Executable Format (DEX)

The next phase of the project involved a study of DEX opcodes and their usage. This included identifying the different opcodes, their hexadecimal representation, their mnemonic representation and their relationship to the original Java constructs. The objective of this phase was to understand the DEX hexadecimal format and to be able to build a simple scanner and parser for the format.
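The relationship between the hexadecimal form and the mnemonic form of an opcode can be illustrated with the three instructions of Figure 2-5. The following is a minimal sketch rather than the scanner built in this project: the class DexDecodeSketch is hypothetical, it handles only const/16 (opcode 0x13, format 21s) and add-int (opcode 0x90, format 23x) as defined in the Dalvik bytecode documentation, and it assumes the 16-bit code units have already been read from the file.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: decode and run the three instructions of Figure 2-5 from raw 16-bit code units. */
final class DexDecodeSketch {
    static final int OP_CONST_16 = 0x13; // format 21s: AA|op BBBB
    static final int OP_ADD_INT = 0x90;  // format 23x: AA|op CC|BB

    /** Produces one mnemonic line per instruction and applies each instruction to regs. */
    static List<String> decodeAndRun(short[] code, int[] regs) {
        List<String> listing = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int unit = code[pc] & 0xFFFF;
            int opcode = unit & 0xFF; // low byte of the first code unit
            int a = unit >>> 8;       // high byte: destination register vAA
            if (opcode == OP_CONST_16) {
                int literal = code[pc + 1]; // sign-extended 16-bit literal
                regs[a] = literal;
                listing.add(String.format("const/16 v%d, 0x%02x", a, literal));
                pc += 2;
            } else if (opcode == OP_ADD_INT) {
                int next = code[pc + 1] & 0xFFFF;
                int b = next & 0xFF;  // first source register vBB
                int c = next >>> 8;   // second source register vCC
                regs[a] = regs[b] + regs[c];
                listing.add(String.format("add-int v%d, v%d, v%d", a, b, c));
                pc += 2;
            } else {
                throw new IllegalArgumentException(
                        "opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
            }
        }
        return listing;
    }

    public static void main(String[] args) {
        // const/16 v0, 0x0f; const/16 v1, 0x1e; add-int v3, v0, v1
        short[] code = {0x0013, 0x000f, 0x0113, 0x001e, 0x0390, 0x0100};
        int[] regs = new int[4];
        for (String line : decodeAndRun(code, regs)) {
            System.out.println(line);
        }
        System.out.println("v3 = " + regs[3]); // prints v3 = 45
    }
}
```

Because the registers are named inside each instruction, the decoder needs no operand stack; a plain int array indexed by register number is enough to hold the state of the method.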

3.2.3 Building DEX AST Interpreter with the Truffle Framework

This phase involved the design and implementation of the project. A general design of the project was developed, followed by building a simple interpreter for simple DEX files. Iteratively, more DEX functionality and features were developed and added. Together with the AST interpreter, test cases were built to complement the development process of the interpreter. Section 3.4 discusses the design decisions taken, while Chapter 4 presents the details of the implementation of the project.

3.2.4 Benchmarking, Optimisation and Evaluation

The aim of this phase is to validate the correctness of the implementation and to compare benchmark results with other Android runtimes. To evaluate the correctness of the implementation, the project uses standard benchmark test suites for validation and benchmarking. The details of the benchmarks used, together with the results, are discussed in a later chapter.

3.3 Development Environment

This section discusses the tools and resources that made up the development environment of this project.

Operating System: GNU/Linux is the chosen platform for this project, as it provides most of the necessary tools and also supports most of the custom tools developed by the Truffle team. Ubuntu is used as the development operating system.

Programming Language: Java is the main programming language of the project. Versions 7 and 8 of the JDK are used, as required by the Truffle and Graal frameworks.

Framework: The Truffle framework is used to implement the AST interpreter [8].

Execution VM: The Graal VM is used to run the implemented project, as a Truffle project performs better when hosted on the Graal VM than on other JVMs [8].

IDE: Eclipse is used as the Integrated Development Environment to simplify writing, debugging and testing of the source code.

Android Applications: The Android SDK is used to generate input files, as it contains the tools necessary to produce DEX files from Java source code. All sample DEX files, together with the benchmark test suites used in this project, were constructed using API level 23 of Android 6.0 (Marshmallow) [32].


BEAMJIT, a Maze of Twisty Little Traces

BEAMJIT, a Maze of Twisty Little Traces BEAMJIT, a Maze of Twisty Little Traces A walk-through of the prototype just-in-time (JIT) compiler for Erlang. Frej Drejhammar 130613 Who am I? Senior researcher at the Swedish Institute

More information

B.V. Patel Institute of BMC & IT, UTU 2014

B.V. Patel Institute of BMC & IT, UTU 2014 BCA 3 rd Semester 030010301 - Java Programming Unit-1(Java Platform and Programming Elements) Q-1 Answer the following question in short. [1 Mark each] 1. Who is known as creator of JAVA? 2. Why do we

More information

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis? Anatomy of a Compiler Program (character stream) Lexical Analyzer (Scanner) Syntax Analyzer (Parser) Semantic Analysis Parse Tree Intermediate Code Generator Intermediate Code Optimizer Code Generator

More information

CMSC 430 Introduction to Compilers. Spring Intermediate Representations and Bytecode Formats

CMSC 430 Introduction to Compilers. Spring Intermediate Representations and Bytecode Formats CMSC 430 Introduction to Compilers Spring 2016 Intermediate Representations and Bytecode Formats Introduction Front end Source code Lexer Parser Types AST/IR IR 2 IR n IR n.s Middle end Back end Front

More information

Contents. Figures. Tables. Examples. Foreword. Preface. 1 Basics of Java Programming 1. xix. xxi. xxiii. xxvii. xxix

Contents. Figures. Tables. Examples. Foreword. Preface. 1 Basics of Java Programming 1. xix. xxi. xxiii. xxvii. xxix PGJC4_JSE8_OCA.book Page ix Monday, June 20, 2016 2:31 PM Contents Figures Tables Examples Foreword Preface xix xxi xxiii xxvii xxix 1 Basics of Java Programming 1 1.1 Introduction 2 1.2 Classes 2 Declaring

More information

A Tour of Language Implementation

A Tour of Language Implementation 1 CSCE 314: Programming Languages Dr. Flemming Andersen A Tour of Language Implementation Programming is no minor feat. Prometheus Brings Fire by Heinrich Friedrich Füger. Image source: https://en.wikipedia.org/wiki/prometheus

More information

02 B The Java Virtual Machine

02 B The Java Virtual Machine 02 B The Java Virtual Machine CS1102S: Data Structures and Algorithms Martin Henz January 22, 2010 Generated on Friday 22 nd January, 2010, 09:46 CS1102S: Data Structures and Algorithms 02 B The Java Virtual

More information

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points Name: CS520 Final Exam 12 December 2018, 120 minutes, 26 questions, 100 points The exam is closed book and notes. Please keep all electronic devices turned off and out of reach. Note that a question may

More information

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done What is a compiler? What is a compiler? Traditionally: Program that analyzes and translates from a high level language (e.g., C++) to low-level assembly language that can be executed by hardware int a,

More information

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100 GATE- 2016-17 Postal Correspondence 1 Compiler Design Computer Science & Information Technology (CS) 20 Rank under AIR 100 Postal Correspondence Examination Oriented Theory, Practice Set Key concepts,

More information

Playing with bird guts. Jonathan Worthington YAPC::EU::2007

Playing with bird guts. Jonathan Worthington YAPC::EU::2007 Jonathan Worthington YAPC::EU::2007 I'm not going to do a dissection. The Plan For Today Take a simple Parrot program, written in PIR Look, from start to finish, at what happens when we feed it to the

More information

What are the characteristics of Object Oriented programming language?

What are the characteristics of Object Oriented programming language? What are the various elements of OOP? Following are the various elements of OOP:- Class:- A class is a collection of data and the various operations that can be performed on that data. Object- This is

More information

Assumptions. History

Assumptions. History Assumptions A Brief Introduction to Java for C++ Programmers: Part 1 ENGI 5895: Software Design Faculty of Engineering & Applied Science Memorial University of Newfoundland You already know C++ You understand

More information

Semantic Analysis. Lecture 9. February 7, 2018

Semantic Analysis. Lecture 9. February 7, 2018 Semantic Analysis Lecture 9 February 7, 2018 Midterm 1 Compiler Stages 12 / 14 COOL Programming 10 / 12 Regular Languages 26 / 30 Context-free Languages 17 / 21 Parsing 20 / 23 Extra Credit 4 / 6 Average

More information

<Insert Picture Here> Maxine: A JVM Written in Java

<Insert Picture Here> Maxine: A JVM Written in Java Maxine: A JVM Written in Java Michael Haupt Oracle Labs Potsdam, Germany The following is intended to outline our general product direction. It is intended for information purposes

More information

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

SEMANTIC ANALYSIS TYPES AND DECLARATIONS SEMANTIC ANALYSIS CS 403: Type Checking Stefan D. Bruda Winter 2015 Parsing only verifies that the program consists of tokens arranged in a syntactically valid combination now we move to check whether

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Introduction to Visual Basic and Visual C++ Introduction to Java. JDK Editions. Overview. Lesson 13. Overview

Introduction to Visual Basic and Visual C++ Introduction to Java. JDK Editions. Overview. Lesson 13. Overview Introduction to Visual Basic and Visual C++ Introduction to Java Lesson 13 Overview I154-1-A A @ Peter Lo 2010 1 I154-1-A A @ Peter Lo 2010 2 Overview JDK Editions Before you can write and run the simple

More information

Self-Optimizing AST Interpreters

Self-Optimizing AST Interpreters Self-Optimizing AST Interpreters Thomas Würthinger Andreas Wöß Lukas Stadler Gilles Duboscq Doug Simon Christian Wimmer Oracle Labs Institute for System Software, Johannes Kepler University Linz, Austria

More information

ART JIT in Android N. Xueliang ZHONG Linaro ART Team

ART JIT in Android N. Xueliang ZHONG Linaro ART Team ART JIT in Android N Xueliang ZHONG Linaro ART Team linaro-art@linaro.org 1 Outline Android Runtime (ART) and the new challenges ART Implementation in Android N Tooling Performance Data & Findings Q &

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA The course covers Compiler architecture Pre-requisite Front-end Strong programming background in C, C++ Back-end LLVM Code optimization A case study: nu+

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University Code Generation Frédéric Haziza Department of Computer Systems Uppsala University Spring 2008 Operating Systems Process Management Memory Management Storage Management Compilers Compiling

More information

2 rd class Department of Programming. OOP with Java Programming

2 rd class Department of Programming. OOP with Java Programming 1. Structured Programming and Object-Oriented Programming During the 1970s and into the 80s, the primary software engineering methodology was structured programming. The structured programming approach

More information

Chapter 11 Introduction to Programming in C

Chapter 11 Introduction to Programming in C Chapter 11 Introduction to Programming in C C: A High-Level Language Gives symbolic names for containers of values don t need to know which register or memory location Provides abstraction of underlying

More information

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Objective PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS Explain what is meant by compiler. Explain how the compiler works. Describe various analysis of the source program. Describe the

More information

WACC Report. Zeshan Amjad, Rohan Padmanabhan, Rohan Pritchard, & Edward Stow

WACC Report. Zeshan Amjad, Rohan Padmanabhan, Rohan Pritchard, & Edward Stow WACC Report Zeshan Amjad, Rohan Padmanabhan, Rohan Pritchard, & Edward Stow 1 The Product Our compiler passes all of the supplied test cases, and over 60 additional test cases we wrote to cover areas (mostly

More information

Computer Components. Software{ User Programs. Operating System. Hardware

Computer Components. Software{ User Programs. Operating System. Hardware Computer Components Software{ User Programs Operating System Hardware What are Programs? Programs provide instructions for computers Similar to giving directions to a person who is trying to get from point

More information

A Method-Based Ahead-of-Time Compiler For Android Applications

A Method-Based Ahead-of-Time Compiler For Android Applications A Method-Based Ahead-of-Time Compiler For Android Applications Fatma Deli Computer Science & Software Engineering University of Washington Bothell November, 2012 2 Introduction This paper proposes a method-based

More information

Language Reference Manual simplicity

Language Reference Manual simplicity Language Reference Manual simplicity Course: COMS S4115 Professor: Dr. Stephen Edwards TA: Graham Gobieski Date: July 20, 2016 Group members Rui Gu rg2970 Adam Hadar anh2130 Zachary Moffitt znm2104 Suzanna

More information

Compiler Construction

Compiler Construction Compiler Construction Thomas Noll Software Modeling and Verification Group RWTH Aachen University https://moves.rwth-aachen.de/teaching/ws-1819/cc/ Generation of Intermediate Code Outline of Lecture 15

More information
