Course Overview. PART I: overview material. PART II: inside a compiler. PART III: conclusion

Similar documents
JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Compiling Techniques

Programming Language Systems

The Java Virtual Machine. CSc 553. Principles of Compilation. 3 : The Java VM. Department of Computer Science University of Arizona

301AA - Advanced Programming [AP-2017]

Let s make some Marc R. Hoffmann Eclipse Summit Europe

SOFTWARE ARCHITECTURE 7. JAVA VIRTUAL MACHINE

CSC 4181 Handout : JVM

Compiler construction 2009

Under the Hood: The Java Virtual Machine. Lecture 23 CS2110 Fall 2008

CSCE 314 Programming Languages

Under the Hood: The Java Virtual Machine. Problem: Too Many Platforms! Compiling for Different Platforms. Compiling for Different Platforms

How do you create a programming language for the JVM?

02 B The Java Virtual Machine

Program Dynamic Analysis. Overview

3/15/18. Overview. Program Dynamic Analysis. What is dynamic analysis? [3] Why dynamic analysis? Why dynamic analysis? [3]

Java byte code verification

Run-time Program Management. Hwansoo Han

COMP3131/9102: Programming Languages and Compilers

JAM 16: The Instruction Set & Sample Programs

The Java Virtual Machine

Static Program Analysis

Java Class Loading and Bytecode Verification

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Java and C II. CSE 351 Spring Instructor: Ruth Anderson

Course Overview. Levels of Programming Languages. Compilers and other translators. Tombstone Diagrams. Syntax Specification

Chapter 5. A Closer Look at Instruction Set Architectures

COMP3131/9102: Programming Languages and Compilers

CMPSC 497: Java Security

javac 29: pop 30: iconst_0 31: istore_3 32: jsr [label_51]

Problem: Too Many Platforms!

Compiler construction 2009

Michael Rasmussen ZeroTurnaround

CSc 453 Interpreters & Interpretation

CS2110 Fall 2011 Lecture 25. Under the Hood: The Java Virtual Machine, Part II

Over-view. CSc Java programs. Java programs. Logging on, and logging o. Slides by Michael Weeks Copyright Unix basics. javac.

Improving Java Performance

Tutorial 3: Code Generation

Compiling for Different Platforms. Problem: Too Many Platforms! Dream: Platform Independence. Java Platform 5/3/2011

Compilation 2012 Code Generation

Topics. Structured Computer Organization. Assembly language. IJVM instruction set. Mic-1 simulator programming

The Java Language Implementation

Taming the Java Virtual Machine. Li Haoyi, Chicago Scala Meetup, 19 Apr 2017

An Introduction to Multicodes. Ben Stephenson Department of Computer Science University of Western Ontario

COMP 520 Fall 2009 Virtual machines (1) Virtual machines

Delft-Java Dynamic Translation

A Quantitative Analysis of Java Bytecode Sequences

CSE 431S Final Review. Washington University Spring 2013

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

CS263: Runtime Systems Lecture: High-level language virtual machines. Part 1 of 2. Chandra Krintz UCSB Computer Science Department

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Improving Java Code Performance. Make your Java/Dalvik VM happier

Java: framework overview and in-the-small features

Mnemonics Type-Safe Bytecode Generation in Scala

Exercise 7 Bytecode Verification self-study exercise sheet

The Java Virtual Machine

Why GC is eating all my CPU? Aprof - Java Memory Allocation Profiler Roman Elizarov, Devexperts Joker Conference, St.

Translating JVM Code to MIPS Code 1 / 43

CSc 620 Debugging, Profiling, Tracing, and Visualizing Programs. Compiling Java. The Java Class File Format 1 : JVM

Code Profiling. CSE260, Computer Science B: Honors Stony Brook University

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

Plan for Today. Safe Programming Languages. What is a secure programming language?

Recap: Printing Trees into Bytecodes

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 16

Jaos - Java on Aos. Oberon Event 03 Patrik Reali

Chapter 5. A Closer Look at Instruction Set Architectures

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Part VII : Code Generation

Java Security. Compiler. Compiler. Hardware. Interpreter. The virtual machine principle: Abstract Machine Code. Source Code

301AA - Advanced Programming [AP-2017]

EXAMINATIONS 2014 TRIMESTER 1 SWEN 430. Compiler Engineering. This examination will be marked out of 180 marks.

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

High-Level Language VMs

Living in the Matrix with Bytecode Manipulation

Introduction Basic elements of Java

Running class Timing on Java HotSpot VM, 1

CMSC 430 Introduction to Compilers. Spring Intermediate Representations and Bytecode Formats

invokedynamic IN 45 MINUTES!!! Wednesday, February 6, 13

Java Overview An introduction to the Java Programming Language

Android Internals and the Dalvik VM!

Static Analysis of Dynamic Languages. Jennifer Strater

Short Notes of CS201

COMP 520 Fall 2013 Code generation (1) Code generation

CS201 - Introduction to Programming Glossary By

CS 231 Data Structures and Algorithms, Fall 2016

Shared Mutable State SWEN-220

Computer Components. Software{ User Programs. Operating System. Hardware

Oak Intermediate Bytecodes

JSR 292 backport. (Rémi Forax) University Paris East

In Vogue Dynamic. Alexander Shopov

Today. Instance Method Dispatch. Instance Method Dispatch. Instance Method Dispatch 11/29/11. today. last time

A Type System for Object Initialization In the Java TM Bytecode Language

This project is supported by DARPA under contract ARPA F C-0057 through a subcontract from Syracuse University. 2

Code Analysis and Optimization. Pat Morin COMP 3002

LaboratoriodiProgrammazione III

P. O. Box 1565 Cupertino, CA USA

Transcription:

Course Overview PART I: overview material 1 Introduction (today) 2 Language Processors (basic terminology, tombstone diagrams, bootstrapping) 3 The architecture of a Compiler PART II: inside a compiler 4 Syntax Analysis 5 Contextual analysis 6 Runtime organization 7 Code generation PART III: conclusion 9 Conclusion Added material: Java runtime organization the JVM. 1

What This Lecture is About In this lecture we look at the JVM as an example of a real-world runtime system for a modern object-oriented programming language. The material in this lecture is interesting because: 1) it will help understand some things about the JVM for the final stage of the course project. 2) JVM is probably the most common and widely used VM in the world. 3) You ll get a better idea what a real VM looks like. 2

Lecture overview JVM is an abstract machine. Recap: what is an AM? The JVM architecture Class File Structure JVM instruction execution: the role of the constant pool in dynamic linking. Reference: This material was compiled from the book: Specification 2nd Edition. Tim Lindholm and Frank Yellin. Online at: http://java.sun.com/docs/books/vmspec/ => you will probably need to consult this reference when doing the final stage of the project. These slides only provide a broad overview. 3

RECAP: Interpretive Compilers Why? A tradeoff between fast(er) compilation and a reasonable runtime performance. How? Use an intermediate language more high-level than machine code => easier to compile to more low-level than source language => easy to implement as an interpreter Example: A Java Development Kit for machine M Java->JVM M JVM M 4

Abstract Machines Abstract machine implements an intermediate language in between the high-level language (e.g. Java) and the low-level hardware (e.g. Pentium) High level Java Java JVM (.class files) Implemented in Java: Machine independent Java compiler Java JVM interpreter or JVM JIT compiler Low level Pentium Pentium 5

Abstract Machines An abstract machine is intended specifically as a runtime system for a particular (kind of) programming language. JVM is a virtual machine for Java programs: It directly supports object oriented concepts such as classes, objects, methods, method invocation etc. easy to compile Java to JVM => 1) easy to implement compiler 2) fast compilation another advantage: portability 6

Class Files and Class File Format External representation platform independent.class files load JVM internal representation implementation dependent objects classes methods primitive types integers arrays The JVM is an abstract machine in the true sense of the word. The JVM spec. does not specify implementation details (can be dependent on target OS/platform, performance requirements etc.) The JVM spec defines a machine independent class file format that all JVM implementations must support. 7

Data Types JVM (and Java) distinguishes between two kinds of types: Primitive types: boolean: boolean numeric integral: byte, short, int, long, char numeric floating point: float, double internal, for exception handling: returnaddress Reference types: class types array types interface types Note: Primitive types are represented directly, reference types are represented indirectly (as pointers to array or class instances) 8

JVM: Runtime Data Areas Besides OO concepts, JVM also supports multi-threading. Threads are directly supported by the JVM. => Two kinds of runtime data areas: 1) shared between all threads 2) private to a single thread Shared Thread 1 Thread 2 Garbage Collected Heap pc pc Method area Java Stack Native Method Stack Java Stack Native Method Stack 9

Java Stacks JVM is a stack based machine, much like TAM. JVM instructions implicitly take arguments from the stack top put their result on the top of the stack The stack is used to pass arguments to methods return result from a method store intermediate results in evaluating expressions store local variables This works similarly to (but not exactly the same as) the way we discussed in the lectures on stack-based allocation and routines. 10

Stack Frames The Java stack consists of frames. The JVM specification does not say exactly how the stack and frames should be implemented. The JVM specification specifies that a stack frame has areas for args + local vars operand stack to runtime constant pool A new call frame is created by executing some JVM instruction for invoking a method (e.g. invokevirtual, invokeinterface, invokestatic,...) The operand stack is initially empty. But grows and shrinks during execution. 11

pointer to constant pool args + local vars operand stack Stack Frames The role/purpose of each of the areas in a stack frame: Needed for resolution: more about this later. This is used implicitly when executing JVM instructions that contain entries into the CPool space where the arguments and local variables of a method are stored. This includes a space for the receiver (this) at position 0. Stack for storing intermediate results during the execution of the method. Initially it is empty. The maximum depth is known at compile time. 12

Stack Frames An implementation using registers such as SB, ST and LB and a dynamic link is one possible implementation. to previous frame on the stack SB LB dynamic link args + local vars operand stack to runtime constant pool JVM instructions store and load for accessing args and locals use addresses which are numbers from 0 to #args + #locals - 1 ST 13

JVM Interpreter The core of a JVM interpreter is basically this: do { byte opcode = fetch an opcode; switch (opcode) { case opcode1 : fetch operands for opcode1; execute action for opcode1; break; case opcode2 : fetch operands for opcode2; execute action for opcode2; break; case... } while (more to do) 14

Instruction-set: typed instructions! JVM instructions are explicitly typed: different opcodes for instructions for integers, floats, arrays and reference types. This is reflected by a naming convention in the first letter of the opcode mnemonics: Example: different types of load instructions iload lload fload dload aload integer load long load float load double load reference-type load 15

Instruction set: kinds of operands JVM instructions have three kinds of operands: - from the top of the operand stack - from the bytes following the opcode - part of the opcode One instructions may have different forms supporting different kinds of operands. Example: different forms of iload. Assembly code Binary instruction code layout iload_0 26 iload_1 27 iload_2 28 iload_3 29 iload n 21 n wide iload n 196 21 n 16

Instruction-set: accessing arguments and locals arguments and locals area inside a stack frame 0: 1: 2: 3: args: indexes 0.. #args-1 locals: indexes #args.. #args+#locals-1 Instruction examples: iload_1 iload_3 aload 5 aload_0 istore_1 astore_1 fstore_3 A load instruction: loads something from the args/locals area to the top of the operand stack. A store instruction takes something from the top of the operand stack and stores it in the argument/local area 17

Instruction-set: non-local memory access In the JVM, the contents of different kinds of memory can be accessed by different kinds of instructions. accessing locals and arguments: load and store instructions accessing fields in objects: getfield, putfield accessing static fields: getstatic, putstatic Note: static fields are a lot like global variables. They are allocated in the method area where also code for methods and representations for classes are stored. Q: what memory area are getfield and putfield accessing? Note: we don t have something similar to L1, L2, etc. addresses in JVM 18

Arithmethic Instruction-set: operations on numbers add: iadd, ladd, fadd, dadd subtract: isub, lsub, fsub, dsub multiply: imul, lmul, fmul, dmul etc. Conversion i2l, i2f, i2d l2f, l2d, f2s f2i, d2i, 19

Instruction-set Operand stack manipulation pop, pop2, dup, dup2, dup_x1, swap, Control transfer Unconditional : goto, goto_w, jsr, ret, Conditional: ifeq, iflt, ifgt, 20

Instruction-set Method invocation: invokevirtual: usual instruction for calling a method on an object. invokeinterface: same as invokevirtual, but used when the called method is declared in an interface. (requires different kind of method lookup) invokespecial: for calling things such as constructors. These are not dynamically dispatched but do have a this argument (old name: invokenonvirtual) invokestatic: for calling methods that have the static modifier (these methods belong to a class, rather an object) Returning from methods: return, ireturn, lreturn, areturn, freturn, 21

Instruction-set: Heap Memory Allocation Create new class instance (object): new Create new array: newarray: for creating arrays of primitive types. anewarray, multianewarray: for arrays of reference types 22

Instructions and the Constant Pool Many JVM instructions have operands which are indexes pointing to an entry in the so called constant pool. The constant pool contains all kinds of entries representing symbolic references for linking. This is the way that instructions refer to things such as classes, interfaces, methods, fields and constants such as string literals and numbers. These are the kinds of constant pool entries that exist: Class_info Fieldref_info Methodref_info InterfaceMethodref_info String Integer Float Long Double NameAndType_info Utf8 23

Instructions and the Constant Pool Example: We examine the getfield instruction in detail. Format: 180 indexbyte1 indexbyte2 CONSTANT_Fieldref_info { u1 tag; u2 class_index; u2 name_and_type index; } CONSTANT_Name_andType_info { u1 tag; u2 name_index; u2 descriptor_index; } Class_info { u1 tag; u2 name_index; } Utf8Info name of field Utf8Info field descriptor Utf8Info fully qualified class name 24

Instructions and the Constant Pool That previous picture is rather complicated, let s simplify it a little: Format: 180 indexbyte1 indexbyte2 Fieldref Class Name_and_Type Utf8Info fully qualified class name Utf8Info name of field Utf8Info field descriptor 25

Instructions and the Constant Pool The constant entries format is part of the Java class file format. Luckily, the tool we will use for code generation, ASM, provides an API that abstracts away the details of constant pool management. ASM takes care of creating the constant pool entries for us. When an instruction operand expects a constant pool entry the assembler allows you to enter the entry in place in an easy syntax. 26

ASM Example The following example shows how to use ASM to generate the contents of a.class file equivalent to the following.java program. public class Main { private int x; } public int getx() { return x; } 27

ASM Example public static byte[] dump () throws Exception { ClassWriter cw = new ClassWriter(0); FieldVisitor fv; MethodVisitor mv; cw.visit(v1_6, ACC_PUBLIC, "Main", null, "java/lang/object", null); cw.visitsource("main.java", null); //Generate field fv = cw.visitfield(acc_private, "x", "I", null, null); fv.visitend(); //Generate default constructor mv = cw.visitmethod(acc_public, "<init>", "()V", null, null);... //Generate getx method mv = cw.visitmethod(acc_public, "getx", "()I", null, null); mv.visitcode(); mv.visitvarinsn(aload, 0); mv.visitfieldinsn(getfield, "Main", "x", "I"); mv.visitinsn(ireturn); mv.visitmaxs(1, 1); mv.visitend(); } cw.visitend(); return cw.tobytearray(); 28

Instructions and the Constant Pool Fully qualified class names and descriptors in constant pool UTF8 entries. 1) Fully qualified class names: a package + class name string. Note this uses / instead of. 2) Descriptors: are strings that define a type for a method or field. Java descriptor boolean Z integer I Object Ljava/lang/Object; String[] [Ljava/lang/String; int foo(int,object) (ILjava/lang/Object;)I 29

Linking In general: linking is the process of resolving symbolic references in binary files. Most programming language implementations have what we call separate compilation. Modules or files can be compiled separately and transformed into some binary format. But since these separately compiled files may have connections to other files, they have to be linked. => The binary file is not yet executable, it has some kind of symbolic links in it that point to things (methods, classes, procedures, variables, etc.) in other files/modules. Linking is the process of resolving these symbolic links and replacing them by real addresses so that the code can be executed. 30

Loading and Linking in JVM In JVM, loading and linking of class files happens at runtime, while the program is running! Classes are loaded as needed. The constant pool contains symbolic references that need to be resolved before a JVM instruction that uses them can be executed (this is the equivalent of linking). In JVM a constant pool entry is resolved the first time it is used by a JVM instruction. Example: When a getfield is executed for the first time, the constant pool entry index in the instruction may be replaced by the offset of the field. 31

Closing Example As a closing example on the JVM, we will take a look at the compiled code of the following simple Java class declaration. class Factorial { } int fac(int n) { int result = 1; for (int i=2; i<n; i++) { result = result * i; } return result; } 32

Factorial.class Disassembled This disasmbled code produce by the ASM bytecode outline Eclipse plugin. See ASM website http://asm.objectweb.org/eclipse/index.html // class version 50.0 (50) // access flags 32 class junk/factorial { // compiled from: Factorial.java // access flags 0 <init>()v L0 (0) ALOAD 0 INVOKESPECIAL java/lang/object.<init>()v RETURN L1 (4)... 33

Factorial.class Disassembled public fac(i)i L0 (0) ICONST_1 ISTORE 2 L1 (3) ICONST_2 ISTORE 3 L2 (6) GOTO L3 L4 (8) ILOAD 2 ILOAD 3 IMUL ISTORE 2... }... L5 (14) IINC 3 1 L3 (16) ILOAD 3 ILOAD 1 IF_ICMPLT L4 L6 (21) ILOAD 2 IRETURN L7 (24) 34