JoeQ Framework CS243, Winter 20156

Similar documents
Joeq Analysis Framework. CS 243, Winter

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

CS243 Homework 2. Winter Due: January 31, 2018

Compiling Techniques

Translating JVM Code to MIPS Code 1 / 43

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Chapter 5. A Closer Look at Instruction Set Architectures

Compiler construction 2009

SOFTWARE ARCHITECTURE 7. JAVA VIRTUAL MACHINE

02 B The Java Virtual Machine

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Chapter 5. A Closer Look at Instruction Set Architectures

Compiler construction 2009

Code Generation Introduction

Recap: Printing Trees into Bytecodes

Tutorial 3: Code Generation

Under the Hood: The Java Virtual Machine. Lecture 23 CS2110 Fall 2008

COMP3131/9102: Programming Languages and Compilers

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

An Introduction to Multicodes. Ben Stephenson Department of Computer Science University of Western Ontario

Shared Mutable State SWEN-220

Java and C II. CSE 351 Spring Instructor: Ruth Anderson

CSC 4181 Handout : JVM

CMSC430 Spring 2014 Midterm 2 Solutions

Java byte code verification

Over-view. CSc Java programs. Java programs. Logging on, and logging o. Slides by Michael Weeks Copyright Unix basics. javac.

Topics. Structured Computer Organization. Assembly language. IJVM instruction set. Mic-1 simulator programming

Java: framework overview and in-the-small features

The Java Virtual Machine

Run-time Program Management. Hwansoo Han

Static Program Analysis

JAM 16: The Instruction Set & Sample Programs

Exercise 7 Bytecode Verification self-study exercise sheet

Java Class Loading and Bytecode Verification

Course Overview. Levels of Programming Languages. Compilers and other translators. Tombstone Diagrams. Syntax Specification

CSc 453 Interpreters & Interpretation

Java Security. Compiler. Compiler. Hardware. Interpreter. The virtual machine principle: Abstract Machine Code. Source Code

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.

Plan for Today. Safe Programming Languages. What is a secure programming language?

CSE 431S Final Review. Washington University Spring 2013

Course Overview. PART I: overview material. PART II: inside a compiler. PART III: conclusion

The Java Virtual Machine. CSc 553. Principles of Compilation. 3 : The Java VM. Department of Computer Science University of Arizona

Under the Hood: The Java Virtual Machine. Problem: Too Many Platforms! Compiling for Different Platforms. Compiling for Different Platforms

Java Instrumentation for Dynamic Analysis

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points

Improving Java Code Performance. Make your Java/Dalvik VM happier

Problem: Too Many Platforms!

INDEX. A SIMPLE JAVA PROGRAM Class Declaration The Main Line. The Line Contains Three Keywords The Output Line

Parsing Scheme (+ (* 2 3) 1) * 1

javac 29: pop 30: iconst_0 31: istore_3 32: jsr [label_51]

CSc 372 Comparative Programming Languages. Getting started... Getting started. B: Java Bytecode BCEL

Project Compiler. CS031 TA Help Session November 28, 2011

Running class Timing on Java HotSpot VM, 1

Intermediate Representations

Improving Java Performance

CS2110 Fall 2011 Lecture 25. Under the Hood: The Java Virtual Machine, Part II

Java Primer 1: Types, Classes and Operators

Introduction to Programming Using Java (98-388)

CSc 620 Debugging, Profiling, Tracing, and Visualizing Programs. Getting started... Getting started. 2 : Java Bytecode BCEL

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 16

Compiling for Different Platforms. Problem: Too Many Platforms! Dream: Platform Independence. Java Platform 5/3/2011

CSc 620 Debugging, Profiling, Tracing, and Visualizing Programs

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Genetic Programming in the Wild:

Compilation 2012 Code Generation

Introduction Basic elements of Java

Implementing on Object-Oriented Language 2.14

Oak Intermediate Bytecodes

Intermediate Representations

INTERMEDIATE REPRESENTATIONS RTL EXAMPLE

Lecture 2. COMP1406/1006 (the Java course) Fall M. Jason Hinek Carleton University

Part VII : Code Generation

Intermediate representation

Soot: a framework for analysis and optimization of Java

Principles of Computer Architecture. Chapter 5: Languages and the Machine

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

CMPSC 497: Java Security

The Java Virtual Machine

Informatik II (D-ITET)

Announcements. My office hours are today in Gates 160 from 1PM-3PM. Programming Project 3 checkpoint due tomorrow night at 11:59PM.

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon. School of Electrical Engineering and Computer Science Seoul National University, Korea

Goals. Java - An Introduction. Java is Compiled and Interpreted. Architecture Neutral & Portable. Compiled Languages. Introduction to Java

Intermediate Representations Part II

A Quantitative Analysis of Java Bytecode Sequences

CSCE 314 Programming Languages

CS 11 java track: lecture 1

Informatik II (D-ITET)

Course Administration

The Hardware/Software Interface CSE351 Spring 2015

Intermediate Code Generation

Intermediate Representation (IR)

Problem with Scanning an Infix Expression

CS 536 Spring Programming Assignment 5 CSX Code Generator. Due: Friday, May 6, 2016 Not accepted after Midnight, Tuesday, May 10, 2016

Let s make some Marc R. Hoffmann Eclipse Summit Europe

Mnemonics Type-Safe Bytecode Generation in Scala

Transcription:

Overview Java Intermediate representation JoeQ Framework CS243, Winter 20156 Bytecode JoeQ Framework Quads: Instruction set used in JoeQ JoeQ constructs Writing analysis in JoeQ HW 2 Typical Compiler Infrastructure Java Front-end Middle-end Back-end Parsing Machine- Independent optimizations Machine- Dependent Optimizations Front-end Parsing Middle-end Machine- Independent optimizations Back-end Machine- Dependent Optimizations Java compiler (javac) JVM (java)

Java Source Code Input to the Java Front-end A very rich representation Good for reading and writing (by human) Hard to analyze (by computer) Many high-level concepts with no hardware counterparts classes, generics, virtual function calls, exceptions, structured control flow, locks, etc. Java Bytecode Machine-independent intermediate representation (.class files) Coarse program structure is still maintained One file per class A section per method or field Each method has a bytecode sequence for its implementation Still high level Virtual methods, locks, etc. Bytecode representation What is a stack machine model? Each bytecode instruction uses one byte Instructions may have additional operands, stored immediately after the instruction Variables stored in abstract registers r0 is this r1,... are parameters followed by locals Stack machine model All intermediate values stored on a stack Each instruction pushes or pops values onto a stack. Eg : x = y + 10 push y push 10 add pop x

What is a stack machine model? What is a stack machine model? Each instruction pushes or pops values onto a stack. Eg : Each instruction pushes or pops values onto a stack. Eg : x = y + 10 x = y + 10 push y push 10 add y push y push 10 add 10 y pop x pop x What is a stack machine model? What is a stack machine model? Each instruction pushes or pops values onto a stack. Eg : x = y + 10 push y push 10 add pop x y+10 *add - Implicitly pops the top 2 operands and pushes result Each instruction pushes or pops values onto a stack. Eg : x = y + 10 push y push 10 add pop x Top of stack is popped and the value is assigned to x.

Bytecode Instructions An example Each instruction is prefixed by the types of operands. iload_1 class ExprTest { int test(int a) { > javac ExprTest.java > javap c ExprTest Load the first parameter or local variable and push it on the stack bipush <n> Push byte constant "n" onto the stack as an integer value iadd Add the top two values on the stack, and push the result back onto the stack istore_2 Pop the stack and store its value into the second param/local int b, c, d, e, f; c = a + 10; f = a + c; if (f > 2) { f = f - c; return f; class ExprTest extends java.lang.object { ExprTest(): Code:... int test(int): Code:... ExprTest.ExprTest() ExprTest.test(int a) ExprTest(): Code: 0: aload_0 // load address 'this' and push it onto the stack 1: invokespecial #1 // invokes base class methods. #1 is constructor 2: return class ExprTest { int test(int a) { int b, c, d, e, f; c = a + 10; f = a + c; if (f > 2) { f = f - c; return f; int test(int) Code: 0: iload_1 1: bipush 10 3: iadd 4: istore_3 5: iload_1 6: iload_3 7: iadd 8: istore 6 10: iload 6 12: iconst_2 13: if_icmple 22 16: iload 6 18: iload_3 19: isub 20: istore 6 22: iload 6 24: ireturn

JoeQ Compiler framework for analyzing and optimizing Java bytecode Developed by John Whaley and others Implemented in Java Research project infrastructure: 10+ papers rely on Joeq Also see: http://joeq.sourceforge.net JoeQ Intermediate Representation High-level representation: Joeq has classes for each component of the Java.class file jq_type, jq_class, jq_field, jq_method, We use a lower-level representation called Quad. Joeq translates bytecodes into quads. Etymology jyo-kyu- like the name Joe and the letter Q Quads Stanford version of the TAC (Three address code) One operator and up to four operands: four-address instructions joeq.compiler.quad.quad Register machine model, not stack machine model All temporary data stored in registers Closer to (RISC) machine instructions than a stack model: Joeq is lower-level IR than Java Bytecode Register machine model This mode assumes an unbounded number of pseudo registers. Pseudo registers hold local variables of a method, as well as temporary variables All data must first be loaded into pseudo registers before they can be operated on. More conducive to program optimization than the stack architecture.

JoeQ Operands (joeq.compiler.quad.operands) Register Operand Abstract registers representing parameters, local variables, and temporal variables Constant Operand int/float/string/etc. Constants Target Operand Basic block target of a branch instruction Method Operand, ParamListOperand Target and arguments to a method call Field Operand, TypeOperand,... JoeQ Operators (joeq.compiler.quad.operator) Operator.Move Operator.Unary, Operator.Binary Operator.Invoke Operator.Branch, Operator.GetField, Operator.PutField, Operator.New, Have suffixes indicating return type ADD_I adds two integers L, F, D, A, and V refer to long, float, double, reference, and void JoeQ Runtime Checking Operators Runtime checks are explicit quads Not implicit as in bytecodes: Joeq is lower-level IR than Java Bytecode Operator.NullCheck JoeQ CFGs (joeq.compiler.quad.controlflowgraph) Graphs of basic blocks with entry and exit Entry and exit basic blocks always exist. They are empty. Operator.BoundsCheck Operator.CheckCast, Operator.StoreCheck,...

JoeQ Basic Blocks (joeq.compiler.quad.basicblock) Lists of quads Provide access to successors and predecessors Exception control flow is not explicit in Joeq basic blocks An exception can jump out of the middle of a basic block You do not need to consider exceptions in this class class ExprTest { int test(int a) { ExprTest int b, c, d, e, f; c = a + 10; f = a + c; if (f > 2) { f = f - c; return f; > javac ExprTest.java > java PrintQuads ExprTest Class: ExprTest Control flow graph for ExprTest.<init> ()V:... Control flow graph for... ExprTest.test (I)I: ExprTest.ExprTest() ExprTest.test(int a) Code: 0: aload_0 // load address 'this' and push it onto the stack 1: invokespecial #1 // invokes base class methods. #1 is constructor 2: return BB0 (ENTRY) (in: <none>, out: BB2) BB2 (in: BB0 (ENTRY), out: BB1 (EXIT)) 1 NULL_CHECK T-1 <g>, R0 ExprTest 2 INVOKESPECIAL_V% java.lang.object.<init>()v, (R0 ExprTest) 3 RETURN_V BB1 (EXIT) (in: BB2, out: <none>) Bytecode Quads class ExprTest { int test(int a) { int b, c, d, e, f; c = a + 10; f = a + c; if (f > 2) { f = f - c; return f; BB0 (ENTRY) (in: <none>, out: BB2) BB2 (in: BB0 (ENTRY), out: BB3, BB4) 1 ADD_I T2 int, R1 int, IConst: 10 2 MOVE_I R3 int, T2 int 3 ADD_I T2 int, R1 int, R3 int 4 MOVE_I R4 int, T2 int 5 IFCMP_I R4 int, IConst: 2, LE, BB4 BB3 (in: BB2, out: BB4) 6 SUB_I T2 int, R4 int, R3 int 7 MOVE_I R4 int, T2 int BB4 (in: BB2, BB3, out: BB1 (EXIT)) 8 RETURN_I R4 int BB1 (EXIT) (in: BB4, out: <none>)

ExprTest.test(int a) Writing analysis with JoeQ 0: iload_1 1: bipush 10 3: iadd 4: istore_3 5: iload_1 6: iload_3 7: iadd 8: istore 6 10: iload 6 12: iconst_2 13: if_icmple 22 16: iload 6 18: iload_3 19: isub 20: istore 6 22: iload 6 24: ireturn BB0 (ENTRY) (in: <none>, out: BB2) BB2 (in: BB0 (ENTRY), out: BB3, BB4) 1 ADD_I T2 int, R1 int, IConst: 10 2 MOVE_I R3 int, T2 int 3 ADD_I T2 int, R1 int, R3 int 4 MOVE_I R4 int, T2 int 5 IFCMP_I R4 int, IConst: 2, LE, BB4 BB3 (in: BB2, out: BB4) 6 SUB_I T2 int, R4 int, R3 int 7 MOVE_I R4 int, T2 int BB4 (in: BB2, BB3, out: BB1 (EXIT)) 8 RETURN_I R4 int BB1 (EXIT) (in: BB4, out: <none>) Often can be written with visitors Traverse all the loaded CFGs, or all the quads in those CFGs ControlFlowGraphVisitor: interface for an analysis which makes a pass over the CFGs QuadVisitor: interface for an analysis which makes a single pass over the quads in a CFG QuadCounter and LoadStoreCounter Visitor Design Pattern public class QuadCounter extends QuadVisitor.EmptyVisitor { public int count = 0; public void visitquad(quad q) { count++; public class LoadStoreCounter extends QuadVisitor.EmptyVisitor { public int loadcount = 0, storecount = 0; public void visitload(quad q) { loadcount++; public void visitstore(quad q) { storecount++; Add an operation to existing objects without modifying the structure of objects Operations: variable, objects: fixed Control flow graph analysis 1, 2, Add methods analysis1, analysis2,... to ControlFlowGraph class vs. Analysis1CfgVisitor, Analysis2CfgVisitor, In compiler, representation is (almost) fixed. Adding analysis/transformation should be flexible.

Joeq.Main.Helper class A clean interface to the complexities of Joeq (Façade design pattern) runpass(cfg or quad, visitor) runs a ControlFlowGraphVisitor/QuadVisitor over ControlFlowGraphs/Quads. Helper Class Usage Example: QuadCounter class CountQuads { public static void main(string[] args) { jq_class[] classes = new jq_class[args.length]; for (String classname : args) { jq_class c = (jq_class)helper.load(classname); System.out.println( Class: + classname); QuadCounter qc = new QuadCounter(); Helper.runPass(c, qc); System.out.println(className + has + qc.count + quads ); QuadIterator Java Beginners An alternative to visitors Simple interface to iterate through all the quads in a reverse post-order Extends java.util.iterator<quad> Java collections library List, Set, Map, Java Generics ControlFlowGraph cfg =... QuadIterator iter = new QuadIterator(cfg); while (iter.hasnext()) { Quad quad = iter.next(); if (quad.getoperator() instanceof Operator.Invoke) { dosomething(cfg.getmethod(), quad); Inner Class Make yourself familiar with these concepts

HW2 : Dataflow Framework More Details We provide Solver interface: Flow.Solver Analysis interface: Flow.Analysis Two analysis that extend Flow.Analysis: ConstantProp and Liveness Solver MySolver Due: Next Friday (1/30), 5PM, PST joeq.jar is provided Unjar this if you want to look at Joeq source code. Using Eclipse may make your life easier. Working with groups of two is encouraged. Output must match ours on Stanford Linux clusters such as myth. Goal is to complete Skeleton MySolver that extends Flow.Solver Should work with ConstantProp and Liveness Skeleton ReachingDefs that extends Flow.Analysis Analysis Get started early. It make take a long time to understand the Joeq framework, although you need to know only small part of it. Post questions on Piazzza, so that we can answer them the session on Friday. Similarly Faintness analysis Const. Prop. Liveness Reaching Defs