Lecture 9 Dynamic Compilation

Lecture 9 Dynamic Compilation
I. Motivation & Background
II. Overview
III. Compilation Policy
IV. Partial Method Compilation
V. Partial Dead Code Elimination
VI. Escape Analysis
VII. Results
Based on: Partial Method Compilation Using Dynamic Profile Information, John Whaley, OOPSLA '01. Slides: M. Lam & J. Whaley.

I. Goals of This Lecture
Beyond static compilation
Example of a complete system
Use of data flow techniques in a new context
Experimental approach

Static/Dynamic
Compiler: high-level → binary, static
Interpreter: high-level, emulate, dynamic
Dynamic compilation: high-level → binary, dynamic
  Machine-independent, dynamic loading
  Cross-module optimization
  Specialize program using runtime information (without profiling)

High-Level/Binary
Binary translator: binary → binary; mostly dynamic
  Run as-is
  Software migration (x86 → Alpha, Sun, Transmeta; 68000 → PowerPC → x86)
  Virtualization (make hardware virtualizable)
  Dynamic optimization (DynamoRIO)
  Security (execute only out of a code cache that is protected)

Closed-world vs. Open-world
Closed-world assumption (most static compilers): all code is available a priori for analysis and compilation.
Open-world assumption (most dynamic compilers): not all code is available up front; arbitrary code can be loaded at run time.
The open-world assumption precludes many optimization opportunities.
Solution: optimistically assume the best case, but provide a way out if necessary.

II. Overview of Dynamic Compilation
Interpretation/compilation policy decisions: choosing what and how to compile
Collecting runtime information: instrumentation, sampling
Exploiting runtime information: frequently executed code paths
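As a concrete illustration of the instrumentation option, here is a minimal Java sketch (hypothetical class and method names, not the system described in the paper) of counter-based profiling: instrumented method entries and loop back edges bump a per-method counter that later drives compilation decisions.

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.atomic.LongAdder;

  class HotnessProfile {
      private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

      // Called from instrumented method entries and loop back edges.
      void tick(String methodId) {
          counters.computeIfAbsent(methodId, k -> new LongAdder()).increment();
      }

      long executionCount(String methodId) {
          LongAdder c = counters.get(methodId);
          return c == null ? 0 : c.sum();
      }
  }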

Speculative Inlining
Virtual call sites are deadly:
  They kill optimization opportunities.
  Virtual dispatch is expensive on modern CPUs.
  They are very common in object-oriented code.
Speculatively inline the most likely call target based on the class hierarchy or profile information. Many virtual call sites have only one target, so this technique is very effective in practice.
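A minimal sketch of what guarded speculative inlining amounts to, written out by hand in Java (Shape and Circle are made-up types; a real JIT emits the guard and the inlined body in machine code, and may deoptimize instead of falling back to the virtual call):

  interface Shape { double area(); }

  final class Circle implements Shape {
      final double r;
      Circle(double r) { this.r = r; }
      public double area() { return Math.PI * r * r; }
  }

  class GuardedInline {
      // Profile says the receiver at this call site is almost always a Circle.
      static double areaOf(Shape s) {
          if (s.getClass() == Circle.class) {
              Circle c = (Circle) s;
              return Math.PI * c.r * c.r;   // inlined body: no virtual dispatch
          }
          return s.area();                  // uncommon path: ordinary virtual call
      }
  }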

III. Compilation Policy
T_total = T_compile - (n_executions × T_improvement)
If T_total is negative, our compilation policy decision was effective. We can try to:
  Reduce T_compile (faster compile times)
  Increase T_improvement (generate better code)
  Focus on large n_executions (compile hot spots)
80/20 rule (Pareto Principle): 20% of the work for 80% of the advantage.
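For example, with made-up numbers: if T_compile = 50 ms and T_improvement = 0.02 ms per execution, then T_total = 50 - (n_executions × 0.02) ms, which turns negative only once the method executes more than 2,500 times, so compiling it pays off only if it is at least that hot.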

Latency vs. Throughput
Tradeoff: startup speed vs. execution performance

                        Startup speed    Execution performance
  Interpreter           Best             Poor
  Quick compiler        Fair             Fair
  Optimizing compiler   Poor             Best

Multi-Stage Dynamic Compilation System
Stage 1: interpreted code
  when execution count = t1 (e.g. 2000), go to
Stage 2: compiled code
  when execution count = t2 (e.g. 25000), go to
Stage 3: fully optimized code
Execution count is the sum of method invocations and back edges executed.
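A minimal Java sketch (hypothetical names; thresholds taken from the example values above) of the staging decision driven by such an execution counter:

  enum Tier { INTERPRETED, COMPILED, OPTIMIZED }

  class TieringPolicy {
      static final long T1 = 2_000;    // promote interpreted code to compiled code
      static final long T2 = 25_000;   // promote compiled code to fully optimized code

      // executionCount = method invocations + back edges executed
      static Tier tierFor(long executionCount) {
          if (executionCount >= T2) return Tier.OPTIMIZED;
          if (executionCount >= T1) return Tier.COMPILED;
          return Tier.INTERPRETED;
      }
  }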

Granularity of Compilation
Compilation takes time proportional to the amount of code being compiled, and many optimizations are worse than linear in code size. Methods can be large, especially after inlining, yet cutting inlining back too far hurts performance considerably. Even hot methods typically contain some code that is rarely or never executed.

Example: SpecJVM db
Hot loop:

  void read_db(String fn) {
      int n = 0, act = 0, b;
      byte buffer[] = null;
      try {
          FileInputStream sif = new FileInputStream(fn);
          buffer = new byte[n];
          while ((b = sif.read(buffer, act, n - act)) > 0) {
              act = act + b;
          }
          sif.close();
          if (act != n) {
              /* lots of error handling code, rare */
          }
      } catch (IOException ioe) {
          /* lots of error handling code, rare */
      }
  }

Example: SpecJVM db
(Same method as above.) Lots of rare code! The error-handling paths essentially never run; only the read loop is hot.

Optimize hot regions, not methods
Optimize only the most frequently executed segments within a method. Simple technique: any basic block executed during Stage 2 is said to be hot. This has the beneficial secondary effect of improving optimization opportunities on the common paths.

Method-at-a-time Strategy (figure: percentage of basic blocks compiled as a function of execution threshold)

Actual Basic Blocks Executed (figure: percentage of basic blocks actually executed as a function of execution threshold)

Dynamic Code Transformations
Compiling partial methods
Partial dead code elimination
Escape analysis

IV. Partial Method Compilation
1. Based on profile data, determine the set of rare blocks. Use the code coverage information from the first compiled version.
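A minimal sketch (hypothetical representation, assuming per-block coverage bits gathered while the Stage 2 code ran) of step 1: any basic block that never executed in the first compiled version is marked rare.

  import java.util.BitSet;

  class RareBlockAnalysis {
      // coverage.get(i) is true if basic block i executed during Stage 2.
      static BitSet rareBlocks(BitSet coverage, int numBlocks) {
          BitSet rare = new BitSet(numBlocks);
          for (int i = 0; i < numBlocks; i++) {
              if (!coverage.get(i)) {
                  rare.set(i);   // never seen at run time: treat as rare
              }
          }
          return rare;
      }
  }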

Partial Method Compilation
2. Perform live variable analysis. Determine the set of live variables at each rare block entry point (e.g. live: x, y, z).

Partial Method Compilation
3. Redirect the control flow edges that targeted rare blocks so that they transfer to the interpreter, and remove the rare blocks.

Partial Method Compilation
4. Perform compilation normally. Analyses treat the interpreter transfer point as an unanalyzable method call.

Partial Method Compilation
5. Record a map for each interpreter transfer point. During code generation, generate a map that specifies the location, in registers or memory, of each of the live variables. Maps are typically < 100 bytes. Example:
  live: x, y, z
  x: sp - 4
  y: R1
  z: sp - 8
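A minimal Java sketch (hypothetical names; real maps are compact binary structures) of the per-transfer-point map from step 5: for each live variable it records where the compiled code keeps the value, so the interpreter can rebuild its frame when a rare path is taken.

  import java.util.LinkedHashMap;
  import java.util.Map;

  class TransferPointMap {
      // e.g. "x" -> "sp-4", "y" -> "R1", "z" -> "sp-8"
      private final Map<String, String> locations = new LinkedHashMap<>();

      void record(String variable, String location) {
          locations.put(variable, location);
      }

      String locationOf(String variable) {
          return locations.get(variable);
      }
  }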

V. Partial Dead Code Elimination
Move computation that is only live on a rare path into the rare block, saving computation in the common case.

Partial Dead Code Example

Before (x = 0 executes on every path, but x is used only on the rare paths):
  x = 0;
  if (rare branch 1) {
      ...
      z = x + y;
      ...
  }
  if (rare branch 2) {
      ...
      a = x + z;
      ...
  }

After (the assignment is sunk into the rare blocks, off the common path):
  if (rare branch 1) {
      x = 0;
      ...
      z = x + y;
      ...
  }
  if (rare branch 2) {
      x = 0;
      ...
      a = x + z;
      ...
  }

VI. Escape Analysis
Escape analysis finds objects that do not escape a method or a thread.
  Captured by method: can be allocated on the stack or in registers.
  Captured by thread: can avoid synchronization operations.
All Java objects are normally heap allocated, so this is a big win.

Escape Analysis
Stack allocate objects that don't escape in the common blocks.
Eliminate synchronization on objects that don't escape the common blocks.
If a branch to a rare block is taken:
  Copy stack-allocated objects to the heap and update pointers.
  Reapply eliminated synchronizations.
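A small illustrative Java example (made-up classes, not from the paper): p never escapes distance(), so after escape analysis a JIT may allocate it on the stack or replace it with scalars, and any synchronization on it could be removed.

  final class Point {
      final double x, y;
      Point(double x, double y) { this.x = x; this.y = y; }
  }

  class EscapeDemo {
      static double distance(double x1, double y1, double x2, double y2) {
          Point p = new Point(x2 - x1, y2 - y1);   // never stored, returned, or passed out
          return Math.sqrt(p.x * p.x + p.y * p.y); // eligible for stack allocation
      }
  }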

VII. Run Time Improvement
(Figure: run-time improvement across benchmarks.)
  First bar: original (whole-method optimization)
  Second bar: partial method compilation (PMC)
  Third bar: PMC + optimizations
  Bottom bar: execution time if the code had been compiled/optimized from the beginning