Running class Timing on Java HotSpot VM, 1

Similar documents
Compiler construction 2009

COMS W4115 Programming Languages and Translators Lecture 21: Code Optimization April 15, 2013

Just-In-Time Compilation

Compiling Techniques

Java Performance Tuning

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Compiler construction 2009

Just In Time Compilation

Translating JVM Code to MIPS Code 1 / 43

Comp 204: Computer Systems and Their Implementation. Lecture 22: Code Generation and Optimisation

CSE443 Compilers. Dr. Carl Alphonce 343 Davis Hall

High Performance Managed Languages. Martin Thompson

Code Generation. Frédéric Haziza Spring Department of Computer Systems Uppsala University

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

Taming the Java Virtual Machine. Li Haoyi, Chicago Scala Meetup, 19 Apr 2017

7. Optimization! Prof. O. Nierstrasz! Lecture notes by Marcus Denker!

CSc 453 Interpreters & Interpretation

A JVM Does What? Eva Andreasson Product Manager, Azul Systems

Garbage Collection. Akim D le, Etienne Renault, Roland Levillain. May 15, CCMP2 Garbage Collection May 15, / 35

JVM Memory Model and GC

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Lecture 15 Garbage Collection

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

CMSC 330: Organization of Programming Languages

An Introduction to Multicodes. Ben Stephenson Department of Computer Science University of Western Ontario

Compiler construction Compiler Construction Course aims. Why learn to write a compiler? Lecture 1

Just-In-Time Compilers & Runtime Optimizers

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

Garbage Collection. Hwansoo Han

CS153: Compilers Lecture 15: Local Optimization

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Exploiting the Behavior of Generational Garbage Collector

Course introduction. Advanced Compiler Construction Michel Schinz

Acknowledgements These slides are based on Kathryn McKinley s slides on garbage collection as well as E Christopher Lewis s slides

Use of profilers for studying Java dynamic optimizations

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Runtime. The optimized program is ready to run What sorts of facilities are available at runtime

Managed runtimes & garbage collection. CSE 6341 Some slides by Kathryn McKinley

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

Managed runtimes & garbage collection

Context Threading: A flexible and efficient dispatch technique for virtual machine interpreters

CMSC 330: Organization of Programming Languages

Java: framework overview and in-the-small features

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Intermediate Code & Local Optimizations. Lecture 20

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

CA341 - Comparative Programming Languages

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

High Performance Managed Languages. Martin Thompson

2011 Oracle Corporation and Affiliates. Do not re-distribute!

USC 227 Office hours: 3-4 Monday and Wednesday CS553 Lecture 1 Introduction 4

Usually, target code is semantically equivalent to source code, but not always!

Compact and Efficient Strings for Java

Java Performance: The Definitive Guide

Lecture Outline. Intermediate code Intermediate Code & Local Optimizations. Local optimizations. Lecture 14. Next time: global optimizations

CMSC 330: Organization of Programming Languages. Memory Management and Garbage Collection

Improving Java Code Performance. Make your Java/Dalvik VM happier

Debugging Your Production JVM TM Machine

Improving Java Performance

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Group B Assignment 8. Title of Assignment: Problem Definition: Code optimization using DAG Perquisite: Lex, Yacc, Compiler Construction

Tour of common optimizations

Intermediate Code & Local Optimizations

Design Issues. Subroutines and Control Abstraction. Subroutines and Control Abstraction. CSC 4101: Programming Languages 1. Textbook, Chapter 8

Towards High Performance Processing in Modern Java-based Control Systems. Marek Misiowiec Wojciech Buczak, Mark Buttner CERN ICalepcs 2011

Azul Systems, Inc.

Intermediate representation

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 16

CS577 Modern Language Processors. Spring 2018 Lecture Optimization

JVM Troubleshooting MOOC: Troubleshooting Memory Issues in Java Applications

Fundamentals of GC Tuning. Charlie Hunt JVM & Performance Junkie

High-Level Language VMs

Truffle A language implementation framework

The Fundamentals of JVM Tuning

New Java performance developments: compilation and garbage collection

Lecture 13: Garbage Collection

Group B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.

02 B The Java Virtual Machine

THE TROUBLE WITH MEMORY

Exercise 7 Bytecode Verification self-study exercise sheet

<Insert Picture Here> Symmetric multilanguage VM architecture: Running Java and JavaScript in Shared Environment on a Mobile Phone

Code optimization. Have we achieved optimal code? Impossible to answer! We make improvements to the code. Aim: faster code and/or less space

MODEL ANSWERS COMP36512, May 2016

HydraVM: Mohamed M. Saad Mohamed Mohamedin, and Binoy Ravindran. Hot Topics in Parallelism (HotPar '12), Berkeley, CA

Automatic Memory Management

Contents. 8-1 Copyright (c) N. Afshartous

Register Allocation. CS 502 Lecture 14 11/25/08

Operational Semantics of Cool

Habanero Extreme Scale Software Research Project

Java On Steroids: Sun s High-Performance Java Implementation. History

Compiler Optimization Techniques

One-Slide Summary. Lecture Outine. Automatic Memory Management #1. Why Automatic Memory Management? Garbage Collection.

Run-time Environments

Introduction to Code Optimization. Lecture 36: Local Optimization. Basic Blocks. Basic-Block Example

Coping with Immutable Data in a JVM for Embedded Real-Time Systems. Christoph Erhardt, Simon Kuhnle, Isabella Stilkerich, Wolfgang Schröder-Preikschat

Run-time Environments

Optimization Prof. James L. Frankel Harvard University

Transcription:

Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return s * x; Questions Why doesn t javac produce better code? How would you do to generate good code? Code generated by javac.method public static f(i)i.limit locals 3.limit stack 2 iconst_3 iconst_5 Measuring Java execution time Running class Timing on Java HotSpot VM, 1 public class Timing { public static void main (String [] args) { Server VM time time=6548 for (int n = 0; n < 100; n++) { long start = System.nanoTime(); sum(300); long stop = System.nanoTime(); System.out.println (n+": "+(stop-start)/1000); 40 n public static int sum (int n) { if (n <= 1) return 1; else return n + sum (n-1); Comment After 10000 method calls to sum using the interpreter, the VM decides to invest in optimising compilation of method sum. This reduces execution time of future calls of sum(300) by 90 %.

Running class Timing on Java HotSpot VM, 2 Java HotSpot VM Client VM 40 time time=315 Two versions Java HotSpot VM uses JIT compilation and comes in two versions. Server VM. Focuses on overall performance. Default on server-class machines. Client VM. Focuses on short startup time and small footprint. Default on smaller machines. Core VM Same; only compilers (from bytecode to machine code) different. Comment After 1500 method calls to sum using the interpreter, the VM decides to invest in (a less optimising) compilation. This gives same reduction of execution time. n Recently, major progress in making locking more efficient. Garbage collection strategies, heap sizes, etc can be tuned. A surprising (?) fact Java HotSpot VM is written in C++. A Jasmin example: fact The control flow graph.method public static fact(i)i.limit locals 3.limit stack 3 Label0: if_icmplt Label2 goto Label3 Label2: Label3: ifeq Label1 iinc 1 1 goto Label0 Label1: iinc 1 1 if_icmplt ifeq Comments Code split into basic blocks. Control flow visualised by edges. Optimization simpler within a basic block.

Java HotSpot client compiler JIT compilation General structure Front end. From bytecode to HIR (High level Intermediate). HIR is SSA-based, control-flow graph representation of bytecode. Some code optimizations: copy propagation, common subexpression elimination, constant folding, inlining. (We will discuss these algorithms after Easter). Method call optimization (static calls instead of dynamic). Back end. From HIR to LIR, then to machine code. Simple but good register allocation (after Easter). Peephole optimization (in a few slides). The 80/20 rule 80 % of the time is spent running 20 % of the code. (Some say that the correct version is the 90/10 rule.) Conclusion: Spend the cost of compilation on the hot code only. java javac "Compile time" bytecode Front end HotSpot VM Client compiler HIR "Runtime" Back end LIR machine code Challenges for Java JIT compilers Java HotSpot server compiler Long-running loops: Need to change from interpreted to compiled code during execution. These have different stack layout, so must change stack frame when changing to compiled code. Deoptimization: Class loading may invalidate compiling assumptions; e.g. some method call cannot be determined statically. Need to go back to interpretation. Back to old stack representation; e.g. must add stack frames for inlined methods. Basic features Adds many more optimizations (discussed after Easter). Another, SSA-based intermediate representation. Phases: parsing, machine-independent optimization, instruction selection, code motion, register allocation, peephole optimization, code emission. Further improvements Start with interpretation. When code deemed hot, perform client compilation. When red hot, perform server compilation for cruising speed.

Garbage collection for Java, 1 Garbage collection for Java, 2 HotSpot collection Heap divided into three generations: Young generation. Newly created objects. Old generation. Objects that have survived a number of collections are promoted here. Permanent generation. Internal, heap-allocated data strucures (not collected). Allocation uses separate allocation buffer per thread. Fast/slow paths. Young generation. Three areas: Eden, where objects are created, and two alternating survivor spaces. Whenever Eden is filled, stop-and-copy collection from Eden and active survivor space to other survivor space. Old generation. Default is mark-and-sweep, stop-the-world collector. Current trends Continued rapid progress. Parallel collectors, for shorter pause times and more efficient use of multiple processors, are becoming available. Major conclusion Garbage collectors in modern JVM:s manage memory more efficiently than you can do it explicitly. Some consequences for the Java programmer Preview of code optimization: A.f Don t use public instance variables; add set/get methods. There is no runtime overhead; these calls are inlined. Don t make classes final for performance. Client compiler will do this analysis for you. Don t try memory (de-)allocation yourself. Allocation is inlined and fast; your deallocation will not be better than GC. Don t use exception handling for control flow. Exception objects expensive; but no cost of exception handling when not used. Source code public static int f (int x) { int r = 3; int s = r + 5; return s * x; Resulting code.method public static f(i)i.limit locals 1.limit stack 2 iconst_3 ishl Observations r is initialized to 3 and never assigned to. Hence we can replace all uses by 3 and remove r. The expression 3 + 5 is computed by the compiler. s is also constant and can be replaced by 8. Multiplication by 8 is more efficiently done as left shift 3 positions. We need algorithms to do this, not hand-waving!

Peephole optimization A simple idea Look at small sequences of instructions to find possibilities for improvement. Can be iterated (fixpoint iteration), since one optimization may open new possibilities. Use a suitable list type for your code (i.e. one that allows for fast deletion, insertion and reordering). Easiest with pattern matching in Haskell. More peephole optimization examples Strength reduction Replace an expensive operation (left) by a cheaper one (right) bipush 16 iconst_4 ishl ldc2 w 2.0 dup2 dmul dadd Example (Constant folding) bipush 7 bipush 5 can be replaced by just bipush 12. Algebraic simplification iconst 0 iconst 0 pop Unreachable code Next week Java disallows unreachable code Unreachable (or dead) code, i.e. code that can never be executed, is disallowed in Java. However, to find all instances of dead code is an undecidable problem. The languange specification defines a conservative approximation. Peephole optimization can find some instances: Code after a goto and before next label, Code in a branch of an if or while statement with constant condition. Also other jump-related optimizations, like jumping to the next instruction. Lectures discuss LLVM and code generation. Problem session on Monday on JVM/Jasmin. Optimization techniques deferred to after Easter. Deadline for submission A on Friday 17.00.