Taming the Java Virtual Machine. Li Haoyi, Chicago Scala Meetup, 19 Apr 2017

Similar documents
Compiler construction 2009

JVM. What This Topic is About. Course Overview. Recap: Interpretive Compilers. Abstract Machines. Abstract Machines. Class Files and Class File Format

Running class Timing on Java HotSpot VM, 1

Improving Java Performance

JVML Instruction Set. How to get more than 256 local variables! Method Calls. Example. Method Calls

Course Overview. PART I: overview material. PART II: inside a compiler. PART III: conclusion

Improving Java Code Performance. Make your Java/Dalvik VM happier

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Run-time Program Management. Hwansoo Han

Compiling Techniques

Compiler construction 2009

Code Profiling. CSE260, Computer Science B: Honors Stony Brook University

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Under the Hood: The Java Virtual Machine. Problem: Too Many Platforms! Compiling for Different Platforms. Compiling for Different Platforms

Under the Hood: The Java Virtual Machine. Lecture 23 CS2110 Fall 2008

SOFTWARE ARCHITECTURE 7. JAVA VIRTUAL MACHINE

A JVM Does What? Eva Andreasson Product Manager, Azul Systems

JVM Memory Model and GC

Introduction to Java

Java Performance Tuning

The Java Virtual Machine. CSc 553. Principles of Compilation. 3 : The Java VM. Department of Computer Science University of Arizona

Garbage Collection. Hwansoo Han

CSC 4181 Handout : JVM

Space Exploration EECS /25

Mnemonics Type-Safe Bytecode Generation in Scala

Let s make some Marc R. Hoffmann Eclipse Summit Europe

Static Analysis of Dynamic Languages. Jennifer Strater

Code Generation Introduction

Java and C II. CSE 351 Spring Instructor: Ruth Anderson

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

CS2110 Fall 2011 Lecture 25. Under the Hood: The Java Virtual Machine, Part II

Fundamentals of GC Tuning. Charlie Hunt JVM & Performance Junkie

Introduction Basic elements of Java

An Introduction to Multicodes. Ben Stephenson Department of Computer Science University of Western Ontario

Part VII : Code Generation

NG2C: Pretenuring Garbage Collection with Dynamic Generations for HotSpot Big Data Applications

Exploiting the Behavior of Generational Garbage Collector

Dalvik VM Internals. Dan Bornstein Google

CS 231 Data Structures and Algorithms, Fall 2016

Plan for Today. Safe Programming Languages. What is a secure programming language?

Over-view. CSc Java programs. Java programs. Logging on, and logging o. Slides by Michael Weeks Copyright Unix basics. javac.

The Java Virtual Machine

Hardware-Supported Pointer Detection for common Garbage Collections

Compiling Faster, Compiling Better with Falcon. Iván

Programming Language Systems

The G1 GC in JDK 9. Erik Duveblad Senior Member of Technical Staf Oracle JVM GC Team October, 2017

Attila Szegedi, Software

Roadmap. Java: Assembly language: OS: Machine code: Computer system:

Crash Course in Java. Why Java? Java notes for C++ programmers. Network Programming in Java is very different than in C/C++

Just In Time Compilation

Program Dynamic Analysis. Overview

3/15/18. Overview. Program Dynamic Analysis. What is dynamic analysis? [3] Why dynamic analysis? Why dynamic analysis? [3]

COMP3131/9102: Programming Languages and Compilers

CS263: Runtime Systems Lecture: High-level language virtual machines. Part 1 of 2. Chandra Krintz UCSB Computer Science Department

invokedynamic IN 45 MINUTES!!! Wednesday, February 6, 13

<Insert Picture Here> Maxine: A JVM Written in Java

Java Performance Tuning and Optimization Student Guide

Recap: Printing Trees into Bytecodes

invokedynamic under the hood

High-Level Language VMs

Building a Compiler with. JoeQ. Outline of this lecture. Building a compiler: what pieces we need? AKA, how to solve Homework 2

Exercise 7 Bytecode Verification self-study exercise sheet

Android Internals and the Dalvik VM!

Living in the Matrix with Bytecode Manipulation

Sustainable Memory Use Allocation & (Implicit) Deallocation (mostly in Java)

Delft-Java Link Translation Buffer

The Java Virtual Machine

Compilation 2012 Code Generation

Compiling for Different Platforms. Problem: Too Many Platforms! Dream: Platform Independence. Java Platform 5/3/2011

H.-S. Oh, B.-J. Kim, H.-K. Choi, S.-M. Moon. School of Electrical Engineering and Computer Science Seoul National University, Korea

JAM 16: The Instruction Set & Sample Programs

Java Performance Tuning From A Garbage Collection Perspective. Nagendra Nagarajayya MDE

Delft-Java Dynamic Translation

The Fundamentals of JVM Tuning

Problem: Too Many Platforms!

Advanced Programming Introduction

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Java: framework overview and in-the-small features

Chapter 5. A Closer Look at Instruction Set Architectures. Chapter 5 Objectives. 5.1 Introduction. 5.2 Instruction Formats

Java Without the Jitter

Chapter 5. A Closer Look at Instruction Set Architectures

Copyright 2012, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy Classification from Slide 16

CSE 431S Final Review. Washington University Spring 2013

BASICS.

: Primitive data types Variables Operators if, if-else do-while, while, for. // // First Java Program. public class Hello {

Scala: Byte-code Fancypants. David Pollak JVM Language Summit 2009

Final Exam. 12 December 2018, 120 minutes, 26 questions, 100 points

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Habanero Extreme Scale Software Research Project

Certified Core Java Developer VS-1036

YETI. GraduallY Extensible Trace Interpreter VEE Mathew Zaleski, Angela Demke Brown (University of Toronto) Kevin Stoodley (IBM Toronto)

CS251L REVIEW Derek Trumbo UNM

Java TM. Multi-Dispatch in the. Virtual Machine: Design and Implementation. Computing Science University of Saskatchewan

Translating JVM Code to MIPS Code 1 / 43

Java byte code verification

Reference Counting. Reference counting: a way to know whether a record has other users

Java & Coherence Simon Cook - Sales Consultant, FMW for Financial Services

Configuring the Heap and Garbage Collector for Real- Time Programming.

02 B The Java Virtual Machine

Transcription:

Taming the Java Virtual Machine Li Haoyi, Chicago Scala Meetup, 19 Apr 2017

Who Am I? Previously: Dropbox Engineering Currently: Bright Technology Services - Data Science, Scala consultancy Fluent Code Explorer, www.fluentcode.com Early contributor to Scala.js, author of Ammonite, FastParse, Scalatags,... Lots of work in Java, Scala, Python, JS,...

The Fluent Code Explorer High-performance code-explorer - Fast, as-you-type search Scalable to repos of tens millions of lines of code Single Process - No distributed system Runs on a single VM

Taming the Java Virtual Machine class Simple{ public static void main(string args[]){ String s = "Hello Java"; int i = 123; System.out.println(s + 123); } }

Taming the Java Virtual Machine class Simple{ public static void main(string args[]){ // How much memory // does this take? // How much memory int i = 123; // does this take? // What happens if I System.out.println(s + 123); // run out of memory? String s = "Hello Java"; } } // What is this JIT Compiler // I keep hearing about? // What s really // happening here?

Implementation Defined?

Taming the Java Virtual Machine Memory Layouts Garbage Collection Compilation

Taming the Java Virtual Machine Memory Layouts - OutOfMemoryError Garbage Collection - Long pauses Compilation - Mysterious performance issues

Taming the Java Virtual Machine Memory Layouts Garbage Collection Compilation

Memory Layouts

Memory Layouts Everything is great if you have enough Everything is terrible if you don t have enough Technically implementation-defined - in practice most people are using OpenJDK/OracleJDK

Memory Demo

Memory Layouts Data type Bytes boolean 1 byte 1 short 2 int 4 long 8 float 4 double 8

Memory Layouts Data type Bytes Data type Bytes boolean 1 Boolean 4 byte 1 Byte 4 short 2 Short 4 + 16 int 4 Int 4 + 16 long 8 Long 4 + 24 float 4 Float 4 + 16 double 8 Double 4 + 24

Memory Layouts Data type Bytes Data type Bytes Data type boolean 1 Boolean 4 byte 1 Byte 4 short 2 Short 4 + 16 int 4 Int 4 + 16 long 8 Long 4 + 24 float 4 Float 4 + 16 double 8 Double 4 + 24 Bytes Array 4 + 16, rounded to next 8 Object 4 + 12 rounded to next 8

Tips for reducing memory usage Use Arrays when dealing with large lists of primitives, instead of java.util.* Use BitSets instead of large Arrays of booleans Use a library to provide unboxed collections (sets, maps, etc.) of primitives: - FastUtil: http://fastutil.di.unimi.it/ Koloboke Collections: https://github.com/leventov/koloboke Eclipse Collections: https://www.eclipse.org/collections/

Koloboke Collections Map<Integer, Integer> map = new HashMap<>(); - 1,000,000 items, 72.3mb Map<Integer, Integer> map = HashIntIntMaps.newMutableMap(); - 1,000,000 items, 16.7mb

Build your own Specialized Collections class Aggregator[@specialized(Int, Long) T: ClassTag](initialSize: Int = 1) { // Can't be `private` because it makes `@specialized` behave badly protected[this] var data = new Array[T](initialSize) protected[this] var length0 = 0 def length = length0 def apply(i: Int) = data(i) def append(i: T) = { if (length >= data.length) { // Grow by 3/2 + 1 each time, same as java.util.arraylist. Saves a bit // of memory over doubling each time, at the cost of more frequent // doublings, val newdata = new Array[T](data.length * 3 / 2 + 1) System.arraycopy(data, 0, newdata, 0, length) data = newdata } data(length) = i length0 += 1 }

Taming the Java Virtual Machine Memory Layouts Garbage Collection Compilation

Garbage Collection Easy to take for granted In theory invisible to the logic of your application In practice can have huge impact on its runtime characteristics

Garbage Collection Demo workingset A B C D E F G H I J K L M N O P Q R S T

Garbage Collection Demo workingset U V W X Y Z A B C J K L M N O P Q R F garbagerate A C B E D H G I S T

Garbage Collection Demo workingset U V W X Y Z A B C D E F G H I N J L garbagerate K O M P Q R J K L S T

Parallel GC (Default)

Parallel GC: garbage doesn t affect pause times Live Set\Garbage Rate 1,600 6,400 25,600 100,000 17ms 17ms 20ms 200,000 30ms 31ms 30ms 400,000 362ms 355ms 356ms 800,000 757ms 677ms 663ms 1,600,000 1651ms 1879ms 1627ms

Parallel GC - Pause times (mostly) proportional to working set - Garbage load doesn t matter! ~1 millisecond per megabyte of working set - Frequency of pauses proportional to garbage load, inversely proportional to total memory - Will use as much heap as it can! - Even if it doesn t need all of it

Creating less garbage doesn t reduce pause times! Working Set Garbage Rate Maximum Pause 400,000 200 345ms 400,000 800 351ms 400,000 3,200 389ms 400,000 12,800 388ms

Concurrent Mark & Sweep

Concurrent Mark & Sweep Live Set\Garbage Rate 1,600 6,400 25,600 100,000 26ms 31ms 34ms 200,000 33ms 37ms 43ms 400,000 43ms 61ms 91ms 800,000 44ms *281ms 720ms 1,600,000 1311ms 1405ms 1403ms

Concurrent Mark & Sweep - Lower throughput than the Parallel GC - Pause times around 30-50ms - Doesn t grow the heap unnecessarily - % of time spent collecting not dependent on heap size

CMS: less garbage *does* reduce pause times Working Set Garbage Rate Maximum Pause 400,000 200 51ms 400,000 800 41ms 400,000 3,200 44ms 400,000 12,800 105ms

G1 Garbage First

G1 Garbage First Live Set\Garbage Rate 1,600 6,400 25,600 100,000 21ms 15ms 18ms 200,000 29ms 30ms 32ms 400,000 43ms 45ms 48ms 800,000 29ms 842ms 757ms 1,600,000 1564ms 1324ms 1374ms

G1 Garbage First - Basically a better version of CMS GC - Better support for larger heaps, more throughput - Might become the default in Java 9

GC Comparisons CMS 1,600 6,400 25,600 100,000 26ms 31ms 34ms 200,000 33ms 37ms 43ms 400,000 43ms 61ms 91ms 800,000 44ms *281ms 720ms 1,600,000 1311ms 1405ms 1403ms Parallel 1,600 6,400 25,600 G1 1,600 6,400 25,600 100,000 17ms 17ms 20ms 100,000 21ms 15ms 18ms 200,000 30ms 31ms 30ms 200,000 29ms 30ms 32ms 400,000 362ms 355ms 356ms 400,000 43ms 45ms 48ms 800,000 757ms 677ms 663ms 800,000 29ms 842ms 757ms 1,600,000 1651ms 1879ms 1627ms 1,600,000 1564ms 1324ms 1374ms

The Generational Hypothesis - Most objects are either very-short-lived or very-long-lived - Most GCs optimize for these cases - If your code matches this profile, the GC is a lot happier

The Generational Hypothesis For the HotSpot Java VM, the memory pools for serial garbage collection are the following. - Eden Space: The pool from which memory is initially allocated for most objects. Survivor Space: The pool containing objects that have survived the garbage collection of the Eden space. Tenured Generation: The pool containing objects that have existed for some time in the survivor space. http://stackoverflow.com/questions/2129044/java-heap-terminology-young-old-and -permanent-generations

Generation Garbage Collection workingset A B C D E F G H I J K L M N O P Q R S T

Generation Garbage Collection workingset U V W X Y Z A B C J K L M N O P Q R F garbagerate A C B E D H G I S T

Generation Garbage Collection workingset D E F G H I J K L J K L M N O P Q R A garbagerate U X W Z Y C B D S T

GC Demo

Parallel GC, Generational

Parallel GC, Generational - Non-generational workload: 500ms pauses - Generational workload: 2ms pauses

Garbage Collection Takeaways The default GC will use up all your memory and will result in pauses. Creating less garbage reduces frequency of GC pauses, but not their length To reduce the length of GC pauses - Reduce the size of the working set Try to ensure garbage is short-lived Worth trying out different garbage collectors if you are seeing unwanted pauses

Taming the Java Virtual Machine Memory Layouts Garbage Collection Compilation

Compilation Javac compiles Java to Byte Code, at compile time JVM JIT compiles Byte Code to Assembly, at runtime

Bytecode

Bytecode public static void init(){ previous = Runtime.getRuntime().totalMemory() Runtime.getRuntime().freeMemory(); }

Bytecode public static void init(); Code: 0: invokestatic #2 // Method java/lang/runtime.getruntime:()ljava/lang/runtime; 3: invokevirtual #3 // Method java/lang/runtime.totalmemory:()j 6: invokestatic #2 // Method java/lang/runtime.getruntime:()ljava/lang/runtime; 9: invokevirtual #4 // Method java/lang/runtime.freememory:()j 12: lsub 13: putstatic 16: return #5 // Field previous:j

String Construction String s1 = "" + input String s2 = String.valueOf(input);

Stringify Demo

String Construction String s1 = "" + input String s2 = String.valueOf(input); 32: invokestatic 11: new #7 // class StringBuilder 14: dup 15: invokespecial #8 // StringBuilder."<init>":()V 18: ldc // String #9 20: invokevirtual #10 // StringBuilder.append:(LString;)LStringBuilder; 23: iload_1 24: invokevirtual #11 // StringBuilder.append:(I)LStringBuilder; 27: invokevirtual #12 // StringBuilder.toString:()LString; #13 // String.valueOf:(I)String;

Switch Demo

Integer Switch switch((int)i){ 13: lookupswitch case 0: 0: 40 println("hello"); 1: 48 break; case 1: println("world"); } { // 2 default: 53 } 40: ldc #7 // String Hello 42: invokestatic #8 // println:(ljava/lang/string;)v 45: goto 53 48: ldc #9 // String World 50: invokestatic #8 // println:(ljava/lang/string;)v

String Switch switch((string)s){ case "0": 74: invokevirtual #11 77: lookupswitch 49: 119 default: 131 break; } { // 2 48: 104 println("hello S"); case "1": // Method java/lang/string.hashcode:()i } 104: aload_3 println("world S"); 105: ldc break; 107: invokevirtual #13 #12 // String 0 // Method java/lang/string.equals:(ljava/lang/object;)z 110: ifeq 131 113: iconst_0 114: istore 4 116: goto 131 119: aload_3 120: ldc #14 122: invokevirtual #13 // String 1 // Method java/lang/string.equals:(ljava/lang/object;)z 125: ifeq 131

String Switch 74: invokevirtual #11 77: lookupswitch // String.hashCode:()I { // 2 122: invokevirtual #13 125: ifeq 128: iconst_1 49: 119 129: istore 4 131: iload 4 133: lookupswitch { // 2 } 104: aload_3 0: 160 #12 107: invokevirtual #13 110: ifeq 131 48: 104 default: 131 105: ldc // String.equals:(Object;)Z // String 0 1: 168 // String.equals:(Object;)Z 131 default: 173 } 113: iconst_0 160: ldc #15 // String Hello S // Method println:(string;)v 114: istore 4 162: invokestatic #8 116: goto 131 165: goto 173 168: ldc #16 // String World S 170: invokestatic #8 // Method println:(string;)v 119: aload_3 120: ldc #14 // String 1

Why Read Bytecode? Understand what your code compiles to - Understanding performance characteristics Debugging frameworks that muck with bytecode - AspectJ Javassist Working with non-java languages (Scala, Clojure, Groovy, ) - These all speak Bytecode

Assembly The JIT compiler is not a black box You can see the actual assembly that gets run https://www.ashishpaliwal.com/blog/2013/05/jvm-how-to-see-assembly-code-for-y our-java-program/

Assembly for(int i = 0; i < count; i += 1){ items[i] = new int[2]; } 0x00000001121d2c52: mov %rbx,0x8(%rax,%rsi,8) 0x00000001121d2c57: dec %rsi 0x00000001121d2c5a: jne 0x00000001121d2c52 ;*newarray ; - Memory::main@22 (line 22)

JIT Demo

Why Read Assembly? Next level of Truth underneath the bytecode What is actually getting run on my processor? java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly

Polymorphism interface Hello{ for(int j = 0; j < 100; j++){ int get(); for(int i = 0; i < count; i++){ } eventotal += input[i].get(); class HelloOne implements Hello{ public int get(){ return 1; } } class HelloTwo implements Hello{ public int get(){ return 2; } } } }

Polymorphism Demo

Polymorphism interface Hello{ for(int j = 0; j < 100; j++){ int get(); for(int i = 0; i < count; i++){ } eventotal += input[i].get(); class HelloOne implements Hello{ public int get(){ } } return 1; # of subclasses Time Taken 1 1595ms public int get(){ 2 2234ms return 2; 3 4533ms 4 4460ms } } class HelloTwo implements Hello{ } }

Polymorphism for(int j = 0; j < 100; j++){ for(int i = 0; i < count; i++){ eventotal += input[i].get(); } }

Polymorphism: Bytecode for(int j = 0; j < 100; j++){ for(int i = 0; i < count; i++){ eventotal += input[i].get(); } } 166: iload 13 168: iload 4 170: if_icmpge 195 173: lload 10 175: aload 5 177: iload 13 179: aaload 180: invokeinterface #24, 185: i2l 186: ladd 187: lstore 10 189: iinc 13, 1 192: goto 166 195: iinc 12, 1 198: goto 156 1 // Hello.get:()I

Polymorphism: 2 subclasses 0x000000010b2859a0: mov 0x8(%r12,%r9,8),%r11d ;*invokeinterface get ; -Polymorphism::main@157 (line 49) ; implicit exception: dispatches to 0x000000010b285b2f 0x000000010b2859a5: movslq %r10d,%r10 0x000000010b2859a8: add %r14,%r10 ;*ladd ; -Polymorphism::main@163 (line 49) 0x000000010b2859ab: cmp $0xf800c0bc,%r11d 0x000000010b2859b2: je 0x000000010b285960 0x000000010b2859b4: cmp $0xf800c105,%r11d 0x000000010b2859bb: jne 0x000000010b285a32 ; {metadata('helloone')} ; {metadata('hellotwo')}

Polymorphism: 3 subclasses 0x00000001095841eb: nop 0x00000001095841ec: nop 0x00000001095841ed: movabs $0xffffffffffffffff,%rax 0x00000001095841f7: callq 0x00000001094b0220 ; OopMap{[248]=Oop off=4732} ;*invokeinterface get ; - Polymorphism::main@180 (line 49) ; 0x00000001095841fc: movslq %eax,%rax 0x00000001095841ff: mov 0xe8(%rsp),%rdx 0x0000000109584207: add %rdx,%rax 0x000000010958420a: mov 0xf0(%rsp),%ecx {virtual_call}

Compilation Takeaways - Dumb code runs faster - Even if it compiles to the exact same bytecode!

Taming the Java Virtual Machine

Taming the Java Virtual Machine Memory Layouts Garbage Collection Compilation

Taming the Java Virtual Machine Memory Layouts - OutOfMemoryError Garbage Collection - Long pauses Compilation - Mysterious performance issues

Taming the Java Virtual Machine class Simple{ public static void main(string args[]){ String s = "Hello Java"; int i = 123; System.out.println(s + 123); } }

Taming the Java Virtual Machine Li Haoyi, Chicago Scala Meetup, 19 Apr 2017