Performance Profiling. Curtin University of Technology Department of Computing


Objectives
- To develop a strategy to characterise the performance of Java applications:
  - benchmark to compare algorithm choices
  - profile to identify bottlenecks
  - tune problem areas
- Introduce:
  - JVM internals
  - JVM profiling hooks
  - snapshots of the heap and Java stack

Under the hood in the JVM...
[Diagram. Components shown: byte code stream; dynamic class loader and verifier; class and method area; garbage collected heap; support code (exceptions, threads, security, etc.); native methods; native method linker; native method area; execution engine; operating system.]

Why byte code? Vendor neutral and device independent. Portable to most computing environments. Does not assume an underlying architecture, register file, or computing environment. One binary file runs on every computer regardless of the CPU family. Author once for many platforms!

Factors affecting performance
- Java byte code is interpreted (unless the JVM compiles it at run time with a JIT)
- Memory management and garbage collection
- Input and output
- Algorithm choice

Big O() Curves and Performance laugh.learn.love.live

StopWatch Class
Constructors
- public StopWatch(): default constructor used to create a StopWatch instance.
Methods
- public void start(): start the StopWatch by recording the current system time at the beginning of the timed period.
- public void stop(): stop the StopWatch by recording the current system time at the end of the timed period. The time is invalidated if the StopWatch was not previously started using the start() method.

More StopWatch methods...
- public long elapsedmillis(): calculate the elapsed time by subtracting the start time from the stop time. Return 0 if the StopWatch has not previously been started.
- public void reset(): reset the start and stop times to the initial state.
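The API described above could be implemented roughly as follows. This is a minimal sketch, not the lecture's actual StopWatch source: it assumes the "system time" comes from System.currentTimeMillis(), and it assumes that an "invalidated" time simply makes elapsedmillis() return 0. The UNSET marker is a hypothetical detail.

```java
// Hypothetical sketch of the StopWatch API described in the slides.
class StopWatch {
    private static final long UNSET = -1L;   // assumption: marker for "no time recorded"
    private long startTime = UNSET;
    private long stopTime = UNSET;

    /** Record the current system time at the beginning of the timed period. */
    public void start() {
        startTime = System.currentTimeMillis();
        stopTime = UNSET;
    }

    /** Record the current system time at the end of the timed period. */
    public void stop() {
        // Assumption: stopping a watch that was never started leaves the time invalid.
        stopTime = (startTime == UNSET) ? UNSET : System.currentTimeMillis();
    }

    /** Elapsed milliseconds between start() and stop(), or 0 if no valid interval exists. */
    public long elapsedmillis() {
        if (startTime == UNSET || stopTime == UNSET) {
            return 0L;
        }
        return stopTime - startTime;
    }

    /** Return the start and stop times to their initial state. */
    public void reset() {
        startTime = UNSET;
        stopTime = UNSET;
    }
}
```

System.currentTimeMillis() has coarse resolution on some platforms, so very short timed regions may report 0 ms; System.nanoTime() is a common alternative for short intervals.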

Sort Experiment code fragment
    // Time how long it takes to do the BubbleSort
    StopWatch stopwatch = new StopWatch();
    stopwatch.start();
    bubblesort.bubblesort();
    stopwatch.stop();
    long bubbletime = stopwatch.elapsedmillis();

    // Time how long it takes to do the MergeSort
    stopwatch.reset();
    stopwatch.start();
    mergesort.mergesort();
    stopwatch.stop();
    long mergetime = stopwatch.elapsedmillis();

CSVWriter Constructors
- public CSVWriter(): default constructor. Uses a FileDialog panel to allow the user to specify the filename and location of the CSV file.
- public CSVWriter( String directory, String filename ): constructor that allows the programmer to specify the file name and location of the CSV file.
- public CSVWriter( String filename ): constructor that allows the programmer to specify the CSV file name. The default directory location is assumed.

CSVWriter methods
- public void header( String header ): write a header to the CSV file.
- public void nextfield( String field ), public void nextfield( long field ), public void nextfield( int field ), public void nextfield( float field )...: write the next field to the CSV file. A comma is written before the field if it is not the first field on a given line.
- public void eol(): end the current line. A comma will not be written before the next field written to the file.
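The comma handling described above can be sketched as follows. This is an illustration under stated assumptions, not the lecture's CSVWriter: the class name SimpleCsvWriter is hypothetical, it writes to any java.io.Writer instead of opening a file or a FileDialog, and only the String and long overloads of nextfield are shown. The numbers in main are placeholder values, not measurements.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical stand-in for the CSVWriter described in the slides.
class SimpleCsvWriter {
    private final PrintWriter out;
    private boolean firstField = true;   // true until a field is written on the current line

    SimpleCsvWriter(Writer target) {
        out = new PrintWriter(target);
    }

    /** Write a header line. */
    public void header(String header) {
        out.print(header);
        eol();
    }

    /** Write the next field, preceded by a comma unless it starts the line. */
    public void nextfield(String field) {
        if (!firstField) {
            out.print(',');
        }
        out.print(field);
        firstField = false;
    }

    public void nextfield(long field) {
        nextfield(Long.toString(field));
    }

    /** End the current line; the next field written will not get a leading comma. */
    public void eol() {
        out.print('\n');
        out.flush();
        firstField = true;
    }
}

class SimpleCsvWriterDemo {
    public static void main(String[] args) {
        StringWriter buffer = new StringWriter();
        SimpleCsvWriter csv = new SimpleCsvWriter(buffer);
        csv.header("size,bubbletime,mergetime");
        csv.nextfield(1000);   // placeholder values, not real timings
        csv.nextfield(42L);
        csv.nextfield(7L);
        csv.eol();
        System.out.print(buffer);
    }
}
```

Tracking a single "first field" flag keeps the caller free of separator logic, which is why the reporting code only needs nextfield() and eol().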

Reporting Sort results...
    // report results
    if (csv != null) {
        csv.nextfield(size);
        csv.nextfield(bubbletime);
        csv.nextfield(mergetime);
        csv.eol();
    }
    ...

Launching SortTest Disable asynchronous garbage collection if you force System.gc() outside of timing loops. Disable JIT. See your JVM help string for information on how to do this.

Sort Experiment Results [figure]

Experimental Results: BubbleSort [figure]

Experiment Results: MergeSort [figure]

StopWatch Considerations...
- Collect the following data:
  - Best case
  - Worst case
  - Average case
- Consider using synchronous garbage collection outside of the timing loop.
- Consider turning off JIT or having an untimed initial run.
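One way to apply these considerations is sketched below, under assumptions: TimingHarness and measure are hypothetical names, System.nanoTime() is used instead of the StopWatch class for finer resolution, and the three figures reported are the fastest, slowest and mean run. Best-case and worst-case inputs (for example already sorted or reverse sorted data) would be passed in as separate tasks.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical harness: untimed warm-up run, GC requested outside the timed region.
class TimingHarness {

    /** Runs the task `runs` times and returns {fastest, slowest, mean} in nanoseconds. */
    static long[] measure(Runnable task, int runs) {
        task.run();                           // untimed initial run so JIT warm-up is not measured
        long best = Long.MAX_VALUE;
        long worst = 0L;
        long total = 0L;
        for (int i = 0; i < runs; i++) {
            System.gc();                      // request a collection before, not during, the timed region
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            best = Math.min(best, elapsed);
            worst = Math.max(worst, elapsed);
            total += elapsed;
        }
        return new long[] { best, worst, total / runs };
    }

    public static void main(String[] args) {
        int[] data = new Random(1L).ints(10_000).toArray();
        // Sort a fresh copy each run so every run sees the same unsorted input.
        long[] result = measure(() -> Arrays.sort(data.clone()), 20);
        System.out.println("fastest=" + result[0] + "ns slowest=" + result[1]
                + "ns mean=" + result[2] + "ns");
    }
}
```

System.gc() is only a request to the JVM, so this reduces rather than eliminates the chance of a collection landing inside a timed run.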

StopWatch pros and cons
StopWatch pros:
- Simple to implement a benchmark.
- Measures total elapsed time.
- Good for comparing two versions of the same algorithm.
StopWatch cons:
- Doesn't identify code bottlenecks.
- Requires timing code to be embedded in the code under test.
- Captures run-time or JVM issues like JIT in addition to code issues.

Profiling
Profiling your code lists the following:
- The methods called the most often.
- Percentage of CPU time used by each method OR relative method frequency count.
- Methods calling the most time-consuming methods.
Profiling goals:
- Make frequently used methods faster.
- Call slow methods less often.

More profiling thoughts...
- Hooks are built into the JVM.
- No need to embed timing code.
- Shows where time was spent while executing code.
- Reports results on a method-by-method basis.
- JVMs on some operating systems work better than on others.
- Caution: the act of taking measurements impacts overall time.

HelloWorld Source Code
    // Java version of the Hello World program
    public class HelloWorld {
        public static void main(String args[]) {
            System.out.println( "Hello World!!!" );
        }
    }
Compile it:
    DAM@hammer.cs.curtin.edu.au> javac HelloWorld.java
Profile it:
    DAM@hammer.cs.curtin.edu.au> java -prof:profilingfile.txt HelloWorld
    Hello World!!!

Non-standard features
    DAM@hammer.cs.curtin.edu.au> java -X
    -Xmixed            mixed mode execution (default)
    -Xint              interpreted mode execution only
    -Xbootclasspath:<directories and zip/jar files separated by :>
                       set search path for bootstrap classes and resources
    -Xbootclasspath/a:<directories and zip/jar files separated by :>
                       append to end of bootstrap class path
    -Xbootclasspath/p:<directories and zip/jar files separated by :>
                       prepend in front of bootstrap class path
    -Xnoclassgc        disable class garbage collection
    -Xloggc:<file>     log GC status to a file with time stamps
    -Xbatch            disable background compilation
    -Xms<size>         set initial Java heap size
    -Xmx<size>         set maximum Java heap size
    -Xss<size>         set java thread stack size
    -Xprof             output cpu profiling data
    -Xfuture           enable strictest checks, anticipating future default
    -Xrs               reduce use of OS signals by Java/VM (see documentation)
    -Xdock:name=<application name>
                       override default application name displayed in dock
    -Xdock:icon=<path to icon file>
                       override default icon displayed in dock
    -Xcheck:jni        perform additional checks for JNI functions
    -Xshare:off        do not attempt to use shared class data
    -Xshare:auto       use shared class data if possible (default)
    -Xshare:on         require using shared class data, otherwise fail.

Java: -Xrunhprof:help [screenshot]

JAVA PROFILE 1.0.1
    DAM@hammer.cs.curtin.edu.au> java -Xrunhprof:cpu=samples,thread=y HelloWorld

School of Computing Resources
    Location   OS             CPU                 Clock            RAM
    Lab 217    Fedora Linux   Intel Core 2 Duo    2.6 or 2.8 GHz   500 MB - 1 GB
    Lab 218    Fedora Linux   Intel Core 2 Duo    2.6 or 2.8 GHz   500 MB - 1 GB
    Lab 219    Fedora Linux   Intel Core 2 Duo    2.6 or 2.8 GHz   500 MB - 1 GB
    Lab 220    Fedora Linux   Intel Core 2 Duo    2.6 or 2.8 GHz   500 MB - 1 GB
    Lab 221    Fedora Linux   Intel Core 2 Duo    2.6 or 2.8 GHz   500 MB - 1 GB
Specifications are subject to change without notice (these may be outdated).

CoinToss: An example
Large code bases are difficult to inspect line by line when trying to identify problems. Instead, run a profiling tool to determine where the code is spending the bulk of its time. Then go to the code to see if you can work out why it's spending so much time in that location. We will look at a simple example...

CoinToss Description
- Time how long it takes to toss a coin 10, 100, 1000, 10000, and 100000 times.
- Take the median of 100 iterations.
- Report results to a CSV file.
- Assume there's a lot of code (there's not).
- Review relevant documentation. Be aware that it may only contain public methods.

CoinToss Constructors
- public Coin(): default constructor that creates a new coin that can be tossed. It uses the current date as a seed to the random number generator used in tossing the coin.
- public Coin( long seed ): creates a new coin. The programmer specifies the seed that will be used to create the random number generator used to toss the coin.

CoinToss Methods
- public void toss(): simulates a coin toss using a random number generator.
- public boolean istails(): determines the state of the coin. This method returns true if "Tails" is showing. False is returned if "Heads" is showing.
- public boolean isheads(): determines the state of the coin. This method returns true if "Heads" is showing. False is returned if "Tails" is showing.
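A coin with this interface might look like the sketch below. It is not the lecture's Coin class: the name CoinSketch is hypothetical, the rule that heads means an odd random number is taken from the lecture's later description, and tossing once in the constructor so the coin always has a state is an assumption.

```java
import java.util.Random;

// Hypothetical sketch of the Coin API described in the slides.
class CoinSketch {
    private final Random random;
    private int lastValue;   // value drawn by the most recent toss

    /** Seeds the generator from the current time. */
    CoinSketch() {
        this(System.currentTimeMillis());
    }

    /** Seeds the generator explicitly, which makes toss sequences repeatable. */
    CoinSketch(long seed) {
        random = new Random(seed);
        toss();   // assumption: give the coin an initial state
    }

    public void toss() {
        lastValue = random.nextInt();
    }

    /** Heads when the last random number is odd. */
    public boolean isheads() {
        return lastValue % 2 != 0;
    }

    public boolean istails() {
        return !isheads();
    }

    public static void main(String[] args) {
        CoinSketch coin = new CoinSketch(2024L);
        coin.toss();
        System.out.println(coin.isheads() ? "Heads" : "Tails");
    }
}
```

The seeded constructor matters for benchmarking: two timed runs with the same seed toss the same sequence, so differences in time come from the code and not from the data.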

Java 1.x profile analysis in a spreadsheet [figure]

Now we know where to look... Version 1
Version 1 code: inspect the code and develop a hypothesis explaining why so much time is spent in isheads() and isodd().
- The coin is heads when the random number generator returns an odd number.
- isodd() determines whether a number is even or odd by raising -1 to the power of that number and testing the sign of the result.
- It uses Math.pow( double, double ), which uses floating point arithmetic.

Now we know where to look... Version 2
Version 2 code:
- Masks off all but the low-order bit and checks its value.
- Only uses a simple bitwise AND operation.
- Does not involve floating point arithmetic or expensive type casts.
- Easy to make the change by overriding the offending function.
Measurement and simple detective work can yield impressive results!
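The two approaches can be compared side by side. The sketch below is reconstructed from the descriptions in the slides, not copied from the lecture's Coin source; OddCheck, isOddVersion1 and isOddVersion2 are hypothetical names.

```java
// Hypothetical reconstruction of the two odd/even tests described in the slides.
class OddCheck {

    /** Version 1: raise -1 to the power n and test the sign (floating point). */
    static boolean isOddVersion1(int n) {
        // The int argument is converted to double; (-1)^n is -1.0 for odd n, 1.0 for even n.
        return Math.pow(-1.0, n) < 0.0;
    }

    /** Version 2: mask off all but the low-order bit (integer only). */
    static boolean isOddVersion2(int n) {
        return (n & 1) != 0;
    }

    public static void main(String[] args) {
        for (int n = -3; n <= 3; n++) {
            System.out.println(n + ": version1=" + isOddVersion1(n)
                    + " version2=" + isOddVersion2(n));
        }
    }
}
```

Both versions give the same answer for every int, including negative values, so swapping one for the other changes only the cost of the call.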

A real-world example?
In a trivial example like this, you'd probably come to the same conclusion without the profiler. The code is short, so doing a code walk-through isn't a big deal. However, if this program contained millions of lines of code, a profiler would allow you to locate the problem areas quickly and methodically!

Mean Toss Time Version 1 and 2 [figure]

How does a profiler work?
- Runs in a separate thread.
- Wakes up at periodic intervals and inspects the run-time stack.
- Counts the number of times the stack frame for a given method is on the top of the stack.
- The frame on the top of the stack (TOS) is for the method that's currently running.
- The frame immediately underneath any given frame corresponds to its caller.
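The sampling idea can be imitated in plain Java with Thread.getStackTrace(), whose first array element is the frame on top of the stack. The sketch below is an illustration only, not how hprof or the JVM's profiling hooks are implemented; SamplingSketch and its method names are hypothetical, and the worker loop is a made-up workload.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampling profiler: counts which method is on top of a thread's stack.
class SamplingSketch {
    private final Map<String, Integer> hits = new HashMap<>();

    /** Counts one sample for the method whose frame is on top of the sampled stack. */
    void record(StackTraceElement[] stack) {
        if (stack.length == 0) {
            return;                               // no frames available for this sample
        }
        StackTraceElement top = stack[0];         // element 0 is the top of the stack
        hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
    }

    Map<String, Integer> hits() {
        return hits;
    }

    /** Wakes up every intervalMillis and samples the target until it terminates. */
    void sample(Thread target, long intervalMillis) throws InterruptedException {
        while (target.isAlive()) {
            record(target.getStackTrace());
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> {
            double sum = 0.0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += Math.sqrt(i);
            }
            System.out.println("sum=" + sum);
        });
        worker.start();
        SamplingSketch profiler = new SamplingSketch();
        profiler.sample(worker, 5);               // the sampler runs on the main thread
        System.out.println(profiler.hits());
    }
}
```

A shorter interval gives more samples but also perturbs the measured thread more, which is the measurement overhead the lecture warns about.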

Stack implements method calls A stack frame is pushed on the stack for each copy of a called method. The stack frame contains local variables, parameters and other data including the PC. The top stack frame corresponds to the method currently executing. When that method returns, its stack frame is popped off the stack.

Pushing and popping stack frames
[Diagram. A stack holding Frame 1, Frame 2 and Frame 3; each frame contains parameters, local variables and an operand stack. Labels: SP, frame, LV.]

Why local variables on the stack? Methods with local variables have their own copy of those variables for each running copy of the method. Maintained in that copy s stack frame. Changing a variable in one stack frame has no effect on the same variable in other stack frames. An example is on the web site, please study it.
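A small recursive method makes the point concrete. This is an illustrative sketch, not the example from the unit's web site; FrameDemo and countdown are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Each active call to countdown() has its own frame and its own copy of `local`.
class FrameDemo {

    static void countdown(int n, List<String> log) {
        int local = n * 10;            // stored in this invocation's stack frame
        if (n > 1) {
            countdown(n - 1, log);     // pushes a new frame with a separate `local`
        }
        // Back in this frame: `local` still holds the value assigned above,
        // even though the deeper calls assigned their own `local`.
        log.add("n=" + n + " local=" + local);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        countdown(3, log);
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

The deepest call logs first because its frame is popped first; each caller then resumes with its own unchanged value of `local`.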

Java Virtual Machine Tool Interface (JVMTI)
JVMTI:
- Replaces the JVMPI.
- Is a new native programming interface for use by tools.
- Provides both a way to inspect the state and to control the execution of applications running in the Java virtual machine (JVM).
http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/index.html

Summary
- Benchmark throughput to compare algorithms or various approaches.
- Profile to identify bottlenecks.
- Knowledge of the JVM and its memory management policies is crucial to be able to tune application code.

Appendix - Code from this lecture
https://www.computing.edu.au/units/setm351/examplecode/setm/timer/stopwatch.java
https://www.computing.edu.au/units/setm351/examplecode/setm/util/csvwriter.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/coin.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/coinversion2.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twoup.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/towuptimer.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twouptimerversion2.java
https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twoupversion2.java