Performance Profiling Curtin University of Technology Department of Computing
Objectives: To develop a strategy to characterise the performance of Java applications: benchmark to compare algorithm choices, profile to identify bottlenecks, and tune problem areas. Introduce JVM internals, JVM profiling hooks, and snapshots of the heap and Java stack.
Under the hood in the JVM: byte code stream; dynamic class loader and verifier; class and method area; garbage-collected heap; support code (exceptions, threads, security, etc.); native methods; native method linker; native method area; execution engine; operating system.
Why byte code? Vendor neutral and device independent. Portable to most computing environments. Does not assume an underlying architecture, register file, or computing environment. One binary file runs on every computer regardless of the CPU family. Author once for many platforms!
Factors affecting performance: Java is an interpreted language; memory management and garbage collection; input and output; algorithm choice.
Big O() Curves and Performance laugh.learn.love.live
StopWatch Class. Constructors: public StopWatch(): Default constructor used to create a StopWatch instance. Methods: public void start(): Start the StopWatch by recording the current system time at the beginning of the timed period. public void stop(): Stop the StopWatch by recording the current system time at the end of the timed period. Invalidate the time if the StopWatch was not previously started using the start() method.
More StopWatch methods... public long elapsedmillis(): Calculate the elapsed time by subtracting the start time from the stop time. Return 0 if the StopWatch has not previously been started. public void reset(): Reset the start and stop times to the initial state.
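The StopWatch behaviour described on these two slides could be sketched roughly as follows. This is a minimal illustration, not the lecture's own stopwatch.java: the field names, the UNSET sentinel, and the use of System.currentTimeMillis() are assumptions, as is returning 0 when the watch was started but never stopped.

```java
// Minimal sketch of the StopWatch API described on the slides.
// Assumptions: field names, the UNSET sentinel, and System.currentTimeMillis()
// as the clock are illustrative choices, not taken from the lecture code.
class StopWatch {
    private static final long UNSET = -1L;
    private long startTime = UNSET;
    private long stopTime = UNSET;

    public StopWatch() { }

    // Record the system time at the beginning of the timed period.
    public void start() {
        startTime = System.currentTimeMillis();
        stopTime = UNSET;
    }

    // Record the system time at the end of the timed period.
    // If start() was never called, the measurement stays invalid.
    public void stop() {
        if (startTime == UNSET) {
            stopTime = UNSET;
            return;
        }
        stopTime = System.currentTimeMillis();
    }

    // Stop time minus start time; 0 when there is no valid measurement.
    public long elapsedmillis() {
        if (startTime == UNSET || stopTime == UNSET) {
            return 0L;
        }
        return stopTime - startTime;
    }

    // Return start and stop times to the initial state.
    public void reset() {
        startTime = UNSET;
        stopTime = UNSET;
    }
}
```

Millisecond resolution is coarse for short runs, so very fast code may report 0 ms; timing many repetitions is one way around that.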
Sort Experiment code fragment:
    // Time how long it takes to do the BubbleSort
    StopWatch stopwatch = new StopWatch();
    stopwatch.start();
    bubblesort.bubblesort();
    stopwatch.stop();
    long bubbletime = stopwatch.elapsedmillis();

    // Time how long it takes to do the MergeSort
    stopwatch.reset();
    stopwatch.start();
    mergesort.mergesort();
    stopwatch.stop();
    long mergetime = stopwatch.elapsedmillis();
CSVWriter Constructors: public CSVWriter(): Default constructor. Uses a FileDialog panel to allow the user to specify the filename and location of the CSV file. public CSVWriter( String directory, String filename ): Constructor that allows the programmer to specify the file name and location of the CSV file. public CSVWriter( String filename ): Constructor that allows the programmer to specify the CSV file name. The default directory location is assumed.
CSVWriter Methods: public void header( String header ): Write a header to the CSV file. public void nextfield( String field ), public void nextfield( long field ), public void nextfield( int field ), public void nextfield( float field )...: Write the next field to the CSV file. Append a comma if this is not the first field on a given line. public void eol(): End the current line. A comma will not be appended when the next field is written to the file.
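The comma handling described above can be sketched as follows. This is an illustration only, not the lecture's csvwriter.java: the class name CsvSketch, the Writer-based constructor, and close() are assumptions added so the example can run without a file dialog, and the FileDialog constructor is left out.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Writer;

// Sketch of a CSV writer with the header/nextfield/eol methods from the slides.
// Assumptions: the name CsvSketch, the Writer constructor, and close() are
// hypothetical; fields are written as-is with no quoting or escaping.
class CsvSketch {
    private final PrintWriter out;
    private boolean firstField = true;   // true until a field is written on the current line

    public CsvSketch(String directory, String filename) throws IOException {
        this(new FileWriter(new File(directory, filename)));
    }

    public CsvSketch(String filename) throws IOException {
        this(new FileWriter(filename));
    }

    public CsvSketch(Writer target) {
        this.out = new PrintWriter(target);
    }

    // Write a header line and start a fresh line.
    public void header(String header) {
        out.print(header);
        out.print('\n');
        firstField = true;
    }

    // Write the next field, preceded by a comma unless it is the first on the line.
    public void nextfield(String field) {
        if (!firstField) {
            out.print(',');
        }
        out.print(field);
        firstField = false;
    }

    public void nextfield(long field)  { nextfield(Long.toString(field)); }
    public void nextfield(int field)   { nextfield(Integer.toString(field)); }
    public void nextfield(float field) { nextfield(Float.toString(field)); }

    // End the current line; the next field written will not get a leading comma.
    public void eol() {
        out.print('\n');
        firstField = true;
    }

    public void close() {
        out.close();
    }
}
```

Tracking a single firstField flag is what lets eol() and header() reset the comma logic without the caller having to count fields.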
Reporting Sort results...
    // report results
    if (csv != null) {
        csv.nextfield(size);
        csv.nextfield(bubbletime);
        csv.nextfield(mergetime);
        csv.eol();
    }
    ...
Launching SortTest: If you force garbage collection with System.gc() outside of the timing loops, disable asynchronous garbage collection. Disable the JIT. See your JVM help string for information on how to do this.
Sort Experiment Results
Experimental Results: BubbleSort
Experiment Results: MergeSort
StopWatch Considerations... Collect data for the best case, the worst case, and the average case. Consider using synchronous garbage collection outside of the timing loop. Consider turning off the JIT or having an untimed initial run.
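One way to apply these considerations is sketched below. It assumes that "best", "worst", and "average" case refer to the input order given to the sort (already sorted, reverse sorted, random), which the slide does not spell out. The class name CaseTiming, the array size, and the use of System.nanoTime() instead of the StopWatch class are illustrative choices.

```java
import java.util.Random;

// Sketch: time a bubble sort on best-, worst-, and average-case inputs.
// Assumptions: the three cases are modelled as sorted, reversed, and random
// input; CaseTiming and its method names are hypothetical.
class CaseTiming {
    static void bubbleSort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            boolean swapped = false;
            for (int i = 0; i < end; i++) {
                if (a[i] > a[i + 1]) {
                    int tmp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                return;   // no swaps in a full pass: the array is sorted
            }
        }
    }

    // Sorts a copy of the input and returns the elapsed time in nanoseconds.
    static long timeSortNanos(int[] input) {
        int[] copy = input.clone();   // copying happens outside the timed region
        System.gc();                  // request a collection before timing starts
        long t0 = System.nanoTime();
        bubbleSort(copy);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 5_000;
        int[] sorted = new int[n];
        int[] reversed = new int[n];
        int[] random = new int[n];
        Random rng = new Random(42);
        for (int i = 0; i < n; i++) {
            sorted[i] = i;
            reversed[i] = n - i;
            random[i] = rng.nextInt();
        }
        timeSortNanos(random);   // untimed initial run: result deliberately discarded
        System.out.println("best case (sorted) ns:    " + timeSortNanos(sorted));
        System.out.println("worst case (reversed) ns: " + timeSortNanos(reversed));
        System.out.println("average case (random) ns: " + timeSortNanos(random));
    }
}
```

System.gc() is only a request to the JVM, so this reduces rather than eliminates the chance of a collection landing inside the timed region.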
StopWatch pros and cons. StopWatch pros: Simple way to implement a benchmark. Measures total elapsed time. Good for comparing two versions of the same algorithm. StopWatch cons: Doesn't identify code bottlenecks. Requires timing code to be embedded in the code under test. Captures run-time or JVM issues like JIT in addition to code issues.
Profiling: Profiling your code lists the following: the methods called most often; the percentage of CPU time used by each method, or the relative method frequency count; the methods that call the most time-consuming methods. Profiling goals: make frequently used methods faster; call slow methods less often.
More profiling thoughts... Hooks are built into the JVM. No need to embed timing code. Shows where time was spent while executing code. Reports results on a method-by-method basis. JVMs on some operating systems work better than others. Caution: the act of taking measurements impacts overall time.
HelloWorld Source Code:
    // Java version of the Hello World program
    public class HelloWorld {
        public static void main(String[] args) {
            System.out.println( "Hello World!!!" );
        }
    }
Compile it:
    DAM@hammer.cs.curtin.edu.au> javac HelloWorld.java
Profile it:
    DAM@hammer.cs.curtin.edu.au> java -prof:profilingfile.txt HelloWorld
    Hello World!!!
Non-standard features
    DAM@hammer.cs.curtin.edu.au> java -X
    -Xmixed           mixed mode execution (default)
    -Xint             interpreted mode execution only
    -Xbootclasspath:<directories and zip/jar files separated by :>
                      set search path for bootstrap classes and resources
    -Xbootclasspath/a:<directories and zip/jar files separated by :>
                      append to end of bootstrap class path
    -Xbootclasspath/p:<directories and zip/jar files separated by :>
                      prepend in front of bootstrap class path
    -Xnoclassgc       disable class garbage collection
    -Xloggc:<file>    log GC status to a file with time stamps
    -Xbatch           disable background compilation
    -Xms<size>        set initial Java heap size
    -Xmx<size>        set maximum Java heap size
    -Xss<size>        set java thread stack size
    -Xprof            output cpu profiling data
    -Xfuture          enable strictest checks, anticipating future default
    -Xrs              reduce use of OS signals by Java/VM (see documentation)
    -Xdock:name=<application name>
                      override default application name displayed in dock
    -Xdock:icon=<path to icon file>
                      override default icon displayed in dock
    -Xcheck:jni       perform additional checks for JNI functions
    -Xshare:off       do not attempt to use shared class data
    -Xshare:auto      use shared class data if possible (default)
    -Xshare:on        require using shared class data, otherwise fail.
Java: -Xrunhprof:help
JAVA PROFILE 1.0.1 DAM@hammer.cs.curtin.edu.au> java -Xrunhprof:cpu=samples,thread=y HelloWorld
School of Computing Resources
    Location   OS             CPU                Clock             RAM
    Lab 217    Fedora Linux   Intel Core 2 Duo   2.6 or 2.8 GHz    500 MB - 1 GB
    Lab 218    Fedora Linux   Intel Core 2 Duo   2.6 or 2.8 GHz    500 MB - 1 GB
    Lab 219    Fedora Linux   Intel Core 2 Duo   2.6 or 2.8 GHz    500 MB - 1 GB
    Lab 220    Fedora Linux   Intel Core 2 Duo   2.6 or 2.8 GHz    500 MB - 1 GB
    Lab 221    Fedora Linux   Intel Core 2 Duo   2.6 or 2.8 GHz    500 MB - 1 GB
Specifications are subject to change without notice (these may be outdated).
CoinToss: An example. Large code bases are difficult to inspect line-by-line when trying to identify problems. Instead, run a profiling tool to determine where the code is spending the bulk of its time. Then go to the code to see if you can work out why it's spending so much time in that location. We will look at a simple example...
CoinToss Description: Time how long it takes to toss a coin 10, 100, 1000, 10000, and 100000 times. Take the median of 100 iterations. Report results to a CSV file. Assume there's a lot of code (there's not). Review relevant documentation. Be aware that it may only contain public methods.
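The measurement described above (time N tosses, repeat 100 times, keep the median) might look like the following sketch. It uses java.util.Random.nextBoolean() as a stand-in for the lecture's Coin class and prints CSV lines to standard output rather than a file. The class name MedianTiming, its method names, and the use of System.nanoTime() are assumptions.

```java
import java.util.Arrays;
import java.util.Random;

// Sketch: median of repeated timings for 10 .. 100000 coin tosses.
// Assumptions: Random.nextBoolean() stands in for the lecture's Coin class;
// MedianTiming and its method names are hypothetical.
class MedianTiming {
    static volatile int sink;   // receives the head count so the toss loop has a visible effect

    // Median of a non-empty array; for an even count, the mean of the two middle values.
    static long median(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] s = samples.clone();
        Arrays.sort(s);
        int mid = s.length / 2;
        return (s.length % 2 == 1) ? s[mid] : (s[mid - 1] + s[mid]) / 2;
    }

    // Times 'tosses' coin tosses, 'iterations' times over, and returns the median in ns.
    static long medianTossNanos(int tosses, int iterations, Random rng) {
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            int heads = 0;
            long t0 = System.nanoTime();
            for (int t = 0; t < tosses; t++) {
                if (rng.nextBoolean()) {
                    heads++;
                }
            }
            samples[i] = System.nanoTime() - t0;
            sink = heads;
        }
        return median(samples);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        System.out.println("tosses,median_ns");
        for (int tosses = 10; tosses <= 100_000; tosses *= 10) {
            System.out.println(tosses + "," + medianTossNanos(tosses, 100, rng));
        }
    }
}
```

The median is less sensitive than the mean to the occasional iteration that is interrupted by garbage collection or JIT compilation.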
CoinToss Constructors: public Coin(): Default constructor that creates a new coin that can be tossed. It uses the current date as the seed for the random number generator used in tossing the coin. public Coin( long seed ): Creates a new coin. The programmer specifies the seed that will be used to create the random number generator used to toss the coin.
CoinToss Methods: public void toss(): Simulates a coin toss using a random number generator. public boolean istails(): Determines the state of the coin. This method returns true if "Tails" is showing. False is returned if "Heads" is showing. public boolean isheads(): Determines the state of the coin. This method returns true if "Heads" is showing. False is returned if "Tails" is showing.
Java 1.x profile analysis in a spreadsheet
Now we know where to look... Version 1 Code: Inspect the code and develop a hypothesis explaining why so much time is spent in isheads() and isodd(). The coin is heads when the random number generator returns an odd number. isodd() determines whether a number is even or odd by raising -1 to the power of that number and testing the sign of the result. It uses Math.pow( double, double ), which uses floating point arithmetic.
Now we know where to look... Version 2 Code: Masks off all but the low-order bit and checks its value. Only uses a simple bitwise AND operation. Does not involve floating point arithmetic or expensive type casts. Easy to make the change by overriding the offending function. Measurement and simple detective work can yield impressive results!
A real-world example? In a trivial example like this, you'd probably come to the same conclusion without the profiler. The code is short, so doing a code walk-through isn't a big deal. However, if the code base contained millions of lines, a profiler would allow you to locate the problem areas quickly and methodically!
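The two versions can be contrasted in a short sketch. This is a reconstruction from the slide descriptions, not the lecture's coin.java or coinversion2.java: the class names CoinSketch and CoinSketchV2, the lastValue field, and the exact form of the Version 1 test are assumptions.

```java
import java.util.Random;

// Sketch of the coin with the slow (Version 1) odd/even test.
// Assumptions: class and field names are hypothetical; Version 1 is modelled
// as "raise -1 to the power n and test the sign", as the slides describe.
class CoinSketch {
    private final Random rng;
    private int lastValue;   // most recent random number; odd means heads

    public CoinSketch(long seed) {
        rng = new Random(seed);
    }

    public void toss() {
        lastValue = rng.nextInt();
    }

    public boolean isheads() {
        return isodd(lastValue);
    }

    public boolean istails() {
        return !isodd(lastValue);
    }

    // Version 1: (-1)^n is negative exactly when n is odd.
    // Costs an int-to-double conversion plus a floating point Math.pow call.
    protected boolean isodd(int n) {
        return Math.pow(-1.0, (double) n) < 0.0;
    }
}

// Version 2: override only the offending method.
class CoinSketchV2 extends CoinSketch {
    public CoinSketchV2(long seed) {
        super(seed);
    }

    // Mask off all but the low-order bit; it is 1 exactly when n is odd.
    @Override
    protected boolean isodd(int n) {
        return (n & 1) != 0;
    }
}
```

Because only isodd() is overridden, the rest of the program can switch versions by changing which class it constructs, which keeps the before and after timings directly comparable.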
Mean Toss Time Version 1 and 2
How does a profiler work? Runs in a separate thread. Wakes up at periodic intervals and inspects the run-time stack. Counts the number of times the stack frame for a given method is on the top of the stack (TOS). The frame on TOS is for the method that's currently running. The frame immediately underneath any given frame corresponds to its caller.
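The sampling idea can be imitated in plain Java using Thread.getStackTrace(), as in the sketch below. This is only an illustration of the principle. Real profilers use the JVM's native profiling hooks rather than this API, and the class name SamplerSketch, the spin() workload, and the 5 ms interval are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a sampling profiler: periodically look at the top stack frame of
// a target thread and count how often each method is found there.
// Assumptions: SamplerSketch, spin(), and the sampling interval are hypothetical.
class SamplerSketch {
    // Samples 'target' every 'intervalMillis' until it terminates.
    // Returns a map from "class.method" to the number of samples seen on top of the stack.
    static Map<String, Integer> sample(Thread target, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        while (target.isAlive()) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                StackTraceElement top = stack[0];   // element 0 is the currently running method
                hits.merge(top.getClassName() + "." + top.getMethodName(), 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);       // sleep until the next sample is due
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return hits;
    }

    // Busy workload that keeps the target thread running for roughly 'millis' ms.
    static void spin(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        double x = 0.0;
        while (System.nanoTime() < end) {
            x += Math.sqrt(x + 1.0);
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> spin(300));
        worker.start();
        // The sampler runs here on the main thread, separate from the worker being profiled.
        sample(worker, 5).forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

The counts are statistical: a method that runs for less time than the sampling interval may never be observed, and a shorter interval increases the measurement overhead.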
Stack implements method calls A stack frame is pushed on the stack for each copy of a called method. The stack frame contains local variables, parameters and other data including the PC. The top stack frame corresponds to the method currently executing. When that method returns, its stack frame is popped off the stack.
Pushing and popping stack frames [Figure: a stack holding Frame 1, Frame 2, and Frame 3, with labels for parameters, local variables, and operand stack, and for Stack, SP, frame, and LV.]
Why local variables on the stack? Methods with local variables have their own copy of those variables for each running copy of the method. These are maintained in that copy's stack frame. Changing a variable in one stack frame has no effect on the same variable in other stack frames. An example is on the web site, please study it.
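A small recursive method shows the per-frame copies in action. This sketch is not the example from the web site. The class name FrameDemo and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: each invocation of descend() has its own copy of 'local' in its
// own stack frame, so deeper calls cannot change the caller's value.
// Assumption: FrameDemo and descend() are hypothetical names.
class FrameDemo {
    static void descend(int depth, int maxDepth, List<String> log) {
        int local = depth * 10;   // stored in this invocation's stack frame
        if (depth < maxDepth) {
            descend(depth + 1, maxDepth, log);   // pushes a new frame with its own 'local'
        }
        // The deeper call has returned and its frame was popped; our 'local' is unchanged.
        log.add("depth " + depth + ": local=" + local);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        descend(1, 3, log);
        log.forEach(System.out::println);
        // prints:
        // depth 3: local=30
        // depth 2: local=20
        // depth 1: local=10
    }
}
```

The deepest frame reports first because each frame logs only after the frames above it have been popped.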
Java Virtual Machine Tool Interface (JVMTI): JVMTI replaces the JVMPI. It is a new native programming interface for use by tools. It provides a way both to inspect the state and to control the execution of applications running in the Java virtual machine (JVM). http://java.sun.com/j2se/1.5.0/docs/guide/jvmti/index.html
Summary: Benchmark throughput to compare algorithms or alternative approaches. Profile to identify bottlenecks. Knowledge of the JVM and its memory management policies is crucial to being able to tune application code.
Appendix - Code from this lecture
    https://www.computing.edu.au/units/setm351/examplecode/setm/timer/stopwatch.java
    https://www.computing.edu.au/units/setm351/examplecode/setm/util/csvwriter.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/coin.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/coinversion2.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twoup.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/towuptimer.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twouptimerversion2.java
    https://www.computing.edu.au/units/setm351/examplecode/applications/cointoss/twoupversion2.java