The Use of Traces in Optimization


The Use of Traces in Optimization

by

Borys Jan Bradel

A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, 2004

© Copyright by Borys Jan Bradel 2004

ABSTRACT

We build trace collection systems for Jupiter and the Jikes Research Virtual Machine. We use these systems to create traces based on the execution of the SPECjvm98 and Java Grande benchmarks. We characterize the traces and show that they contain the most frequently executed instructions, that traces are a compact representation of a program, and that traces are a good means of predicting the control flow of a program. Furthermore, we evaluate the use of traces for inlining. We execute the benchmarks using Jikes while providing inlining information based on previously collected traces. We find that the use of traces leads to a 10% lower execution time compared to providing similar information from Jikes's adaptive system from a previous execution. This increase in performance, however, has an associated code expansion of 47%. Our work indicates that traces are beneficial for a single optimization and may also be beneficial for general optimizations.

ACKNOWLEDGEMENTS

I am grateful to everyone who has made this thesis possible. First, I would like to thank my supervisor, Professor Tarek S. Abdelrahman. He has provided guidance and help throughout this research. I would like to thank Henry Jo for proofreading this thesis. His insights and persistence have improved this thesis greatly. I would like to thank Patrick Doyle and the creators of the Jikes Research Virtual Machine for creating the Java Virtual Machines that I used. I would like to thank Carlos Cavanna and Patrick Doyle for helping me set up Jupiter and answering all of the questions that I had. I would like to thank Tomasz Czajkowski for his suggestions regarding my presentation in defence of this thesis. I would also like to thank all of my family and friends who have provided support and encouragement while I was working on this thesis. Lastly, I would like to acknowledge the financial support provided by the Natural Sciences and Engineering Research Council of Canada.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTERS

1. Introduction
   1.1 Thesis Overview
   1.2 Thesis Organization
2. Background
   2.1 Program Model
   2.2 Control Flow Graphs
   2.3 Traces
   2.4 Trace Collection
       2.4.1 Trace Collection Example
       2.4.2 Mispredicting Returns
   2.5 Optimization using Traces
   2.6 Jupiter
   2.7 Jikes
       2.7.1 The Adaptive System
       2.7.2 The Optimization Test Harness
       2.7.3 Inlining in Jikes
       2.7.4 Inlining Oracles
   2.8 Related Work
       2.8.1 Static Trace Scheduling
       2.8.2 Path Profiling
       2.8.3 Hardware Trace Systems
       2.8.4 Software Trace Systems
       2.8.5 Feedback Directed Systems in Java
       2.8.6 Java Trace Systems
3. Trace Collection
   3.1 Trace Collection within Jupiter
       3.1.1 Basic Block Identification
       3.1.2 Profile Information
   3.2 Trace Collection within Jikes
       3.2.1 Control Flow Information
       3.2.2 Basic Block Identification
       3.2.3 Trace Formation
4. Trace Characterization
   4.1 Number of Traces
   4.2 Static Trace Length
   4.3 Dynamic Trace Length
   4.4 Static Program Coverage
   4.5 Dynamic Program Coverage
   4.6 Method Coverage
   4.7 Exit Behaviour
   4.8 Exit Predictability
   4.9 Trace Cache Execution
   4.10 Trace Execution Patterns
5. Inlining
   5.1 Benefits of Inlining
   5.2 Approaches to Inlining
   5.3 Traces and Inlining
   5.4 Candidate Method Selection
6. Results
   6.1 Methodology
       6.1.1 Experimental Platform (Jupiter, Jikes, Benchmarks)
       6.1.2 Comparison of the Adaptive System and the Optimization Test Harness
   6.2 Trace Characterization Results
       6.2.1 Number of Traces
       6.2.2 Static Trace Length
       6.2.3 Dynamic Trace Length
       6.2.4 Static Program Coverage
       6.2.5 Dynamic Program Coverage
       6.2.6 Method Coverage
       6.2.7 Exit Behaviour
       6.2.8 Exit Predictability
       6.2.9 Trace Cache Execution
       6.2.10 Trace Execution Patterns
   6.3 Alternate Trace Collection Parameters
   6.4 Runtime Performance Results
       6.4.1 Inlining with Traces from Jupiter
       6.4.2 Inlining with Traces from Jikes
       6.4.3 Effects of Inline Sequences
       6.4.4 Effects of Compilation Queue Filling
       6.4.5 Ahead of Time Compilation
       6.4.6 Details of the Provided Inline Information
       6.4.7 Trace Collection Overhead
7. Conclusion
   7.1 Future Work

APPENDICES

A. Trace Characterization for Alternate Parameters
B. Inlining Performance Data

BIBLIOGRAPHY

LIST OF TABLES

6.1 Number of traces
6.2 Average static lengths in instructions
6.3 Dynamic trace lengths
6.4 Static program coverage
6.5 Dynamic program coverage
6.6 Method coverage
6.7 Trace exits
6.8 Biases
6.9 Bias differentiation
6.10 Improved biases
6.11 Execution lives
6.12 Several characterizations for alternate traces
6.13 Total number of methods to optimize
6.14 Number of methods that are part of the benchmarks
6.15 Number of methods that are part of the Java class library
6.16 Number of inlining requests
6.17 Time spent in the main and organizer threads
A.1 Number of traces
A.2 Average static lengths in instructions
A.3 Dynamic trace lengths
A.4 Static program coverage
A.5 Dynamic program coverage
A.6 Method coverage
A.7 Trace exits
A.8 Biases
A.9 Bias differentiation
A.10 Improved biases
A.11 Execution lives
B.1 Main thread's execution time in the adaptive system
B.2 Compile time in the adaptive system
B.3 Machine code in kilobytes in the adaptive system
B.4 Benchmark execution time in the optimization test harness
B.5 User time in the optimization test harness
B.6 LIR instructions generated in the optimization test harness

LIST OF FIGURES

1.1 Two types of feedback directed systems
2.1 Source code of a simple loop
2.2 Control flow graph for the simple loop
2.3 Instruction sequence of a simple program
2.4 Traces mapped onto the control flow graph
2.5 Example of interaction between a JVM and a trace collection system
2.6 Source code for a return misprediction example
2.7 Traces for a return misprediction example
2.8 Source code for an optimization example
2.9 Control flow graph for the optimization example
2.10 Control flow graph with a trace for the optimization example
2.11 An example of inlining using inline sequences
5.1 Execution mapped to target sequences
5.2 Potential invocation sequence identification. Invocation of method a() is common in the three cases
5.3 Example frequency graphs
5.4 Example of inline sequences
6.1 Frequency graphs for SPECjvm98 benchmarks
6.2 Frequency graphs for Java Grande benchmarks
6.3 Inlining with traces from Jupiter
6.4 Inlining with traces from Jikes
6.5 Inlining with inline sequences
6.6 Inlining with filling of compilation queue
6.7 Inlining with ahead-of-time compilation
6.8 Overhead of collecting traces in Jikes
A.1 Frequency graphs for SPECjvm98 benchmarks
A.2 Frequency graphs for Java Grande benchmarks

CHAPTER 1

Introduction

Traditional static compilation has a shortcoming in that it cannot take advantage of information available only at runtime to produce high-performance executables. Such runtime information includes the processor architecture, specific input data characteristics, and control flow patterns within the program. One way to incorporate this runtime information is to use a feedback-directed system that monitors the execution of a program and uses the information it collects to optimize the program.

Feedback-directed systems can be divided into two categories: offline and online. The overall structure of each category is depicted in Figure 1.1. Offline feedback-directed systems monitor the execution of a program and use the collected data to optimize the program after it has completed executing, so that its next execution will complete more quickly. Online feedback-directed systems, on the other hand, collect information and optimize the program while it is executing. One advantage of online systems is that the feedback loop is shorter, because the information collected is based on the executing program and is immediately applied to it. An offline system uses information gathered during previous executions of the program, which may exhibit different runtime behaviour from the execution of the optimized program due to user interaction and/or different program inputs. The information collected in an online system may therefore be more relevant and beneficial. The main disadvantage of an online system is that the system must execute simultaneously with the program. Whatever time and resources the system uses will therefore increase the execution time of the program or use up resources that could otherwise be utilized by the program. This dictates that the information cannot be analyzed as extensively in an online system, which may lead to less effective optimizations. This thesis focuses on offline feedback-directed systems.

Figure 1.1: Two types of feedback directed systems. (In an offline system, feedback collected at run time is applied by the compiler before the next run; in an online system, feedback is applied to the program while it runs.)

There have been many systems that employ online and offline feedback. Examples include the feedback-directed systems created by Arnold et al. [Mat00], Whaley [Wha01], and Suganuma, Yasue, and Nakatani [SYN02]. One aspect common to these systems is that they employ counters to collect information about which instructions and methods are frequently executed and to direct optimization. This work focuses on using traces, instead of counters, to direct program optimization.

1.1 Thesis Overview

A trace is a sequence of unique basic blocks that is executed by a program [BDB99]. (A basic block is a sequence of consecutive instructions that are executed together.) In this thesis we explore the effects of collecting runtime information in a different manner, by employing traces. We characterize traces for a collection of different programs and apply the traces to one type of optimization, inlining. Our hypothesis is that traces are a better method of representing runtime information than counters and that they can be utilized more effectively when a feedback-directed system performs optimization.

We support our hypothesis by performing two studies. The first is a characterization of traces that shows their potential benefit to optimization. Our characterization shows that compilation and optimization based on traces indeed have the potential to improve program performance. The second study employs traces to perform a specific optimization, inlining, and the results show that traces are useful in this regard.

Our work is based on two Java Virtual Machines (JVMs): Jupiter and Jikes. We have added a trace collection system to the Jupiter JVM. Our system collects traces and detailed statistics of their execution. We have also added a trace collection system to the Jikes Research Virtual Machine, referred to simply as Jikes. Furthermore, we have modified and used Jikes to analyze the effects of using traces within an offline feedback-directed Java system.

1.2 Thesis Organization

In Chapter 2 we present background information on traces, Jupiter, Jikes, and related research. In Chapter 3 we describe our trace collection architectures in Jupiter and Jikes. Chapter 4 contains a description of the trace characterizations that we employ. This is followed by Chapter 5, in which we describe several approaches to inlining. We present our results in Chapter 6. The final chapter, Chapter 7, gives concluding remarks and directions for future work.

CHAPTER 2

Background

The first part of this chapter contains a brief description of how traces can be used in compilers. We first give simple definitions of programs, control flow graphs, and traces. We then focus on how traces can be collected, and we show how traces can be beneficial to optimization. The second part of this chapter contains a description of related work. We first describe the Jupiter and Jikes virtual machines, which we use as underlying frameworks for our work. Then, in the last section of this chapter, we describe related research in the areas of feedback-directed optimization and trace-based optimization.

2.1 Program Model

A program contains a sequence of instructions to be executed by a computer. These instructions are generated from the source code of the program and are grouped into methods. Each method consists of a sequence of consecutive instructions. In this thesis, programs are written in Java and their instructions are therefore Java bytecodes, executed by a Java Virtual Machine (JVM). Each Java method consists of a sequence of bytecodes, and each bytecode has an associated index that indicates its location within the method. The computer, which in our setting is a JVM, keeps track of which instruction it is executing by using a pointer to this instruction, referred to as the instruction pointer. When the computer finishes executing an instruction, it executes the next instruction in the instruction sequence and updates the instruction pointer to point to this new instruction. This is repeated until the computer encounters an instruction that tells it to stop.

Some instructions can change the instruction pointer to point to an instruction other than the next instruction in the sequence. These instructions are called control flow instructions. The instructions that can be executed after a control flow instruction are the control flow instruction's targets.

Instructions can be grouped into basic blocks. A basic block is a maximal sequence of consecutive instructions such that execution must begin at the first instruction and the remaining instructions must then be executed [Muc97]. Usually, basic blocks end in control flow instructions. The different types of control flow instructions are: branches, jumps, switch statements, invokes, returns, and exceptions. Branches and jumps are generated from if statements and loops in the original source code. A branch or jump is backwards if its target appears earlier than the branch or jump in the instruction sequence of the enclosing method. (This definition differs from the traditional one, which states that a branch or jump is backwards if its target is visited before the branch or jump during a depth-first traversal of the instructions [Muc97].) These backward control flow instructions often close loops in the source code and can be used to identify the corresponding loops.
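Because this definition depends only on bytecode indices, backward branches are cheap to detect. The following Java sketch is a minimal illustration with hypothetical names rather than code from Jupiter or Jikes; it collects the targets of a method's backward branches, which are the loop entry points at which trace recording can later be triggered (Section 2.4).

import java.util.ArrayList;
import java.util.List;

class BackwardBranchScanner {
    // One entry per branch: the bytecode index of the branch instruction
    // and the bytecode index of its target.
    record Branch(int index, int target) {}

    // A branch or jump is backwards if its target appears earlier in the
    // method's instruction sequence than the branch itself. The targets
    // of such branches are candidate trace starting points.
    static List<Integer> backwardTargets(List<Branch> branches) {
        List<Integer> targets = new ArrayList<>();
        for (Branch b : branches)
            if (b.target() < b.index())
                targets.add(b.target());
        return targets;
    }
}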

2.2 Control Flow Graphs

Control flow graphs (CFGs) are a commonly used representation of programs. A control flow graph is a directed graph in which each node represents a basic block [ASU86]. An edge from a basic block A to a basic block B exists if it is possible for the first instruction in basic block B to be executed immediately after the last instruction in basic block A. For simplicity of presentation, and without loss of generality, we ignore certain programming constructs, such as exceptions, that complicate control flow analyses and graphs. Figure 2.2 contains the control flow graph for the method shown in Figure 2.1; each node in the CFG is a basic block from the example method. We say that execution flows through a basic block, or that the basic block is executed, if the instructions in that basic block are executed. We use the same terminology when describing the execution of instructions that are on a trace.

public static int foo() {
  int a=0;
  for (int i=0;i<5;i++)
    a+=i;
  return a;
}

Figure 2.1: Source code of a simple loop.

B0: a=0; i=0; goto B2
B1: a+=i; i++
B2: if (i<5) goto B1
B3: return a

Figure 2.2: Control flow graph for the simple loop. (Edges: B0 to B2, B2 to B1, B1 to B2, and B2 to B3.)
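A CFG of this kind needs very little machinery to represent. The following Java sketch is illustrative only (the class name and string block identifiers are assumptions, not part of any of the systems described later); it stores the successor relation defined above.

import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ControlFlowGraph {
    // successors.get(A) holds every block B whose first instruction can
    // execute immediately after A's last instruction.
    private final Map<String, Set<String>> successors = new HashMap<>();

    void addEdge(String from, String to) {
        successors.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    Set<String> successorsOf(String block) {
        return successors.getOrDefault(block, Set.of());
    }
}

The graph of Figure 2.2, for example, is built from the four edges listed in its caption; a trace, defined next, is then simply a path in this graph that visits each block at most once.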

2.3 Traces

A trace is a sequence of n unique basic blocks (b1, b2, ..., bn) such that the basic blocks b1, b2, ..., bn are executed in sequential order during the execution of a program [BDB99]. Block b1 is called the start of the trace and bn is the end of the trace. The trace may contain any basic blocks of the program as long as the sequence corresponds to a path in the control flow graph. This is just one of several different definitions of traces [Fis81]; we use it because it expresses the traces that we collect more precisely. The sequence of basic blocks that corresponds to our execution of the program in Figure 2.1 is shown in Figure 2.3. Three potential traces are (B0,B2,B1), (B1,B2,B3), and (B1,B2). Note that the sequence (B1,B2,B1) is not a valid trace because B1 appears twice, and the sequence therefore does not consist of unique basic blocks. The three traces mapped onto the control flow graph are shown in Figure 2.4.

A trace is executed when each of its basic blocks is executed in sequence. The execution of the trace stops when the executed sequence diverges from the sequence of basic blocks on the trace. We refer to this point of divergence as a trace exit. There are two reasons for the two sequences to diverge. The first is that all the basic blocks in the trace have been executed; we refer to an exit that occurs in this case as a normal trace exit. The second is that the actual execution differs from the trace: several consecutive basic blocks on the trace are executed, but the next basic block that is executed is not the next basic block on the trace. We refer to such a trace exit as an early trace exit.

Normal trace exits can be further divided into regular and self-loop exits [Ber03]. A self-loop exit is an exit such that the instruction executed after the trace is the first instruction of the same trace. Regular exits are normal trace exits that are not self-loop exits. Categorizing trace exits in this manner is useful when reasoning about traces and their use in optimization. In particular, optimizations that create a large amount of cleanup code to execute on early trace exits can result in poor program performance. A large number of self-loop exits indicates that loops repeatedly executed the same path, while a low number indicates that either the loops did not take the same path repeatedly or the path that was taken does not correspond exactly to the initially recorded trace for that loop.
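This exit taxonomy can be stated compactly in code. The following Java sketch, with hypothetical names and no connection to our actual trace collection systems, compares an executed block sequence that begins at a trace's head against the trace and classifies the resulting exit.

import java.util.List;

class TraceExitClassifier {
    enum Exit { EARLY, REGULAR, SELF_LOOP }

    // 'executed' is the sequence of blocks observed at run time, starting
    // at the trace's first block; 'trace' is the recorded trace.
    static Exit classify(List<String> trace, List<String> executed) {
        for (int i = 0; i < trace.size(); i++) {
            if (i >= executed.size() || !executed.get(i).equals(trace.get(i)))
                return Exit.EARLY;      // execution diverged from the trace
        }
        // Every block on the trace was executed: a normal exit. If the
        // next executed block is the trace head again, the exit is a
        // self-loop exit; otherwise it is a regular exit.
        if (executed.size() > trace.size()
                && executed.get(trace.size()).equals(trace.get(0)))
            return Exit.SELF_LOOP;
        return Exit.REGULAR;
    }
}

For Trace 2 = (B1,B2,B3), the executed sequence B1, B2, B1 yields an early trace exit; for Trace 3 = (B1,B2), the sequence B1, B2, B1 yields a self-loop exit and B1, B2, B3 yields a regular exit.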

Figure 2.3: Instruction sequence of a simple program. (For the loop of Figure 2.1, the executed block sequence is B0, B2, then B1, B2 five times, and finally B3.)

2.4 Trace Collection

Each trace is created by starting at some basic block in the control flow graph and building a path by adding basic blocks until the desired trace is generated. This is done using a trace collection system (TCS). In this section we describe a generic such system, which operates by monitoring a program's execution, collecting information, and creating traces based on the information that it collects. There are two types of information that the system collects: profile information regarding how often certain events occur, and the traces themselves. (A JVM can also interact with native code; in this thesis we do not keep any information regarding the details of the native code's execution.) The profile information includes the number of times certain basic blocks are executed, how often methods are invoked, and how often branches are taken. By default the TCS collects only profile information.

Figure 2.4: Traces mapped onto the control flow graph. (Three copies of the CFG of Figure 2.2, highlighting Trace 1 = (B0,B2,B1), Trace 2 = (B1,B2,B3), and Trace 3 = (B1,B2).)

When certain events occur, however, the TCS records the sequence of basic blocks being executed (i.e. a trace) and continuously checks whether recording should stop. When recording stops, the trace is stored in a buffer referred to as the trace cache, and the TCS returns to its default behaviour of collecting profile information. This process is repeated until the execution of the program ends. At this point the traces in the trace cache may be saved for further analysis.

The system keeps track of when basic blocks and control flow instructions are executed. Furthermore, it keeps track of when traces are executed. When the TCS detects that the first block of a trace is executed, it notes this event as the start of that trace and keeps track of which basic blocks are executed. A trace exit occurs when all the basic blocks of the trace have been executed or when the block that is executed is not the next block in the sequence of basic blocks on the trace. When a trace exit occurs, the TCS records this event and resumes operating as before. The recorded information represents what would happen if the TCS were able to execute traces.

There are certain key events that trigger the recording of basic blocks, as well as events that stop this recording. The events that start recording occur when some counter exceeds a certain threshold. In this section we consider two events that start the recording of a trace and several different events that stop the recording.

The first event that starts trace recording is a basic block being executed immediately after a backward taken branch or jump a specific number of times (i.e. the associated counter reaches some threshold). Traces that start at the targets of backward branches and jumps will usually be frequently executed paths from the top of a loop to the end of the loop. The second event is a certain trace exit occurring a specific number of times. Traces that start at trace exits represent other frequently executed paths.

The recording of a trace stops when:

- A backward branch or jump is taken. This corresponds to the end of a loop. It ensures that a trace starts at most a single loop and that separate loops are not represented by a single trace.
- The block that is about to be recorded is the start of a different trace. Recording stops because it is assumed that the instructions that are about to execute are already on a trace and have already been optimized. This prevents code explosion and duplicate work.
- The block that is about to be recorded is already in the trace that is being recorded. This ensures that the basic blocks on a trace are unique.
- The recorded trace is too long. The length limit is arbitrary and may be selected based on hardware limitations, such as the instruction cache size.

A code sketch of a recorder that follows these rules is given at the end of the next subsection.

2.4.1 Trace Collection Example

We will now give an example demonstrating the operation of a trace collection system. Figure 2.5 shows a JVM and a TCS. The TCS is linked to a JVM that executes the program in Figure 2.1. The JVM contains the program to execute as well as storage for the program's variables; the variables' values are modified as shown when the program is executed. The lower part of the figure shows the sequence of steps that the JVM performs when executing the program. Each line represents the step taken for a single instruction. (The instruction t=(c)?a:b in the figure is equivalent to if (c) t=a; else t=b;.)

When the JVM executes a control flow instruction, it calls the TCS with information regarding which instruction is executed and what it does. (We assume that the JVM executes bytecode instructions as they appear in the program and does not perform any optimizations while the trace collection system is being called.) We represent this behaviour by showing the JVM calling a function in the TCS named notify. (This is not the only possible approach: the JVM could instead call the TCS at every instruction, or save the list of executed control flow instructions and process the list later.) The TCS keeps track of what is being executed and records profile information when this notify function is executed.

The TCS contains three components: a set of event counters that are used to determine when recording should start, a recording buffer that holds basic blocks as they are recorded, and a trace cache. The tasks that are performed when the control flow instructions of the example program are executed are shown on the left side of the TCS in the figure. The solid arrows show the effects of certain commands executed by the JVM and the TCS, while the dotted arrows indicate the recording of a trace.

In the figure, the JVM executes basic blocks B1 and B2 repeatedly, and the TCS keeps track of how often the backward branch from B2 to B1 is taken. When the backward branch has been executed often enough (we set the counter threshold to 2 in this example for demonstration purposes), the TCS starts recording a sequence of basic blocks (i.e. a trace). The system then records the execution of the program until it detects that the next basic block is already in the recorded sequence. After recording stops, the sequence of basic blocks, (B1,B2), is stored in the trace cache.

After the trace (B1,B2) is saved, the instruction i3 is executed and the system detects that i3 is in B1, which is the head of a trace. The recorded trace is then treated as executing, and the TCS keeps track of this trace's execution; this is shown by incrementing the number of times that the trace starts. As the loop is repeatedly executed, which is not shown, this trace is executed to completion several times. After the loop exits, a return is executed. Assuming that the method was called by the first instruction, i0, in main(), the target of the return is the second instruction, i1, in main(), and the appropriate counter is incremented. If this return were executed often enough, a trace would be recorded starting at the second instruction in main().

Figure 2.5: Example of interaction between a JVM and a trace collection system. (The JVM executes the code i0: a=0; i1: i=0; i2: goto i5; i3: a+=i; i4: i++; i5: if (i<5) goto i3; i6: return a, calling notify() for each control flow instruction. The TCS increments the counter for the backward taken branch at i5; when the counter reaches its threshold the TCS starts recording, stops when the next block is already in the recording buffer, and stores Trace 1, consisting of blocks B1 and B2, in the trace cache. A later notify(i6, main:i1) increments the counter for the return to main().)
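The recording discipline of this section can be summarized in code. The following Java sketch is a simplified illustration under assumed representations (blocks named by strings, a fixed threshold, and only the backward-branch start event); the names TraceRecorder, onBackwardTaken, and onBlock are hypothetical and do not correspond to the implementations in Jupiter or Jikes described in Chapter 3.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TraceRecorder {
    static final int THRESHOLD = 2;    // counter value that triggers recording
    static final int MAX_LENGTH = 64;  // arbitrary limit on trace length

    Map<String, Integer> counters = new HashMap<>(); // per-target event counters
    Set<String> traceHeads = new HashSet<>();        // first blocks of cached traces
    List<String> buffer = new ArrayList<>();         // recording buffer
    List<List<String>> traceCache = new ArrayList<>();
    boolean recording = false;

    // Called when a backward branch or jump transfers control to 'target'.
    void onBackwardTaken(String target) {
        if (recording) { stopRecording(); return; } // end of loop ends the trace
        if (counters.merge(target, 1, Integer::sum) >= THRESHOLD)
            recording = true;                        // begin recording at 'target'
    }

    // Called for every basic block that is about to execute.
    void onBlock(String block) {
        if (!recording) return;
        if (traceHeads.contains(block)          // block starts another trace
                || buffer.contains(block)       // block is already on this trace
                || buffer.size() >= MAX_LENGTH) // trace is too long
            stopRecording();
        else
            buffer.add(block);
    }

    void stopRecording() {
        recording = false;
        if (!buffer.isEmpty()) {
            traceCache.add(new ArrayList<>(buffer));
            traceHeads.add(buffer.get(0));      // future recordings stop here
        }
        buffer.clear();
    }
}

For the example of Figure 2.5, onBackwardTaken("B1") is called each time the branch at i5 is taken; on the second call recording begins, B1 and B2 are buffered, and the third taken branch stops recording and stores the trace (B1,B2).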

2.4.2 Mispredicting Returns

One observation that we make is that returns on traces can be frequently mispredicted. The problem arises because traces can start in methods that are invoked in multiple places and can extend beyond those methods. If several of these call sites are frequently executed, then it is likely that the trace expects to return to one of the call sites when the actual execution returns to another. In this case a trace exit occurs and the return is mispredicted.

This behaviour is illustrated using the example program in Figure 2.6. Figure 2.7 contains three traces of this program. The traces are generated when method a() is called and then method b() is called. Trace 1 starts in method a(), goes through one path in method c(), and returns to method a(). Trace 2 starts at the early trace exit of Trace 1 in method c() and returns to method a(). Trace 3 starts in method b(), goes through one path in method c(), and returns to method b(). Half the time that Trace 3 is started, an early trace exit to Trace 2 occurs. This always results in the misprediction of the return in Trace 2: on Trace 2 the return is to method a(), but execution must return to method b(). It is therefore possible for returns on traces to be frequently mispredicted.

void a() {
  for (int i=0;i<10000;i++)
    c(i);
}

void b() {
  for (int i=0;i<10000;i++)
    c(i);
}

void c(int i) {
  if ((i%2)==0)
    ... work 1 ...
  else
    ... work 2 ...
  return;
}

Figure 2.6: Source code for a return misprediction example.

2.5 Optimization using Traces

The quality of program optimization depends on the scope and exactness of the analysis that is performed on the program's control flow graph. The use of traces may improve the opportunities for optimization in three ways. First, traces can span multiple methods, thus facilitating inter-procedural analysis and extending the scope of analyses. Second, traces contain only the most frequently executed portions of a program and therefore can be used to optimize only frequently executed instructions, saving compilation and optimization time.

Trace 1: A1: invoke c(i); C0: if ((i%2)==0); C1: work 1; C3: return; A2: i++; A3: if (i<10000) goto A1
Trace 2: C2: work 2; C3: return; A2: i++; A3: if (i<10000) goto A1
Trace 3: B1: invoke c(i); C0: if ((i%2)==0); C1: work 1; C3: return; B2: i++; B3: if (i<10000) goto B1

Figure 2.7: Traces for a return misprediction example.

Finally, traces can be used to eliminate infrequently executed instructions from the control flow graph. The resulting control flow graph is simpler and therefore more amenable to optimization. In this case, however, because execution may go off trace, fix-up code must be added to ensure that the program's execution remains correct when this occurs.

We illustrate the impact that traces have on optimization through an example. We describe the execution of a program and a trace that is recorded. We then show how the trace affects the control flow analysis and how this can be advantageous for optimization. The program in Figure 2.8 contains an array variable, zarray, and two methods, f() and foo(). Assuming that the condition of the if statement in method f() is true one hundred times and false five times, the trace that will be collected when foo() is executed is (B1,BB0,BB1,BB3,B2,B3,B5,B6). This is shown on the program's control flow graph in Figure 2.9. This trace has the three beneficial qualities described earlier. First, the trace spans two methods. Second, the basic blocks on the trace are frequently executed. Finally, the instructions that are not frequently executed are not on the trace.

The program can be optimized based on the extra information provided by this trace. Figure 2.10 contains the control flow graph with the optimized trace linked to the remainder of the program. The trace is optimized by adding, removing, and reordering instructions. The goto statements at the ends of basic blocks BB1 and B3 are removed, since they are not necessary on the recorded trace. The second if statement that was recorded is also removed because it is redundant on the trace: on the trace, the method f() returns the value 3, which is less than 50.

There are two cases that can cause the trace to exit early. The first case occurs when the method that the program invokes is not the one on the trace. This is possible because Java's methods are virtual; that is, the method that an invoke actually calls may be different from the one specified by the instruction. An extra check must therefore be inserted to ensure that the appropriate method is executed. The second case occurs when the value of zarray[i] is greater than or equal to 5. In general it may be necessary to execute extra clean-up instructions after an early trace exit, because the trace is optimized in such a way that the execution may only be valid if the trace executes to completion.

Furthermore, the fact that the trace has a potential self-loop exit can be used to optimize the trace further. This optimization involves moving the computation of a+b to the beginning of the trace and linking the end of the trace to its second instruction instead of its first. This in effect shortens the trace when it is executed repeatedly with no early exits. This code motion is possible because the optimization considers only the straight-line execution of the trace instead of all possible paths of execution in the program.

The amount of time that it takes to optimize the trace should be less than the amount of time that it takes to optimize both functions. The main reason for this is that the trace's control flow is only a sequence of instructions, while the functions have more complicated control flow.

2.6 Jupiter

Jupiter is an interpreter-based JVM developed by Patrick Doyle at the University of Toronto [Doy02]. It is written in C in an object-oriented style. Jupiter is composed of many separate modules that interact with each other via simple interfaces. The modules can be divided into three parts:

- data structures used when executing Java, such as classes, methods, threads, locks, stacks, and stack frames;
- modules that manage these structures, such as a class source or memory source; and
- the execution engine module, which directs all the other modules.

int f(int i) {
  int result;
  if (zarray[i]<5)
    result=3;
  else
    result=(i*zarray[i]*zarray[i-1]+zarray[i-2]*zarray[i-3]);
  return result;
}

int foo() {
  int res=0;
  int a=3;
  int b=6;
  int c;
  int d;
  for (int i=0;i<105;i++) {
    c=f(i);
    if (c<50)
      d=a+b;
    else
      d=a+c;
    res+=d;
  }
  return res;
}

Figure 2.8: Source code for an optimization example.

B0: res=0; i=0; a=3; b=6; goto B6
B1: invoke f(i)
BB0: if (zarray[i]>=5) goto BB2
BB1: result=3; goto BB3
BB2: result=i*zarray[i]*...
BB3: return result
B2: c=returned value; if (c>=50) goto B4
B3: d=a+b; goto B5
B4: d=a+c
B5: res+=d; i++
B6: if (i<105) goto B1
B7: return res

Figure 2.9: Control flow graph for the optimization example. (Trace 1 covers B1, BB0, BB1, BB3, B2, B3, B5, and B6.)

Trace 1:
  d=a+b
  if (f(i) is wrong) goto trace_exit1
  if (zarray[i]>=5) goto trace_exit2
  res+=d
  i++
  if (i<105) goto the trace's second instruction, else goto trace_exit3

trace_exit1: cleanup code
trace_exit2: cleanup code
trace_exit3

(The trace is linked to the original control flow graph of Figure 2.9, which remains for off-trace execution.)

Figure 2.10: Control flow graph with a trace for the optimization example.

Jupiter's execution engine module is an interpreter in which each bytecode is handled by a case statement within one big switch block. The interpreter's main function contains this switch block and is passed a set of bytecodes along with information about the context in which the bytecodes should execute. The case statement for each bytecode performs the required tasks by calling the appropriate functions in various modules, including the class, object, thread, and memory source modules. Many of the frequently executed portions of the interpreter are written as macros to make the interpreter easier to manage. We have extended Jupiter with a framework that collects information about the runtime behaviour of a program and generates traces.
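The switch-based dispatch just described, together with the notification hook that such a framework relies on, can be sketched as follows. This is a schematic Java illustration with invented opcodes; Jupiter itself is written in C and its actual interpreter is considerably more involved.

class SwitchInterpreter {
    // Hypothetical opcodes for illustration only.
    static final int OP_PUSH = 0, OP_ADD = 1, OP_GOTO = 2, OP_HALT = 3;

    int[] stack = new int[64];
    int sp = 0;

    void execute(int[] code) {
        int pc = 0;
        while (true) {
            switch (code[pc]) {
                case OP_PUSH:               // operand follows the opcode
                    stack[sp++] = code[pc + 1];
                    pc += 2;
                    break;
                case OP_ADD:
                    sp--;
                    stack[sp - 1] += stack[sp];
                    pc++;
                    break;
                case OP_GOTO: {             // a control flow instruction
                    int target = code[pc + 1];
                    notifyTCS(pc, target);  // tell the trace collection system
                    pc = target;
                    break;
                }
                case OP_HALT:
                    return;
                default:
                    throw new IllegalStateException("unknown opcode " + code[pc]);
            }
        }
    }

    // Hook where a trace collection system would update counters and
    // record traces (Section 2.4); empty in this sketch.
    void notifyTCS(int pc, int target) { }
}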

2.7 Jikes

The Jikes Research Virtual Machine (RVM) is an open-source JVM developed at IBM that is designed to be used for just-in-time (JIT) compiler research [AFG+00]. Jikes is written in Java and bootstraps itself (a scaled-down version of Jikes is executed by another JVM to produce the final Jikes executable). It is designed to deliver performance that is comparable to commercial JVMs. To achieve this, it uses a compile-only strategy and employs two compilers. The first, or baseline, compiler quickly translates Java bytecodes into unoptimized native code. The second is an optimizing compiler that takes longer to generate native code, but the code it produces is much faster than that produced by the baseline compiler. The optimizing compiler is only used to compile methods that are frequently executed; this strategy is necessary because the time spent optimizing infrequently executed sections of the program would not be recovered by the increase in the speed of their execution.

Jikes consists of four major parts:

- The core runtime, which in turn consists of threads, the class loader, the hardware interface, etc. It is the portion of Jikes that loads programs from files and executes native code produced by the compilers. It also provides a mechanism for the native code to access the other parts of Jikes.
- A memory manager that is responsible for all of the memory that Jikes uses when executing a program; it is called when new memory is allocated or when garbage collection is performed.
- Baseline and optimizing compilers that take methods loaded by the runtime and turn their bytecode instructions into native code that can be executed.
- Optimizing systems that control when certain frequently executed methods are recompiled and which optimizations are performed during the recompilation.

The different optimization systems in Jikes are: a static system, an adaptive system, and an ahead-of-time system. The static system just uses the default compiler that is selected by the user. The adaptive system uses instrumentation and a second thread so that it can monitor the behaviour of the program and optimize code as the program executes. The ahead-of-time system, which is also called the optimization test harness, performs compilation before the program executes. The compilation is based on options given to Jikes by the user. The options include methods of both Jikes and the program to be compiled, as well as the optimization options that should be used when compiling. In all cases, when Jikes is invoked, parameters can be passed via the command line to affect the behaviour of the optimization system and all the other components of Jikes.

Consider how Jikes, with the adaptive system, executes a program that contains method a() in class A and method main() in class B. When Jikes starts executing this program, it loads the program's primary class, class B, and compiles the main method of the program using one of the two compilers. Once the main method and anything else that is required is compiled, the program is started in its own thread. Classes and objects are loaded and compiled when they are first used; so if main() is executed and calls method a(), then Jikes will load and compile method a(). The program's thread is suspended periodically so that Jikes can perform internal tasks and, if a multi-threaded program is executing, other threads can execute. The internal tasks include running the optimization system, which keeps track of what is being frequently executed, and compiling methods. As the program executes, the optimization system controls method compilation.

Frequently executed methods are therefore compiled and recompiled, and the newest versions are always used. Methods that are infrequently executed, on the other hand, are compiled only at the beginning, with little or no optimization.

The Jikes executable can only have one type of memory manager and one type of optimization system. The user must therefore choose between these different types of components when the Jikes executable is created, via command line parameters. The selected components are compiled along with the core runtime, the compilers, and certain parts of the Java class library that the selected components use. All of this is put into one large executable. The remainder of this section consists of a more detailed description of the operation of the optimization systems and of how the optimizing compiler performs inlining.

2.7.1 The Adaptive System

The adaptive system controls the recompilation of frequently executed methods so that they are optimized and will therefore execute faster. The adaptive system consists of a set of listeners, a set of organizer threads, a compilation directing thread, and a set of compilation threads. Listeners record information about the program's execution. Organizer threads analyze this information and generate compilation requests based on the analysis. The compilation directing thread takes the compilation requests and creates a compilation thread for each request that it services. The compilation threads perform the actual compilation by invoking the optimizing compiler.

The program that is being executed is instrumented by Jikes's compilers so that at certain time intervals the executing thread yields to another thread. While the thread is in the process of yielding, it calls the adaptive system's listeners, which record the point at which the yield occurred. When a listener determines that there is enough information to process, it wakes up its associated organizer thread. The organizer thread carries out the necessary analyses to determine what should be compiled and how it should be compiled based on the data available, and then it puts all compilation requests on a compilation queue. When the compilation directing thread executes, it goes through this queue and processes the compilation requests through the use of compilation threads.
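The flow of information from listeners through organizers to the compilation threads is essentially a producer-consumer pipeline. The following Java sketch is a schematic rendering of that structure with hypothetical names and a trivial policy; it is not Jikes's actual adaptive system, which applies a cost-benefit analysis before issuing requests.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class AdaptivePipeline {
    // A compilation request names a method and an optimization level.
    record Request(String method, int optLevel) {}

    final BlockingQueue<String> samples = new LinkedBlockingQueue<>();
    final BlockingQueue<Request> compileQueue = new LinkedBlockingQueue<>();

    // Listener: called from a yield point; records where execution was.
    void onYield(String currentMethod) {
        samples.add(currentMethod);
    }

    // Organizer thread: analyzes samples and generates compilation requests.
    final Thread organizer = new Thread(() -> {
        try {
            while (true) {
                String method = samples.take();
                compileQueue.add(new Request(method, 1)); // trivial policy
            }
        } catch (InterruptedException e) { /* shut down */ }
    });

    // Compilation directing thread: services the queue of requests.
    final Thread director = new Thread(() -> {
        try {
            while (true) {
                Request r = compileQueue.take();
                System.out.println("recompile " + r.method()
                        + " at level " + r.optLevel());
            }
        } catch (InterruptedException e) { /* shut down */ }
    });
}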

The adaptive system can have multiple listener-organizer thread pairs so that it can use multiple adaptive strategies at the same time. The adaptive system can also be configured through command line parameters to behave in various ways. These parameters specify which optimizations are enabled, how inlining is performed, and other aspects of compilation.

2.7.2 The Optimization Test Harness

Another type of optimization system within Jikes is the optimization test harness. The harness takes compiler commands as input. All of the commands are passed in as command line parameters or in a file whose name is passed in as a command line parameter. The commands consist of: parameters to the baseline and optimizing compilers, the methods to compile using the baseline compiler, the methods to compile using the optimizing compiler, which inlining plan to use, and what the main method to execute should be. The harness acts on each command one at a time, starting from the first one that it is given. Once the test harness compiles all the methods that it is supposed to, it executes the program. If the program's execution encounters a method that has not been compiled ahead of time, execution is suspended until the compilation is performed. Once the program has completely executed, the harness prints out the elapsed time between the beginning and end of the program's execution. This allows us to measure the amount of time that a program takes to execute given that everything is compiled and inlined according to a selected strategy. We can therefore compare different compilation and inlining strategies in terms of the corresponding execution times.

2.7.3 Inlining in Jikes

Inlining is the replacement of an invoke of a method with that method's instructions. When inlining is performed at an invoke, referred to as a call site, the call site is said to be inlined. Inlining enables inter-procedural analysis by turning several methods into one large method. In Jikes, inlining is performed via a set of inlining oracles, which are objects that specify whether a specific call site should be inlined or not.

The oracle can base its decision on several factors: the current call site, the target method, information about the current method and the target method, as well as an inline sequence, which is a list of call sites that specifies the location of the call site within an inlining hierarchy.

Consider the example in Figure 2.11, where method a() is being compiled. In this example a call to method b() at bytecode index 5 is already inlined, and there is a call site in method b() at bytecode index 10 to method c() for which the inlining oracle is called. An inlining oracle that bases its decision on inline sequences must decide whether the inline sequence a() 5, b() 10, c() is valid. The oracle has a list of acceptable inline sequences that it uses for this purpose. The specified sequence is valid only when one of the acceptable inline sequences is a suffix of it. Therefore, if the list of acceptable sequences contains the sequence a() 5, b() 10, c(), then the sequence a() 5, b() 10, c() is valid. Furthermore, if b() 10, c() is in the list of acceptable sequences, then a() 5, b() 10, c() is also valid. These are the only two sequences in the acceptable-sequence list that would make the sequence a() 5, b() 10, c() valid.

Figure 2.11: An example of inlining using inline sequences. (Method a() is being compiled; b() has already been inlined into a() at bytecode index 5, and the oracle must decide whether to inline c() at bytecode index 10 of b(). If one of the acceptable sequences is b() 10, c() or a() 5, b() 10, c(), then c() is inlined; otherwise it is not.)
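The suffix rule is straightforward to express in code. The following Java sketch is a minimal illustration of an inline-sequence check of the kind just described; the representation of sequences as lists of strings and the class and method names are assumptions, not Jikes's classes.

import java.util.List;

class InlineSequenceCheck {
    // An inline sequence is represented as a list such as
    // ["a():5", "b():10", "c()"]: the call sites, then the target method.
    static boolean isValid(List<String> sequence, List<List<String>> acceptable) {
        for (List<String> acc : acceptable) {
            int offset = sequence.size() - acc.size();
            // Valid when some acceptable sequence is a suffix of 'sequence'.
            if (offset >= 0 && sequence.subList(offset, sequence.size()).equals(acc))
                return true;
        }
        return false;
    }
}

For the sequence of Figure 2.11, isValid(List.of("a():5", "b():10", "c()"), acceptable) returns true exactly when acceptable contains ["b():10", "c()"] or ["a():5", "b():10", "c()"].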

34 Chapter 2. Background Inlining Oracles There are several inlining oracles. The simplest is a static inlining oracle that just looks at the source of the method and how much code expansion would be caused by inlining the method. Inlining will occur only if the expansion is below a certain threshold. This is done because it is often beneficial to inline small methods since they do not cause a large code expansion and the overhead of calling them is a substantial part of their execution. Another type of oracle bases its decision on a set of call site/method pairs that have been identified as beneficial. This oracle first calls the static oracle to determine whether a method should be inlined regardless of what the inlining plan contains. If the static inlining oracle decides that the method should be inlined then the method is inlined. Otherwise, the target method is inlined only if a call site method pair exists that corresponds to a call site and a target method that is being considered for inlining. The inline may not be performed in some cases, such as when the inline would cause the code to grow beyond memory limits imposed by Jikes. Both the adaptive and the ahead-of-time systems use inlining oracles with inline plans. In the case of the adaptive system, the inlining plan is generated as the system collects information about the program. Optionally, the system can read in an initial inline plan that can be further enhanced as the system collects information about the executing program. When the system decides it is time to compile a method, it invokes a compiler and passes to it a reference an inlining oracle with a specific inlining plan. The name of the file that contains an initial plan can be passed in as a command line parameter by the user. The ahead-of-time system, also referred to as the optimization test harness, can also be controlled via the command line to load in a certain inlining plan and to use an inlining oracle based on it. 2.8 Related Work There is a great deal of work that has been done both in the area of feedback directed optimization [Smi00] and just-in-time (JIT) compilation [Ayc01]. Although in a sense it is all related to our work, we will only focus on related work that deals with traces and related work that deals with feedback directed optimization in Java.


More information

Atropos User s manual

Atropos User s manual Atropos User s manual Jan Lönnberg 22nd November 2010 1 Introduction Atropos is a visualisation tool intended to display information relevant to understanding the behaviour of concurrent Java programs,

More information

Principles in Computer Architecture I CSE 240A (Section ) CSE 240A Homework Three. November 18, 2008

Principles in Computer Architecture I CSE 240A (Section ) CSE 240A Homework Three. November 18, 2008 Principles in Computer Architecture I CSE 240A (Section 631684) CSE 240A Homework Three November 18, 2008 Only Problem Set Two will be graded. Turn in only Problem Set Two before December 4, 2008, 11:00am.

More information

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13

Run-time Environments. Lecture 13. Prof. Alex Aiken Original Slides (Modified by Prof. Vijay Ganesh) Lecture 13 Run-time Environments Lecture 13 by Prof. Vijay Ganesh) Lecture 13 1 What have we covered so far? We have covered the front-end phases Lexical analysis (Lexer, regular expressions,...) Parsing (CFG, Top-down,

More information

Just-In-Time Compilers & Runtime Optimizers

Just-In-Time Compilers & Runtime Optimizers COMP 412 FALL 2017 Just-In-Time Compilers & Runtime Optimizers Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved.

More information

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN

More information

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices LINGLI ZHANG, CHANDRA KRINTZ University of California, Santa Barbara Java Virtual Machines (JVMs)

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace

Project. there are a couple of 3 person teams. a new drop with new type checking is coming. regroup or see me or forever hold your peace Project there are a couple of 3 person teams regroup or see me or forever hold your peace a new drop with new type checking is coming using it is optional 1 Compiler Architecture source code Now we jump

More information

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #29 Arrays in C

Introduction to Programming in C Department of Computer Science and Engineering. Lecture No. #29 Arrays in C Introduction to Programming in C Department of Computer Science and Engineering Lecture No. #29 Arrays in C (Refer Slide Time: 00:08) This session will learn about arrays in C. Now, what is the word array

More information

Topics on Compilers Spring Semester Christine Wagner 2011/04/13

Topics on Compilers Spring Semester Christine Wagner 2011/04/13 Topics on Compilers Spring Semester 2011 Christine Wagner 2011/04/13 Availability of multicore processors Parallelization of sequential programs for performance improvement Manual code parallelization:

More information

Exploiting the Behavior of Generational Garbage Collector

Exploiting the Behavior of Generational Garbage Collector Exploiting the Behavior of Generational Garbage Collector I. Introduction Zhe Xu, Jia Zhao Garbage collection is a form of automatic memory management. The garbage collector, attempts to reclaim garbage,

More information

Just-In-Time Compilation

Just-In-Time Compilation Just-In-Time Compilation Thiemo Bucciarelli Institute for Software Engineering and Programming Languages 18. Januar 2016 T. Bucciarelli 18. Januar 2016 1/25 Agenda Definitions Just-In-Time Compilation

More information

Visual Amortization Analysis of Recompilation Strategies

Visual Amortization Analysis of Recompilation Strategies 2010 14th International Information Conference Visualisation Information Visualisation Visual Amortization Analysis of Recompilation Strategies Stephan Zimmer and Stephan Diehl (Authors) Computer Science

More information

Program Correctness and Efficiency. Chapter 2

Program Correctness and Efficiency. Chapter 2 Program Correctness and Efficiency Chapter 2 Chapter Objectives To understand the differences between the three categories of program errors To understand the effect of an uncaught exception and why you

More information

Reducing Trace Selection Footprint for Large-scale Java Applications without Performance Loss

Reducing Trace Selection Footprint for Large-scale Java Applications without Performance Loss Reducing Trace Selection Footprint for Large-scale Java Applications without Performance Loss Peng Wu Hiroshige Hayashizaki Hiroshi Inoue Toshio Nakatani IBM Research pengwu@us.ibm.com,{hayashiz,inouehrs,nakatani}@jp.ibm.com

More information

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions Harry Xu May 2012 Complex, concurrent software Precision (no false positives) Find real bugs in real executions Need to modify JVM (e.g., object layout, GC, or ISA-level code) Need to demonstrate realism

More information

Software Speculative Multithreading for Java

Software Speculative Multithreading for Java Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com

More information

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III

GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III GUJARAT TECHNOLOGICAL UNIVERSITY MASTER OF COMPUTER APPLICATION SEMESTER: III Subject Name: Operating System (OS) Subject Code: 630004 Unit-1: Computer System Overview, Operating System Overview, Processes

More information

Enterprise Architect. User Guide Series. Profiling

Enterprise Architect. User Guide Series. Profiling Enterprise Architect User Guide Series Profiling Investigating application performance? The Sparx Systems Enterprise Architect Profiler finds the actions and their functions that are consuming the application,

More information

Enterprise Architect. User Guide Series. Profiling. Author: Sparx Systems. Date: 10/05/2018. Version: 1.0 CREATED WITH

Enterprise Architect. User Guide Series. Profiling. Author: Sparx Systems. Date: 10/05/2018. Version: 1.0 CREATED WITH Enterprise Architect User Guide Series Profiling Author: Sparx Systems Date: 10/05/2018 Version: 1.0 CREATED WITH Table of Contents Profiling 3 System Requirements 8 Getting Started 9 Call Graph 11 Stack

More information

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University

Soot A Java Bytecode Optimization Framework. Sable Research Group School of Computer Science McGill University Soot A Java Bytecode Optimization Framework Sable Research Group School of Computer Science McGill University Goal Provide a Java framework for optimizing and annotating bytecode provide a set of API s

More information

Programming Style and Optimisations - An Overview

Programming Style and Optimisations - An Overview Programming Style and Optimisations - An Overview Summary In this lesson we introduce some of the style and optimization features you may find useful to understand as a C++ Programmer. Note however this

More information

Java Performance Tuning

Java Performance Tuning 443 North Clark St, Suite 350 Chicago, IL 60654 Phone: (312) 229-1727 Java Performance Tuning This white paper presents the basics of Java Performance Tuning and its preferred values for large deployments

More information

Network Working Group. Obsoletes: 3452, 3695 March 2009 Category: Standards Track

Network Working Group. Obsoletes: 3452, 3695 March 2009 Category: Standards Track Network Working Group M. Watson Request for Comments: 5445 Digital Fountain Obsoletes: 3452, 3695 March 2009 Category: Standards Track Status of This Memo Basic Forward Error Correction (FEC) Schemes This

More information

Combining Analyses, Combining Optimizations - Summary

Combining Analyses, Combining Optimizations - Summary Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate

More information

Last class: OS and Architecture. OS and Computer Architecture

Last class: OS and Architecture. OS and Computer Architecture Last class: OS and Architecture OS and Computer Architecture OS Service Protection Interrupts System Calls IO Scheduling Synchronization Virtual Memory Hardware Support Kernel/User Mode Protected Instructions

More information

Last class: OS and Architecture. Chapter 3: Operating-System Structures. OS and Computer Architecture. Common System Components

Last class: OS and Architecture. Chapter 3: Operating-System Structures. OS and Computer Architecture. Common System Components Last class: OS and Architecture Chapter 3: Operating-System Structures System Components Operating System Services System Calls System Programs System Structure Virtual Machines System Design and Implementation

More information

CS455: Introduction to Distributed Systems [Spring 2019] Dept. Of Computer Science, Colorado State University

CS455: Introduction to Distributed Systems [Spring 2019] Dept. Of Computer Science, Colorado State University CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [THREADS] The House of Heap and Stacks Stacks clean up after themselves But over deep recursions they fret The cheerful heap has nary a care Harboring memory

More information

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

ASSEMBLY LANGUAGE MACHINE ORGANIZATION ASSEMBLY LANGUAGE MACHINE ORGANIZATION CHAPTER 3 1 Sub-topics The topic will cover: Microprocessor architecture CPU processing methods Pipelining Superscalar RISC Multiprocessing Instruction Cycle Instruction

More information

Java Performance: The Definitive Guide

Java Performance: The Definitive Guide Java Performance: The Definitive Guide Scott Oaks Beijing Cambridge Farnham Kbln Sebastopol Tokyo O'REILLY Table of Contents Preface ix 1. Introduction 1 A Brief Outline 2 Platforms and Conventions 2 JVM

More information

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler

Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler , Compilation Technology Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler Daryl Maier, Pramod Ramarao, Mark Stoodley, Vijay Sundaresan TestaRossa JIT compiler

More information

Chapter 3: Operating-System Structures

Chapter 3: Operating-System Structures Chapter 3: Operating-System Structures System Components Operating System Services System Calls System Programs System Structure Virtual Machines System Design and Implementation System Generation 3.1

More information

infix expressions (review)

infix expressions (review) Outline infix, prefix, and postfix expressions queues queue interface queue applications queue implementation: array queue queue implementation: linked queue application of queues and stacks: data structure

More information

Adaptive Optimization using Hardware Performance Monitors. Master Thesis by Mathias Payer

Adaptive Optimization using Hardware Performance Monitors. Master Thesis by Mathias Payer Adaptive Optimization using Hardware Performance Monitors Master Thesis by Mathias Payer Supervising Professor: Thomas Gross Supervising Assistant: Florian Schneider Adaptive Optimization using HPM 1/21

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

QUIZ Friends class Y;

QUIZ Friends class Y; QUIZ Friends class Y; Is a forward declaration neeed here? QUIZ Friends QUIZ Friends - CONCLUSION Forward (a.k.a. incomplete) declarations are needed only when we declare member functions as friends. They

More information

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat

Shenandoah: An ultra-low pause time garbage collector for OpenJDK. Christine Flood Roman Kennke Principal Software Engineers Red Hat Shenandoah: An ultra-low pause time garbage collector for OpenJDK Christine Flood Roman Kennke Principal Software Engineers Red Hat 1 Shenandoah Why do we need it? What does it do? How does it work? What's

More information

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/

CS61C Machine Structures. Lecture 4 C Pointers and Arrays. 1/25/2006 John Wawrzynek. www-inst.eecs.berkeley.edu/~cs61c/ CS61C Machine Structures Lecture 4 C Pointers and Arrays 1/25/2006 John Wawrzynek (www.cs.berkeley.edu/~johnw) www-inst.eecs.berkeley.edu/~cs61c/ CS 61C L04 C Pointers (1) Common C Error There is a difference

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

Exploiting Statistical Correlations for Proactive Prediction of Program Behaviors

Exploiting Statistical Correlations for Proactive Prediction of Program Behaviors Exploiting Statistical Correlations for Proactive Prediction of Program Behaviors Yunlian Jiang, Eddy Zhang, Kai Tian, Feng Mao, Malcom Gethers, Xipeng Shen CAPS ResearchR Group, College of William and

More information

Run-time Environments

Run-time Environments Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction

More information

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST

Chapter 4. Advanced Pipelining and Instruction-Level Parallelism. In-Cheol Park Dept. of EE, KAIST Chapter 4. Advanced Pipelining and Instruction-Level Parallelism In-Cheol Park Dept. of EE, KAIST Instruction-level parallelism Loop unrolling Dependence Data/ name / control dependence Loop level parallelism

More information

Performance Profiling

Performance Profiling Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance

More information

CS 221 Review. Mason Vail

CS 221 Review. Mason Vail CS 221 Review Mason Vail Inheritance (1) Every class - except the Object class - directly inherits from one parent class. Object is the only class with no parent. If a class does not declare a parent using

More information

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [THREADS] Frequently asked questions from the previous class survey

CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [THREADS] Frequently asked questions from the previous class survey CS 455: INTRODUCTION TO DISTRIBUTED SYSTEMS [THREADS] Shrideep Pallickara Computer Science Colorado State University L6.1 Frequently asked questions from the previous class survey L6.2 SLIDES CREATED BY:

More information

Chapter 8 Virtual Memory

Chapter 8 Virtual Memory Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Operating Systems: Internals and Design Principles You re gonna need a bigger boat. Steven

More information

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection

Deallocation Mechanisms. User-controlled Deallocation. Automatic Garbage Collection Deallocation Mechanisms User-controlled Deallocation Allocating heap space is fairly easy. But how do we deallocate heap memory no longer in use? Sometimes we may never need to deallocate! If heaps objects

More information

Measuring and Improving the Potential Parallelism of Sequential Java Programs

Measuring and Improving the Potential Parallelism of Sequential Java Programs Measuring and Improving the Potential Parallelism of Sequential Java Programs Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The

More information

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc. Chapter 1 GETTING STARTED SYS-ED/ Computer Education Techniques, Inc. Objectives You will learn: Java platform. Applets and applications. Java programming language: facilities and foundation. Memory management

More information

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices LINGLI ZHANG and CHANDRA KRINTZ University of California, Santa Barbara Java Virtual Machines (JVMs)

More information

Intermediate Code & Local Optimizations

Intermediate Code & Local Optimizations Lecture Outline Intermediate Code & Local Optimizations Intermediate code Local optimizations Compiler Design I (2011) 2 Code Generation Summary We have so far discussed Runtime organization Simple stack

More information

point in worrying about performance. The goal of our work is to show that this is not true. This paper is organised as follows. In section 2 we introd

point in worrying about performance. The goal of our work is to show that this is not true. This paper is organised as follows. In section 2 we introd A Fast Java Interpreter David Gregg 1, M. Anton Ertl 2 and Andreas Krall 2 1 Department of Computer Science, Trinity College, Dublin 2, Ireland. David.Gregg@cs.tcd.ie 2 Institut fur Computersprachen, TU

More information

Debugging Tools for MIDP Java Devices

Debugging Tools for MIDP Java Devices Debugging Tools for MIDP Java Devices Olli Kallioinen 1 and Tommi Mikkonen 2 1 Sasken Finland, Tampere, Finland olli.kallioinen@sasken.com 2 Tampere University of Technology, Tampere, Finland tommi.mikkonen@tut.fi

More information

(Refer Slide Time: 1:27)

(Refer Slide Time: 1:27) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data

More information

Garbage Collection (2) Advanced Operating Systems Lecture 9

Garbage Collection (2) Advanced Operating Systems Lecture 9 Garbage Collection (2) Advanced Operating Systems Lecture 9 Lecture Outline Garbage collection Generational algorithms Incremental algorithms Real-time garbage collection Practical factors 2 Object Lifetimes

More information

Advanced Programming & C++ Language

Advanced Programming & C++ Language Advanced Programming & C++ Language ~6~ Introduction to Memory Management Ariel University 2018 Dr. Miri (Kopel) Ben-Nissan Stack & Heap 2 The memory a program uses is typically divided into four different

More information

3.4 Data-Centric workflow

3.4 Data-Centric workflow 3.4 Data-Centric workflow One of the most important activities in a S-DWH environment is represented by data integration of different and heterogeneous sources. The process of extract, transform, and load

More information

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation Language Implementation Methods The Design and Implementation of Programming Languages Compilation Interpretation Hybrid In Text: Chapter 1 2 Compilation Interpretation Translate high-level programs to

More information

Life Cycle of Source Program - Compiler Design

Life Cycle of Source Program - Compiler Design Life Cycle of Source Program - Compiler Design Vishal Trivedi * Gandhinagar Institute of Technology, Gandhinagar, Gujarat, India E-mail: raja.vishaltrivedi@gmail.com Abstract: This Research paper gives

More information

Hardware-Supported Pointer Detection for common Garbage Collections

Hardware-Supported Pointer Detection for common Garbage Collections 2013 First International Symposium on Computing and Networking Hardware-Supported Pointer Detection for common Garbage Collections Kei IDEUE, Yuki SATOMI, Tomoaki TSUMURA and Hiroshi MATSUO Nagoya Institute

More information

Running class Timing on Java HotSpot VM, 1

Running class Timing on Java HotSpot VM, 1 Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return

More information