ONLINE PROFILING AND FEEDBACK-DIRECTED OPTIMIZATION OF JAVA


ONLINE PROFILING AND FEEDBACK-DIRECTED OPTIMIZATION OF JAVA
BY MATTHEW ARNOLD
A dissertation submitted to the Graduate School New Brunswick, Rutgers, The State University of New Jersey, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, Graduate Program in Computer Science.
Written under the direction of Barbara Gershon Ryder and approved by
New Brunswick, New Jersey
October, 2002

© 2002 Matthew Arnold ALL RIGHTS RESERVED

ABSTRACT OF THE DISSERTATION
Online Profiling and Feedback-directed Optimization of Java
by MATTHEW ARNOLD
Dissertation Director: Barbara Gershon Ryder
The dynamic nature of the Java™ programming language presents a number of challenges for Java Virtual Machine (JVM) implementations. Constructs such as dynamic class loading and reflection make traditional whole program analysis and optimization difficult, or even impossible; however, Java's dynamic execution environment also presents a potential performance advantage over the traditional static compilation model: the ability to perform feedback-directed optimizations, where profiling information is collected at runtime and used to tailor the optimization decisions that are made. Although previous work has shown that feedback-directed optimizations can substantially improve program performance, most of these systems used offline profiles collected using a separate training run. Performing profiling and optimization online during the same run is an attractive approach because it avoids the need for a separate training run. Unfortunately, the overhead of collecting online profiles is often a problem, and is one of the main reasons why today's JVMs perform only limited forms of feedback-directed optimizations. The first contribution of this thesis is a new technique called an instrumentation sampling framework, a mechanism that allows previously expensive instrumentation to be executed with low overhead. The instrumentation sampling framework is designed as an automatic code transformation that takes instrumented code as input, and produces a

modified version of the code that collects a similar profile, but executes with low overhead. We implemented and evaluated our framework in the Jikes Research Virtual Machine; our results demonstrate that the sampling framework effectively reduces the overhead of several types of instrumentation while having only a minimal effect on the accuracy of the profiles collected. The second contribution of this thesis is the design and implementation of an online system that uses instrumentation sampling to drive feedback-directed optimizations and improve the performance of Java programs. Our implementation is built on top of the general adaptive optimization architecture of the Jikes RVM. Our system collects intra-procedural edge profiles using the instrumentation sampling framework and uses the resulting profiles to drive four feedback-directed optimizations. Our empirical evaluation demonstrates that our online approach can improve the performance of long-running programs without degrading the performance of short-running programs.

Acknowledgements
This thesis would not have been possible without the help of many people... I would like to start by thanking my advisor, Professor Barbara Ryder, for the guidance and support that she gave me throughout my entire graduate school experience. She taught me many of the fundamental skills required to become an independent researcher, and I am thankful for the time that she spent working with me. I appreciated her encouragement to pursue ideas that interested me and her continued support while following through on these ideas. I would also like to thank Michael Hind from IBM Research, who acted as a mentor to me during my time at IBM. I am grateful not only for the numerous technical interactions that we shared while discussing research ideas, but also for the advice and support that he provided to help ensure that my time at IBM was a success; without his help, the outcome of this thesis would most likely have been very different. In addition, his help in writing our OOPSLA '02 paper directly influenced the second half of this thesis. I had a tremendous experience at IBM Research working with Stephen Fink, David Grove, Michael Hind, and Peter Sweeney on the Jikes RVM adaptive optimization system. I would like to thank them for creating an environment in which I could learn so much, yet have fun in the process. The conversations and brainstorming sessions we had together are responsible for many of the ideas present in this thesis. I would also like to thank Michael Burke for offering me my original summer internship at IBM, and Peter Sweeney who not only guided me during that internship, but also helped convince me to extend my internship and continue my relationship with IBM. I would also like to thank everyone else at IBM who helped me along the way, including David Bacon, Julian Dolby, Igor Pechtchanski, Vivek Sarkar, Martin Trapp, and Mark Wegman. In addition, I would like to acknowledge IBM Research for their financial support. I am grateful to Craig Chambers, Michael Hind, Rich Martin, and Barbara Ryder for

their helpful comments on this thesis. I also thank all of the members of the PROLANGS lab for providing an enjoyable research environment during my time at Rutgers. In particular, Nasko Rountev and I had endless conversations about topics ranging from research, to the meaning of life; these conversations were not only educational, but helped make graduate school a memorable experience. Finally, I would like to thank my family for their endless support during my time in school. From the very beginning my parents encouraged me to stay in school, and made several strategic moves along the way to ensure that I did. I am very grateful for their efforts that ultimately helped me choose the path that I did. And last, but certainly not least, I want to thank my wife Jelena for so many things. First, she deserves an award simply for putting up with me through all that we experienced during the last several years. But even more importantly, I owe her more than can be expressed for the continual support that she gave me: being there when I needed help, encouraging me to make rational decisions and, in general, helping me to maintain my sanity. I am forever grateful.

Dedication
To my family: To my parents, who encouraged and supported me throughout the long journey of my college career; it helped more than you may realize. And to my wife Jelena, who endured all of the ups and downs together with me. I will always remember and appreciate the conversations that we had and the support that you gave me.

Table of Contents

Abstract
Acknowledgements
Dedication
List of Tables
List of Figures
1. Introduction
  Thesis Contributions
    Framework for Low-overhead Instrumentation
    Online System Performing Instrumentation and Feedback-Directed Optimization
  Thesis Organization
2. Background: The Jikes RVM
  The Jikes RVM Optimizing Compiler
  Threading Model
  Adaptive Optimization System
    General Architecture
    Current Instantiation
    Controller Model
Part I: Low-overhead Instrumentation
3. Instrumentation Sampling Framework
  3.1. Technique
  Check placement
    Reducing dynamic check frequency
  Trigger mechanisms
    Counter-based sampling
      Implementation options
    Timer-based sampling
    Event-based sampling
    Discussion: The effect of polling
  Applicability to various types of instrumentation
  Space-saving variations
    Variation 1: Partial-Duplication
    Variation 2: No-Duplication
4. Implementation and Experimental Evaluation
  Implementation
    Full-Duplication code transformation
    Check implementation
    Jikes RVM yieldpoint optimization
    Phase ordering
    Interaction with Instrumentation
  Experimental Results
    Benchmarks
    Methodology
    Instrumentation examples
    Framework overhead
      Full-Duplication algorithm
      No-Duplication algorithm
    Sampled instrumentation overhead and accuracy
      Overhead
      Accuracy
    Jikes RVM-specific optimization
    Trigger Mechanisms
Part II: Online Instrumentation and Feedback-Directed Optimization
5. Online FDO: Design and Implementation
  Background
    Challenges in Performing Online FDO
    Existing Online Strategies
      Profile early during unoptimized execution
      Profile optimized code
      Sample throughout execution
  Design Goals
  Online Strategy
  Implementation
    Online Strategy
    Instrumentation: Intraprocedural Edge Profiles
      Collecting Edge Profiles
      Using Edge Profiles
  Feedback-Directed Optimizations
    Splitting
    Method Inlining
    Code Positioning
    Loop Unrolling
6. Online FDO: Experimental Results
  6.1. SPECjvm98 Benchmark Suite
    Steady-State Performance
    Online Performance
    Space Overhead
  Server Benchmark Performance
7. Related Work
  Complete Online Systems
    Self
    Java
    IBM DK
    Hotspot
    MRL
    Binary translators
    Prefetching
    Dynamo
    Ephemeral Instrumentation
    Oberon
    Other Systems
  Offline Optimization
    FDO
    Selective Optimization
  Profiling
    Exhaustive Instrumentation
    Sampling
8. Conclusions and Future Work
  Low-overhead Instrumentation
  Online FDO
  8.3. Discussion and Future Work
References
Vita

List of Tables
4.1. Benchmark suite used to evaluate sampling framework
Time overhead of example instrumentations (without framework)
Time overhead of Full-Duplication framework
Space overhead of Full-Duplication framework
Time overhead of No-Duplication framework
Total overhead and accuracy of sampling example instrumentations
Comparing timer- and counter-based triggers
Cost/benefit ratios for controller model
Characteristics of long-running SPECjvm98 suite
Recompilation and space statistics of FDO on SPECjvm98 benchmarks
SPECjbb2000 server benchmark performance, with and without online feedback-directed optimization

List of Figures
2.1. Overview of the Jikes RVM optimizing compiler
Design of Jikes RVM adaptive optimization system
Implementation of Jikes RVM adaptive optimization system
High-level view of instrumentation-sampling framework
A high-level view of an instrumented method generated by the sampling framework
Detailed example of sampling framework
Code inserted for a counter-based check
Example of how a timer-based trigger can lead to non-intuitive sampling results
Removing nodes increases checks
Example of Partial-Duplication
Example of No-Duplication
Performing the Full-Duplication code transformation
Assembly pseudocode for counter-based check
Placement of Full-Duplication transformation within optimizing compiler
Graphical overlap percentage example
Overhead of Full-Duplication framework with yieldpoint optimization applied
Online strategy for online FDO
Modifications to Jikes RVM adaptive optimization system for performing FDO
Feedback-directed splitting algorithm
5.4. One iteration of feedback-directed splitting
Peak performance improvement when using instrumentation and feedback-directed optimization
Online performance of non-FDO and FDO systems
Online improvement of FDO vs. non-FDO systems
SPECjbb2000 server benchmark performance

Chapter 1
Introduction
The dynamic nature of the Java™ programming language presents a number of challenges for Java Virtual Machine (JVM) implementations. Constructs such as dynamic class loading and reflection make traditional whole program analysis and optimization difficult, or even impossible. To improve application performance in the presence of these restrictions, many of today's JVMs employ a dynamic optimizing compiler which compiles Java bytecode into native code at runtime, while the application program is executing. Dynamic compilation has a number of disadvantages compared to traditional static compilation, most notably that the overhead incurred by performing compilation at runtime can be substantial. To minimize this overhead, attention has been focused on 1) reducing the execution time of the optimizer, and 2) applying optimization to only the key portions of the application [51, 64, 71, 36, 8]. This second approach, often referred to as selective optimization, avoids the overhead of optimizing all methods, and is thus particularly beneficial for shorter running programs that do not execute long enough to recoup the time spent optimizing all methods [11]. Despite its potential disadvantages, dynamic compilation also has a potential performance advantage over traditional static compilation: the ability to tailor optimizations to the current execution environment. Such an approach, typically referred to as feedback-directed optimization (FDO), not only instructs the optimizer what to optimize, but also specifies how the method should be optimized. By observing and optimizing the common execution patterns of the executing program, a system performing feedback-directed optimization has the potential to outperform a traditional static compiler. Several systems [45, 59, 58, 26] have shown that performance can be substantially improved by exploiting invariant runtime values; however, these systems were not fully

automatic and relied on programmer directives to identify regions of code to be optimized. There exists a large body of work on collecting profiling information by performing instrumentation [25, 44, 4, 18, 17], as well as fully-automatic optimizations based on instrumented profiles [34, 27, 46, 30, 31, 10, 62, 65, 53]. However, this work assumes an execution model where profiles can be collected offline, using a separate training run. Although the resulting speedups are often promising, this approach fails in scenarios where 1) it is impractical to collect a profile prior to execution, or 2) the application does not behave like the training run. Performing profiling and optimization online, during the same run, is an attractive approach because it avoids the previously mentioned drawbacks of offline profiling. Unfortunately, using online profiles to guide optimization has limitations of its own because the amount of work that must be performed at runtime is increased, including 1) collecting the profiling information, 2) examining the profile data and making decisions based on it, and 3) performing the actual feedback-directed optimizations. All three of these steps involve overhead, creating the potential for degrading performance rather than improving it. Most importantly, the overhead of collecting instrumented profiles is a problem. Overheads in the range of 30%-1,000% above non-instrumented code are not uncommon [46, 17, 18, 27, 26, 4] for collecting the kinds of profiles often used to drive feedback-directed optimizations, and overheads in the range of 10,000% (100 times slower) have been reported [26]. This overhead is one of the main reasons why today's JVMs perform only limited forms of feedback-directed optimizations [8, 71, 36, 64]. Optimizations that are currently being used online are usually based on profiles that can be collected easily with low overhead. Some online systems, such as Dynamo [16], are designed to identify when performance is being degraded so that profile-guided optimizations can be disabled for the remainder of execution. This thesis presents a new approach for performing online instrumentation and feedback-directed optimization. First, we present instrumentation sampling [12], a new technique for reducing the runtime overhead of executing instrumented code. By allowing a wide range of traditionally offline instrumentation techniques to be collected

with low overhead, one of the biggest obstacles to performing feedback-directed optimizations online is eliminated. Second, we describe how instrumentation sampling can be incorporated into an online, adaptive Java Virtual Machine. We show that instrumentation sampling can be used effectively to collect online instrumented profiles with minimal overhead. Using several examples of feedback-directed optimizations, we also show that our online approach can effectively improve the performance of long-running Java applications, without sacrificing the performance of short-running applications.
1.1 Thesis Contributions
The specific contributions of this thesis can be divided into the following two categories.
1.1.1 Framework for Low-overhead Instrumentation
The first contribution of this thesis is a new technique called an instrumentation sampling framework, a mechanism that allows previously expensive instrumentation to be executed with low overhead. The main goal of the framework is to automate the process of reducing instrumentation overhead, allowing a wide range of profiles to be collected efficiently, without requiring a separate low-overhead implementation for each. The instrumentation sampling framework is designed as an automatic code transformation that takes instrumented code as input, and produces a modified version of the code that collects a similar profile, but executes with low overhead. We implemented and evaluated our framework in the Jikes Research Virtual Machine; our results demonstrate that the sampling framework effectively reduces the overhead of several types of instrumentation while having only a minimal effect on the accuracy of the profiles collected.
1.1.2 Online System Performing Instrumentation and Feedback-Directed Optimization
The second contribution of this thesis is the design and implementation of an online system that uses instrumentation sampling to drive feedback-directed optimizations to improve the

performance of Java programs. Our implementation is built on top of the general adaptive optimization architecture of the Jikes RVM described in [8]. We describe a fully automated system that makes online decisions regarding when instrumentation and feedback-directed optimization should be performed. Our system collects intra-procedural edge profiles using the instrumentation sampling framework and uses the resulting profiles to drive four feedback-directed optimizations. Our empirical evaluation demonstrates that our online approach can improve the performance of long-running programs without degrading the performance of short-running programs.
1.2 Thesis Organization
The remainder of this thesis is organized as follows. Chapter 2 describes background information about the Jikes RVM, the infrastructure in which this thesis work is implemented. Part I (Chapters 3 and 4) describes the design and implementation of the instrumentation sampling framework. Part II (Chapters 5 and 6) describes the design and implementation of an online system that uses instrumentation sampling to perform online profiling and optimization of Java programs. Chapter 7 presents the related work, and Chapter 8 presents our conclusions.

Chapter 2
Background: The Jikes RVM
The Jikes Research Virtual Machine (Jikes RVM) 1 is a virtual machine developed at the IBM T.J. Watson Research Center. This chapter gives a brief overview of the Jikes RVM system, and provides a detailed description of those components that are directly relevant to this thesis. The Jikes RVM is written almost entirely in Java. It begins execution by reading from a boot image file, which contains the core services of the Jikes RVM precompiled to machine code. The Jikes RVM uses a compile-only approach (no interpreter); thus all methods are compiled to native code upon first execution. The Jikes RVM currently contains two compilers: a fast, non-optimizing compiler called the baseline compiler, and an aggressive optimizing compiler [23]. The Jikes RVM also contains an adaptive optimization system [8] which profiles the application and makes online decisions regarding when and how the optimizing compiler should be applied. Although it is not a fully complete JVM, the performance of the Jikes RVM has been shown to be competitive with that of commercial JVMs on the PowerPC platform. Components of the Jikes RVM that are of particular relevance to this thesis include 1) the optimizing compiler, 2) the quasi-preemptive threading model, and 3) the adaptive optimization system. These components are discussed in detail in the three sections that follow.
1 The Jikes RVM is an open-source version of the Jalapeño Research Virtual Machine [3, 8] and is available at

2.1 The Jikes RVM Optimizing Compiler
The Jikes RVM optimizing compiler takes Java bytecode as input and produces native code as output. The optimizing compiler begins by converting the bytecode into a register-based intermediate representation, referred to as the IR. The Jikes RVM's optimizing compiler consists of a series of optimization phases that transform the intermediate representation (IR) of a method from an unoptimized to an optimized state. As shown in Figure 2.1, there are three categories of optimization phases: 1) high-level optimizations, which are architecture and VM independent, 2) low-level optimizations, which are architecture independent, but not necessarily VM independent, and 3) machine-level optimizations, which are specific to the target architecture. The optimizations performed by the optimizing compiler are grouped into the following three predefined optimization levels.
Level 0: Local, on-the-fly optimizations and register allocation are performed. No inlining is performed.
Level 1: Augments Level 0 with more sophisticated local optimizations such as common subexpression elimination, array bounds check elimination and redundant load elimination. Inlining is performed based on size heuristics.
Level 2: Augments Level 1 with SSA-based global optimizations.
Each optimization level performs a superset of the optimizations performed at lower optimization levels, and therefore incurs additional compilation cost with the hope of generating better quality code. These optimization levels are chosen automatically by the adaptive optimization system, or can be chosen manually for non-adaptive versions of the Jikes RVM.
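
For illustration only, the three levels can be summarized as a small Java enum; this restatement is not Jikes RVM code, but the cost/benefit sketch later in this chapter indexes these levels as 0, 1, and 2.

    // Illustrative only: a compact restatement of the optimization levels described above.
    enum OptLevel {
        O0,  // local, on-the-fly optimizations and register allocation; no inlining
        O1,  // adds CSE, bounds-check elimination, redundant load elimination; size-based inlining
        O2   // adds SSA-based global optimizations
    }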

Figure 2.1: Overview of the Jikes RVM optimizing compiler. This figure is recreated from the original description of the Jalapeño optimizing compiler [23] (Figure 3). HIR, LIR, and MIR represent (respectively) High-, Low-, and Machine-level Intermediate Representations.

23 8 current implementation, N is the number of physical processors being used by the application, so there is one operating system thread for each physical processor. The Jikes RVM implements its own thread scheduler which multiplexes the Java threads on top of these operating system threads. The Jikes RVM scheduling model is quasi-preemptive, meaning that Java threads can be preempted, but only at certain predefined points, called yieldpoints. A yieldpoint is a sequence of instructions that checks a threadswitch bit to determine whether it is time for the currently executing thread to stop executing and yield control back to the thread scheduler. This bit is set every 10 milliseconds by an operating system interrupt. To guarantee that a Java thread cannot execute indefinitely, the Jikes RVM must ensure that only a finite amount of execution can occur before a yieldpoint is executed. The guarantee is currently met by simply placing yieldpoints on all method entries and loop backedges Adaptive Optimization System This sections describes the Jikes RVM adaptive optimization system [8], focusing on the components that are extended by this thesis General Architecture Figure 2.2 gives an overview of the general design of the Jikes RVM s adaptive optimization system. The architecture contains three main components: runtime measurements, the controller, and the recompilation subsystem. Methods are compiled with the non-optimizing baseline compiler upon their first execution, and an aggressive optimizing compiler is applied selectively by the adaptive optimization system. 2 There are slight exceptions to this yieldpoint placement, depending on whether the adaptive optimization system is being used. For versions of the Jikes RVM without the adaptive optimization system, yieldpoints are excluded from the prologues of trivial methods that are guaranteed to execute for a finite amount of time. For versions with the adaptive optimization system, yieldpoints are placed on the prologues, as well as the epilogues of all methods; this placement is not necessary for correctness but helps increase the accuracy of profiling system.

Figure 2.2: Overview of the general design of the Jikes RVM adaptive optimization system [8].
The runtime measurements component collects profiling information about the currently executing methods, and periodically summarizes the information and passes it to the controller. The architecture of the runtime measurements component is designed to be flexible enough to support collecting profiling information in a variety of ways, including the use of periodic sampling, instrumentation, and hardware performance monitors. The controller is the brain of the adaptive optimization system; it makes all decisions regarding profiling and optimization activity. Based on the profiling information provided by the runtime measurements system, the controller can choose to perform additional profiling, or perform optimization by constructing a compilation plan and passing it to the recompilation subsystem. The recompilation subsystem consists of one or more compilation threads that execute compilation plans created by the controller. A compilation plan provides instructions regarding how the method should be compiled. The next section reviews the current instantiation of this general design.

Figure 2.3: Default implementation of the Jikes RVM adaptive optimization system [8].
2.3.2 Current Instantiation
Figure 2.3 presents the base implementation (version 1.1b) of the Jikes RVM adaptive optimization system architecture as described in [8]. This differs from Figure 2.2, which provides a general design of what could be implemented. The runtime measurements subsystem performs profiling throughout execution using a low-overhead timer-based sampling mechanism to identify frequently executed methods. This timer-based sampling is based on the yieldpoint mechanism described in Section 2.2. A sample is taken each time a yieldpoint notifies the thread scheduler that it is time to switch threads. The thread switching code triggers a callback to the runtime measurements system, which examines the currently executing method and increments a counter for that method. Because this sampling mechanism is built into the thread-switching code of the Jikes RVM thread scheduler, no instrumentation of the executing method is required. Periodically the Hot Method Organizer notifies the controller with a set of methods that have been sampled frequently. For each method, the controller then examines the profile data and makes a decision regarding whether the method should be optimized, and if so, at what optimization level. The controller makes these decisions using a cost/benefit model, which is discussed in detail in the next section. Once the controller has decided

that a method should be recompiled, the controller notifies the recompilation subsystem, which currently consists of a single compilation thread. The Jikes RVM also performs one form of FDO, called adaptive inlining. Adaptive inlining uses the same timer-based sampling mechanism described above to detect hot call edges, so that they can be inlined if the caller method is recompiled in the future. These samples are processed by the inlining organizer and decayed by the decay organizer. If the inlining organizer detects a hot call edge in a method that is already optimized at the highest level (O2), it informs the controller of this fact, which can lead to the recompilation of that method (from O2 to O2) for the sole purpose of incorporating the new inlining decision. The general design of the adaptive system also allows the controller to request that the optimizing compiler insert instrumentation during compilation to collect additional profiling information for driving feedback-directed optimizations. This feature, however, as well as the necessary modifications to the three components of the adaptive system, was not implemented in [8].
Controller Model
The controller in the current adaptive optimization system uses a cost/benefit model to determine what action should be taken for each recompilation candidate it considers. The goal of the controller is to make decisions in such a way that good performance is achieved by both short- and long-running applications. To evaluate the desirability of optimizing a method M at optimization level O, the controller computes two values:
1. Estimated cost: The cost of performing the optimization is the expected time required to compile method M at optimization level O.
2. Estimated benefit: The benefit is the speedup that can be expected after method M is compiled at optimization level O. This speedup is estimated using the assumption that the past will repeat itself and method M will execute twice as long as it already has.

When considering whether to optimize a particular method, the viable choices are to do nothing, or recompile at one of the Jikes RVM's three optimization levels (O0, O1, O2). The controller estimates the cost and benefit of each potential recompilation choice, then picks the choice that would minimize total execution time.
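
The following is a minimal sketch of that cost/benefit comparison, under the stated assumption that a method will execute as long again in the future as it already has. The class, method, and parameter names are hypothetical stand-ins for the model's inputs (per-level compile-time and speedup estimates); this is not the Jikes RVM controller code.

    class RecompilationChoiceSketch {
        // compileCost[i]: expected time to compile the method at level Oi.
        // speedup[i]:     expected speedup of level-Oi code over the method's current code.
        // Returns -1 for "do nothing", or 0/1/2 for "recompile at O0/O1/O2".
        static int choose(double timeSpentSoFar, double[] compileCost, double[] speedup) {
            double futureTime = timeSpentSoFar;      // the past repeats itself
            int best = -1;
            double bestTotal = futureTime;           // expected remaining time if we do nothing
            for (int level = 0; level < compileCost.length; level++) {
                double total = compileCost[level] + futureTime / speedup[level];
                if (total < bestTotal) {
                    bestTotal = total;
                    best = level;
                }
            }
            return best;
        }
    }

As a worked example under these assumptions: a method that has already executed for 400 ms, and that would take 50 ms to compile at O1 for an estimated 1.5x speedup, has an expected remaining time of 50 + 400/1.5 ≈ 317 ms, versus 400 ms for doing nothing, so recompiling at O1 is preferred over doing nothing (the controller would compare the other levels the same way).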

Part I
Low-overhead Instrumentation

Chapter 3
Instrumentation Sampling Framework
This chapter presents an instrumentation sampling framework, a technique that allows previously expensive instrumentation to be performed with low overhead. The sampling framework is an automatic code transformation that takes an instrumented method (that would execute with high overhead) as input, and transforms the method to produce a modified version that will execute with low overhead, yet collect a similar profile (see Figure 3.1). The main goal of the framework is to automate the process of reducing instrumentation overhead, allowing a wide range of profiles to be collected efficiently, without requiring a separate low-overhead implementation for each. Instead, high-overhead versions of instrumentation can be used and the sampling framework automatically reduces the overhead. Our framework offers the following advantages:
Overhead is reduced substantially, allowing previously expensive instrumentation techniques to be used at runtime, even in situations where performance is important.
In our experience, the accuracy of the profile being collected remains high, allowing the technique to be used even when accurate profiles are needed.
The technique can be used to collect a wide range of profiles. Many common instrumentation techniques can be incorporated into our framework without modification. Multiple types of instrumentation can be inserted simultaneously and sampled by the framework.
The sampling framework is easy to implement and can be applied at any level of abstraction, ranging from a source-to-source transformation to a binary-to-binary

Figure 3.1: An illustration of the goal of the instrumentation-sampling framework. The input to the framework is an instrumented method that would execute with high overhead; the output is a modified version that will execute with low overhead, but produce a similar profile.
transformation.
The framework is tunable, allowing the tradeoff between overhead and accuracy to be adjusted easily at runtime.
The framework does not rely on any hardware or operating system support.
Sections 3.1 through 3.3 describe the instrumentation sampling framework in detail. Section 3.4 describes the types of profiles for which this framework is effective, and possible modifications for collecting other types of profiles. Section 3.5 describes two variations of the framework designed to reduce the space requirements.
3.1 Technique
Assume that a method F is to be instrumented. The sampling transformation of F is accomplished as follows. A second version of F, called the duplicated code, is introduced within the instrumented method, as shown in Figure 3.2. The duplicated code contains all of the heavyweight (high-overhead) instrumentation. The original version of the code is now referred to as the checking code because it is modified only slightly in a way that allows execution to swap back and forth between the checking code and the duplicated code in a fine-grained, controlled manner. At regular sample intervals, execution moves into the duplicated code for a small, bounded amount of time. Total overhead can be kept

Figure 3.2: A high-level view of an instrumented method generated by the sampling framework. A second version of the code is introduced, called the duplicated code, which contains all instrumentation. The original code becomes the checking code, which is minimally instrumented to allow control to transfer in and out of the duplicated code in a fine-grained and controlled manner.
to a minimum by ensuring that the majority of execution occurs in the checking code. As long as the duplicated code is executed infrequently, expensive instrumentation inserted into the duplicated code will have only a small impact on overall overhead. This version of the framework will be referred to as Full-Duplication, since all of the code in the method is duplicated. The switching between the checking and duplicated code is illustrated in Figure 3.3. The checking code has conditional branches inserted (which will be referred to as checks) that monitor a sample condition. When a check determines that the sample condition is true, a sample is triggered and control jumps to the duplicated code, rather than continuing in the checking code. The duplicated code is also slightly modified to ensure that only a bounded amount of execution occurs in the duplicated code before the sample condition is re-evaluated. This is accomplished by modifying the backward branches (which will be referred to as backedges) in the duplicated code to transfer control back to the checking code, allowing the sample condition to be re-evaluated to determine whether execution should continue in the duplicated code or checking code. Therefore, taking a sample implies executing one

Figure 3.3: Illustration of the flow of control between the checking code and duplicated code. All method entries and backedges in the checking code contain a conditional branch that jumps to the duplicated code when a sample condition is true. All backedges in the duplicated code are modified to return to the checking code.
acyclic path through the duplicated code, and then re-evaluating the sample condition. Where the checks are placed within the checking code, and how samples are triggered are two of the flexible aspects of the framework; they are discussed in Sections 3.2 and 3.3, respectively. The key idea of the Full-Duplication framework is that the ratio of time spent in each version of the code can be controlled by changing the rate at which the sample condition is true. The overhead of the duplicated code, and all instrumentation inserted into it, can be controlled by reducing the rate at which samples are taken. As the sample rate is decreased, the total overhead will converge to the overhead of the checks being executed in the checking code. Another more subtle advantage of this framework is that it makes it easy to stop executing the instrumentation when no more profiling information is needed. In an online system (as discussed further in Chapter 5) it may be desirable to execute instrumented code for some period of time, after which execution should transfer back to the non-instrumented version. It is important to have some mechanism for stopping the instrumented code from

executing to prevent the program from running indefinitely with poor performance. This could be achieved by using dynamic code patching [49, 75, 71, 72] to insert and remove instrumentation without recompiling the method, or by performing on-stack replacement [50] to hot-swap execution back to the non-instrumented version while the method is running. In our sampling framework, this problem is avoided because setting the sample condition to be permanently false will ensure that execution remains in the checking code and no more instrumentation will be executed. Unless the system performs on-stack replacement, execution cannot switch back to a totally non-instrumented version of the method (i.e., a version of the method with no instrumentation and without the sampling transformation applied) until the method exits; however, no samples will be triggered during this time, so the total overhead will be that of the checking code. Depending on the implementation of the checks, this overhead should be small compared to the cost of instrumentation.
3.2 Check placement
Placement of the checks within the checking code is one of the flexible aspects of the sampling framework. To reduce overhead it is desirable to minimize the number of checks executed; however, to ensure that an accurate profile is collected, enough checks should be placed so that all of the duplicated code has a chance to be sampled. One approach is to place checks on all method entries and backedges in the checking code. This placement of the checks ensures that (a) only a bounded amount of execution occurs between checks, so that execution cannot continue indefinitely in either the checking code or the duplicated code, and (b) all the code has an opportunity to be sampled. An important property of this check placement is that the number of checks executed (and thus the overhead of the checking code) is completely independent of the instrumentation inserted in the duplicated code. For ease of reference this will be referred to as Property 1, defined as follows:
Property 1: The number of checks executed in the checking code is less than or equal to the number of backedges and method entries executed, independent of the instrumentation being performed.
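
As a concrete illustration of this structure, the following source-level sketch shows the shape of code that the checking/duplicated split and the entry/backedge checks produce for a single loop. It collapses the two code versions into one structured Java loop for readability and is not the actual Jikes RVM IR transformation; sampleDue(), recordEdge(), and work() are hypothetical stand-ins for the sample condition, the heavyweight instrumentation, and the loop body.

    class FullDuplicationSketch {
        static void transformedLoop(int n) {
            for (int i = 0; i < n; i++) {
                // Check on the method entry / loop backedge in the checking code.
                if (sampleDue()) {
                    // Duplicated code: one acyclic path with full instrumentation;
                    // its backedge returns control to the check above.
                    recordEdge(i);
                    work(i);
                } else {
                    // Checking code: no instrumentation, only the cheap check.
                    work(i);
                }
            }
        }
        static boolean sampleDue() { return false; }    // e.g., a counter-based trigger (Section 3.3)
        static void recordEdge(int i) { }                // placeholder heavyweight instrumentation
        static void work(int i) { }                      // placeholder loop body
    }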

Property 1 is an important characteristic of the Full-Duplication framework because it implies scalability in regard to the amount of instrumentation that can be inserted in the duplicated code. The checking overhead is a fixed-cost overhead that is independent of the instrumentation in the duplicated code; therefore, it becomes possible to insert as much instrumentation as desired, and the total overhead can be reduced to approach that of the checking code by reducing the sample rate.
3.2.1 Reducing dynamic check frequency
The runtime overhead caused by the checks depends on many factors, such as the efficiency of the checks themselves. The number of checks executed at runtime plays a major role in determining the overhead of the checking code. The simple check placement described above (method entries and loop backedges) resulted in acceptable overhead in our implementation (see Section 4.2). However, there could be situations in which the checking overhead is unacceptably high. For example, the backedge checks will introduce more overhead in programs that contain tight loops. Similarly, the method entry checks will introduce more overhead in programs that make frequent method calls. In scenarios where the overhead of this basic check placement is too high, there are several possible approaches to help reduce the number of checks executed at runtime. First, several common code transformations can help reduce the number of checks executed at runtime. For example, loop unrolling or loop tiling can be used to reduce the number of backedge checks executed. These transformations increase the number of instructions executed per loop iteration, and thus increase the number of instructions executed between successive backedge checks. By performing more work between checks, the checking overhead caused by backedge checks will be reduced. Similarly, more aggressive inlining can be used to reduce the number of method entries executed, thus reducing the overhead caused by method entry checks. A second possibility for reducing the number of checks executed is to perform analysis of the instrumentation that is inserted in the duplicated code, and remove checks that are unnecessary. For example, if a particular loop contains no instrumentation then no

backedge check is needed for that loop. This approach was used by Hirzel and Chilimbi [48] to reduce the checking overhead when collecting memory reference profiles.
3.3 Trigger mechanisms
The sampling framework relies on runtime checks, which need some kind of trigger mechanism to determine when execution should be transferred from the checking code to the duplicated code. There is a wide range of possible strategies that could be used for triggering samples. Different triggers may make sense in different situations, and the ability to select a trigger mechanism to match the desired use of the profiling system is one of the flexible aspects of the framework. Three examples of trigger mechanisms are discussed in the sections that follow.
3.3.1 Counter-based sampling
Counting a particular event and sampling when the counter reaches a threshold (which we refer to as counter-based sampling) is an effective mechanism for triggering samples proportionally to the frequency of that event. Counter-based sampling is particularly appealing when collecting profiles to guide feedback-directed optimizations because these optimizations often rely on the relative execution frequencies of certain events. Counter-based sampling has been used in previous systems, such as DCPI [6], where interrupts signaled by hardware performance counters are used to sample instructions proportionally to their execution frequency. To trigger samples in our framework we propose implementing a counter-based trigger in software by having the compiler insert code to decrement and check a global counter, as shown in Figure 3.4. We call this technique compiler-inserted counter-based sampling. As long as the overhead of the counting and checking is kept to a minimum, the advantages of compiler-inserted counter-based sampling are numerous. Such advantages include:
Easy to implement
Counter-based sampling is a simple but effective approach for triggering samples. It

globalcounter--;
if (globalcounter <= 0) {
    globalcounter = resetvalue;
    takesample();
}
Figure 3.4: Code inserted for a counter-based check
is easy to implement and does not rely on any hardware or operating system support. 1
Flexible, high-frequency sample rate
A counter-based trigger provides a flexible, high-frequency sample rate. Any desired sample rate can be achieved by simply changing the value of resetvalue, and this value can even be changed dynamically to vary the sampling rate during profiling. This is an important advantage over other sampling mechanisms, such as hardware or operating system timer interrupts, which provide a fixed-frequency sample rate that may be too infrequent for some profiling scenarios [6].
Samples are triggered proportionally to execution frequency
The number of times each check triggers a sample is proportional to the number of times that particular check is executed; therefore, the instructions in the duplicated code are executed proportionally to their execution frequency in the non-instrumented code. This property makes counter-based sampling effective for estimating the execution frequencies of program events.
Deterministic sampling
A counter-based trigger has the advantage of triggering samples deterministically. If the application being executed is deterministic, two runs of the program (with the same input) will produce the same sampled profile. One potential disadvantage of using a deterministic sampling strategy is that it is possible for the program behavior to correlate with the sampling behavior, resulting in a
1 Although hardware and operating system techniques may be used to lower the overhead of the checks, no support from either is required.

highly inaccurate profile. For example, if a program performs some uncommon behavior every 1000th loop iteration, any sample interval that is a multiple of 1000 could result in the uncommon behavior being observed on every sample. Although our experimental results suggest that this problem did not occur for benchmarks used in this study (see Section 4.2), the problem can be easily avoided by adding some degree of randomness to the sampling mechanism. One possibility is to add a small pseudo-random factor to the reset value (as done in [6]) to reduce the probability of program behavior correlating with the sample interval. Such an approach could potentially increase accuracy in the average case as well by eliminating inaccuracy caused by correlation between the sample interval and program behavior.
Implementation options
There are several options for implementing a counter-based sampling approach; the simplest approach is to have each check execute the code exactly as shown in Figure 3.4. The counter variable (globalcounter) will most likely be in a register, or in the cache, and the branch will be predicted (not taken); therefore, the performance overhead should be low. Such an approach was implemented in Jikes RVM without using a dedicated register, placing the code in Figure 3.4 on all backedges and method entries, and the overhead averaged 4.9% (for executing the checks only, when no samples were taken). A detailed evaluation of the overhead of counter-based checks is included in Section 4.2. For multi-threaded applications, the global counter may raise some concerns. First, access to the global counter is not synchronized for performance reasons, so data races may occur. Fortunately, it may not be necessary to maintain 100% accuracy of the global counter, as it is simply a means of triggering samples roughly proportional to execution frequency. Having the counter value off-by-one occasionally would have little effect on the resulting accuracy. A more serious problem is that access to a single global counter could become a performance bottleneck as the number of threads and processors increases. In this case, the global counter could be replaced by thread- or processor-specific counters, allowing access to the counter with no resource contention.
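
A minimal sketch of that thread-local variant, using standard Java ThreadLocal purely for illustration; inside a JVM the counter would more likely live in per-thread or per-processor VM state. The names resetvalue and takesample() follow Figure 3.4, and the class itself is hypothetical, not Jikes RVM code.

    class PerThreadCheckSketch {
        static final int resetvalue = 1000;                    // sample interval (tunable at runtime)
        static final ThreadLocal<int[]> counter =
                ThreadLocal.withInitial(() -> new int[] { resetvalue });

        // Replaces the global-counter check on method entries and backedges.
        static void check() {
            int[] c = counter.get();
            if (--c[0] <= 0) {                                 // no contention across threads
                c[0] = resetvalue;
                takesample();                                  // transfer to the duplicated code
            }
        }
        static void takesample() { }                           // placeholder, as in Figure 3.4
    }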


More information

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1

SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1 SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine David Bélanger dbelan2@cs.mcgill.ca Sable Research Group McGill University Montreal, QC January 28, 2004 SABLEJIT: A Retargetable

More information

Trace Compilation. Christian Wimmer September 2009

Trace Compilation. Christian Wimmer  September 2009 Trace Compilation Christian Wimmer cwimmer@uci.edu www.christianwimmer.at September 2009 Department of Computer Science University of California, Irvine Background Institute for System Software Johannes

More information

Jazz: A Tool for Demand-Driven Structural Testing

Jazz: A Tool for Demand-Driven Structural Testing Jazz: A Tool for Demand-Driven Structural Testing J. Misurda, J. A. Clause, J. L. Reed, P. Gandra, B. R. Childers, and M. L. Soffa Department of Computer Science University of Pittsburgh Pittsburgh, Pennsylvania

More information

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions

Complex, concurrent software. Precision (no false positives) Find real bugs in real executions Harry Xu May 2012 Complex, concurrent software Precision (no false positives) Find real bugs in real executions Need to modify JVM (e.g., object layout, GC, or ISA-level code) Need to demonstrate realism

More information

Hardware Emulation and Virtual Machines

Hardware Emulation and Virtual Machines Hardware Emulation and Virtual Machines Overview Review of How Programs Run: Registers Execution Cycle Processor Emulation Types: Pure Translation Static Recompilation Dynamic Recompilation Direct Bytecode

More information

Dynamic Feedback: An Effective Technique for Adaptive Computing

Dynamic Feedback: An Effective Technique for Adaptive Computing Dynamic Feedback: An Effective Technique for Adaptive Computing Pedro Diniz and Martin Rinard Department of Computer Science Engineering I Building University of California, Santa Barbara Santa Barbara,

More information

Performance Profiling

Performance Profiling Performance Profiling Minsoo Ryu Real-Time Computing and Communications Lab. Hanyang University msryu@hanyang.ac.kr Outline History Understanding Profiling Understanding Performance Understanding Performance

More information

Predicting Messaging Response Time in a Long Distance Relationship

Predicting Messaging Response Time in a Long Distance Relationship Predicting Messaging Response Time in a Long Distance Relationship Meng-Chen Shieh m3shieh@ucsd.edu I. Introduction The key to any successful relationship is communication, especially during times when

More information

Java On Steroids: Sun s High-Performance Java Implementation. History

Java On Steroids: Sun s High-Performance Java Implementation. History Java On Steroids: Sun s High-Performance Java Implementation Urs Hölzle Lars Bak Steffen Grarup Robert Griesemer Srdjan Mitrovic Sun Microsystems History First Java implementations: interpreters compact

More information

Design, Implementation, and Evaluation of a Compilation Server

Design, Implementation, and Evaluation of a Compilation Server Design, Implementation, and Evaluation of a Compilation Server Technical Report CU--978-04 HAN B. LEE University of Colorado AMER DIWAN University of Colorado and J. ELIOT B. MOSS University of Massachusetts

More information

A Feasibility Study for Methods of Effective Memoization Optimization

A Feasibility Study for Methods of Effective Memoization Optimization A Feasibility Study for Methods of Effective Memoization Optimization Daniel Mock October 2018 Abstract Traditionally, memoization is a compiler optimization that is applied to regions of code with few

More information

WHITE PAPER: ENTERPRISE AVAILABILITY. Introduction to Adaptive Instrumentation with Symantec Indepth for J2EE Application Performance Management

WHITE PAPER: ENTERPRISE AVAILABILITY. Introduction to Adaptive Instrumentation with Symantec Indepth for J2EE Application Performance Management WHITE PAPER: ENTERPRISE AVAILABILITY Introduction to Adaptive Instrumentation with Symantec Indepth for J2EE Application Performance Management White Paper: Enterprise Availability Introduction to Adaptive

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Last Class: Processes

Last Class: Processes Last Class: Processes A process is the unit of execution. Processes are represented as Process Control Blocks in the OS PCBs contain process state, scheduling and memory management information, etc A process

More information

Class Analysis for Testing of Polymorphism in Java Software

Class Analysis for Testing of Polymorphism in Java Software Class Analysis for Testing of Polymorphism in Java Software Atanas Rountev Ana Milanova Barbara G. Ryder Rutgers University, New Brunswick, NJ 08903, USA {rountev,milanova,ryder@cs.rutgers.edu Abstract

More information

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski

Operating Systems Design Fall 2010 Exam 1 Review. Paul Krzyzanowski Operating Systems Design Fall 2010 Exam 1 Review Paul Krzyzanowski pxk@cs.rutgers.edu 1 Question 1 To a programmer, a system call looks just like a function call. Explain the difference in the underlying

More information

Enhanced Web Log Based Recommendation by Personalized Retrieval

Enhanced Web Log Based Recommendation by Personalized Retrieval Enhanced Web Log Based Recommendation by Personalized Retrieval Xueping Peng FACULTY OF ENGINEERING AND INFORMATION TECHNOLOGY UNIVERSITY OF TECHNOLOGY, SYDNEY A thesis submitted for the degree of Doctor

More information

Proceedings of the Third Virtual Machine Research and Technology Symposium

Proceedings of the Third Virtual Machine Research and Technology Symposium USENIX Association Proceedings of the Third Virtual Machine Research and Technology Symposium San Jose, CA, USA May 6 7, 2004 2004 by The USENIX Association All Rights Reserved For more information about

More information

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices

The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices The Design, Implementation, and Evaluation of Adaptive Code Unloading for Resource-Constrained Devices LINGLI ZHANG and CHANDRA KRINTZ University of California, Santa Barbara Java Virtual Machines (JVMs)

More information

MODEL-DRIVEN CODE OPTIMIZATION. Min Zhao. B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996

MODEL-DRIVEN CODE OPTIMIZATION. Min Zhao. B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996 MODEL-DRIVEN CODE OPTIMIZATION by Min Zhao B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996 M.S. Computer Science, University of Pittsburgh, 2001 Submitted to the Graduate

More information

Pointer Analysis in the Presence of Dynamic Class Loading

Pointer Analysis in the Presence of Dynamic Class Loading Pointer Analysis in the Presence of Dynamic Class Loading Martin Hirzel, Amer Diwan University of Colorado at Boulder Michael Hind IBM T.J. Watson Research Center 1 Pointer analysis motivation Code a =

More information

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications

Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Vertical Profiling: Understanding the Behavior of Object-Oriented Applications Matthias Hauswirth, Amer Diwan University of Colorado at Boulder Peter F. Sweeney, Michael Hind IBM Thomas J. Watson Research

More information

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA

ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA ANALYZING THREADS FOR SHARED MEMORY CONSISTENCY BY ZEHRA NOMAN SURA B.E., Nagpur University, 1998 M.S., University of Illinois at Urbana-Champaign, 2001 DISSERTATION Submitted in partial fulfillment of

More information

Software Speculative Multithreading for Java

Software Speculative Multithreading for Java Software Speculative Multithreading for Java Christopher J.F. Pickett and Clark Verbrugge School of Computer Science, McGill University {cpicke,clump}@sable.mcgill.ca Allan Kielstra IBM Toronto Lab kielstra@ca.ibm.com

More information

Enterprise Architect. User Guide Series. Profiling

Enterprise Architect. User Guide Series. Profiling Enterprise Architect User Guide Series Profiling Investigating application performance? The Sparx Systems Enterprise Architect Profiler finds the actions and their functions that are consuming the application,

More information

Enterprise Architect. User Guide Series. Profiling. Author: Sparx Systems. Date: 10/05/2018. Version: 1.0 CREATED WITH

Enterprise Architect. User Guide Series. Profiling. Author: Sparx Systems. Date: 10/05/2018. Version: 1.0 CREATED WITH Enterprise Architect User Guide Series Profiling Author: Sparx Systems Date: 10/05/2018 Version: 1.0 CREATED WITH Table of Contents Profiling 3 System Requirements 8 Getting Started 9 Call Graph 11 Stack

More information

Chapter 12. UML and Patterns. Copyright 2008 Pearson Addison-Wesley. All rights reserved

Chapter 12. UML and Patterns. Copyright 2008 Pearson Addison-Wesley. All rights reserved Chapter 12 UML and Patterns Copyright 2008 Pearson Addison-Wesley. All rights reserved Introduction to UML and Patterns UML and patterns are two software design tools that can be used within the context

More information

By Arjan Van De Ven, Senior Staff Software Engineer at Intel.

By Arjan Van De Ven, Senior Staff Software Engineer at Intel. Absolute Power By Arjan Van De Ven, Senior Staff Software Engineer at Intel. Abstract: Power consumption is a hot topic from laptop, to datacenter. Recently, the Linux kernel has made huge steps forward

More information

FORMULATION AND BENEFIT ANALYSIS OF OPTIMIZATION MODELS FOR NETWORK RECOVERY DESIGN

FORMULATION AND BENEFIT ANALYSIS OF OPTIMIZATION MODELS FOR NETWORK RECOVERY DESIGN FORMULATION AND BENEFIT ANALYSIS OF OPTIMIZATION MODELS FOR NETWORK RECOVERY DESIGN Approved by: Dr. Richard Barr Dr. Eli Olinick Dr. Marion Sobol Dr. Jerrell Stracener Dr. Stephen A. Szygenda FORMULATION

More information

Just-In-Time Compilation

Just-In-Time Compilation Just-In-Time Compilation Thiemo Bucciarelli Institute for Software Engineering and Programming Languages 18. Januar 2016 T. Bucciarelli 18. Januar 2016 1/25 Agenda Definitions Just-In-Time Compilation

More information

Multiprocessor Systems. Chapter 8, 8.1

Multiprocessor Systems. Chapter 8, 8.1 Multiprocessor Systems Chapter 8, 8.1 1 Learning Outcomes An understanding of the structure and limits of multiprocessor hardware. An appreciation of approaches to operating system support for multiprocessor

More information

Field Analysis. Last time Exploit encapsulation to improve memory system performance

Field Analysis. Last time Exploit encapsulation to improve memory system performance Field Analysis Last time Exploit encapsulation to improve memory system performance This time Exploit encapsulation to simplify analysis Two uses of field analysis Escape analysis Object inlining April

More information

MODEL-DRIVEN CODE OPTIMIZATION. Min Zhao. B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996

MODEL-DRIVEN CODE OPTIMIZATION. Min Zhao. B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996 MODEL-DRIVEN CODE OPTIMIZATION by Min Zhao B.E. Computer Science and Engineering, Xi an Jiaotong University, P.R. China, 1996 M.S. Computer Science, University of Pittsburgh, 2001 Submitted to the Graduate

More information

Fault Tolerant Java Virtual Machine. Roy Friedman and Alon Kama Technion Haifa, Israel

Fault Tolerant Java Virtual Machine. Roy Friedman and Alon Kama Technion Haifa, Israel Fault Tolerant Java Virtual Machine Roy Friedman and Alon Kama Technion Haifa, Israel Objective Create framework for transparent fault-tolerance Support legacy applications Intended for long-lived, highly

More information

Kasper Lund, Software engineer at Google. Crankshaft. Turbocharging the next generation of web applications

Kasper Lund, Software engineer at Google. Crankshaft. Turbocharging the next generation of web applications Kasper Lund, Software engineer at Google Crankshaft Turbocharging the next generation of web applications Overview Why did we introduce Crankshaft? Deciding when and what to optimize Type feedback and

More information

A Framework for Optimistic Program Optimization

A Framework for Optimistic Program Optimization A Framework for Optimistic Program Optimization by Igor Pechtchanski A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Computer Science

More information

Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr

Tag der mündlichen Prüfung: 03. Juni 2004 Dekan / Dekanin: Prof. Dr. Bernhard Steffen Gutachter / Gutachterinnen: Prof. Dr. Francky Catthoor, Prof. Dr Source Code Optimization Techniques for Data Flow Dominated Embedded Software Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften der Universität Dortmund am Fachbereich Informatik

More information

A GENERIC SIMULATION OF COUNTING NETWORKS

A GENERIC SIMULATION OF COUNTING NETWORKS A GENERIC SIMULATION OF COUNTING NETWORKS By Eric Neil Klein A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of

More information

JOVE. An Optimizing Compiler for Java. Allen Wirfs-Brock Instantiations Inc.

JOVE. An Optimizing Compiler for Java. Allen Wirfs-Brock Instantiations Inc. An Optimizing Compiler for Java Allen Wirfs-Brock Instantiations Inc. Object-Orient Languages Provide a Breakthrough in Programmer Productivity Reusable software components Higher level abstractions Yield

More information

N N Sudoku Solver. Sequential and Parallel Computing

N N Sudoku Solver. Sequential and Parallel Computing N N Sudoku Solver Sequential and Parallel Computing Abdulaziz Aljohani Computer Science. Rochester Institute of Technology, RIT Rochester, United States aaa4020@rit.edu Abstract 'Sudoku' is a logic-based

More information

Running class Timing on Java HotSpot VM, 1

Running class Timing on Java HotSpot VM, 1 Compiler construction 2009 Lecture 3. A first look at optimization: Peephole optimization. A simple example A Java class public class A { public static int f (int x) { int r = 3; int s = r + 5; return

More information

Native POSIX Thread Library (NPTL) CSE 506 Don Porter

Native POSIX Thread Library (NPTL) CSE 506 Don Porter Native POSIX Thread Library (NPTL) CSE 506 Don Porter Logical Diagram Binary Memory Threads Formats Allocators Today s Lecture Scheduling System Calls threads RCU File System Networking Sync User Kernel

More information

Intel Hyper-Threading technology

Intel Hyper-Threading technology Intel Hyper-Threading technology technology brief Abstract... 2 Introduction... 2 Hyper-Threading... 2 Need for the technology... 2 What is Hyper-Threading?... 3 Inside the technology... 3 Compatibility...

More information

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY

ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY ADAPTIVE VIDEO STREAMING FOR BANDWIDTH VARIATION WITH OPTIMUM QUALITY Joseph Michael Wijayantha Medagama (08/8015) Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

More information

Ch 4 : CPU scheduling

Ch 4 : CPU scheduling Ch 4 : CPU scheduling It's the basis of multiprogramming operating systems. By switching the CPU among processes, the operating system can make the computer more productive In a single-processor system,

More information

IA-64 Compiler Technology

IA-64 Compiler Technology IA-64 Compiler Technology David Sehr, Jay Bharadwaj, Jim Pierce, Priti Shrivastav (speaker), Carole Dulong Microcomputer Software Lab Page-1 Introduction IA-32 compiler optimizations Profile Guidance (PGOPTI)

More information

HotPy (2) Binary Compatible High Performance VM for Python. Mark Shannon

HotPy (2) Binary Compatible High Performance VM for Python. Mark Shannon HotPy (2) Binary Compatible High Performance VM for Python Mark Shannon Who am I? Mark Shannon PhD thesis on building VMs for dynamic languages During my PhD I developed: GVMT. A virtual machine tool kit

More information

Process Scheduling Part 2

Process Scheduling Part 2 Operating Systems and Computer Networks Process Scheduling Part 2 pascal.klein@uni-due.de Alexander Maxeiner, M.Sc. Faculty of Engineering Agenda Process Management Time Sharing Synchronization of Processes

More information

Using Cache Line Coloring to Perform Aggressive Procedure Inlining

Using Cache Line Coloring to Perform Aggressive Procedure Inlining Using Cache Line Coloring to Perform Aggressive Procedure Inlining Hakan Aydın David Kaeli Department of Electrical and Computer Engineering Northeastern University Boston, MA, 02115 {haydin,kaeli}@ece.neu.edu

More information

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1

About the Authors... iii Introduction... xvii. Chapter 1: System Software... 1 Table of Contents About the Authors... iii Introduction... xvii Chapter 1: System Software... 1 1.1 Concept of System Software... 2 Types of Software Programs... 2 Software Programs and the Computing Machine...

More information

HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS

HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS HARNESSING CERTAINTY TO SPEED TASK-ALLOCATION ALGORITHMS FOR MULTI-ROBOT SYSTEMS An Undergraduate Research Scholars Thesis by DENISE IRVIN Submitted to the Undergraduate Research Scholars program at Texas

More information

Parallelizing SPECjbb2000 with Transactional Memory

Parallelizing SPECjbb2000 with Transactional Memory Parallelizing SPECjbb2000 with Transactional Memory JaeWoong Chung, Chi Cao Minh, Brian D. Carlstrom, Christos Kozyrakis Computer Systems Laboratory Stanford University {jwchung, caominh, bdc, kozyraki}@stanford.edu

More information

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable)

CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) CSL373: Lecture 5 Deadlocks (no process runnable) + Scheduling (> 1 process runnable) Past & Present Have looked at two constraints: Mutual exclusion constraint between two events is a requirement that

More information

Process- Concept &Process Scheduling OPERATING SYSTEMS

Process- Concept &Process Scheduling OPERATING SYSTEMS OPERATING SYSTEMS Prescribed Text Book Operating System Principles, Seventh Edition By Abraham Silberschatz, Peter Baer Galvin and Greg Gagne PROCESS MANAGEMENT Current day computer systems allow multiple

More information

Phases in Branch Targets of Java Programs

Phases in Branch Targets of Java Programs Phases in Branch Targets of Java Programs Technical Report CU-CS-983-04 ABSTRACT Matthias Hauswirth Computer Science University of Colorado Boulder, CO 80309 hauswirt@cs.colorado.edu Recent work on phase

More information

Operating Systems Unit 3

Operating Systems Unit 3 Unit 3 CPU Scheduling Algorithms Structure 3.1 Introduction Objectives 3.2 Basic Concepts of Scheduling. CPU-I/O Burst Cycle. CPU Scheduler. Preemptive/non preemptive scheduling. Dispatcher Scheduling

More information

OS Schedulers: Fair-Share Scheduling in the Windows Research Kernel (WRK) Version 1.0 June 2007

OS Schedulers: Fair-Share Scheduling in the Windows Research Kernel (WRK) Version 1.0 June 2007 Version 1.0 June 2007 Marty Humphrey Assistant Professor Department of Computer Science University of Virginia Charlottesville, VA 22904 The purpose of this experiment is to gain more experience with CPU

More information

Processes and Non-Preemptive Scheduling. Otto J. Anshus

Processes and Non-Preemptive Scheduling. Otto J. Anshus Processes and Non-Preemptive Scheduling Otto J. Anshus Threads Processes Processes Kernel An aside on concurrency Timing and sequence of events are key concurrency issues We will study classical OS concurrency

More information

Chapter 9. Software Testing

Chapter 9. Software Testing Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of

More information

Design Patterns for Real-Time Computer Music Systems

Design Patterns for Real-Time Computer Music Systems Design Patterns for Real-Time Computer Music Systems Roger B. Dannenberg and Ross Bencina 4 September 2005 This document contains a set of design patterns for real time systems, particularly for computer

More information

Sista: Improving Cog s JIT performance. Clément Béra

Sista: Improving Cog s JIT performance. Clément Béra Sista: Improving Cog s JIT performance Clément Béra Main people involved in Sista Eliot Miranda Over 30 years experience in Smalltalk VM Clément Béra 2 years engineer in the Pharo team Phd student starting

More information

ECE519 Advanced Operating Systems

ECE519 Advanced Operating Systems IT 540 Operating Systems ECE519 Advanced Operating Systems Prof. Dr. Hasan Hüseyin BALIK (10 th Week) (Advanced) Operating Systems 10. Multiprocessor, Multicore and Real-Time Scheduling 10. Outline Multiprocessor

More information