Overhead-Free Portable Thread-Stack Checkpoints

Ronald Veldema and Michael Philippsen
University of Erlangen-Nuremberg, Computer Science Department 2, Martensstr, Erlangen, Germany
{veldema, philippsen}@cs.fau.de

Abstract. Checkpointing is the process of taking a snapshot of a thread's stack, and possibly the objects that it uses, such that the thread can be either restarted (for error recovery) or moved to another machine (to improve load balancing). Current approaches to thread-stack checkpointing are either not heterogeneous, as they do not allow a call stack created on architecture X to be restored on a machine with architecture Y, or they introduce large runtime overhead. In general, previous approaches add overhead by instrumenting each function in a program to constantly test whether the current method invocation is for thread restoration or is a normal invocation. These instrumentation costs are incurred even when no checkpointing is performed. Our implementation introduces no runtime overhead during regular execution. Furthermore, our approach supports heterogeneity. We implement this by letting our compiler create extra functions that portably save and rebuild activation records to and from a machine-independent format. Each variable of an activation record is described in terms of its usages in a variable usage descriptor string. As the computed variable usage descriptor strings for a given variable are the same on all architectures, they are used to uniquely identify variables inside activation records across different architectures.

1 Introduction

Checkpointing a thread is the process of taking a snapshot of all activation records that form the thread's call stack, and of the objects on the heap reachable from them, for later restoration, possibly on another machine. Checkpointing has many uses, for example: to migrate a thread from one machine to another, to directly utilize resources or special features that a specific machine has, or to provide rollback fault tolerance.

Many checkpointing packages (including the system presented here) give a programmer access to a checkpoint_this_thread() function. Invoking this method causes all relevant information of the current thread, such as the thread's stack, active registers, and accessible heap, to be saved to disk. Later, the program can be restarted using a special command line parameter that causes the thread to be completely restored. To the program, it then seems as if control has just returned from the checkpoint_this_thread() function and nothing else has happened.

Creating a fully heterogeneous checkpointing package is difficult because of the many architectural differences. First of all, local variables and parameters are stored in different physical locations and binary formats on different processors. Also, one machine can have more registers than another. If this occurs, fewer variables are allocated in memory and more are allocated in registers (but the sum of the number of variables allocated in registers and memory remains the same). Variables that are stored in memory can also be stored in different locations inside the stackframes. We therefore cannot perform a simple bitwise copy of an entire activation record from one machine to another. Likewise, the layout in memory of an object can vary between architectures.

We have implemented our checkpointing algorithm in Jackal [7]. Jackal includes an aggressively optimizing static compiler: the compiler accepts Java source code and generates an executable. Jackal also supports heterogeneous clusters; objects and pointers can already be freely exchanged between different types of machines.

Our new checkpointing algorithm solves all the problems mentioned above: a function's activation record can now be converted to a machine-independent format using a novel way to describe the individual variables. This allows a checkpoint to be restored on a different architecture than the one where the checkpoint was taken, while imposing no overhead during normal execution. Although the techniques described here are implemented in a Java system, they are applicable to any programming language that is type-safe and has no function pointers.

2 Implementation

The central problem in heterogeneous checkpointing is how to translate an activation record created on one architecture to the activation record on another architecture without changing the code on either architecture. That is, how do we associate variable X of function F, allocated in register or stackframe position P, with the X of F on a different architecture where X is allocated in stackframe position or register Q? For example, on an x86, a source code variable might be spilled to memory at -8(ebp), while on an IA64 the variable is allocated in register loc0. Unfortunately, checkpointing requires such low-level details, as it needs to save and restore a stackframe in a machine-dependent way.

Alas, source code variable names cannot be used to reassociate the variables with their physical locations, as naming information is lost with increasing optimization levels. Compilers that support simultaneous debugging and optimization provide only approximations to the debugger in such cases, which will not suffice for our purposes.

2.1 Stack Checkpointing and Restoration

A machine-independent description of the variables inside an activation record is needed. Our solution is to uniquely identify variables by characterizing how each register or memory location in a stackframe is used. This characterization is portable, as it uniquely identifies variables without looking at how the variables are physically stored. A characterization of a variable (either in a register or in a memory location) is initially in the form of a usage descriptor string. As checkpointing can only occur at a call instruction (due to calling checkpoint_this_thread directly or indirectly), we only need to create a usage descriptor string for each live variable at each callsite. The rule for usage descriptor strings is then: if on architectures X and Y the descriptor strings for variables v1 and v2 are the same, then the variables represent the same variable. At runtime, the checkpoint file then consists of a series of tuples {usage descriptor string, value of variable in universal binary format}.

Our checkpointing algorithm operates in two passes at compile time. After all the machine-independent optimizations, we create lists of the above tuples for each live variable at each callsite. After code generation and machine-dependent optimizations, we generate two helper functions: checkpoint(C in F), which checkpoints function F at callsite C, and restore(C in F), which performs the reverse operation. The original callsites and functions are unaffected (pseudocode for checkpoint(C in F) and restore(C in F) is shown in Figure 3).

At runtime, checkpointing unwinds the stack from checkpoint_this_thread upward toward the thread's run method. For each activation record we locate checkpoint(C in F) for that callsite by a hash lookup on the callsite's address. Checkpoint(C in F) then outputs, for each live variable v, a tuple {descriptor(v), value(v)}. As checkpoint(C in F) and restore(C in F) are generated machine-dependently for that specific callsite, they know where each variable is physically located.

Restoration recursively restores activation records until the whole call chain is restored. The restoration process for a single activation record starts by reading a single callsite descriptor string C in F. Next, restore(C in F) is located and invoked. Restore(C in F) first reads the activation record's complete list of tuples {descriptor(l), value(l)}. For each live variable v to restore, it finds value(v) by searching the list for a matching descriptor(v). Restore(C in F) then converts value(v) and assigns it to the right location in either memory or a register, as the code generated for F requires. After all formal parameters and live variables have been initialized, the activation record information for the next stack frame is read from the checkpoint (file). This continues until all activation records have been restored.
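The descriptor-keyed matching described above can be sketched in Java. This is a minimal illustration under stated assumptions, not Jackal's generated code: the class TupleSketch, its Tuple type, and the array that stands in for the target machine's frame are all hypothetical, and values are plain longs rather than a universal binary format.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from Jackal.
final class TupleSketch {
    // One checkpoint entry: a usage descriptor plus the variable's value.
    static final class Tuple {
        final String descriptor;
        final long value;

        Tuple(String descriptor, long value) {
            this.descriptor = descriptor;
            this.value = value;
        }
    }

    // Stand-in for a generated checkpointer: it knows where a, x and y live
    // on the source machine and emits them keyed by descriptor.
    static List<Tuple> checkpoint(long a, long x, long y) {
        List<Tuple> tuples = new ArrayList<>();
        tuples.add(new Tuple("+P:1,C:2@B:0", a));
        tuples.add(new Tuple("C:0@B:1", x));
        tuples.add(new Tuple("C:0@B:0", y));
        return tuples;
    }

    // Linear search for the value stored under a descriptor.
    static long findTuple(List<Tuple> tuples, String descriptor) {
        for (Tuple t : tuples) {
            if (t.descriptor.equals(descriptor)) {
                return t.value;
            }
        }
        throw new IllegalStateException("no tuple for " + descriptor);
    }

    // Stand-in for a generated restorer on another machine: the target frame
    // uses a different slot order (y, x, a), yet lookup by descriptor still
    // places each value in the right slot.
    static long[] restore(List<Tuple> tuples) {
        long[] frame = new long[3];
        frame[0] = findTuple(tuples, "C:0@B:0");      // y
        frame[1] = findTuple(tuples, "C:0@B:1");      // x
        frame[2] = findTuple(tuples, "+P:1,C:2@B:0"); // a
        return frame;
    }

    public static void main(String[] args) {
        long[] frame = restore(checkpoint(7, 3, 1));
        System.out.println(frame[0] + " " + frame[1] + " " + frame[2]); // prints "1 3 7"
    }
}
```

The order in which tuples were written does not matter to the restorer; only the descriptor strings have to agree between the two sides.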
After restoration, the call stack will look like a series of restore functions, each calling the next. Transfer of control after a stackframe has been restored is implemented by a jump statement that jumps directly to the position in the function that was checkpointed. However, when that function returns, it will return to the generated restore function of the invoking activation record. That restore function will in turn immediately jump to the originally invoked function, and so on.

Pseudocode for restore(C in F) and checkpoint(C in F) is shown in Figure 3 for an IA64 restorer and a matching x86 checkpointer. As can be seen in the example, the checkpointer creates tuples keyed on fixed descriptor strings that the restore function uses to locate the correct value to put in a given physical location. The next section explains how the arguments to the functions that read and write tuples are constructed.

    // x86: a is in memory at -8(ebp)
    checkpoint_zoo_in_foo(stackframe_info f):
        // allow the correct restorer to be found for frame:
        write_name_string("restore_zoo_in_foo");
        write_tuple("+P:1,C:2@B:0", f->mem("-8(ebp)"));
        // same for b, c, x, y, etc.

    // IA64: a is in register loc0
    restore_zoo_in_foo:
        tuples[] t = read_tuples();
        loc0 = find_tuple(t, "+P:1,C:2@B:0")->value;
        // ... same as above for b, c, x, y, etc.
        call_next_restorer(read_name_string());
        jump to insn after call to zoo in foo;

Fig. 3. Generated checkpoint and restore functions for Figure 2.

2.2 Variable Descriptor Strings

The key idea of our checkpointing algorithm is the concept of the usage descriptor string. A usage descriptor string describes machine-independently, for each variable, how that variable is used inside a procedure. To ease the construction of these strings, they are created after machine-independent optimizations (see Figure 1).

[Figure 1 is a diagram with the stages: machine-independent optimizations; create variable descriptors; machine-specific optimizations (x86) and assemble x86 binary; machine-specific optimizations (IA64) and assemble IA64 binary; create methods checkpoint(C in F) and restore(C in F).]

Fig. 1. The compiler's pipeline.

The descriptor string is created by traversing the Control Flow Graph (CFG) to find all usages of a variable. When building a descriptor for a variable A, we first search for its definitions. Whenever a usage of A is found, one of the rules below is applied:

1. Upon encountering an assignment A = constant, modify the descriptor string as follows: string(A) = string(A) + C:<constant>
2. Upon encountering A = B, continue the search for usages of variable B.
3. When A is the return value of a call, A = call, modify the string as follows: string(A) = string(A) + call:<index of call in all calls inside the containing function>
4. When A is assigned the value of a formal parameter, A = param(x), append the parameter's index in the parameter list to the string: string(A) = string(A) + P:<index of param(x)>
5. When A is given the value of an object field, A = object access(expression, field), change the descriptor as follows: string(A) = string(A) + access:field, and continue building the descriptor with the variables in expression.
6. When encountering a generic assignment such as A = B op C, where op is one of the binary operators such as +, -, *, /, append <op>, string(B) and string(C) to the string of A.
7. When making a modification to a usage descriptor string using one of the above rules, add a basic block identifier to the string: string(A) = string(A) + @B:<basic block identifier>

    Java code                                basic block
    void foo(int a, Object b, int c) {       0
      a = a + 2;                             0
      int y = 0;                             0
      do { int x = 0;                        1
        do { zoo(a, b, c);                   2
          // live variables = {a,b,c,x,y}    2
        } while (x++ < 10);                  2
      } while (y++ < 10);                    3
    }

Fig. 2. Example: Live Variables and Usage Descriptors.

As an example, let us construct the usage descriptor strings for the live variables in the code given in Figure 2. At zoo in foo, the compiler computes the set of live variables (a, b, c, x, y). The analysis pass takes this set and performs a traversal over the control flow graph of the function.
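Before the example is worked through, rules 1, 4, 6 and 7 can be sketched as a tiny Java model. This is an illustrative sketch only: DescriptorSketch and its Expr interface are hypothetical and not part of Jackal, rules 2, 3 and 5 are omitted, and two formatting details are assumptions inferred from the descriptor strings quoted in this paper (a comma between the operands of a binary operation, and a single basic block suffix per defining assignment).

```java
// Hypothetical mini-model of usage descriptors; not Jackal's compiler IR.
final class DescriptorSketch {
    // The right-hand side of a defining assignment.
    interface Expr {
        String describe();
    }

    // Rule 1: A = constant contributes "C:<constant>".
    static Expr constant(long c) {
        return () -> "C:" + c;
    }

    // Rule 4: A = param(x) contributes "P:<index of param(x)>".
    static Expr param(int index) {
        return () -> "P:" + index;
    }

    // Rule 6: A = B op C contributes the operator followed by both operand
    // strings. The comma separator is an assumption based on "+P:1,C:2".
    static Expr binary(String op, Expr left, Expr right) {
        return () -> op + left.describe() + "," + right.describe();
    }

    // Rule 7: tag the string with the basic block of the assignment.
    static String descriptor(Expr definition, int basicBlock) {
        return definition.describe() + "@B:" + basicBlock;
    }

    public static void main(String[] args) {
        // a = a + 2 in basic block 0, where a is formal parameter 1
        System.out.println(descriptor(binary("+", param(1), constant(2)), 0));
        // int x = 0 in basic block 1
        System.out.println(descriptor(constant(0), 1));
    }
}
```

Because the model only looks at how a value is computed and in which basic block, it produces the same string regardless of where a register allocator later places the variable.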

During the traversal, the initialization of x is encountered and the usage descriptor string is updated by appending the string C:0 (rule 1). That happens in basic block 1, thus @B:1 is appended to the descriptor string (rule 7), giving C:0@B:1. Likewise, the initialization of y in basic block 0 results in the string C:0@B:0. The assignment to a is more complex. Evaluation of a + 2 delivers the string +P:1,C:2@B:0 (rules 6 and 7): the + comes from the addition operation, the C:2 from the constant, and the P:1 from a being a formal parameter (rule 4). Variables b and c are formal parameters 2 and 3, thus they receive the strings P:2@B:0 and P:3@B:0 respectively.

During code generation and machine-specific optimization, the variables in the live variable sets are renamed along with the actual variables. During register allocation, for example, replacing register a with memory location b also replaces the a in the live variable set.

2.3 Compressing Variable Descriptor Strings

One potential performance problem with the above descriptor strings is the length of the strings that need to be saved in the checkpoint (file). For a large procedure with many variables, where each of these variables is used many times, large strings may result. This can cause large checkpoint files or huge amounts of data to be sent over the network when migrating a thread. In high performance computing, where thread migration latency matters, this needs to be avoided.

To combat this effect, the compiler can sort the list of the created descriptor strings for a given callsite and assign each resulting string a 16-bit index in the sorted table. The descriptor strings are then replaced by these 16-bit table indexes. Most importantly, the table indexes are emitted to the stream instead of the descriptor strings themselves. This transformation is correct, as we are replacing unique strings with unique identifiers. Notice that we need the descriptor strings to generate the integers; we cannot construct the integers directly.
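The index assignment can be sketched as follows, using the five descriptor strings from the example in Section 2.2. This is a hedged sketch, not Jackal's implementation: the class DescriptorTable and its methods are hypothetical, and plain lexicographic String ordering is assumed as the sort order. Any total order would do, provided every architecture's compiler uses the same one.

```java
import java.util.Arrays;

// Hypothetical illustration of the index table; not Jackal's implementation.
final class DescriptorTable {
    // Sort a callsite's descriptors; a descriptor's position is its index.
    static String[] build(String... descriptors) {
        if (descriptors.length > 65536) {
            throw new IllegalArgumentException("more descriptors than 16-bit indexes");
        }
        String[] table = descriptors.clone();
        Arrays.sort(table); // natural (lexicographic) String order
        return table;
    }

    // Index emitted to the checkpoint in place of the descriptor string.
    // The result fits in an unsigned 16-bit field because of the size check.
    static int indexOf(String[] table, String descriptor) {
        int i = Arrays.binarySearch(table, descriptor);
        if (i < 0) {
            throw new IllegalArgumentException("unknown descriptor " + descriptor);
        }
        return i;
    }

    public static void main(String[] args) {
        String[] table = build("C:0@B:1", "C:0@B:0", "+P:1,C:2@B:0", "P:2@B:0", "P:3@B:0");
        for (String d : table) {
            System.out.println(indexOf(table, d) + " " + d);
        }
    }
}
```

With this ordering a receives index 0, y index 1, x index 2, b index 3 and c index 4. Since the descriptor strings for a callsite are identical on every architecture, sorting them yields the same table everywhere, which is why the strings are still needed at compile time even though only the indexes reach the checkpoint.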
3 Performance

Two aspects of performance are important for any checkpointing implementation: the size of the resulting checkpoint (be it written to a file or sent over a network for thread migration) and the time it takes to perform checkpointing and to restore from a checkpoint. We examine the performance of our checkpointer for three small benchmark applications on two architectures. We did not modify the applications except for inserting a single call to start our checkpointer. Everything else is fully automated.

All performance tests were run on both a 2.4 GHz Pentium IV (Linux) and a 900 MHz Itanium II (Linux). The x86 had a 160 GByte IDE disk with 2 MByte cache and 1 GByte of RAM. The IA64 had a SCSI RAID0 with 73 GByte per disk and 10 GByte of RAM.

Table 1 displays the sizes of the generated checkpoint files and how long it took to write them. As the checkpoints are written to disk, each checkpoint includes a copy of the heap reachable from the thread that requested checkpointing. As the size of the checkpoint file is independent of the architecture, one table of sizes suffices.

Table 1. Checkpoint file sizes and checkpointing wall time (seconds).

[Table 1, part A) Checkpoint sizes: columns #checkpoints, total size (KBytes), total size compressed (KBytes); rows Fib(17), Matrix, Water. Part B) Checkpointing run times: columns checkpoint, compressed, no checkpoint, restore, given separately for x86 and IA64; rows Fib(17), Matrix, Water.]

To ensure that the checkpointing implementation works correctly, each application is given the checkpoint file generated on the alternate architecture when performing restoration. Again, note that the IA64 has many more registers than the x86 and has a different pointer size (64 bit vs 32 bit).

    class Fib {
      public long fib(long n) {
        if (n < 2) {
          checkpoint_this_thread();
          return n;
        }
        return fib(n-1) + fib(n-2);
      }
      public static void main(String args[]) {
        System.out.println("fib = " + new Fib().fib(17));
      }
    }

Fig. 4. Fibonacci example.

Fibonacci. Fibonacci recursively computes Fibonacci numbers (see Figure 4). A checkpoint is made in each leaf of the recursion to create a large number of checkpoints. On the x86, a single checkpoint takes 0.18 milliseconds (( seconds)/2584 checkpoints) including file I/O. Restoring a checkpoint takes about 0.7 ms for a 5020 byte checkpoint file. Compression (Section 2.3) reduces the size of the checkpoint by about 15% on average. Checkpoint construction time is minimal compared to the file I/O time. As Fibonacci uses only one object on the heap, heap checkpointing times are zero. On the IA64, checkpointing is faster because of better disk performance. Overall, the IA64 is slower because of its lower clock speed (900 MHz vs 2.4 GHz).
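The figure of 2584 checkpoints follows from the shape of the recursion in Figure 4: one checkpoint is taken per call that reaches the n < 2 branch. The small sketch below counts those calls; FibLeafCount and countLeafCalls are hypothetical helpers written for this check and are not part of the benchmark.

```java
// Hypothetical helper that counts how often fib(n) from Figure 4 would
// reach its checkpointing branch (n < 2).
final class FibLeafCount {
    static long countLeafCalls(long n) {
        if (n < 2) {
            return 1; // this call is a leaf and would take one checkpoint
        }
        // An inner call takes no checkpoint itself; it only forwards the
        // counts of its two recursive calls.
        return countLeafCalls(n - 1) + countLeafCalls(n - 2);
    }

    public static void main(String[] args) {
        System.out.println(countLeafCalls(17)); // prints 2584
    }
}
```

The leaf count obeys the Fibonacci recurrence with both base cases equal to 1, so for fib(17) it equals the 18th Fibonacci number, 2584.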

Matrix. The 2D array benchmark is designed to test how well the checkpointing routines perform when writing large volumes of data. In total, five arrays of arrays (1024x1024) are written to disk for a total of 40+ MBytes. Compression of the usage descriptor strings does not gain much, as there are only 12 variables in total over all activation records. Virtually no computation is performed in this benchmark; all time is spent in the runtime system and kernel. Checkpoint and restore are much slower on the IA64 than on the x86 because of slower clock and memory speeds, which impact the speed at which objects can be allocated.

Water. Water performs an N-body simulation of a number of water molecules. A water molecule is coded as a 4D array of doubles holding position, force, velocity and acceleration. Our input data set contains 1728 water molecules. A checkpoint is made after each time step and contains all molecules, a thread object, a force computation object, and some small arrays to maintain global state. There is very little stack to checkpoint, but the effort to enable checkpointing was zero: no code needed to be written to explicitly write the molecules to the checkpoint file. Two checkpoint files of 4.2 MBytes each are created. Compression of the usage descriptor strings aids little in reducing checkpoint file sizes, as the number of live variables when checkpointing is small. On both machines, checkpointing costs are less than 1% of the total runtime.

4 Related Work

There are many packages and systems that offer checkpointing services to applications [1, 2, 5]. Most packages do not support heterogeneity at all. Heterogeneity in this context means that a checkpoint created on one architecture can be restored on another architecture. In general, packages that do support heterogeneity have high runtime overheads due to code instrumentation or employ very complex and error-prone implementation techniques.
Related work can be roughly divided into two classes: systems that use a compiler and systems that implement their checkpointing algorithm inside a library or the OS. Although library/OS checkpointing packages are simple to implement, they do not support heterogeneity. Library approaches include, for example, Condor [3] and libckpt [4].

Bouchenak et al. [2] created a system for Java thread serialization based on decompilation for Java JITs. JIT-generated code is decompiled to the same format that the interpreter would use for its stackframes. This ensures that the checkpointer only has to deal with the Java operand stack in interpreted form. However, with increasing levels of optimization, the process of decompilation will become increasingly difficult.

Porch [1] is a project to create a small preprocessor for C programs to allow heterogeneous checkpointing. The preprocessor instruments each function of a program with some extra code to test whether that function is to perform restoration, perform checkpointing, or execute the actual code of that function as normal. However, the code introduced to perform the checkpointing and restoration causes substantial overhead during normal program execution.

PREACHES [6] offers heterogeneous checkpointing. Instead of creating a single generic checkpoint suitable for all architectures, PREACHES creates checkpoints suitable for each architecture that the user might wish to restore the checkpoint on. The downside is that at all times during a program's execution, a machine of each different architecture needs to be available to perform the conversion of stackframes to that architecture.

5 Conclusions

We have described an algorithm in which a compiler creates two extra functions for each callsite to save and restore the state of the activation record at that point. When not performing checkpointing, our algorithm has no overhead. When checkpointing is enabled, our algorithm generates portable checkpoint files: a checkpoint that is created on one architecture can be restored on another by using activation record variable descriptors. The variable usage descriptors are portable entities, as they describe how a variable is used rather than describing its location. The checkpoint file sizes are moderate and can be decreased further by our proposed simple compression scheme.

References

1. B. Ramkumar and V. Strumpen. Portable checkpointing for heterogeneous architectures. In 27th International Symposium on Fault-Tolerant Computing - Digest of Papers, pages 58-67.
2. S. Bouchenak, D. Hagimont, and N. De Palma. Techniques for Implementing Efficient Java Thread Serialization. In ACS/IEEE International Conference on Computer Systems and Applications (AICCSA 03), July.
3. M. Litzkow and M. Solomon. Supporting Checkpointing and Process Migration Outside the UNIX Kernel. In Usenix Conference Proceedings, January.
4. J.S. Plank, M. Beck, G. Kingsley, and K. Li. libckpt: Transparent checkpointing under Unix. Technical Report UT-CS.
5. P. Smith and N.C. Hutchinson. Heterogeneous process migration: The Tui system. Software Practice and Experience, 28(6).
6. Kuo-Feng Ssu and W. Kent Fuchs. PREACHES - portable recovery and checkpointing in heterogeneous systems. In Symposium on Fault-Tolerant Computing, pages 38-47.
7. R. Veldema, R. F. H. Hofman, R. A. F. Bhoedjang, and H. E. Bal. Runtime optimizations for a Java DSM implementation. In 2001 joint ACM-ISCOPE Conference on Java Grande, Palo Alto, CA, 2001.

What is checkpoint. Checkpoint libraries. Where to checkpoint? Why we need it? When to checkpoint? Who need checkpoint?

What is checkpoint. Checkpoint libraries. Where to checkpoint? Why we need it? When to checkpoint? Who need checkpoint? What is Checkpoint libraries Bosilca George bosilca@cs.utk.edu Saving the state of a program at a certain point so that it can be restarted from that point at a later time or on a different machine. interruption

More information

An Approach to Heterogeneous Process State Capture / Recovery to Achieve Minimum Performance Overhead During Normal Execution

An Approach to Heterogeneous Process State Capture / Recovery to Achieve Minimum Performance Overhead During Normal Execution An Approach to Heterogeneous Process State Capture / Recovery to Achieve Minimum Performance Overhead During Normal Execution Prashanth P. Bungale +, Swaroop Sridhar + Department of Computer Science The

More information

Space-Efficient Page-Level Incremental Checkpointing *

Space-Efficient Page-Level Incremental Checkpointing * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 22, 237-246 (2006) Space-Efficient Page-Level Incremental Checkpointing * JUNYOUNG HEO, SANGHO YI, YOOKUN CHO AND JIMAN HONG + School of Computer Science

More information

A Behavior Based File Checkpointing Strategy

A Behavior Based File Checkpointing Strategy Behavior Based File Checkpointing Strategy Yifan Zhou Instructor: Yong Wu Wuxi Big Bridge cademy Wuxi, China 1 Behavior Based File Checkpointing Strategy Yifan Zhou Wuxi Big Bridge cademy Wuxi, China bstract

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 16 - Checkpointing I Chapter 6 - Checkpointing Part.16.1 Failure During Program Execution Computers today are much faster,

More information

the Cornell Checkpoint (pre-)compiler

the Cornell Checkpoint (pre-)compiler 3 the Cornell Checkpoint (pre-)compiler Daniel Marques Department of Computer Science Cornell University CS 612 April 10, 2003 Outline Introduction and background Checkpointing process state Checkpointing

More information

CA Compiler Construction

CA Compiler Construction CA4003 - Compiler Construction David Sinclair When procedure A calls procedure B, we name procedure A the caller and procedure B the callee. A Runtime Environment, also called an Activation Record, is

More information

System Software Assignment 1 Runtime Support for Procedures

System Software Assignment 1 Runtime Support for Procedures System Software Assignment 1 Runtime Support for Procedures Exercise 1: Nested procedures Some programming languages like Oberon and Pascal support nested procedures. 1. Find a run-time structure for such

More information

processes based on Message Passing Interface

processes based on Message Passing Interface Checkpointing and Migration of parallel processes based on Message Passing Interface Zhang Youhui, Wang Dongsheng, Zheng Weimin Department of Computer Science, Tsinghua University, China. Abstract This

More information

LLVM code generation and implementation of nested functions for the SimpliC language

LLVM code generation and implementation of nested functions for the SimpliC language LLVM code generation and implementation of nested functions for the SimpliC language Oscar Legetth Lunds University dat12ole@student.lth.se Gustav Svensson Lunds University dat12gs1@student.lth.se Abstract

More information

Processes. Johan Montelius KTH

Processes. Johan Montelius KTH Processes Johan Montelius KTH 2017 1 / 47 A process What is a process?... a computation a program i.e. a sequence of operations a set of data structures a set of registers means to interact with other

More information

A process. the stack

A process. the stack A process Processes Johan Montelius What is a process?... a computation KTH 2017 a program i.e. a sequence of operations a set of data structures a set of registers means to interact with other processes

More information

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington

Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems. Robert Grimm University of Washington Separating Access Control Policy, Enforcement, and Functionality in Extensible Systems Robert Grimm University of Washington Extensions Added to running system Interact through low-latency interfaces Form

More information

Short Notes of CS201

Short Notes of CS201 #includes: Short Notes of CS201 The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with < and > if the file is a system

More information

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution

EECE.3170: Microprocessor Systems Design I Summer 2017 Homework 4 Solution 1. (40 points) Write the following subroutine in x86 assembly: Recall that: int f(int v1, int v2, int v3) { int x = v1 + v2; urn (x + v3) * (x v3); Subroutine arguments are passed on the stack, and can

More information

CS201 - Introduction to Programming Glossary By

CS201 - Introduction to Programming Glossary By CS201 - Introduction to Programming Glossary By #include : The #include directive instructs the preprocessor to read and include a file into a source code file. The file name is typically enclosed with

More information

Call Paths for Pin Tools

Call Paths for Pin Tools , Xu Liu, and John Mellor-Crummey Department of Computer Science Rice University CGO'14, Orlando, FL February 17, 2014 What is a Call Path? main() A() B() Foo() { x = *ptr;} Chain of function calls that

More information

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation

ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation ECE902 Virtual Machine Final Project: MIPS to CRAY-2 Binary Translation Weiping Liao, Saengrawee (Anne) Pratoomtong, and Chuan Zhang Abstract Binary translation is an important component for translating

More information

Chapter 7 The Potential of Special-Purpose Hardware

Chapter 7 The Potential of Special-Purpose Hardware Chapter 7 The Potential of Special-Purpose Hardware The preceding chapters have described various implementation methods and performance data for TIGRE. This chapter uses those data points to propose architecture

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Compilers and Code Optimization EDOARDO FUSELLA

Compilers and Code Optimization EDOARDO FUSELLA Compilers and Code Optimization EDOARDO FUSELLA Contents Data memory layout Instruction selection Register allocation Data memory layout Memory Hierarchy Capacity vs access speed Main memory Classes of

More information

CSCI-1200 Data Structures Spring 2018 Lecture 7 Order Notation & Basic Recursion

CSCI-1200 Data Structures Spring 2018 Lecture 7 Order Notation & Basic Recursion CSCI-1200 Data Structures Spring 2018 Lecture 7 Order Notation & Basic Recursion Review from Lectures 5 & 6 Arrays and pointers, Pointer arithmetic and dereferencing, Types of memory ( automatic, static,

More information
