Overhead-Free Portable Thread-Stack Checkpoints
Ronald Veldema and Michael Philippsen
University of Erlangen-Nuremberg, Computer Science Department 2, Martensstr., Erlangen, Germany
{veldema, philippsen}@cs.fau.de

Abstract. Checkpointing is the process of taking a snapshot of a thread's stack, and possibly the objects that it uses, such that a thread can be either restarted (for error recovery) or moved to another machine (to improve load balancing). Current approaches to thread-stack checkpointing are either not heterogeneous, as they do not allow a call stack created on architecture X to be restored on a machine with architecture Y, or they introduce large runtime overhead. In general, previous approaches add overhead by instrumenting each function in a program to constantly test whether the current method invocation is for thread restoration purposes or is a normal invocation. The instrumentation costs are incurred even when no checkpointing is performed. Our implementation introduces no runtime overhead during regular execution. Furthermore, our approach supports heterogeneity. We implement this by letting our compiler create extra functions that portably save and rebuild activation records to and from a machine-independent format. Each variable of an activation record is described in terms of its usages in a variable usage descriptor string. As the computed variable usage descriptor strings for a given variable are the same on all architectures, they are used to uniquely identify variables inside activation records across different architectures.

1 Introduction

Checkpointing a thread is the process of taking a snapshot of all activation records that form the thread's call stack, and of the objects on the heap reachable from them, for later restoration on another machine. There are many uses for checkpointing.
For example: to migrate a thread from one machine to another, to directly utilize resources or special features that a specific machine has, or to provide rollback fault tolerance. Many checkpointing packages (including the system presented here) give the programmer access to a checkpoint_this_thread() function. Invoking this function causes all relevant information of the current thread, such as the thread's stack, active registers, and accessible heap, to be saved to disk. Later, the program can be restarted using a special command line parameter that causes the thread to be completely restored. To the program, it then seems as if control has just returned from checkpoint_this_thread(), as if nothing had happened.

Creating a fully heterogeneous checkpointing package is difficult because of the many architectural differences. First of all, local variables and parameters are stored in different physical locations and binary formats on different processors. Also, one machine can have more registers than another. If this occurs, fewer variables are allocated in
memory and more are allocated in registers (but the total number of variables allocated in registers and memory remains the same). Variables that are stored in memory can also be stored at different locations inside the stackframes. We therefore cannot perform a simple bitwise copy of an entire activation record from one machine to another. Likewise, the layout in memory of an object can vary between architectures.

We have implemented our checkpointing algorithm in Jackal [7]. Jackal includes an aggressively optimizing static compiler: the compiler accepts Java source code and generates an executable. Jackal also supports heterogeneous clusters; objects and pointers can already be freely exchanged between different types of machines. Our new checkpointing algorithm solves all the problems mentioned above: a function's activation record can now be converted to a machine-independent format using a novel way to describe the individual variables. This allows a checkpoint to be restored on a different architecture than the one where the checkpoint was taken, while imposing no overhead during normal execution. Although the techniques described here are implemented in a Java system, they are applicable to any programming language that is type-safe and has no function pointers.

2 Implementation

The central problem in heterogeneous checkpointing is how to translate an activation record created on one architecture to the activation record on another architecture without changing the code on either architecture. That is, how do we associate variable X of function F, allocated in register or stackframe position P, with the X of F on a different architecture, where X is allocated in stackframe position or register Q? For example, on an x86, a source code variable might be spilled to memory at -8(ebp), while on an IA64 the variable is allocated in register loc0. Unfortunately, checkpointing requires such low-level details, as it needs to save and restore a stackframe in a machine-dependent way.
Alas, source code variable names cannot be used to reassociate the variables with their physical locations, as naming information is lost with increasing optimization levels. Compilers that support simultaneous debugging and optimization provide only approximations to the debugger in such cases, which won't suffice for our purposes.

2.1 Stack Checkpointing and Restoration

A machine-independent description of the variables inside an activation record is needed. Our solution is to uniquely identify variables by characterizing how each register or memory location in a stackframe is used. This characterization is portable, as it uniquely identifies variables without looking at how the variables are physically stored. A characterization of a variable (either in a register or in a memory location) is initially in the form of a usage descriptor string. As checkpointing can only occur at a call instruction (due to calling checkpoint_this_thread directly or indirectly), we only need to create a usage descriptor string for each live variable at each callsite. The rule for usage descriptor strings is then: if on architectures X and Y the descriptor strings for variables v1 and v2 are the same, then the variables represent the
same variable. At runtime, the checkpoint file then consists of a series of tuples {usage descriptor string, value of variable in universal binary format}.

Our checkpointing algorithm operates in two passes at compile time. After all the machine-independent optimizations, we create lists of the above tuples for each live variable at each callsite. After code generation and machine-dependent optimizations, we generate two helper functions: checkpoint(C in F), which checkpoints function F at callsite C, and restore(C in F), which performs the reverse operation. The original callsites and functions are unaffected (pseudocode for checkpoint(C in F) and restore(C in F) is shown in Figure 3).

At runtime, checkpointing unwinds the stack from checkpoint_this_thread upward toward the thread's run method. For each activation record, we locate checkpoint(C in F) for that callsite by a hash lookup on the callsite's address. checkpoint(C in F) then outputs, for each live variable V, a tuple {descriptor(V), value(V)}. As checkpoint(C in F) and restore(C in F) are generated machine-dependently for that specific callsite, they know where each variable is physically located.

Restoration recursively restores activation records until the whole call chain is restored. The restoration process for a single activation record starts by reading a single callsite descriptor string C in F. Next, restore(C in F) is located and invoked. restore(C in F) first reads the activation record's complete list of tuples {descriptor(L), value(L)}. For each live variable V to restore, it finds value(V) by searching for a matching descriptor(V). restore(C in F) then converts value(V) and assigns it to the right location, in either memory or a register, as the code generated for F requires. After all formal parameters and live variables have been initialized, the activation record information for the next stack frame is read from the checkpoint (file). This continues until all activation records have been restored.
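The tuple matching described above can be illustrated with a small Java sketch. It models one activation record of foo at the callsite of zoo on two simulated architectures. Everything except the descriptor strings and the locations -8(ebp) and loc0 is an assumption made for illustration: the class and method names, the Map-based frames, the other location names, and the use of long as a stand-in for the universal binary format. Jackal emits machine-specific helper code rather than Java, and stack unwinding and control transfer are not modelled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model: a frame maps a physical location name to a value.
final class TupleCheckpointSketch {

    // Stand-in for checkpoint(zoo in foo) on the simulated x86.
    // It knows the x86 locations and emits {descriptor, value} tuples.
    static Map<String, Long> checkpointX86(Map<String, Long> frame) {
        Map<String, Long> tuples = new LinkedHashMap<>();
        tuples.put("+P:1,C:2@B:0", frame.get("-8(ebp)"));  // a (location from the text)
        tuples.put("C:0@B:1", frame.get("-12(ebp)"));      // x (assumed location)
        tuples.put("C:0@B:0", frame.get("esi"));           // y (assumed location)
        return tuples;
    }

    // Stand-in for restore(zoo in foo) on the simulated IA64.
    // It knows the IA64 locations and looks values up by descriptor only.
    static Map<String, Long> restoreIa64(Map<String, Long> tuples) {
        Map<String, Long> frame = new LinkedHashMap<>();
        frame.put("loc0", find(tuples, "+P:1,C:2@B:0"));   // a (location from the text)
        frame.put("loc1", find(tuples, "C:0@B:1"));        // x (assumed location)
        frame.put("loc2", find(tuples, "C:0@B:0"));        // y (assumed location)
        return frame;
    }

    // Search the tuple list for a matching descriptor.
    static Long find(Map<String, Long> tuples, String descriptor) {
        Long value = tuples.get(descriptor);
        if (value == null) {
            throw new IllegalStateException("no tuple for " + descriptor);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, Long> x86Frame = new LinkedHashMap<>();
        x86Frame.put("-8(ebp)", 7L);
        x86Frame.put("-12(ebp)", 3L);
        x86Frame.put("esi", 1L);
        Map<String, Long> ia64Frame = restoreIa64(checkpointX86(x86Frame));
        System.out.println(ia64Frame); // {loc0=7, loc1=3, loc2=1}
    }
}
```

Neither helper refers to the other architecture's locations; the descriptor string is the only shared key, which is what lets the two sides be generated independently.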
After restoration, the call stack looks like a series of restore functions, each calling the next. Transfer of control after a stackframe has been restored is implemented by a jump directly to the position in the function that was checkpointed. However, when that function returns, it returns to the generated restore function of the invoking activation record. That restore function in turn immediately jumps to the originally invoked function, and so on. Pseudocode for restore(C in F) and checkpoint(C in F) is shown in Figure 3 for an IA64 restorer and a matching x86 checkpointer. As can be seen in the example, the checkpointer creates tuples keyed on fixed descriptor strings that the restore function uses to locate the correct value to put in a given physical location. The next section explains how the arguments to the tuple read and write functions are constructed.

2.2 Variable Descriptor Strings

The key idea of our checkpointing algorithm is the concept of the usage descriptor string. A usage descriptor string describes machine-independently, for each variable, how that variable is used inside a procedure. To ease the construction of these strings, they are created after machine-independent optimizations (see Figure 1). The descriptor string is created by traversing the Control Flow Graph (CFG) to find all usages of a variable. When building a descriptor for a variable A, we first search for its definitions. Whenever a usage of A is found, one of the rules below is applied:
[Figure 1: pipeline diagram with the stages: machine-independent optimizations; create variable descriptors; machine-specific optimizations (x86 / IA64); create methods checkpoint(C in F) and restore(C in F); assemble x86 / IA64 binary.]
Fig. 1. The compiler's pipeline.

    Java code                                   basic block
    void foo(int a, Object b, int c) {          0
        a = a + 2;                              0
        int y = 0;                              0
        do {
            int x = 0;                          1
            do {
                zoo(a, b, c);                   2
                // live variables = {a,b,c,x,y} 2
            } while (x++ < 10);                 2
        } while (y++ < 10);                     3
    }
Fig. 2. Example: Live Variables and Usage Descriptors.

    // x86: a is in memory at -8(ebp)
    checkpoint_zoo_in_foo(stackframe_info f):
        // allow the correct restorer to be found for frame:
        write_name_string("restore_zoo_in_foo");
        write_tuple("+P:1,C:2@B:0", f->mem("-8(ebp)"));
        // same for b, c, x, y, etc.

    // IA64: a is in register loc0
    restore_zoo_in_foo:
        tuples[] t = read_tuples();
        loc0 = find_tuple(t, "+P:1,C:2@B:0")->value;
        // ... same as above for b, c, x, y, etc.
        call_next_restorer(read_name_string());
        jump to insn after call to zoo in foo;
Fig. 3. Generated checkpoint and restore functions for Figure 2.

1. Upon encountering an assignment A = constant, modify the descriptor string as follows: string(A) = string(A) + C:<constant>
2. Upon encountering A = B, continue the search for usages of variable B.
3. When A is the return value of a call, A = call, modify the string as follows: string(A) = string(A) + call:<index of call in all calls inside the containing function>
4. When A is assigned the value of a formal parameter, A = param(x), append the parameter's index in the parameter list to the string: string(A) = string(A) + P:<index of param(x)>
5. When A is given the value of an object field, A = object_access(expression, field), change the descriptor as follows: string(A) = string(A) + access:field, and continue building the descriptor with the variables in expression.
6. When encountering a generic assignment such as A = B op C, where op is one of the binary operators such as +, -, *, /, append <op>, string(B) and string(C) to the string of A.
7. When making a modification to a usage descriptor string using one of the above rules, add a basic block identifier to the string: string(A) = string(A) + @B:<basic block number>

As an example, let us construct the usage descriptor strings for the live variables in the code given in Figure 2. At zoo in foo, the compiler computes the set of live variables (a, b, c, x, y). The analysis pass takes this set and performs a traversal over the control flow graph of the function.
During the traversal, the initialization of x is encountered and the usage descriptor string is updated by appending the string C:0 (rule 1). That happens in basic block 1, thus @B:1 is appended to the descriptor string (rule 7). Likewise, the initialization of y in basic block 0 results in the string C:0@B:0. The assignment to a is more complex. Evaluation of a + 2 delivers the string +P:1,C:2@B:0 (rules 6 and 7): the + from the addition operation, the C:2 from the constant, and the P:1 because a is a formal parameter (rule 4). Variables b and c are formal parameters 2 and 3, thus they receive the strings P:2@B:0 and P:3@B:0 respectively. During code generation and machine-specific optimization, these variables are renamed along with the actual variables. During register allocation, for example, replacing register a with memory location b also replaces the a in the live variable set.

2.3 Compressing Variable Descriptor Strings

One potential performance problem with the above descriptor strings is the length of the strings that need to be saved in the checkpoint (file). For a large procedure with many variables, each of which is used many times, large strings may result. This can cause large checkpoint files, or large amounts of data to be sent over the network when migrating a thread. In high-performance computing, where thread migration latency matters, this needs to be avoided. To combat this effect, the compiler can sort the list of descriptor strings created for a given callsite and assign each resulting string a 16-bit index in the sorted table. The descriptor strings are then replaced by 16-bit table indexes. Most importantly, the table indexes are emitted to the stream instead of the descriptor strings themselves. This transformation is correct, as we are replacing unique strings with unique identifiers. Notice that we need the descriptor strings to generate the integers; we cannot construct the integers directly.
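As a minimal sketch of Sections 2.2 and 2.3, the following Java code composes the descriptor strings of the worked example from rules 1, 4, 6 and 7 and then assigns table indexes by sorting. The class and method names are hypothetical, and the CFG traversal that decides which rule applies is not modelled; the rule applications for foo are written out by hand. The separator between the two operand strings in rule 6 is taken from the example string +P:1,C:2@B:0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helpers mirroring the descriptor rules and the index table.
final class DescriptorSketch {

    // Rule 1: A = constant appends C:<constant>.
    static String constant(long c) { return "C:" + c; }

    // Rule 4: A = param(x) appends P:<index of param(x)>.
    static String param(int index) { return "P:" + index; }

    // Rule 6: A = B op C appends <op>, string(B) and string(C).
    static String binary(String op, String left, String right) {
        return op + left + "," + right;
    }

    // Rule 7: tag the modification with its basic block.
    static String inBlock(String descriptor, int block) {
        return descriptor + "@B:" + block;
    }

    // Section 2.3: sort the descriptors of one callsite and give each
    // string its position in the sorted table as a 16-bit index.
    static Map<String, Integer> compress(List<String> descriptors) {
        List<String> sorted = new ArrayList<>(descriptors);
        Collections.sort(sorted);
        if (sorted.size() > 0x10000) {
            throw new IllegalArgumentException("more than 2^16 descriptors");
        }
        Map<String, Integer> table = new LinkedHashMap<>();
        for (int i = 0; i < sorted.size(); i++) {
            table.put(sorted.get(i), i);
        }
        return table;
    }

    // Descriptors of the live variables a, b, c, x, y at "zoo in foo".
    static List<String> fooDescriptors() {
        return Arrays.asList(
            inBlock(binary("+", param(1), constant(2)), 0), // a = a + 2
            inBlock(param(2), 0),                           // b
            inBlock(param(3), 0),                           // c
            inBlock(constant(0), 1),                        // x = 0 in block 1
            inBlock(constant(0), 0));                       // y = 0 in block 0
    }

    public static void main(String[] args) {
        System.out.println(compress(fooDescriptors()));
    }
}
```

Because the table is derived only from the set of strings, a back end that produces the same descriptor strings would arrive at the same indexes regardless of the order in which it encounters the variables. That is an inference from the sorting step, not something the text states explicitly.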
3 Performance

Two aspects of performance are important for any checkpointing implementation: the size of the resulting checkpoint (be it a file or data sent over a network for thread migration) and the time it takes to perform checkpointing and to restore from a checkpoint. For three small benchmark applications and two architectures, we examine the performance of our checkpointer. We did not modify the applications except for inserting a single call to start our checkpointer. Everything else is fully automated. All performance tests were run on both a 2.4 GHz Pentium IV (Linux) and a 900 MHz Itanium II (Linux). The x86 had a 160 GByte IDE disk with 2 MByte cache and 1 GByte of RAM. The IA64 had a SCSI RAID0 with 73 GByte per disk and 10 GByte of RAM. Table 1 displays the sizes of the generated checkpoint files and how long it took to write them. As the checkpoints are written to disk, each checkpoint includes a copy of the heap reachable from the thread that requested checkpointing. As the size of the checkpoint file is independent of the architecture, one table suffices.
Table 1. Checkpoint file sizes and checkpointing wall time (seconds).
A) Checkpoint sizes. Columns: #checkpoints, total size (KBytes), total size compressed (KBytes). Rows: Fib(17), Matrix, Water.
B) Checkpointing run times. Columns, for x86 and for IA64: checkpoint, compressed, no checkpoint, restore. Rows: Fib(17), Matrix, Water.

To ensure that the checkpointing implementation works correctly, each application is given the checkpoint file generated on the alternate architecture when performing restoration. Again, note that the IA64 has many more registers than the x86 and has a different pointer size (64 bit vs. 32 bit).

    class Fib {
        public long fib(long n) {
            if (n < 2) {
                checkpoint_this_thread();
                return n;
            }
            return fib(n-1) + fib(n-2);
        }
        public static void main(String args[]) {
            System.out.println("fib = " + new Fib().fib(17));
        }
    }
Fig. 4. Fibonacci example.

Fibonacci. Fibonacci recursively computes Fibonacci numbers (see Figure 4). A checkpoint is made in each leaf of the recursion to create a large number of checkpoints. On the x86, a single checkpoint takes 0.18 milliseconds (( seconds)/2584 checkpoints) including file I/O. Restoring a checkpoint takes about 0.7 ms for a 5020 byte checkpoint file. Compression (Section 2.3) reduces the size of the checkpoint by about 15% on average. Checkpoint construction time is minimal compared to the file I/O time. As Fibonacci uses only one object on the heap, heap checkpointing times are zero. On the IA64, checkpointing is faster because of better disk performance. Overall, the IA64 is slower because of its lower clock speed (900 MHz vs. 2.4 GHz).
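The figure of 2584 checkpoints can be cross-checked by counting the leaf invocations of the recursion in Figure 4, since one checkpoint is taken per leaf. The sketch below does this without any checkpointing runtime; the class name and the countLeaves helper are introduced here for illustration only.

```java
// Counts how often the n < 2 branch of the Fibonacci recursion is reached.
final class FibLeafCount {

    // Same recursion as Figure 4, without the checkpoint call.
    static long fib(long n) {
        if (n < 2) {
            return n;
        }
        return fib(n - 1) + fib(n - 2);
    }

    // Number of leaf calls (n < 2) made while evaluating fib(n).
    static long countLeaves(long n) {
        if (n < 2) {
            return 1; // this is where checkpoint_this_thread() would be called
        }
        return countLeaves(n - 1) + countLeaves(n - 2);
    }

    public static void main(String[] args) {
        // prints: fib(17) = 1597, leaf calls = 2584
        System.out.println("fib(17) = " + fib(17) + ", leaf calls = " + countLeaves(17));
    }
}
```

The leaf count for fib(17) equals the 18th Fibonacci number, which matches the number of checkpoints used as the divisor in the per-checkpoint time above.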
Matrix. The 2D array benchmark is designed to test how well the checkpointing routines perform when writing large volumes of data. In total, five arrays of arrays (1024x1024) are written to disk for a total of 40+ MBytes. Compression of the usage descriptor strings does not gain much, as there are only 12 variables in total over all activation records. Virtually no computation is performed in this benchmark; all time is spent in the runtime system and kernel. Checkpoint and restore are much slower on the IA64 than on the x86 because of slower clock and memory speeds, which affect the speed at which objects can be allocated.

Water. Water performs an N-body simulation of a number of water molecules. A water molecule is coded as a 4D array of doubles holding position, force, velocity and acceleration. Our input data set contains 1728 water molecules. A checkpoint is made after each time step and contains all molecules, a thread object, a force computation object, and some small arrays to maintain global state. There is very little stack to checkpoint, but the effort to enable checkpointing was zero: no code needed to be written to explicitly write the molecules to the checkpoint file. Two checkpoint files of 4.2 MBytes each are created. Compression of the usage descriptor strings helps little in reducing checkpoint file sizes, as the number of live variables when checkpointing is small. On both machines, checkpointing costs are less than 1% of the total runtime.

4 Related Work

There are many packages and systems that offer checkpointing services to applications [1, 2, 5]. Most packages do not support heterogeneity at all. Heterogeneity in this context means that a checkpoint created on one architecture can be restored on another architecture. In general, packages that do support heterogeneity have high runtime overheads due to code instrumentation, or employ very complex and error-prone implementation techniques.
Related work can be roughly divided into two classes: systems that use a compiler and systems that implement their checkpointing algorithm inside a library or the OS. Although library/OS checkpointing packages are simple to implement, they do not support heterogeneity. Library approaches include, for example, Condor [3] and libckpt [4].

Bouchenak et al. [2] created a system for Java thread serialization based on decompilation for Java JITs. JIT-generated code is decompiled to the same format that the interpreter would use for its stackframes. This ensures that the checkpointer only has to deal with the Java operand stack in interpreted form. However, with increasing levels of optimization, the process of decompilation becomes increasingly difficult.

Porch [1] is a project to create a small preprocessor for C programs to allow heterogeneous checkpointing. The preprocessor instruments each function of a program with extra code to test whether that function is to perform restoration, perform checkpointing, or execute its actual code as normal. However, the code introduced to perform the checkpointing/restoration causes substantial overhead during normal program execution.

PREACHES [6] offers heterogeneous checkpointing. Instead of creating a single generic checkpoint suitable for all architectures, PREACHES creates checkpoints suitable for each architecture that the user might wish to restore the checkpoint on. The
downside is that at all times during a program's execution, a machine of each different architecture needs to be available to perform the conversion of stackframes to each architecture.

5 Conclusions

We have described an algorithm where a compiler creates two extra functions for each callsite to save and restore the state of the activation record at that point. When not performing checkpointing, our algorithm has no overhead. When checkpointing is enabled, our algorithm generates portable checkpoint files: a checkpoint that is created on one architecture can be restored on another by using activation record variable descriptors. The variable usage descriptors are portable entities, as they describe how a variable is used rather than describing its location. The checkpoint file sizes are moderate and can be decreased further by our proposed simple compression scheme.

References

1. B. Ramkumar and V. Strumpen. Portable checkpointing for heterogeneous architectures. In 27th International Symposium on Fault-Tolerant Computing - Digest of Papers, pages 58-67.
2. S. Bouchenak, D. Hagimont, and N. De Palma. Techniques for Implementing Efficient Java Thread Serialization. In ACS/IEEE International Conference on Computer Systems and Applications (AICCSA'03), July.
3. M. Litzkow and M. Solomon. Supporting Checkpointing and Process Migration Outside the UNIX Kernel. In Usenix Conference Proceedings, January.
4. J.S. Plank, M. Beck, G. Kingsley, and K. Li. libckpt: Transparent checkpointing under Unix. Technical Report UT-CS.
5. P. Smith and N.C. Hutchinson. Heterogeneous process migration: The Tui system. Software Practice and Experience, 28(6).
6. Kuo-Feng Ssu and W. Kent Fuchs. PREACHES - portable recovery and checkpointing in heterogeneous systems. In Symposium on Fault-Tolerant Computing, pages 38-47.
7. R. Veldema, R. F. H. Hofman, R. A. F. Bhoedjang, and H. E. Bal. Runtime optimizations for a Java DSM implementation. In 2001 joint ACM-ISCOPE Conference on Java Grande, Palo Alto, CA, 2001.
More informationOVERVIEW. Recursion is an algorithmic technique where a function calls itself directly or indirectly. Why learn recursion?
CH. 5 RECURSION ACKNOWLEDGEMENT: THESE SLIDES ARE ADAPTED FROM SLIDES PROVIDED WITH DATA STRUCTURES AND ALGORITHMS IN JAVA, GOODRICH, TAMASSIA AND GOLDWASSER (WILEY 2016) OVERVIEW Recursion is an algorithmic
More informationAgenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1
Agenda CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Summer 2004 Java virtual machine architecture.class files Class loading Execution engines Interpreters & JITs various strategies
More informationWhy do we care about parallel?
Threads 11/15/16 CS31 teaches you How a computer runs a program. How the hardware performs computations How the compiler translates your code How the operating system connects hardware and software The
More informationSABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine p. 1
SABLEJIT: A Retargetable Just-In-Time Compiler for a Portable Virtual Machine David Bélanger dbelan2@cs.mcgill.ca Sable Research Group McGill University Montreal, QC January 28, 2004 SABLEJIT: A Retargetable
More informationCOS 318: Operating Systems. Overview. Andy Bavier Computer Science Department Princeton University
COS 318: Operating Systems Overview Andy Bavier Computer Science Department Princeton University http://www.cs.princeton.edu/courses/archive/fall10/cos318/ Logistics Precepts: Tue: 7:30pm-8:30pm, 105 CS
More informationa) Do exercise (5th Edition Patterson & Hennessy). Note: Branches are calculated in the execution stage.
CS3410 Spring 2015 Problem Set 2 (version 3) Due Saturday, April 25, 11:59 PM (Due date for Problem-5 is April 20, 11:59 PM) NetID: Name: 200 points total. Start early! This is a big problem set. Problem
More informationMemory management. Last modified: Adaptation of Silberschatz, Galvin, Gagne slides for the textbook Applied Operating Systems Concepts
Memory management Last modified: 26.04.2016 1 Contents Background Logical and physical address spaces; address binding Overlaying, swapping Contiguous Memory Allocation Segmentation Paging Structure of
More informationCSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1
CSE P 501 Compilers Java Implementation JVMs, JITs &c Hal Perkins Winter 2008 3/11/2008 2002-08 Hal Perkins & UW CSE V-1 Agenda Java virtual machine architecture.class files Class loading Execution engines
More informationMigration Transparency in a Mobile Agent Based Computational Grid
Migration Transparency in a Mobile Agent Based Computational Grid RAFAEL FERNANDES LOPES and FRANCISCO JOSÉ DA SILVA E SILVA Departamento de Informática Universidade Federal do Maranhão, UFMA Av dos Portugueses,
More informationFaculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology
Faculty of Electrical Engineering, Mathematics, and Computer Science Delft University of Technology exam Compiler Construction in4020 July 5, 2007 14.00-15.30 This exam (8 pages) consists of 60 True/False
More informationChapter 8: Main Memory
Chapter 8: Main Memory Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and 64-bit Architectures Example:
More informationExperiences Implementing Efficient Java Thread Serialization, Mobility and Persistence
SOFTWARE PRACTICE AND EXPERIENCE Softw. Pract. Exper. 2000; 00:1 7 [Version: 2002/09/23 v2.2] Experiences Implementing Efficient Java Thread Serialization, Mobility and Persistence S. Bouchenak, D. Hagimont,
More informationFile Systems. File system interface (logical view) File system implementation (physical view)
File Systems File systems provide long-term information storage Must store large amounts of data Information stored must survive the termination of the process using it Multiple processes must be able
More informationCS61 Section Solutions 3
CS61 Section Solutions 3 (Week of 10/1-10/5) 1. Assembly Operand Specifiers 2. Condition Codes 3. Jumps 4. Control Flow Loops 5. Procedure Calls 1. Assembly Operand Specifiers Q1 Operand Value %eax 0x104
More informationMemory Space Representation for Heterogeneous Network Process Migration
Memory Space Representation for Heterogeneous Network Process Migration Kasidit Chanchio Xian-He Sun Department of Computer Science Louisiana State University Baton Rouge, LA 70803-4020 sun@bit.csc.lsu.edu
More informationCS307: Operating Systems
CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn
More informationSistemi in Tempo Reale
Laurea Specialistica in Ingegneria dell'automazione Sistemi in Tempo Reale Giuseppe Lipari Introduzione alla concorrenza Fundamentals Algorithm: It is the logical procedure to solve a certain problem It
More informationChapter 8: Memory-Management Strategies
Chapter 8: Memory-Management Strategies Chapter 8: Memory Management Strategies Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel 32 and
More informationThreads (light weight processes) Chester Rebeiro IIT Madras
Threads (light weight processes) Chester Rebeiro IIT Madras 1 Processes Separate streams of execution Each process isolated from the other Process state contains Process ID Environment Working directory.
More informationWRL Research Report 98/5. Efficient Dynamic Procedure Placement. Daniel J. Scales. d i g i t a l
A U G U S T 1 9 9 8 WRL Research Report 98/5 Efficient Dynamic Procedure Placement Daniel J. Scales d i g i t a l Western Research Laboratory 250 University Avenue Palo Alto, California 94301 USA The Western
More informationDistributed File Systems II
Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation
More informationChapter 2. Computer Abstractions and Technology. Lesson 4: MIPS (cont )
Chapter 2 Computer Abstractions and Technology Lesson 4: MIPS (cont ) Logical Operations Instructions for bitwise manipulation Operation C Java MIPS Shift left >>> srl Bitwise
More informationStacks and Frames Demystified. CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han
s and Frames Demystified CSCI 3753 Operating Systems Spring 2005 Prof. Rick Han Announcements Homework Set #2 due Friday at 11 am - extension Program Assignment #1 due Tuesday Feb. 15 at 11 am - note extension
More informationSemantic Analysis and Type Checking
Semantic Analysis and Type Checking The compilation process is driven by the syntactic structure of the program as discovered by the parser Semantic routines: interpret meaning of the program based on
More informationVirtual Machines and Dynamic Translation: Implementing ISAs in Software
Virtual Machines and Dynamic Translation: Implementing ISAs in Software Krste Asanovic Laboratory for Computer Science Massachusetts Institute of Technology Software Applications How is a software application
More informationGroup B Assignment 9. Code generation using DAG. Title of Assignment: Problem Definition: Code generation using DAG / labeled tree.
Group B Assignment 9 Att (2) Perm(3) Oral(5) Total(10) Sign Title of Assignment: Code generation using DAG. 9.1.1 Problem Definition: Code generation using DAG / labeled tree. 9.1.2 Perquisite: Lex, Yacc,
More informationPerformance of Non-Moving Garbage Collectors. Hans-J. Boehm HP Labs
Performance of Non-Moving Garbage Collectors Hans-J. Boehm HP Labs Why Use (Tracing) Garbage Collection to Reclaim Program Memory? Increasingly common Java, C#, Scheme, Python, ML,... gcc, w3m, emacs,
More informationInteraction of JVM with x86, Sparc and MIPS
Interaction of JVM with x86, Sparc and MIPS Sasikanth Avancha, Dipanjan Chakraborty, Dhiral Gada, Tapan Kamdar {savanc1, dchakr1, dgada1, kamdar}@cs.umbc.edu Department of Computer Science and Electrical
More informationCS399 New Beginnings. Jonathan Walpole
CS399 New Beginnings Jonathan Walpole Memory Management Memory Management Memory a linear array of bytes - Holds O.S. and programs (processes) - Each cell (byte) is named by a unique memory address Recall,
More informationCheckpoint (T1) Thread 1. Thread 1. Thread2. Thread2. Time
Using Reection for Checkpointing Concurrent Object Oriented Programs Mangesh Kasbekar, Chandramouli Narayanan, Chita R Das Department of Computer Science & Engineering The Pennsylvania State University
More informationComputer Architecture. Chapter 2-2. Instructions: Language of the Computer
Computer Architecture Chapter 2-2 Instructions: Language of the Computer 1 Procedures A major program structuring mechanism Calling & returning from a procedure requires a protocol. The protocol is a sequence
More informationCSc 453 Interpreters & Interpretation
CSc 453 Interpreters & Interpretation Saumya Debray The University of Arizona Tucson Interpreters An interpreter is a program that executes another program. An interpreter implements a virtual machine,
More informationDarek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June
Darek Mihocka, Emulators.com Stanislav Shwartsman, Intel Corp. June 21 2008 Agenda Introduction Gemulator Bochs Proposed ISA Extensions Conclusions and Future Work Q & A Jun-21-2008 AMAS-BT 2008 2 Introduction
More informationChapter 8 Virtual Memory
Operating Systems: Internals and Design Principles Chapter 8 Virtual Memory Seventh Edition William Stallings Modified by Rana Forsati for CSE 410 Outline Principle of locality Paging - Effect of page
More informationG Programming Languages - Fall 2012
G22.2110-003 Programming Languages - Fall 2012 Lecture 4 Thomas Wies New York University Review Last week Control Structures Selection Loops Adding Invariants Outline Subprograms Calling Sequences Parameter
More informationRun-time Environments
Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction
More informationCHAPTER 8 - MEMORY MANAGEMENT STRATEGIES
CHAPTER 8 - MEMORY MANAGEMENT STRATEGIES OBJECTIVES Detailed description of various ways of organizing memory hardware Various memory-management techniques, including paging and segmentation To provide
More informationRun-time Environments
Run-time Environments Status We have so far covered the front-end phases Lexical analysis Parsing Semantic analysis Next come the back-end phases Code generation Optimization Register allocation Instruction
More informationkguard++: Improving the Performance of kguard with Low-latency Code Inflation
kguard++: Improving the Performance of kguard with Low-latency Code Inflation Jordan P. Hendricks Brown University Abstract In this paper, we introduce low-latency code inflation for kguard, a GCC plugin
More informationLanguage Translation. Compilation vs. interpretation. Compilation diagram. Step 1: compile. Step 2: run. compiler. Compiled program. program.
Language Translation Compilation vs. interpretation Compilation diagram Step 1: compile program compiler Compiled program Step 2: run input Compiled program output Language Translation compilation is translation
More informationChapter 8: Main Memory. Operating System Concepts 9 th Edition
Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel
More informationJVM ByteCode Interpreter
JVM ByteCode Interpreter written in Haskell (In under 1000 Lines of Code) By Louis Jenkins Presentation Schedule ( 15 Minutes) Discuss and Run the Virtual Machine first
More informationProgramming Techniques Programming Languages Programming Languages
Ecient Java RMI for Parallel Programming Jason Maassen, Rob van Nieuwpoort, Ronald Veldema, Henri Bal, Thilo Kielmann, Ceriel Jacobs, Rutger Hofman Department of Mathematics and Computer Science, Vrije
More informationAssignment 11: functions, calling conventions, and the stack
Assignment 11: functions, calling conventions, and the stack ECEN 4553 & 5013, CSCI 4555 & 5525 Prof. Jeremy G. Siek December 5, 2008 The goal of this week s assignment is to remove function definitions
More informationCS1 Recitation. Week 2
CS1 Recitation Week 2 Sum of Squares Write a function that takes an integer n n must be at least 0 Function returns the sum of the square of each value between 0 and n, inclusive Code: (define (square
More informationSemantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End
Outline Semantic Analysis The role of semantic analysis in a compiler A laundry list of tasks Scope Static vs. Dynamic scoping Implementation: symbol tables Types Static analyses that detect type errors
More informationCombining Analyses, Combining Optimizations - Summary
Combining Analyses, Combining Optimizations - Summary 1. INTRODUCTION Cliff Click s thesis Combining Analysis, Combining Optimizations [Click and Cooper 1995] uses a structurally different intermediate
More informationThe Google File System (GFS)
1 The Google File System (GFS) CS60002: Distributed Systems Antonio Bruto da Costa Ph.D. Student, Formal Methods Lab, Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur 2 Design constraints
More informationAdvanced Memory Management
Advanced Memory Management Main Points Applications of memory management What can we do with ability to trap on memory references to individual pages? File systems and persistent storage Goals Abstractions
More informationStackVsHeap SPL/2010 SPL/20
StackVsHeap Objectives Memory management central shared resource in multiprocessing RTE memory models that are used in Java and C++ services for Java/C++ programmer from RTE (JVM / OS). Perspectives of
More informationMechanisms for entering the system
Mechanisms for entering the system Yolanda Becerra Fontal Juan José Costa Prats Facultat d'informàtica de Barcelona (FIB) Universitat Politècnica de Catalunya (UPC) BarcelonaTech 2017-2018 QP Content Introduction
More informationLecture 4: Memory Management & The Programming Interface
CS 422/522 Design & Implementation of Operating Systems Lecture 4: Memory Management & The Programming Interface Zhong Shao Dept. of Computer Science Yale University Acknowledgement: some slides are taken
More informationChapter 8: Main Memory
Chapter 8: Main Memory Silberschatz, Galvin and Gagne 2013 Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Segmentation Paging Structure of the Page Table Example: The Intel
More informationJAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder
JAVA PERFORMANCE PR SW2 S18 Dr. Prähofer DI Leopoldseder OUTLINE 1. What is performance? 1. Benchmarking 2. What is Java performance? 1. Interpreter vs JIT 3. Tools to measure performance 4. Memory Performance
More informationAccelerated Library Framework for Hybrid-x86
Software Development Kit for Multicore Acceleration Version 3.0 Accelerated Library Framework for Hybrid-x86 Programmer s Guide and API Reference Version 1.0 DRAFT SC33-8406-00 Software Development Kit
More informationFrom Whence It Came: Detecting Source Code Clones by Analyzing Assembler
From Whence It Came: Detecting Source Code Clones by Analyzing Assembler Ian J. Davis and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada
More informationProcessors, Performance, and Profiling
Processors, Performance, and Profiling Architecture 101: 5-Stage Pipeline Fetch Decode Execute Memory Write-Back Registers PC FP ALU Memory Architecture 101 1. Fetch instruction from memory. 2. Decode
More informationTransparent Pointer Compression for Linked Data Structures
Transparent Pointer Compression for Linked Data Structures lattner@cs.uiuc.edu Vikram Adve vadve@cs.uiuc.edu June 12, 2005 MSP 2005 http://llvm.cs.uiuc.edu llvm.cs.uiuc.edu/ Growth of 64-bit computing
More information