An Approach to Heterogeneous Process State Capture / Recovery to Achieve Minimum Performance Overhead During Normal Execution

Prashanth P. Bungale +, Swaroop Sridhar +
Department of Computer Science
The Johns Hopkins University
Baltimore, MD 21218, U.S.A.
{ prash, swaroop cs.jhu.edu

Vinay Krishnamurthy
Department of Computer Science & Engg.
Vidyavardhaka College of Engineering
Mysore, INDIA
vinay_krishnamurthy@rediffmail.com

Abstract

A major issue of process state capture in heterogeneous computing systems is capture initiation. Current approaches incur significant performance overhead during normal execution of the process (i.e., when state capture/recovery is not being performed) in order to ensure proper initiation of state capture. The overhead arises because they introduce instructions into the user code, either to poll for a capture request or, in a poll-free mechanism, to keep self-modifying code correct. In this paper, we propose a fundamentally new approach to heterogeneous process state capture and recovery that achieves minimum performance overhead during normal execution by obviating the introduction of such instructions. For high-performance computing applications, the performance gain thus achieved, especially within critical loops, would be significant. Our solution is also suitable for effectively enabling all potential points of equivalence present in a computation when minimal latency is desired.

1. Introduction

With the cost benefits of the mass production of smaller machines, the majority of today's computing power exists in the form of PCs or workstations [1]. Distributed computing systems, comprising several loosely-coupled computers communicating over a high-bandwidth network, are becoming increasingly available to the user community. Some level of heterogeneity is the norm in such systems.
The presence of heterogeneity in such computing systems significantly complicates the design of a process state capture/recovery mechanism, which must automatically capture the state of a running program in some stable form and then restart the program from the point of capture at some later time, possibly on a different platform. A substantial body of research demonstrates the utility and desirability of such a mechanism. For example, process migration policies supporting load sharing and/or fault tolerance can be based on a process state capture facility (e.g., Condor [2]). Also, one common method of providing reliability for applications is checkpoint and restart functionality. If application state is stored periodically (checkpointed), then when some component of the system is lost (due either to failure or to load-balancing requirements) an application can use recently saved computational state to restart without having to regress too far in the computation. Checkpoint/restart functionality thus greatly simplifies providing fault tolerance for applications in these systems.

Beyond these existing uses for process state capture and recovery, the availability of such a mechanism also opens up new possibilities such as improved resource management, platform-independent debugging using checkpoints and message logs to replay a process from a given point in execution, or statically examining the state of a process as captured in a checkpoint [3]. The increasing importance of process state capture and recovery has made the design of such a mechanism a key research issue in heterogeneous computing.

+ This work was done while the authors were at the Dept. of Computer Science & Engg. of The National Institute of Engineering, Mysore, INDIA

2. The Heterogeneous Process State Capture Problem

A mechanism solving the heterogeneous process state capture and recovery problem must be able to generate a checkpoint for an active process (a complete description of that process's state and point in execution) and must also support the later use of that checkpoint to restart a process with equivalent state and at an equivalent point in execution, possibly on a different platform from the one on which the original checkpoint was created [4].

A major issue of process state capture in heterogeneous computing systems is its initiation once the request has been received. The process cannot simply be paused (for capturing its state) at an arbitrary point of its execution; it can be paused only at points that have equivalent points in all instances of the same computation on different platforms [5]. Many possible points of execution equivalence can be identified in any program, but usually not all of these candidates are enforced as actual points of execution equivalence. This is because, to ensure consistency, machine-dependent optimizations must be disabled across enabled points of equivalence, which degrades performance. The selection of the subset of points of equivalence to be enabled is therefore a trade-off between the performance overhead incurred during normal execution and the wait time from request to actual initiation of the capture.

Also, the possibility of cross-platform recovery leads to the most fundamental solution constraint: the mechanism should capture the state in a platform-independent manner, i.e., checkpoints produced on a computer system of any architecture should be recoverable on a system of any other architecture [6]. For example, one straightforward approach is to use an interpreted language.
In these designs, the interpreter acts as a virtual machine that artificially homogenizes a system composed of heterogeneous elements. Unfortunately, such schemes severely compromise performance, since they run at least an order of magnitude slower than their native code counterparts. Therefore, in our intended environment, processes typically execute in native code form on their nodes.

In homogeneous systems, process state capture and recovery mechanisms can simply and directly copy the state of a process verbatim, without semantic analysis or interpretation of that state. Unfortunately, in a heterogeneous environment, the state of a process cannot be captured using this naïve approach because of differences in instruction sets and data representation. To mask the varying features of a process's environment in a heterogeneous system, a state capture mechanism must examine and capture the logical internal structure and meaning of the process state: the logical point in execution, the call stack (or call stacks, if threads are supported), complex data structures, the logical structure and contents of heap-allocated memory, and all other process state must be analyzed and captured in a platform-independent format.

3. Related Work

Although process state capture/recovery mechanisms for homogeneous computing systems are well developed and can now typically be performed with minimal overhead and latency, much less progress has been made towards providing such functionality across heterogeneous architectures. Because of the additional inherent complexity introduced by heterogeneity, very few designs for such a facility have been developed to date.

The Tui system [7] has been constructed to provide a heterogeneous migration tool for use on four common architectures within the Unix environment.
In this system, the capture (and recovery) of the activation history state of a process is performed in a highly machine-dependent manner, with full knowledge of the particular idiosyncratic conventions followed on each of the target platforms. Also, when a process state capture is requested, a breakpoint instruction that traps to the capture mechanism is placed at all pre-emption points (the enabled points of equivalence) in the program to effect capture initiation. In such approaches, on architectures with varying instruction lengths, the breakpoint instruction placed at one pre-emption point may overwrite the instruction placed at another point, or the current instruction (pointed to by the program counter). To avoid this loss of correctness, each pre-emption point has to be padded with enough space to accommodate the breakpoint instruction. This is typically done by placing dummy instructions, which introduce performance overhead during normal execution of the program.

The process introspection model proposed by Ferrari [4] and the portable (or shadow) checkpointing model presented in [11, 12] capture the activation history state in an entirely machine-independent manner, using the call-return semantics provided by the high-level programming language or by the intermediate instruction set. Here, architectural differences are taken care of by the compiler. However, these models follow a polling approach to initiate the process state capture: poll points are introduced into the execution, at which the process determines whether a capture should be initiated. Such an approach incurs a substantial performance overhead during normal execution due to continuous polling.

In applications such as process migration driven by load-balancing policies, or logging mechanisms for fault-tolerant systems, arbitrarily long latencies are not acceptable. In the case of load balancing, for example, the very purpose of migrating the process is to reduce the load on the system as soon as possible. For such applications with a minimal latency requirement, almost all potential points of equivalence present in a computation must be effectively enabled. A polling approach is not suitable, because the performance overhead of persistent polling at all such points would reach unacceptable levels.

4. Our Approach

4.1. Objectives

a. Efficiency: In addition to providing efficient state capture and recovery, the mechanism should keep the run-time performance overhead it introduces at acceptable levels. In particular, if no checkpoints are performed during a certain period of execution, a process with the state capture and recovery service available to it should not run significantly slower than the version of the code without this service over the same period.
b. Generality: The mechanism should be appropriate for use with a wide range of architectures and a wide variety of programs that are written in a variety of languages and that solve a wide range of problems.
c. Suitability for Minimal Latency: The state capture mechanism should be suitable even for ensuring the minimum possible latency (the time delay between when a capture is scheduled or requested and when the capture is actually initiated), which is the time taken to reach the very next possible point of equivalence in the computation.
d. Ease of Use: When possible, the mechanism should be fully automatic, requiring little or no effort on the part of the application programmer. Such full automation should be possible for programs expressed in a platform-independent manner, i.e., programs that do not rely on specific hardware or software features of certain computer systems.

4.2. Solution Overview

As described in Section 3 (Related Work), current approaches to the heterogeneous process state capture problem incur significant performance overhead during normal execution of the process.
It is desirable, however, that performance during normal execution not be degraded. We propose a poll-free solution that involves semantics-preserving self-modification of code. Unlike the current poll-free solutions (e.g., the Tui system), however, we do not place capture initiation instructions at all enabled points of equivalence when a capture is requested. Instead, we place the initiation instructions in a copy of only a small portion of the code. Thus, we obviate the introduction of placeholder dummy instructions, which would have been a source of performance overhead.

The enabled subset is selected from the entire set of potential points of equivalence according to application-specific requirements and constraints. For example:
i) An application desiring minimal latency may effectively enable all potential points of equivalence present in a computation, something that could not be done by a polling approach because of the exorbitant performance overhead incurred during normal execution.
ii) A high-performance computing application would enable only a subset of the potential points of equivalence, in order to exploit the performance enhancements achieved, especially within critical loops, through machine-dependent optimizations across the non-enabled, but potential, points of equivalence.

One of the major aspects of our design is the modification of a program to incorporate state capture and recovery functionality, giving processes the ability to autonomously save and restore their states once initiated. We assume that the process is based on a program that is either written in, or has been translated to, an imperative, stack-based intermediate representation to which our transformations will be applied, most likely by a compiler but possibly by a programmer.
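As an illustration only, the idea of placing initiation instructions in a copy of a small portion of the code can be modelled on a toy register machine. Everything in this sketch is an assumption made for the example: the four-instruction set, the names `build_capture_fragment` and `request_capture`, and the use of Python tuples in place of native object code. It shows that the original program carries no poll or padding instructions, and that a capture request only rewrites a copied fragment running from the current program counter to the sequentially next point of equivalence.

```python
"""Toy model of poll-free capture initiation (hypothetical instruction set)."""


def step(instr, regs, pc):
    """Execute one ordinary instruction and return the next pc (None on halt)."""
    op = instr[0]
    if op == "add":            # add dst, src
        regs[instr[1]] += regs[instr[2]]
    elif op == "dec":          # dec reg
        regs[instr[1]] -= 1
    elif op == "jnz":          # jnz reg, target
        if regs[instr[1]] != 0:
            return instr[2]
    elif op == "halt":
        return None
    return pc + 1


def run(program, regs, pc=0, max_steps=None):
    """Normal execution: the program carries no poll or padding instructions."""
    steps = 0
    while pc is not None and (max_steps is None or steps < max_steps):
        pc = step(program[pc], regs, pc)
        steps += 1
    return pc, steps


def build_capture_fragment(program, pc, eq_points):
    """Copy the code from pc up to the sequentially next point of equivalence.

    Each jump in the copy becomes a trap that registers the point of
    equivalence the jump would have reached; a final trap registers the
    sequentially next point. Assumes some enabled point lies after pc.
    """
    nxt = min(p for p in eq_points if p > pc)
    fragment = []
    for instr in program[pc:nxt]:
        if instr[0] == "jnz":
            fragment.append(("trap_jnz", instr[1], instr[2]))
        else:
            fragment.append(instr)
    fragment.append(("trap", nxt))
    return fragment


def request_capture(program, regs, pc, eq_points):
    """Return the point of equivalence at which capture is initiated."""
    if pc in eq_points:                      # already at a point of equivalence
        return pc
    for instr in build_capture_fragment(program, pc, eq_points):
        if instr[0] == "trap":
            return instr[1]
        if instr[0] == "trap_jnz":
            if regs[instr[1]] != 0:
                return instr[2]              # jump suppressed, target registered
        else:
            step(instr, regs, pc)            # ordinary instruction, run as is
    raise AssertionError("a capture fragment always ends in a trap")


# acc = n + (n-1) + ... + 1; pc 0 is a jump destination, so it must be enabled.
PROGRAM = [("add", "acc", "n"), ("dec", "n"), ("jnz", "n", 0), ("halt",)]
EQ_POINTS = {0, 3}

regs = {"n": 3, "acc": 0}
pc, _ = run(PROGRAM, regs, max_steps=1)      # the request arrives with pc == 1
point = request_capture(PROGRAM, regs, pc, EQ_POINTS)
print(point, regs)                           # prints: 0 {'n': 2, 'acc': 3}
```

In this run the request arrives in the middle of the loop body, and capture is initiated at point 0 with exactly the register contents that uninterrupted execution would have had on reaching that point. A real implementation would operate on native code and would also have to carry out the non-transfer effects of instructions such as procedure calls, which this toy machine does not have.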
The key element of our design is a table that maps every enabled point of equivalence to the corresponding point in the object code for each target architecture; the table is obtained with compiler support. It is used primarily to capture, in a machine-independent manner, the current execution point of the process and all the points at which execution was frozen due to function activations. These points are then mapped onto their corresponding points in the destination architecture's object code during recovery.

Certain parts of the process state are easily captured: for example, global variables and heap-allocated data structures, being globally addressable, are easy to capture and recover. The key difficulty lies in capturing the activation history state. In our approach, the process utilizes the native function return mechanism provided by the intermediate instruction set to capture the stack state. The currently active function saves its own state (which only it can access) and returns to its caller, which in turn saves its own state, and so on, until the stack capture is complete. Similarly, to effect recovery, the process employs the native function call mechanism provided by the intermediate instruction set. During recovery, the base function restores its state and then calls the next function in the captured stack, which repeats this process until the stack is completely restored.

To preserve program correctness (so that the function call sequence repeated during recovery does not produce any side effects), the given program is transformed such that all function calls appear only in simple expression statements, which are side-effect free. The activation history capture/recovery mechanism described above is accomplished by adding epilogues to each function (at points that are unreachable during normal execution and are reached only during capture/recovery), for both saving and restoring the activation state.

4.3. Assumptions

a. Compiler support is available for obtaining the equivalence point table.
b. Operating system support is available for obtaining the current value of the program counter for a process.
c. Machine-dependent optimizations across enabled points of equivalence are not allowed, since they may prevent the different compiled versions of the program on heterogeneous platforms from hitting every such point consistently.

4.4. State Capture Initiation Algorithm

When a request for capture of a process's state is received from an external agent, the following steps are carried out:
a. The current value of the program counter is obtained using operating system support.
b. The program counter value is used to identify the currently executing function and the current block (the segment of code between two points of equivalence that contains the current instruction).
c. The following steps ensure that the state capture is initiated on encountering the next point of equivalence:
   i. If the program counter is exactly at a point of equivalence, an instruction to initiate state capture is placed at that point.
   ii. If no program control instruction lies between the current point of execution and the sequentially next point of equivalence, an instruction to initiate state capture is placed at the sequentially next point of equivalence.
   iii. Otherwise, we copy, and pass control to, the code fragment lying between the current point of execution and the sequentially next point of equivalence, with the following modifications made for each program control instruction lying within that fragment:
      - Code is put in place of the program control instruction to register the point of equivalence that would have been encountered next if the program control instruction had been allowed to execute.
      - An instruction to initiate state capture is then placed at the end of that code.
      - Most importantly, the program control instruction itself is not allowed to execute. The steps that it would have carried out (for example, setting up a new activation record in the case of a procedure call instruction) are still carried out, except for the transfer of control: instead of passing to the instruction's destination, control is passed to the state capture mechanism.

The process has now been set up to initiate the state capture as soon as possible. Once the process resumes execution, it eventually encounters a point of equivalence. Here, the initiation instruction (which is a call to the state capture mechanism) is executed, and control is thus transferred to the state capture mechanism.

This algorithm requires that all points in the code that are possible destinations of jump instructions be enabled as points of equivalence. This may not be a restrictive requirement, since optimizations would in any case not be performed across jump destinations, in order to preserve program correctness.

4.5. State Capture Algorithm

a. Once control is transferred to the state capture mechanism function, the following tasks are performed initially:
   1. The point of equivalence at which (and the interrupted function within which) capture has been initiated is noted.
   2. The return address of the current activation record is replaced by the address of the saving epilogue of the interrupted function.
   Finally, a return is performed so that control is passed to that saving epilogue.
b. The saving epilogue performs the following tasks:
   1. Save the local variables and actual parameters present in the current activation record.
   2. Identify the caller of the current function using the return address available in the current activation record. The point of equivalence preceding the point of execution of the calling function is noted.
   3. Replace the return address of the current activation record with the address of the saving epilogue of the caller function. Next, a return is performed so that control is passed to the caller's saving epilogue.
c. Step b is repeated until all activations are saved.
d. When the bottom of the activation history stack is reached, the epilogue also saves the static (global) data and the heap data.

While saving the activation, static or heap data, the state of a pointer is captured according to its logical meaning (i.e., the data object or code point to which it is pointing) rather than its value indicating the physical address [8, 9, 10]. In order to identify the data object being pointed to, given a physical address, we require one of the following:
i. Compiler support in the form of information about the positions of globals within the global data area and the activation record structures of the various functions, together with run-time support only for registering heap data.
ii. Run-time support for registering local, global and heap data.
For all other data types, the data is copied verbatim into the checkpoint.

4.6. State Recovery Algorithm

Once a new process has been created on the destination machine, the following steps are carried out to perform the process state recovery:
a. A jump instruction whose destination is the corresponding restoring epilogue is placed at the entry point of each function present in the program.
b. When the first activation record (for the base function) is created, the corresponding epilogue restores the static (global) and heap data from the checkpoint file.
c. The restoring epilogue performs the following tasks:
   1. Restore the local variables and actual parameters into the current activation record.
   2. If there are no more activations to be restored, restore the original entry point of each function.
   3. Jump to the point in the destination's object code corresponding to the point of equivalence (at which this function activation's execution was frozen) noted during state capture.
d. Step c is repeated until all activations have been restored. This happens automatically, since the point to which the jump takes place contains a call instruction until all the activation records have been restored. (Once all the activations have been restored, the original function entry points are restored and control is transferred to the point of equivalence at which the state capture had been initiated.)
e. The process finally resumes normal execution.

Again, while restoring the activation, static or heap data, the state of a pointer is mapped back from its logical meaning to its value indicating the physical address on this machine.
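The capture and recovery algorithms above can be mimicked in a high-level language to make the control flow concrete. The following is a minimal sketch under loud assumptions: Python exceptions stand in for the return-address rewriting that leads into each saving epilogue, a `frames` argument stands in for the jump to the restoring epilogue at function entry, JSON text stands in for the platform-independent checkpoint, and the functions `inner` and `outer` and all other names are hypothetical. Pointer translation and global or heap data are left out.

```python
import json


class CaptureRequested(Exception):
    """Unwinds the stack; a stand-in for returning into saving epilogues."""


def inner(n, ctx, frames):
    if frames:                                # restoring epilogue
        saved = frames.pop(0)
        n, total, i = saved["n"], saved["total"], saved["i"]
    else:
        total, i = 0, 0
    try:
        while i < n:
            # Enabled point of equivalence: top of the loop body.
            if ctx["capture_at"] == (n, i):
                ctx["capture_at"] = None
                raise CaptureRequested
            total += i
            i += 1
        return total
    except CaptureRequested:                  # saving epilogue
        ctx["saved"].append({"func": "inner", "n": n, "total": total, "i": i})
        raise                                 # hand over to the caller's epilogue


def outer(k, ctx, frames):
    if frames:                                # restoring epilogue
        saved = frames.pop(0)
        k, acc, j = saved["k"], saved["acc"], saved["j"]
    else:
        acc, j = 0, 0
    try:
        while j < k:
            r = inner(j, ctx, frames)         # call kept in a simple statement
            acc += r
            j += 1
        return acc
    except CaptureRequested:                  # saving epilogue
        ctx["saved"].append({"func": "outer", "k": k, "acc": acc, "j": j})
        raise


def run_with_capture(k, capture_at):
    """Run outer(k); return (result, None) or (None, checkpoint text)."""
    ctx = {"capture_at": capture_at, "saved": []}
    try:
        return outer(k, ctx, []), None
    except CaptureRequested:
        # Frames were saved innermost first; store them outermost first.
        return None, json.dumps(ctx["saved"][::-1])


def recover(checkpoint):
    """Rebuild the stack by calling down from the base function."""
    frames = json.loads(checkpoint)
    ctx = {"capture_at": None, "saved": []}
    return outer(0, ctx, frames)


result, checkpoint = run_with_capture(5, capture_at=(3, 1))
print(recover(checkpoint) == run_with_capture(5, None)[0])   # prints: True
```

Because the call to `inner` sits in a simple statement of its own, re-executing that statement during recovery has no side effect other than rebuilding the callee's activation, which is the property the program transformation is meant to guarantee. Once the saved frames are exhausted, later calls to `inner` start from a fresh activation, corresponding to the restoration of the original function entry points.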
For all other data types, the source data copied into the checkpoint is translated into the destination architecture's data format using data format mapping functions, if necessary. This is in accordance with the "receiver makes right" policy. This policy has the advantage that only one translation needs to be done (by the receiver) and that no intermediate data format representation is needed. Also, if the state is to be recovered on the same architecture as the source, no translation is needed at all.

4.7. Performance Implications

The overhead incurred by our approach is limited to:
- the performance enhancements lost because machine-dependent optimizations are disallowed across enabled points of equivalence, and
- the run-time support required for the registration of dynamic data.
Both of these overheads are inevitable if program correctness is to be preserved across heterogeneous computing platforms. Our approach thus achieves the minimum possible performance overhead during normal execution of the process.

5. Conclusion

This paper presents an approach to process state capture and recovery in heterogeneous computing systems that achieves minimum performance overhead during normal execution of the process. The solution presented, being poll-free, is suitable even for an application desiring minimal latency, as it can afford to effectively enable all potential points of equivalence present in a computation. Also, high-performance computing applications can perform significantly better due to the reduced performance overhead, especially within critical loops.

Acknowledgement

The authors are deeply indebted to Dr. Sreenivas T. H. (Asst. Professor at the Department of Computer Science and Engineering at the National Institute of Engineering) for his suggestions, guidance and help rendered during the course of this work.
References

[1] Ramon Lawrence, A Survey of Process Migration Mechanisms, Student Report for Course Project 1997, University of Manitoba.
[2] M.J. Litzkow, M. Livny, and M.W. Mutka, Condor - A Hunter of Idle Workstations, in Proceedings of the Eighth International Conference on Distributed Computing Systems, pp ,
[3] Adam John Ferrari, Process State Capture and Recovery in High-Performance Heterogeneous Distributed Computing Systems, Ph.D. Thesis, Department of Computer Science, University of Virginia, January
[4] Adam J. Ferrari, Stephen J. Chapin, and Andrew S. Grimshaw, Process Introspection: A Heterogeneous Checkpoint/Restart Mechanism Based on Automatic Code Modification, Technical Report CS-97-05, Department of Computer Science, University of Virginia.
[5] David G. Von Bank, Charles M. Shub, and Robert W. Sebesta, A Unified Model of Pointwise Equivalence of Procedural Computations, ACM Transactions on Programming Languages and Systems, Vol. 16, No. 6, November 1994, Pages
[6] Holly J. Dail, Checkpointing and Migration in Heterogeneous, Distributed Systems, Final Project Paper submitted for Keith Marzullo's Graduate Distributed Systems Course at the University of California at San Diego.
[7] Peter W. Smith, The Possibilities and Limitations of Heterogeneous Process Migration, Ph.D. Thesis, Department of Computer Science, The University of British Columbia, October 1997.
[8] Kasidit Chanchio and Xian-He Sun, Memory Space Representation for Heterogeneous Network Process Migration, 12th International Parallel Processing Symposium, March
[9] Kasidit Chanchio and Xian-He Sun, Data Collection and Restoration for Heterogeneous Process Migration, Technical Report , Department of Computer Science, Louisiana State University,
[10] Xian-He Sun, V. K. Niak, and Kasidit Chanchio, A Coordinated Approach for Process Migration in Heterogeneous Environments, SIAM Parallel Processing Conference,
[11] V. Strumpen and B. Ramkumar, Portable Checkpointing and Recovery in Heterogeneous Environments, Technical Report , Department of Electrical and Computer Engineering, University of Iowa, June
[12] Balkrishna Ramkumar and Volker Strumpen, Portable Checkpointing for Heterogeneous Architectures, Proceedings of the 27th International Symposium on Fault-Tolerant Computing - Digest of Papers, Seattle, WA, pages 58-67, June

Biographies

Prashanth P. Bungale is currently a graduate student at the Department of Computer Science of The Johns Hopkins University. He obtained his Bachelor of Engineering degree from The National Institute of Engineering, India. His research interests are in the areas of operating systems, distributed systems and programming languages.

Swaroop Sridhar is currently a graduate student at the Department of Computer Science of The Johns Hopkins University. He obtained his Bachelor of Engineering degree from The National Institute of Engineering, India. His research interests are in the areas of operating systems, distributed systems and programming languages.

Vinay Krishnamurthy obtained his Bachelor of Engineering degree from Vidyavardhaka College of Engineering, India. His research interests are in the areas of operating systems and distributed computing.

Memory Space Representation for Heterogeneous Network Process Migration

Memory Space Representation for Heterogeneous Network Process Migration Memory Space Representation for Heterogeneous Network Process Migration Kasidit Chanchio Xian-He Sun Department of Computer Science Louisiana State University Baton Rouge, LA 70803-4020 sun@bit.csc.lsu.edu

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

Advanced Memory Management

Advanced Memory Management Advanced Memory Management Main Points Applications of memory management What can we do with ability to trap on memory references to individual pages? File systems and persistent storage Goals Abstractions

More information

Job-Oriented Monitoring of Clusters

Job-Oriented Monitoring of Clusters Job-Oriented Monitoring of Clusters Vijayalaxmi Cigala Dhirajkumar Mahale Monil Shah Sukhada Bhingarkar Abstract There has been a lot of development in the field of clusters and grids. Recently, the use

More information

Space-Efficient Page-Level Incremental Checkpointing *

Space-Efficient Page-Level Incremental Checkpointing * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 22, 237-246 (2006) Space-Efficient Page-Level Incremental Checkpointing * JUNYOUNG HEO, SANGHO YI, YOOKUN CHO AND JIMAN HONG + School of Computer Science

More information

A Feasibility Study for Methods of Effective Memoization Optimization

A Feasibility Study for Methods of Effective Memoization Optimization A Feasibility Study for Methods of Effective Memoization Optimization Daniel Mock October 2018 Abstract Traditionally, memoization is a compiler optimization that is applied to regions of code with few

More information

Making Workstations a Friendly Environment for Batch Jobs. Miron Livny Mike Litzkow

Making Workstations a Friendly Environment for Batch Jobs. Miron Livny Mike Litzkow Making Workstations a Friendly Environment for Batch Jobs Miron Livny Mike Litzkow Computer Sciences Department University of Wisconsin - Madison {miron,mike}@cs.wisc.edu 1. Introduction As time-sharing

More information

Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network

Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network Fault-tolerant Distributed-Shared-Memory on a Broadcast-based Interconnection Network Diana Hecht 1 and Constantine Katsinis 2 1 Electrical and Computer Engineering, University of Alabama in Huntsville,

More information

the Cornell Checkpoint (pre-)compiler
