Fault-Tolerant Parallel and Distributed Systems

by Dimiter R. Avresky
Department of Electrical and Computer Engineering
Boston University, Boston, MA

and David R. Kaeli
Department of Electrical and Computer Engineering
Northeastern University, Boston, MA

Springer Science+Business Media, LLC

Library of Congress Cataloging-in-Publication Data: A C.I.P. Catalogue record for this book is available from the Library of Congress.

Copyright 1998 by Springer Science+Business Media New York. Originally published by Kluwer Academic Publishers in 1998. Softcover reprint of the hardcover 1st edition 1998.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Springer Science+Business Media, LLC.

Printed on acid-free paper.

Contents

Preface

Part I: Fault-Tolerant Protocols
1. Comparing Synchronous and Asynchronous Group Communication (F. Cristian)
2. Using Static Total Causal Ordering Protocols to Achieve Ordered View Synchrony (K.-Y. Siu and M. Iyer)
3. A Failure-Aware Datagram Service (C. Fetzer and F. Cristian)

Part II: Fault-Tolerant Distributed Systems
4. Portable Checkpointing for Heterogeneous Architectures (V. Strumpen and B. Ramkumar)
5. A Checkpointing-Recovery Scheme for Domino-Free Distributed Systems (F. Quaglia, B. Ciciani, and R. Baldoni)
6. Overview of a Fault-Tolerant System (A. Pruscino)
7. An Efficient Recoverable DSM on a Network of Workstations: Design and Implementation (A.-M. Kermarrec and C. Morin)
8. Fault-Tolerant Issues of Local Area MultiProcessors (LAMP) Storage Subsystem (Q. Li, E. Hong, and A. Tsukerman)
9. Fault-Tolerance Issues in RDBMS on SCI-Based Local Area MultiProcessor (LAMP) (Q. Li, A. Tsukerman, and E. Hong)

Part III: Dependable Systems
10. Distributed Safety-Critical Systems (P.J. Perrone and B.W. Johnson)
11. Dependability and Other Challenges in the Collision Between Computing and Telecommunication (Y. Levendel)
12. A Unified Approach for the Synthesis of Scalable and Testable Embedded Architectures (P.B. Bhat, C. Aktouf, V.K. Prasanna, S. Gupta, and M.A. Breuer)
13. A Fault-Robust SPMD Architecture for 3D-TV Image Processing (A. Chiari, B. Ciciani, and M. Romero)

Part IV: Fault-Tolerant Parallel Systems
14. A Parallel Algorithm for Embedding Complete Binary Trees in Faulty Hypercubes (S.B. Choi and A.K. Somani)
15. Fault-Tolerant Broadcasting in a K-ary N-cube (B. Broeg and B. Bose)
16. Fault Isolation and Diagnosis in Multiprocessor Systems with Point-to-Point Communication Links (K. Chakrabarty, M.G. Karpovsky, and L.B. Levitin)
17. An Efficient Hardware Fault-Tolerant Technique (S.H. Hosseini, O.A. Abulnaja, and K. Vairavan)
18. Reliability Evaluation of a Task Under a Hardware Fault-Tolerant Technique (O.A. Abulnaja, S.H. Hosseini, and K. Vairavan)
19. Fault Tolerance Measures for m-ary n-dimensional Hypercubes Based on Forbidden Faulty Sets (J. Wu and G. Guo)
20. Dynamic Fault Recovery for Wormhole-Routed Two-Dimensional Meshes (D.R. Avresky and C.M. Cunningham)
21. Fault-Tolerant Dynamic Task Scheduling Based on Dataflow Graphs (E. Maehle and F.-J. Markus)
22. A Novel Replication Technique for Implementing Fault-Tolerant Parallel Software (A. Cherif, M. Suzuki, and T. Katayama)
23. User-Transparent Checkpointing and Restart for Parallel Computers (B. Bieker and E. Maehle)

Index

Preface

The most important use of computing in the future will be in the context of the global "digital convergence," where everything becomes digital and everything is internetworked. Applications will be dominated by the storage, search, retrieval, analysis, exchange, and updating of information in a wide variety of forms. Heavy demands will be placed on systems by many simultaneous requests. And, fundamentally, all this must be delivered at much higher levels of dependability, integrity, and security.

Increasingly, large parallel computing systems and networks pose unique challenges to industry and academia in dependable computing, especially because of the higher failure rates intrinsic to these systems. The challenge in the last part of this decade is to build a system that is both inexpensive and highly available. A machine cluster built of commodity hardware parts, with each node running an OS instance and a set of applications extended to be fault resilient, can satisfy the new, stringent high-availability requirements.

The focus of this book is to present recent techniques and methods for implementing fault-tolerant parallel and distributed computing systems.

Section I, Fault-Tolerant Protocols, considers basic techniques for achieving fault tolerance in communication protocols for distributed systems, including synchronous and asynchronous group communication, static total causal ordering protocols, and a fail-aware datagram service that supports communication by time. The paper "Comparing Synchronous and Asynchronous Group Communication" presents a common framework for describing synchronous and asynchronous group communication services and compares the properties that each can provide to simplify replicated programming.
Group communication services, such as membership and atomic broadcast, simplify the maintenance of state replica consistency despite random communication delays, failures and recoveries. In distributed systems, high service availability can be achieved by letting a group of servers replicate the service state; if some servers fail, the surviving ones know the service state and can continue to provide the service.
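To make the replication argument concrete, here is a minimal sketch, assuming a hypothetical sequencer that stamps every update with a global sequence number (one simple way to obtain a total order; the preface does not say the papers use it) and a toy key-value service state. Each replica buffers early arrivals in a holdback queue and applies updates strictly in sequence order, so replicas that receive the same messages in different orders still reach the same state, and a crashed replica does not disturb the survivors. The names `Replica` and `deliver_all` are illustrative; failure of the sequencer itself and membership changes are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a replicated key-value state (hypothetical example)."""
    name: str
    state: dict = field(default_factory=dict)
    next_seq: int = 0                             # next sequence number to apply
    holdback: dict = field(default_factory=dict)  # early arrivals, keyed by seq
    alive: bool = True

    def receive(self, seq, key, value):
        """Buffer an update, then apply every update that is now in order."""
        if not self.alive:
            return  # a crashed replica takes no further steps
        self.holdback[seq] = (key, value)
        while self.next_seq in self.holdback:
            k, v = self.holdback.pop(self.next_seq)
            self.state[k] = v
            self.next_seq += 1


def deliver_all(updates, replicas, arrival_orders):
    """Stamp updates with a global order (seq = list index, as an assumed
    sequencer would), then deliver them to each replica in its own order."""
    stamped = list(enumerate(updates))
    for replica, order in zip(replicas, arrival_orders):
        for i in order:
            seq, (key, value) = stamped[i]
            replica.receive(seq, key, value)


if __name__ == "__main__":
    updates = [("x", 1), ("x", 2), ("y", 3)]
    replicas = [Replica("a"), Replica("b"), Replica("c")]
    replicas[1].alive = False  # simulate a crash of replica b
    deliver_all(updates, replicas, [[0, 1, 2], [2, 1, 0], [1, 0, 2]])
    for r in replicas:
        print(r.name, r.alive, r.state)
```

Without the holdback queue, a replica receiving the updates in the order 2, 1, 0 would apply y=3, x=2, x=1 and end with x=1, diverging from replicas that end with x=2.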

The paper "Using Static Total Causal Ordering Protocols to Achieve Ordered View Synchrony" describes a view-synchronous, totally ordered message delivery protocol for a dynamic asynchronous process group in an asynchronous communication environment. The protocol can handle asynchronous process or link failures, as well as the simultaneous joining of multiple groups of processes.

The paper "A Fail-Aware Datagram Service" presents a fail-aware datagram service that supports communication by time: it delivers all messages whose computed one-way transmission delays are smaller than a given bound as "fast" and all other messages as "slow". The fail-aware datagram service is the foundation of all other fail-aware services, such as fail-aware clock synchronization, fail-aware membership, and fail-aware atomic broadcast.

In Section II, Fault-Tolerant Distributed Systems, we consider different methods and approaches for achieving fault tolerance in distributed systems, such as portable checkpointing for heterogeneous architectures, a checkpointing-recovery scheme ensuring domino-freeness, dependable cluster systems, recoverable distributed shared memory (DSM) on a network of workstations (NOW), and a fault-tolerant local area multiprocessor based on the Scalable Coherent Interface (SCI).

The paper "Portable Checkpointing for Heterogeneous Architectures" shows an approach that enables a failed computation to be recovered on a different processor architecture. Sequential C programs are compiled into fault-tolerant C programs whose checkpoints can be migrated across heterogeneous networks and restarted on binary-incompatible architectures.

The paper "A Checkpointing-Recovery Scheme for Domino-Free Distributed Systems" presents a checkpointing-recovery scheme for distributed systems. The proposed checkpointing algorithm ensures the progression of the recovery line while reducing the number of checkpoints in comparison with previous proposals.
This goal is achieved by introducing an equivalence relation between the local checkpoints of a process and by exploiting the process's event history.

The paper "Overview of a Fault-Tolerant System" describes a hardware architecture based on a cluster of commodity parts and a set of software cluster services that help in the design, implementation, and deployment of fault-resilient software. Depending on how these services and mechanisms are used, the system can reach different levels of fault tolerance and reliability.

Networks of workstations (NOW) have become a convenient and less expensive alternative to parallel architectures for executing long-running parallel applications. The paper "An Efficient Recoverable DSM on a Network of Workstations: Design and Implementation" presents the realization and performance evaluation of ICARE, a recoverable DSM (RDSM) associated with a process checkpointing mechanism. ICARE tolerates a single permanent node failure transparently to parallel applications, which continue their execution on the remaining nodes. A prototype of ICARE is fully operational on an ATM network of workstations running the CHORUS microkernel.

The paper "Fault-Tolerant Issues of Local Area MultiProcessors (LAMP) Storage Subsystem" discusses three main fault tolerance issues of the LAMP storage subsystem: system configurability for fault tolerance and performance, fast error detection and recovery, and fast logical volume reconstruction. A Local Area MultiProcessor (LAMP) is a network of workstations with a shared physical memory. It uses low-latency, high-bandwidth interconnections and provides remote DMA support. Its interconnection network is the Scalable Coherent Interface (SCI, IEEE Std 1596), which provides cache-coherent, physically shared memory for multiprocessors via bus-like point-to-point connections with high bandwidth and low latency. The paper "Fault-Tolerance Issues in RDBMS on SCI-Based Local Area MultiProcessor (LAMP)" explores the issues related to implementing database systems on LAMP, particularly fault tolerance.

In Section III, Dependable Systems, we consider general models and features of distributed safety-critical systems using commercial off-the-shelf (COTS) components; service dependability in telecomputing systems constructed with off-the-shelf components offering scalability and graceful degradation; a scalable and testable heterogeneous embedded architecture based on COTS components for high-end signal processing applications; and a fault-tolerant SPMD hierarchical architecture for real-time processing of video signals.

The paper "Distributed Safety-Critical Systems" presents an overview of the problems encountered by those designing safety-critical systems, along with the fundamentals, definitions, and concepts employed in their design, and a taxonomy that classifies the design solution space for such systems.

The paper "Dependability and Other Challenges in the Collision Between Computing and Telecommunication" describes a distributed system composed of off-the-shelf components that can deliver advanced telecommunication services.
The authors point out that the main difficulty in realizing services this way lies in the need to create a robust, dependable system. The resources and their servers are heterogeneous and may be distributed locally or globally in the network. This architecture offers scalability and congestion management, but it also poses the significant challenge of overall service dependability.

A new concept, that of scalable and testable embedded systems, is introduced in the paper "A Unified Approach for the Synthesis of Scalable and Testable Embedded Architectures". Parallel heterogeneous architectures based on COTS components are becoming increasingly attractive as computing platforms for high-end signal processing applications such as radar and sonar. In comparison with traditional custom VLSI designs, these architectures offer flexibility, high performance, rapid design time, easy upgradability, and low cost. The paper describes a unified approach for the synthesis of scalable architectures based on COTS components and illustrates it through a concrete example of a signal processing application.

A fault-tolerant SPMD hierarchical architecture for real-time processing of video signals is introduced in the paper "A Fault-Robust SPMD Architecture for 3D-TV Image Processing". Its fault-tolerant characteristics are evaluated by comparing the images produced by the system with and without faults in the architecture.
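The preface does not say which image-quality metric the 3D-TV evaluation uses. As a minimal sketch, assuming 8-bit grayscale frames stored as nested lists, the comparison between fault-free and faulty output could be quantified by counting differing pixels and computing the peak signal-to-noise ratio (PSNR). Both metrics and the helper names `differing_pixels` and `psnr` are illustrative assumptions, not the paper's method.

```python
import math

# Hypothetical helpers: a frame is a list of rows of 8-bit grayscale pixel values.


def _check_same_shape(reference, observed):
    """Raise if the two frames are empty or have different dimensions."""
    if not reference or len(reference) != len(observed) or any(
        len(r) != len(o) for r, o in zip(reference, observed)
    ):
        raise ValueError("frames must be non-empty and have the same shape")


def differing_pixels(reference, observed):
    """Count pixels whose values differ between the two frames."""
    _check_same_shape(reference, observed)
    return sum(
        p != q
        for row_r, row_o in zip(reference, observed)
        for p, q in zip(row_r, row_o)
    )


def psnr(reference, observed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    _check_same_shape(reference, observed)
    squared_errors = [
        (p - q) ** 2
        for row_r, row_o in zip(reference, observed)
        for p, q in zip(row_r, row_o)
    ]
    if not squared_errors:
        raise ValueError("frames must contain at least one pixel")
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    fault_free = [[10, 20], [30, 40]]
    faulty = [[10, 20], [30, 0]]  # one pixel corrupted by an injected fault
    print(differing_pixels(fault_free, faulty))
    print(round(psnr(fault_free, faulty), 2))
```

The pixel count shows how widespread the corruption is, while PSNR summarizes its average magnitude; a real evaluation would presumably aggregate such figures over whole video sequences and fault-injection runs.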

Section IV, Fault-Tolerant Parallel Systems, considers embedding complete binary trees into a faulty hypercube interconnection architecture; single-node broadcasting in a faulty k-ary n-cube; a software-implemented system-level testing technique for multiprocessor systems with dedicated communication links; reliable execution of tasks and concurrent diagnosis of faulty processors and links; conditional connectivity for the m-ary n-dimensional hypercube; online recovery from intermittent and permanent faults within the links and nodes of two-dimensional meshes; fault tolerance in parallel computers based on checkpointing, self-diagnosis, and rollback recovery; a functional and attribute-based language for programming fault-tolerant applications; and user-transparent backward error recovery for message-passing systems.

The paper "A Parallel Algorithm for Embedding Complete Binary Trees in Faulty Hypercubes" proposes a scheme that can be applied recursively in parallel to map a complete binary tree into a hypercube interconnection architecture with some faulty nodes. Two algorithms are described: one for a fault-free hypercube and the other for a faulty hypercube. The scheme is shown to have low time complexity compared with existing algorithms.

The paper "Fault-Tolerant Broadcasting in a K-ary N-cube" describes an algorithm for one-to-all broadcasting in a k-ary n-cube, called the Partner Fault-Tolerant Algorithm. The algorithm is nonredundant and fault-tolerant, and it broadcasts correctly in the presence of n-1 or fewer faults; its time complexity is also given.

The paper "Fault Isolation and Diagnosis in Multiprocessor Systems with Point-to-Point Communication Links" presents an approach that combines distributed system-level testing with processor self-test and ensures fault-free operation by disconnecting all faulty processors and links from the system.
The placement of monitors has been determined for several multiprocessor topologies, including trees, hypercubes, and meshes.

The paper "An Efficient Hardware Fault-Tolerant Technique" shows that, using an efficient hardware fault-tolerant technique, reliable execution of tasks and concurrent diagnosis of faults can be accomplished even when processors and communication channels are subject to failure. The paper "Reliability Evaluation of a Task Under a Hardware Fault-Tolerant Technique" presents an efficient technique that increases each task's reliability when processors and communication channels are subject to failure.

The paper "Fault Tolerance Measures for m-ary n-dimensional Hypercubes Based on Forbidden Faulty Sets" exploits the concept of a forbidden set to achieve fault tolerance in hypercubes. In general, there are many ways to define a forbidden (feasible) faulty set, depending on the topology of the system, the application environment, statistical analysis of fault patterns, and the distribution of fault-free nodes.

The paper "Dynamic Fault Recovery for Wormhole-Routed Two-Dimensional Meshes" describes an algorithm for detecting and compensating for intermittent and permanent faults within the links and nodes of parallel computers that have an N×N two-dimensional mesh interconnection topology.

A fully distributed algorithm for fault-tolerant scheduling is given in the paper "Fault-Tolerant Dynamic Task Scheduling Based on Dataflow Graphs". The main advantage of this algorithm is that fail-soft behavior (graceful degradation) is achieved in a user-transparent way. Another important aspect of this approach is that it is applicable to a wide variety of target machines, including message-passing architectures, workstation clusters, and even shared-memory machines.

The paper "A Novel Replication Technique for Implementing Fault-Tolerant Parallel Software" presents a replication technique based on the FTAG computation model, together with several novel mechanisms for recovery in case of failures. FTAG is a functional and attribute-based language for programming fault-tolerant parallel applications.

User-transparent backward error recovery for message-passing systems is presented in the paper "User-Transparent Checkpointing and Restart for Parallel Computers".

This book contains selected and revised articles from the IEEE Fault-Tolerant Parallel and Distributed Systems (FTPDS'98) workshops held in Honolulu, Hawaii, in 1996 and in Geneva, Switzerland. In addition, several authors were invited to submit papers. The selection of papers was greatly facilitated by the steadfast work of the program committee members and the reviewers, for which we are most grateful. We would like to extend special thanks to the members of the Network Computing Laboratory, Department of Electrical and Computer Engineering, Boston University, for their help.


More information

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013

Chapter 1: Distributed Systems: What is a distributed system? Fall 2013 Chapter 1: Distributed Systems: What is a distributed system? Fall 2013 Course Goals and Content n Distributed systems and their: n Basic concepts n Main issues, problems, and solutions n Structured and

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Running Head: NETWORKING 1

Running Head: NETWORKING 1 Running Head: NETWORKING 1 Switches and Bridges - Comparison and Contrast [Name of the Writer] [Name of the Institution] NETWORKING 2 Switches and Bridges Introduction This paper presents a comparison

More information

Distributed Systems Principles and Paradigms. Chapter 01: Introduction

Distributed Systems Principles and Paradigms. Chapter 01: Introduction Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 01: Introduction Version: October 25, 2009 2 / 26 Contents Chapter

More information

6.1 Multiprocessor Computing Environment

6.1 Multiprocessor Computing Environment 6 Parallel Computing 6.1 Multiprocessor Computing Environment The high-performance computing environment used in this book for optimization of very large building structures is the Origin 2000 multiprocessor,

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 18 Chapter 7 Case Studies Part.18.1 Introduction Illustrate practical use of methods described previously Highlight fault-tolerance

More information

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!

Chapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and

More information

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.

Non-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors. CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Multiprocessor Interconnection Networks- Part Three

Multiprocessor Interconnection Networks- Part Three Babylon University College of Information Technology Software Department Multiprocessor Interconnection Networks- Part Three By The k-ary n-cube Networks The k-ary n-cube network is a radix k cube with

More information

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition.

Distributed Systems Principles and Paradigms. Chapter 01: Introduction. Contents. Distributed System: Definition. Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 01: Version: February 21, 2011 1 / 26 Contents Chapter 01: 02: Architectures

More information

DISSEMINATING SECURITY UPDATES AT INTERNET SCALE

DISSEMINATING SECURITY UPDATES AT INTERNET SCALE DISSEMINATING SECURITY UPDATES AT INTERNET SCALE Advances in Information Security Sushil Jajodia Consulting editor Center for Secure Information Systems George Mason University Fairfax, VA 22030-4444 email:

More information

Multiprocessors - Flynn s Taxonomy (1966)

Multiprocessors - Flynn s Taxonomy (1966) Multiprocessors - Flynn s Taxonomy (1966) Single Instruction stream, Single Data stream (SISD) Conventional uniprocessor Although ILP is exploited Single Program Counter -> Single Instruction stream The

More information

Chapter 5: Distributed Systems: Fault Tolerance. Fall 2013 Jussi Kangasharju

Chapter 5: Distributed Systems: Fault Tolerance. Fall 2013 Jussi Kangasharju Chapter 5: Distributed Systems: Fault Tolerance Fall 2013 Jussi Kangasharju Chapter Outline n Fault tolerance n Process resilience n Reliable group communication n Distributed commit n Recovery 2 Basic

More information

HIGH-SPEED COMMUNICATION NETWORKS

HIGH-SPEED COMMUNICATION NETWORKS HIGH-SPEED COMMUNICATION NETWORKS HIGH-SPEED COMMUNICATION NETWORKS Edited by Harry Perros North Carolina State University Raleigh, North Carolina Springer Science+Busines s Media, LL C Library of Congress

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer

MIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware

More information

A Low-Latency DMR Architecture with Efficient Recovering Scheme Exploiting Simultaneously Copiable SRAM

A Low-Latency DMR Architecture with Efficient Recovering Scheme Exploiting Simultaneously Copiable SRAM A Low-Latency DMR Architecture with Efficient Recovering Scheme Exploiting Simultaneously Copiable SRAM Go Matsukawa 1, Yohei Nakata 1, Yuta Kimi 1, Yasuo Sugure 2, Masafumi Shimozawa 3, Shigeru Oho 4,

More information

SYNTHESIS OF FINITE STATE MACHINES: LOGIC OPTIMIZATION

SYNTHESIS OF FINITE STATE MACHINES: LOGIC OPTIMIZATION SYNTHESIS OF FINITE STATE MACHINES: LOGIC OPTIMIZATION SYNTHESIS OF FINITE STATE MACHINES: LOGIC OPTIMIZATION Tiziano Villa University of California/Berkeley Timothy Kam Intel Corporation Robert K. Brayton

More information

Loop Tiling for Parallelism

Loop Tiling for Parallelism Loop Tiling for Parallelism THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE LOOP TILING FOR PARALLELISM JINGLING XUE School of Computer Science and Engineering The University of New

More information

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS LEGITIMATE APPLICATIONS

More information

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing

CSE Introduction to Parallel Processing. Chapter 4. Models of Parallel Processing Dr Izadi CSE-4533 Introduction to Parallel Processing Chapter 4 Models of Parallel Processing Elaborate on the taxonomy of parallel processing from chapter Introduce abstract models of shared and distributed

More information

I/O Commercial Workloads. Scalable Disk Arrays. Scalable ICDA Performance. Outline of This Talk: Related Work on Disk Arrays.

I/O Commercial Workloads. Scalable Disk Arrays. Scalable ICDA Performance. Outline of This Talk: Related Work on Disk Arrays. Scalable Disk Arrays I/O Commercial Workloads David Kaeli Northeastern University Computer Architecture Research Laboratory Boston, MA Manpreet Singh William Zahavi EMC Corporation Hopkington, MA Industry

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Intel iapx 432-VLSI building blocks for a fault-tolerant computer

Intel iapx 432-VLSI building blocks for a fault-tolerant computer Intel iapx 432-VLSI building blocks for a fault-tolerant computer by DAVE JOHNSON, DAVE BUDDE, DAVE CARSON, and CRAIG PETERSON Intel Corporation Aloha, Oregon ABSTRACT Early in 1983 two new VLSI components

More information

Chunjie Duan Brock J. LaMeres Sunil P. Khatri. On and Off-Chip Crosstalk Avoidance in VLSI Design

Chunjie Duan Brock J. LaMeres Sunil P. Khatri. On and Off-Chip Crosstalk Avoidance in VLSI Design Chunjie Duan Brock J. LaMeres Sunil P. Khatri On and Off-Chip Crosstalk Avoidance in VLSI Design 123 On and Off-Chip Crosstalk Avoidance in VLSI Design Chunjie Duan Brock J. LaMeres Sunil P. Khatri On

More information

SPECC: SPECIFICATION LANGUAGE AND METHODOLOGY

SPECC: SPECIFICATION LANGUAGE AND METHODOLOGY SPECC: SPECIFICATION LANGUAGE AND METHODOLOGY SPECC: SPECIFICATION LANGUAGE AND METHODOLOGY Daniel D. Gajski Jianwen Zhu Rainer Dömer Andreas Gerstlauer Shuqing Zhao University of California, Irvine SPRINGER

More information

Introduction. Distributed Systems IT332

Introduction. Distributed Systems IT332 Introduction Distributed Systems IT332 2 Outline Definition of A Distributed System Goals of Distributed Systems Types of Distributed Systems 3 Definition of A Distributed System A distributed systems

More information

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism

Motivation for Parallelism. Motivation for Parallelism. ILP Example: Loop Unrolling. Types of Parallelism Motivation for Parallelism Motivation for Parallelism The speed of an application is determined by more than just processor speed. speed Disk speed Network speed... Multiprocessors typically improve the

More information

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d)

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d) Distributed Systems Fö 9/10-1 Distributed Systems Fö 9/10-2 FAULT TOLERANCE 1. Fault Tolerant Systems 2. Faults and Fault Models. Redundancy 4. Time Redundancy and Backward Recovery. Hardware Redundancy

More information

Overview. Processor organizations Types of parallel machines. Real machines

Overview. Processor organizations Types of parallel machines. Real machines Course Outline Introduction in algorithms and applications Parallel machines and architectures Overview of parallel machines, trends in top-500, clusters, DAS Programming methods, languages, and environments

More information

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a

Kevin Skadron. 18 April Abstract. higher rate of failure requires eective fault-tolerance. Asynchronous consistent checkpointing oers a Asynchronous Checkpointing for PVM Requires Message-Logging Kevin Skadron 18 April 1994 Abstract Distributed computing using networked workstations oers cost-ecient parallel computing, but the higher rate

More information

02 - Distributed Systems

02 - Distributed Systems 02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/58 Definition Distributed Systems Distributed System is

More information

Fundamentals of Operating Systems. Fifth Edition

Fundamentals of Operating Systems. Fifth Edition Fundamentals of Operating Systems Fifth Edition Fundamentals of Operating Systems A.M. Lister University of Queensland R. D. Eager University of Kent at Canterbury Fifth Edition Springer Science+Business

More information

Client Server & Distributed System. A Basic Introduction

Client Server & Distributed System. A Basic Introduction Client Server & Distributed System A Basic Introduction 1 Client Server Architecture A network architecture in which each computer or process on the network is either a client or a server. Source: http://webopedia.lycos.com

More information

Algorithms and Parallel Computing

Algorithms and Parallel Computing Algorithms and Parallel Computing Algorithms and Parallel Computing Fayez Gebali University of Victoria, Victoria, BC A John Wiley & Sons, Inc., Publication Copyright 2011 by John Wiley & Sons, Inc. All

More information

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research

This chapter provides the background knowledge about Multistage. multistage interconnection networks are explained. The need, objectives, research CHAPTER 1 Introduction This chapter provides the background knowledge about Multistage Interconnection Networks. Metrics used for measuring the performance of various multistage interconnection networks

More information

02 - Distributed Systems

02 - Distributed Systems 02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/60 Definition Distributed Systems Distributed System is

More information

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS

LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS DINESH C. VERMA IBM T. J. Watson Research Center A JOHN WILEY & SONS, INC., PUBLICATION LEGITIMATE APPLICATIONS OF PEER-TO-PEER NETWORKS LEGITIMATE APPLICATIONS

More information

PARALLEL ARCHITECTURES AND PARALLEL ALGORITHMS FOR INTEGRATED VISION SYSTEMS

PARALLEL ARCHITECTURES AND PARALLEL ALGORITHMS FOR INTEGRATED VISION SYSTEMS PARALLEL ARCHITECTURES AND PARALLEL ALGORITHMS FOR INTEGRATED VISION SYSTEMS THE KLUWER INTERNATIONAL SERIES IN ENGINEERING AND COMPUTER SCIENCE ROBOTICS: VISION, MANIPULATION AND SENSORS Consulting Editor:

More information

Distributed Systems Principles and Paradigms

Distributed Systems Principles and Paradigms Distributed Systems Principles and Paradigms Chapter 01 (version September 5, 2007) Maarten van Steen Vrije Universiteit Amsterdam, Faculty of Science Dept. Mathematics and Computer Science Room R4.20.

More information

Distributed Systems COMP 212. Revision 2 Othon Michail

Distributed Systems COMP 212. Revision 2 Othon Michail Distributed Systems COMP 212 Revision 2 Othon Michail Synchronisation 2/55 How would Lamport s algorithm synchronise the clocks in the following scenario? 3/55 How would Lamport s algorithm synchronise

More information

Parallel Architectures

Parallel Architectures Parallel Architectures Instructor: Tsung-Che Chiang tcchiang@ieee.org Department of Science and Information Engineering National Taiwan Normal University Introduction In the roughly three decades between

More information

Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures.

Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures. An Efficient and Scalable Approach for Implementing Fault-Tolerant DSM Architectures Christine Morin, Anne-Marie Kermarrec, Michel Banâtre, Alain Gefflaut To cite this version: Christine Morin, Anne-Marie

More information

Cluster-Based Scalable Network Services

Cluster-Based Scalable Network Services Cluster-Based Scalable Network Services Suhas Uppalapati INFT 803 Oct 05 1999 (Source : Fox, Gribble, Chawathe, and Brewer, SOSP, 1997) Requirements for SNS Incremental scalability and overflow growth

More information

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing

Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing Spider-Web Topology: A Novel Topology for Parallel and Distributed Computing 1 Selvarajah Thuseethan, 2 Shanmuganathan Vasanthapriyan 1,2 Department of Computing and Information Systems, Sabaragamuwa University

More information

CMSC 611: Advanced. Parallel Systems

CMSC 611: Advanced. Parallel Systems CMSC 611: Advanced Computer Architecture Parallel Systems Parallel Computers Definition: A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System RAIDIX Data Storage Solution Clustered Data Storage Based on the RAIDIX Software and GPFS File System 2017 Contents Synopsis... 2 Introduction... 3 Challenges and the Solution... 4 Solution Architecture...

More information

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections

Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections Fault-Tolerant Multiple Task Migration in Mesh NoC s over virtual Point-to-Point connections A.SAI KUMAR MLR Group of Institutions Dundigal,INDIA B.S.PRIYANKA KUMARI CMR IT Medchal,INDIA Abstract Multiple

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

Whitestein Series in software Agent Technologies. About whitestein Technologies

Whitestein Series in software Agent Technologies. About whitestein Technologies Whitestein Series in software Agent Technologies Series Editors: Marius Walliser Stefan Brantschen Monique Calisti Thomas Hempfling This series reports new developments in agent-based software technologies

More information

MINING VERY LARGE DATABASES WITH PARALLEL PROCESSING

MINING VERY LARGE DATABASES WITH PARALLEL PROCESSING MINING VERY LARGE DATABASES WITH PARALLEL PROCESSING The Kluwer International Series on ADVANCES IN DATABASE SYSTEMS Series Editor Ahmed K. Elmagarmid Purdue University West Lafayette, IN 47907 Other books

More information

Chapter 2 Distributed Information Systems Architecture

Chapter 2 Distributed Information Systems Architecture Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline

More information