Reliable Distribution of Data Using Replicated Web Servers

1 Reliable Distribution of Data Using Replicated Web Servers. Moreno Marzolla, Dipartimento di Informatica, Università Ca' Foscari di Venezia, via Torino 155, Mestre (Italy)

2 Talk Outline: Introduction; Fault-tolerant Data Retrieval; Reliability Evaluation; Conclusions and Future Work. (Moreno Marzolla, HADIS'05, Copenhagen, Aug 22)

3 Introduction. Accessing large documents over a network is challenging for several reasons: performance, security and reliability. We consider the reliability issue here: how to efficiently fetch large, read-only data files over unreliable media.

4 What do we mean by unreliable media? Network failure; server failure.

5 Reliability Model. Links (or servers) may fail at any moment. Failed components simply do not deliver any data (i.e., no Byzantine failures); they deliver correct data until they crash. Failures may be transient or permanent.

6 Usage scenario. We consider the problem of downloading large documents from Web servers. Documents are fully replicated among different, geographically distributed Web servers. Data is accessed using the standard HTTP/1.1 protocol.

7 Possible solution: Data Redundancy. Add redundancy (e.g., parity information) to the data delivered; the client computes missing data from the redundant information. Example: a RAID-like solution [Figure: Dataset 1, Dataset 2, Dataset 3, Parity].
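The parity idea on this slide can be sketched in a few lines. This is only an illustration of single-parity (XOR) recovery, not code from the talk; the function name `xor_blocks` and the dataset contents are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three equally sized datasets (illustrative contents) and their parity block.
datasets = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(datasets)

# Suppose the second dataset is lost: XOR-ing the survivors with the parity
# block restores it. Losing two datasets at once cannot be repaired this way.
recovered = xor_blocks([datasets[0], datasets[2], parity])
print(recovered == datasets[1])
```

The recovery step is exactly the client-side computation that the following slide flags as a cost for clients with limited computing power.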

8 Problems. Traditional RAID (RAID-5, parity-based) only tolerates a single dataset failure. This can be improved if different RAID layouts are hierarchically combined, or if sophisticated error-correcting codes are employed. In case of failures the client needs to compute the missing information, which can be CPU-intensive and is not applicable if the client has limited computing power (e.g., a mobile device).

9 Proposed Approach. We consider a document W of size $|W|$ which is replicated among N Web servers $S_0, S_1, \ldots, S_{N-1}$. The user selects a parameter K, $1 \le K \le N$. We prepare requests $R_0, R_1, \ldots, R_{N-1}$ to be sent to $S_0, S_1, \ldots, S_{N-1}$ respectively, such that any K replies are sufficient to reconstruct W.

10 Example [Figure: four panels, N=5 with K=5, K=4, K=3 and K=2]
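Since the figure itself is not preserved, the four N=5 layouts can be regenerated with a small sketch. It uses the closed form of the round-robin loop in Algorithm 1 (slide 29): fragment i goes to N-K+1 consecutive requests. The function names `request_layout` and `any_k_replies_suffice` are illustrative and not taken from the talk.

```python
from itertools import combinations

def request_layout(n, k):
    """Return, for each server, the sorted list of fragment indices it is asked for."""
    copies = n - k + 1  # each fragment is placed in N-K+1 requests
    layout = [[] for _ in range(n)]
    for i in range(n):
        for j in range(copies):
            layout[(i * copies + j) % n].append(i)
    return [sorted(r) for r in layout]

def any_k_replies_suffice(n, k):
    """Exhaustively check that every set of K replies covers all N fragments."""
    layout = request_layout(n, k)
    everything = set(range(n))
    return all(
        set().union(*(layout[s] for s in servers)) == everything
        for servers in combinations(range(n), k)
    )

for k in (5, 4, 3, 2):
    print(f"N=5, K={k}: {request_layout(5, k)}, any K suffice: {any_k_replies_suffice(5, k)}")
```

The exhaustive check is only practical for small N. It is there to make the "any K replies are sufficient" property concrete, not to prove it in general.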

11 Some properties. The size (number of bytes) of each request is $|W|(N-K+1)/N$, since Algorithm 1 places each of the N fragments, of size $|W|/N$, in $N-K+1$ requests and spreads them evenly over the N requests. The total size of all requests is then $|W|(N-K+1)$. (Almost) computation-free on the client side. Trivial deployment of replicas: it's just the same file. Feedback-free.

12 Analysis. We consider a very simple model: from the user's perspective, on a given connection either data is coming or it is not. Each connection is modeled as a two-state, continuous-time Markov chain: state 0 is Idle (no data coming), state 1 is Active (data coming at rate Bw).
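To make the two-state model concrete, here is a minimal simulation sketch of one connection. The transition rates (`fail_rate`, `repair_rate`), the 60-second horizon and the function name `active_time` are assumptions chosen for illustration; the talk's actual parameter values are not preserved in this transcription.

```python
import random

def active_time(t_end, fail_rate, repair_rate, rng, start_active=True):
    """Simulate one connection on [0, t_end) and return the time spent Active.

    State 1 (Active) is left at rate fail_rate, state 0 (Idle) at rate
    repair_rate; sojourn times are exponentially distributed.
    """
    if t_end <= 0:
        return 0.0
    now, active, total_active = 0.0, start_active, 0.0
    while True:
        rate = fail_rate if active else repair_rate
        # A rate of zero means the state is never left.
        stay = rng.expovariate(rate) if rate > 0 else float("inf")
        remaining = t_end - now
        if stay >= remaining:
            return total_active + (remaining if active else 0.0)
        if active:
            total_active += stay
        now += stay
        active = not active

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [active_time(60.0, fail_rate=0.05, repair_rate=0.5, rng=rng) for _ in range(1000)]
print(f"mean active time over 60 s: {sum(samples) / len(samples):.1f} s")
```

Multiplying the returned active time by the rate Bw gives the number of bytes delivered on that connection within the interval.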

13 System model [Figure: N connections, numbered 0 to N-1, with data rates $Bw_0, Bw_1, Bw_2, \ldots, Bw_{N-1}$]

14 Analysis / 1. Let $T_{N,K}(W)$ be the time needed to download W from N Web servers with parameter K. We want to compute $\Pr(T_{N,K}(W) \le t)$, the probability of downloading W from at least K out of N servers in time at most t.

15 Analysis / 2 We write:

16 Analysis / 3. Let $D_j$ denote the minimum time needed to download request $R_j$ from server $S_j$. Let $O_j(t)$ denote the cumulative time spent in state 1 by server $S_j$ during the time interval $[0, t)$.

17 Analysis / 4 Then, we have: $I_P$ is the indicator function for predicate P: $I_P = 1$ iff P is true.

18 Analysis / 5. The distribution of $O_j(t)$ is the operational time distribution of the associated Markov chain. $\Pr(O_j(t) < D_j)$ can be evaluated numerically using algorithms developed by Rubino and Sericola [IEEE Trans. Comp., 1993].
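Once the per-server probabilities are available, they can be combined into the probability that at least K of the N requests complete by time t. The sketch below does this with a simple dynamic program. It assumes the connections fail independently of each other, which the slides suggest but do not state, and it does not implement the Rubino and Sericola algorithm: the values in `p_done`, standing for $1 - \Pr(O_j(t) < D_j)$, are made-up inputs.

```python
def prob_at_least_k(p, k):
    """Pr(at least k of the independent events occur), where event j has probability p[j]."""
    dist = [1.0]  # dist[m] = Pr(exactly m successes among the servers seen so far)
    for pj in p:
        nxt = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            nxt[m] += q * (1.0 - pj)   # server j has not finished by time t
            nxt[m + 1] += q * pj       # server j has finished by time t
        dist = nxt
    return sum(dist[k:])

# Hypothetical completion probabilities for N=5 servers at some fixed time t.
p_done = [0.95, 0.95, 0.90, 0.60, 0.40]
for k in range(1, 6):
    print(f"K={k}: Pr(T <= t) = {prob_at_least_k(p_done, k):.4f}")
```

Under these assumptions the probability can only decrease as K grows for a fixed set of `p_done` values. In the real scheme the per-server probabilities also depend on K, because a smaller K means larger requests and therefore larger $D_j$.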

19 Settings

20 Parameters

21 Results / 1 (5 fast & good)

22 Results / 2 (4 fast & good, 1 slow & poor)

23 Results / 3 (2 fast & good, 3 slow & poor)

24 Results / 4 (2 fast & poor, 2 slow & good, 1 slow & very poor)

25 Conclusions. We proposed a simple solution that provides a high degree of fault tolerance for data retrieval using the standard Web infrastructure. It is feedback-free, automatically selects the K fastest servers without the need for complex protocols, and requires almost no computation on the client or server side (suitable for thin clients).

26 What's next? How do we select the value for K? From past measurements... You know that no fewer than K servers can be reached... We need something better: a compromise between reliability and redundancy. K=1 gives maximum reliability, but wastes bandwidth; K=N gives minimum reliability, but maximum net efficiency.
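The trade-off between the two extremes can be tabulated. The sketch below relies on the fact that Algorithm 1 (slide 29) places every fragment in N-K+1 requests, so the requests together ask for N-K+1 copies of the document, and any N-K servers may fail. The function name `tradeoff`, the field names and the 100 MB document size are illustrative assumptions.

```python
def tradeoff(n, doc_size):
    """For each K, list tolerated failures and the total number of bytes requested."""
    rows = []
    for k in range(1, n + 1):
        copies = n - k + 1  # number of requests that contain each fragment
        rows.append({
            "K": k,
            "tolerated_failures": n - k,
            "bytes_requested": copies * doc_size,
            "useful_fraction": 1 / copies,
        })
    return rows

for row in tradeoff(5, 100_000_000):
    print(row)
```

The `bytes_requested` column is an upper bound on traffic, on the assumption that the client may stop as soon as K replies have arrived.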

28 Applications. Data delivery over wireless networks. At least K out of N servers can be reached.

29 Definition of requests
Algorithm 1: Computation of R_0, R_1, ..., R_{N-1}
Require: K, 1 ≤ K ≤ N
Ensure: R_i is the request for server S_i
  fragsize := |W| / N
  t := 0
  R_0 := R_1 := ... := R_{N-1} := {}
  for i = 0 to N-1 do
    W_i := W[i·fragsize, (i+1)·fragsize − 1]
    for j = 1 to N-K+1 do
      R_t := R_t + W_i
      t := (t + 1) mod N
    end for
  end for
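Algorithm 1 translates almost line by line into Python. The sketch below works on byte ranges rather than on the data itself. Two things in it are assumptions that go beyond the slide: the last fragment absorbs the remainder when N does not divide the document size, and each request is expressed as an HTTP/1.1 `Range` header value (the slides only say that data is accessed over HTTP/1.1). The names `build_requests`, `range_header` and `covered` are illustrative.

```python
from itertools import combinations

def build_requests(doc_size, n, k):
    """Return R[0..n-1]; each R[s] is a list of inclusive byte ranges (first, last)."""
    if not 1 <= k <= n:
        raise ValueError("K must satisfy 1 <= K <= N")
    if doc_size < n:
        raise ValueError("document must contain at least N bytes")
    fragsize = doc_size // n
    requests = [[] for _ in range(n)]
    t = 0
    for i in range(n):
        first = i * fragsize
        # Assumption: the last fragment also takes the remainder bytes.
        last = doc_size - 1 if i == n - 1 else (i + 1) * fragsize - 1
        for _ in range(n - k + 1):
            requests[t].append((first, last))
            t = (t + 1) % n
    return requests

def range_header(ranges):
    """Format byte ranges as an HTTP/1.1 Range header value."""
    return "bytes=" + ",".join(f"{a}-{b}" for a, b in ranges)

def covered(requests, servers, doc_size):
    """True if the replies from the given servers contain every byte of the document."""
    got = set()
    for s in servers:
        for a, b in requests[s]:
            got.update(range(a, b + 1))
    return len(got) == doc_size

reqs = build_requests(1000, 5, 4)
for s, r in enumerate(reqs):
    print(f"S_{s}: Range: {range_header(r)}")
print(all(covered(reqs, subset, 1000) for subset in combinations(range(5), 4)))
```

A server that honours a multi-range request typically answers with a multipart body, so a real client would need to parse that. Adjacent ranges are not merged here, to keep the correspondence with the pseudocode visible.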


More information

Distributed Systems

Distributed Systems 15-440 Distributed Systems 11 - Fault Tolerance, Logging and Recovery Tuesday, Oct 2 nd, 2018 Logistics Updates P1 Part A checkpoint Part A due: Saturday 10/6 (6-week drop deadline 10/8) *Please WORK hard

More information

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections )

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections ) Lecture 23: Storage Systems Topics: disk access, bus design, evaluation metrics, RAID (Sections 7.1-7.9) 1 Role of I/O Activities external to the CPU are typically orders of magnitude slower Example: while

More information

Queueing Networks analysis with GNU Octave. Moreno Marzolla Università di Bologna

Queueing Networks analysis with GNU Octave. Moreno Marzolla  Università di Bologna The queueing Package Queueing Networks analysis with GNU Octave Moreno Marzolla marzolla@cs.unibo.it http://www.moreno.marzolla.name/ Università di Bologna december 4, 2012 Moreno Marzolla (Università

More information

CSE 451: Operating Systems Spring Module 18 Redundant Arrays of Inexpensive Disks (RAID)

CSE 451: Operating Systems Spring Module 18 Redundant Arrays of Inexpensive Disks (RAID) CSE 451: Operating Systems Spring 2017 Module 18 Redundant Arrays of Inexpensive Disks (RAID) John Zahorjan 2017 Gribble, Lazowska, Levy, Zahorjan, Zbikowski 1 Disks are cheap Background An individual

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

The UNIVERSITY of EDINBURGH. SCHOOL of INFORMATICS. CS4/MSc. Distributed Systems. Björn Franke. Room 2414

The UNIVERSITY of EDINBURGH. SCHOOL of INFORMATICS. CS4/MSc. Distributed Systems. Björn Franke. Room 2414 The UNIVERSITY of EDINBURGH SCHOOL of INFORMATICS CS4/MSc Distributed Systems Björn Franke bfranke@inf.ed.ac.uk Room 2414 (Lecture 13: Multicast and Group Communication, 16th November 2006) 1 Group Communication

More information

ARTIST-Relevant Research from Linköping

ARTIST-Relevant Research from Linköping ARTIST-Relevant Research from Linköping Department of Computer and Information Science (IDA) Linköping University http://www.ida.liu.se/~eslab/ 1 Outline Communication-Intensive Real-Time Systems Timing

More information

Stochastic Petri nets

Stochastic Petri nets Stochastic Petri nets 1 Stochastic Petri nets Markov Chain grows very fast with the dimension of the system Petri nets: High-level specification formalism Markovian Stochastic Petri nets adding temporal

More information

Secure Mission-Centric Operations in Cloud Computing

Secure Mission-Centric Operations in Cloud Computing Secure Mission-Centric Operations in Cloud Computing Massimiliano Albanese, Sushil Jajodia, Ravi Jhawar, Vincenzo Piuri George Mason University, USA Università degli Studi di Milano, Italy ARO Workshop

More information

RAID. Redundant Array of Inexpensive Disks. Industry tends to use Independent Disks

RAID. Redundant Array of Inexpensive Disks. Industry tends to use Independent Disks RAID Chapter 5 1 RAID Redundant Array of Inexpensive Disks Industry tends to use Independent Disks Idea: Use multiple disks to parallelise Disk I/O for better performance Use multiple redundant disks for

More information

Distributed Systems Conclusions & Exam. Brian Nielsen

Distributed Systems Conclusions & Exam. Brian Nielsen Distributed Systems Conclusions & Exam Brian Nielsen bnielsen@cs.aau.dk Definition A distributed system is the one in which hardware and software components at networked computers communicate and coordinate

More information

Firewall Management With FireWall Synthesizer

Firewall Management With FireWall Synthesizer Firewall Management With FireWall Synthesizer Chiara Bodei 1, Pierpaolo Degano 1, Riccardo Focardi 2, Letterio Galletta 1, Mauro Tempesta 2, and Lorenzo Veronese 2 1 Dipartimento di Informatica, Università

More information

Pattern-Based Analysis of an Embedded Real-Time System Architecture

Pattern-Based Analysis of an Embedded Real-Time System Architecture Pattern-Based Analysis of an Embedded Real-Time System Architecture Peter Feiler Software Engineering Institute phf@sei.cmu.edu 412-268-7790 Outline Introduction to SAE AADL Standard The case study Towards

More information

Replica Placement. Replica Placement

Replica Placement. Replica Placement Replica Placement Model: We consider objects (and don t worry whether they contain just data or code, or both) Distinguish different processes: A process is capable of hosting a replica of an object or

More information

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018

416 Distributed Systems. Networks review; Day 1 of 2 Jan 5 + 8, 2018 416 Distributed Systems Networks review; Day 1 of 2 Jan 5 + 8, 2018 1 Distributed Systems vs. Networks Low level (c/go) Run forever Support others Adversarial environment Distributed & concurrent Resources

More information

Implementation Issues. Remote-Write Protocols

Implementation Issues. Remote-Write Protocols Implementation Issues Two techniques to implement consistency models Primary-based protocols Assume a primary replica for each data item Primary responsible for coordinating all writes Replicated write

More information

An Empirical Study of Data Redundancy for High Availability in Large Overlay Networks

An Empirical Study of Data Redundancy for High Availability in Large Overlay Networks An Empirical Study of Data Redundancy for High Availability in Large Overlay Networks Giovanni Chiola Dipartimento di Informatica e Scienze dell Informazione (DISI) Università di Genova, 35 via Dodecaneso,

More information

Last Class: Consistency Models. Today: Implementation Issues

Last Class: Consistency Models. Today: Implementation Issues Last Class: Consistency Models Need for replication Data-centric consistency Strict, linearizable, sequential, causal, FIFO Lecture 15, page 1 Today: Implementation Issues Replica placement Use web caching

More information

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown

Lecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery

More information