Temporal Redundancy. Yashwant K. Malaiya 10/30/2018 FTC YKM

Size: px
Start display at page:

Download "Temporal Redundancy. Yashwant K. Malaiya 10/30/2018 FTC YKM"

Transcription

1 Temporal Redundancy Yashwant K. Malaiya 1

2 Murphy s Law Anything that can go wrong, will. (Actually not by Murphy but by Finagle) To every law there is an exception. CS530 laws: Anything that can go wrong, it eventually will, but It may not go wrong for a while It may not go wrong the next time Only one thing may go wrong at a time October 30,

3 Temporal Redundancy Effective for time limited faults. Detection: Serial transmission: parity/crc Fault tolerance: Bus errors: Instruction retry Data-bases: check-point & roll-back Bad/lost network packets: retransmission Requirement: Save previous state(s) 3

4 Herodotus 4th BCon the Persians cent If an important decision is to be made, they [the Persians] discuss the question when they are drunk, and the following day the master of the house where the discussion was held submits their decision for reconsideration when they are sober. If they still approve it, it is adopted; if not, it is abandoned. Conversely, any decision they make when they are sober, is reconsidered afterwards when they are drunk. 4

5 Temporal Redundancy Effective for time limited faults. Detection: Serial transmission: parity/crc Fault tolerance: Bus errors: Instruction retry Data-bases: check-point & roll-back Bad/lost network packets: retransmission Requirement: Save previous state(s) in stable storage 5

6 Terminology Check-pointing: saving part of the process state Registers affected Context Part of the state (registers, memory) affected by next process segment Entire data base etc. Rollback: reestablishing a state of the process Audit Trail: chronological record of all transactions Retry: reexecution after rollback (inc. audit-trail reprocessing) 6

7 Strategy subtask i Chpt i Subtask i+1 x delay Rollback to Chpt i subtask i+1 retried Chpt i+1 subtask i+2 Failed retry? Temp fault still active when rollback done. Do another rollback. Fault permanent. Reconfigure hardware before rollback. Checkpoint i info bad. Rollback to checkpoint i-1. 7

8 Assumption Additional Fault No Overhead O(T) where V(T) Average V(T) Analysis of Overhead s : arrival inputs/err per T : : Average T T(F k ) 2 where k is utilizatio includes retry time F V(T) F : fixed retry time P{error rate : λ,interchkpt ors during time time retry time : during duration lost due chkpt/roll to save/load T}.avg n factor. time from last back error overhead Note to error and : T chkpt info overhead time chkpt to error to rollback. Justification? Why T/2? 8

9 Analysis of Overhead (2) Hence O(T) F k ρ(t) F T T 2 Minimum occurs at dρ dt T fractional F T opt Note : k 2 2F λ k λ k 2 overhead 0 ρ(t) T transactio n arrival rate transactio n processing rate : FRactional Overhead Fixed Variable (average) Total Intercheckpoint time Ex: =0.01, k=0.3, F=10 yields T opt =81.6 (above) 9

10 Networks: Packet Retransmission Information is divided into packets Each packet contains destination address, packet number, information slice, CRC Packet is routed through routers through the internet At the destination, each packet is checked. Packets are assembled back and delivered to destination. Should packets be small or large? 10

11 Networks: Packet Retransmission Stop and Wait protocol Packet sent from node A to node B Possibilities 1. Packet delivered correctly: B sends acknowledgment 2. Packet delivered corrupted: no ack or ack bad 3. Packet lost: no ack Cases 2,3 A sends the packet again ARQ : automatic repeat request 11

12 Stop &Wait Utilization When a frame is corrupted, lost (or ACK is lost), it is retransmit ted. Error U wher e t U e free 1 2t 1 t p f p : propagatio n time, If p is P{a frame lost}, then of n frames sent, n.p are lost. 1- p 2t 1 t f utilizatio n of a link between two nodes, p t f : frame duration ( stop & wait ) : 12

13 Stop and Wait utilization(2) If p U e c.t then f f p Optimal frame size is then t 2 p 4t p 16t p 8 tfopt c 2 Higher performance (U 1) obtained by sending multiple frames (slidingwindows). A copy of all frames must be 1-c.tf 2t 1 t saved until acknowledged ue( tf) tf 700 Ex: t p =25, c= Gives optimal t f =307 Unit: microsec 13

14 Undo Undo recovers from human mistakes (or bad decisions). Requires recording a trail and saving any deleted/overwritten information. This may take significant memory. Cannot undo some operations. Multilevel undo: last-in/first-out Selectable level Redo. 14

15 References A Survey of Analytic Models of Rollback and Recovery Stratergies, Chandy, K.M.; Computer, Volume 8, Issue 5, May 1975 Page(s):

Fault Tolerant Computing CS 530

Fault Tolerant Computing CS 530 Fault Tolerant Computing CS 530 Lecture Notes 1 Introduction to the class Yashwant K. Malaiya Colorado State University 1 Instructor, TA Instructor: Yashwant K. Malaiya, Professor malaiya @ cs.colostate.edu

More information

Fault Tolerance. The Three universe model

Fault Tolerance. The Three universe model Fault Tolerance High performance systems must be fault-tolerant: they must be able to continue operating despite the failure of a limited subset of their hardware or software. They must also allow graceful

More information

Lecture 26: Data Link Layer

Lecture 26: Data Link Layer Introduction We have seen in previous lectures that the physical layer is responsible for the transmission of row bits (Ones and Zeros) over the channel. It is responsible for issues related to the line

More information

Lecture 22: Fault Tolerance

Lecture 22: Fault Tolerance Lecture 22: Fault Tolerance Papers: Token Coherence: Decoupling Performance and Correctness, ISCA 03, Wisconsin A Low Overhead Fault Tolerant Coherence Protocol for CMP Architectures, HPCA 07, Spain Error

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 17 - Checkpointing II Chapter 6 - Checkpointing Part.17.1 Coordinated Checkpointing Uncoordinated checkpointing may lead

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Layered Network Architecture. CSC358 - Introduction to Computer Networks

Layered Network Architecture. CSC358 - Introduction to Computer Networks Layered Network Architecture Layered Network Architecture Question: How can we provide a reliable service on the top of a unreliable service? ARQ: Automatic Repeat Request Can be used in every layer TCP

More information

OSI Transport Layer. Network Fundamentals Chapter 4. Version Cisco Systems, Inc. All rights reserved. Cisco Public 1

OSI Transport Layer. Network Fundamentals Chapter 4. Version Cisco Systems, Inc. All rights reserved. Cisco Public 1 OSI Transport Layer Network Fundamentals Chapter 4 Version 4.0 1 Transport Layer Role and Services Transport layer is responsible for overall end-to-end transfer of application data 2 Transport Layer Role

More information

Module 15: Network Structures

Module 15: Network Structures Module 15: Network Structures Background Topology Network Types Communication Communication Protocol Robustness Design Strategies 15.1 A Distributed System 15.2 Motivation Resource sharing sharing and

More information

Data Link Control Protocols

Data Link Control Protocols Protocols : Introduction to Data Communications Sirindhorn International Institute of Technology Thammasat University Prepared by Steven Gordon on 23 May 2012 Y12S1L07, Steve/Courses/2012/s1/its323/lectures/datalink.tex,

More information

The flow of data must not be allowed to overwhelm the receiver

The flow of data must not be allowed to overwhelm the receiver Data Link Layer: Flow Control and Error Control Lecture8 Flow Control Flow and Error Control Flow control refers to a set of procedures used to restrict the amount of data that the sender can send before

More information

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS Assist. Prof. Dr. Volkan TUNALI PART 1 2 RECOVERY Topics 3 Introduction Transactions Transaction Log System Recovery Media Recovery Introduction

More information

Data Link Technology. Suguru Yamaguchi Nara Institute of Science and Technology Department of Information Science

Data Link Technology. Suguru Yamaguchi Nara Institute of Science and Technology Department of Information Science Data Link Technology Suguru Yamaguchi Nara Institute of Science and Technology Department of Information Science Agenda Functions of the data link layer Technologies concept and design error control flow

More information

EE382C Lecture 14. Reliability and Error Control 5/17/11. EE 382C - S11 - Lecture 14 1

EE382C Lecture 14. Reliability and Error Control 5/17/11. EE 382C - S11 - Lecture 14 1 EE382C Lecture 14 Reliability and Error Control 5/17/11 EE 382C - S11 - Lecture 14 1 Announcements Don t forget to iterate with us for your checkpoint 1 report Send time slot preferences for checkpoint

More information

Homework #4. Due: December 2, 4PM. CWND (#pkts)

Homework #4. Due: December 2, 4PM. CWND (#pkts) Homework #4 Due: December 2, 2009 @ 4PM EE122: Introduction to Communication Networks (Fall 2009) Department of Electrical Engineering and Computer Sciences College of Engineering University of California,

More information

Module 16: Distributed System Structures. Operating System Concepts 8 th Edition,

Module 16: Distributed System Structures. Operating System Concepts 8 th Edition, Module 16: Distributed System Structures, Silberschatz, Galvin and Gagne 2009 Chapter 16: Distributed System Structures Motivation Types of Network-Based Operating Systems Network Structure Network Topology

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm

Page 1 FAULT TOLERANT SYSTEMS. Coordinated Checkpointing. Time-Based Synchronization. A Coordinated Checkpointing Algorithm FAULT TOLERANT SYSTEMS Coordinated http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Chapter 6 II Uncoordinated checkpointing may lead to domino effect or to livelock Example: l P wants to take a

More information

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit.

Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication. Distributed commit. Basic concepts in fault tolerance Masking failure by redundancy Process resilience Reliable communication One-one communication One-many communication Distributed commit Two phase commit Failure recovery

More information

ELEN Network Fundamentals Lecture 15

ELEN Network Fundamentals Lecture 15 ELEN 4017 Network Fundamentals Lecture 15 Purpose of lecture Chapter 3: Transport Layer Reliable data transfer Developing a reliable protocol Reliability implies: No data is corrupted (flipped bits) Data

More information

ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS

ERROR RECOVERY IN MULTICOMPUTERS USING GLOBAL CHECKPOINTS Proceedings of the 13th International Conference on Parallel Processing, Bellaire, Michigan, pp. 32-41, August 1984. ERROR RECOVERY I MULTICOMPUTERS USIG GLOBAL CHECKPOITS Yuval Tamir and Carlo H. Séquin

More information

Distributed System Chapter 16 Issues in ch 17, ch 18

Distributed System Chapter 16 Issues in ch 17, ch 18 Distributed System Chapter 16 Issues in ch 17, ch 18 1 Chapter 16: Distributed System Structures! Motivation! Types of Network-Based Operating Systems! Network Structure! Network Topology! Communication

More information

ES623 Networked Embedded Systems

ES623 Networked Embedded Systems ES623 Networked Embedded Systems Introduction to Network models & Data Communication 16 th April 2013 OSI Models An ISO standard that covers all aspects of network communication is the Open Systems Interconnection

More information

CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 4pm

CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 4pm CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 2015 @ 4pm Your Name: Answers SUNet ID: root @stanford.edu Check if you would like exam routed

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Data Link Layer: Overview, operations

Data Link Layer: Overview, operations Data Link Layer: Overview, operations Chapter 3 1 Outlines 1. Data Link Layer Functions. Data Link Services 3. Framing 4. Error Detection/Correction. Flow Control 6. Medium Access 1 1. Data Link Layer

More information

TCP Congestion Control

TCP Congestion Control TCP Congestion Control Lecture material taken from Computer Networks A Systems Approach, Third Ed.,Peterson and Davie, Morgan Kaufmann, 2003. Computer Networks: TCP Congestion Control 1 TCP Congestion

More information

CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 4pm

CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 4pm CS144: Intro to Computer Networks Homework 1 Scan and submit your solution online. Due Friday January 30, 2015 @ 4pm Your Name: SUNet ID: @stanford.edu Check if you would like exam routed back via SCPD:

More information

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra Today CSCI 5105 Recovery CAP Theorem Instructor: Abhishek Chandra 2 Recovery Operations to be performed to move from an erroneous state to an error-free state Backward recovery: Go back to a previous correct

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 18 Chapter 7 Case Studies Part.18.1 Introduction Illustrate practical use of methods described previously Highlight fault-tolerance

More information

Fault Tolerance Part I. CS403/534 Distributed Systems Erkay Savas Sabanci University

Fault Tolerance Part I. CS403/534 Distributed Systems Erkay Savas Sabanci University Fault Tolerance Part I CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Overview Basic concepts Process resilience Reliable client-server communication Reliable group communication Distributed

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

I. INTRODUCTION. each station (i.e., computer, telephone, etc.) directly connected to all other stations

I. INTRODUCTION. each station (i.e., computer, telephone, etc.) directly connected to all other stations I. INTRODUCTION (a) Network Topologies (i) point-to-point communication each station (i.e., computer, telephone, etc.) directly connected to all other stations (ii) switched networks (1) circuit switched

More information

Chapter 7: Data Link Control. CS420/520 Axel Krings Page 1

Chapter 7: Data Link Control. CS420/520 Axel Krings Page 1 Chapter 7: Data Link Control CS420/520 Axel Krings Page 1 Data Link Control Protocols Need layer of logic above Physical to manage exchange of data over a link frame synchronization flow control error

More information

Chapter 7: Data Link Control. Data Link Control Protocols

Chapter 7: Data Link Control. Data Link Control Protocols Chapter 7: Data Link Control CS420/520 Axel Krings Page 1 Data Link Control Protocols Need layer of logic above Physical to manage exchange of data over a link frame synchronization flow control error

More information

UNIT IV -- TRANSPORT LAYER

UNIT IV -- TRANSPORT LAYER UNIT IV -- TRANSPORT LAYER TABLE OF CONTENTS 4.1. Transport layer. 02 4.2. Reliable delivery service. 03 4.3. Congestion control. 05 4.4. Connection establishment.. 07 4.5. Flow control 09 4.6. Transmission

More information

ECS 152A Computer Networks Instructor: Liu. Name: Student ID #: Final Exam: March 17, 2005

ECS 152A Computer Networks Instructor: Liu. Name: Student ID #: Final Exam: March 17, 2005 ECS 152A Computer Networks Instructor: Liu Name: Student ID #: Final Exam: March 17, 2005 Duration: 120 Minutes 1. The exam is closed book. However, you may refer to one sheet of A4 paper (double sided)

More information

CCNA 1 Chapter 7 v5.0 Exam Answers 2013

CCNA 1 Chapter 7 v5.0 Exam Answers 2013 CCNA 1 Chapter 7 v5.0 Exam Answers 2013 1 A PC is downloading a large file from a server. The TCP window is 1000 bytes. The server is sending the file using 100-byte segments. How many segments will the

More information

Distributed Systems COMP 212. Lecture 19 Othon Michail

Distributed Systems COMP 212. Lecture 19 Othon Michail Distributed Systems COMP 212 Lecture 19 Othon Michail Fault Tolerance 2/31 What is a Distributed System? 3/31 Distributed vs Single-machine Systems A key difference: partial failures One component fails

More information

Principles of Reliable Data Transfer

Principles of Reliable Data Transfer Principles of Reliable Data Transfer 1 Reliable Delivery Making sure that the packets sent by the sender are correctly and reliably received by the receiver amid network errors, i.e., corrupted/lost packets

More information

Multiple Access Links and Protocols

Multiple Access Links and Protocols Multiple Access Links and Protocols Two types of links : point-to-point PPP for dial-up access point-to-point link between Ethernet switch and host broadcast (shared wire or medium) old-fashioned Ethernet

More information

EE 122: Error detection and reliable transmission. Ion Stoica September 16, 2002

EE 122: Error detection and reliable transmission. Ion Stoica September 16, 2002 EE 22: Error detection and reliable transmission Ion Stoica September 6, 2002 High Level View Goal: transmit correct information Problem: bits can get corrupted - Electrical interference, thermal noise

More information

FINAL May 21, minutes

FINAL May 21, minutes CS 421: COMPUTER NETWORKS SPRING 2004 FINAL May 21, 2004 160 minutes Name: Student No: 1) a) Consider a 1 Mbits/sec channel with a 20 msec one-way propagation delay, i.e., 40 msec roundtrip delay. We want

More information

The Transport Layer Reliability

The Transport Layer Reliability The Transport Layer Reliability CS 3, Lecture 7 http://www.cs.rutgers.edu/~sn4/3-s9 Srinivas Narayana (slides heavily adapted from text authors material) Quick recap: Transport Provide logical communication

More information

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!! Garcia UCB!

Internet II. CS10 : Beauty and Joy of Computing. cs10.berkeley.edu. !!Senior Lecturer SOE Dan Garcia!!!  Garcia UCB! cs10.berkeley.edu CS10 : Beauty and Joy of Computing Internet II!!Senior Lecturer SOE Dan Garcia!!!www.cs.berkeley.edu/~ddgarcia CS10 L17 Internet II (1)! Why Networks?! Originally sharing I/O devices

More information

Transaction Management & Concurrency Control. CS 377: Database Systems

Transaction Management & Concurrency Control. CS 377: Database Systems Transaction Management & Concurrency Control CS 377: Database Systems Review: Database Properties Scalability Concurrency Data storage, indexing & query optimization Today & next class Persistency Security

More information

CSE123A discussion session

CSE123A discussion session CSE123A discussion session 2007/02/02 Ryo Sugihara Review Data Link layer (1): Overview Sublayers End-to-end argument Framing sublayer How to delimit frame» Flags and bit stuffing Topics Data Link Layer

More information

Distributed Systems

Distributed Systems 15-440 Distributed Systems 11 - Fault Tolerance, Logging and Recovery Tuesday, Oct 2 nd, 2018 Logistics Updates P1 Part A checkpoint Part A due: Saturday 10/6 (6-week drop deadline 10/8) *Please WORK hard

More information

16.682: Communication Systems Engineering. Lecture 17. ARQ Protocols

16.682: Communication Systems Engineering. Lecture 17. ARQ Protocols 16.682: Communication Systems Engineering Lecture 17 ARQ Protocols Eytan Modiano Automatic repeat request (ARQ) Break large files into packets FILE PKT H PKT H PKT H Check received packets for errors Use

More information

Distributed Systems. 19. Fault Tolerance Paul Krzyzanowski. Rutgers University. Fall 2013

Distributed Systems. 19. Fault Tolerance Paul Krzyzanowski. Rutgers University. Fall 2013 Distributed Systems 19. Fault Tolerance Paul Krzyzanowski Rutgers University Fall 2013 November 27, 2013 2013 Paul Krzyzanowski 1 Faults Deviation from expected behavior Due to a variety of factors: Hardware

More information

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks

Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks Toward a Reliable Data Transport Architecture for Optical Burst-Switched Networks Dr. Vinod Vokkarane Assistant Professor, Computer and Information Science Co-Director, Advanced Computer Networks Lab University

More information

1999, Scott F. Midkiff

1999, Scott F. Midkiff Lecture Topics Direct Link Networks: Multiaccess Protocols (.7) Multiaccess control IEEE 80.5 Token Ring and FDDI CS/ECpE 556: Computer Networks Originally by Scott F. Midkiff (ECpE) Modified by Marc Abrams

More information

file:///c:/users/hpguo/dropbox/website/teaching/fall 2017/CS4470/H...

file:///c:/users/hpguo/dropbox/website/teaching/fall 2017/CS4470/H... 1 of 9 11/26/2017, 11:28 AM Homework 3 solutions 1. A window holds bytes 2001 to 5000. The next byte to be sent is 3001. Draw a figure to show the situation of the window after the following two events:

More information

COMPUTER NETWORK PERFORMANCE. Gaia Maselli Room: 319

COMPUTER NETWORK PERFORMANCE. Gaia Maselli Room: 319 COMPUTER NETWORK PERFORMANCE Gaia Maselli maselli@di.uniroma1.it Room: 319 Computer Networks Performance 2 Overview of first class Practical Info (schedule, exam, readings) Goal of this course Contents

More information

CS 421: COMPUTER NETWORKS SPRING FINAL May 24, minutes. Name: Student No: TOT

CS 421: COMPUTER NETWORKS SPRING FINAL May 24, minutes. Name: Student No: TOT CS 421: COMPUTER NETWORKS SPRING 2012 FINAL May 24, 2012 150 minutes Name: Student No: Show all your work very clearly. Partial credits will only be given if you carefully state your answer with a reasonable

More information

Error Detection Codes. Error Detection. Two Dimensional Parity. Internet Checksum Algorithm. Cyclic Redundancy Check.

Error Detection Codes. Error Detection. Two Dimensional Parity. Internet Checksum Algorithm. Cyclic Redundancy Check. Error Detection Two types Error Detection Codes (e.g. CRC, Parity, Checksums) Error Correction Codes (e.g. Hamming, Reed Solomon) Basic Idea Add redundant information to determine if errors have been introduced

More information

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Fault-Tolerant Computer Systems ECE 60872/CS Recovery Fault-Tolerant Computer Systems ECE 60872/CS 59000 Recovery Saurabh Bagchi School of Electrical & Computer Engineering Purdue University Slides based on ECE442 at the University of Illinois taught by Profs.

More information

A Two-level Threshold Recovery Mechanism for SCTP

A Two-level Threshold Recovery Mechanism for SCTP A Two-level Threshold Recovery Mechanism for SCTP Armando L. Caro Jr., Janardhan R. Iyengar, Paul D. Amer, Gerard J. Heinz Computer and Information Sciences University of Delaware acaro, iyengar, amer,

More information

Internet Architecture and Protocol

Internet Architecture and Protocol Internet Architecture and Protocol Set# 04 Wide Area Networks Delivered By: Engr Tahir Niazi Wide Area Network Basics Cover large geographical area Network of Networks WANs used to be characterized with

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Fault Tolerance Basic Concepts Being fault tolerant is strongly related to what

More information

Chapter 17: Recovery System

Chapter 17: Recovery System Chapter 17: Recovery System Database System Concepts See www.db-book.com for conditions on re-use Chapter 17: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery

More information

Inst: Chris Davison

Inst: Chris Davison ICS 153 Introduction to Computer Networks Inst: Chris Davison cbdaviso@uci.edu ICS 153 Data Link Layer Contents Simplex and Duplex Communication Frame Creation Flow Control Error Control Performance of

More information

Databases: transaction processing

Databases: transaction processing Databases: transaction processing P.A.Rounce Room 6.18 p.rounce@cs.ucl.ac.uk 1 ACID Database operation is processing of a set of transactions Required features of a database transaction should be Atomicity

More information

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University Fault Tolerance Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Reliable Group Communication Reliable multicasting: A message that is sent to a process group should be delivered

More information

CSE 1 23: Computer Networks

CSE 1 23: Computer Networks CSE 1 23: Computer Networks Total Points: 47.5 Homework 2 Out: 10/18, Due: 10/25 1. The Sliding Window Protocol Assume that the sender s window size is 3. If we have to send 10 frames in total, and the

More information

NOTES W2006 CPS610 DBMS II. Prof. Anastase Mastoras. Ryerson University

NOTES W2006 CPS610 DBMS II. Prof. Anastase Mastoras. Ryerson University NOTES W2006 CPS610 DBMS II Prof. Anastase Mastoras Ryerson University Recovery Transaction: - a logical unit of work. (text). It is a collection of operations that performs a single logical function in

More information

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax:

Consistent Logical Checkpointing. Nitin H. Vaidya. Texas A&M University. Phone: Fax: Consistent Logical Checkpointing Nitin H. Vaidya Department of Computer Science Texas A&M University College Station, TX 77843-3112 hone: 409-845-0512 Fax: 409-847-8578 E-mail: vaidya@cs.tamu.edu Technical

More information

Chapter 14: Recovery System

Chapter 14: Recovery System Chapter 14: Recovery System Chapter 14: Recovery System Failure Classification Storage Structure Recovery and Atomicity Log-Based Recovery Remote Backup Systems Failure Classification Transaction failure

More information

Problem 7. Problem 8. Problem 9

Problem 7. Problem 8. Problem 9 Problem 7 To best answer this question, consider why we needed sequence numbers in the first place. We saw that the sender needs sequence numbers so that the receiver can tell if a data packet is a duplicate

More information

Lecture / The Data Link Layer: Framing and Error Detection

Lecture / The Data Link Layer: Framing and Error Detection Lecture 2 6.263/16.37 The Data Link Layer: Framing and Error Detection MIT, LIDS Slide 1 Data Link Layer (DLC) Responsible for reliable transmission of packets over a link Framing: Determine the start

More information

SELECTION OF METRICS (CONT) Gaia Maselli

SELECTION OF METRICS (CONT) Gaia Maselli SELECTION OF METRICS (CONT) Gaia Maselli maselli@di.uniroma1.it Computer Network Performance 2 Selecting performance metrics Computer Network Performance 3 Selecting performance metrics speed Individual

More information

Database Management Systems Reliability Management

Database Management Systems Reliability Management Database Management Systems Reliability Management D B M G 1 DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files

More information

Wireless Medium Access Control Protocols

Wireless Medium Access Control Protocols Wireless Medium Access Control Protocols Telecomunicazioni Undergraduate course in Electrical Engineering University of Rome La Sapienza Rome, Italy 2007-2008 Classification of wireless MAC protocols Wireless

More information

STEVEN R. BAGLEY PACKETS

STEVEN R. BAGLEY PACKETS STEVEN R. BAGLEY PACKETS INTRODUCTION Talked about how data is split into packets Allows it to be multiplexed onto the network with data from other machines But exactly how is it split into packets and

More information

Concepts for Robust NoC Communication

Concepts for Robust NoC Communication oncepts for Robust o ommunication Martin Radetzki Department of mbedded ystems ngineering Institute of omputer Architecture and omputer ngineering Universität tuttgart www.iti.uni-stuttgart.de/ese.phtml

More information

T ransaction Management 4/23/2018 1

T ransaction Management 4/23/2018 1 T ransaction Management 4/23/2018 1 Air-line Reservation 10 available seats vs 15 travel agents. How do you design a robust and fair reservation system? Do not enough resources Fair policy to every body

More information

Module 16: Distributed System Structures

Module 16: Distributed System Structures Chapter 16: Distributed System Structures Module 16: Distributed System Structures Motivation Types of Network-Based Operating Systems Network Structure Network Topology Communication Structure Communication

More information

Midterm #2 Exam Solutions April 26, 2006 CS162 Operating Systems

Midterm #2 Exam Solutions April 26, 2006 CS162 Operating Systems University of California, Berkeley College of Engineering Computer Science Division EECS Spring 2006 Anthony D. Joseph Midterm #2 Exam April 26, 2006 CS162 Operating Systems Your Name: SID AND 162 Login:

More information

A SKY Computers White Paper

A SKY Computers White Paper A SKY Computers White Paper High Application Availability By: Steve Paavola, SKY Computers, Inc. 100000.000 10000.000 1000.000 100.000 10.000 1.000 99.0000% 99.9000% 99.9900% 99.9990% 99.9999% 0.100 0.010

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

Episode 4. Flow and Congestion Control. Baochun Li Department of Electrical and Computer Engineering University of Toronto

Episode 4. Flow and Congestion Control. Baochun Li Department of Electrical and Computer Engineering University of Toronto Episode 4. Flow and Congestion Control Baochun Li Department of Electrical and Computer Engineering University of Toronto Recall the previous episode Detailed design principles in: The link layer The network

More information

Communication and Networks. Problems

Communication and Networks. Problems Electrical and Information Technology Communication and Networks Problems Link Layer 2016 Problems 1. Consider a network applying a slotted Aloha access system. The assumption for this is that all nodes

More information

William Stallings Data and Computer Communications. Chapter 10 Packet Switching

William Stallings Data and Computer Communications. Chapter 10 Packet Switching William Stallings Data and Computer Communications Chapter 10 Packet Switching Principles Circuit switching designed for voice Resources dedicated to a particular call Much of the time a data connection

More information

CC-SCTP: Chunk Checksum of SCTP for Enhancement of Throughput in Wireless Network Environments

CC-SCTP: Chunk Checksum of SCTP for Enhancement of Throughput in Wireless Network Environments CC-SCTP: Chunk Checksum of SCTP for Enhancement of Throughput in Wireless Network Environments Stream Control Transmission Protocol (SCTP) uses the 32-bit checksum in the common header, by which a corrupted

More information

Outline. CS5984 Mobile Computing

Outline. CS5984 Mobile Computing CS5984 Mobile Computing Dr. Ayman Abdel-Hamid Computer Science Department Virginia Tech Outline Review Transmission Control Protocol (TCP) Based on Behrouz Forouzan, Data Communications and Networking,

More information

Question 1 (6 points) Compare circuit-switching and packet-switching networks based on the following criteria:

Question 1 (6 points) Compare circuit-switching and packet-switching networks based on the following criteria: Question 1 (6 points) Compare circuit-switching and packet-switching networks based on the following criteria: (a) Reserving network resources ahead of data being sent: (2pts) In circuit-switching networks,

More information

Fault Tolerance. Chapter 7

Fault Tolerance. Chapter 7 Fault Tolerance Chapter 7 Basic Concepts Dependability Includes Availability Reliability Safety Maintainability Failure Models Type of failure Crash failure Omission failure Receive omission Send omission

More information

David B. Johnson. Willy Zwaenepoel. Rice University. Houston, Texas. or the constraints of real-time applications [6, 7].

David B. Johnson. Willy Zwaenepoel. Rice University. Houston, Texas. or the constraints of real-time applications [6, 7]. Sender-Based Message Logging David B. Johnson Willy Zwaenepoel Department of Computer Science Rice University Houston, Texas Abstract Sender-based message logging isanewlow-overhead mechanism for providing

More information

Dep. Systems Requirements

Dep. Systems Requirements Dependable Systems Dep. Systems Requirements Availability the system is ready to be used immediately. A(t) = probability system is available for use at time t MTTF/(MTTF+MTTR) If MTTR can be kept small

More information

Dependability tree 1

Dependability tree 1 Dependability tree 1 Means for achieving dependability A combined use of methods can be applied as means for achieving dependability. These means can be classified into: 1. Fault Prevention techniques

More information

Connectionless and Connection-Oriented Protocols OSI Layer 4 Common feature: Multiplexing Using. The Transmission Control Protocol (TCP)

Connectionless and Connection-Oriented Protocols OSI Layer 4 Common feature: Multiplexing Using. The Transmission Control Protocol (TCP) Lecture (07) OSI layer 4 protocols TCP/UDP protocols By: Dr. Ahmed ElShafee ١ Dr. Ahmed ElShafee, ACU Fall2014, Computer Networks II Introduction Most data-link protocols notice errors then discard frames

More information

The MAC Address Format

The MAC Address Format Directing data is what addressing is all about. At the Data Link layer, this is done by pointing PDUs to the destination MAC address for delivery of a frame within a LAN. The MAC address is the number

More information

SRI RAMAKRISHNA INSTITUTE OF TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY COMPUTER NETWORKS UNIT - II DATA LINK LAYER

SRI RAMAKRISHNA INSTITUTE OF TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY COMPUTER NETWORKS UNIT - II DATA LINK LAYER SRI RAMAKRISHNA INSTITUTE OF TECHNOLOGY DEPARTMENT OF INFORMATION TECHNOLOGY COMPUTER NETWORKS UNIT - II DATA LINK LAYER 1. What are the responsibilities of data link layer? Specific responsibilities of

More information

Direct Link Networks: Building Blocks (2.1), Encoding (2.2), Framing (2.3)

Direct Link Networks: Building Blocks (2.1), Encoding (2.2), Framing (2.3) Direct Link Networks: Building Blocks (2.1), Encoding (2.2), Framing (2.3) ECPE/CS 5516: Computer Networks Originally by Scott F. Midkiff (ECpE) Modified by Marc Abrams (CS) Virginia Tech courses.cs.vt.edu/~cs5516

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter

More information

This Lecture. BUS Computer Facilities Network Management. Switching Network. Simple Switching Network

This Lecture. BUS Computer Facilities Network Management. Switching Network. Simple Switching Network This Lecture BUS0 - Computer Facilities Network Management Switching networks Circuit switching Packet switching gram approach Virtual circuit approach Routing in switching networks Faculty of Information

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #19: Logging and Recovery 1

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #19: Logging and Recovery 1 CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #19: Logging and Recovery 1 General Overview Preliminaries Write-Ahead Log - main ideas (Shadow paging) Write-Ahead Log: ARIES

More information

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments 1 A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments E. M. Karanikolaou and M. P. Bekakos Laboratory of Digital Systems, Department of Electrical and Computer Engineering,

More information