EE382C Lecture 14. Reliability and Error Control 5/17/11. EE 382C - S11 - Lecture 14 1
2 Announcements
  Don't forget to iterate with us on your checkpoint 1 report.
  Send your time-slot preferences for checkpoint 2.
  Project presentations are next week. Let us know if you are OK with presenting on Tuesday, May 24th.
3 Question of the day
  Consider a symmetric multiprocessing (SMP) network that does not allow packet loss and needs an availability of [value missing].
  Link BER is [value missing].
  Router components have a failure rate of 1000 FITs.
  How best can you achieve this reliability requirement?
4 Reliability and Availability
  Reliability, R(t): probability that the system is working at time t, given that it was working at time t = 0 and has had no failures in between.
  Availability, A(t): probability that the system is working when needed, at a given point in time t.
    Often affected by the repair process.
    A ~ MTBF / (MTBF + MTTR)
  MTBF: mean time between failures.
  FIT: failures in time. Inverse of MTBF with zero repair time.
  MTTR: mean time to recovery.
  RAS requirements: reliability, availability, and serviceability.
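As a worked illustration of the definitions above, here is a minimal sketch that converts a FIT rate to an MTBF and evaluates A ~ MTBF / (MTBF + MTTR). It assumes the usual convention that one FIT is one failure per 10^9 device-hours. The 1000 FITs figure is the router failure rate from the question of the day. The one-hour MTTR and the function names are assumptions made only for the example.

```python
HOURS_PER_FIT_WINDOW = 1e9  # assumed convention: 1 FIT = 1 failure per 10^9 hours


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a component with the given FIT rate."""
    return HOURS_PER_FIT_WINDOW / fit


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A ~ MTBF / (MTBF + MTTR), as on the slide."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


mtbf = fit_to_mtbf_hours(1000)   # 1000 FITs -> 1e6 hours
a = availability(mtbf, 1.0)      # assumed MTTR of 1 hour
print(f"MTBF = {mtbf:.0f} h, A = {a:.7f}")
```

Shortening MTTR raises A without changing the failure rate, which is why serviceability appears alongside reliability and availability in RAS.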
5 Examples of RAS Requirements
  Enterprise server
    A = [value missing]
    System-level requirement.
    Can reflect to a network-level requirement, or detect and recover from network failures.
    In general, every packet must be correctly received or the system will fail.
  Internet router
    A = [value missing]
    But OK to drop packets (at a rate of [value missing]).
    Turn failures into packet drops.
6 RAS Requirements in Those Systems
  Dropping (reliability)
    Allowed or not
    Rate allowed (e.g., [value missing])
  Availability (A): [value missing] to [value missing]
  Serviceability (MTTR)
7 MTTF and MTTR
  A = MTTF / (MTTF + MTTR)
8 Failure Modes and Fault Models
  Failure mode                      Model       Rate              Units
  Gaussian noise on channel         Transient   [value missing]   BER
  Alpha-particle strike on memory   Soft        10^-9             SER
  Alpha-particle strike on logic    Transient   [value missing]   BER
  Electromigration                  Stuck-at    1                 MTBF
  Connector corrosion               Stuck-at    10                MTBF
  Operator removes module           Fail-stop   10^5              MTBF
  Software failure                  Fail-stop   10^4              MTBF
9 An Analogy
10 The Bathtub Curve
  [Figure: failure rate (FITs) versus time (hours), with infant-mortality and wearout regions.]
11 Detection, Containment, and Recovery
  Three-step program for dealing with errors:
  1. Detection: discover the error
       CRC codes on channels
       Parity or ECC codes on memories
       Self-checking logic
  2. Containment: prevent the error from propagating further
       Mask it
       Drop the packet (and retry)
       Fail stop
  3. Recovery: resume normal service
       Return to a known state
       Resume sending traffic
       Possibly resend the faulted packet
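The three steps can be sketched on a single link as follows. This is only an illustration: it uses CRC-32 from Python's zlib as the detecting code, and the frame layout, the flaky_channel helper, and the retry limit are assumptions for the example, not the lecture's design.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_frame(frame: bytes):
    """Detection: return the payload if the CRC matches, else None."""
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def flip_bit(frame: bytes, bit_index: int) -> bytes:
    """Model a transient bit error on the channel."""
    data = bytearray(frame)
    data[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(data)


def flaky_channel():
    """Hypothetical channel that corrupts only the first frame it carries."""
    calls = {"n": 0}

    def channel(frame: bytes) -> bytes:
        calls["n"] += 1
        return flip_bit(frame, 3) if calls["n"] == 1 else frame

    return channel


def send_with_retry(payload: bytes, channel, max_tries: int = 3):
    for attempt in range(1, max_tries + 1):
        received = check_frame(channel(make_frame(payload)))  # 1. detect
        if received is not None:
            return received, attempt      # 3. recover: normal service resumes
        # 2. contain: the corrupted frame is dropped, never delivered upward
    raise RuntimeError("link still failing after retries (fail stop)")


data, tries = send_with_retry(b"flit", flaky_channel())
```

Here the corrupted first frame is dropped and the second attempt succeeds, so the caller sees only correct data plus the extra latency of one retry.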
12 Example: Link-Level Error Control
  [Figure: a sending router and a receiving router joined by a channel, with blocks labeled retransmit control, error check, Tx flit buffer, and input unit.]
  Detection: CRC on the channel
  Containment: drop the packet with the error
  Recovery: request retransmission and resume the normal sequence
  How can this fail? How to fix it?
13 Link-Level Error Control (2)
  [Figure: timing diagram, one row per signal]
    Tx channel: Flit 1, Flit 2, Flit 3, Flit 4, Flit 5, Flit 6, Flit 2, Flit 3, Flit 4, Flit 5, Flit 6
    Rx channel: Flit 1, Error, Flit 3, Flit 4, Flit 5, Flit 6, Flit 2, Flit 3, Flit 4, Flit 5
    Rx ack:     Ack 1, Error 2, Ignore, Ignore, Ignore, Ignore, Ack 2, Ack 3, Ack 4
    Tx ack:     Ack 1, Error 2, Ignore, Ignore, Ignore, Ignore, Ack 2, Ack 3
  Flit 2 was in error. Flits 2-6 are retransmitted.
  Why would you want to retransmit flits 3-6?
  Pointers into the transmit flit buffer (Flit 1 through Flit 6):
    Ack pointer: next flit to be ACKed
    Tx pointer: next flit to be transmitted
    Tail pointer: next free slot
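The retransmission sequence on this slide can be reproduced with a small cycle-by-cycle simulation. This is a sketch under stated assumptions: one flit is sent per cycle, the receiver accepts flits only in order, and feedback reaches the sender a fixed number of cycles (delay, a made-up parameter) after the flit was sent. The ack, tx, and tail pointers follow the slide; everything else is illustrative.

```python
def go_back_n(num_flits: int, corrupt_once, delay: int):
    """Simulate link-level retry; returns (flits sent, flits delivered)."""
    tail = num_flits + 1   # tail pointer: next free slot in the flit buffer
    ack_ptr = 1            # ack pointer: next flit to be ACKed
    tx_ptr = 1             # tx pointer: next flit to be transmitted
    expected = 1           # receiver side: next flit it will accept
    to_corrupt = set(corrupt_once)   # flits hit by an error on first send
    feedback = {}          # arrival cycle -> ("ack" | "error", flit number)
    sent, delivered = [], []
    cycle = 0
    while ack_ptr < tail:
        if cycle in feedback:
            kind, flit = feedback.pop(cycle)
            if kind == "ack":
                ack_ptr = flit + 1   # buffer slot for this flit can be freed
            else:
                tx_ptr = flit        # go back and resend from the bad flit
        if tx_ptr < tail:
            flit = tx_ptr
            sent.append(flit)
            tx_ptr += 1
            if flit == expected:
                if flit in to_corrupt:
                    to_corrupt.discard(flit)
                    feedback[cycle + delay] = ("error", flit)
                else:
                    delivered.append(flit)
                    expected += 1
                    feedback[cycle + delay] = ("ack", flit)
            # Any other flit is out of sequence and is ignored.
        cycle += 1
    return sent, delivered


sent, delivered = go_back_n(num_flits=6, corrupt_once={2}, delay=5)
```

With flit 2 corrupted and a five-cycle feedback delay, the sender transmits flits 1-6 and then 2-6 again, matching the Tx channel row. Resending flits 3-6 keeps the receiver simple: it never has to buffer or reorder flits that arrived after the error.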
14 Channel Configuration
  Reconfigure channels with frequent errors
    Swap in spare bits
    Reduce the width of the channel
    Reduce the bit rate
  If malfunctions continue, decommission the channel
    Assumes the routing algorithm will adapt
15 Cray BlackWidow Example
  Each channel is 3 bits wide at 6.25 Gb/s per bit (b = 18.75 Gb/s)
    3 bits serialized from a 24-bit flit
  Link-level retry rates are monitored
    Each retry is attributed to one bit of the channel
    If the retry rate exceeds a threshold, the bad bit is switched off
    The channel degrades to two bits, then one bit, then is switched off
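The degradation policy can be sketched as below. The 3-bit width and 6.25 Gb/s per bit come from the slide. The class name, the use of a simple retry count in place of a measured retry rate, and the threshold value are assumptions for illustration and do not describe the actual BlackWidow hardware.

```python
BIT_RATE_GBPS = 6.25  # per-bit signaling rate from the slide


class DegradableChannel:
    """Hypothetical model of a channel that sheds bits with too many retries."""

    def __init__(self, width: int = 3, retry_threshold: int = 4):
        self.active_bits = list(range(width))
        self.retries = {bit: 0 for bit in self.active_bits}
        self.retry_threshold = retry_threshold  # assumed value

    def bandwidth_gbps(self) -> float:
        return len(self.active_bits) * BIT_RATE_GBPS

    def record_retry(self, bit: int) -> None:
        """Attribute one link-level retry to a bit; switch it off if too many."""
        if bit not in self.active_bits:
            return
        self.retries[bit] += 1
        if self.retries[bit] > self.retry_threshold:
            self.active_bits.remove(bit)

    def is_up(self) -> bool:
        return bool(self.active_bits)


ch = DegradableChannel()
for _ in range(5):
    ch.record_retry(1)   # bit 1 keeps causing retries and is switched off
```

After the fifth retry on bit 1 the channel runs on bits 0 and 2 at 12.5 Gb/s, trading bandwidth for continued operation.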
16 Router Error Control
  What would happen if:
    A header bit in the input buffer flips?
    The credit count is corrupted?
    The router picks the wrong output?
    The selected output flips mid-packet?
  Numerous failure modes exist inside the router
    Many lead to catastrophic failure, perhaps hundreds of cycles after the error occurred
    Many others lead to insidious performance problems, e.g., losing credits
17 Router Error Control (2)
  The same steps of detect, confine, recover apply
  Detect
    Parity or CRC on all storage and communication
    Quick consistency checks (e.g., on allocators and credits)
    Two copies of all other logic (in space or time)
  Confine
    Stop propagating faulty packets
    Operate via confinement regions (e.g., channel)
  Recover
    Return to a known good state (sometimes via reset)
    Resend faulted packets (if available)
    Disable part of the router (fault-containment regions)
    Replace part of the router (hot swapping)
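Two of the detection mechanisms listed above can be sketched briefly: a parity bit on a stored word and a consistency check on credits. The credit invariant used here (upstream credits, plus flits buffered downstream, plus flits and credits still in flight, equals the buffer depth) is the usual one for credit-based flow control and is stated as an assumption. All names are hypothetical.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of 1 bits."""
    return bin(word).count("1") & 1


def store(word: int):
    """Write a word to storage together with its parity bit."""
    return (word, parity(word))


def load(entry) -> int:
    """Read a word back, raising if a single-bit flip is detected."""
    word, stored_parity = entry
    if parity(word) != stored_parity:
        raise ValueError("parity error: drop the flit and signal the fault")
    return word


def credits_consistent(buffer_depth: int, credit_count: int,
                       buffered_flits: int, flits_in_flight: int,
                       credits_in_flight: int) -> bool:
    """Quick consistency check: every buffer slot is accounted for once."""
    return (credit_count + buffered_flits
            + flits_in_flight + credits_in_flight) == buffer_depth


header = store(0b1011)
flipped = (header[0] ^ 0b0100, header[1])   # one header bit flips in storage
```

Loading flipped raises the parity error, so the corrupted header is caught before it reaches the routing logic. A lost credit shows up as credits_consistent returning False, which turns a silent performance problem into a detectable fault.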
18 Network-Level Error Control
  Model faulty routers and links as fail-stop components
  Use adaptive routing to avoid them
    Table-based: recompute tables periodically
    Local adaptive: pick another minimal link (or a non-minimal one)
  Need to avoid dead ends and deadlocks
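The table-based option can be illustrated with a minimal sketch that rebuilds one router's next-hop table by breadth-first search over the links that are still working. The function name, the link representation, and the 4-node ring are assumptions for the example. The sketch finds shortest surviving paths only and does nothing about deadlock, which a real design must handle separately.

```python
from collections import deque


def next_hop_table(links, failed_links, source):
    """Map each reachable destination to the first hop from source.

    links and failed_links hold (a, b) pairs; links are bidirectional and a
    failed link is treated as fail-stop, i.e. simply removed from the graph.
    """
    adjacency = {}
    for a, b in links:
        if (a, b) in failed_links or (b, a) in failed_links:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    table = {}
    visited = {source}
    queue = deque()
    for neighbor in adjacency.get(source, []):
        if neighbor not in visited:
            visited.add(neighbor)
            table[neighbor] = neighbor
            queue.append(neighbor)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                table[neighbor] = table[node]   # inherit the first hop
                queue.append(neighbor)
    return table


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
healthy = next_hop_table(ring, set(), source=0)
degraded = next_hop_table(ring, {(0, 1)}, source=0)
```

In the healthy ring node 0 reaches node 1 directly. With link 0-1 failed, the recomputed table sends traffic for node 1 the long way around through node 3. Destinations missing from the table are unreachable, which is the dead-end case the slide warns about.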
19 End-To-End Error Control
  Keep a copy of each packet at the source until it is acknowledged or times out
    (This buffer can get large)
  If an error is detected
    Drop the packet
    (Optionally) send a negative acknowledgement
  When a packet is correctly received
    Send a positive acknowledgement
  When an acknowledgement is received
    Discard the packet
  When a negative acknowledgement is received (or on timeout)
    Resend the packet
    May transmit the same packet multiple times
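The protocol above can be sketched with two small classes. The sequence numbers and the duplicate filter at the destination are assumptions added so that "may transmit the same packet multiple times" does not lead to double delivery. The sketch also assumes the sequence number of a corrupted packet is still readable, which a real design would have to protect separately or replace with a timeout.

```python
class Source:
    """Keeps a copy of every packet until it is acknowledged."""

    def __init__(self):
        self.pending = {}    # sequence number -> payload awaiting an ack
        self.next_seq = 0

    def send(self, payload):
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = payload
        return (seq, payload)

    def on_ack(self, seq):
        self.pending.pop(seq, None)        # acknowledged: discard the copy

    def on_nack_or_timeout(self, seq):
        return (seq, self.pending[seq])    # resend from the saved copy


class Destination:
    """Drops bad packets and filters duplicates by sequence number."""

    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, packet, corrupted=False):
        seq, payload = packet
        if corrupted:
            return ("nack", seq)           # drop; optionally send a nack
        if seq not in self.seen:
            self.seen.add(seq)
            self.delivered.append(payload)
        return ("ack", seq)                # ack duplicates too


src, dst = Source(), Destination()
first = src.send("a")
reply = dst.receive(first, corrupted=True)       # error detected, nack sent
again = src.on_nack_or_timeout(reply[1])
dst.receive(again)                               # correct copy arrives
dst.receive(again)                               # duplicate after a lost ack
src.on_ack(0)
```

After this exchange the destination has delivered "a" exactly once and the source's retransmit buffer is empty. The buffer holds every unacknowledged packet, which is why the slide notes that it can get large.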
20 Question of the day
  Consider a symmetric multiprocessing (SMP) network that does not allow packet loss and needs an availability of [value missing].
  Link BER is [value missing].
  Router components have a failure rate of 1000 FITs.
  How best can you achieve this reliability requirement?
21 Summary
  Specification sets reliability requirements
    Drop rate
    Availability
  Failures are abstracted with fault models
    Bit errors, soft errors, stuck-at, fail stop
  Detection, containment, and recovery
    Link level
      Ack and retransmit
      Reconfigure
    Router level
      Detect all failures
      Mask, retry, or reset
    Network level
      Route around faulty components
    End-to-end
      Retransmit on nack or timeout