Distributed Systems (5DV147)

Distributed Systems (5DV147): Fundamentals, Fall 2013

Basics

Single process (1 CPU, 1 memory)
int i; i=i+1;
- Steps are strictly sequential
- Program behavior and variable state are determined by the sequence of operations

Distributed processes

"A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages." (Coulouris, Dollimore, Kindberg, and Blair, 2012)

Distributed processes:
- State is private and hidden
- An algorithm defines the steps of each process, including the transmission of messages between processes (computation, send, receive)
p1: int i; i=i+1; send(i,p2);
p2: int a; receive(p1,a);
- We can't predict the rate at which each process executes, nor the timing of message transmissions
- Progress depends on internal local state and on the messages received
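The p1/p2 exchange above can be sketched as a small runnable program. This is only an illustration under stated assumptions: two threads stand in for the two processes, a `queue.Queue` named `channel_p1_to_p2` (a hypothetical name) stands in for the network, and `i` is initialised to 0, which the slide does not specify.

```python
import queue
import threading

# Hypothetical channel from p1 to p2; the queue stands in for the network.
channel_p1_to_p2 = queue.Queue()
received = []  # only used so the result can be observed from outside p2


def p1():
    i = 0                      # private state of p1 (initial value is an assumption)
    i = i + 1                  # computation step
    channel_p1_to_p2.put(i)    # send(i, p2)


def p2():
    a = channel_p1_to_p2.get() # receive(p1, a): blocks until a message arrives
    received.append(a)


# The start order does not matter: p2 simply waits until p1 has sent.
t2 = threading.Thread(target=p2)
t1 = threading.Thread(target=p1)
t2.start()
t1.start()
t1.join()
t2.join()
print(received)
```

Neither thread can read the other's local variables; the only way `a` gets its value is through the message, which mirrors the "state is private and hidden" point.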

It is impossible to describe all states of a distributed algorithm, because any of the processes or any of the message transmissions may fail.

Models

Why models?
- To abstract the underlying properties that are common to a large range of systems, which makes it possible to distinguish the fundamental from the accessory
- To make explicit all relevant assumptions about the system
- To discover common design problems
- To avoid reinventing the wheel for every minor variant of the problem
A model captures the key components and the way they interact, and abstracts away the rest.

Message-passing model

Characteristics:
- Synchrony of the system
- Type of communication network: point-to-point or broadcast channel
- Model of process and communication failures that may occur

Synchronous systems

- There is a known upper bound on the time required by any process to execute a step
- Every process has a clock with a known bound on its rate of drift with respect to real time
- There is a known bound on message delay: the time to send, transport, and receive a message over a link
(Hadzilacos and Toueg, 1994)

Asynchronous systems

No timing assumptions: in particular, there are no bounds on the maximum message delay, the clock drift, or the time needed to execute a step (Hadzilacos and Toueg, 1994)

The asynchronous model is attractive because of:
- Simple semantics
- Applications are easier to port
- Variable or unexpected workloads are sources of asynchrony
- Processes share resources, and communication channels share the network

Point-to-point networks

[Figure: process p sends message m from its outgoing message buffer over a communication channel to the incoming message buffer of process q, which receives it. Annotations: processes can fail, be slow, or produce incorrect output; the channel can introduce message delays or message loss.]
Two assumptions:
1. If p sends a message m to q, then q eventually receives m
2. Every process executes an infinite sequence of steps
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.14

Failure model

"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable." (Leslie Lamport, 1987)

Why do computer systems fail?
Hardware reliability
- Dominant concern until late in the 1980s
- Now a minor component of overall reliability concerns
- Not so for mechanical devices, e.g., networks or disks
Software reliability
- Software bugs account for a substantial fraction of unplanned downtime (25-35%)
- Bohrbugs: easily localized and reproducible
- Heisenbugs: symptoms of other bugs that don't cause an immediate crash (extremely hard to fix; also known as transient bugs)
Planned maintenance and environmental factors

Processes and communication channels may fail, i.e., deviate from correct behavior. The failure model defines the ways in which failures may occur, in order to understand the effects of failures.

Omission failures: a process or communication channel fails to do what it is supposed to do.

Process omission failures:
1. Crash
2. Fail-stop
3. Send-omission
4. Receive-omission
[Figure: process p sends m via its outgoing message buffer and the communication channel to the incoming message buffer of process q, which receives it.]
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.14

Channel omission failures:
m is inserted into p's outgoing message buffer, but m is not transported into q's incoming message buffer.
[Figure: process p sends m via its outgoing message buffer and the communication channel to the incoming message buffer of process q, which receives it.]
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.14
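The three places where a message can be lost (send-omission, channel omission, receive-omission) can be made concrete with a toy simulation of the buffers in the figure. This is a sketch only: the `Process` class, the `transport` function, and the fault flags are hypothetical names introduced for illustration, and faults are injected deterministically rather than at random.

```python
from collections import deque


class Process:
    def __init__(self, name):
        self.name = name
        self.outgoing = deque()   # outgoing message buffer
        self.incoming = deque()   # incoming message buffer
        self.delivered = []       # messages the process actually received

    def send(self, m, send_omission=False):
        # Send-omission: the send completes, but m is never put in the
        # outgoing message buffer.
        if not send_omission:
            self.outgoing.append(m)

    def receive(self, receive_omission=False):
        # Receive-omission: m is in the incoming buffer, but the process
        # does not receive it.
        if self.incoming:
            m = self.incoming.popleft()
            if not receive_omission:
                self.delivered.append(m)


def transport(p, q, channel_omission=False):
    # Channel omission: m leaves p's outgoing buffer but never arrives
    # in q's incoming buffer.
    if p.outgoing:
        m = p.outgoing.popleft()
        if not channel_omission:
            q.incoming.append(m)


def run(send_omission=False, channel_omission=False, receive_omission=False):
    p, q = Process("p"), Process("q")
    p.send("m", send_omission=send_omission)
    transport(p, q, channel_omission=channel_omission)
    q.receive(receive_omission=receive_omission)
    return q.delivered


print(run())                        # no fault
print(run(send_omission=True))      # lost before the outgoing buffer
print(run(channel_omission=True))   # lost between the buffers
print(run(receive_omission=True))   # lost after the incoming buffer
```

From q's point of view the three faulty runs look identical: m simply never arrives. The distinction matters for deciding which component to blame and how to mask the failure.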

Timing failures (only for synchronous systems)

Class of failure (affects): description
- Clock (process): Process's local clock exceeds the bounds on its rate of drift from real time.
- Performance (process): Process exceeds the bounds on the interval between two steps.
- Performance (channel): A message's transmission takes longer than the stated bound.
Benign failures:
- Omission failures (asynchronous networks)
- Timing failures (synchronous networks)
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.11
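The timing-failure classes above amount to checking observed behaviour against the three bounds that define a synchronous system. The following sketch shows that check; the `Bounds` type, the function name, and all numeric values are assumptions chosen for illustration, not values from the course material.

```python
from dataclasses import dataclass


@dataclass
class Bounds:
    max_drift_rate: float      # bound on clock drift (seconds of drift per second)
    max_step_interval: float   # bound on the interval between two steps (seconds)
    max_message_delay: float   # bound on message transmission time (seconds)


def timing_failures(bounds, drift_rate, step_interval, message_delay):
    """Return the (class of failure, affected component) pairs that apply."""
    failures = []
    if abs(drift_rate) > bounds.max_drift_rate:
        failures.append(("clock", "process"))
    if step_interval > bounds.max_step_interval:
        failures.append(("performance", "process"))
    if message_delay > bounds.max_message_delay:
        failures.append(("performance", "channel"))
    return failures


# Illustrative bounds only.
bounds = Bounds(max_drift_rate=1e-5, max_step_interval=0.010, max_message_delay=0.100)

within_bounds = timing_failures(bounds, drift_rate=1e-6, step_interval=0.005, message_delay=0.050)
late_message = timing_failures(bounds, drift_rate=1e-6, step_interval=0.005, message_delay=0.250)
print(within_bounds)
print(late_message)
```

In an asynchronous system there is no `Bounds` object to compare against, which is why timing failures are only defined for synchronous systems.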

Arbitrary failures (Byzantine or malicious failures)

Any type of error may occur:
- In processes: the sequence of steps executed by a process deviates arbitrarily from what it should do, e.g., the process changes its state arbitrarily or sends a message that it was not supposed to send
- In communication channels: messages may be corrupted or duplicated, or non-existent messages may be delivered

Why do we want to know the failure characteristics of a component?
A service can mask the failure of the components on which it depends, either by hiding the failure or by converting it into a more acceptable type of failure.
Example: checksums convert an arbitrary failure (a corrupted message) into an omission failure.

Failure masking by redundancy:
- Information redundancy
- Time redundancy
- Physical redundancy
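As a minimal sketch of the checksum example: the sender appends a CRC-32 checksum to each message, and the receiver silently drops any message whose checksum does not match, so a corrupted message turns into a missing one. The `frame` and `unframe` helpers are hypothetical names; `zlib.crc32` is from the Python standard library. A CRC guards against accidental corruption only, not against a malicious sender that recomputes the checksum.

```python
import zlib


def frame(payload: bytes) -> bytes:
    # Information redundancy: append a 4-byte CRC-32 of the payload.
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def unframe(data: bytes):
    # Returns the payload, or None if the message must be dropped.
    if len(data) < 4:
        return None
    payload, checksum = data[:-4], data[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None   # corruption detected: treat the message as lost
    return payload


sent = frame(b"a=1")
corrupted = bytes([sent[0] ^ 0x01]) + sent[1:]   # flip one bit in transit

print(unframe(sent))        # intact message is accepted
print(unframe(corrupted))   # corrupted message is dropped
```

The receiver now only has to cope with omissions, which can in turn be masked by retransmission.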

Reliability of one-to-one communication:
- Validity: any message in the outgoing message buffer is eventually delivered to the incoming message buffer
- Integrity: the message received is identical to the one sent, and no messages are delivered twice
[Figure: process p sends m via its outgoing message buffer and the communication channel to the incoming message buffer of process q, which receives it.]
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.14
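One common way to approximate validity and integrity over a channel that loses and duplicates messages is retransmission plus sequence numbers. The sketch below is an illustration, not the course's protocol: `LossyChannel`, `Receiver`, and `reliable_send` are hypothetical names, faults are scripted by transmission index, and acknowledgements are assumed never to be lost.

```python
class LossyChannel:
    """Drops the transmissions whose index is in `drop` and duplicates
    those whose index is in `duplicate` (scripted faults, for illustration)."""

    def __init__(self, drop=(), duplicate=()):
        self.drop = set(drop)
        self.duplicate = set(duplicate)
        self.count = 0

    def transmit(self, packet):
        n = self.count
        self.count += 1
        if n in self.drop:
            return []
        if n in self.duplicate:
            return [packet, packet]
        return [packet]


class Receiver:
    def __init__(self):
        self.seen = set()      # sequence numbers already delivered
        self.delivered = []

    def on_packet(self, packet):
        seq, payload = packet
        if seq not in self.seen:   # integrity: never deliver a message twice
            self.seen.add(seq)
            self.delivered.append(payload)
        return seq                 # acknowledgement (assumed not to be lost)


def reliable_send(channel, receiver, seq, payload, max_attempts=5):
    # Validity via time redundancy: retransmit until acknowledged.
    for _ in range(max_attempts):
        acked = False
        for packet in channel.transmit((seq, payload)):
            acked = receiver.on_packet(packet) == seq
        if acked:
            return True
    return False


channel = LossyChannel(drop={0}, duplicate={1})
receiver = Receiver()
ok = reliable_send(channel, receiver, seq=0, payload="m")
print(ok, receiver.delivered)
```

Here the first transmission is lost and the second arrives twice, yet "m" is delivered exactly once. Validity holds only as long as the channel does not drop every attempt, which is why the slide's definition says "eventually".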

Reliability of broadcast communication:
[Figure: p broadcasts message m1 (a=1) to q1, q2, and q3.]
What happens if p fails in the middle?
- Reliable broadcast: all processes agree on the messages delivered, despite failures
- Consensus: all processes reach a common decision depending on initial inputs, despite failures

How would you design a reliable broadcast protocol?
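One possible answer, given here only as a sketch, is for every process to relay a message to all other processes the first time it receives it, so that the message still spreads if the sender crashes partway through. The code assumes crash failures only, reliable channels between live processes, and unique message contents; the `Node` class and `make_group` helper are hypothetical names, and message delivery is simulated with direct method calls.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.peers = []        # all other nodes in the group
        self.delivered = []    # messages delivered to the application
        self.crashed = False

    def broadcast(self, m, crash_after=None):
        # The sender delivers m locally, then transmits to its peers one by
        # one. crash_after=k simulates a crash after k transmissions.
        self.delivered.append(m)
        for k, peer in enumerate(self.peers):
            if crash_after is not None and k >= crash_after:
                self.crashed = True
                return
            peer.on_receive(m)

    def on_receive(self, m):
        if self.crashed or m in self.delivered:
            return             # crashed nodes do nothing; duplicates are ignored
        self.delivered.append(m)
        for peer in self.peers:
            peer.on_receive(m) # relay, in case the sender fails in the middle


def make_group(names):
    nodes = [Node(n) for n in names]
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
    return nodes


p, q1, q2, q3 = make_group(["p", "q1", "q2", "q3"])
p.broadcast("m1 (a=1)", crash_after=1)   # p reaches only q1, then crashes

for q in (q1, q2, q3):
    print(q.name, q.delivered)
```

Even though p only reached q1, the relay step carries the message to q2 and q3, so all correct processes agree on what was delivered. The cost is on the order of n squared messages per broadcast.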

End-to-end reliability or hop-by-hop?
- End-to-end: only between the source process and the target process of a message
- Hop-by-hop: reliability at every intermediate node
Hop-by-hop reliability increases complexity and reduces performance, and end-to-end reliability is still required.
J. H. Saltzer, D. P. Reed, and D. D. Clark, "End-To-End Arguments in System Design", ACM Transactions on Computer Systems, Vol. 2, No. 4, November 1984, Pages 277-288. http://groups.csail.mit.edu/ana/publications/pubpdfs/end-to-end%20arguments%20in%20system%20design.pdf

Summary

- Difference between single-process and distributed algorithms
- Message-passing model
  - Synchrony of the system
  - Type of communication network
  - Model of process and communication failures that may occur
- Differences between synchronous and asynchronous systems

Summary: failure model
Class of failure (affects): description
- Fail-stop (process): Process halts and remains halted. Other processes may detect this state.
- Crash (process): Process halts and remains halted. Other processes may not be able to detect this state.
- Omission (channel): A message inserted in an outgoing message buffer never arrives at the other end's incoming message buffer.
- Send-omission (process): A process completes a send, but the message is not put in its outgoing message buffer.
- Receive-omission (process): A message is put in a process's incoming message buffer, but that process does not receive it.
- Arbitrary (Byzantine) (process or channel): Process/channel exhibits arbitrary behaviour: it may send/transmit arbitrary messages at arbitrary times or commit omissions; a process may stop or take an incorrect step.
- Clock (process): Process's local clock exceeds the bounds on its rate of drift from real time.
- Performance (process): Process exceeds the bounds on the interval between two steps.
- Performance (channel): A message's transmission takes longer than the stated bound.
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, based on Figure 2.11 and Figure 2.15

Summary: failure masking
- Failure masking by redundancy
- Reliable point-to-point and broadcast communication
- End-to-end arguments in system design

Communication paradigms

Inter-process communication & remote invocation

Inter-process communication: low-level support for communication between processes
- message passing
- socket programming
- multicast communication

Remote invocation: two-way exchange between communicating entities
- calling of a remote operation, procedure, or method

- Request-reply protocols: pairwise exchange of messages from client to server
- Remote procedure calls: procedures in processes on remote computers can be called as if they were in the local address space
- Remote method invocation: a calling object can invoke a method in a remote object
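How a remote procedure call sits on top of a request-reply exchange can be sketched in a few lines. Everything here is an assumption made for illustration: the JSON message layout, the `PROCEDURES` table, and the `Stub` class are hypothetical, and the "network" is an ordinary function call so that the example stays self-contained.

```python
import json


def add(x, y):
    return x + y


# Server side: table of procedures that may be called remotely.
PROCEDURES = {"add": add}


def server_handle(request_bytes: bytes) -> bytes:
    # Unmarshal the request, run the procedure, marshal the reply.
    request = json.loads(request_bytes)
    result = PROCEDURES[request["procedure"]](*request["args"])
    return json.dumps({"id": request["id"], "result": result}).encode()


class Stub:
    """Client-side stub: makes the remote call look like a local one."""

    def __init__(self, transport):
        self.transport = transport   # function: request bytes -> reply bytes
        self.next_id = 0

    def call(self, procedure, *args):
        self.next_id += 1
        request = json.dumps(
            {"id": self.next_id, "procedure": procedure, "args": args}
        ).encode()
        reply = json.loads(self.transport(request))   # request-reply exchange
        if reply["id"] != self.next_id:
            raise RuntimeError("reply does not match request")
        return reply["result"]


# The transport is a direct function call here; a real system would send
# the bytes over a socket instead.
stub = Stub(transport=server_handle)
result = stub.call("add", 2, 3)
print(result)
```

The caller never sees the marshalled messages, which is the sense in which the procedure is called "as if it were in the local address space". The request identifier is what lets a client match replies to requests once messages can be delayed or retransmitted.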

[Figure: communicating entities and communication paradigms.]
Figure adapted from Instructor's Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design, Edn. 5, Pearson Education 2012, Figure 2.2

Next lecture: Security