Distributed Systems (5DV147) Fundamentals Fall 2013 1
basics 2
basics Single process int i; i=i+1; 1 CPU - Steps are strictly sequential - Program behavior & variables state determined by sequence of operations 1 memory 3
basics Distributed processes 4
A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages. Coulouris, Dollimore, Kindberg, and Blair, 2012 5
State is private and hidden basics Definition of steps of each process including transmission of messages between processes (computation, send, receive) p1 int i; i=i+1; send(i,p2); p2 int a; receive(p1,a); Can t predict the rate at which each process executes nor the timing of transmissions Progress depends on internal local state and of messages received 6
basics Impossible to describe all states of a distributed algorithm because of failure of any of the processes or of message transmissions 7
Models 8
Models Why models? to abstract underlying properties that are common to a large range of systems so that it enables to distinguish the fundamental from the accessory to make explicit all relevant assumptions about the system to discover common design problems to prevent reinvent the wheel for every minor variant of the problem A model abstracts away the key components and the way they interact 9
Models Message-passing model 10
Characteristics: Synchrony of the system Type of communication network point-to-point or broadcast channel Models Model of process and communication failures that may occur 11
Models Synchronous systems 12
Models There is a known upper bound on the time required by any process to execute a step Every process has a clock with known bound rate of drift with respect to real time There is a known bound on message delay time to send, transport, and receive a message over a link (Hadzilacos and Toueg, 1994) 13
Models Asynchronous systems 14
Models No timing assumptions, in particular on the maximum message delay, clock drift, or the time needed to execute a step (Hadzilacos and Toueg, 1994) 15
Models Attractive because of: Simple semantics Applications are easier to port Variable or unexpected workloads are sources of asynchrony Processes share resources and communications channels share the network 16
Models Point-to-point networks 17
can fail, be slow, or produce incorrect output Models process p process q send m message delays or message loss receive Communication channe l Outgo ing messag e buffer Incoming messag e buffer Two assumptions: 1. If p sends a message m to q then q eventually receives m 2. Every process executes an infinite sequence of steps Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.14 18
Failure model 19
A distributed system is one in which the failure of computer you didn't even know existed can render your own computer unusable. Leslie Lamport, 1987 20
Why computer systems fail? Failure model Hardware reliability Dominant until late in the 1980s Minor component of overall reliability concerns Not so for mechanical devices, e.g., networks or disks Software reliability Software bugs account for a substantial fraction of unplanned downtime (25-35%) Bohrbugs easily localized and reproducible Heisenbug symptoms of other bugs that don t cause an immediate crash (extremely hard to fix, also known as transient bugs) Planned maintenance and environmental factors 21
Failure model Processes and communication channels may fail, i.e., deviation from correct behavior. The failure model defines ways in which failures may occur in order to understand the effects of failures. 22
Failure model Omission failures Process or communication channel fails to do what it is supposed to do. 23
Process Omission failures: Failure model process p send m process q receive Communication channe l Outgo ing messag e buffer Incoming messag e buffer 1. Crash 2. Fail-stop 3. Send-omission 4. Receive-omission Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.14 24
Channel Omission failures: Failure model process p send m process q receive Communication channe l Outgo ing messag e buffer Incoming messag e buffer m is inserted into p s outgoing message buffer but m is not transported into q s incoming message buffer Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.14 25
Failure model Timing failures (only for synchronous systems) 26
Class of Failure Affects Description Clock Process Process s local clock exceeds the bounds on its rate of drift from real time. Performance Process Process exceeds the bounds on the interval between two steps. Performance Channel A message s transmission takes longer than the stated bound. Failure model Benign failures Omission failures (asynchronous networks) Timing failures (synchronous networks) Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.11 27
Failure model Arbitrary failures (Byzantine or malicious failures) 28
Failure model Any type of error may occur If the sequence of steps executed by a process deviates arbitrarily from what it should do, e.g., change its state arbitrarily or send a message that it was not supposed to send In communication channels messages may be corrupted, duplicated, or non-existent messages be delivered 29
Why do we want to know the failure characteristics of a component? Failure model We can mask the failure of the components on which it depends by either hiding it or converting it into a more acceptable type of failure Arbitrary failure checksums Omission failure 30
Failure masking by Redundancy Failure model Information redundancy Time redundancy Physical redundancy 31
Reliability of one-to-one communication: Failure model process p send m process q receive Communication channe l Outgo ing messag e buffer Incoming messag e buffer Validity: any message in the outgoing buffer is eventually delivered to the incoming message buffer Integrity: The message received is identical to the one sent, and no messages are delivered twice Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.14 32
Reliability of broadcast communication: p m 1 (a=1) q 1 q 2 Failure model What happens if p fails in the middle? q 3 Reliable broadcast: all processes agree on messages delivered despite failures Consensus: all processes reach a common decision depending on initial inputs, despite failures 33
Failure model How would you design a reliable broadcast protocol? 34
End-to-end reliability or hop-by-hop? Failure model End-to-end: only between source process and target process of a message Hop-by-hop: reliability at every intermediate node Hop-by-hop reliability increases complexity and reduces performance, and end-to-end reliability is still required. End-To-End Arguments in System Design J. H. SALTZER, D. P. REED, and D. D. CLARK ACM Transactions on Computer Systems, Vol. 2, No. 4, November 1984, Pages 277-288 http://groups.csail.mit.edu/ana/publications/pubpdfs/end-to-end%20arguments%20in%20system%20design.pdf 35
Summary 36
summary Difference between single process and distributed algorithms Message-passing model Synchrony of the system Type of communication network Model of process and communication failures that may occur Differences between synchronous and asynchronous systems 37
Failure Model Class of failure Affects Description Fail-stop Process Process halts and remains halted. Other processes may detect this state. Crash Process Process halts and remains halted. Other processes may not be able to detect this state. Omission Channel A message inserted in an outgoing message buffer never arrives at the other end s incoming message buffer. Send-omission Process A process completes a send, but the message is not put in its outgoing message buffer. Receive-omission Process A message is put in a process s incoming message buffer, but that process does not receive it. Arbitrary (Byzantine) Process or channel Process/channel exhibits arbitrary behaviour: it may send/transmit arbitrary messages at arbitrary times, commit omissions; a process may stop or take an incorrect step. Clock Process Process s local clock exceeds the bounds on its rate of drift from real time. Performance Process Process exceeds the bounds on the interval between two steps. Performance Channel A message s transmission takes longer than the stated bound. summary Figure adapted rom Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 based on Figure 2.11 and Figure 2.15 38
Failure masking Failure masking by redundancy Reliable point-to-point and broadcast communication End-to-end arguments in system design summary 39
Communication paradigms 40
Communication paradigms Inter-process communication & remote invocation 41
Communication paradigms Inter-process communication Low level support for communication between processes message passing socket programming multicast communication 42
Remote invocation Communication paradigms Two-way exchange between communication entities calling of a remote operation, procedure or method 43
Communication paradigms Request-reply protocols pairwise exchange of messages from client to server Remote procedure calls procedures in processes on remote computers can be called as if they were in the local address space Remote method invocation a calling object can invoke a method in a remote object 44
Communicating entities and communication paradigms Communication paradigms Figure adapted from Instructor s Guide for Coulouris, Dollimore, Kindberg and Blair, Distributed Systems: Concepts and Design Edn. 5 Pearson Education 2012 Figure 2.2 45
Next Lecture Security 46