Self-Stabilizing Byzantine Digital Clock Synchronization
1 Self-Stabilizing Byzantine Digital Clock Synchronization
Ezra N. Hoch, Danny Dolev, Ariel Daliot
School of Engineering and Computer Science, The Hebrew University of Jerusalem, Israel
2 Problem Statement
- Digital clock synchronization
- Self-stabilizing
- Byzantine faults
3 Model & Problem Statement
Model:
- n nodes, fully connected (complete graph)
- Synchronous: global beat system
- Rapid beat interval: on the order of the message delivery time
- No common initialization point
- Nodes may be subject to transient faults
- Permanent presence of Byzantine nodes (up to f < n/4)
Problem statement:
- If enough good nodes have not been subject to transient faults for a sufficient period of time, then all these nodes attain the same DigiClock value and, with each consecutive beat, increase it by 1 (mod Overlap).
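The problem statement can be read as a predicate over a recorded run. The following is a minimal sketch of such a check, not part of the original slides: the function name `is_synchronized`, the `trace` layout and the parameter `overlap` (standing for Overlap) are assumptions made for illustration.

```python
from typing import Sequence


def is_synchronized(trace: Sequence[Sequence[int]], overlap: int) -> bool:
    """Check the legal-state condition on a recorded run.

    trace[t][i] is the DigiClock value of good node i at beat t
    (hypothetical layout, chosen for this sketch).
    """
    for t, clocks in enumerate(trace):
        # All good nodes must hold the same value at this beat.
        if len(set(clocks)) != 1:
            return False
        # Each beat must advance the shared value by 1, wrapping at overlap.
        if t > 0 and clocks[0] != (trace[t - 1][0] + 1) % overlap:
            return False
    return True
```

For example, with `overlap = 10` the run `[[9, 9, 9], [0, 0, 0], [1, 1, 1]]` satisfies the condition, while a run in which two good nodes differ at some beat does not.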
4 What do we want to achieve?
[Figure: four clock sequences all read 09:15:01, 09:15:02, 09:15:03, 09:15:04; one further clock reads 08:10:33]
5 Why is this hard?
[Figure: four clock sequences start from different values (20:10:10 to 20:10:13, 16:00:00 to 16:00:03, 09:15:01 to 09:15:04, 12:45:03 to 12:45:06); one further clock reads 08:10:33]
6 Intuitive Solution Overview
I. Agreed stream: Produce a common stream of agreed values at all good nodes (using "rotating consensus"), e.g. (..., 20, 4, 23, 19, ...).
II. Continuous agreed stream: Transform the stream into a stream of consecutive, increasing integer values, e.g. (..., 27, 28, 29, 30, ...).
III. Update clock: Update the internal digital clock in accordance with the common increasing stream.
7 Intuitive Solution Contd.
[Figure: timeline in beats for four nodes, marking the last transient fault (arbitrary state), followed by Stage I (agreed stream) and Stages II-III (continuous agreed stream)]
8 Classic Byzantine Consensus
Problem statement: Each node has an initial value. All nodes need to agree on a common output value within a finite time, in the presence of Byzantine nodes.
- Agreement: All good nodes terminate with the same output value.
- Validity: If all good nodes have the same initial value v, then the output value is v.
- Termination: All good nodes terminate within Δ rounds.
In addition:
- Solidarity: If the output value is v, then more than n/2 good nodes had v as their initial value; otherwise, the output value is the default value.
Assumptions:
- All good nodes start in a consistent initial state
- Synchronous execution
9 Stage I: Agreed Stream
"Rotating consensus": Execute Δ Byzantine consensus instances simultaneously, each at a different round of its execution.
At each beat:
- Execute the current round of each of the instances
- Output the value of the most recently terminated instance
- Invoke a new instance of Byzantine consensus
10 Stage I Contd.
[Figure: at beats i, i+1 and i+2 the concurrent instances execute round 1, round 2, ..., round Δ-1 and round Δ; an output is produced at every beat]
11 Stage I Summary
Starting from an arbitrary state:
- At the next beat, a new instance of Byzantine consensus (BC_i) is initialized.
- After Δ beats, all good nodes agree on the output value of the BC_i consensus instance.
- This holds for all consensus instances initialized after the last transient fault.
- Therefore we have an agreed stream: at each beat, all good nodes have an agreed value associated with that beat.
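The pipelining behind the rotating consensus can be sketched as follows. This is only an illustration of the three per-beat steps, under loud assumptions: `ToyInstance` is a hypothetical placeholder that simply returns its own input and is NOT a Byzantine consensus protocol, `delta` stands for the number of rounds an instance needs, and the sketch starts from a clean state rather than an arbitrary one.

```python
from collections import deque
from typing import Deque, Optional


class ToyInstance:
    """Placeholder for one consensus instance; NOT Byzantine tolerant."""

    def __init__(self, initial_value: int) -> None:
        self.value = initial_value
        self.rounds_done = 0

    def run_round(self) -> None:
        # A real protocol would exchange messages with the other nodes here.
        self.rounds_done += 1


class RotatingConsensus:
    """Keeps up to `delta` instances running, each at a different round."""

    def __init__(self, delta: int) -> None:
        self.delta = delta
        self.instances: Deque[ToyInstance] = deque()

    def beat(self, new_input: int) -> Optional[int]:
        # 1. Execute the current round of every running instance.
        for inst in self.instances:
            inst.run_round()
        # 2. Output the value of the instance that just terminated, if any.
        output = None
        if self.instances and self.instances[0].rounds_done >= self.delta:
            output = self.instances.popleft().value
        # 3. Invoke a new instance with this beat's input.
        self.instances.append(ToyInstance(new_input))
        return output
```

With `delta = 3`, feeding the inputs 10, 11, 12, 13, 14 on consecutive beats yields no output for the first three beats and then 10 and 11: the instance invoked at a given beat delivers its output `delta` beats later, so from then on every beat has an associated value.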
12 Stage II: Continuous Agreed Stream
- At every beat, all nodes send their DigiClock value to all nodes.
- At each node, the variable most holds a value that was received from more than half of the nodes. If no such value exists, most holds 0.
- v holds the agreed value associated with the current beat.
- v_prev holds the value v had at the previous beat.
- All + operations are done mod Overlap.
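A minimal sketch of how most could be computed from the values received in one beat. The function name `compute_most` and its signature are assumptions for illustration; the sketch also assumes each node contributes at most one value per beat.

```python
from collections import Counter
from typing import Iterable


def compute_most(received: Iterable[int], n: int) -> int:
    """Return the value received from more than n/2 nodes, or 0 if none."""
    counts = Counter(received)
    if counts:
        value, count = counts.most_common(1)[0]
        # A strict majority of all n nodes is required, not just of the
        # messages that happened to arrive.
        if count > n / 2:
            return value
    return 0
```

At most one value can exceed n/2 occurrences, so checking only the most frequent value is sufficient.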
13 Stage II Contd.
Use the following update rule:
    if (v = 0) or (v = v_prev + 1) then
        DigiClock := most + 1
    else
        DigiClock := 0
Initialize the new instance of the Byzantine Consensus algorithm with DigiClock as the initial consensus value.
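The update rule translates almost directly into code. The sketch below is illustrative rather than the authors' implementation: the function name is hypothetical, `overlap` stands for Overlap, and `None` is an assumed stand-in for the consensus default value, which is treated as matching neither condition.

```python
from typing import Optional


def update_digiclock(v: Optional[int], v_prev: Optional[int],
                     most: int, overlap: int) -> int:
    """Apply the Stage II update rule and return the new DigiClock value."""
    # Assumption: None models the default consensus output; it is neither
    # equal to 0 nor a successor of anything.
    is_successor = (
        v is not None
        and v_prev is not None
        and v == (v_prev + 1) % overlap
    )
    if v == 0 or is_successor:
        return (most + 1) % overlap   # first line of the rule
    return 0                          # second line of the rule
```

For instance, with `overlap = 100`, the call `update_digiclock(28, 27, 28, 100)` returns 29, whereas `update_digiclock(19, 23, 19, 100)` returns 0 because 19 is neither 0 nor the successor of 23.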
14 [Figure: beats i and i+1, showing the rounds of the concurrent consensus instances, the output v, and the update rule with its other input]
15 Why isn't Stage I enough?
- Each round, all good nodes agree on some value. However, this is not enough.
- If we use v+Δ+1 as the new initial value, we might get stuck in a repeating situation.
- Other immediate update rules failed.
16 Stage II Closure
if (v = 0) or (v = v_prev + 1) then DigiClock := most + 1 else DigiClock := 0
- If the system has already converged, then v = v_prev + 1 and all good nodes have the same DigiClock value.
- Hence, most will be the same at all good nodes.
- Therefore, all good nodes will increase their DigiClock value by 1, and the system stays in a legal state (synchronized clocks).
17 Stage II Contd.
if (v = 0) or (v = v_prev + 1) then DigiClock := most + 1 else DigiClock := 0
Note that once all good nodes agree on DigiClock:
- They continue to agree.
- Hence, the values entered into the rotating consensus will be either 0 or an increment of the previous value.
- Hence, after Δ beats, DigiClock will increase by 1 at each beat.
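The claim above can be traced on a small example. The sketch below follows one good node's view once all good nodes already agree on DigiClock. It rests on stated assumptions: the agreed stream still holds `len(stream)` leftover arbitrary values, most always equals the shared DigiClock (good nodes agree and form a majority), the consensus itself is not simulated, and all names are hypothetical.

```python
from collections import deque
from typing import Deque, List


def run_after_agreement(stream: List[int], v_prev: int, clock: int,
                        overlap: int, beats: int) -> List[int]:
    """Return the shared DigiClock value after each simulated beat."""
    pending: Deque[int] = deque(stream)   # agreed outputs still in the pipeline
    history: List[int] = []
    for _ in range(beats):
        v = pending.popleft()             # agreed value for this beat
        most = clock                      # assumption: good nodes agree and are a majority
        if v == 0 or v == (v_prev + 1) % overlap:
            clock = (most + 1) % overlap
        else:
            clock = 0
        pending.append(clock)             # input of the newly invoked instance
        v_prev = v
        history.append(clock)
    return history


print(run_after_agreement([20, 4, 23], v_prev=7, clock=50, overlap=100, beats=8))
# prints [0, 0, 0, 1, 2, 3, 4, 5]
```

In this particular run the three leftover values force DigiClock to 0 for three beats; after that every value leaving the pipeline is either 0 or the successor of the previous one, so the clock advances by 1 at every beat.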
18 Stage II Convergence
if (v = 0) or (v = v_prev + 1) then DigiClock := most + 1 else DigiClock := 0
Ensuring DigiClock is the same at all good nodes:
- All good nodes execute the same line: either the first line or the second line.
- If during Δ+1 beats DigiClock isn't the same at all good nodes, then only the first line was executed, and no more than n/2 good nodes had the same DigiClock value.
- After Δ+1 beats, v would be equal to the default value (due to the solidarity requirement).
- Hence, the second line would be executed, and all good nodes will have the same DigiClock value.
19 Formal Algorithm
20 Convergence Timeline (in beats)
Starting from an arbitrary state, in order:
- All good nodes agree on the value of v
- All good nodes agree on the value of v_prev
- All good nodes execute the same lines of code
- All good nodes agree on the value of DigiClock
- All good nodes increase DigiClock by 1 each beat
The system converges (time-axis label: 3Δ+3).
21 Complexity Analysis
- Convergence in Ω(Δ), that is, Ω(f).
- Amortized message complexity per round is O(n^2).
22 Related Work
- Previous self-stabilizing (digital) clock synchronization (non-Byzantine): Arora, Dolev, Gouda, Herman, Papatriantafilou, etc.
- Not many previous results address self-stabilization with Byzantine faults, due to the complexity of the combined model.
- Dolev and Welch '95, Self-Stabilizing Clock Synchronization in the Presence of Byzantine Faults:
  - To the best of our knowledge, the only previous work operating in the same model (global beat, self-stabilizing, Byzantine)
  - Expected convergence time is exponential, as opposed to our deterministic linear time
  - Tolerates up to a third of the nodes being Byzantine, as opposed to our tolerance of a fourth
23 Related Work Contd.
Daliot and Dolev's self-stabilizing Byzantine clock synchronization:
- No common external synchronization
- Built on top of an underlying self-stabilizing Byzantine pulse with a large cycle length
- An internal distributed pulse is difficult to attain
- Doesn't take advantage of our stronger synchronous model: precision stays tight (on the order of the network delay), but is not 0
24 Contribution of Current Work
- Deterministic linear convergence time (as opposed to expected exponential time in previous work in the exact same model)
- Simple solution that takes advantage of the strength of the model (as opposed to a more complex solution with the same convergence time)
- Rotating consensus mechanism
25 Future Directions
- Can the Byzantine tolerance be improved to support f < n/3?
- What happens when the global beats are received at intervals that are less than the message delay?
- Can the rotating consensus be applied in a more general way, e.g. to create a general stabilizer of Byzantine-tolerant algorithms?
26 Questions?
More informationParallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin
Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitioning (Different data at different sites More space Better
More informationReal-Time Component Software. slide credits: H. Kopetz, P. Puschner
Real-Time Component Software slide credits: H. Kopetz, P. Puschner Overview OS services Task Structure Task Interaction Input/Output Error Detection 2 Operating System and Middleware Application Software
More informationLecture 5: Duality Theory
Lecture 5: Duality Theory Rajat Mittal IIT Kanpur The objective of this lecture note will be to learn duality theory of linear programming. We are planning to answer following questions. What are hyperplane
More informationPaxos Replicated State Machines as the Basis of a High- Performance Data Store
Paxos Replicated State Machines as the Basis of a High- Performance Data Store William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters and Peng Li March 30, 2011 Q: How to build a
More informationFailure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18
Failure models Byzantine Fault Tolerance Fail-stop: nodes either execute the protocol correctly or just stop Byzantine failures: nodes can behave in any arbitrary way Send illegal messages, try to trick
More informationPushyDB. Jeff Chan, Kenny Lam, Nils Molina, Oliver Song {jeffchan, kennylam, molina,
PushyDB Jeff Chan, Kenny Lam, Nils Molina, Oliver Song {jeffchan, kennylam, molina, osong}@mit.edu https://github.com/jeffchan/6.824 1. Abstract PushyDB provides a more fully featured database that exposes
More informationPropagated Timestamps: A Scheme for The Stabilization of Maximum Flow Routing Protocols
Propagated Timestamps: A Scheme for The Stabilization of Maximum Flow Routing Protocols Jorge A. Cobb Mohamed Waris Department of Computer Science University of Houston Houston, TX 77204-3475 Abstract
More informationCS 138: Practical Byzantine Consensus. CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.
CS 138: Practical Byzantine Consensus CS 138 XX 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Scenario Asynchronous system Signed messages s are state machines It has to be practical CS 138
More information