Leslie Lamport. April 20, Leslie Lamport. Jenny Tyrväinen. Introduction. Education and Career. Most important works.

Size: px
Start display at page:

Download "Leslie Lamport. April 20, Leslie Lamport. Jenny Tyrväinen. Introduction. Education and Career. Most important works."

Transcription

1 April 20, 2016

2 Born February in New York

3 Mathematician by his education Has worked in industry, not an academic Fields: concurrency and distributed systems Lists 180 publications and other texts at his home page Several awards and honors (IEEE John von Neumann Medal 2008)

4 BSc at MIT 1960 MSc at Brandeis University 1963 Mitre Corporation Marlboro College Massachusetts Computer Associates (Compass)

5 continues PhD at Brandeis University 1972: The Analytic Cauchy with Singular Data SRI International Digital Equipment Corporation / Compaq Microsoft Research

6 , 1974 (Mutual exclusion) Works as queue ticket dispencers at bakery etc. Each process chooses own queue number by asking around and choosing a larger number If same number, process number decides Enters critical section on turn, smallest number first

7 Solves mutual exclusion problem! Doesn t rely on lower level Process can read all memory, write only its own: simultaneous read and write ok Process can read arbitrary value

8 , 1978 The Maintenance of Duplicate Databases by Paul Johnson and Bob Thomas Ordering of events: Two observers can disagree, which event happened first Synchronization of logical clocks to achieve partial ordering Expand to total ordering state-machine for arbitrary distributed system PODC Influential Paper Award 2000 and ACM SIGOPS Hall of Fame Award 2007

9 Clock conditions: C1. If a and b events in P i, and a comes before b, C i (a) < C i (b) C2. If a: P i sends a message and b is P j receiving it, C i (a) < C j (b) Implementation rules: IR1. Each process increments its clock between two successive events IR2. (a) event a: send m by process then include timestamp with current clock value (b) event b: receive m: process sets clock greater or equal to present value and greater than received timestamp

10 The, 1982 (consensus) SIFT: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control, 1978 (Jean-Claude Laprie Award in Dependable Computing 2014) Reaching Agreement in the Presence of Faults, 1980 (Edsger W. Dijkstra Prize 2005) New paper mainly to assign a new name to the problem Jean-Claude Laprie Award in Dependable Computing 2013

11 The Group of generals deciding on attack/retreat, need to reach consensus Some are traitors, trying to get loyal generals to adopt a bad plan How many generals are needed to tolerate n traitors?

12 The Solution Solution with oral messages: 3n+1 generals With signatures: 2n+1 generals

13 OM(0): (1) Commander sends value to every lieutenant (2) Each lieutenant: use the value received or RETREAT, if no value OM(m), m > 0: if no value received, use RETREAT (1) Commander sends value to every lieutenant (2) every lieutenant uses OM(m-1) to send received value to n-2 other lieutenants (3) from received values, select majority

14 : Determining Global States of a Distributed System, 1985 Observer process: save state, send a token to all others Process on receiving token: save state, send to observer send token on every subsequent message If a process receives a message without the token, forward to observer The observer builds the global state ACM SIGOPS Hall of Fame Award 2013 and Edsger W. Dijkstra Prize in Distributed Computing 2014

15 L A TEX: A Document Preparation System, 1986 Macro package to Donald Knuth s TEX For writing the Great American Concurrency book (didn t) A user manual, published by Addison-Wesley

16 (TLA), 1994 Formalism for describing concurrent systems If you re not writing a program, don t use a programming language Developed also TLA+ and PlusCal TLA IDE Logic, algorithm language and tools for mechanically checking proofs

17 TLA Example IDE screenshot.png

18 The Part-Time Parliament, 1998 Tried to prove, that it is impossible to maintain consistency despite any number of non-byzantine faults Paxos algorithm instead Tried to popularize like the Byzantine generals problem ACM SIGOPS Hall of Fame Award 2012

19 Paxos algorithm The consensus algorithm: synod Three-phase: first value proposed, then accepted and lastly learned Client-server model, server duplicated: described as deterministic state-machine Fault-tolerance against arbitrary faults

20 Enormous impact on the development of distributed systems Developed many methods and concepts now considered elemental Mathematics to computer science: Formal descriptions Proofs for concurrency IEEE John von Neumann medal: For establishment of the foundations of distributed and concurrent computing

21 Thank you! Questions?

BYZANTINE GENERALS BYZANTINE GENERALS (1) A fable: Michał Szychowiak, 2002 Dependability of Distributed Systems (Byzantine agreement)

BYZANTINE GENERALS BYZANTINE GENERALS (1) A fable: Michał Szychowiak, 2002 Dependability of Distributed Systems (Byzantine agreement) BYZANTINE GENERALS (1) BYZANTINE GENERALS A fable: BYZANTINE GENERALS (2) Byzantine Generals Problem: Condition 1: All loyal generals decide upon the same plan of action. Condition 2: A small number of

More information

Consensus Problem. Pradipta De

Consensus Problem. Pradipta De Consensus Problem Slides are based on the book chapter from Distributed Computing: Principles, Paradigms and Algorithms (Chapter 14) by Kshemkalyani and Singhal Pradipta De pradipta.de@sunykorea.ac.kr

More information

Byzantine Techniques

Byzantine Techniques November 29, 2005 Reliability and Failure There can be no unity without agreement, and there can be no agreement without conciliation René Maowad Reliability and Failure There can be no unity without agreement,

More information

Proving the Correctness of Distributed Algorithms using TLA

Proving the Correctness of Distributed Algorithms using TLA Proving the Correctness of Distributed Algorithms using TLA Khushboo Kanjani, khush@cs.tamu.edu, Texas A & M University 11 May 2007 Abstract This work is a summary of the Temporal Logic of Actions(TLA)

More information

Distributed Systems 11. Consensus. Paul Krzyzanowski

Distributed Systems 11. Consensus. Paul Krzyzanowski Distributed Systems 11. Consensus Paul Krzyzanowski pxk@cs.rutgers.edu 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value must be one

More information

Consensus. Chapter Two Friends. 2.3 Impossibility of Consensus. 2.2 Consensus 16 CHAPTER 2. CONSENSUS

Consensus. Chapter Two Friends. 2.3 Impossibility of Consensus. 2.2 Consensus 16 CHAPTER 2. CONSENSUS 16 CHAPTER 2. CONSENSUS Agreement All correct nodes decide for the same value. Termination All correct nodes terminate in finite time. Validity The decision value must be the input value of a node. Chapter

More information

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013

CSCI 5454, CU Boulder Samriti Kanwar Lecture April 2013 1. Byzantine Agreement Problem In the Byzantine agreement problem, n processors communicate with each other by sending messages over bidirectional links in order to reach an agreement on a binary value.

More information

Consensus. Chapter Two Friends. 8.3 Impossibility of Consensus. 8.2 Consensus 8.3. IMPOSSIBILITY OF CONSENSUS 55

Consensus. Chapter Two Friends. 8.3 Impossibility of Consensus. 8.2 Consensus 8.3. IMPOSSIBILITY OF CONSENSUS 55 8.3. IMPOSSIBILITY OF CONSENSUS 55 Agreement All correct nodes decide for the same value. Termination All correct nodes terminate in finite time. Validity The decision value must be the input value of

More information

Consensus and related problems

Consensus and related problems Consensus and related problems Today l Consensus l Google s Chubby l Paxos for Chubby Consensus and failures How to make process agree on a value after one or more have proposed what the value should be?

More information

Specifying and Proving Broadcast Properties with TLA

Specifying and Proving Broadcast Properties with TLA Specifying and Proving Broadcast Properties with TLA William Hipschman Department of Computer Science The University of North Carolina at Chapel Hill Abstract Although group communication is vitally important

More information

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure Yong-Hwan Cho, Sung-Hoon Park and Seon-Hyong Lee School of Electrical and Computer Engineering, Chungbuk National

More information

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015

Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Agreement in Distributed Systems CS 188 Distributed Systems February 19, 2015 Page 1 Introduction We frequently want to get a set of nodes in a distributed system to agree Commitment protocols and mutual

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus

Concepts. Techniques for masking faults. Failure Masking by Redundancy. CIS 505: Software Systems Lecture Note on Consensus CIS 505: Software Systems Lecture Note on Consensus Insup Lee Department of Computer and Information Science University of Pennsylvania CIS 505, Spring 2007 Concepts Dependability o Availability ready

More information

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov

Practical Byzantine Fault Tolerance. Miguel Castro and Barbara Liskov Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov Outline 1. Introduction to Byzantine Fault Tolerance Problem 2. PBFT Algorithm a. Models and overview b. Three-phase protocol c. View-change

More information

Byzantine Failures. Nikola Knezevic. knl

Byzantine Failures. Nikola Knezevic. knl Byzantine Failures Nikola Knezevic knl Different Types of Failures Crash / Fail-stop Send Omissions Receive Omissions General Omission Arbitrary failures, authenticated messages Arbitrary failures Arbitrary

More information

6.852: Distributed Algorithms Fall, Instructor: Nancy Lynch TAs: Cameron Musco, Katerina Sotiraki Course Secretary: Joanne Hanley

6.852: Distributed Algorithms Fall, Instructor: Nancy Lynch TAs: Cameron Musco, Katerina Sotiraki Course Secretary: Joanne Hanley 6.852: Distributed Algorithms Fall, 2015 Instructor: Nancy Lynch TAs: Cameron Musco, Katerina Sotiraki Course Secretary: Joanne Hanley What are Distributed Algorithms? Algorithms that run on networked

More information

CMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries

CMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries CMSC 858F: Algorithmic Game Theory Fall 2010 Achieving Byzantine Agreement and Broadcast against Rational Adversaries Instructor: Mohammad T. Hajiaghayi Scribe: Adam Groce, Aishwarya Thiruvengadam, Ateeq

More information

System Models 2. Lecture - System Models 2 1. Areas for Discussion. Introduction. Introduction. System Models. The Modelling Process - General

System Models 2. Lecture - System Models 2 1. Areas for Discussion. Introduction. Introduction. System Models. The Modelling Process - General Areas for Discussion System Models 2 Joseph Spring School of Computer Science MCOM0083 - Distributed Systems and Security Lecture - System Models 2 1 Architectural Models Software Layers System Architecture

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Introduction to Distributed Systems Seif Haridi

Introduction to Distributed Systems Seif Haridi Introduction to Distributed Systems Seif Haridi haridi@kth.se What is a distributed system? A set of nodes, connected by a network, which appear to its users as a single coherent system p1 p2. pn send

More information

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem)

Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Practical Byzantine Fault Tolerance (The Byzantine Generals Problem) Introduction Malicious attacks and software errors that can cause arbitrary behaviors of faulty nodes are increasingly common Previous

More information

To do. Consensus and related problems. q Failure. q Raft

To do. Consensus and related problems. q Failure. q Raft Consensus and related problems To do q Failure q Consensus and related problems q Raft Consensus We have seen protocols tailored for individual types of consensus/agreements Which process can enter the

More information

Course Syllabus. Comp 735 Distributed and Concurrent Algorithms. Spring 2017

Course Syllabus. Comp 735 Distributed and Concurrent Algorithms. Spring 2017 Course Syllabus Comp 735 Distributed and Concurrent Algorithms Spring 2017 Meeting Place: SN 115 Meeting Time: 11:00 12:15, TuTh Course Web Page: http://www.cs.unc.edu/ anderson/teach/comp735 Instructor:

More information

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed

More information

Java RMI Middleware Project

Java RMI Middleware Project Java RMI Middleware Project Nathan Balon CIS 578 Advanced Operating Systems December 7, 2004 Introduction The semester project was to implement a middleware similar to Java RMI or CORBA. The purpose of

More information

Blockchain, Throughput, and Big Data Trent McConaghy

Blockchain, Throughput, and Big Data Trent McConaghy Blockchain, Throughput, and Big Data Trent McConaghy Bitcoin Startups Berlin Oct 28, 2014 Conclusion Outline Throughput numbers Big data Consensus algorithms ACID Blockchain Big data? Throughput numbers

More information

Self-Stabilizing Byzantine Digital Clock Synchronization

Self-Stabilizing Byzantine Digital Clock Synchronization Self-Stabilizing Byzantine Digital Clock Synchronization Ezra N. Hoch, Danny Dolev, Ariel Daliot School of Engineering and Computer Science, The Hebrew University of Jerusalem, Israel Problem Statement

More information

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation

CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation CPEG 852 Advanced Topics in Computing Systems The Dataflow Model of Computation Stéphane Zuckerman Computer Architecture & Parallel Systems Laboratory Electrical & Computer Engineering Dept. University

More information

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita

Proseminar Distributed Systems Summer Semester Paxos algorithm. Stefan Resmerita Proseminar Distributed Systems Summer Semester 2016 Paxos algorithm stefan.resmerita@cs.uni-salzburg.at The Paxos algorithm Family of protocols for reaching consensus among distributed agents Agents may

More information

Stabilization. Chapter Self-Stabilization 148 CHAPTER 13. STABILIZATION

Stabilization. Chapter Self-Stabilization 148 CHAPTER 13. STABILIZATION 148 CHAPTER 13. STABILIZATION Definition 13.2 (Time Complexity). The time complexity of a self-stabilizing system is the time that passed after the last (transient) failure until the system has converged

More information

Exam 2 Review. Fall 2011

Exam 2 Review. Fall 2011 Exam 2 Review Fall 2011 Question 1 What is a drawback of the token ring election algorithm? Bad question! Token ring mutex vs. Ring election! Ring election: multiple concurrent elections message size grows

More information

Coordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast

Coordination 2. Today. How can processes agree on an action or a value? l Group communication l Basic, reliable and l ordered multicast Coordination 2 Today l Group communication l Basic, reliable and l ordered multicast How can processes agree on an action or a value? Modes of communication Unicast 1ç è 1 Point to point Anycast 1è

More information

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 10. Consensus: Paxos Paul Krzyzanowski Rutgers University Fall 2017 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value

More information

Distributed Systems. Chapman & Hall/CRC. «H Taylor S* Francis Croup Boca Raton London New York

Distributed Systems. Chapman & Hall/CRC. «H Taylor S* Francis Croup Boca Raton London New York Distributed Systems An Algorithmic Approach Sukumar Ghosh University of Iowa Iowa City, U.S.A. Chapman & Hall/CRC «H Taylor S* Francis Croup Boca Raton London New York Chapman & Hall/CRC is an imprint

More information

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING COURSE PLAN Course Code: CS0468 Course Title: Advanced Operating Systems Semester:VIII Course Time:Jan-May

More information

Isolating Compromised Routers. Alper Mizrak, Keith Marzullo and Stefan Savage UC San Diego Department of Computer Science and Engineering

Isolating Compromised Routers. Alper Mizrak, Keith Marzullo and Stefan Savage UC San Diego Department of Computer Science and Engineering Isolating Compromised Routers Alper Mizrak, Keith Marzullo and Stefan Savage UC San Diego Department of Computer Science and Engineering Problem Routers are vulnerable points in the Internet, especially

More information

BYZANTINE AGREEMENT CH / $ IEEE. by H. R. Strong and D. Dolev. IBM Research Laboratory, K55/281 San Jose, CA 95193

BYZANTINE AGREEMENT CH / $ IEEE. by H. R. Strong and D. Dolev. IBM Research Laboratory, K55/281 San Jose, CA 95193 BYZANTINE AGREEMENT by H. R. Strong and D. Dolev IBM Research Laboratory, K55/281 San Jose, CA 95193 ABSTRACT Byzantine Agreement is a paradigm for problems of reliable consistency and synchronization

More information

Distributed Systems. Before We Begin. Advantages. What is a Distributed System? CSE 120: Principles of Operating Systems. Lecture 13.

Distributed Systems. Before We Begin. Advantages. What is a Distributed System? CSE 120: Principles of Operating Systems. Lecture 13. CSE 120: Principles of Operating Systems Lecture 13 Distributed Systems December 2, 2003 Before We Begin Read Chapters 15, 17 (on Distributed Systems topics) Prof. Joe Pasquale Department of Computer Science

More information

A definition. Byzantine Generals Problem. Synchronous, Byzantine world

A definition. Byzantine Generals Problem. Synchronous, Byzantine world The Byzantine Generals Problem Leslie Lamport, Robert Shostak, and Marshall Pease ACM TOPLAS 1982 Practical Byzantine Fault Tolerance Miguel Castro and Barbara Liskov OSDI 1999 A definition Byzantine (www.m-w.com):

More information

How Bitcoin achieves Decentralization. How Bitcoin achieves Decentralization

How Bitcoin achieves Decentralization. How Bitcoin achieves Decentralization Centralization vs. Decentralization Distributed Consensus Consensus without Identity, using a Block Chain Incentives and Proof of Work Putting it all together Centralization vs. Decentralization Distributed

More information

There Is More Consensus in Egalitarian Parliaments

There Is More Consensus in Egalitarian Parliaments There Is More Consensus in Egalitarian Parliaments Iulian Moraru, David Andersen, Michael Kaminsky Carnegie Mellon University Intel Labs Fault tolerance Redundancy State Machine Replication 3 State Machine

More information

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1

Recap. CSE 486/586 Distributed Systems Paxos. Paxos. Brief History. Brief History. Brief History C 1 Recap Distributed Systems Steve Ko Computer Sciences and Engineering University at Buffalo Facebook photo storage CDN (hot), Haystack (warm), & f4 (very warm) Haystack RAID-6, per stripe: 10 data disks,

More information

Kingdom of Saudi Arabia Ministry of Higher Education College of Computer & Information Sciences Majmaah University. Course Profile

Kingdom of Saudi Arabia Ministry of Higher Education College of Computer & Information Sciences Majmaah University. Course Profile Kingdom Saudi Arabia Ministry Higher Education College Computer & Information Sciences Majmaah University Course Prile Course Name:- Course Code:- CEN 449 Academic Year:- 1434-1435(2013-2014) Semester

More information

Fault-Tolerant Distributed Services and Paxos"

Fault-Tolerant Distributed Services and Paxos Fault-Tolerant Distributed Services and Paxos" INF346, 2015 2015 P. Kuznetsov and M. Vukolic So far " " Shared memory synchronization:" Wait-freedom and linearizability" Consensus and universality " Fine-grained

More information

Distributed Systems. Fault Tolerance. Paul Krzyzanowski

Distributed Systems. Fault Tolerance. Paul Krzyzanowski Distributed Systems Fault Tolerance Paul Krzyzanowski Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License. Faults Deviation from expected

More information

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d)

FAULT TOLERANCE. Fault Tolerant Systems. Faults Faults (cont d) Distributed Systems Fö 9/10-1 Distributed Systems Fö 9/10-2 FAULT TOLERANCE 1. Fault Tolerant Systems 2. Faults and Fault Models. Redundancy 4. Time Redundancy and Backward Recovery. Hardware Redundancy

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q

Coordination 1. To do. Mutual exclusion Election algorithms Next time: Global state. q q q Coordination 1 To do q q q Mutual exclusion Election algorithms Next time: Global state Coordination and agreement in US Congress 1798-2015 Process coordination How can processes coordinate their action?

More information

Thread Synchronization: Foundations. Properties. Safety properties. Edsger s perspective. Nothing bad happens

Thread Synchronization: Foundations. Properties. Safety properties. Edsger s perspective. Nothing bad happens Edsger s perspective Testing can only prove the presence of bugs Thread Synchronization: Foundations Properties Property: a predicate that is evaluated over a run of the program (a trace) every message

More information

Supplementary Reading List

Supplementary Reading List Massachusetts Institute of Technology Handout 3 6.852: Distributed Algorithms Prof. Nancy Lynch February 5, 2008 Supplementary Reading List 1. Other distributed algorithms textbooks [1] Hagit Attiya and

More information

Interval Algorithms for Coin Flipping

Interval Algorithms for Coin Flipping IJCSNS International Journal of Computer Science and Network Security, VOL.10 No.2, February 2010 55 Interval Algorithms for Coin Flipping Sung-il Pae, Hongik University, Seoul, Korea Summary We discuss

More information

Recall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos

Recall our 2PC commit problem. Recall our 2PC commit problem. Doing failover correctly isn t easy. Consensus I. FLP Impossibility, Paxos Consensus I Recall our 2PC commit problem FLP Impossibility, Paxos Client C 1 C à TC: go! COS 418: Distributed Systems Lecture 7 Michael Freedman Bank A B 2 TC à A, B: prepare! 3 A, B à P: yes or no 4

More information

Midterm CSE 21 Spring 2012

Midterm CSE 21 Spring 2012 Signature Name Student ID Midterm CSE 21 Spring 2012 Page 1 Page 2 Page 3 Page 4 Page 5 Page 6 _ (20 points) _ (15 points) _ (13 points) _ (23 points) _ (10 points) _ (8 points) Total _ (89 points) (84

More information

Practical Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance Practical Byzantine Fault Tolerance Robert Grimm New York University (Partially based on notes by Eric Brewer and David Mazières) The Three Questions What is the problem? What is new or different? What

More information

Artificial Neural Network Based Byzantine Agreement Protocol

Artificial Neural Network Based Byzantine Agreement Protocol P Artificial Neural Network Based Byzantine Agreement Protocol K.W. Lee and H.T. Ewe 2 Faculty of Engineering & Technology 2 Faculty of Information Technology Multimedia University Multimedia University

More information

Research Statement. Yehuda Lindell. Dept. of Computer Science Bar-Ilan University, Israel.

Research Statement. Yehuda Lindell. Dept. of Computer Science Bar-Ilan University, Israel. Research Statement Yehuda Lindell Dept. of Computer Science Bar-Ilan University, Israel. lindell@cs.biu.ac.il www.cs.biu.ac.il/ lindell July 11, 2005 The main focus of my research is the theoretical foundations

More information

BYZANTINE CONSENSUS THROUGH BITCOIN S PROOF- OF-WORK

BYZANTINE CONSENSUS THROUGH BITCOIN S PROOF- OF-WORK Informatiemanagement: BYZANTINE CONSENSUS THROUGH BITCOIN S PROOF- OF-WORK The aim of this paper is to elucidate how Byzantine consensus is achieved through Bitcoin s novel proof-of-work system without

More information

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus AGREEMENT PROTOCOLS Paxos -a family of protocols for solving consensus OUTLINE History of the Paxos algorithm Paxos Algorithm Family Implementation in existing systems References HISTORY OF THE PAXOS ALGORITHM

More information

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms

Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Verteilte Systeme/Distributed Systems Ch. 5: Various distributed algorithms Holger Karl Computer Networks Group Universität Paderborn Goal of this chapter Apart from issues in distributed time and resulting

More information

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18

Failure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18 Failure models Byzantine Fault Tolerance Fail-stop: nodes either execute the protocol correctly or just stop Byzantine failures: nodes can behave in any arbitrary way Send illegal messages, try to trick

More information

Leslie Lamport: The Specification Language TLA +

Leslie Lamport: The Specification Language TLA + Leslie Lamport: The Specification Language TLA + This is an addendum to a chapter by Stephan Merz in the book Logics of Specification Languages by Dines Bjørner and Martin C. Henson (Springer, 2008). It

More information

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski

Distributed Systems. 09. State Machine Replication & Virtual Synchrony. Paul Krzyzanowski. Rutgers University. Fall Paul Krzyzanowski Distributed Systems 09. State Machine Replication & Virtual Synchrony Paul Krzyzanowski Rutgers University Fall 2016 1 State machine replication 2 State machine replication We want high scalability and

More information

CS State Machine Replication

CS State Machine Replication CS 5450 State Machine Replication Key Ideas To tolerate faults replicate functionality! Can represent deterministic distributed system as replicated state machine (SMR) Each replica reaches the same conclusion

More information

Distributed Systems 2 Introduction

Distributed Systems 2 Introduction Distributed Systems 2 Introduction Alberto Montresor University of Trento, Italy 2018/09/13 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents 1 Getting

More information

Signed Messages. Signed Messages

Signed Messages. Signed Messages Signed Messages! Traitors ability to lie makes Byzantine General Problem so difficult.! If we restrict this ability, then the problem becomes easier! Use authentication, i.e. allow generals to send unforgeable

More information

Fault-Tolerant Distributed Consensus

Fault-Tolerant Distributed Consensus Fault-Tolerant Distributed Consensus Lawrence Kesteloot January 20, 1995 1 Introduction A fault-tolerant system is one that can sustain a reasonable number of process or communication failures, both intermittent

More information

Distributed Algorithms Introduction

Distributed Algorithms Introduction Distributed Algorithms Introduction Alberto Montresor University of Trento, Italy 2016/04/26 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents 1

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

The Long March of BFT. Weird Things Happen in Distributed Systems. A specter is haunting the system s community... A hierarchy of failure models

The Long March of BFT. Weird Things Happen in Distributed Systems. A specter is haunting the system s community... A hierarchy of failure models A specter is haunting the system s community... The Long March of BFT Lorenzo Alvisi UT Austin BFT Fail-stop A hierarchy of failure models Crash Weird Things Happen in Distributed Systems Send Omission

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ)

Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ) Data Consistency and Blockchain Bei Chun Zhou (BlockChainZ) beichunz@cn.ibm.com 1 Data Consistency Point-in-time consistency Transaction consistency Application consistency 2 Strong Consistency ACID Atomicity.

More information

Distributed Algorithms (PhD course) Consensus SARDAR MUHAMMAD SULAMAN

Distributed Algorithms (PhD course) Consensus SARDAR MUHAMMAD SULAMAN Distributed Algorithms (PhD course) Consensus SARDAR MUHAMMAD SULAMAN Consensus (Recapitulation) A consensus abstraction is specified in terms of two events: 1. Propose ( propose v )» Each process has

More information

Distributed Consensus: Making Impossible Possible

Distributed Consensus: Making Impossible Possible Distributed Consensus: Making Impossible Possible Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 hh360.user.srcf.net Sometimes inconsistency is not an option

More information

Byzantine Fault Tolerance

Byzantine Fault Tolerance Byzantine Fault Tolerance CS 240: Computing Systems and Concurrency Lecture 11 Marco Canini Credits: Michael Freedman and Kyle Jamieson developed much of the original material. So far: Fail-stop failures

More information

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance

More information

CONCURRENT/DISTRIBUTED PROGRAMMING ILLUSTRATED USING THE DINING PHILOSOPHERS PROBLEM *

CONCURRENT/DISTRIBUTED PROGRAMMING ILLUSTRATED USING THE DINING PHILOSOPHERS PROBLEM * CONCURRENT/DISTRIBUTED PROGRAMMING ILLUSTRATED USING THE DINING PHILOSOPHERS PROBLEM * S. Krishnaprasad Mathematical, Computing, and Information Sciences Jacksonville State University Jacksonville, AL

More information

Distributed Systems COMP 212. Lecture 1 Othon Michail

Distributed Systems COMP 212. Lecture 1 Othon Michail Distributed Systems COMP 212 Lecture 1 Othon Michail Course Information Lecturer: Othon Michail Office 2.14 Holt Building Module Website: http://csc.liv.ac.uk/~michailo/teaching/comp212 VITAL may be used

More information

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering Agreement and Consensus SWE 622, Spring 2017 Distributed Software Engineering Today General agreement problems Fault tolerance limitations of 2PC 3PC Paxos + ZooKeeper 2 Midterm Recap 200 GMU SWE 622 Midterm

More information

Fundamentals Large-Scale Distributed System Design. (a.k.a. Distributed Systems 1)

Fundamentals Large-Scale Distributed System Design. (a.k.a. Distributed Systems 1) Fundamentals Large-Scale Distributed System Design (a.k.a. Distributed Systems 1) https://columbia.github.io/ds1-class/ 1 Interested in... 1. scalable web services? 2. big data? 3. and the large-scale

More information

Global Technology Standards a bridge to the future

Global Technology Standards a bridge to the future Global Technology Standards a bridge to the future Donald E. Purcell, Chairman The Center for Global Standards Analysis May 19, 2005 Creative Commons License The information provided in this presentation

More information

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm

A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Appears as Technical Memo MIT/LCS/TM-590, MIT Laboratory for Computer Science, June 1999 A Correctness Proof for a Practical Byzantine-Fault-Tolerant Replication Algorithm Miguel Castro and Barbara Liskov

More information

Distributed Consensus: Making Impossible Possible

Distributed Consensus: Making Impossible Possible Distributed Consensus: Making Impossible Possible QCon London Tuesday 29/3/2016 Heidi Howard PhD Student @ University of Cambridge heidi.howard@cl.cam.ac.uk @heidiann360 What is Consensus? The process

More information

Prime Generating Algorithms by Skipping Composite Divisors

Prime Generating Algorithms by Skipping Composite Divisors Prime Generating Algorithms by Skipping Composite Divisors Neeraj Anant Pande Assistant Professor Department of Mathematics & Statistics Yeshwant Mahavidyalaya (College), Nanded-431602 Maharashtra, INDIA

More information

A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems

A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems A Two-Layer Hybrid Algorithm for Achieving Mutual Exclusion in Distributed Systems QUAZI EHSANUL KABIR MAMUN *, MORTUZA ALI *, SALAHUDDIN MOHAMMAD MASUM, MOHAMMAD ABDUR RAHIM MUSTAFA * Dept. of CSE, Bangladesh

More information

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton

TAPIR. By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton TAPIR By Irene Zhang, Naveen Sharma, Adriana Szekeres, Arvind Krishnamurthy, and Dan Ports Presented by Todd Charlton Outline Problem Space Inconsistent Replication TAPIR Evaluation Conclusion Problem

More information

Distributed Deadlock

Distributed Deadlock Distributed Deadlock 9.55 DS Deadlock Topics Prevention Too expensive in time and network traffic in a distributed system Avoidance Determining safe and unsafe states would require a huge number of messages

More information

EECS 482 Introduction to Operating Systems

EECS 482 Introduction to Operating Systems EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci (Thanks, Harsha Madhyastha and Jason Flinn for the slides!) Distributed file systems Remote storage of data that appears local Examples:

More information

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali

Self Stabilization. CS553 Distributed Algorithms Prof. Ajay Kshemkalyani. by Islam Ismailov & Mohamed M. Ali Self Stabilization CS553 Distributed Algorithms Prof. Ajay Kshemkalyani by Islam Ismailov & Mohamed M. Ali Introduction There is a possibility for a distributed system to go into an illegitimate state,

More information

Distributed Algorithms 6.046J, Spring, Nancy Lynch

Distributed Algorithms 6.046J, Spring, Nancy Lynch Distributed Algorithms 6.046J, Spring, 205 Nancy Lynch What are Distributed Algorithms? Algorithms that run on networked processors, or on multiprocessors that share memory. They solve many kinds of problems:

More information

The PlusCal Code for Byzantizing Paxos by Refinement

The PlusCal Code for Byzantizing Paxos by Refinement The PlusCal Code for Byzantizing Paxos by Refinement Leslie Lamport 28 June 2011 Contents 1 Algorithm Consensus 1 2 Algorithm Voting 2 3 Algorithm PCon 5 4 Algorithm BPCon 9 References 13 This document

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts

Event Ordering Silberschatz, Galvin and Gagne. Operating System Concepts Event Ordering Happened-before relation (denoted by ) If A and B are events in the same process, and A was executed before B, then A B If A is the event of sending a message by one process and B is the

More information

Failures, Elections, and Raft

Failures, Elections, and Raft Failures, Elections, and Raft CS 8 XI Copyright 06 Thomas W. Doeppner, Rodrigo Fonseca. All rights reserved. Distributed Banking SFO add interest based on current balance PVD deposit $000 CS 8 XI Copyright

More information

SIR C.R.REDDY COLLEGE OF ENGINEERING, ELURU DEPARTMENT OF INFORMATION TECHNOLOGY LESSON PLAN

SIR C.R.REDDY COLLEGE OF ENGINEERING, ELURU DEPARTMENT OF INFORMATION TECHNOLOGY LESSON PLAN SIR C.R.REDDY COLLEGE OF ENGINEERING, ELURU DEPARTMENT OF INFORMATION TECHNOLOGY LESSON PLAN SUBJECT: (IT 4.1.3) ADVANCED OPERATING SYSTEM CLASS: 4/4 B.Tech. I SEMESTER, A.Y.2017-18 INSTRUCTOR: CHALLA

More information

arxiv: v2 [cs.dc] 12 Sep 2017

arxiv: v2 [cs.dc] 12 Sep 2017 Efficient Synchronous Byzantine Consensus Ittai Abraham 1, Srinivas Devadas 2, Danny Dolev 3, Kartik Nayak 4, and Ling Ren 2 arxiv:1704.02397v2 [cs.dc] 12 Sep 2017 1 VMware Research iabraham@vmware.com

More information

A Diversity of Duplications

A Diversity of Duplications A Diversity of Duplications David Powell Special event «Dependability of computing systems, Memories and future» in honor of Jean-Claude Laprie LAAS-CNRS, Toulouse, 16 April 2010 Duplication error Detection

More information

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory

Byzantine Fault Tolerance and Consensus. Adi Seredinschi Distributed Programming Laboratory Byzantine Fault Tolerance and Consensus Adi Seredinschi Distributed Programming Laboratory 1 (Original) Problem Correct process General goal: Run a distributed algorithm 2 (Original) Problem Correct process

More information

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009

Event Ordering. Greg Bilodeau CS 5204 November 3, 2009 Greg Bilodeau CS 5204 November 3, 2009 Fault Tolerance How do we prepare for rollback and recovery in a distributed system? How do we ensure the proper processing order of communications between distributed

More information