
Data Consistency and Blockchain. Bei Chun Zhou (BlockChainZ), beichunz@cn.ibm.com

Data Consistency: point-in-time consistency, transaction consistency, application consistency.

Strong Consistency: ACID
- Atomicity: all of the operations in the transaction complete, or none do.
- Consistency: the database is in a consistent state when the transaction begins and ends.
- Isolation: the transaction behaves as if it were the only operation being performed on the database.
- Durability: once the transaction completes, its effects are not reversed.
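A minimal sketch of atomicity in practice, using Python's built-in sqlite3 module (the accounts table and the simulated crash are illustrative, not from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'alice'")
        raise RuntimeError("crash between the two writes")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE name = 'bob'")
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so alice still has 100.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
```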

CAP theorem: a distributed computer system CANNOT simultaneously provide all three of:
- Consistency: all nodes see the same data at the same time.
- Availability: every request receives a response about whether it succeeded or failed.
- Partition tolerance: the system continues to operate despite arbitrary partitioning due to network failures.
"2 of 3" is misleading: partitions are rare, so detecting partitions and handling them explicitly is more valuable. The choice is mainly between C and A, and C, A and P are not binary (0 vs. 1) properties.

Eventual Consistency (based on CAP). ACID provides consistency for a partitioned database, but gives up availability to some extent as a tradeoff: in an ACID system, the availability of the product is lower than the availability of its components. BASE (an ACID alternative, diametrically opposed): trading some consistency for availability can lead to dramatic improvements in scalability. What makes up BASE? Basically Available, Soft state, Eventual consistency.
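One common way to implement eventual consistency is last-writer-wins replication with anti-entropy merges; the sketch below is illustrative (the Replica class and its fields are assumptions, not from the slides):

```python
import time

class Replica:
    """A key-value replica using last-writer-wins (LWW) timestamps."""
    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def put(self, key, value):
        self.store[key] = (time.time_ns(), value)

    def get(self, key):
        ts_val = self.store.get(key)
        return ts_val[1] if ts_val else None

    def merge(self, other):
        # Anti-entropy: keep the newer write for every key.
        for key, (ts, val) in other.store.items():
            if key not in self.store or self.store[key][0] < ts:
                self.store[key] = (ts, val)

# Two replicas accept writes independently (available under partition) ...
a, b = Replica(), Replica()
a.put("x", 1)
b.put("x", 2)
# ... and converge once they can exchange state again.
a.merge(b); b.merge(a)
assert a.get("x") == b.get("x")
```

Both replicas stay available during a partition and converge afterwards, at the price of silently dropping one of the concurrent writes; that is the consistency BASE trades away.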

Consensus: a process of agreeing on one result among a group of participants, used to guarantee data consistency.

Consensus properties (slides 7-9 illustrate each with a before/after diagram of four nodes converging on one value):
- AGREEMENT: all correct nodes decide the same value (nodes proposing ABC, XYZ, SRT and NMO all end up deciding XYZ).
- VALIDITY: the decided value must have been proposed by some node (nodes proposing NMO and XYZ decide NMO).
- TERMINATION: every correct node eventually decides.

Consensus Models and Challenges. Models: synchronous and asynchronous, timing control, with or without a leader. Challenges: crash failures and Byzantine failures.

FLP Impossibility (Fischer, Lynch and Paterson): no completely asynchronous consensus protocol can tolerate even a single unannounced process death. This is not saying that consensus can never be reached, merely that no algorithm can always reach consensus in bounded time. It inspired the discussion of:
- Safety: something bad will never happen.
- Liveness: something good will eventually happen.

Consensus - 2PC. Two-Phase Commit: agree on a decision so that a transaction is atomic. The coordinator ensures safety by rolling back the transactional update on failure; a failed coordinator blocks the consensus, so heuristic actions need to be designed. At 3N-1 messages and 3 forced log writes, it is probably the most efficient consensus protocol.
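A toy in-memory sketch of the two phases (the Participant class and two_phase_commit function are illustrative; a real implementation forces each vote and decision to stable storage and handles timeouts):

```python
class Participant:
    def prepare(self, txn):
        # Phase 1: vote YES only if the update can be made durable locally.
        self.staged = txn
        return True  # or False to veto

    def commit(self):    # Phase 2a
        print("applied", self.staged)

    def rollback(self):  # Phase 2b
        self.staged = None

def two_phase_commit(coordinator_log, participants, txn):
    # Phase 1: collect votes; a single NO (or timeout) aborts everyone.
    votes = [p.prepare(txn) for p in participants]
    decision = "commit" if all(votes) else "abort"
    coordinator_log.append(decision)  # decision must be durable before phase 2
    # Phase 2: broadcast the decision.
    for p in participants:
        p.commit() if decision == "commit" else p.rollback()
    return decision

log = []
print(two_phase_commit(log, [Participant(), Participant()], {"x": 1}))
```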

Consensus - Paxos. Paxos (the Paxon parliament): agree on a value or a decision. Assumes the customary asynchronous, non-Byzantine model, with three roles (Proposer, Acceptor, Learner) and two phases (Prepare, Accept). Tolerates F non-malicious failures among 2F+1 acceptors. Non-blocking via leader election. Multi-Paxos chains instances together to decide a series of values.
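A single-decree sketch of the two phases, with in-process calls standing in for messages (class and method names are illustrative; retries, leader election and Multi-Paxos are omitted):

```python
class Acceptor:
    def __init__(self):
        self.promised = -1          # highest ballot promised
        self.accepted = (-1, None)  # (ballot, value) last accepted

    def prepare(self, ballot):      # Phase 1b
        if ballot > self.promised:
            self.promised = ballot
            return ("promise", self.accepted)
        return ("nack", None)

    def accept(self, ballot, value):  # Phase 2b
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return "accepted"
        return "nack"

def propose(acceptors, ballot, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send prepare(ballot) to all acceptors.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [acc for kind, acc in promises if kind == "promise"]
    if len(granted) < majority:
        return None
    # If any acceptor already accepted a value, adopt the highest-ballot one.
    prior_ballot, prior_value = max(granted, key=lambda t: t[0])
    if prior_ballot >= 0:
        value = prior_value
    # Phase 2a: send accept(ballot, value); chosen once a majority accepts.
    acks = sum(a.accept(ballot, value) == "accepted" for a in acceptors)
    return value if acks >= majority else None

print(propose([Acceptor() for _ in range(5)], ballot=1, value="XYZ"))
```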

Consensus - Raft. Equivalent to Paxos in performance, but better in understandability and completeness.
- Leader election: heartbeats and randomized timeouts.
- Log replication with strong leadership: the leader receives commands from clients, appends them to its log, and replicates its log to the followers (sketched below).
- Safety: only the leader can decide to COMMIT.
- Roles: Leader, Follower, Candidate.
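A sketch of the log-consistency check a follower applies to the leader's AppendEntries RPC (field names follow the Raft paper; the RPC transport and commit-index handling are omitted, and index -1 denotes the empty log prefix):

```python
def append_entries(follower_log, current_term, term,
                   prev_log_index, prev_log_term, entries):
    """Follower-side handler; each log entry is a (term, command) pair.
    Returns (term, success) as in the Raft paper."""
    # Reject leaders from stale terms.
    if term < current_term:
        return current_term, False
    # Consistency check: our log must contain an entry at prev_log_index
    # whose term matches prev_log_term.
    if prev_log_index >= 0:
        if prev_log_index >= len(follower_log) or \
           follower_log[prev_log_index][0] != prev_log_term:
            return current_term, False
    # Replace the suffix with the leader's entries (simplified: real Raft
    # truncates only on an actual term conflict).
    del follower_log[prev_log_index + 1:]
    follower_log.extend(entries)
    return current_term, True

log = [(1, "x=1")]
print(append_entries(log, 1, 1, 0, 1, [(1, "y=2")]), log)
```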

"One lie is enough to question all truths." - Anonymous

Byzantine Empire at its height, AD 555 (map slide).

Consensus - PBFT. A Byzantine fault can be: failure to return a result; return of an incorrect result; return of a deliberately misleading result; or return of differing results to different parts of the system. Practical Byzantine Fault Tolerance (PBFT) was a big step forward for Byzantine fault tolerance. Further research, e.g. Q/U, HQ, Zyzzyva and ABsTRACTs, lowers costs and improves performance; Aardvark and RBFT improve robustness.

PBFT cont. A malicious primary might lie: in the diagram, the client issues get(x), the honest replicas hold x=10, but the primary replies x=-1.

PBFT cont. Malicious primary? Solution: have the participants respond directly to the client. But the primary might lie more: in the diagram, it relays a tampered request so that several replicas answer x=-1 even though the true value is x=10.

PBFT cont. Ok, please sign your name! Use public-key cryptography for signatures: each reply is signed by its replica (x=10 signed by S2, x=-1 signed by S1, x=10 signed by S3), so the primary cannot forge the other replicas' replies, and duplicated messages are rejected.
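A minimal signing sketch using the third-party cryptography package's Ed25519 API (the replica names and the reply/verify helpers are illustrative; any signature scheme would do):

```python
# pip install cryptography
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Each replica holds a signing key; clients know all the public keys.
keys = {name: Ed25519PrivateKey.generate() for name in ("S1", "S2", "S3")}
pubs = {name: k.public_key() for name, k in keys.items()}

def reply(name, payload: bytes):
    return name, payload, keys[name].sign(payload)

def verify(name, payload, sig):
    try:
        pubs[name].verify(sig, payload)
        return True
    except InvalidSignature:
        return False

name, payload, sig = reply("S2", b"x=10")
print(verify(name, payload, sig))  # True: S2 really said x=10
print(verify("S3", payload, sig))  # False: S2's signature can't pass as S3's
```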

PBFT cont. Why 3f + 1?
- At f + 1 replicas we can tolerate f failures and hold on to the data.
- At 2f + 1 we can tolerate f failures and remain available.
- What do we get for 3f + 1? We can tolerate f malicious or arbitrarily nasty failures: in the diagram, the client issues get(x) to S1..S4; S1 and S2 answer 10 while S4 answers garbage, and the client can still accept the right value. The arithmetic is checked below.
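The quorum arithmetic behind 3f + 1 can be checked directly; the function below is only illustrative bookkeeping:

```python
def bft_quorums(f):
    n = 3 * f + 1            # total replicas
    q = 2 * f + 1            # quorum size used by PBFT
    overlap = 2 * q - n      # minimum intersection of any two quorums
    honest = overlap - f     # honest nodes in it, even if all f faulty are too
    return n, q, overlap, honest

for f in (1, 2, 3):
    n, q, overlap, honest = bft_quorums(f)
    # Any two quorums share >= f+1 replicas, hence >= 1 honest replica,
    # which is what keeps decisions from different views consistent.
    print(f"f={f}: n={n}, quorum={q}, overlap>={overlap}, honest>={honest}")
```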

PBFT cont. Three phases of commit for agreement: Pre-Prepare, Prepare and Commit.
- Pre-prepare and Prepare: settle any quibble about the request order.
- Prepare and Commit: atomicity, and thus safety.
- Checkpoints for garbage collection: the log is a must to guarantee safety, and GC is a must to manage the log.
- View changes for a crashed or malicious leader: triggered by replica timeouts; they provide liveness by re-electing a new leader (and thus a new view).
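A sketch of how a single replica could track the prepare and commit certificates for one (view, sequence) slot (the PbftSlot class is illustrative; the 2f and 2f+1 thresholds follow the PBFT paper, while transport, view changes and checkpoints are omitted):

```python
class PbftSlot:
    """Tracks one (view, sequence) slot on one replica; f = max faults."""
    def __init__(self, f):
        self.f = f
        self.pre_prepare = None  # request digest from the primary
        self.prepares = set()    # replica ids with matching PREPAREs
        self.commits = set()     # replica ids with matching COMMITs

    def on_pre_prepare(self, digest):
        self.pre_prepare = digest

    def on_prepare(self, replica_id, digest):
        if digest == self.pre_prepare:
            self.prepares.add(replica_id)

    def on_commit(self, replica_id, digest):
        if digest == self.pre_prepare:
            self.commits.add(replica_id)

    @property
    def prepared(self):
        # pre-prepare + 2f matching prepares from distinct replicas
        return self.pre_prepare is not None and len(self.prepares) >= 2 * self.f

    @property
    def committed_local(self):
        # prepared + 2f+1 matching commits (the replica's own included)
        return self.prepared and len(self.commits) >= 2 * self.f + 1
```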

PBFT cont. Step 1: the client sends a request to the primary. The primary can then validate the message and propose a sequence number for it.

Step 2: the primary sends a pre-prepare message to all backups. This allows the backups to validate the message and receive the sequence number.

Step 3: all functioning backups send a prepare message to all other backups. This allows the replicas to agree on a total ordering.

Step 4: all replicas multicast a commit. The replicas have agreed on an ordering and have acknowledged the receipt of the request.

Step 5: each functioning replica sends a reply directly to the client. This bypasses the case where the primary fails between request and reply.
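Those direct replies give the client a simple acceptance rule: wait until f + 1 distinct replicas return the same value, since at most f of them can lie. A minimal sketch (the accept_reply helper is illustrative):

```python
from collections import Counter

def accept_reply(replies, f):
    """replies: list of (replica_id, value) pairs for one request.
    Returns the value once f+1 distinct replicas agree, else None."""
    votes = Counter()
    seen = set()
    for replica_id, value in replies:
        if replica_id in seen:
            continue          # count each replica at most once
        seen.add(replica_id)
        votes[value] += 1
        if votes[value] >= f + 1:
            return value      # at least one honest replica vouches for it
    return None

print(accept_reply([("S1", 10), ("S4", -1), ("S2", 10)], f=1))  # -> 10
```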

PBFT cont. Summary of PBFT
- MODEL: safe under asynchrony, but needs periods of synchrony to proceed (relies upon message timeouts).
- PROS: uses a leader, and is fast while the network is synchronous and the leader is correct. A complete system; sequential writes.
- CONS: the leader must be re-elected on difficulties. A Byzantine leader can forward messages right at the edge of the timeout window and cause massive slowdowns, and an adversary can attack the network to make a correct leader be seen as faulty, causing endless leader elections.

Sieve. PBFT is order-then-execute: atomic broadcast forms the bottleneck, and it requires deterministic output (the same operations must generate the same output). Sieve is execute-then-order, with a designated leader:
- Operation requests are sent to the leader as INVOKE.
- The leader orders the requests and asks the peers to EXECUTE the operations in order (batching multiple operations, like Eve).
- Each peer executes the operations speculatively and sends the output back to the leader as APPROVE.
- Commit if there are at least f+1 identical outputs among at least 2f+1 APPROVEs; otherwise abort. The decision is broadcast as an ORDER (see the sketch after this list).
- start-epoch/new-sieve-config messages select a new Sieve leader.
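A sketch of that commit rule (sieve_decide and the digest values are illustrative; the real protocol compares output digests at the leader and handles epoch changes):

```python
from collections import Counter

def sieve_decide(approvals, f):
    """approvals: list of (peer_id, output_digest) APPROVE messages.
    Commit when >= f+1 identical outputs among >= 2f+1 approvals."""
    if len({peer for peer, _ in approvals}) < 2 * f + 1:
        return "wait"  # not enough distinct peers have executed yet
    digest, count = Counter(d for _, d in approvals).most_common(1)[0]
    if count >= f + 1:
        return ("commit", digest)   # speculative outputs agree; keep them
    return "abort"                  # divergent outputs: roll back

print(sieve_decide([("p1", "h1"), ("p2", "h1"), ("p3", "h2")], f=1))
```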

Consensus - RPCA. Ripple Protocol Consensus Algorithm (RPCA). Design goals: correctness, agreement and utility. Consensus happens at ledger-close:
- Each server forms a transaction candidate set.
- It amalgamates the candidate sets from its UNL (Unique Node List) and votes on the veracity of all transactions.
- Transactions falling below a minimum percentage of yes-votes are rejected (avoiding impact from slow nodes); see the sketch below.
- The final round of consensus requires 80% agreement among the UNL.
- Agreed transactions are applied to the ledger, and the ledger is closed as the new last-closed ledger (representing the state of the network).
Tolerates f <= (n-1)/5 faulty nodes. The UNL is carefully selected to avoid forks and to guarantee correctness.
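A sketch of candidate filtering across voting rounds with a rising threshold, ending at the 80% final round from the slide (the intermediate thresholds and the rpca_rounds helper are assumptions for illustration):

```python
def rpca_rounds(candidate_votes, unl_size, thresholds=(0.5, 0.6, 0.7, 0.8)):
    """candidate_votes: {tx: number of UNL members voting yes}.
    Keep a transaction only if it clears every round's threshold;
    the final round requires 80% agreement among the UNL."""
    surviving = set(candidate_votes)
    for threshold in thresholds:
        surviving = {tx for tx in surviving
                     if candidate_votes[tx] / unl_size >= threshold}
    return surviving  # applied to the ledger at ledger-close

votes = {"tx_a": 17, "tx_b": 11}  # yes-votes out of a 20-node UNL
print(rpca_rounds(votes, unl_size=20))  # {'tx_a'}: 85% clears the 80% bar
```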

Public Blockchains require Openness.

Consensus - POW. Proof of Work in Bitcoin.
- Problem: double-spending an electronic coin in a p2p network.
- Implicit consensus for block generation: some miner, effectively at random, spends huge computing resources to mine the next block by seeking a nonce that solves a hash puzzle (sketched below); the block is then validated and accepted by the others.
- The PoW difficulty is adjusted over time to control the block rate.
- The majority decision is represented by the longest chain. Essentially one-CPU-one-vote, so an attacker needs 51% of the CPU power.
- Sacrifices safety to achieve openness and liveness: eventual consistency, about 10 minutes to generate a block and hours to confirm, and forks make things worse.
- Ethereum customised PoW to consume storage as the work instead.
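A minimal sketch of the hash puzzle (real Bitcoin double-hashes an 80-byte block header with SHA-256 and compares against a target encoded in the difficulty; the tiny difficulty here keeps the demo instant):

```python
import hashlib
from itertools import count

def mine(block_header: bytes, difficulty_bits: int) -> int:
    """Find a nonce so that SHA-256(header || nonce) starts with
    `difficulty_bits` zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        digest = hashlib.sha256(block_header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce  # proof of work: costly to find, cheap to verify

nonce = mine(b"prev_hash|merkle_root|timestamp", difficulty_bits=16)
print(nonce)
```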

Steps of the Bitcoin network:
1. New transactions are broadcast to all nodes.
2. Each node collects new transactions into a block.
3. Each node works on finding a difficult proof-of-work for its block.
4. When a node finds a proof-of-work, it broadcasts the block to all nodes.
5. Nodes accept the block only if all transactions in it are valid and not already spent.
6. Nodes express their acceptance of the block by working on creating the next block in the chain, using the hash of the accepted block as the previous hash.

Consensus - POS and others. Thoughts about PoW: do we really need to waste so much computing power? Is there a better way to make the task expensive for an individual miner?
- Proof of Stake (POS): an alternative to PoW proposed by the bitcointalk user QuantumMechanic. The probability of mining a block depends on the amount of Bitcoin the miner holds.
- Proof of Burn (POB): an alternative to PoW and POS. Miners show proof that they burned some coins, that is, sent them to a verifiably unspendable address.
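A minimal sketch of stake-weighted proposer selection, the core of "probability depends on the amount held" (names are illustrative; real PoS systems derive their randomness from the chain itself rather than a local RNG):

```python
import random

def pick_proposer(stakes: dict, rng: random.Random):
    """Choose the next block proposer with probability proportional to stake."""
    holders = list(stakes)
    weights = [stakes[h] for h in holders]
    return rng.choices(holders, weights=weights, k=1)[0]

stakes = {"alice": 60, "bob": 30, "carol": 10}
rng = random.Random(42)  # stand-in for chain-derived randomness
wins = {h: 0 for h in stakes}
for _ in range(10_000):
    wins[pick_proposer(stakes, rng)] += 1
print(wins)  # roughly 6000 / 3000 / 1000, tracking the stake ratios
```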

A Byzantine summary

| Protocol | Safety | Liveness | Openness | Fault Tolerance | Throughput | Consumption |
|----------|--------|----------|----------|-----------------|------------|-------------|
| 2PC   | Good | Weak | No   | participant crash       | Good | Low |
| Paxos | Good | Ok   | Weak | f of 2f+1, crash        | Good | Medium |
| Raft  | Good | Ok   | Weak | f of 2f+1, crash        | Good | Medium |
| PBFT  | Good | Ok   | Weak | f of 3f+1, Byzantine    | Ok   | High (bandwidth) |
| Sieve | Good | Ok   | Weak | f of 3f+1, Byzantine    | Good | Better than PBFT (via Eve) |
| RPCA  | Ok   | Ok   | Weak | f <= (n-1)/5, Byzantine | Good | High (bandwidth) |
| POW   | Weak | Good | Good | 49%                     | Weak | High (CPU) |
| POS   | Weak | Good | Good | 49%                     | Weak | Low |

"Understand the requirement, decide the tradeoff, plug in consensus." BC's suggestion, July 2016.