Enhancing Throughput of Partially Replicated State Machines via Multi-Partition Operation Scheduling


Transcription

1 Enhancing Throughput of Partially Replicated State Machines via Multi-Partition Operation Scheduling. NCA 2017. Zhongmiao Li, Peter Van Roy and Paolo Romano.

4 Background. Online services strive for 24/7 availability. Replication is crucial to ensure availability. State-machine replication (SMR) is a key technique for implementing fault-tolerant services.

6 Background: State-machine replication. Applications are abstracted as deterministic state machines. All replicas store the application state. Replicas agree on the operation order (e.g. using Paxos), then execute. Deterministic operations => equivalent final state at all replicas. [Figure: operations OP1, OP2, OP3 go through consensus and are applied in the same order at every replica, each holding state A, B, C.]
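
A minimal sketch of this idea (illustrative Python, not from the paper): as long as every replica replays the same agreed-upon log of deterministic operations, all replicas converge to the same state.

```python
# Minimal SMR sketch (illustrative names, not the paper's code): every replica
# that replays the same agreed-upon log of deterministic operations ends up
# in the same state.

def apply_log(initial_state, agreed_log):
    """Replay a totally ordered log of deterministic operations."""
    state = dict(initial_state)
    for op in agreed_log:      # order fixed by consensus (e.g. Paxos)
        op(state)              # each op is a deterministic state update
    return state

# Three operations, ordered by consensus as OP1, OP2, OP3.
OP1 = lambda s: s.update(A=10)
OP2 = lambda s: s.update(B=20)
OP3 = lambda s: s.update(C=30)
log = [OP1, OP2, OP3]

# Any replica replaying the same log reaches the same final state.
replica1 = apply_log({"A": 0, "B": 0, "C": 0}, log)
replica2 = apply_log({"A": 0, "B": 0, "C": 0}, log)
assert replica1 == replica2 == {"A": 10, "B": 20, "C": 30}
```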

14 Background: Partially-replicated state machines (i). Classical SMR does not scale: replicas store the full state & execute all update ops => throughput is limited by a single replica's capacity & speed! Recent work proposes partially replicating state machines to enhance scalability: High performance state-machine replication (DSN'11); Calvin: fast distributed transactions for partitioned database systems (SIGMOD'12); Scalable state-machine replication (DSN'14).

15 Background: Partially-replicated state machines (ii). Each replica splits its state into multiple partitions, and each partition is replicated within its own replication group (groups A, B and C in the figure). Ops involving a single partition (SPOs) are executed only by that partition. Ops involving multiple partitions (MPOs) are coordinated and then executed by the involved partitions. But... can we scale linearly by adding more partitions?

21 Problems: Coordinating MPOs (i). Partitions have to agree on the order of MPOs. [Example from the figure: OP1 sets A=10, B=10 and OP2 sets A=5, B=5; if partition A applies OP1 before OP2 while partition B applies OP2 before OP1, the final state A=5, B=10 corresponds to no serial order.] Coordinating MPOs is slow: replication plus multiple rounds of inter-group communication. In existing systems, the coordination of MPOs lies on the critical path of execution! Partitions sit idle while coordinating MPOs => throughput is reduced.
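
A tiny sketch of the divergence problem (hypothetical Python, not from the paper): applying the same two multi-partition writes in different orders at different partitions yields a state no serial execution could produce.

```python
# OP1 writes A=10, B=10; OP2 writes A=5, B=5.  If partition A applies the ops
# as [OP1, OP2] while partition B applies them as [OP2, OP1], the combined
# final state (A=5, B=10) matches neither serial order, so partitions must
# agree on the order of MPOs.

OP1 = {"A": 10, "B": 10}
OP2 = {"A": 5, "B": 5}

def final_value(order, key):
    """Value of `key` after applying the writes in `order`."""
    value = None
    for op in order:
        value = op[key]
    return value

a = final_value([OP1, OP2], "A")   # partition A: OP1 then OP2 -> A = 5
b = final_value([OP2, OP1], "B")   # partition B: OP2 then OP1 -> B = 10
print(a, b)                        # 5 10: consistent with no serial order
```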

27 Problems: Coordinating MPOs (ii). Calvin requires all-to-all synchronization to order ops and progresses in rounds. [Example from the figure: in one round, OP1 (A=10, B=10) and OP2 (A=5, B=5) are exchanged between partitions A and B before either executes, while OP3 (C=100) involves only partition C.] Ordering lies on the critical path of execution and is non-scalable.
Scalable SMR leverages atomic multicast to order ops. It is more scalable than Calvin, but ordering still lies on the critical path of execution, and additional messages are exchanged between partitions to ensure linearizability*. *Omitted due to time constraints; refer to the paper if interested.

37 Solution: Genepi. Remove the coordination of MPOs from the critical path of operation execution by scheduling MPOs to a future round => overlap the ordering of MPOs with the processing of already-ordered ops. Genepi: an efficient execution protocol ensuring linearizability*. Scraper: an ordering building block for Genepi (scalable consensus for partial replication). *Omitted due to time constraints; refer to the paper if interested.

40 Solution: Scraper abstraction (formal specification in the paper).
S-Propose(SPOs, Rs, MPOs, Rm): propose the accumulated ops for each round; Rs is the current round and Rm a future round (Rm is only a lower bound on the final round).
S-Decide(OPs, R): triggered when the operations for round R have been decided; R can only be decided once rounds 1, 2, ..., R-1 have all been decided.
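
A rough single-process stand-in for this interface (hypothetical Python, not the paper's formal specification): the cross-partition round agreement of the real Scraper is omitted, and S-Decide is modelled as a pull that enforces in-order round delivery.

```python
# Hypothetical single-process stand-in for the Scraper abstraction above;
# the cross-partition round agreement of the real Scraper is omitted, and
# S-Decide is modelled as a pull that enforces in-order round delivery.

class ScraperStub:
    def __init__(self):
        self.pending = {}       # round -> ops accumulated for that round
        self.next_round = 1     # rounds must be decided strictly in order

    def s_propose(self, spos, rs, mpos, rm):
        # SPOs target the current round rs; MPOs target a future round rm,
        # which is only a lower bound on the round they finally land in.
        self.pending.setdefault(rs, []).extend(spos)
        self.pending.setdefault(rm, []).extend(mpos)

    def s_decide(self, r):
        # Round r can only be decided once rounds 1, 2, ..., r-1 are decided.
        assert r == self.next_round, "rounds are decided in order"
        self.next_round += 1
        return self.pending.pop(r, [])
```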

41 Solution: Genepi execution (example: partitions A and B interact through Scraper).
Round 1: A calls Propose(SPO1, 1, MPO1, 2) and B calls Propose(SPO2, 1, MPO2, 2); Scraper then triggers Decide(SPO1, 1) at A and Decide(SPO2, 1) at B.
Round 2: MPO1 has been scheduled for round 2. A calls Propose(SPO3, 2, MPO3, 3) and B calls Propose(SPO4, 2, MPO4, 3); Scraper then triggers Decide([SPO3, MPO1], 2) at A and Decide([SPO4, MPO1], 2) at B.
Round 3: MPO2, MPO3 and MPO4 have been scheduled for round 3. After the round's proposals, Scraper triggers Decide([.., MPO2, MPO3, MPO4], 3) at both partitions.
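
A hedged sketch of the per-partition round loop shown in this walkthrough (hypothetical Python reusing the ScraperStub above; all names are illustrative): each round, the partition proposes its new SPOs for the current round and its new MPOs for a future round, then executes whatever has been decided for the current round, so MPO coordination overlaps with the execution of already-ordered operations.

```python
# Hedged sketch of the per-partition round loop from the walkthrough above,
# reusing the ScraperStub interface; all names are illustrative.

MPO_LOOKAHEAD = 2   # matches the evaluation setup: MPOs scheduled two rounds later

def partition_loop(scraper, state, batches):
    """batches: an iterable of (spos, mpos) collected by this partition in
    each round; every op is a function that updates the partition state."""
    round_no = 1
    for spos, mpos in batches:
        # Propose: SPOs for the current round, MPOs for a future round, so
        # their cross-partition ordering stays off the critical path.
        scraper.s_propose(spos, round_no, mpos, round_no + MPO_LOOKAHEAD)
        # Execute everything decided for this round: the SPOs proposed now
        # plus any MPOs that earlier rounds scheduled into this round.
        for op in scraper.s_decide(round_no):
            op(state)
        round_no += 1
```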

58 Solution: Scraper design (i). Avoid synchronizing all partitions, for scalability: partitions unilaterally advance rounds. How to ensure they agree on the rounds of ops? Key idea: a two-phase-commit-like protocol for partitions to agree on the round of an operation.

64 Solution: Scraper design (ii). Example: partition A has decided up to round 10, partition B up to round 13.
1. The coordinator sends the request with a min_round (here, OP1 with min_round 12).
2. Each partition proposes max(min_round, its decided round + 1) (here, B proposes round 14).
3. The coordinator decides max(received rounds), i.e. round 14.
4. The partitions finalize the proposal: OP1 is placed in round 14.
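
A minimal sketch of this two-phase round agreement (hypothetical Python helper names, not the paper's implementation): the coordinator proposes a minimum round, each involved partition answers with max(min_round, its decided round + 1), and the coordinator finalizes the maximum of the answers.

```python
# Minimal sketch of the two-phase round agreement in steps 1-4 above
# (hypothetical helper names, not the paper's implementation).

def propose_round(decided_round, min_round):
    # Step 2: a partition proposes the earliest round it can still accept.
    return max(min_round, decided_round + 1)

def agree_on_round(min_round, decided_rounds):
    # Step 1: the coordinator sends the MPO with min_round to all involved
    # partitions (here represented by their last decided rounds).
    proposals = [propose_round(r, min_round) for r in decided_rounds]
    # Step 3: the coordinator decides the maximum of the received proposals.
    # Step 4 (finalizing the proposal at each partition) is not shown.
    return max(proposals)

# Example from the slide: A has decided round 10, B round 13, and the
# coordinator asks for round >= 12; the MPO is placed in round 14.
assert agree_on_round(12, [10, 13]) == 14
```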

69 Solution: Other aspects in the paper. Replication to ensure fault tolerance. A lightweight mechanism to ensure linearizability (delay replying to clients). Choosing round numbers for MPOs: big enough to allow ordering the MPOs, but not too large, to avoid unnecessary latency overhead.

70 Evaluation: Experimental setup. Calvin, S-SMR and Genepi are all implemented on top of Calvin's codebase (in C++). Deployment: deployed in Grid, using up to 40 nodes in the same region; RTT is around 0.4 ms. The replication cost is emulated by injecting a 3 ms delay. The round duration is 5 ms for batching, and MPOs are scheduled two rounds later (2 x 5 ms).

71 Evaluation: Micro-benchmark. Each op reads & updates 10 keys; we increase the number of nodes & the percentage of MPOs. Results: Genepi scales better than Calvin, achieving 83% higher throughput with 40 nodes & 1% MPOs; the latency of MPOs is 7~14 ms higher than that of SPOs.

74 Evaluation: TPC-C. About 10% distributed transactions; includes heavy-weight and/or read-only txns. At 40 nodes, Genepi has a 45% throughput gain.

76 Summary. Genepi's idea of postponing the execution of MPOs allows removing MPO coordination from the critical path of operation execution. Questions?

77 Evaluation: Micro-benchmark. 10 nodes, varying the % of MPOs and the number of partitions accessed by MPOs. Genepi is only worse for workloads with many MPOs that each access many partitions!
