SAMC: Sema+c- Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems

Size: px
Start display at page:

Download "SAMC: Sema+c- Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems"

Transcription

1 SAMC: Sema+c- Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi *, Jeffrey F. Lukman, and Haryadi S. Gunawi * 1

2 Internet Services 2

3 Internet Services Cloud Systems 3

4 Reliability 4

5 Deep Bug Example ZooKeeper (synchronization service) Issue # Nodes A, B, C start (w/ latex txid: 10) 2. B becomes leader 3. B crashes 4. C becomes leader 5. C commits new txid-value pair (11, X) 6. A crashes, before committing the new txid C loses quorum and C crashes 8. A and B are back online after C crashes 9. A becomes leader 10. A's commits new txid-value pair (11, Y) 11. C is back online after A's new tx commit 12. C announce to B (11, X) 13. B replies diff starting with tx Inconsistency: A has (11, Y), C has (11, X) PERMANENT INCONSISTENT REPLICA 5

6 Deep Bug Characteris+cs ZooKeeper (synchronization service) Issue #335. Specific Order 1. Nodes A, B, C start (w/ latex txid: 10) 2. B becomes leader 3. B crashes 4. C becomes leader 5. C commits new txid-value pair (11, X) 6. A crashes, before committing the new txid C loses quorum and C crashes 8. A and B are back online after C crashes 9. A becomes leader 10. A's commits new txid-value pair (11, Y) 11. C is back online after A's new tx commit 12. C announce to B (11, X) 13. B replies diff starting with tx Inconsistency: A has (11, Y), C has (11, X) 1. Out- of- order messages 2. Mul+ple crashes 3. Mul+ple reboots HAPPEN IN ANY ORDER 6

7 Study of Deep Bugs # crashes / # reboots # crashes / # reboots ZooKeeper Cassandra Bug # 6503 Bug # # crashes / # reboots SEVERE Hadoop IMPLICATIONS x- axis is bug number y- axis is number of crashes and reboots INCONSISTENT REPLICAS, DATA LOSS, DOWNTIMES, ETC. Bug # 7

8 How do we catch deep bugs in distributed systems? 8

9 How to Catch Deep Bugs Distributed system model checker Re- ordering all non- determinis+c events Find which specific orderings lead to bugs

10 What s Wrong with Exis+ng Model Checkers? Last 7 years MaceMC [NSDI 07], Modist [NSDI 09], dbug [SSV 10], Demeter [SOSP 13], etc. BUT Too many events Mul+ple crashes and reboots Create more messages No model checker incorporate mul+ple crashes and reboots Cannot find deep bugs! 100 events 10

11 How do we catch deep bugs REALLY FAST? 11

12 Black- Box Approach Exis+ng model checkers are so slow They treat target systems as black boxes A large number of event orderings are generated Black Box Black Box Model Checker ABCD ABDC ACBD ACDB A B C D ADBC... (24 total) 12

13 Seman+c Knowledge How can we make model checker fast? Exploit seman+c knowledge Seman+c- aware model checker (SAMC) SAMC Black Box A B C D 13

14 Black Box vs. SAMC Black Box Model Checker ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA BDAC... A Black Box B C D A B C D Lead to the same state Message Processing Seman+c Unnecessary Re-orderings SAMC with message processing semanjc ABCD ABDC ACBD ACDB ADBC ADCB BACD BADC BCAD BCDA... 14

15 SAMC with Crashes Crash Recovery Seman+c A,B N2 Black Box Model checker ABCDX ABCXD SAMC with crash recovery semanjc ABCDX ABCXD N1 C,D N3 N4 ABXCD AXBCD XABCD ABDCX ABDXC ABXCD AXBCD XABCD ABDCX ABDXC Unnecessary Re-orderings

16 Generic Reduc+on Algorithms Local- Message Independence (LMI) Principle of Semantic Awareness Crash- Message Independence (CMI) Crash Recovery Symmetry (CRS) Reboot Synchroniza+on Symmetry (RSS) 16

17 SAMC Implementa+on and Integra+on SAMC implementa+on 10,000 LOC from scratch Apply SAMC to 3 cloud systems 7 protocols 10 versions Cloud systems Cassandra Hadoop 2.0 ZooKeeper Protocol Gossiper Hinted handoff Read/write Cluster management Specula+ve execu+on Atomic broadcast Leader elec+on 17

18 Result Reproduced 12 old bugs Compare to state- of- the- art techniques Dynamic Par+al Order Reduc+on (DPOR) Random- DPOR Find bugs 2x to 340x faster 49x on average Found 2 new bugs Submit them to developers 18

19 Outline IntuiJon SAMC Local- Message Independence Crash- Message Independence Crash Recovery Symmetry Reboot Synchroniza+on Symmetry Evalua+on 19

20 Dependency vs. Independency A, B S Node State = S A, B S S S B, A S A B B, A Unnecessary A, B = Dependent A, B = Independent INDEPENDENT = NO REORDERING 2X SPEED-UP 20

21 Black Box vs. SAMC Black Box Model Checker ABCD ABDC ACBD ACDB Black Box Seman+c Info SAMC ABCD ABDC BACD BADC ADBC ADCB BACD BADC BCAD A B C D A B C D All dependent Dependent Dependent BCDA... 6X SPEED-UP 4!=24 orderings 21

22 Reduc+on Speed- up S ABDC S ABDC

23 How to Declare Message Independency? Q: Which concurrent messages are independent? A: Use message processing seman+c 23

24 Message Processing Seman+c in Simplified Leader Elec+on Belief = n3 if(vote <= belief)! // do nothing! else! belief = vote;! B = 3 V = 1 B = 3 V = 2 B = 3 V = 4 B = 3 B = 3 B = 4 Vote=1 Vote=2 Vote=4 24

25 Removing Re- ordering via Message Processing Seman+c 1, 2, 3 1, 3, 2 2, 1, 3 Belief=4 B = 4 B = 4 B = 4 V = 1 V = 1 V = 2 Vote=1 Vote=2 Vote=3 B = 4 B = 4 B = 4 if (vote <= belief)! // do nothing! else! belief = vote;! V = 2 B = 4 V = 3 Unnecessary V = 3 V = 1 B = 4 B = 4 V = 2 V = 3 B = 4 B = 4 B = 4 25

26 Formalizing the Intui+on MESSAGE PROCESSING SEMANTIC if (vote <= belief)! // do nothing! else! belief = vote;!! DISCARD PATTERN if (isdiscard(msg, ls)) {! // do nothing;! }! DISCARD PREDICATE boolean discardpredicate(msg, ls) {! if (msg.vote <= ls.belief)! return true;! else! return false;! }! 26

27 Formalizing the Intui+on Belief=4 Vote=1 Vote=2 Vote=3 vote belief discardpredicate 1 4 true 2 4 true 3 4 true m x m y discard(m x ) discard(m y ) Independent 1 2 true true 1 3 true true 2 3 true true 27

28 Outline Intui+on SAMC Local- Message Independence Crash- Message Independence Crash Recovery Symmetry Reboot Synchroniza+on Symmetry Evalua+on 28

29 SAMC Architecture a, b c, d Local state: ls1 interceptor interceptor Local state: ls2 release(c) Protocol Specific Rules Generic Reduc+on Policies Basic Reduc+on Techniques Leader Elec+on LMI CMI CRS RSS Symmetry SAMC Atomic Broadcast Dynamic Par+al Order Reduc+on (DPOR) 29

30 Local- Message Independence SAMC Generic Reduc+on Policies Local- Message Independence (LMI) Crash- Message Independence (CMI) Crash Recovery Symmetry (CRS) Reboot Synchroniza+on Symmetry (RSS) 30

31 Local- Message Independence Discard parern Increment parern if (msg.type == ack) {! node.ackcount++;! }! C=0 ack C=0 ack boolean incrementpredicate(msg, ls) {! if (msg.type == ack)! return true;! else! return false;! }! Constant parern C=1 ack C=2 C=1 ack C=2 31

32 Crash- Message Independence SAMC Generic Reduc+on Policies Local- Message Independence (LMI) Crash- Message Independence (CMI) Crash Recovery Symmetry (CRS) Reboot Synchroniza+on Symmetry (RSS) 32

33 Crash- Message Independence Black Box ABCDX ABCXD ABXCD F A,B L F C,D F void handlecrash() {! if (X == follower && isquorum())! followercount--;! // No new messages! }! AXBCD XABCD ABDCX A,B L C,D boolean localimpact(x, ls) {! if (X == follower && isquorum())! return true;! else! return false;! }! F F F 33

34 Crash- Message Independence F A,B L F C,D F void handlecrash() {! if (X == leader!isquorum())! electleader()! // New messages created! }! S L S S boolean globalimpact(x, ls) {! if (X == leader!isquorum())! return true;! else! return false;! }! 34

35 Crash Recovery Symmetry Reboot Synchroniza+on Symmetry SAMC Generic Reduc+on Policies Local- Message Independence (LMI) Crash- Message Independence (CMI) Crash Recovery Symmetry (CRS) Reboot Synchroniza+on Symmetry (RSS) 35

36 Outline Intui+on SAMC Local- Message Independence Crash- Message Independence Crash Recovery Symmetry Reboot Synchroniza+on Symmetry EvaluaJon 36

37 Evalua+on SAMC implementa+on 10,000 LOC from scratch Apply SAMC to 3 cloud systems 7 protocols 10 versions Cloud systems Cassandra Hadoop 2.0 ZooKeeper Protocol Gossiper Hinted handoff Read/write Cluster management Specula+ve execu+on Atomic broadcast Leader elec+on 37

38 Protocol- Specific Rules (e.g. ZooKeeper Leader Elec+on) Guide SAMC to remove re- orderings 35 LOC on average per protocol 38

39 Catching Old Bugs A table shows number of execu+ons to reach the bugs and speedup Issue# SAMC Black- Box DPOR Random Random DPOR #exe #exe speedup #exe speedup #exe speedup ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper MapReduce MapReduce MapReduce Cassandra Cassandra

40 Catching Old Bugs A table shows number of execu+ons to reach the bugs and speedup Issue# SAMC Black- Box DPOR Random Random DPOR #exe #exe speedup #exe speedup #exe speedup ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper ZooKeeper MapReduce MapReduce MapReduce Cassandra Cassandra

41 Reduc+on Ra+o ZooKeeper leader elec+on protocol Run black- box DPOR Ater each execu+on, find that this execu+on is executed by SAMC or not Count DPOR s execu+ons that are executed by SAMC too #Crash #Reboot ReducJon RaJo ALL LMI CMI CRS RSS X 18X 5X 4X 3X 63X 35X 6X 5X 5X 103X 37X 9X 9X 14X 41

42 Conclusion Deep bugs live in the cloud Model checker needs to incorporate complex failure to reach deep bugs State space explosion Seman+c- aware model checking LMI, CMI, CRS, RSS Bring future research ques+ons What other seman+c knowledge is useful? How to extract them from the code automa+cally? 42

43 Thank You Ques+ons? hrp://ucare.cs.uchicago.edu 43

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems Tanakorn Leesatapornwongsa and Mingzhe Hao, University of Chicago; Pallavi Joshi, NEC Labs America; Jeffrey F. Lukman,

More information

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems

SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems SAMC: Semantic-Aware Model Checking for Fast Discovery of Deep Bugs in Cloud Systems Tanakorn Leesatapornwongsa, Mingzhe Hao, Pallavi Joshi, Jeffrey F. Lukman and Haryadi S. Gunawi University of Chicago

More information

Failures in Distributed Systems

Failures in Distributed Systems CPSC 426/526 Failures in Distributed Systems Ennan Zhai Computer Science Department Yale University Recall: Lec-14 In lec-14, we learned: - Difference between privacy and anonymity - Anonymous communications

More information

THE UNIVERSITY OF CHICAGO UNEARTHING CONCURRENCY AND SCALABILITY BUGS IN CLOUD-SCALE DISTRIBUTED SYSTEMS A DISSERTATION SUBMITTED TO

THE UNIVERSITY OF CHICAGO UNEARTHING CONCURRENCY AND SCALABILITY BUGS IN CLOUD-SCALE DISTRIBUTED SYSTEMS A DISSERTATION SUBMITTED TO THE UNIVERSITY OF CHICAGO UNEARTHING CONCURRENCY AND SCALABILITY BUGS IN CLOUD-SCALE DISTRIBUTED SYSTEMS A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR

More information

Definition: A complete graph is a graph with N vertices and an edge between every two vertices.

Definition: A complete graph is a graph with N vertices and an edge between every two vertices. Complete Graphs Let N be a positive integer. Definition: A complete graph is a graph with N vertices and an edge between every two vertices. There are no loops. Every two vertices share exactly one edge.

More information

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems Transactions CS 475, Spring 2018 Concurrent & Distributed Systems Review: Transactions boolean transfermoney(person from, Person to, float amount){ if(from.balance >= amount) { from.balance = from.balance

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

There Is More Consensus in Egalitarian Parliaments

There Is More Consensus in Egalitarian Parliaments There Is More Consensus in Egalitarian Parliaments Iulian Moraru, David Andersen, Michael Kaminsky Carnegie Mellon University Intel Labs Fault tolerance Redundancy State Machine Replication 3 State Machine

More information

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing

Coordinating distributed systems part II. Marko Vukolić Distributed Systems and Cloud Computing Coordinating distributed systems part II Marko Vukolić Distributed Systems and Cloud Computing Last Time Coordinating distributed systems part I Zookeeper At the heart of Zookeeper is the ZAB atomic broadcast

More information

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:

More information

CS 425 / ECE 428 Distributed Systems Fall 2017

CS 425 / ECE 428 Distributed Systems Fall 2017 CS 425 / ECE 428 Distributed Systems Fall 2017 Indranil Gupta (Indy) Nov 7, 2017 Lecture 21: Replication Control All slides IG Server-side Focus Concurrency Control = how to coordinate multiple concurrent

More information

Replication in Distributed Systems

Replication in Distributed Systems Replication in Distributed Systems Replication Basics Multiple copies of data kept in different nodes A set of replicas holding copies of a data Nodes can be physically very close or distributed all over

More information

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication

CSE 444: Database Internals. Section 9: 2-Phase Commit and Replication CSE 444: Database Internals Section 9: 2-Phase Commit and Replication 1 Today 2-Phase Commit Replication 2 Two-Phase Commit Protocol (2PC) One coordinator and many subordinates Phase 1: Prepare Phase 2:

More information

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin

Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitionin Parallel Data Types of Parallelism Replication (Multiple copies of the same data) Better throughput for read-only computations Data safety Partitioning (Different data at different sites More space Better

More information

Intra-cluster Replication for Apache Kafka. Jun Rao

Intra-cluster Replication for Apache Kafka. Jun Rao Intra-cluster Replication for Apache Kafka Jun Rao About myself Engineer at LinkedIn since 2010 Worked on Apache Kafka and Cassandra Database researcher at IBM Outline Overview of Kafka Kafka architecture

More information

UPCRC. Illiac. Gigascale System Research Center. Petascale computing. Cloud Computing Testbed (CCT) 2

UPCRC. Illiac. Gigascale System Research Center. Petascale computing. Cloud Computing Testbed (CCT) 2 Illiac UPCRC Petascale computing Gigascale System Research Center Cloud Computing Testbed (CCT) 2 www.parallel.illinois.edu Mul2 Core: All Computers Are Now Parallel We con'nue to have more transistors

More information

Intuitive distributed algorithms. with F#

Intuitive distributed algorithms. with F# Intuitive distributed algorithms with F# Natallia Dzenisenka Alena Hall @nata_dzen @lenadroid A tour of a variety of intuitivedistributed algorithms used in practical distributed systems. and how to prototype

More information

ZooKeeper Atomic Broadcast (for Project 2) 10/27/2016

ZooKeeper Atomic Broadcast (for Project 2) 10/27/2016 ZooKeeper Atomic Broadcast (for Project 2) 10/27/2016 Apache Hadoop 2002: Internet Archive search director Doug CuFng and UW grad student Mike Carafella set out to build a bemer open- source search engine.

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Dynamic Reconfiguration of Primary/Backup Clusters

Dynamic Reconfiguration of Primary/Backup Clusters Dynamic Reconfiguration of Primary/Backup Clusters (Apache ZooKeeper) Alex Shraer Yahoo! Research In collaboration with: Benjamin Reed Dahlia Malkhi Flavio Junqueira Yahoo! Research Microsoft Research

More information

HARDFS: Hardening HDFS with Selective and Lightweight Versioning

HARDFS: Hardening HDFS with Selective and Lightweight Versioning HARDFS: Hardening HDFS with Selective and Lightweight Versioning Thanh Do, Tyler Harter, Yingchao Liu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau Haryadi S. Gunawi 1 Cloud Reliability q Cloud

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor

More information

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences

Performance and Forgiveness. June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Performance and Forgiveness June 23, 2008 Margo Seltzer Harvard University School of Engineering and Applied Sciences Margo Seltzer Architect Outline A consistency primer Techniques and costs of consistency

More information

Distributed Systems Fault Tolerance

Distributed Systems Fault Tolerance Distributed Systems Fault Tolerance [] Fault Tolerance. Basic concepts - terminology. Process resilience groups and failure masking 3. Reliable communication reliable client-server communication reliable

More information

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components

Fault Tolerance. it continues to perform its function in the event of a failure example: a system with redundant components Fault Tolerance To avoid disruption due to failure and to improve availability, systems are designed to be fault-tolerant Two broad categories of fault-tolerant systems are: systems that mask failure it

More information

Announcements. Persistence: Crash Consistency

Announcements. Persistence: Crash Consistency Announcements P4 graded: In Learn@UW by end of day P5: Available - File systems Can work on both parts with project partner Fill out form BEFORE tomorrow (WED) morning for match Watch videos; discussion

More information

CS Amazon Dynamo

CS Amazon Dynamo CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

FS Consistency & Journaling

FS Consistency & Journaling FS Consistency & Journaling Nima Honarmand (Based on slides by Prof. Andrea Arpaci-Dusseau) Why Is Consistency Challenging? File system may perform several disk writes to serve a single request Caching

More information

Fault Tolerance. Distributed Systems. September 2002

Fault Tolerance. Distributed Systems. September 2002 Fault Tolerance Distributed Systems September 2002 Basics A component provides services to clients. To provide services, the component may require the services from other components a component may depend

More information

Distributed System. Gang Wu. Spring,2018

Distributed System. Gang Wu. Spring,2018 Distributed System Gang Wu Spring,2018 Lecture4:Failure& Fault-tolerant Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

Distributed Computation Models

Distributed Computation Models Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case

More information

CONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY

CONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY CONCURRENCY CONTROL, TRANSACTIONS, LOCKING, AND RECOVERY George Porter May 18, 2018 ATTRIBUTION These slides are released under an Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0) Creative

More information

1/10/16. RPC and Clocks. Tom Anderson. Last Time. Synchroniza>on RPC. Lab 1 RPC

1/10/16. RPC and Clocks. Tom Anderson. Last Time. Synchroniza>on RPC. Lab 1 RPC RPC and Clocks Tom Anderson Go Synchroniza>on RPC Lab 1 RPC Last Time 1 Topics MapReduce Fault tolerance Discussion RPC At least once At most once Exactly once Lamport Clocks Mo>va>on MapReduce Fault Tolerance

More information

Fault Tolerance. Basic Concepts

Fault Tolerance. Basic Concepts COP 6611 Advanced Operating System Fault Tolerance Chi Zhang czhang@cs.fiu.edu Dependability Includes Availability Run time / total time Basic Concepts Reliability The length of uninterrupted run time

More information

Eventual Consistency Today: Limitations, Extensions and Beyond

Eventual Consistency Today: Limitations, Extensions and Beyond Eventual Consistency Today: Limitations, Extensions and Beyond Peter Bailis and Ali Ghodsi, UC Berkeley - Nomchin Banga Outline Eventual Consistency: History and Concepts How eventual is eventual consistency?

More information

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research

Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * Technion Yahoo Research Transac'onal Libraries Alexander Spiegelman *, Guy Golan-Gueta, and Idit Keidar * * Technion Yahoo Research 1 Mul'-Threading is Everywhere 2 Agenda Mo@va@on Concurrent Data Structure Libraries (CDSLs)

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure

More information

Distributed Consensus Protocols

Distributed Consensus Protocols Distributed Consensus Protocols ABSTRACT In this paper, I compare Paxos, the most popular and influential of distributed consensus protocols, and Raft, a fairly new protocol that is considered to be a

More information

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Distributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented

More information

ZooKeeper Dynamic Reconfiguration

ZooKeeper Dynamic Reconfiguration by Table of contents 1 Overview... 2 2 Changes to Configuration Format...2 2.1 Specifying the client port... 2 2.2 The standaloneenabled flag...3 2.3 Dynamic configuration file...3 2.4 Backward compatibility...

More information

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014 Cassandra @ Spotify Scaling storage to million of users world wide! Jimmy Mårdell October 14, 2014 2 About me Jimmy Mårdell Tech Product Owner in the Cassandra team 4 years at Spotify

More information

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions

Distributed Systems. Day 13: Distributed Transaction. To Be or Not to Be Distributed.. Transactions Distributed Systems Day 13: Distributed Transaction To Be or Not to Be Distributed.. Transactions Summary Background on Transactions ACID Semantics Distribute Transactions Terminology: Transaction manager,,

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 10. Consensus: Paxos. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 10. Consensus: Paxos Paul Krzyzanowski Rutgers University Fall 2017 1 Consensus Goal Allow a group of processes to agree on a result All processes must agree on the same value The value

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline 10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Copyright 2003 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4. Other Approaches

More information

Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all

Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all failures or predictable: exhibit a well defined failure behavior

More information

6.830 Lecture Spark 11/15/2017

6.830 Lecture Spark 11/15/2017 6.830 Lecture 19 -- Spark 11/15/2017 Recap / finish dynamo Sloppy Quorum (healthy N) Dynamo authors don't think quorums are sufficient, for 2 reasons: - Decreased durability (want to write all data at

More information

It also performs many parallelization operations like, data loading and query processing.

It also performs many parallelization operations like, data loading and query processing. Introduction to Parallel Databases Companies need to handle huge amount of data with high data transfer rate. The client server and centralized system is not much efficient. The need to improve the efficiency

More information

Today: Fault Tolerance. Failure Masking by Redundancy

Today: Fault Tolerance. Failure Masking by Redundancy Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing

More information

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein 10. Replication CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety Copyright 2012 Philip A. Bernstein 1 Outline 1. Introduction 2. Primary-Copy Replication 3. Multi-Master Replication 4.

More information

Synchroniza+on II COMS W4118

Synchroniza+on II COMS W4118 Synchroniza+on II COMS W4118 References: Opera+ng Systems Concepts (9e), Linux Kernel Development, previous W4118s Copyright no2ce: care has been taken to use only those web images deemed by the instructor

More information

DD2451 Parallel and Distributed Computing --- FDD3008 Distributed Algorithms

DD2451 Parallel and Distributed Computing --- FDD3008 Distributed Algorithms DD2451 Parallel and Distributed Computing --- FDD3008 Distributed Algorithms Lecture 8 Leader Election Mads Dam Autumn/Winter 2011 Previously... Consensus for message passing concurrency Crash failures,

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures

GFS Overview. Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures GFS Overview Design goals/priorities Design for big-data workloads Huge files, mostly appends, concurrency, huge bandwidth Design for failures Interface: non-posix New op: record appends (atomicity matters,

More information

Consistency in Distributed Systems

Consistency in Distributed Systems Consistency in Distributed Systems Recall the fundamental DS properties DS may be large in scale and widely distributed 1. concurrent execution of components 2. independent failure modes 3. transmission

More information

How Routing Algorithms Work

How Routing Algorithms Work How Routing Algorithms Work A router is used to manage network traffic and find the best route for sending packets. But have you ever thought about how routers do this? Routers need to have some information

More information

Refinement and Optimization of Streaming Architectures. Don Batory and Taylor Riché Department of Computer Science University of Texas at Austin

Refinement and Optimization of Streaming Architectures. Don Batory and Taylor Riché Department of Computer Science University of Texas at Austin Refinement and Optimization of Streaming Architectures Don Batory and Taylor Riché Department of Computer Science University of Texas at Austin 1 Introduction Model Driven Engineering (MDE) is paradigm

More information

BIG DATA AND CONSISTENCY. Amy Babay

BIG DATA AND CONSISTENCY. Amy Babay BIG DATA AND CONSISTENCY Amy Babay Outline Big Data What is it? How is it used? What problems need to be solved? Replication What are the options? Can we use this to solve Big Data s problems? Putting

More information

Distributed Transaction Management

Distributed Transaction Management Distributed Transaction Management Material from: Principles of Distributed Database Systems Özsu, M. Tamer, Valduriez, Patrick, 3rd ed. 2011 + Presented by C. Roncancio Distributed DBMS M. T. Özsu & P.

More information

CS October 2017

CS October 2017 Atomic Transactions Transaction An operation composed of a number of discrete steps. Distributed Systems 11. Distributed Commit Protocols All the steps must be completed for the transaction to be committed.

More information

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol Global atomicity In distributed systems a set of processes may be taking part in executing a task Their actions may have to be atomic with respect to processes outside of the set example: in a distributed

More information

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #17: Transac0ons 1: Intro. to ACID

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #17: Transac0ons 1: Intro. to ACID CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #17: Transac0ons 1: Intro. to ACID Why Transac0ons? Database systems are normally being accessed by many users or processes

More information

Towards Automatically Checking Thousands of Failures with Micro-Specifications

Towards Automatically Checking Thousands of Failures with Micro-Specifications Towards Automatically Checking Thousands of Failures with Micro-Specifications Haryadi S. Gunawi, Thanh Do, Pallavi Joshi, Joseph M. Hellerstein, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Koushik

More information

Paxos Replicated State Machines as the Basis of a High- Performance Data Store

Paxos Replicated State Machines as the Basis of a High- Performance Data Store Paxos Replicated State Machines as the Basis of a High- Performance Data Store William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters and Peng Li March 30, 2011 Q: How to build a

More information

Topics in Reliable Distributed Systems

Topics in Reliable Distributed Systems Topics in Reliable Distributed Systems 049017 1 T R A N S A C T I O N S Y S T E M S What is A Database? Organized collection of data typically persistent organization models: relational, object-based,

More information

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #6: Transac/ons 1: Intro. to ACID

CS 5614: (Big) Data Management Systems. B. Aditya Prakash Lecture #6: Transac/ons 1: Intro. to ACID CS 5614: (Big) Data Management Systems B. Aditya Prakash Lecture #6: Transac/ons 1: Intro. to ACID Project dates Proposal due: Feb 23 Milestone due: Mar 28 Final report/posters etc: May 2 (last class)

More information

Linking Number of a Linear Embedding with Two Components

Linking Number of a Linear Embedding with Two Components Linking Number of a Linear Embedding with Two Components Julie Wasiuk July 22, 2011 Abstract The linking number, the crossing number, and stick number of a linear embedding with 2V-vertices is defined.

More information

Distributed Databases. CS347 Lecture 16 June 6, 2001

Distributed Databases. CS347 Lecture 16 June 6, 2001 Distributed Databases CS347 Lecture 16 June 6, 2001 1 Reliability Topics for the day Three-phase commit (3PC) Majority 3PC Network partitions Committing with partitions Concurrency control with partitions

More information

Haryadi Gunawi and Andrew Chien

Haryadi Gunawi and Andrew Chien Haryadi Gunawi and Andrew Chien in collaboration with Gokul Soundararajan and Deepak Kenchammana (NetApp) Rob Ross and Dries Kimpe (Argonne National Labs) 2 q Complete fail-stop q Fail partial Rich literature

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) ) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons) Transactions - Definition A transaction is a sequence of data operations with the following properties: * A Atomic All

More information

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems ZooKeeper & Curator CS 475, Spring 2018 Concurrent & Distributed Systems Review: Agreement In distributed systems, we have multiple nodes that need to all agree that some object has some state Examples:

More information

Verifying Concurrent Programs

Verifying Concurrent Programs Verifying Concurrent Programs Daniel Kroening 8 May 1 June 01 Outline Shared-Variable Concurrency Predicate Abstraction for Concurrent Programs Boolean Programs with Bounded Replication Boolean Programs

More information

6.824 Distributed System Engineering: Spring Quiz I Solutions

6.824 Distributed System Engineering: Spring Quiz I Solutions Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.824 Distributed System Engineering: Spring 2009 Quiz I Solutions All problems are open-ended questions.

More information

Assignment #2 Sample Solutions

Assignment #2 Sample Solutions CS 470/670 Intro to Artificial Intelligence Spring 2016 Instructor: Marc Pomplun Assignment #2 Sample Solutions Question 1: Living in Another World Imagine a world that is much more exciting than the one

More information

Large-Scale Data Stores and Probabilistic Protocols

Large-Scale Data Stores and Probabilistic Protocols Distributed Systems 600.437 Large-Scale Data Stores & Probabilistic Protocols Department of Computer Science The Johns Hopkins University 1 Large-Scale Data Stores and Probabilistic Protocols Lecture 11

More information

Basic vs. Reliable Multicast

Basic vs. Reliable Multicast Basic vs. Reliable Multicast Basic multicast does not consider process crashes. Reliable multicast does. So far, we considered the basic versions of ordered multicasts. What about the reliable versions?

More information

Building a transactional distributed

Building a transactional distributed Building a transactional distributed data store with Erlang Alexander Reinefeld, Florian Schintke, Thorsten Schütt Zuse Institute Berlin, onscale solutions GmbH Transactional data store - What for? Web

More information

Distributed Computing. CS439: Principles of Computer Systems November 19, 2018

Distributed Computing. CS439: Principles of Computer Systems November 19, 2018 Distributed Computing CS439: Principles of Computer Systems November 19, 2018 Bringing It All Together We ve been studying how an OS manages a single CPU system As part of that, it will communicate with

More information

CSE-E5430 Scalable Cloud Computing Lecture 10

CSE-E5430 Scalable Cloud Computing Lecture 10 CSE-E5430 Scalable Cloud Computing Lecture 10 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 23.11-2015 1/29 Exam Registering for the exam is obligatory,

More information

PreFail: A Programmable Failure-Injection Framework

PreFail: A Programmable Failure-Injection Framework PreFail: A Programmable Failure-Injection Framework Pallavi Joshi Haryadi S. Gunawi Koushik Sen Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2011-30

More information

Important Lessons. Today's Lecture. Two Views of Distributed Systems

Important Lessons. Today's Lecture. Two Views of Distributed Systems Important Lessons Replication good for performance/ reliability Key challenge keeping replicas up-to-date Wide range of consistency models Will see more next lecture Range of correctness properties L-10

More information

Control. CS432: Distributed Systems Spring 2017

Control. CS432: Distributed Systems Spring 2017 Transactions and Concurrency Control Reading Chapter 16, 17 (17.2,17.4,17.5 ) [Coulouris 11] Chapter 12 [Ozsu 10] 2 Objectives Learn about the following: Transactions in distributed systems Techniques

More information

CISC327 - So*ware Quality Assurance

CISC327 - So*ware Quality Assurance CISC327 - So*ware Quality Assurance Lecture 8 Introduc

More information

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.

More information

Distributed Storage Systems: Data Replication using Quorums

Distributed Storage Systems: Data Replication using Quorums Distributed Storage Systems: Data Replication using Quorums Background Software replication focuses on dependability of computations What if we are primarily concerned with integrity and availability (and

More information

Module 8 - Fault Tolerance

Module 8 - Fault Tolerance Module 8 - Fault Tolerance Dependability Reliability A measure of success with which a system conforms to some authoritative specification of its behavior. Probability that the system has not experienced

More information

Today: Fault Tolerance. Reliable One-One Communication

Today: Fault Tolerance. Reliable One-One Communication Today: Fault Tolerance Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing Message logging Lecture 17, page 1 Reliable One-One Communication Issues

More information

PERSISTENCE: FSCK, JOURNALING. Shivaram Venkataraman CS 537, Spring 2019

PERSISTENCE: FSCK, JOURNALING. Shivaram Venkataraman CS 537, Spring 2019 PERSISTENCE: FSCK, JOURNALING Shivaram Venkataraman CS 537, Spring 2019 ADMINISTRIVIA Project 4b: Due today! Project 5: Out by tomorrow Discussion this week: Project 5 AGENDA / LEARNING OUTCOMES How does

More information

Reliable Distributed Storage A Research Perspective. Idit Keidar Technion

Reliable Distributed Storage A Research Perspective. Idit Keidar Technion Reliable Distributed Storage A Research Perspective Idit Keidar Technion UnDistributed Enterprise Storage (What I Won t Talk About Today) Expensive Needs to be replaced to scale up Direct fiber access

More information

Effective Test Case Prioritization Technique in Web Application for Regression Test Suite

Effective Test Case Prioritization Technique in Web Application for Regression Test Suite Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,

More information

Distributed Systems Consensus

Distributed Systems Consensus Distributed Systems Consensus Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) Consensus 1393/6/31 1 / 56 What is the Problem?

More information

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis Elif Dede, Madhusudhan Govindaraju Lavanya Ramakrishnan, Dan Gunter, Shane Canon Department of Computer Science, Binghamton

More information