Extending Blockchains in Computing - Transaction semantics for web services. Subhash Bhalla (Dept. of Comp. Sc., IIT Delhi)


1 Extending Blockchains in Computing - Transaction semantics for web services Subhash Bhalla (Dept. of Comp. Sc., IIT Delhi)

2 Slashdot Items Tagged "blockchain": Thursday September 06, Blockchains Are Not Safe For Voting, Concludes NAP Report; Friday August 24, China Shuts Down Blockchain News Accounts on WeChat App, Bans Hotels in Beijing From Hosting Cryptocurrency Events; Friday August 10, The World Bank is Preparing For the World's First Blockchain Bond; Thursday August 09, Colorado Candidate For Governor Wants To Put His State On the Blockchain; Thursday August 09, Blockchain Hype May Have Peaked, But IBM is Still a Believer

3 Blockchain. Philosophy: books, HBR, Sloan Management Review; law, trust. Technology: agriculture, land records, legal systems, healthcare (standardized electronic health records). Computing.

4 Double Entry Book Keeping Ledger ( Append only log ) 4

5 No cutting; compensation is OK. Horizontal total: hash number 1 (control). Vertical total: hash number 2 (control). Linked list (serial numbers of transactions). Control hash numbers change with each new transaction.

6 Linked List Model: tamper-proof; compensation is OK. Transactions over one book (ledger).
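
The ledger idea on slides 4 to 6 (append-only entries, serial numbers, a control hash that changes with every new transaction) can be sketched as a hash-chained log. This is a minimal illustration, not the construction used on the slides: the `Ledger` class, the use of SHA-256 and the JSON encoding are all assumptions made for the example.

```python
import hashlib
import json


def entry_hash(prev_hash: str, serial: int, payload: dict) -> str:
    """Control hash covering the previous hash, the serial number and the entry."""
    body = json.dumps(
        {"prev": prev_hash, "serial": serial, "payload": payload}, sort_keys=True
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class Ledger:
    """Hypothetical append-only ledger: entries are added, never edited."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, payload: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        serial = len(self.entries) + 1
        digest = entry_hash(prev, serial, payload)
        self.entries.append(
            {"serial": serial, "payload": payload, "prev": prev, "hash": digest}
        )
        return digest

    def verify(self) -> bool:
        """Recompute every control hash; any edit to history breaks the chain."""
        prev = self.GENESIS
        for position, entry in enumerate(self.entries, start=1):
            if entry["serial"] != position or entry["prev"] != prev:
                return False
            if entry_hash(prev, position, entry["payload"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"account": "A", "credit": 100})
ledger.append({"account": "A", "debit": 30})
# A mistake is corrected by a compensating entry, not by changing an old one.
ledger.append({"account": "A", "credit": 30, "note": "compensation"})
```

Because each hash covers the previous one, altering any earlier entry makes `verify()` fail for that entry and everything after it.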


9 Multi-organization trust: legal document registry. Control hash: fixed size for all cases. Legal document / documents: tampering changes the hash. x N sites.

10 Computing Systems: Web Developments. Skype, MedlinePlus, MedlinePlus Encyclopedia, Google Docs, Google Maps API, gmail, google search, LMMP (NASA), PTF Caltech, Facebook, Twitter, LinkedIn, Air India, Ashoka University, AMAZON, any e-auction web site, Postgres web site, Wikipedia, e-bay, yahoo auction. Classify the above: sites, applications, cloud-based.

11 Palomar Transient Factory: time domain astronomy since 2009. Sectors in the northern sky watched all night. Real-time processing. Machine learning, data mining. Archive (growing in time).

12 Categories of Web Applications

13 Computing as enterprise Change Computing W3C Application New Specifications

14 Development of Web Applications Specifications (25 Years): Front-end Form, client-server, XML, CSS, JavaScript, Web Services, HTML 5 (transmit GIS coordinates of clients), tracking tools/systems Back-end Map-reduce, Data centers, Hadoop,

15 Computing as enterprise (before years) Change Computing ISO, ANSI Application New Specifications Prior to 1993: Database System Distributed Systems on ETHERNET (web?,internet?) Banking, Stock exchange, Airlines, Railways,

16 Interactions thru the Web A number of processes (N) on ETHERNET Communicate through messages to Cooperate Interact with outside world (web) 16

17 ETHERNET Hand-shake, acknowledge, time-limit on response, network-status, fast vs- 17

18 In Transit Messages A message that has been sent but not yet received is called an in-transit message. Do rollback recovery protocols have to guarantee the delivery of in-transit messages? Depends on whether reliable communication is assumed! 1

19 LAN vs. Web/Internet Wired, replication (loaded standby) Synchronous Failure is detected by timeout 19

20 Distributed Systems: Computing as an Enterprise Network Eccentric and Mobile Applications 1. Middleware 2. Networks Mobile ad hoc networks ( MANETs ) 20

21 Network Eccentric and Mobile Applications (Middleware): mobile ad hoc networks (MANETs), energy in sensor networks, programming wireless sensor networks, ad hoc routing, 5G software-defined networks, communication models, population protocols, routing in opportunistic networks, wireless mesh networks, gossip-based dissemination, application layer multicast, distributed event routing in publish/subscribe systems, tuple space middleware for wireless networks, security middleware, dynamic adaptation, ..., Blockchain

22 Blockchain Networks Toyota to Bring Blockchain Networks to Smart Cars IEEE Spectrum, May 2017, By Philip E. Ross 1. It could make car-to-cloud communication easier and more secure, if your car wants to talk to another car, a service provider network 2. Blockchain Consensus, Tyler Crain, Vincent Gramoli, Michel Raynal, Mikel Larrea. Proceedings of AlgoTel

23 Blockchain in use. 1. Bay Area, California: rent a Toyota car. Download an app; the smartphone locates the nearest car on a map; walk to the car; select pay; the car door unlocks. 2. Driverless car / remote guidance system for spacecraft.

24 Blockchain 24

25 Blockchain / Distributed Ledger Technology Distributed Systems Applications 1. Extremely Large Amount of Data 2. Extremely Critical Data 3. Real-time Data Streaming 4. Consensus Among Nodes in an Asynchronous System Blockchain is an enabling technique Immutability Asynchrony Consensus 25

26 Distributed Systems. Immutability: log-structured files / append-only logs (example: database nodes). Asynchrony: achieve a common history at the nodes through a stamp server. Consensus: in the absence of a globally synchronous clock, there is a need for global consensus.

27 Database Change of State Backup 27

28 Database Systems + Dist. Syst. on Ethernet. Banking: 100s of ATMs, audit system, accounting report. 1. Backup at time (To) 2. Database 3. Append-only log (journal) of activity since the last backup (To). Recovery after failure: combine 1 and 3 to rebuild 2.
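
As a rough sketch of the recovery step on slide 28, the database state can be rebuilt by starting from the backup and replaying the journal in order. The account names, the `(account, delta)` record shape and the `recover` helper are assumptions made for illustration only.

```python
import copy


def apply_record(db: dict, record: tuple) -> None:
    """Apply one journal record, here a hypothetical (account, delta) pair."""
    account, delta = record
    db[account] = db.get(account, 0) + delta


def recover(backup: dict, journal: list) -> dict:
    """Rebuild the database from the backup plus the append-only journal."""
    db = copy.deepcopy(backup)   # never modify the backup itself
    for record in journal:       # replay in the order the records were logged
        apply_record(db, record)
    return db


backup_t0 = {"alice": 500, "bob": 200}                    # 1. backup at time To
journal = [("alice", -100), ("bob", 100), ("carol", 50)]  # 3. activity since To
db = recover(backup_t0, journal)                          # 2. rebuilt database
```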

29 Computing Systems- Protocols Fail-stop Model - Communication Channel uses parity Byzantine fault tolerance Atomic commit Brooks Iyengar algorithm List of mathematical concepts named after places List of terms relating to algorithms and data structures Byzantine Paxos Quantum Byzantine agreement Two Armies Problem Impossible to win on web? Blockchain 29

30 Fail-Safe Model. Communication channel: the parity bit changes. Fail-prone; fail-stop. Example: 3 bits are 1s, so the parity is odd and the parity bit is 1.

31 Byzantine Generals Problem

32 Workflow processing. 1. e-bay cart: HP Notebook + EPSON Color printer + SONY Camera + Customer bank + e-bay bank. 5 processes in a two-phase commit, with resources blocked (Atomicity) (Consistency) (Isolation) (Durability). On top of web service connections (no Ethernet).
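
The parity check mentioned on slide 30 can be illustrated as follows. The sketch assumes an even-parity scheme (the parity bit is 1 when the data holds an odd number of 1s, so the whole frame always has an even count); the slide does not state which convention it uses, and the helper names are hypothetical.

```python
def parity_bit(bits: list) -> int:
    """Return 1 if the data has an odd number of 1s, else 0 (even parity)."""
    return sum(bits) % 2


def encode(bits: list) -> list:
    """Append the parity bit to the data bits."""
    return bits + [parity_bit(bits)]


def check(frame: list) -> bool:
    """A frame is accepted only if its total number of 1s is even."""
    return sum(frame) % 2 == 0


data = [1, 0, 1, 1, 0, 0, 0]   # three bits are 1s: odd, so the parity bit is 1
frame = encode(data)

corrupted = frame.copy()
corrupted[2] ^= 1              # a single bit flips in transit
```

A single flipped bit is detected, but two flipped bits cancel out, which is one reason parity alone only supports a simple failure model.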

33 Transaction Atomicity 33

34 Atomicity 34

35 2 Phase Commit 35

36 Participants make note in Log 36
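
A minimal, failure-free sketch of two-phase commit for the five-process cart example on slide 32 is shown below. The `Participant` class, the state names and the in-memory lists standing in for logs are assumptions for illustration; a real protocol also needs timeouts and recovery from the logs.

```python
class Participant:
    """Hypothetical participant that records each step in its own log."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.log = []
        self.state = "INIT"

    def prepare(self) -> bool:
        # Phase 1: vote, and note the vote in the log before replying.
        if self.can_commit:
            self.log.append("READY")
            self.state = "READY"
            return True
        self.log.append("ABORT")
        self.state = "ABORTED"
        return False

    def commit(self) -> None:
        self.log.append("COMMIT")
        self.state = "COMMITTED"

    def abort(self) -> None:
        if self.state != "ABORTED":
            self.log.append("ABORT")
            self.state = "ABORTED"


def two_phase_commit(participants: list, coordinator_log: list) -> str:
    coordinator_log.append("PREPARE")
    votes = [p.prepare() for p in participants]        # phase 1: collect votes
    decision = "COMMIT" if all(votes) else "ABORT"
    coordinator_log.append(decision)                   # decision is logged first
    for p in participants:                             # phase 2: broadcast
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision


names = ["notebook", "printer", "camera", "customer_bank", "ebay_bank"]
ok_run = [Participant(n) for n in names]
ok_log = []
ok_decision = two_phase_commit(ok_run, ok_log)

bad_run = [Participant(n, can_commit=(n != "customer_bank")) for n in names]
bad_log = []
bad_decision = two_phase_commit(bad_run, bad_log)
```

A single "no" vote aborts the whole transaction, which is what makes the outcome atomic across all five processes.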

37 Three-Phase Commit: 3PC is non-blocking (in case of coordinator (C) or participant (P) failure)


41 Application Systems on Web Services 41

42 Complexity in Distributed Systems Multiple Nodes Messages 42

43 Problems: Long Running Process. Blue Gene (1999): parallel computer for the study of bio-molecular phenomena such as protein folding. P1 process failure. Checkpoint 1; checkpoints (1, ..., n) on STABLE STORE: data, threads, register values. Run-time overhead; on failure, restart from the most recent checkpoint. 64 x 64 grid of parallel computers; middleware for checkpoints.

44 Cooperating Processes: Distributed System. [Figure: processes P0 to P3 exchanging messages m0 to m7 and taking checkpoints C 0,0 through C 3,1; one process is marked Crashed]

45 Middleware: Distributed System. Distributed system: a collection of processes that communicate through messages in a network. Fault tolerance: periodically use stable storage to save the processes' states during failure-free execution. After a failure, a failed process restarts from one of its saved states, reducing the amount of lost computation. Each of the saved states is called a checkpoint.

46 Checkpoint Cascading Rollback Problem. Last checkpoint: C 1,1 by P1, before P1 crashed. Cannot use C 0,1 at P0 because it is inconsistent with C 1,1 => P0 rolls back to C 0,0. Cannot use C 2,1 at P2 because it fails to reflect the sending of m6 => P2 rolls back to C 2,0. Cannot use C 3,1 and C 3,0 as a result => P3 rolls back to the initial state. [Figure: processes P0 to P3, messages m0 to m7, checkpoints C 0,0 through C 3,1; P1 crashed]

47 Uncoordinated Checkpointing. Uncoordinated checkpoints: full autonomy, and simple. Problems: most checkpoints may not be useful; cascading rollback to the initial state (domino effect). To select a set of consistent checkpoints during a recovery, the dependency of checkpoints has to be determined and recorded together with each checkpoint. Extra overhead and complexity => not simple after all.

48 Disadvantages of Uncoordinated Checkpointing. Susceptible to the domino effect. Checkpoints that will never be part of a global consistent state are recorded; they add stable storage overhead and do not advance the recovery line. A process needs to maintain multiple checkpoints and to use a garbage collector to reclaim checkpoints. Not suitable for output commit, because output commit requires global coordination to compute the recovery line.
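
The domino effect described above can be reproduced with a small sketch. It assumes a simplified model that is not taken from the slides: every event in a process has a position 1, 2, 3, ..., a checkpoint is identified by the position of the last event it covers (0 is the initial state), and a set of checkpoints is inconsistent if it contains an orphan message, that is, a message whose receipt is covered but whose sending is not.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_pos: int      # position of the send event in the sender
    receiver: str
    recv_pos: int      # position of the receive event in the receiver


def orphans(cut: dict, messages: list) -> list:
    """Messages recorded as received but not recorded as sent under this cut."""
    return [
        m for m in messages
        if m.recv_pos <= cut[m.receiver] and m.send_pos > cut[m.sender]
    ]


def recovery_line(checkpoints: dict, messages: list) -> dict:
    """Roll receivers of orphan messages back until the cut is consistent.

    checkpoints[p] is an increasing list of positions that starts with 0.
    """
    index = {p: len(c) - 1 for p, c in checkpoints.items()}
    while True:
        cut = {p: checkpoints[p][index[p]] for p in checkpoints}
        bad = orphans(cut, messages)
        if not bad:
            return cut
        for receiver in {m.receiver for m in bad}:
            index[receiver] -= 1   # fall back to the previous checkpoint


checkpoints = {"P0": [0, 2, 5], "P1": [0, 3, 6]}
messages = [
    Message("P1", 1, "P0", 1),
    Message("P0", 3, "P1", 2),
    Message("P1", 4, "P0", 4),
    Message("P0", 6, "P1", 5),
]
line = recovery_line(checkpoints, messages)
```

In this hand-built example each rollback exposes a new orphan message on the other side, so both processes end at their initial state even though each took two checkpoints.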

49 Coordinated Blocking (LAN based solution). Processes are coordinated to form a consistent global state. [Figure: an initiator and processes p1, p2, p3; labels: Ready!, okay (channels flushed), Go!]

50 Coordinated Blocking (cont.). Advantages: always consistent; no domino effect; less storage overhead. Disadvantage: large latency to checkpoint!

51 Individual Log Based Protocols. Work might be lost upon recovery using checkpoint-based protocols. By logging messages, we may be able to recover the system to where it was prior to the failure. System model: the execution of a process is modeled as a set of consecutive state intervals. Each interval is initiated by a nondeterministic event or the initial state. We assume the only type of nondeterministic event is the receiving of a message. [Figure: process Pi receiving messages m0 to m5, divided into 1st, 2nd and 3rd state intervals]

52 Log Based Protocols. In practice, logging is always used together with checkpointing. This limits the recovery time (start with the latest checkpoint instead of the initial state) and limits the size of the log (after taking a checkpoint, previously logged events can be purged). Logging protocol types: Pessimistic logging: messages are logged prior to execution. Optimistic logging: messages are logged asynchronously. Causal logging: nondeterministic events that are not yet logged (to stable storage) are piggybacked on each message sent. For optimistic and causal logging, the dependency of processes has to be tracked => more complexity, longer recovery time.

53 Pessimistic Logging. Synchronously log every incoming message to stable storage prior to execution. Each process periodically checkpoints its state: no need for coordination. Recovery: a process restores its state using the last checkpoint and replays all logged incoming messages.

54 Lamport's logical clock. Happened-before relation: a -> b: event a occurred before event b, for events in the same process p1. b -> c: b is the event of sending a message m1 in a process p1 and c is the event of receipt of the same message m1 by another process p2. If a -> b and b -> c, then a -> c; -> is transitive.

55 Lamport's logical clock. Causally ordered events: a -> b: event a causally affects event b. Concurrent events: a || e if neither a -> e nor e -> a holds.

56 Lamport's logical clock: algorithm. Sending end: time = time+1; time_stamp = time; send(message, time_stamp); Receiving end: (message, time_stamp) = receive(); time = max(time_stamp, time)+1;

57 Lamport's logical clock. If a -> b, then C(a) < C(b). If b -> c, then C(b) and C(c) must be assigned in such a way that C(b) < C(c). The clock time, C, must always go forward (increasing), never backward (decreasing). Corrections to time can be made by adding a positive value, never by subtracting one.

58 Lamport's logical clock. An illustration: three processes, each with its own clock. The clocks run at different rates and Lamport's algorithm corrects the clocks.

59 Lamport's logical clock: limitations. m1 -> m3 gives C(m1) < C(m3); m2 -> m3 gives C(m2) < C(m3). Was it m1 or m2 that caused m3 to be sent?

60 Lamport's logical clock. With Lamport's logical clocks, all events in a distributed system are totally ordered. That is, if a -> b, then we can say C(a) < C(b). With Lamport's clocks, nothing can be said about the actual time of a and b: C(a) < C(b) does not mean that a occurred before b in real time. Lamport clocks do not capture causality: C(a) < C(b) does not imply a -> b, and if a -> c and b -> c we do not know which action initiated c. This is a problem when trying to replay events in a distributed system (such as when trying to recover after a crash). The theory goes that if one node goes down and we know the causal relationships between messages, then we can replay those messages, respecting the causal relationships, to get that node back up to the state it needs to be in. Piece-wise Deterministic (PWD)?
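
Pessimistic logging as summarized on slide 53 can be sketched for a single process. The example assumes a deterministic message handler (here, adding an integer to a counter) and uses in-memory attributes as stand-ins for stable storage; the class and method names are hypothetical.

```python
class LoggedProcess:
    """Hypothetical process whose only nondeterministic events are receives."""

    def __init__(self):
        self.state = 0          # volatile state, lost on a crash
        self.checkpoint = 0     # stand-in for a checkpoint on stable storage
        self.stable_log = []    # stand-in for the message log on stable storage

    def deliver(self, msg: int) -> None:
        self.stable_log.append(msg)   # log synchronously, before execution
        self.state += msg             # then execute the (deterministic) handler

    def take_checkpoint(self) -> None:
        self.checkpoint = self.state
        self.stable_log.clear()       # earlier log entries can be purged

    def crash(self) -> None:
        self.state = None             # volatile state is gone

    def recover(self) -> None:
        self.state = self.checkpoint  # restore the last checkpoint
        for msg in self.stable_log:   # replay logged messages in order
            self.state += msg


proc = LoggedProcess()
proc.deliver(5)
proc.deliver(7)
proc.take_checkpoint()
proc.deliver(3)
proc.deliver(4)
state_before_crash = proc.state
proc.crash()
proc.recover()
```

Since every message is on stable storage before it is executed, the process recovers on its own to the exact pre-crash state, with no coordination with other processes.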
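
The send and receive rules on slide 56 translate directly into a small class. The class name and the `tick` method for local events are additions for the sake of a runnable sketch.

```python
class LamportClock:
    """Logical clock following the send/receive rules on the slide."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """A local event advances the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending end: increment, then stamp the outgoing message."""
        self.time += 1
        return self.time

    def receive(self, time_stamp: int) -> int:
        """Receiving end: jump past both the local time and the stamp."""
        self.time = max(time_stamp, self.time) + 1
        return self.time


p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
p1.tick()                 # p1: local event, time 1
stamp = p1.send()         # p1: send, time 2
p2.receive(stamp)         # p2: max(2, 0) + 1 = 3

for _ in range(10):       # p3 runs ahead with ten local events
    p3.tick()
p3.receive(stamp)         # p3: max(2, 10) + 1 = 11, the clock never goes back
```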

61 Vector clocks Vector clocks allow causality to be captured Rules of Vector Clocks Properties of a process Implementation 61

62 Vector clocks: rules and properties. A vector clock VC(i) is assigned to an event i. If VC(i) < VC(j) for events i and j, then event i is known to causally precede j. Each process i maintains a vector Vi such that Vi[i] is the number of events that have occurred at i, and Vi[j] is the number of events that i knows have occurred at process j.

63 Vector clocks: implementation. 1. Before executing an event (i.e., sending a message over the network, delivering a message to an application, or some other internal event), Pi executes VCi[i] <- VCi[i] + 1. 2. When process Pi sends a message m to Pj, it sets m's (vector) timestamp ts(m) equal to VCi after having executed the previous step. 3. Upon the receipt of a message m, process Pj adjusts its own vector by setting VCj[k] <- max{VCj[k], ts(m)[k]} for each k, after which it executes the first step and delivers the message to the application.
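
The three implementation steps for vector clocks can be sketched as below, together with the comparison that Lamport clocks cannot provide: telling causally ordered timestamps apart from concurrent ones. The class and helper names are assumptions; the scenario mirrors the m1, m2, m3 question from the limitations slide.

```python
class VectorClock:
    """Vector clock for process i out of n processes."""

    def __init__(self, n: int, i: int):
        self.i = i
        self.v = [0] * n

    def event(self) -> None:
        """Step 1: count one more event at this process."""
        self.v[self.i] += 1

    def send(self) -> list:
        """Step 2: the timestamp is the vector after executing step 1."""
        self.event()
        return list(self.v)

    def receive(self, ts: list) -> None:
        """Step 3: component-wise maximum, then step 1 for the delivery."""
        self.v = [max(a, b) for a, b in zip(self.v, ts)]
        self.event()


def happened_before(a: list, b: list) -> bool:
    """a < b: every component is <= and the vectors differ."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a: list, b: list) -> bool:
    return a != b and not happened_before(a, b) and not happened_before(b, a)


p0, p1, p2 = VectorClock(3, 0), VectorClock(3, 1), VectorClock(3, 2)
m1 = p0.send()        # [1, 0, 0]
m2 = p1.send()        # [0, 1, 0]
p2.receive(m1)        # p2 becomes [1, 0, 1]
p2.receive(m2)        # p2 becomes [1, 1, 2]
m3 = p2.send()        # [1, 1, 3]
```

Here both m1 and m2 causally precede m3, while m1 and m2 are concurrent with each other, which a pair of integer Lamport timestamps could not reveal.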

64 Vector clocks 64

65 Sum Up: Checkpoints and Recovery. Prevent orphan processes. Lamport's timestamps: integer clocks assigned to events; obey causality; cannot distinguish concurrent events. Vector timestamps: obey causality; by using more space, can also identify concurrent events.

66 Message Dependencies 66

67 Sender and Receiver Dependencies. Sender dependency: in Figure 1(a), the process state P1 depends on P3 (state change by message m3). After failure of process P3, if P3 restarts from state 0, P1 becomes an orphan process. Similarly, P1 transitively depends on P2 (transitive sender dependency). Receiver dependency: in Figure 1(b), the process state P1 depends on P2 (message m2). After failure of process P2, m2 becomes a lost message. Process P1 should roll back and send the message m2 again. Similarly, P1 transitively depends on P3 (transitive receiver dependency).

68 Interacting Processes 6

69 Total Dependency Graph 69

70 Minimum Reachability Graph 70

71 Interacting Processes Total Dependency Vector clock 71

72 TDG - Cumulative State Dependencies Vector clock 72

73 Independent Dependency Tracking using TDT Vector Clock. Extending Blockchains: reliable communication network vs. dependency tracking; lost-message tracking + orphan-message tracking; instantaneous Minimum Reachability Graph; reduced time for checkpointing and rollback recovery.

74 Blockchain Distributed Ledger. Philosophy: same as double-entry book-keeping. Example: bank passbook with columns Credits, Debits, Balance, Description (transactions: cr + db = bal). No change is allowed; compensation is allowed. [Controls on Cr, Db, Bal: check sums]

75 Distributed Ledger (Blockchain) Replicated Database 75

76 Application Systems on Web Services 76

77 Checkpoint Cascading Rollback Problem. Last checkpoint: C 1,1 by P1, before P1 crashed. Cannot use C 0,1 at P0 because it is inconsistent with C 1,1 => P0 rolls back to C 0,0. Cannot use C 2,1 at P2 because it fails to reflect the sending of m6 => P2 rolls back to C 2,0. Cannot use C 3,1 and C 3,0 as a result => P3 rolls back to the initial state. [Figure: processes P0 to P3, messages m0 to m7, checkpoints C 0,0 through C 3,1; P1 crashed]

78 BLOCKCHAIN 7


80 Distributed Ledger Common / one Log 0


86 Different Rollback Recovery Schemes. Checkpoint based: uncoordinated checkpointing, coordinated checkpointing, communication-induced checkpointing. Log based: pessimistic logging, optimistic logging, causal logging. Blockchain.

87 Computing Systems- Protocols Component Level, Sub-systems, Blockchain: - End-to-end, - at Application layer System : Sum of its parts Application recovers from underlying component failure 7

88 Computing Systems- Protocols Individual System Replicated DBMS, Internal Architecture for a Distributed Application Supports reliable computations Fail-Stop Model, Byzantine Generals protocols, RPC level End-to-end Application delivery systems: Communicate thru Web Services at Application Layer Blockchain

89 Distributed Systems: new paradigms. Crash / fault tolerant consensus algorithms are run by one organization. BLOCKCHAINS may run with a multiplicity of organizations [malicious nodes]: no trust between each other. Byzantine Generals Problem; network not reliable: Internet delays, network partitions, message loss and/or reordering.

90 Blockchains (Distributed Ledger Technology). Blockchains: a clever way to detect message loss / reordering of messages. Log of blocks, with copies of the log at the nodes. Every block in the log has a pointer back to the previous block (linked list). Broadcasts reach many nodes (detecting missing or reordered blocks is no problem). Distributed database across entities with no mutual trust.

91 Time 1: Distributed Information Systems. Distributed Oracle (database system): 1 organization (banking, SBI); LAN based; synchronous communication. Time 2: Distributed Systems. Inter-bank reconciliations (group of organizations); LAN + Internet; grid computing. Time 3: Cloud-based mashups and computing. Aggregate applications (multiple organizations); Web Services.

92 Distributed Systems in perspective
Period: 70-0s | 90s | 2000s
Network: LAN | LAN + Internet; Grid Computing | Web Services; Cloud Computing
Communication: Synchronous Comm. | Asynchronous Comm.; Delay/Disconnection | Asynchronous Comm.; Delay/Disconnection
Platform: Distributed Oracle +.. | J2EE / Jini | Web Services
Organizations: One Organization (Banking - SBI) | Group of Organizations (Banks) | Multiple Organizations (may be unknown)?
Machines: Super-computers | Networked Workstations | CLOUD; Multiplicity of Channels

93 What are the Challenges? Bank ATMs work on dedicated lines (similar to a fax machine; synchronous network). AT&T goal circa 2000: aimed to change to telephony using the Internet in -10yrs. US Govt. air-traffic control automation in the 90s (IBM). Driverless cars. Real-time control problems; high-speed streams of data. Stock exchanges (Tokyo, NY, Germany): 1 hour of Internet trade handling > 1 year budget of Japan Govt. LAN + Internet: 10MB (megabits) to gigabit networks. Multiplicity of channels: Blockchains.

94 How to meet the Challenges. One item: big Internet (one bullock). One item: big clouds (one bullock). MULTIPLICITY. Individual site logs [one organization]: no global clock; time-order (Lamport). Log-structured files (append-only logs) [one group]: vector clocks; distributed logs. Blockchain Technology / Distributed Ledger Technology [not one globe?]: globally ordered transaction logs.
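
How back-pointers let a node cope with lost or reordered broadcasts can be sketched as follows. The sketch assumes a single chain with no forks and no consensus step; the block layout, the `Node` class and the SHA-256 block identifier are assumptions for illustration only.

```python
import hashlib


def block_id(prev_id: str, data: str) -> str:
    """Identifier derived from the back-pointer and the block's content."""
    return hashlib.sha256((prev_id + "|" + data).encode("utf-8")).hexdigest()


def make_chain(items: list, genesis: str = "genesis") -> list:
    chain, prev = [], genesis
    for data in items:
        bid = block_id(prev, data)
        chain.append({"id": bid, "prev": prev, "data": data})
        prev = bid
    return chain


class Node:
    """Hypothetical node that appends a block only after its predecessor."""

    def __init__(self, genesis: str = "genesis"):
        self.head = genesis
        self.log = []
        self.seen = set()
        self.pending = {}    # predecessor id -> block waiting for it

    def receive(self, block: dict) -> None:
        if block_id(block["prev"], block["data"]) != block["id"]:
            return                          # corrupted block: ignore
        if block["id"] in self.seen:
            return                          # duplicate broadcast: ignore
        self.seen.add(block["id"])
        self.pending[block["prev"]] = block
        while self.head in self.pending:    # append every block now in order
            nxt = self.pending.pop(self.head)
            self.log.append(nxt["data"])
            self.head = nxt["id"]

    def has_gap(self) -> bool:
        """True while some received block still waits for a missing one."""
        return bool(self.pending)


b1, b2, b3 = make_chain(["t1", "t2", "t3"])
node = Node()
node.receive(b3)                 # arrives first: held back
node.receive(b1)                 # appended; b3 still waits for b2
log_with_gap = list(node.log)
gap_detected = node.has_gap()
node.receive(b2)                 # fills the gap; b2 and b3 are appended
node.receive(b2)                 # a repeated broadcast changes nothing
```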

95 Problems: Long Running Process. Blue Gene (1999): parallel computer for the study of bio-molecular phenomena such as protein folding. P1 process failure. Checkpoint 1; checkpoints (1, ..., n) on STABLE STORE: data, threads, register values. Run-time overhead; on failure, restart from the most recent checkpoint. 64 x 64 grid of parallel computers; middleware for checkpoints.

96 Cooperating Processes: Distributed System. [Figure: processes P0 to P3 exchanging messages m0 to m7 and taking checkpoints C 0,0 through C 3,1; one process is marked Crashed]

97 Cooperating Logs / Blockchain: Distributed Ledger (external for Web services, managed by a cloud data center). [Figure: processes P0 to P3 exchanging messages m0 to m7 and taking checkpoints C 0,0 through C 3,1; one process is marked Crashed]

98 Problems: Long Running Blockchain. An external blockchain supports the Web Service transaction and can tolerate a few failures. P1 process failure. Checkpoint 1. The log is the checkpoints (1, ..., n): on STABLE STORE? It is all for Web Services. Data, threads, register values. Run-time overhead; no failure; no most recent checkpoint. Dist. / parallel computers; middleware for checkpoints.

99 References Advanced Concepts in Operating Systems by Singhal and Shivaratri on pages Distributed Systems: Principles and Paradigms, Andrew S. Tanenbaum and Maarten Van Steen, (Second Edition) on pages Time, clocks, and the ordering of events in a distributed system by Lamport (197) Youtube videos

100 References C. Lee, B. Nick, U. Brandes, and P. Cunningham, Link prediction with social vector clocks, in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD 13, Apr. 2013, pp M. Harrigan, Using vector clocks to visualize communication flow. in ASONAM, N. Memon and R. Alhajj, Eds. IEEE Computer Society, 2010, pp C. E. Hrischuk and C. M. Woodside, Logical clock requirements for reverse engineering scenarios from a distributed system, IEEE Trans. Software Eng., 2(4), Apr. 2002, M. Raynal and M. Singhal, Logical Time: Capturing Causality in Distributed Systems, IEEE Computer Magazine, vol. 29, no. 2, pp , Feb

101 Reference BLOCKCHAINS : 101

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi

Three Models. 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1. DEPT. OF Comp Sc. and Engg., IIT Delhi DEPT. OF Comp Sc. and Engg., IIT Delhi Three Models 1. CSV888 - Distributed Systems 1. Time Order 2. Distributed Algorithms 3. Nature of Distributed Systems1 Index - Models to study [2] 1. LAN based systems

More information

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems

Failure Models. Fault Tolerance. Failure Masking by Redundancy. Agreement in Faulty Systems Fault Tolerance Fault cause of an error that might lead to failure; could be transient, intermittent, or permanent Fault tolerance a system can provide its services even in the presence of faults Requirements

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance Fault Tolerance Basic Concepts Being fault tolerant is strongly related to what

More information

Fault Tolerance. Distributed Systems IT332

Fault Tolerance. Distributed Systems IT332 Fault Tolerance Distributed Systems IT332 2 Outline Introduction to fault tolerance Reliable Client Server Communication Distributed commit Failure recovery 3 Failures, Due to What? A system is said to

More information

Chapter 8 Fault Tolerance

Chapter 8 Fault Tolerance DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 8 Fault Tolerance 1 Fault Tolerance Basic Concepts Being fault tolerant is strongly related to

More information

Today: Fault Tolerance. Fault Tolerance

Today: Fault Tolerance. Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University

CprE Fault Tolerance. Dr. Yong Guan. Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Fault Tolerance Dr. Yong Guan Department of Electrical and Computer Engineering & Information Assurance Center Iowa State University Outline for Today s Talk Basic Concepts Process Resilience Reliable

More information

CSE 5306 Distributed Systems. Fault Tolerance

CSE 5306 Distributed Systems. Fault Tolerance CSE 5306 Distributed Systems Fault Tolerance 1 Failure in Distributed Systems Partial failure happens when one component of a distributed system fails often leaves other components unaffected A failure

More information

CSE 5306 Distributed Systems

CSE 5306 Distributed Systems CSE 5306 Distributed Systems Fault Tolerance Jia Rao http://ranger.uta.edu/~jrao/ 1 Failure in Distributed Systems Partial failure Happens when one component of a distributed system fails Often leaves

More information

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University

Fault Tolerance Part II. CS403/534 Distributed Systems Erkay Savas Sabanci University Fault Tolerance Part II CS403/534 Distributed Systems Erkay Savas Sabanci University 1 Reliable Group Communication Reliable multicasting: A message that is sent to a process group should be delivered

More information

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Fault-Tolerant Computer Systems ECE 60872/CS Recovery Fault-Tolerant Computer Systems ECE 60872/CS 59000 Recovery Saurabh Bagchi School of Electrical & Computer Engineering Purdue University Slides based on ECE442 at the University of Illinois taught by Profs.

More information

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance

Distributed Systems Principles and Paradigms. Chapter 08: Fault Tolerance Distributed Systems Principles and Paradigms Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, steen@cs.vu.nl Chapter 08: Fault Tolerance Version: December 2, 2010 2 / 65 Contents Chapter

More information

Today: Fault Tolerance. Failure Masking by Redundancy

Today: Fault Tolerance. Failure Masking by Redundancy Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing

More information

A Survey of Rollback-Recovery Protocols in Message-Passing Systems

A Survey of Rollback-Recovery Protocols in Message-Passing Systems A Survey of Rollback-Recovery Protocols in Message-Passing Systems Mootaz Elnozahy * Lorenzo Alvisi Yi-Min Wang David B. Johnson June 1999 CMU-CS-99-148 (A revision of CMU-CS-96-181) School of Computer

More information

Today: Fault Tolerance. Reliable One-One Communication

Today: Fault Tolerance. Reliable One-One Communication Today: Fault Tolerance Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery Checkpointing Message logging Lecture 17, page 1 Reliable One-One Communication Issues

More information

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra

Today CSCI Recovery techniques. Recovery. Recovery CAP Theorem. Instructor: Abhishek Chandra Today CSCI 5105 Recovery CAP Theorem Instructor: Abhishek Chandra 2 Recovery Operations to be performed to move from an erroneous state to an error-free state Backward recovery: Go back to a previous correct

More information

Rollback-Recovery p Σ Σ

Rollback-Recovery p Σ Σ Uncoordinated Checkpointing Rollback-Recovery p Σ Σ Easy to understand No synchronization overhead Flexible can choose when to checkpoint To recover from a crash: go back to last checkpoint restart m 8

More information

Failure Tolerance. Distributed Systems Santa Clara University

Failure Tolerance. Distributed Systems Santa Clara University Failure Tolerance Distributed Systems Santa Clara University Distributed Checkpointing Distributed Checkpointing Capture the global state of a distributed system Chandy and Lamport: Distributed snapshot

More information

Recovering from a Crash. Three-Phase Commit

Recovering from a Crash. Three-Phase Commit Recovering from a Crash If INIT : abort locally and inform coordinator If Ready, contact another process Q and examine Q s state Lecture 18, page 23 Three-Phase Commit Two phase commit: problem if coordinator

More information

Today: Fault Tolerance

Today: Fault Tolerance Today: Fault Tolerance Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Paxos Failure recovery Checkpointing

More information

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology

Clock and Time. THOAI NAM Faculty of Information Technology HCMC University of Technology Clock and Time THOAI NAM Faculty of Information Technology HCMC University of Technology Using some slides of Prashant Shenoy, UMass Computer Science Chapter 3: Clock and Time Time ordering and clock synchronization

More information

COV885 Distributed Systems

COV885 Distributed Systems COV885 Distributed Systems Web-based Systems: Web Developments Web Services S. Bhalla, ( 2017) Special Module on Database Systems Client Server Systems Web-based Client Server Systems Web Engineering:

More information

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf Distributed systems Lecture 6: distributed transactions, elections, consensus and replication Malte Schwarzkopf Last time Saw how we can build ordered multicast Messages between processes in a group Need

More information

Today: Fault Tolerance. Replica Management

Today: Fault Tolerance. Replica Management Today: Fault Tolerance Failure models Agreement in presence of faults Two army problem Byzantine generals problem Reliable communication Distributed commit Two phase commit Three phase commit Failure recovery

More information

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems?

Distributed Systems Question Bank UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? UNIT 1 Chapter 1 1. Define distributed systems. What are the significant issues of the distributed systems? 2. What are different application domains of distributed systems? Explain. 3. Discuss the different

More information

Distributed KIDS Labs 1







