Global atomicity. Such distributed atomicity is called global atomicity. A protocol designed to enforce global atomicity is called a commit protocol.


Godwin Clement Lane
6 years ago

1 Global atomicity
In distributed systems, a set of processes may take part in executing a task. Their actions may have to be atomic with respect to processes outside the set.
Example: in a distributed system, a transaction must be processed at every site or at none of the sites.
Such distributed atomicity is called global atomicity. A protocol designed to enforce global atomicity is called a commit protocol.

2 Two-phase commit protocol
Assumptions:
- one of the processes acts as coordinator; the others are referred to as cohorts (from the Latin word for one of the 10 divisions of an ancient Roman legion)
- stable storage is available and a write-ahead log protocol is active at each site

3 Two-phase commit protocol: definition
Phase I
At the coordinator:
- The coordinator sends a COMMIT-REQUEST message to every cohort, requesting them to commit
- The coordinator waits for replies from all cohorts
At the cohorts, upon receiving the COMMIT-REQUEST message, a cohort takes the following action:
- if the transaction is successful, it writes UNDO and REDO log records on stable storage, then sends an AGREED message to the coordinator
- otherwise, it sends an ABORT message to the coordinator
Phase II
At the coordinator:
- If all cohorts reply AGREED, the coordinator writes a COMMIT record in the log and sends a COMMIT message to all the cohorts. Otherwise, it sends an ABORT message to all the cohorts
- The coordinator then waits for acknowledgements from all cohorts
- If an ack is not received from a cohort after a timeout period, the coordinator resends the COMMIT/ABORT message to that cohort
- If all acks are received, the coordinator writes a COMPLETE record
At the cohorts:
- Upon receiving a COMMIT message, a cohort releases all the resources held for executing the transaction and sends an ack
- Upon receiving an ABORT message, a cohort undoes the transaction using the UNDO log record, releases all the resources held, and sends an ack
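The two phases above can be sketched as a small in-memory simulation. This is only an illustration under simplifying assumptions: there is no network, no timeout, and no crash, and the names `Cohort`, `can_commit`, and `two_phase_commit` are hypothetical, not part of the slides. Plain Python lists stand in for the logs on stable storage.

```python
from dataclasses import dataclass, field


@dataclass
class Cohort:
    name: str
    can_commit: bool  # hypothetical flag: did the local transaction succeed?
    log: list = field(default_factory=list)  # stands in for stable storage
    state: str = "ACTIVE"

    def on_commit_request(self) -> str:
        """Phase I: vote AGREED only after UNDO and REDO records are logged."""
        if self.can_commit:
            self.log += ["UNDO", "REDO"]
            self.state = "AGREED"
            return "AGREED"
        self.state = "ABORTED"
        return "ABORT"

    def on_decision(self, decision: str) -> str:
        """Phase II: apply the coordinator's decision, then acknowledge."""
        if decision == "COMMIT":
            self.state = "COMMITTED"  # release resources held by the transaction
        else:
            self.state = "ABORTED"    # undo using the UNDO record, release resources
        return "ACK"


def two_phase_commit(cohorts, coordinator_log):
    # Phase I: send COMMIT-REQUEST to every cohort and collect the replies.
    votes = [c.on_commit_request() for c in cohorts]

    # Phase II: commit only if every cohort replied AGREED.
    decision = "COMMIT" if all(v == "AGREED" for v in votes) else "ABORT"
    if decision == "COMMIT":
        coordinator_log.append("COMMIT")

    # Broadcast the decision and wait for acks (no loss or timeout modelled here).
    acks = [c.on_decision(decision) for c in cohorts]
    if all(a == "ACK" for a in acks):
        coordinator_log.append("COMPLETE")
    return decision
```

With two cohorts that both succeed, the coordinator log ends up as COMMIT followed by COMPLETE; if any cohort votes ABORT, no COMMIT record is ever written and every cohort ends in the aborted state.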

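4 Two-phase commit protocol: message exchanges
[Figure: message exchanges between the coordinator and the cohorts. Recoverable labels, cohort side: "Transaction successful: write UNDO, REDO on log", "Send ABORT", "Release resources and locks", "Undo transaction using UNDO rec". Coordinator side: "Write COMMIT in log", "Write COMPLETE in log".]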

5 Two-phase commit protocol: correctness
Things that can go wrong:
- The coordinator crashes before having written the COMMIT record: on recovery, the coordinator broadcasts an ABORT message
- The coordinator crashes between writing the COMMIT and COMPLETE records: on recovery, the coordinator broadcasts a COMMIT message and waits for acks
- The coordinator crashes after writing the COMPLETE record: on recovery, there is nothing to do
- A cohort crashes in Phase I: the coordinator can abort the transaction because it won't receive a reply
- A cohort crashes in Phase II (i.e. after writing the UNDO and REDO records): on recovery, the cohort checks with the coordinator whether to abort or commit; a commit may require a redo operation if the failure happened before the database was updated
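The three coordinator crash cases above reduce to a lookup on the coordinator's own log at recovery time. The following is a minimal sketch; the function name and the returned action strings are hypothetical labels chosen for illustration, and the log is modelled as a plain list of record names.

```python
def coordinator_recovery_action(log):
    """Pick the coordinator's recovery step from the records found in its log."""
    if "COMPLETE" in log:
        # All acks had already been received before the crash.
        return "NOTHING"
    if "COMMIT" in log:
        # The decision was made but not every ack is known to have arrived.
        return "REBROADCAST_COMMIT"
    # No COMMIT record was written, so the transaction is aborted.
    return "BROADCAST_ABORT"
```

The order of the checks matters: COMPLETE is tested first because a log that contains COMPLETE usually contains COMMIT as well.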

6 Byzantine Generals Problem
The problem is an abstract model of a computer system whose faulty components may send conflicting information to different parts of the system.
The dining-philosophers equivalent: several divisions of the Byzantine army surround an enemy city. Each division is commanded by a general. The generals can communicate with each other only through messengers. They need to arrive at a common plan after observing the enemy.

7 Complications
Some of the generals may be traitors, who may try to prevent the loyal generals from reaching agreement by sending false messages.
Required: an algorithm to guarantee that
A. All loyal generals decide upon the same plan of action, irrespective of what the traitors do.
B. A small number of traitors cannot cause the loyal generals to adopt a bad plan.

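8 Impossibility results
[Figure: two scenarios with three generals, a commander and two lieutenants.
Scenario 1 (lieutenant 2 is the TRAITOR): the commander sends "attack" to both lieutenants; lieutenant 2 tells lieutenant 1 "he said retreat".
Scenario 2 (the commander is the TRAITOR): the commander sends "attack" to lieutenant 1 and "retreat" to lieutenant 2; lieutenant 2 tells lieutenant 1 "he said retreat".]
In both scenarios lieutenant 1 receives exactly the same messages, so it cannot tell which general is the traitor.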

9 General impossibility result Using this we can show that no solution with fewer than 3m + 1 generals can cope with m traitors.
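The bound stated above is simple arithmetic, sketched here as two helper functions. The names `min_generals` and `max_traitors_tolerated` are hypothetical and exist only to make the numbers concrete.

```python
def min_generals(m: int) -> int:
    """Smallest number of generals that can cope with m traitors (3m + 1)."""
    return 3 * m + 1


def max_traitors_tolerated(n: int) -> int:
    """Largest m such that n >= 3m + 1."""
    return (n - 1) // 3
```

For example, three generals tolerate no traitor at all, which matches the three-general scenarios of the previous slide, while four generals tolerate one.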

10 Data-Centric Consistency Models
[Figure: the general organization of a logical data store, physically distributed and replicated across multiple processes.]

11 Consistency models
The scenario we will be studying:
- some sort of shared data that we will call a data store (examples: shared memory, shared file system)
- multiple processes perform read/write operations on the data store
- each process has a local (nearby) copy of the entire store
A consistency model is a contract between the processes and the data store.

12 The issue of data consistency
Assume writes to an object become visible to all processes in the same order.
- But when, exactly, does a write become visible?
- How to establish an order between a write and a read by different processes?
Think of event synchronization using more than one data object:
/* Assume the initial value of A and flag is 0 */
P1                  P2
A = 1;              while (flag == 0); /* spin idly */
flag = 1;           print A;
Programmers expect the data store
- to respect the order between accesses to different data objects issued by a given process
- to preserve the order among accesses to the same object by different processes
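The P1/P2 fragment above can be written as a runnable sketch with two threads. This only illustrates the outcome programmers expect; it assumes a runtime that makes P1's two writes visible in program order (as CPython does) and therefore cannot demonstrate a violation. The names `p1`, `p2`, and `observed` are hypothetical.

```python
import threading

A = 0          # shared data object
flag = 0       # shared synchronization object
observed = []  # what P2 "prints"


def p1():
    global A, flag
    A = 1      # write the data first
    flag = 1   # then signal that the data is ready


def p2():
    while flag == 0:
        pass               # spin idly until P1 raises the flag
    observed.append(A)     # expected to see A == 1


t2 = threading.Thread(target=p2)
t1 = threading.Thread(target=p1)
t2.start()
t1.start()
t1.join()
t2.join()
```

If the store made `flag = 1` visible to P2 before `A = 1`, P2 could leave the loop and still read the old value 0, which is exactly the behaviour the programmer does not expect.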

13 Definition: Strict Consistency
Any read on a data item x returns a value corresponding to the result of the most recent write on x.
[Figure: example with two processes operating on the same data item. (a) A strictly consistent store. (b) A store that is not strictly consistent.]
Very intuitive, but impossible to implement because it assumes the existence of a global physical clock.

14 Sequential Consistency
[Figure: processors P1, P2, ..., Pn issue memory references in program order through a switch to a single memory; the switch is randomly set after each memory reference.]
"A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program." [Lamport, 1979]
The total order is achieved by interleaving accesses from different processes, as if there were no replicas and only a single copy of the data.
Important points:
- Program order is maintained
- Operations appear to execute in the same order to all processes
- The programmer's intuition is maintained
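The definition above can be made concrete by enumerating every interleaving of two short programs that preserves each program's own order, and collecting the results. This is a sketch under stated assumptions: the two programs, the variables `x` and `y`, and the registers `r1` and `r2` are a hypothetical example chosen for illustration, not taken from the slides.

```python
from itertools import combinations


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each program's own order."""
    n = len(p1) + len(p2)
    for slots in combinations(range(n), len(p1)):
        slots = set(slots)                # positions taken by p1's operations
        i1, i2 = iter(p1), iter(p2)
        yield [next(i1) if k in slots else next(i2) for k in range(n)]


def run(schedule):
    """Execute one interleaving against a single copy of memory."""
    mem = {"x": 0, "y": 0}
    regs = {}
    for op, var, arg in schedule:
        if op == "write":
            mem[var] = arg                # arg is the value written
        else:
            regs[arg] = mem[var]          # arg is the destination register
    return regs["r1"], regs["r2"]


# P1: x = 1; r1 = y        P2: y = 1; r2 = x
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]

outcomes = {run(s) for s in interleavings(P1, P2)}
```

Here `outcomes` contains `(0, 1)`, `(1, 0)`, and `(1, 1)` but never `(0, 0)`: in any sequential order that respects both program orders, at least one write precedes the other process's read. A store that returned `(0, 0)` would therefore not be sequentially consistent.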
