DISTRIBUTED SYSTEMS [COMP9243]
Lecture 5: Synchronisation and Coordination (Part 2)


Slide 1: Lecture 5: Synchronisation and Coordination (Part 2)
➀ Transactions
➁ Multicast
➂ Elections

TRANSACTIONS

Slide 2: Transaction:
Comes from the database world
Defines a sequence of operations
Atomic in the presence of multiple clients and failures

Mutual Exclusion ++:
Protect shared data against simultaneous access
Allow multiple data items to be modified in a single atomic action

Transaction Model:
Operations: BeginTransaction, EndTransaction, Read, Write
End of Transaction: Commit or Abort

TRANSACTION EXAMPLES

Slide 3: Inventory:
[Figure: batch inventory update. The previous inventory tape and today's updates are the input tapes to a computer, which writes a new inventory output tape.]

Slide 4: Banking:
BeginTransaction
    b = A.Balance();
    A.Withdraw(b);
    B.Deposit(b);
EndTransaction
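To make these primitives concrete, here is a minimal Python sketch (not from the lecture notes; the Account and Transaction names are invented for illustration) of the banking example behind a hypothetical all-or-nothing transaction wrapper:

class Account:
    def __init__(self, balance):
        self.balance = balance

class Transaction:
    # all-or-nothing: on failure, every update made inside the block is undone
    def __init__(self, *accounts):
        self.accounts = accounts
        self.snapshot = [a.balance for a in accounts]   # undo information

    def __enter__(self):                     # BeginTransaction
        return self

    def __exit__(self, exc_type, exc, tb):   # EndTransaction
        if exc_type is not None:             # Abort: roll back to the snapshot
            for acc, old in zip(self.accounts, self.snapshot):
                acc.balance = old
        return False                         # Commit: keep the new state

A, B = Account(100), Account(0)
with Transaction(A, B):
    b = A.balance          # b = A.Balance()
    A.balance -= b         # A.Withdraw(b)
    B.balance += b         # B.Deposit(b)
# at this point either both updates happened or, had an error been
# raised inside the block, neither did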

ACID PROPERTIES

Slide 5:
atomic: all-or-nothing; once committed, the full transaction is performed, and if aborted, there is no trace left
consistent: the transaction does not violate system invariants (i.e. it does not produce inconsistent results)
isolated: transactions do not interfere with each other, i.e. no intermediate state of a transaction is visible outside (also called serialisable)
durable: after a commit, results are permanent (even if the server or hardware fails)

CLASSIFICATION OF TRANSACTIONS

Slide 6:
Flat: a sequence of operations that satisfies ACID
Nested: a hierarchy of transactions
Distributed: a (flat) transaction that is executed on distributed data

Flat Transactions:
Simple. On failure, all changes are undone:
BeginTransaction
    accountA -= 100;
    accountB += 50;
    accountC += 25;
    accountD += 25;
EndTransaction

NESTED TRANSACTION

Slide 7: Example: booking a flight in three legs: Sydney → Manila, Manila → Amsterdam, Amsterdam → Toronto
What to do if one leg cannot be booked?
Abort the whole transaction
Commit the non-aborted parts of the transaction only
Partially commit the transaction and try an alternative for the aborted part

Slide 8: Subtransactions and parent transactions:
[Figure: a nested transaction tree: subtransactions of T commit or abort independently, while an abort of the parent aborts the whole subtree.]
The parent transaction may commit even if some subtransactions aborted
If the parent transaction aborts, it aborts all of its subtransactions
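To make the commit and abort rules concrete, here is a minimal Python sketch (invented for this text, not from the lecture) of a nested-transaction tree in which subtransactions commit only provisionally and the top-level commit or abort seals their fate:

class NestedTxn:
    def __init__(self, parent=None):
        self.parent, self.children, self.state = parent, [], "active"
        if parent:
            parent.children.append(self)

    def provisional_commit(self):
        self.state = "provisional"

    def abort(self):                        # aborting a node kills its subtree
        self.state = "aborted"
        for c in self.children:
            c.abort()

    def commit(self):                       # only valid at the top level
        assert self.parent is None
        def finalise(t):
            if t is self or t.state == "provisional":
                t.state = "committed"
                for c in t.children:
                    finalise(c)
        finalise(self)

T = NestedTxn()
t1, t2 = NestedTxn(T), NestedTxn(T)
t1.provisional_commit()
t2.abort()                    # the parent may still commit
T.commit()
print(t1.state, t2.state)     # committed aborted

Note how t2's abort does not prevent the parent from committing, while aborting T itself would have cascaded down the whole tree.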

Slide 9: Subtransactions:
A subtransaction can abort at any time
A subtransaction cannot commit until the parent is ready to commit
A subtransaction either aborts or commits provisionally
A provisionally committed subtransaction reports its provisional commit list, containing all of its provisionally committed subtransactions, to its parent
On commit, all subtransactions in that list are committed
On abort, all subtransactions in that list are aborted

TRANSACTION ATOMICITY IMPLEMENTATION

Slide 10: Writeahead Log:
In-place update with write-ahead logging
Roll back on Abort

Slide 11: Private Workspace:
Perform all tentative operations on a shadow copy
Atomically swap it with the main copy on Commit
Discard the shadow copy on Abort
[Figure: (a) the original file index and disk blocks; (b) a private workspace with its own index, sharing the unmodified blocks; (c) modified and appended blocks written to free blocks, swapped in on commit.]

CONCURRENCY CONTROL (ISOLATION)

Slide 12: Simultaneous Transactions:
Clients accessing bank accounts
Travel agents booking flights
Inventory system updated by cash registers

Problems:
Simultaneous transactions may interfere: lost update, inconsistent retrieval
Consistency and Isolation require that there is no interference. Why?

Concurrency Control Algorithms:
Guarantee that multiple transactions can be executed simultaneously while still being isolated
As though the transactions were executed one after another
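The private-workspace scheme can be sketched in a few lines of Python (a dict-backed store; all names are invented for illustration). Tentative writes touch only the shadow copy, and commit is a single reference swap:

class Store:
    def __init__(self, data):
        self.data = dict(data)

class PrivateWorkspace:
    def __init__(self, store):
        self.store = store
        self.shadow = dict(store.data)    # tentative (shadow) copy

    def read(self, key):
        return self.shadow[key]

    def write(self, key, value):
        self.shadow[key] = value          # the main copy is untouched

    def commit(self):
        self.store.data = self.shadow     # atomic swap: one reference update

    def abort(self):
        self.shadow = None                # tentative operations simply vanish

s = Store({"x": 1})
t = PrivateWorkspace(s)
t.write("x", 42)
assert s.data["x"] == 1                   # invisible before commit
t.commit()
assert s.data["x"] == 42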

CONFLICTS AND SERIALISABILITY

Slide 13: Read/Write Conflicts Revisited:
conflict: operations (from the same, or different, transactions) that operate on the same data item
read-write conflict: one of the operations is a write
write-write conflict: more than one of the operations is a write

Schedule: a total ordering (interleaving) of operations
Legal schedules provide results as though the transactions were serialised (serial equivalence)

Slide 14: Example Schedules:
[Figure: example schedules, lost in transcription.]

SERIALISABLE EXECUTION

Slide 15: Serial Equivalence:
Conflicting operations are performed in the same order on all data items: operations of Ti before Tj, or operations of Tj before Ti

Are the following serially equivalent?
R1(x) W1(x) R2(y) W2(y) R3(x) W3(y)
R1(x) R2(y) W2(y) R3(x) W1(x) W3(y)
R1(x) R3(x) W1(x) W3(y) R2(y) W2(y)
R1(x) W1(x) R3(x) W3(y) R2(y) W2(y)

MANAGING CONCURRENCY

Slide 16: Dealing with Concurrency:
Locking
Timestamp Ordering
Optimistic Control
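A standard way to test serial equivalence mechanically is to build a conflict graph and check it for cycles. The following Python sketch (not part of the lecture; it uses the second schedule above as test data) does exactly that:

def serialisable(schedule):
    # edge (t1, t2): some operation of t1 conflicts with a later one of t2
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            if x1 == x2 and t1 != t2 and "W" in (op1, op2):
                edges.add((t1, t2))
    # the schedule is serialisable iff the conflict graph is acyclic
    def cyclic(node, path):
        if node in path:
            return True
        return any(cyclic(b, path | {node}) for a, b in edges if a == node)
    return not any(cyclic(t, set()) for t, _, _ in schedule)

# second schedule above: R1(x) R2(y) W2(y) R3(x) W1(x) W3(y)
s2 = [(1, "R", "x"), (2, "R", "y"), (2, "W", "y"),
      (3, "R", "x"), (1, "W", "x"), (3, "W", "y")]
print(serialisable(s2))    # True: equivalent to the serial order T2 T3 T1

# a schedule whose conflict graph has a cycle T1 -> T2 -> T1:
bad = [(1, "R", "x"), (2, "W", "x"), (1, "W", "x")]
print(serialisable(bad))   # False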

Slide 17: Transaction Managers:
[Figure: layered architecture. The transaction manager receives BEGIN_TRANSACTION, END_TRANSACTION and READ/WRITE calls; the scheduler handles LOCK/RELEASE or timestamp operations; the data manager executes the reads and writes.]

LOCKING

Slide 18: Pessimistic approach: prevent illegal schedules
A lock must be obtained from the scheduler before a read or write
The scheduler grants and releases locks, ensuring that only valid schedules result

TWO-PHASE LOCKING (2PL)

Slide 19:
[Figure: number of locks held over time: locks are acquired during the growing phase, up to the lock point, and released during the shrinking phase.]
➀ A lock is granted if there are no conflicting locks on that data item. Otherwise the operation is delayed until the lock is released.
➁ A lock is not released until the operation has been executed by the data manager.
➂ No more locks are granted after a release has taken place.
All schedules formed using 2PL are serialisable. Why?

PROBLEMS WITH LOCKING

Slide 20:
Deadlock:
Detect and break deadlocks (in the scheduler)
Timeout on locks

Cascaded Aborts:
Release(Ti, x), Lock(Tj, x), Abort(Ti): Tj will have to be aborted too
Problem: dirty read: Tj has seen a value from a non-committed transaction
Solution: Strict Two-Phase Locking: release all locks at Commit/Abort
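A minimal Python sketch of strict 2PL (invented names; exclusive locks only, no read/write distinction, and each transaction is assumed to lock an item at most once). Locks are taken during the growing phase and all released together at commit, which is what rules out dirty reads and cascaded aborts:

import threading

class LockManager:
    def __init__(self):
        self.locks = {}                    # data item -> lock
        self.mutex = threading.Lock()

    def lock(self, txn, item):
        with self.mutex:
            lk = self.locks.setdefault(item, threading.Lock())
        lk.acquire()                       # blocks while another txn holds it
        txn.held.append(lk)

class Txn:
    def __init__(self, lm):
        self.lm, self.held = lm, []

    def write(self, store, item, value):
        self.lm.lock(self, item)           # growing phase: acquire before access
        store[item] = value

    def commit(self):
        for lk in self.held:               # shrinking phase: everything at once,
            lk.release()                   # and only at commit (strict 2PL)
        self.held = []

store = {"x": 0}
lm = LockManager()
t = Txn(lm)
t.write(store, "x", 1)
t.commit()

Deadlock is still possible with this scheme (two transactions each waiting on a lock the other holds), which is why the scheduler needs detection or timeouts as noted above.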

TIMESTAMP ORDERING

Slide 21:
Each transaction has a unique timestamp ts(Ti)
Each operation receives its transaction's timestamp: TS(W), TS(R)
Each data item has two timestamps:
read timestamp ts_RD(x): the transaction that most recently read x
write timestamp ts_WR(x): the committed transaction that most recently wrote x
There are also tentative write timestamps ts_wr(x) for non-committed writes

Timestamp ordering rule:
a write request is valid only if TS(W) > ts_WR(x) and TS(W) >= ts_RD(x)
a read request is valid only if TS(R) > ts_WR(x)
Conflict resolution: the operation with the lower timestamp is executed first

Slide 22:
[Figure: write and read examples. Each case compares ts(T) against ts_RD(x), ts_WR(x) and the tentative ts_wr(x), showing when an operation proceeds, uses state from or waits until an uncommitted transaction commits, or causes T to abort.]

OPTIMISTIC CONTROL

Slide 23:
Assume that no conflicts will occur; detect conflicts at commit time
Three phases:
Working (using shadow copies)
Validation
Update

Slide 24: Validation:
Keep track of the read set and write set during the working phase
During validation, make sure conflicting operations of overlapping transactions are serialisable:
Make sure Tv doesn't read items written by any other Ti. Why?
Make sure Tv doesn't write items read by any other Ti. Why?
Make sure Tv doesn't write items written by any other Ti. Why?
Prevent overlapping of validation phases (mutual exclusion)
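Returning to the timestamp-ordering rule above, here is a minimal Python sketch of the read and write checks for a single data item (illustrative names; tentative writes of uncommitted transactions are ignored for brevity):

class Item:
    def __init__(self):
        self.ts_rd = 0      # timestamp of the most recent read of x
        self.ts_wr = 0      # timestamp of the most recent committed write of x
        self.value = None

class AbortTxn(Exception):
    pass

def read(item, ts):         # valid only if TS(R) > ts_wr(x)
    if ts <= item.ts_wr:
        raise AbortTxn      # a younger transaction already wrote x
    item.ts_rd = max(item.ts_rd, ts)
    return item.value

def write(item, ts, value): # valid only if TS(W) > ts_wr(x) and TS(W) >= ts_rd(x)
    if ts <= item.ts_wr or ts < item.ts_rd:
        raise AbortTxn      # the operation arrived too late
    item.ts_wr, item.value = ts, value

x = Item()
write(x, ts=2, value=10)    # fine
read(x, ts=3)               # fine, ts_rd(x) becomes 3
# write(x, ts=1, value=0)   # would raise AbortTxn: T1 is too old

An aborted transaction would typically restart with a fresh, larger timestamp.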

Slide 25:
Backward validation: check against committed overlapping transactions
Only have to check whether Tv read something another Ti has written
Abort Tv on conflict
Have to keep old write sets

Forward validation: check against not-yet-committed overlapping transactions
Only have to check whether Tv wrote something another Ti has read
Options on conflict: abort Tv, abort Ti, or wait
Read sets of not-yet-committed transactions may change during validation!

DISTRIBUTED TRANSACTIONS

Slide 26:
In a distributed system, a single transaction will, in general, involve several servers:
a transaction may require several services
a transaction may involve files stored on different servers
All servers must agree to Commit or Abort, and do this atomically.
Transaction Management: centralised or distributed

Slide 27: Distributed Flat Transaction:
[Figure: a client's flat transaction T spans servers W, X, Y and Z directly.]

Slide 28: Distributed Nested Transaction:
[Figure: a client's transaction T has subtransactions on servers X and Y, which in turn spawn their own subtransactions on servers M, N, O and P.]
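Backward validation reduces to a set-intersection test, as in this small Python sketch (names invented for illustration):

def backward_validate(tv_read_set, committed_write_sets):
    # committed_write_sets: write sets of committed transactions that
    # overlapped Tv's working phase (this is why old write sets are kept)
    for ws in committed_write_sets:
        if tv_read_set & ws:       # Tv read something they wrote
            return False           # conflict: abort Tv
    return True                    # Tv may proceed to its update phase

# Tv read {x, y}; an overlapping transaction committed a write to {x}:
print(backward_validate({"x", "y"}, [{"x"}]))    # False, Tv must abort
print(backward_validate({"y"}, [{"x"}]))         # True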

DISTRIBUTED CONCURRENCY CONTROL

Slide 29:
[Figure: a single transaction manager sits above per-machine schedulers and data managers on machines A, B and C.]

Slide 30: Distributed Timestamps:
Assigning unique timestamps:
The timestamp is assigned by the first scheduler accessed
Clocks have to be roughly synchronised

Distributed Optimistic Control:
Validation operations are distributed over the servers
Commitment deadlock (because of the mutual exclusion of validation phases)
Parallel validation protocol
Make sure each transaction is serialised correctly

DISTRIBUTED LOCKING

Slide 31:
Centralised 2PL: a single server handles all locks
The scheduler only grants locks; the transaction manager contacts the data manager for the operation
Primary 2PL: each data item is assigned a primary copy; the scheduler on that server is responsible for its locks
Distributed 2PL: data can be replicated; the scheduler on each machine is responsible for locking its own data
Read lock: contact any replica
Write lock: contact all replicas

ATOMICITY AND DISTRIBUTED TRANSACTIONS

Slide 32: Distributed Transaction Organisation:
Each distributed transaction has a coordinator: the server handling the initial BeginTransaction call
The coordinator maintains a list of workers, i.e. the other servers involved in the transaction
Each worker needs to know the coordinator
The coordinator is responsible for ensuring that the whole transaction is atomically committed or aborted
This requires a distributed commit protocol.
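For distributed 2PL over replicated data, the read-one/write-all locking rule can be sketched as follows (Python; the Replica class is invented, and conflict handling and blocking are omitted, so only the pattern of which replicas are contacted is shown):

class Replica:
    def __init__(self):
        self.granted = {}                  # item -> list of (txn, mode)

    def lock(self, item, txn, mode):
        # a real scheduler would block on conflicting locks; here we only
        # record the grant to show which replicas are involved
        self.granted.setdefault(item, []).append((txn, mode))

def read_lock(replicas, item, txn):
    replicas[0].lock(item, txn, "read")    # any single replica suffices

def write_lock(replicas, item, txn):
    for r in replicas:                     # every replica must be contacted
        r.lock(item, txn, "write")

r1, r2, r3 = Replica(), Replica(), Replica()
read_lock([r1, r2, r3], "x", "T1")
write_lock([r1, r2, r3], "x", "T2")

Because a write lock touches every replica, it is guaranteed to conflict with any read lock, no matter which replica that read lock was taken at.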

DISTRIBUTED ATOMIC COMMIT

Slide 33:
A transaction may only be able to commit when all workers are ready to commit (e.g. after validation in optimistic concurrency control)
Hence distributed commit requires at least two phases:
1. Voting phase: all workers vote on commit; the coordinator then decides whether to commit or abort.
2. Completion phase: all workers commit or abort according to the decision.
The basic protocol is called two-phase commit (2PC).

Slide 34: Two-phase commit, coordinator:
[Figure: coordinator state machine. On CommitReq it sends CanCommit to the n workers; if all n vote yes it sends DoCommit and enters committed; if any vote is abort it sends DoAbort and enters aborted.]
1. sends CanCommit, receives yes/abort
2. sends DoCommit/DoAbort

Slide 35: Two-phase commit, worker:
[Figure: worker state machine. From running, voting yes on CanCommit leads to uncertain; DoCommit then leads to committed and DoAbort to aborted; voting abort on CanCommit leads straight to aborted.]
1. receives CanCommit, sends yes/abort
2. receives DoCommit/DoAbort
What are the assumptions?

Slide 36: Limitations:
Once a node has voted yes, it cannot change its mind, even if it crashes; an atomic state update is needed to make the yes vote stable.
If the coordinator crashes, all workers may be blocked.
Different protocols can be used (e.g. three-phase commit); in some circumstances workers can obtain the result from other workers.
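A minimal Python sketch of the two phases from the coordinator's point of view (the synchronous calls and the Worker class are invented for illustration; a real worker would force its yes vote to stable storage before replying):

class Worker:
    def __init__(self, ready):
        self.ready = ready

    def vote(self):
        # a real worker would log a yes vote to stable storage first,
        # because after voting yes it may no longer change its mind
        return "yes" if self.ready else "abort"

    def decide(self, decision):        # DoCommit / DoAbort
        self.state = decision

def two_phase_commit(workers):
    # phase 1 (voting): collect CanCommit votes
    votes = [w.vote() for w in workers]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # phase 2 (completion): all workers follow the decision
    for w in workers:
        w.decide(decision)
    return decision

print(two_phase_commit([Worker(True), Worker(True)]))    # commit
print(two_phase_commit([Worker(True), Worker(False)]))   # abort

The blocking problem is visible even in this sketch: a worker that has voted yes learns its fate only from the coordinator's decide call.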

Slide 37: Two-phase commit of nested transactions:
Two-phase commit is required, as a worker might crash after a provisional commit
On a CanCommit request, a worker:
votes no if it has no recollection of the subtransactions of the committing transaction (i.e. it must have crashed recently); otherwise it
aborts the subtransactions of aborted transactions,
saves the provisionally committed transactions in stable storage, and
votes yes.
Two approaches: hierarchic 2PC and flat 2PC

MULTICAST

Slide 38:
Sender performs a single send()
Group of receivers
Membership of the group is transparent
[Figure: machine A multicasts a message to machines B, C, D and E.]

EXAMPLES

Slide 39:
Fault Tolerance:
Replicated (redundant) servers
Strong consistency: multicast operations
Service Discovery:
Multicast a request for a service
Reply from the service provider

Slide 40:
Performance:
Replicated servers or data
Weaker consistency: multicast operations or data
Event or Notification propagation:
Group members are those interested in particular events
Examples: sensor data, stock updates, network status

PROPERTIES

Slide 41:
Group membership:
Static: membership does not change
Dynamic: membership changes
Open vs closed group:
Closed group: only members can send
Open group: anyone can send
Reliability:
Communication failure vs process failure
Guarantee of delivery: all members (or none), or all non-failed members (atomic)
Ordering:
Guarantee of ordered delivery: FIFO, Causal, Total Order

OTHER ISSUES

Slide 42:
Performance:
Bandwidth
Delay
Efficiency:
Avoid sending a message over a link multiple times (link stress)
Distribution tree
Hardware support (e.g., Ethernet broadcast)
Network-level vs application-level:
Network routers understand multicast, or
Applications (or middleware) send unicasts to group members over an overlay distribution tree

EXAMPLES REVISITED

Slide 43:
Fault Tolerance: Reliability: Atomic; Ordering: Total; Membership: Static; Group: Closed
Service Discovery: Reliability: No guarantee; Ordering: None; Membership: Static; Group: Open
Performance: Reliability: Non-failed; Ordering: FIFO, Causal; Membership: Dynamic; Group: Closed
Event or Notification propagation: Reliability: Non-failed; Ordering: Causal; Membership: Dynamic; Group: Open

NETWORK-LEVEL MULTICAST

Slide 44:
"You put packets in at one end, and the network conspires to deliver them to anyone who asks." (Dave Clark)
Ethernet broadcast: all hosts on the local network; MAC address FF:FF:FF:FF:FF:FF
IP multicast:
multicast group: class D Internet address; first 4 bits are 1110 (224.0.0.0 to 239.255.255.255)
permanent groups: 224.0.0.0 to 224.0.0.255
multicast routers; hosts join a group via the Internet Group Management Protocol (IGMP)
distribution trees are set up by Protocol Independent Multicast (PIM)

APPLICATION-LEVEL MULTICAST SYSTEM MODEL

Slide 45:
[Figure: layered model. The application calls msend(g,m); the multicast middleware implements it with one-to-one send(m)/receive(g) via the OS and network, and hands incoming messages up through mdeliver(m).]

Assumptions:
reliable one-to-one channels
no failures
a single closed group

BASIC MULTICAST

Slide 46:
[Figure: basic multicast between A and B: no reliability guarantees, no ordering guarantees.]

Slide 47:
B-send(g, m) {
    foreach p in g {
        send(p, m);
    }
}

deliver(m) {
    B-deliver(m);
}

FIFO MULTICAST

Slide 48:
[Figure: FIFO multicast between A and B: order is maintained per sender.]

Slide 49:
FO-init() {
    S = 0;                        // local sequence number
    for (i = 1 to N) V[i] = 0;    // vector of last-seen sequence numbers
}

FO-send(g, m) {
    S++;
    B-send(g, <m,S>);             // multicast to everyone
}

Slide 50:
B-deliver(<m,S>) {
    if (S == V[sender(m)] + 1) {
        // expecting this msg, so deliver
        FO-deliver(m);
        V[sender(m)] = S;
        // check if msgs in the queue have become deliverable
        foreach <m,S> in queue {
            if (S == V[sender(m)] + 1) {
                FO-deliver(m);
                dequeue(<m,S>);
                V[sender(m)] = S;
            }
        }
    } else if (S > V[sender(m)] + 1) {
        // not expecting this msg, so put it in the queue for later
        enqueue(<m,S>);
    }
}

CAUSAL MULTICAST

Slide 51:
[Figure: causal multicast between A and B: order is maintained between causally related sends; concurrent messages may be delivered in different orders at different receivers.]

Slide 52:
CO-init() {
    // vector of what we've delivered already
    for (i = 1 to N) V[i] = 0;
}

CO-send(g, m) {
    V[i]++;                       // i = this process's own index
    B-send(g, <m,V>);
}

B-deliver(<m,Vj>) {               // j = sender(m)
    enqueue(<m,Vj>);
    // make sure we've delivered everything the message could depend on
    wait until Vj[j] == V[j] + 1 and Vj[k] <= V[k] for all k != j;
    CO-deliver(m);
    dequeue(<m,Vj>);
    V[j]++;
}

TOTALLY ORDERED MULTICAST

Slide 53:
[Figure: totally ordered multicast between A and B: all receivers deliver all messages in the same order.]

Slide 54: Sequencer-based:
[Figure: processes send each message to a sequencer process, which assigns it a sequence number and forwards it to the receivers.]

Slide 55: Agreement-based:
[Figure: processes P1 to P4 agree on the order: the sender multicasts the message, receivers reply with proposed sequence numbers, and the sender distributes the agreed sequence.]

Slide 56: Other possibilities:
Moving sequencer
Token based
Logical-clock based: each receiver determines the order independently; delivery is based on sender timestamp ordering. How do you know you have the most recent timestamp?
Physical clock ordering
Hybrid ordering: FIFO + Total, Causal + Total
Dealing with failure: communication failures, process failures
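As one concrete instance, the sequencer-based scheme can be sketched in Python as follows (invented names; the sequencer stamps messages with consecutive numbers, and each receiver holds back out-of-order messages):

class Sequencer:
    def __init__(self):
        self.next_seq = 0

    def stamp(self, msg):              # called for every multicast message
        self.next_seq += 1
        return (self.next_seq, msg)

class Receiver:
    def __init__(self):
        self.expected = 1
        self.hold_back = {}            # out-of-order messages wait here

    def receive(self, stamped):
        seq, msg = stamped
        self.hold_back[seq] = msg
        delivered = []
        while self.expected in self.hold_back:    # deliver in stamp order
            delivered.append(self.hold_back.pop(self.expected))
            self.expected += 1
        return delivered

seq = Sequencer()
r = Receiver()
m1, m2 = seq.stamp("a"), seq.stamp("b")
print(r.receive(m2))     # [] : held back until message 1 arrives
print(r.receive(m1))     # ['a', 'b'] : the same order at every receiver

The total order comes entirely from the sequencer's counter, which is also why the sequencer is a single point of failure and a potential bottleneck, motivating the moving-sequencer and token-based variants above.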

ELECTIONS

Slide 57: Coordinator:
Some algorithms rely on a distinguished coordinator process
The coordinator needs to be determined
We may also need to change the coordinator at runtime
Election:
Goal: when the algorithm has finished, all processes agree on who the new coordinator is

Slide 58: Determining a coordinator:
Assume all nodes have a unique id
Possible assumption: processes know all other processes' ids, but don't know whether they are currently up or down

Slide 59: Election: agree on which non-crashed process has the largest id number
Requirements:
➀ Safety: a process either doesn't know the coordinator, or it knows the id of the process with the largest id number
➁ Liveness: eventually, a process crashes or knows the coordinator

BULLY ALGORITHM

Slide 60: Three types of messages:
Election: announce an election
Answer: response to an Election message
Coordinator: announce the elected coordinator

A process begins an election when it notices through a timeout that the coordinator has failed, or when it receives an Election message
When starting an election, it sends Election to all higher-numbered processes
If no Answer is received, the election-starting process is the coordinator and sends a Coordinator message to all other processes
If an Answer arrives, it waits a predetermined period of time for a Coordinator message
If a process knows it is the highest-numbered one, it can immediately answer with Coordinator
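A Python sketch of the algorithm just described, from a single process's point of view (the send, wait_for_answer and wait_for_coordinator callbacks are assumptions standing in for real messaging and timeouts):

def start_election(my_id, all_ids, send, wait_for_answer, wait_for_coordinator):
    higher = [p for p in all_ids if p > my_id]
    if not higher:                         # highest id: win immediately
        for p in all_ids:
            if p != my_id:
                send(p, ("COORDINATOR", my_id))
        return my_id
    for p in higher:                       # challenge all bigger processes
        send(p, ("ELECTION", my_id))
    if not wait_for_answer():              # no Answer before the timeout: we win
        for p in all_ids:
            if p != my_id:
                send(p, ("COORDINATOR", my_id))
        return my_id
    return wait_for_coordinator()          # a bigger process takes over

# process 7 is the highest-numbered one, so it wins on the spot:
msgs = []
winner = start_election(7, [3, 5, 7], lambda p, m: msgs.append((p, m)),
                        lambda: False, lambda: None)
print(winner, msgs)   # 7 [(3, ('COORDINATOR', 7)), (5, ('COORDINATOR', 7))]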

Slide 61:
[Figure: bully election, part 1. The previous coordinator (7) has crashed: (a) process 4 sends Election to processes 5, 6 and 7; (b) processes 5 and 6 answer OK; (c) 5 and 6 each start their own election.]

Slide 62:
[Figure: bully election, part 2: (d) process 6 answers 5's Election; (e) 6 gets no response from 7 and announces itself Coordinator to everyone.]
What are the assumptions?

RING ALGORITHM

Slide 63: Two types of messages:
Election: forward the election data
Coordinator: announce the elected coordinator

Processes are ordered in a ring
A process begins an election when it notices through a timeout that the coordinator has failed
It sends an Election message to its first neighbour that is up
Every node adds its own id to the Election message and forwards it along the ring
The election is finished when the originator receives the Election message again
It then forwards the message once more, as a Coordinator message
[Figure: ring election with the previous coordinator (7) crashed: the Election message accumulates the ids of live processes as it circulates, e.g. [5], [5,6], [5,6,0], ...]
What are the assumptions?

HOMEWORK

Slide 64:
We only discussed distributed transactions, but not replicated transactions. What changes if we introduce replication? Do the techniques we've discussed still work?
How well does 2PC deal with failure? Can you improve it to deal with more types of failure?
Hacker's edition: do the Multicast (Erlang) exercise.

READING LIST

Slide 65:
Optional:
"Total Order Broadcast and Multicast Algorithms: Taxonomy and Survey": everything you always wanted to know about ordered multicast
"Elections in a Distributed Computing System": the bully algorithm