6.033 Spring Lecture #19. Distributed transactions Availability Replicated State Machines spring 2018 Katrina LaCurts

Size: px

Start display at page:

Download "6.033 Spring Lecture #19. Distributed transactions Availability Replicated State Machines spring 2018 Katrina LaCurts"

Norma Logan
5 years ago
Views:

1 6.033 Spring 2018 Lecture #19 Distributed transactions Availability Replicated State Machines spring 2018 Katrina Laurts

2 goal: build reliable systems from unreliable components the abstraction that makes that easier is transactions, which provide atomicity and isolation, while not hindering performance atomicity isolation shadow copies (simple, poor performance) or logs (better performance, a bit more complex) two-phase locking we also want transaction-based systems to be distributed to run across multiple machines and to remain available even through failures spring 2018 Katrina Laurts

3 1 write1 (X) 2 write2 (X) (replica of ) problem: replica servers can become inconsistent spring 2018 Katrina Laurts

4 if primary fails, switches to backup ( knows how to contact backup servers) primary chooses order of operations, decides all non-deterministic values primary AKs coordinator only after it s sure that backup has all updates (backup) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018 Katrina Laurts

5 if primary fails, switches to backup ( knows how to contact backup servers) (dead) (backup) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018 Katrina Laurts

6 if primary fails, switches to backup ( knows how to contact backup servers) (dead) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018 Katrina Laurts

7 multiple coordinators + the network = problems 1 2 (backup) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018 Katrina Laurts

8 multiple coordinators + the network = problems 1 network partition 2 (backup) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018

9 multiple coordinators + the network = problems 1 network partition 2 (backup, but primary for 2) attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018

10 multiple coordinators + the network = problems 1 network partition 2 (backup, but primary for 2) 1 and 2 are using different primaries; and are no longer consistent attempt: coordinators communicate with primary servers, who communicate with backup servers spring 2018

11 use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

12 1:, view server keeps a table that maintains a sequence of views use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

13 primary 1:, backup view server alerts primary/backups about their roles (backup) use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

14 primary sends updates to, gets AKs from backup (as before) primary 1:, backup (backup) use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

15 primary? coordinators make requests to view server to find out who is primary 1:, (backup) use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

16 coordinators contact primary (as before) primary? 1:, (backup) use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

17 primary? 1:, primary/backup(s) ping view server so that it can discover failures (backup) use a view server, which determines which replica is the primary spring 2018 Katrina Laurts

18 handling primary failure (dead) lack of pings indicates to that is down 1:, " (backup) spring 2018 Katrina Laurts

19 handling primary failure (dead) 1:, 2:, -- primary " spring 2018 Katrina Laurts

20 handling primary failure (dead) primary? 1:, 2:, -- " spring 2018 Katrina Laurts

21 handling primary failure (dead) 1:, 2:, -- " spring 2018 Katrina Laurts

22 handling primary failure due to partition network partition 1:, (backup) pose a partition keeps from communicating with the view ser spring 2018

23 handling primary failure due to partition (presumed dead) network partition 1:, lack of pings indicates to that is down (backup) spring 2018

24 handling primary failure due to partition (presumed dead) network p 1:, 2:, -- artition primary makes primary spring 2018

25 primary handling primary failure due to partition (presumed dead) network partition 1:, 2:, -- question: what happens before knows it s the primary? spring 2018

26 handling primary failure due to partition (presumed dead) network partition 1:, 2:, -- rejected by primary will act as backup (accept updates from, reject coordinator requests) spring 2018

27 handling primary failure due to partition (presumed dead) network partition 1:, 2:, -- question: what happens after knows it s the primary, but also thinks it is? spring 2018

28 handling primary failure due to partition rejected by (can t get AK from ) 1:, 2:, -- network partition (presumed dead) rejected by won t be able to act as primary (can t accept client requests because it won t get AKs from ) spring 2018

29 " 1:, problem: what if view server fails? spring 2018 Katrina Laurts

30 " 1:, problem: what if view server fails? go to recitation tomorrow and find out spring 2018 Katrina Laurts

31 Replicated state machines (RSMs) provide single-copy consistency: operations complete as if there is a single copy of the data, though internally there are replicas. RSMs use a primary-backup mechanism for replication. The view server ensures that only one replica acts as the primary. It can also recruit new backups after servers fail. To extend this model to handle view-server failures, we need a mechanism to provide distributed consensus; see tomorrow s recitation (on Raft) spring 2018 Katrina Laurts

32 MIT OpenourseWare ompute stem i ee i Spring 201 For information about citing these materials or our Terms of Use, visit: 32

6.033 Spring Lecture #18. Distributed transactions Multi-site atomicity Two-phase commit spring 2018 Katrina LaCurts

6.033 Spring Lecture #18. Distributed transactions Multi-site atomicity Two-phase commit spring 2018 Katrina LaCurts 6.033 Spring 2018 Lecture #18 Distributed transactions Multi-site atomicity Two-phase 1 goal: build reliable systems from unreliable components the abstraction that makes that easier is transactions, which