Distributed Systems. Day 9: Replication [Part 1]

Ariel Bates
5 years ago
1 Distributed Systems Day 9: Replication [Part 1]

2 Hash table holding all Facebook data. [Diagram: a hash table of pairs (k0, v0), (k1, v1), (k2, v2), (k3, v3) is partitioned into shards, and shards are mapped to servers with consistent hashing. Node A, Node B, and Node C each hold Shard 1; Node D, Node E, and Node F each hold Shard 2.] Multiple copies are maintained for fault tolerance and to reduce latency. Clients send requests to all replicas. Servers use vector clocks to control ordering; the clocks allow the "same" ordering across servers. Open questions: Does your client know about all of Facebook's servers? Are there security issues? Performance issues? How do clients send all requests to the replicas: multicast, or a special protocol?
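
The diagram's mapping of shards to servers with consistent hashing can be sketched as follows. This is a minimal illustration and not the deck's implementation: the names HashRing, point, and vnodes, the labels "shard-1" to "shard-3", and the choice of SHA-256 are all assumptions. Each label stands for one replica group (for example Nodes A, B, and C for Shard 1).

```python
import bisect
import hashlib


def point(label: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; each group owns several ring points."""

    def __init__(self, groups, vnodes=64):
        self._ring = sorted(
            (point(f"{group}#{i}"), group) for group in groups for i in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def lookup(self, key: str) -> str:
        # The owner is the first ring point clockwise from the key's position.
        i = bisect.bisect_right(self._points, point(key)) % len(self._ring)
        return self._ring[i][1]


keys = [f"k{i}" for i in range(1000)]
before = HashRing(["shard-1", "shard-2"])
after = HashRing(["shard-1", "shard-2", "shard-3"])

# Keys whose owner changed when a third replica group joined the ring.
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
```

The property worth noticing is that every key in moved now belongs to the new group; no key is shuffled between the two existing groups, which is the reason for preferring a ring over hash(key) % number_of_groups.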

3 Replica Transparency. Clients interact with a front-end server instead of contacting the replicas directly. The front end controls replication to the servers and hides replica complexity from the client. It is owned by Facebook, so it knows all of Facebook's servers. It is stateless, so a crash does not lead to data loss. [Diagram as on slide 2: the hash table is partitioned into shards that are mapped to servers with consistent hashing; Nodes A, B, and C hold Shard 1 and Nodes D, E, and F hold Shard 2; requests go to all replicas, and servers use vector clocks to control ordering.]

4 Abstract Representation. [Diagram: the sharded, replicated hash table from slide 2 is abstracted to a group of replicas: Server A, Server B, and Server C.]

5 Replication (Server Replication). Maintain several copies of data for two reasons. Recovery from failure. Lower latency: put data closer to the client. Issues with replication. Consistency: should all copies of the data be identical? Availability: does the client have access to at least one copy? Performance: are reads and writes to the data quick?

6 Do We Always Need Total Ordering? For some applications: yes! Example: my bank account (Add $100, Add $100, Withdraw $100). For others, "causal ordering" is enough: dependencies between events are preserved. If A -> B, then on all servers A is processed before B. Examples: New Post 1, Comment on Post, Like Post; New Post 2, Comment on Post, Like Post; New Post 3, Comment on Post, Like Post; Make file1, Write to file1; Make file2, Delete file2. How do I capture and enforce causal ordering?
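
One common way to capture and enforce causal ordering is to stamp each event with a vector clock and hold an event back until everything it depends on has been delivered. The sketch below assumes three senders numbered 0 to 2 and a hypothetical CausalReplica class; it illustrates the delivery rule and is not taken from the slides.

```python
class CausalReplica:
    """Delivers events in causal order using vector-clock stamps (sketch)."""

    def __init__(self, n_senders):
        self.clock = [0] * n_senders   # events delivered so far, per sender
        self.pending = []              # events received but not yet deliverable
        self.delivered = []

    def _ready(self, sender, stamp):
        for k, count in enumerate(stamp):
            if k == sender:
                # Must be the very next event from this sender.
                if count != self.clock[k] + 1:
                    return False
            elif count > self.clock[k]:
                # Depends on an event from sender k that we have not delivered.
                return False
        return True

    def receive(self, sender, stamp, event):
        self.pending.append((sender, stamp, event))
        progressed = True
        while progressed:              # keep delivering until nothing is ready
            progressed = False
            for msg in list(self.pending):
                s, v, e = msg
                if self._ready(s, v):
                    self.pending.remove(msg)
                    self.clock[s] = v[s]
                    self.delivered.append(e)
                    progressed = True


replica = CausalReplica(n_senders=3)
# The events arrive in the reverse of their causal order.
replica.receive(2, [1, 1, 1], "Like post")        # depends on post and comment
replica.receive(1, [1, 1, 0], "Comment on post")  # depends on the post
replica.receive(0, [1, 0, 0], "New post")
```

After the third call, replica.delivered lists the post first, then the comment, then the like, even though they arrived in the opposite order.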

7 Data Consistency Challenges With Replication. Ideal consistency: all replicas behave as one server. Linearizability -> total order according to real time. Recall that linearizability is almost impossible and that total order is costly to achieve. Practical consistency: all replicas have identical ordering on a subset of events. Causal ordering? FIFO ordering?

9 Approaches to Replication: Active Replication, Passive Replication, Lazy Replication. Active replication: FIFO ordering; tolerates Byzantine failures. Passive replication (one leader, the other servers are followers): total ordering; protocols: ZooKeeper, Paxos, Chubby. Lazy replication: causal ordering; protocols: Gossip, DynamoDB, CassandraDB, VoldemortDB, MongoDB.

10 System Assumptions

11 Assumptions! Each program is a state machine. It is deterministic: given an initial state and a sequence of events, it terminates at the same state. This is the Replicated State Machine (RSM) model. Implication of RSM: each server can independently process events AND reach the same conclusion ONLY if events are totally ordered.

12 Total Ordering. [Timeline figure.] If events are totally ordered, what is the final state? If they are FIFO ordered, what are the potential final states?

13 FIFO Ordering 1. [Timeline figure.] If events are totally ordered, what is the final state? If they are FIFO ordered, what are the potential final states?

14 FIFO Ordering 2. [Timeline figure, ending in: CRASH!!!] If events are totally ordered, what is the final state? If they are FIFO ordered, what are the potential final states?
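
The timeline figures did not survive extraction, so the question can be explored with a made-up example instead. The sketch below assumes two hypothetical clients writing to the same key k0 and enumerates every interleaving that respects each client's own (FIFO) order; the event values are invented for illustration.

```python
def fifo_interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in fifo_interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in fifo_interleavings(a, b[1:]):
        yield [b[0]] + rest


def final_state(events):
    """Apply (key, value) writes in order to an empty hash table."""
    table = {}
    for key, value in events:
        table[key] = value
    return table


client_a = [("k0", 1), ("k0", 2)]   # client A writes 1, then 2
client_b = [("k0", 3)]              # client B writes 3

orders = list(fifo_interleavings(client_a, client_b))
outcomes = {final_state(order)["k0"] for order in orders}
```

Here outcomes contains two different values for k0. Under FIFO ordering alone, two replicas may pick different interleavings and diverge; under total ordering every replica applies the same single interleaving, so there is exactly one final state.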

15 Assumptions! Each program is a state machine. It is deterministic: given an initial state and a sequence of events, it terminates at the same state. This is the Replicated State Machine (RSM) model. Implication of RSM: each server can independently process events AND reach the same conclusion ONLY if events are totally ordered. Given identical initial states, applying updates in the same sequence results in the same final state.
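
The determinism assumption can be made concrete with a tiny state machine. The sketch below reuses the bank-account example from slide 6; the rule that a withdrawal is rejected when funds are insufficient is an assumption added here so that event order visibly matters.

```python
from functools import reduce


def apply(balance, event):
    """Deterministic transition function: same state + same event -> same result."""
    op, amount = event
    if op == "add":
        return balance + amount
    if op == "withdraw":
        # Assumed rule: a withdrawal that would overdraw the account is ignored.
        return balance - amount if balance >= amount else balance
    raise ValueError(f"unknown operation: {op}")


def replay(initial, log):
    """Run a replica from its initial state through a sequence of events."""
    return reduce(apply, log, initial)


log = [("add", 100), ("withdraw", 100)]
replica_1 = replay(0, log)
replica_2 = replay(0, log)                    # same log, same order
reordered = replay(0, list(reversed(log)))    # same events, different order
```

replica_1 and replica_2 agree because they saw the same sequence. The reordered run ends with a different balance, which is why the slides insist on a total order.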

16 Server Recovery with RSM. On node failure: start a new replica, then feed it the sequence of events. [Diagram: the sequence of events is fed to the new replica server.] Interesting questions: How to detect failures? How to determine the sequence of events?

17 Server Recovery with RSM. To recover from N failures, you need N+1 replicas, so that there is always at least one live node. On node failure: start a new replica, then feed it the sequence of events. Interesting questions: How to detect failures? How to determine the sequence of events? A manager process uses heartbeats to track node liveness. The manager can get the log of events from a working node to bootstrap a new replica.
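
A rough sketch of the manager's two jobs, failure detection by heartbeat and bootstrapping from a live node's log, might look like this. The classes Manager and Replica, the timeout value, and the explicit now argument (used instead of a real clock so the example is repeatable) are all assumptions.

```python
class Manager:
    """Tracks the last heartbeat time of each node (hypothetical sketch)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed(self, now):
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

    def live(self, now):
        return sorted(n for n, t in self.last_seen.items() if now - t <= self.timeout)


class Replica:
    def __init__(self):
        self.log = []     # every event applied so far, in order
        self.table = {}

    def apply(self, event):
        key, value = event
        self.table[key] = value
        self.log.append(event)


def bootstrap(source):
    """Start a new replica and feed it the source's sequence of events."""
    fresh = Replica()
    for event in source.log:
        fresh.apply(event)
    return fresh


manager = Manager(timeout=3)
replicas = {"A": Replica(), "B": Replica()}
for event in [("k0", "v0"), ("k1", "v1"), ("k0", "v2")]:
    for r in replicas.values():
        r.apply(event)

manager.heartbeat("A", now=0)
manager.heartbeat("B", now=0)
manager.heartbeat("B", now=5)        # A has gone silent since time 0

dead = manager.failed(now=6)
source = replicas[manager.live(now=6)[0]]
replacement = bootstrap(source)
```

A timeout can only suspect a failure: a slow node and a crashed node look the same to the manager, which is one reason the slide lists failure detection as an open question.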

18 Active Replication

19 Ordered Reliable Multicast (Local Sequencer). Requires N*N messages. Step 1: the sender gets an ID from its local sequencer. Step 2: the sender sends (event, ID) to all servers. Step 3: on receipt, each server sends the same message (event, ID) to all servers, then processes it with the RSM. Step 4: the sender waits for responses; once it has received a response from all servers, it knows the message was delivered.
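
The message count can be checked with a small simulation of steps 2 and 3. The sketch assumes that each server relays a message once, on first receipt, to the other N-1 servers and drops duplicates; under that assumption the total is N + N*(N-1) = N*N. The function name and the in-memory queue standing in for the network are hypothetical.

```python
from collections import deque


def reliable_multicast(n_servers, event, msg_id):
    """Simulate one multicast; return per-server deliveries and messages sent."""
    delivered = [[] for _ in range(n_servers)]
    seen = [set() for _ in range(n_servers)]

    # Step 2: the sender sends (event, ID) to every server.
    network = deque((dst, (event, msg_id)) for dst in range(n_servers))
    sent = n_servers

    while network:
        dst, msg = network.popleft()
        if msg[1] in seen[dst]:
            continue                      # duplicate: already relayed and processed
        seen[dst].add(msg[1])
        # Step 3: relay the same message to the other servers, then process it.
        for peer in range(n_servers):
            if peer != dst:
                network.append((peer, msg))
                sent += 1
        delivered[dst].append(msg[0])

    return delivered, sent


delivered, sent = reliable_multicast(3, "put k0 v0", msg_id=1)
```

The relaying is what makes the multicast reliable: if the sender crashes after reaching only one server, that server still forwards the message to everyone else.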

20 Implementing FIFO Ordering. Local sequencer: each process locally maintains an ID, and the ID is monotonic. Client pseudocode: increment the ID, assign the ID to the message, send the message to all servers. For client A this is ID_A++; sendmsg(m, ID_A), and for client B it is ID_B++; sendmsg(m, ID_B). Clients maintain different IDs, each local and monotonic. Server pseudocode: maintain a queue for each client, order messages by ID, and process them in order. This is FIFO order.
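
The client and server pseudocode above can be turned into a short runnable sketch. The class names, the (client, ID, message) packet layout, and the convention that IDs start at 1 are assumptions made for the example.

```python
class Client:
    """Local sequencer: a per-client, monotonically increasing ID."""

    def __init__(self, name):
        self.name = name
        self.next_id = 0

    def stamp(self, message):
        self.next_id += 1                       # ID++
        return (self.name, self.next_id, message)


class Server:
    def __init__(self):
        self.expected = {}     # client -> next ID to process
        self.held = {}         # client -> {ID: message} waiting for its turn
        self.processed = []

    def receive(self, packet):
        client, msg_id, message = packet
        queue = self.held.setdefault(client, {})
        queue[msg_id] = message
        nxt = self.expected.get(client, 1)
        # Process as many consecutive IDs as are available for this client.
        while nxt in queue:
            self.processed.append((client, queue.pop(nxt)))
            nxt += 1
        self.expected[client] = nxt


a = Client("A")
p1, p2, p3 = a.stamp("make file1"), a.stamp("write file1"), a.stamp("delete file1")

server = Server()
for packet in (p2, p3, p1):     # the network reorders the packets
    server.receive(packet)
```

The server holds the second and third messages until the first arrives, then processes all three in the order client A sent them. Nothing here orders A's messages relative to another client's, which is exactly the gap between FIFO and total ordering.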

21 Active Replication. Steps. Request: client -> all replicas. Ordering: ordered reliable multicast (local sequencer). Execute: each server executes; this happens after the totally ordered multicast is completed. No need for agreement! Response: replicas -> client (after the replicas respond).

22 Passive Replication

23 Terminology and Assumptions. Servers are divided into two groups. Leader: only one at a time. Followers: all non-leader servers. Clients communicate only with the leader. The leader provides the role of global sequencer: the leader orders events. The leader is a single point of failure. Recovery will be discussed with Raft. [Diagram: three servers, one leader and two followers.]

24 Passive Replication. Steps. Request: client -> leader. Ordering: the leader provides total ordering; it only processes messages one at a time. Execute: only the leader executes. Agreement: the leader informs the followers of the response; all replicas agree on the value. Response: leader -> client.
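
The five steps can be sketched with a leader that executes each request itself and ships only the result to its followers. The Leader and Follower classes, the increment operation, and the synchronous in-process calls standing in for network messages are all assumptions; leader failure and recovery are left out.

```python
class Follower:
    def __init__(self):
        self.table = {}
        self.last_seq = 0

    def install(self, seq, key, value):
        """Agreement: accept the leader's result; the follower does not execute."""
        if seq != self.last_seq + 1:
            raise ValueError("update arrived out of order")
        self.table[key] = value
        self.last_seq = seq
        return True                                # acknowledgement


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.table = {}
        self.seq = 0

    def increment(self, key, amount):
        # Ordering: one request at a time, each given the next sequence number.
        self.seq += 1
        # Execute: only the leader computes the new value.
        value = self.table.get(key, 0) + amount
        self.table[key] = value
        # Agreement: inform every follower of the result and wait for all acks.
        acks = [f.install(self.seq, key, value) for f in self.followers]
        if not all(acks):
            raise RuntimeError("agreement failed")
        # Response: returned to the client only after agreement.
        return self.seq, value


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.increment("balance", 100)
leader.increment("balance", 100)
reply = leader.increment("balance", -100)
```

Because followers install values instead of re-running requests, the request logic runs once per request no matter how many replicas exist, at the price of waiting for every follower before replying.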

25 Active Versus Passive Replication

26 Replication. Active replication (local sequencer): reliable multicast means lots of messages. Passive replication (one leader, the other servers are followers): the leader is a single point of failure, but only the leader executes, which saves CPU.

27 Replication. Passive replication steps. Request: client -> leader. Ordering: the leader provides total ordering; it only processes messages one at a time. Execute: only the leader executes. Agreement: the leader informs the followers of the response; all replicas agree on the value. Response: leader -> client; the leader responds AFTER agreement. Costs: single point of failure; must wait for all followers. Active replication steps. Request: client -> all replicas. Ordering: ordered reliable multicast. Execute: each server executes; this happens after the totally ordered multicast is completed. No need for agreement! Response: replicas -> client (after the replicas respond). Costs: requires N*N messages; must wait for all servers; requires N * CPU processing.
