Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems


1 Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems. TaYuan Hsu, Ajay D. Kshemkalyani, Min Shen. University of Illinois at Chicago.

2 Outline.
1 Introduction
2 System Model
3 Algorithms: Full-Track Algorithm (partially replicated memory); Opt-Track Algorithm (partially replicated memory); Opt-Track-CRP (fully replicated memory)
4 Complexity Measures and Performance
5 Approximate Causal Consistency: Approx-Opt-Track; Performance
6 Conclusions and Future Work

3 Introduction. Motivation for Replication. Replication makes copies of data/services on multiple sites. Replication improves reliability (by redundancy: if the primary file server crashes, a standby file server still works), performance (reduced communication delays), and scalability (preventing overload of a single server for size scalability, and avoiding communication latencies at geographic scale). However, concurrent updates become complex; consistency models define which interleavings of operations are valid (admissible).

4 Introduction. Consistency Models. There is a spectrum of consistency models, trading off cost against convenient semantics:
Linearizability (Herlihy and Wing 1990): non-overlapping operations are seen in their order of occurrence, and overlapping operations are seen in a common order.
Sequential consistency (Lamport 1979): all processes see the same interleaving of executions.
Causal consistency (Ahamad et al. 1991): all processes see the same order of causally related writes.
PRAM consistency (Lipton and Sandberg 1988): a pipeline per pair of processes, for updates.
Slow memory (Hutto et al. 1990): a pipeline per variable per pair of processes, for updates.
Eventual consistency (Johnson et al. 1975): updates are guaranteed to eventually propagate to all replicas.

5 Introduction. The Motivation for Causal Consistency. Figure: Causal consistency is good for users in social networks.

6 Introduction. Related Work. Causal consistency in distributed shared memory systems was introduced by Ahamad et al. [1] and has since been studied by Baldoni et al., Mahajan et al., Belaramani et al., and Petersen et al. Since 2011: ChainReaction (S. Almeida et al.), bolt-on causal consistency (P. Bailis et al.), Orbe and GentleRain (J. Du et al.), wide-area replicated storage (K. Lady et al.), and COPS and Eiger (W. Lloyd et al.). All of the above works assume full replication.

7 Introduction. Partial Replication. Figure: Case for Partial Replication.

8 Introduction. Case for Partial Replication. Partial replication is more natural for some applications (see the previous figure). With p replicas at some p of the total of n data centers (DCs), each write operation that would have triggered an update broadcast now becomes a multicast to just p of the n DCs. This yields savings in storage and infrastructure costs, and for write-intensive workloads partial replication gives a direct savings in the number of messages. Allowing flexibility in the number of DCs required in causally consistent replication remains an interesting aspect of future work. The supposedly higher cost of tracking dependency meta-data is relatively small for applications such as Instagram.

9 Introduction. Causal Consistency. Causal consistency: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in different orders on different machines. Two writes are causally related when one write comes after a read that returned the value of the other write. Examples. Figure: Should the ordering W(x)a < W(x)b be enforced?

10 System Model. n application processes ap_1, ap_2, ..., ap_n interact through a shared memory Q composed of q variables x_1, x_2, ..., x_q. Each ap_i can perform either a read or a write on any of the q variables. r_i(x_j)v: a read operation by ap_i on variable x_j that returns value v. w_i(x_j)v: a write operation by ap_i on variable x_j that writes value v. Local history h_i: the series of read and write operations generated by process ap_i. Global history H: the set of local histories h_i from all n processes.
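
To make the model concrete, here is a minimal sketch in Python of how operations and histories could be represented; the type names are illustrative and not taken from the slides.

```python
# Minimal sketch of the system model; names are illustrative, not the paper's.
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Operation:
    proc: int        # index i of the issuing application process ap_i
    kind: str        # "r" for read, "w" for write
    var: str         # variable x_j
    value: object    # value v read or written

# Local history h_i: the sequence of operations issued by ap_i, in program order.
LocalHistory = List[Operation]

# Global history H: the set of local histories, one per application process.
GlobalHistory = Dict[int, LocalHistory]
```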

11 System Model. Causality Order. Program order: a local operation o1 precedes another local operation o2, denoted o1 ->po o2. Read-from order: a read operation o2 = r(x)v retrieves the value v written by the write operation o1 = w(x)v of a distinct process, denoted o1 ->ro o2. Causality order: o1 ->co o2 iff one of the following conditions holds: there exists ap_i s.t. o1 ->po o2 (program order); there exist ap_i, ap_j s.t. o1 ->ro o2 (read-from order); there exists o3 in O_H s.t. o1 ->co o3 and o3 ->co o2 (transitive closure).
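
As a hedged illustration of this definition (not code from the paper), the causality order can be computed as the transitive closure of the union of program order and read-from order; the sketch below assumes operations are identified by opaque ids.

```python
# Sketch: causality order = transitive closure of (program order U read-from order).
# Operations are opaque ids; program_order and read_from are sets of (o1, o2) pairs.
def causality_order(program_order, read_from):
    co = set(program_order) | set(read_from)
    changed = True
    while changed:                                # naive transitive closure
        changed = False
        for (a, b) in list(co):
            for (c, d) in list(co):
                if b == c and (a, d) not in co:
                    co.add((a, d))
                    changed = True
    return co

# Example: w1(x)v ->po w1(y)u and w1(y)u ->ro r2(y)u imply w1(x)v ->co r2(y)u.
po = {("w1(x)v", "w1(y)u")}
ro = {("w1(y)u", "r2(y)u")}
assert ("w1(x)v", "r2(y)u") in causality_order(po, ro)
```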

12 System Model. Underlying Distributed Communication System. The shared memory abstraction and its causal consistency model are implemented on top of a distributed message-passing system with n sites connected by FIFO channels; each site s_i hosts an application process ap_i and holds only a subset of the variables x_h in Q. When ap_i performs a write operation w(x_1)v, it invokes Multicast(m) to deliver the message m containing w(x_1)v to all sites replicating x_1 (send, receive, and apply events in the message-passing layer). When ap_i performs a read operation r(x_2)v, it invokes RemoteFetch(m) to deliver the message m containing r(x_2) to a pre-designated site replicating x_2 and fetch its value (fetch, remote_return, and return events in the message-passing layer).
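
The two primitives can be pictured as the following interface. This is only an illustrative sketch of the message-passing layer the slides assume, not an implementation; the method and parameter names are assumptions.

```python
# Illustrative interface of the underlying message-passing layer (FIFO channels
# between n sites are assumed; the transport itself is not shown).
class MessagingLayer:
    def multicast(self, msg, destinations):
        """Deliver msg, which carries a write w(x)v plus its meta-data, to every
        site in `destinations`, i.e., to all sites replicating x."""
        raise NotImplementedError

    def remote_fetch(self, msg, site):
        """Send msg, which carries a read r(x), to a pre-designated site
        replicating x and return the value delivered by the remote_return event."""
        raise NotImplementedError
```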

13 System Model. Activation Predicate. Baldoni et al. [2] defined the relation ->co on send events: send_i(m_w(x)a) ->co send_j(m_w(y)b) iff one of the following conditions holds: (1) i = j and send_i(m_w(x)a) locally precedes send_j(m_w(y)b); (2) i != j and return_j(x, a) locally precedes send_j(m_w(y)b); (3) there exists send_k(m_w(z)c) s.t. send_i(m_w(x)a) ->co send_k(m_w(z)c) ->co send_j(m_w(y)b). The relation ->co is contained in -> (Lamport's happened-before relation). A write can be applied when its activation predicate becomes true: there is no earlier message (under ->co) that has not yet been locally applied.
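
The predicate can be pictured with a small sketch: a buffered write message is applied only once every causally earlier write destined to this site has been applied locally. The `deps` list and `applied` counters below are assumptions for illustration, not the paper's data structures.

```python
# Sketch of an activation check at site s_i. m_deps is assumed to list the
# causally earlier writes destined to s_i as (writer, seq_no) pairs, and
# applied[j] counts how many writes of ap_j have been applied at s_i.
def can_apply(m_deps, applied):
    return all(applied[j] >= seq for (j, seq) in m_deps)

# Example: a write depending on the 2nd write of ap_0 stays buffered until then.
assert can_apply([(0, 2)], applied={0: 1}) is False
assert can_apply([(0, 2)], applied={0: 2}) is True
```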

14 System Model. Implementing the Activation Predicate. Figure: Can the write for M_3 be applied at s_1? Only after the earlier write for M is applied. The dependency "s_1 is a destination of M" is needed as meta-data on M_3. Meta-data grows as the computation progresses; the objective is to reduce the size of the meta-data.

15 Algorithms. Overview of Algorithms. Two algorithms [3, 4] implement causal consistency in a partially replicated distributed shared memory system: Full-Track and Opt-Track (a message- and space-optimal algorithm). Both implement the ->co relation and adopt the activation predicate. A special case of Opt-Track for full replication, Opt-Track-CRP (optimal), has lower message size, time, and space complexity than the Baldoni et al. algorithm [2].

16 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track. Algorithm 1 is for a non-fully replicated system. Each application process performing a write operation writes only to a subset of all the sites. Each site s_i needs to track the number of write operations performed by every ap_j to every site s_k, denoted Write_i[j][k]. The Write clock piggybacked on messages generated by Multicast(m) must not be merged with the local Write clock at message reception, but only at a later read operation that reads the value carried by the message.

17 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track data structures.
1. Write_i, the Write clock. Write_i[j, k]: the number of updates sent by ap_j to site s_k that causally happened before under the ->co relation.
2. Apply_i, an array of integers. Apply_i[j]: the number of updates written by ap_j that have been applied at site s_i.
3. LastWriteOn_i<variable id, Write>, a hash map of Write clocks. LastWriteOn_i<h>: the Write clock value associated with the last write operation on variable x_h locally replicated at site s_i.
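
The per-site state can be sketched as below; this is a hedged Python illustration of the three structures just listed (field names follow the slides, the class itself is not from the paper).

```python
# Sketch of the Full-Track state kept at site s_i, for n sites.
class FullTrackSite:
    def __init__(self, i, n):
        self.i = i
        # Write[j][k]: number of updates sent by ap_j to site s_k that causally
        # happened before the current local state under ->co.
        self.Write = [[0] * n for _ in range(n)]
        # Apply[j]: number of updates written by ap_j already applied at s_i.
        self.Apply = [0] * n
        # LastWriteOn[h]: the Write clock piggybacked by the last write applied
        # locally on variable x_h; merged into self.Write only when x_h is read.
        self.LastWriteOn = {}
```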

18 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track.

19 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track. The activation predicate is implemented.

20 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. In Algorithm 1, each message corresponding to a write operation piggybacks an O(n^2) matrix. Algorithm 2 uses propagation constraints to further reduce the message size and storage cost. It exploits the transitive dependency of causal deliveries of messages (ideas from the KS algorithm [5]). Each site keeps a record of the most recently received message from each other site, along with the list of its destinations. Opt-Track is optimal in terms of log space and message space overhead, and achieves a further optimality: no redundant destination information is recorded.

21 Algorithms. Opt-Track Algorithm: partially replicated memory. Propagation Constraints: two situations for destination information to be redundant. Figure: Meta-data information = "s_2 is a destination of m"; the causal futures of the relevant apply and return events are shown in dotted lines. The information must not exist in the causal future of the corresponding apply(w) events, as marked in the figure (Propagation Constraint 1 and Propagation Constraint 2).

22 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track data structures.
1. clock_i: a local counter at site s_i for write operations performed by ap_i.
2. Apply_i, an array of integers. Apply_i[j]: the number of updates written by ap_j that have been applied at site s_i.
3. LOG_i = {<j, clock_j, Dests>}, the local log. Each entry indicates a write operation in the causal past.
4. LastWriteOn_i<variable id, LOG>, a hash map of LOGs. LastWriteOn_i<h>: the piggybacked LOG from the most recent update applied at site s_i for the locally replicated variable x_h.
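
A hedged sketch of these structures in Python follows; the entry layout mirrors the slide, but the classes themselves are illustrative rather than the paper's definitions.

```python
# Sketch of the Opt-Track state kept at site s_i.
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class LogEntry:
    writer: int      # j: the site whose write this entry records
    clock: int       # clock_j value of that write
    dests: Set[int]  # destinations for which the information is still needed

@dataclass
class OptTrackSite:
    i: int
    n: int
    clock: int = 0                                     # local write counter
    Apply: List[int] = field(default_factory=list)     # Apply[j], length n
    LOG: List[LogEntry] = field(default_factory=list)  # writes in the causal past
    LastWriteOn: Dict[int, List[LogEntry]] = field(default_factory=dict)

    def __post_init__(self):
        if not self.Apply:
            self.Apply = [0] * self.n
```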

23 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. Figure: Write processing at site s_i.

24 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. Figure: Read and receive processing at site s_i.

25 Algorithms. Opt-Track Algorithm: partially replicated memory. Procedures used in Opt-Track. Figure: PURGE and MERGE functions at site s_i.
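
The slide only shows the figure. As a rough, hedged sketch of the MERGE idea in the spirit of the KS-style optimizations (not the paper's exact pseudocode): when two logs are combined, entries for the same write keep only the destinations still present in both copies, since a destination missing from either copy is already known to be redundant; entries implied by others can then be purged. The sketch reuses the LogEntry type from the previous sketch.

```python
# Hedged sketch of a MERGE over two logs of LogEntry objects, keyed by
# (writer, clock). This is an illustration of the idea, not the algorithm.
def merge_logs(log_a, log_b):
    merged = {}
    for e in list(log_a) + list(log_b):
        key = (e.writer, e.clock)
        if key in merged:
            # A destination absent from either copy is already known redundant,
            # so only destinations present in both copies are kept.
            merged[key] = merged[key] & set(e.dests)
        else:
            merged[key] = set(e.dests)
    return [LogEntry(w, c, d) for (w, c), d in merged.items()]
```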

26 Algorithms. Opt-Track Algorithm: partially replicated memory. The Propagation Constraints: an example. Figure: Meta-data information = "P_6 is a destination of M_{5,1}". Part (a) is a space-time diagram over P_1 through P_6 showing how the information that P_6 is a destination of the multicast at event (5,1) propagates as piggybacked information and in the logs; part (b) tabulates each message with its destinations and the piggybacked M_{5,1}.Dests, illustrating the propagation of explicit information and the inference of implicit information.

27 Algorithms. Opt-Track Algorithm: partially replicated memory. If the destination list becomes empty, then... Figure: Illustration of why it is important to keep a record even if its destination list becomes empty.

28 Algorithms. Opt-Track-CRP: fully replicated memory. Algorithm 3: Opt-Track-CRP. A special case of Algorithm 2 for full replication, with the same optimizations. Every write operation is sent to exactly the same set of sites, so there is no need to keep a destination list with each write. Each write operation is represented as <i, clock_i> at site s_i, bringing the cost of a write operation down from O(n) to O(1). The local log holds d entries, where d is the number of write operations in the local log; the local log always gets reset after each write, and each read adds at most one new entry to the local log. A sketch of this bookkeeping follows.
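
A minimal sketch of this bookkeeping in Python, assuming the multicast transport lives elsewhere; method names are illustrative, not the paper's.

```python
# Sketch of Opt-Track-CRP bookkeeping at site s_i: a write is identified by the
# pair (site id, local clock), each log entry is O(1), the log is reset after
# every write, and a read adds at most one new entry.
class OptTrackCRPSite:
    def __init__(self, i):
        self.i = i
        self.clock = 0
        self.LOG = []                     # list of (j, clock_j) write ids

    def on_write(self):
        self.clock += 1
        w = (self.i, self.clock)
        deps = list(self.LOG)             # dependencies multicast with the write
        self.LOG = [w]                    # reset: the new write subsumes the log
        return w, deps

    def on_read(self, applied_write_id):
        if applied_write_id not in self.LOG:
            self.LOG.append(applied_write_id)   # at most one new entry per read
```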

29 Algorithms. Opt-Track-CRP: fully replicated memory. Further Improved Scalability. In Algorithm 2, keeping entries with an empty destination list is important; in the fully replicated case this cost can also be reduced. Figure: a space-time diagram over sites s_1, s_2, s_3 showing that, in fully replicated systems, the local log is reset after each write (e.g., LOG_3 holds {w} after s_3 reads the value written by w, and is reset to {w'} once s_3 sends its own write w').

30 Algorithms. Opt-Track-CRP: fully replicated memory. Algorithm 3: Opt-Track-CRP. Figure: There is no need to maintain the destination list for each write operation in the local log.

31 Complexity Measures and Performance. Parameters.
n: number of sites in the system.
q: number of variables in the system.
p: replication factor, i.e., the number of sites at which each variable is replicated.
w: number of write operations performed in the system.
r: number of read operations performed in the system.
d: number of write operations stored in the local log (used only in the Opt-Track-CRP algorithm).

32 Complexity Measures and Performance. Complexity. Table: Complexity measures of the causal memory algorithms.
Message count: Full-Track: ((p-1) + (n-p)/n)w + (2(n-p)/n)r; Opt-Track: ((p-1) + (n-p)/n)w + (2(n-p)/n)r; Opt-Track-CRP: (n-1)w; optP [2]: (n-1)w.
Message size: Full-Track: O(n^2 pw + nr(n-p)); Opt-Track: O(n^2 pw + nr(n-p)), amortized O(npw + r(n-p)); Opt-Track-CRP: O(nwd); optP: O(n^2 w).
Time complexity: Full-Track: write O(n^2), read O(n^2); Opt-Track: write O(n^2 p), read O(n^2); Opt-Track-CRP: write O(d), read O(d); optP: write O(n), read O(n).
Space complexity: Full-Track: O(max(n^2, npq)); Opt-Track: O(max(n^2, npq)), amortized O(max(n, pq)); Opt-Track-CRP: O(max(d, q)); optP: O(nq).

33 Complexity Measures and Performance. Message Count as a Function of w_rate. Define the write rate as w_rate = w/(w+r). Partial replication gives a lower message count than full replication if
pw + (2(n-p)/n)r < nw,   (1)
which simplifies to w > 2r/n, i.e., w_rate > 2/(n+2).   (2)
A quick numeric check follows.
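
The sketch below checks this break-even condition numerically, using the same simplified message-count expressions as the inequality above; it is an illustration, not a simulation of the algorithms.

```python
# Break-even write rate for partial vs. full replication, per the condition above.
def breakeven_write_rate(n):
    return 2.0 / (n + 2)              # partial wins once w_rate exceeds this

def message_counts(n, p, w, r):
    partial = p * w + 2 * r * (n - p) / n   # p-way multicasts plus remote reads
    full = n * w                            # broadcast per write (approximation)
    return partial, full

print(breakeven_write_rate(10))             # ~0.167 for n = 10
print(message_counts(10, 3, w=80, r=20))    # (268.0, 800) -> partial wins
print(message_counts(10, 3, w=10, r=90))    # (156.0, 100) -> full wins
```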

34 Complexity Measures and Performance. Message Count: Partial Replication vs. Full Replication. Figure: message count as a function of the write rate w/(w+r) for n = 10, with curves for p = 1, 3, 5, 7, and 10, comparing partial replication against full replication.

35 Complexity Measures and Performance. Message Meta-Data Structures. Figures: the message meta-data structures used in partial replication, and in full replication (SM only).

36 Complexity Measures and Performance. Simulation Methodology. The time interval T_e between two consecutive events ranges from 5 ms to 2000 ms, and the propagation time T_t from 100 ms to 3000 ms. The number of processes n is varied from 5 up to 40. w_rate is set to 0.2, 0.5, and 0.8, respectively. The replica factor rate p/n for partial replication is 0.3. The message meta-data size m_s is the total size of all the meta-data transmitted over all the processes. Each simulation execution runs 600n operation events in total.

37 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.2.

38 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.5.

39 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.8.

40 Complexity Measures and Performance. Ratio of Message Overhead of Opt-Track to Full-Track. Figure: the ratio of total message meta-data space overhead of Opt-Track to that of Full-Track as a function of n, for write rates 0.2, 0.5, and 0.8, in the partial replication protocols.

41 Complexity Measures and Performance. Meta-Data Size in Partial Replication. With increasing n, the ratio decreases rapidly; for 40 processes, the Opt-Track overheads are only a small percentage of the Full-Track overheads across the different write rates. In the Full-Track protocol, the average message space overheads of SM and RM increase quadratically with n, whereas the increasingly lower overheads of SM and RM in Opt-Track are linear in n.

42 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.2.

43 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.5.

44 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.8.

45 Complexity Measures and Performance. Ratio of Message Overhead of Opt-Track-CRP to optP. Figure: the ratio of total message meta-data space overhead of Opt-Track-CRP to that of optP as a function of n, for write rates 0.2, 0.5, and 0.8, in the full replication protocols.

46 Complexity Measures and Performance. Message Count: Partial Replication vs. Full Replication. Total message size overhead = total message count x (meta-data size + replicated data size). Figure: total message count for full replication (Opt-Track-CRP) vs. partial replication (Opt-Track).

47 Complexity Measures and Performance. Message Size: Partial Replication vs. Full Replication. Figure: total message size for full replication (Opt-Track-CRP) and partial replication (Opt-Track): (a) n = 40, p = 12, w_rate = 0.5, in the worst case; (b) n = 40, p = 12, w_rate = 0.5, in the real case.

48 Approximate Causal Consistency. For applications where the data size is small (e.g., wall posts in Facebook), the size of the meta-data can be a problem. Meta-data overheads can be further reduced at the risk of some (rare) violations of causal consistency: as dependencies age, with high probability the messages they represent have been delivered, and the dependencies need not be carried around and stored. The amount of violations can be made arbitrarily small by controlling a parameter called credits.

49 Approximate Causal Consistency. Approx-Opt-Track. Notion of Credits: Case 1. Figure: reducing the meta-data at the cost of some possible violations of causal consistency; the amount of violations can be made arbitrarily small by controlling a tunable parameter (credits).

50 Approximate Causal Consistency. Approx-Opt-Track. Notion of Credits: Case 2. Figure: illustration of meta-data reduction when credits are exhausted.

51 Approximate Causal Consistency. Approx-Opt-Track. The notion of credits is integrated into the Opt-Track algorithm to give Approx-Opt-Track [6], which can fine-tune the amount of causal consistency by trading off the size of the meta-data overhead. Three instantiations of credits are given: hop count, time-to-live, and metric distance. Violation error rate: R_e = n_e / m_c, where n_e is the number of messages applied by the remote replicated sites with a violation of causal consistency, and m_c is the total number of transmitted messages. Meta-data saving rate: R_s = 1 - m_s(cr finite) / m_s(Opt-Track), where m_s is the message meta-data size, i.e., the total size of all the meta-data transmitted over all the processes. A sketch of the hop-count credit mechanism follows.
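
As a hedged sketch of the hop-count instantiation (illustrative code, not the paper's pseudocode): each log entry carries a credit that is decremented on every transmission hop, and the entry is dropped once its credit is exhausted.

```python
# Sketch of hop-count credits: decrement on each transmission, drop exhausted
# entries; dropping early is what can (rarely) let a causal violation slip by.
def age_out(log_entries):
    survivors = []
    for entry in log_entries:
        entry["credit"] -= 1          # one more hop consumed
        if entry["credit"] > 0:
            survivors.append(entry)   # still young enough to carry around
    return survivors

# Example with initial credit cr = 2: the entry survives exactly one re-send.
log = [{"writer": 5, "clock": 1, "dests": {6}, "credit": 2}]
log = age_out(log)    # credit 2 -> 1, kept
log = age_out(log)    # credit 1 -> 0, dropped
print(log)            # []
```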

52 Approximate Causal Consistency. Performance. Simulation Methodology in Approx-Opt-Track. The time interval T_e between two consecutive events ranges from 5 ms to 2000 ms, and the propagation time T_t from 100 ms to 3000 ms. The number of processes n is varied from 5 up to 40. w_rate is set to 0.2, 0.5, and 0.8, respectively. The replica factor rate p/n for partial replication is 0.3, and the number of variables q is 100. The hop count credit (cr) denotes the hop count available before an entry's meta-data ages out and is removed. The initial credit cr is specified from one up to a critical value cr_0, at which there is no message transmission violating causal consistency. Each simulation execution runs 600n operation events in total.

53 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.2.

54 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.5.

55 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.8.

56 Approximate Causal Consistency. Performance. Critical Initial Credits. Figure: summary of the critical values of cr_0 and cr_0.5%.

57 Approximate Causal Consistency. Performance. Impact of the Initial cr on R_e. cr_0 is the major critical initial credit (R_e = 0); cr_0.5% is the minor critical initial credit (R_e at most 0.5%). cr_0.5% does not appear to increase much with n: by setting the initial cr to a small but sufficient finite value, most dependencies age and can be removed without violating causal consistency once the associated meta-data has been transmitted across a few hops, even for a large number of processes. The correlation coefficients of cr_0 and n are high, and the major critical credit values increase with n to avoid causal violations.

58 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.2.

59 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.5.

60 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.8.

61 Approximate Causal Consistency. Performance. Critical Average Message Meta-Data Size m_ave (KB). Figure: summary of the critical average message meta-data sizes.

62 Approximate Causal Consistency. Performance. Critical Average Message Meta-Data Size. With an increasing number of processes, m_ave increases linearly, and m_ave becomes smaller with a higher w_rate for larger numbers of processes. A read operation invokes the MERGE function to merge the piggybacked log list, so a higher read rate increases the likelihood of an enlarged log. Furthermore, although a write operation increases the explicit information, it also triggers the PURGE function, which deletes redundant information.

63 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.2.

64 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.5.

65 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.8.

66 Approximate Causal Consistency. Performance. Critical Message Meta-Data Size Saving Rates R_s. Figure: summary of the critical message meta-data size saving rates.

67 Approximate Causal Consistency. Performance. Critical Message Meta-Data Size Saving Rate. Focusing on the case of 40 processes: R_s is around 40%-60% at a very slight cost of violating causal consistency, and R_s reaches around 5%-20% without violating causality order, across the different write rates. This indicates that if the initial credit allocation is just a small number, then by the time the corresponding meta-data is removed, the associated message has (very likely) already reached its destination.

68 Conclusions and Future Work. Conclusions. Opt-Track has better network capacity utilization and better scalability than Full-Track for causal consistency under partial replication; its meta-data overhead is linear in n, and these improvements grow with more write-intensive workloads. For 40 processes, the overheads of Opt-Track are only a small percentage of those of Full-Track across the different write rates. Opt-Track-CRP performs better than optP (Baldoni et al. 2006) under full replication; for 40 processes, the overheads of Opt-Track-CRP are only a fraction of those of optP across the different write rates. We showed the advantages of partial replication and provided the conditions under which partial replication incurs less overhead than full replication. A modification of Opt-Track, called Approx-Opt-Track, provides approximate causal consistency by reducing the size of the meta-data: by controlling a parameter cr, the level of potential inaccuracy can be traded off against the size of the meta-data.

69 Conclusions and Future Work. Future Work. Reduce the size of the meta-data for maintaining causal consistency in partially replicated systems. Develop a dynamic, data-driven replication mechanism (optimize the replication mechanism: which file? how many replicas? where? when? i.e., the replica factor rate p/n becomes a variable). Design a hierarchical causal consistency protocol for partial replication: a client-cluster model (two-level architecture) for causal consistency under partial replication.

70 Conclusions and Future Work. References I.
[1] M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: definitions, implementation and programming. Distributed Computing, 9(1):37-49, 1995.
[2] R. Baldoni, A. Milani, and S. Tucci-Piergiovanni. Optimal propagation based protocols implementing causal memories. Distributed Computing, 18(6), 2006.
[3] Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. OPCAM: optimal algorithms implementing causal memories in shared memory systems. In Proceedings of the 2015 International Conference on Distributed Computing and Networking (ICDCN 2015), Goa, India, January 4-7, 2015, pages 16:1-16:4.

71 Conclusions and Future Work. References II.
[4] Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. Causal consistency for geo-replicated cloud storage under partial replication. In IPDPS Workshops, IEEE, 2015.
[5] A. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distributed Computing, 11(2):91-111, April 1998.
[6] Ta Yuan Hsu and Ajay D. Kshemkalyani. Performance of approximate causal consistency for partially replicated systems. In Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC), ACM, 2015.

72 Conclusions and Future Work. Questions. Thank You!


More information

Consistency and Replication. Why replicate?

Consistency and Replication. Why replicate? Consistency and Replication Today: Consistency models Data-centric consistency models Client-centric consistency models Lecture 15, page 1 Why replicate? Data replication versus compute replication Data

More information

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Trade- Offs in Cloud Storage Architecture. Stefan Tai Trade- Offs in Cloud Storage Architecture Stefan Tai Cloud computing is about providing and consuming resources as services There are five essential characteristics of cloud services [NIST] [NIST]: http://csrc.nist.gov/groups/sns/cloud-

More information

Unit 2 Packet Switching Networks - II

Unit 2 Packet Switching Networks - II Unit 2 Packet Switching Networks - II Dijkstra Algorithm: Finding shortest path Algorithm for finding shortest paths N: set of nodes for which shortest path already found Initialization: (Start with source

More information

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016 Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network

More information

DISTRIBUTED SHARED MEMORY

DISTRIBUTED SHARED MEMORY DISTRIBUTED SHARED MEMORY COMP 512 Spring 2018 Slide material adapted from Distributed Systems (Couloris, et. al), and Distr Op Systems and Algs (Chow and Johnson) 1 Outline What is DSM DSM Design and

More information

Lecture 6 Consistency and Replication

Lecture 6 Consistency and Replication Lecture 6 Consistency and Replication Prof. Wilson Rivera University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Outline Data-centric consistency Client-centric consistency

More information

Consistency and Replication. Why replicate?

Consistency and Replication. Why replicate? Consistency and Replication Today: Introduction Consistency models Data-centric consistency models Client-centric consistency models Thoughts for the mid-term Lecture 14, page 1 Why replicate? Data replication:

More information

Self-healing Data Step by Step

Self-healing Data Step by Step Self-healing Data Step by Step Uwe Friedrichsen (codecentric AG) NoSQL matters Cologne, 29. April 2014 @ufried Uwe Friedrichsen uwe.friedrichsen@codecentric.de http://slideshare.net/ufried http://ufried.tumblr.com

More information

A Verification Approach for GALS Integration of Synchronous Components

A Verification Approach for GALS Integration of Synchronous Components GALS 2005 Preliminary Version A Verification Approach for GALS Integration of Synchronous Components F. Doucet, M. Menarini, I. H. Krüger and R. Gupta 1 Computer Science and Engineering University of California,

More information

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei Non-blocking Array-based Algorithms for Stacks and Queues Niloufar Shafiei Outline Introduction Concurrent stacks and queues Contributions New algorithms New algorithms using bounded counter values Correctness

More information

Chapter 7 CONCLUSION

Chapter 7 CONCLUSION 97 Chapter 7 CONCLUSION 7.1. Introduction A Mobile Ad-hoc Network (MANET) could be considered as network of mobile nodes which communicate with each other without any fixed infrastructure. The nodes in

More information

CS6450: Distributed Systems Lecture 11. Ryan Stutsman

CS6450: Distributed Systems Lecture 11. Ryan Stutsman Strong Consistency CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed

More information

Draft Standard for Verilog. Randomization and Constraints. Extensions

Draft Standard for Verilog. Randomization and Constraints. Extensions Draft Standard for Verilog Randomization and Constraints Extensions Copyright 2003 by Cadence Design Systems, Inc. This document is an unapproved draft of a proposed IEEE Standard. As such, this document

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

This talk is about. Data consistency in 3D. Geo-replicated database q: Queue c: Counter { q c } Shared database q: Queue c: Counter { q c }

This talk is about. Data consistency in 3D. Geo-replicated database q: Queue c: Counter { q c } Shared database q: Queue c: Counter { q c } Data consistency in 3D (It s the invariants, stupid) Marc Shapiro Masoud Saieda Ardekani Gustavo Petri This talk is about Understanding consistency Primitive consistency mechanisms How primitives compose

More information

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Fault-Tolerant Computer Systems ECE 60872/CS Recovery Fault-Tolerant Computer Systems ECE 60872/CS 59000 Recovery Saurabh Bagchi School of Electrical & Computer Engineering Purdue University Slides based on ECE442 at the University of Illinois taught by Profs.

More information

Fork Sequential Consistency is Blocking

Fork Sequential Consistency is Blocking Fork Sequential Consistency is Blocking Christian Cachin Idit Keidar Alexander Shraer May 14, 2008 Abstract We consider an untrusted server storing shared data on behalf of clients. We show that no storage

More information

Eventual Consistency Today: Limitations, Extensions and Beyond

Eventual Consistency Today: Limitations, Extensions and Beyond Eventual Consistency Today: Limitations, Extensions and Beyond Peter Bailis and Ali Ghodsi, UC Berkeley Presenter: Yifei Teng Part of slides are cited from Nomchin Banga Road Map Eventual Consistency:

More information

Application of SDN: Load Balancing & Traffic Engineering

Application of SDN: Load Balancing & Traffic Engineering Application of SDN: Load Balancing & Traffic Engineering Outline 1 OpenFlow-Based Server Load Balancing Gone Wild Introduction OpenFlow Solution Partitioning the Client Traffic Transitioning With Connection

More information

CSE 5306 Distributed Systems. Consistency and Replication

CSE 5306 Distributed Systems. Consistency and Replication CSE 5306 Distributed Systems Consistency and Replication 1 Reasons for Replication Data are replicated for the reliability of the system Servers are replicated for performance Scaling in numbers Scaling

More information

Distributed Synchronization: outline

Distributed Synchronization: outline Distributed Synchronization: outline Introduction Causality and time o Lamport timestamps o Vector timestamps o Causal communication Snapshots This presentation is based on the book: Distributed operating-systems

More information

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie

More information

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma Overlay and P2P Networks Unstructured networks Prof. Sasu Tarkoma 20.1.2014 Contents P2P index revisited Unstructured networks Gnutella Bloom filters BitTorrent Freenet Summary of unstructured networks

More information

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems 2. Time and Global States Page 1 University of Freiburg, Germany Department of Computer Science Distributed Systems Chapter 3 Time and Global States Christian Schindelhauer 12. May 2014 2. Time and Global

More information

INF-5360 Presentation

INF-5360 Presentation INF-5360 Presentation Optimistic Replication Ali Ahmad April 29, 2013 Structure of presentation Pessimistic and optimistic replication Elements of Optimistic replication Eventual consistency Scheduling

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:

More information

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen Overlay and P2P Networks Unstructured networks PhD. Samu Varjonen 25.1.2016 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers Clock Synchronization Synchronization Tanenbaum Chapter 6 plus additional papers Fig 6-1. In a distributed system, each machine has its own clock. When this is the case, an event that occurred after another

More information

What Came First? The Ordering of Events in

What Came First? The Ordering of Events in What Came First? The Ordering of Events in Systems @kavya719 kavya the design of concurrent systems Slack architecture on AWS systems with multiple independent actors. threads in a multithreaded program.

More information

Device-to-Device Networking Meets Cellular via Network Coding

Device-to-Device Networking Meets Cellular via Network Coding Device-to-Device Networking Meets Cellular via Network Coding Yasaman Keshtkarjahromi, Student Member, IEEE, Hulya Seferoglu, Member, IEEE, Rashid Ansari, Fellow, IEEE, and Ashfaq Khokhar, Fellow, IEEE

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Natawut Nupairoj and Lionel M. Ni Department of Computer Science Michigan State University East Lansing,

More information

Providing Real-Time and Fault Tolerance for CORBA Applications

Providing Real-Time and Fault Tolerance for CORBA Applications Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing

More information