Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems


1 Causal Consistency Algorithms for Partially Replicated and Fully Replicated Systems. TaYuan Hsu, Ajay D. Kshemkalyani, Min Shen. University of Illinois at Chicago.

2 Outline.
1 Introduction
2 System Model
3 Algorithms: Full-Track Algorithm (partially replicated memory); Opt-Track Algorithm (partially replicated memory); Opt-Track-CRP (fully replicated memory)
4 Complexity Measures and Performance
5 Approximate Causal Consistency: Approx-Opt-Track; Performance
6 Conclusions and Future Work

3 Introduction. Motivation for Replication. Replication makes copies of data/services on multiple sites. Replication improves reliability (by redundancy: if the primary file server crashes, a standby file server still works), performance (reduced communication delays), and scalability (preventing overload of a single server for size scalability, and avoiding communication latencies at geographic scale). However, concurrent updates become complex; consistency models define which interleavings of operations are valid (admissible).

4 Introduction. Consistency Models. There is a spectrum of consistency models, trading off cost against convenient semantics:
Linearizability (Herlihy and Wing 1990): non-overlapping operations are seen in their order of occurrence, and overlapping operations are seen in a common order.
Sequential consistency (Lamport 1979): all processes see the same interleaving of executions.
Causal consistency (Ahamad et al. 1991): all processes see the same order of causally related writes.
PRAM consistency (Lipton and Sandberg 1988): a pipeline per pair of processes, for updates.
Slow memory (Hutto et al. 1990): a pipeline per variable per pair of processes, for updates.
Eventual consistency (Johnson et al. 1975): updates are guaranteed to eventually propagate to all replicas.

5 Introduction. The Motivation for Causal Consistency. Figure: Causal consistency is good for users in social networks.

6 Introduction. Related Work. Causal consistency in distributed shared memory systems was introduced by Ahamad et al. [1] and has since been studied by Baldoni et al., Mahajan et al., Belaramani et al., and Petersen et al. Since 2011: ChainReaction (S. Almeida et al.), bolt-on causal consistency (P. Bailis et al.), Orbe and GentleRain (J. Du et al.), wide-area replicated storage (K. Lady et al.), and COPS and Eiger (W. Lloyd et al.). All of the above works assume full replication.

7 Introduction. Partial Replication. Figure: Case for Partial Replication.

8 Introduction. Case for Partial Replication. Partial replication is more natural for some applications (see the previous figure). With p replicas at some p of the total of n data centers (DCs), each write operation that would have triggered an update broadcast now becomes a multicast to just p of the n DCs. This yields savings in storage and infrastructure costs, and for write-intensive workloads partial replication gives a direct savings in the number of messages. Allowing flexibility in the number of DCs required in causally consistent replication remains an interesting aspect of future work. The supposedly higher cost of tracking dependency meta-data is relatively small for applications such as Instagram.

9 Introduction. Causal Consistency. Causal consistency: writes that are potentially causally related must be seen by all processes in the same order; concurrent writes may be seen in different orders on different machines. Two writes are causally related when one write comes after a read that returned the value of the other write. Examples. Figure: Should the ordering W(x)a < W(x)b be enforced?

10 System Model. n application processes ap_1, ap_2, ..., ap_n interact through a shared memory Q composed of q variables x_1, x_2, ..., x_q. Each ap_i can perform either a read or a write on any of the q variables. r_i(x_j)v: a read operation by ap_i on variable x_j that returns value v. w_i(x_j)v: a write operation by ap_i on variable x_j that writes value v. Local history h_i: the series of read and write operations generated by process ap_i. Global history H: the set of local histories h_i from all n processes.
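
To make the model concrete, here is a minimal sketch in Python of how operations and histories could be represented; the type names are illustrative and not taken from the slides.

```python
# Minimal sketch of the system model; names are illustrative, not the paper's.
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Operation:
    proc: int        # index i of the issuing application process ap_i
    kind: str        # "r" for read, "w" for write
    var: str         # variable x_j
    value: object    # value v read or written

# Local history h_i: the sequence of operations issued by ap_i, in program order.
LocalHistory = List[Operation]

# Global history H: the set of local histories, one per application process.
GlobalHistory = Dict[int, LocalHistory]
```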

11 System Model. Causality Order. Program order: a local operation o1 precedes another local operation o2, denoted o1 ->po o2. Read-from order: a read operation o2 = r(x)v retrieves the value v written by the write operation o1 = w(x)v of a distinct process, denoted o1 ->ro o2. Causality order: o1 ->co o2 iff one of the following conditions holds: there exists ap_i s.t. o1 ->po o2 (program order); there exist ap_i, ap_j s.t. o1 ->ro o2 (read-from order); there exists o3 in O_H s.t. o1 ->co o3 and o3 ->co o2 (transitive closure).
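
As a hedged illustration of this definition (not code from the paper), the causality order can be computed as the transitive closure of the union of program order and read-from order; the sketch below assumes operations are identified by opaque ids.

```python
# Sketch: causality order = transitive closure of (program order U read-from order).
# Operations are opaque ids; program_order and read_from are sets of (o1, o2) pairs.
def causality_order(program_order, read_from):
    co = set(program_order) | set(read_from)
    changed = True
    while changed:                                # naive transitive closure
        changed = False
        for (a, b) in list(co):
            for (c, d) in list(co):
                if b == c and (a, d) not in co:
                    co.add((a, d))
                    changed = True
    return co

# Example: w1(x)v ->po w1(y)u and w1(y)u ->ro r2(y)u imply w1(x)v ->co r2(y)u.
po = {("w1(x)v", "w1(y)u")}
ro = {("w1(y)u", "r2(y)u")}
assert ("w1(x)v", "r2(y)u") in causality_order(po, ro)
```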

12 System Model. Underlying Distributed Communication System. The shared memory abstraction and its causal consistency model are implemented on top of a distributed message-passing system with n sites connected by FIFO channels; each site s_i hosts an application process ap_i and holds only a subset of the variables x_h in Q. When ap_i performs a write operation w(x_1)v, it invokes Multicast(m) to deliver the message m containing w(x_1)v to all sites replicating x_1 (send, receive, and apply events in the message-passing layer). When ap_i performs a read operation r(x_2)v, it invokes RemoteFetch(m) to deliver the message m containing r(x_2) to a pre-designated site replicating x_2 and fetch its value (fetch, remote_return, and return events in the message-passing layer).
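
The two primitives can be pictured as the following interface. This is only an illustrative sketch of the message-passing layer the slides assume, not an implementation; the method and parameter names are assumptions.

```python
# Illustrative interface of the underlying message-passing layer (FIFO channels
# between n sites are assumed; the transport itself is not shown).
class MessagingLayer:
    def multicast(self, msg, destinations):
        """Deliver msg, which carries a write w(x)v plus its meta-data, to every
        site in `destinations`, i.e., to all sites replicating x."""
        raise NotImplementedError

    def remote_fetch(self, msg, site):
        """Send msg, which carries a read r(x), to a pre-designated site
        replicating x and return the value delivered by the remote_return event."""
        raise NotImplementedError
```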

13 System Model. Activation Predicate. Baldoni et al. [2] defined the relation ->co on send events: send_i(m_w(x)a) ->co send_j(m_w(y)b) iff one of the following conditions holds: (1) i = j and send_i(m_w(x)a) locally precedes send_j(m_w(y)b); (2) i != j and return_j(x, a) locally precedes send_j(m_w(y)b); (3) there exists send_k(m_w(z)c) s.t. send_i(m_w(x)a) ->co send_k(m_w(z)c) ->co send_j(m_w(y)b). The relation ->co is contained in -> (Lamport's happened-before relation). A write can be applied when its activation predicate becomes true: there is no earlier message (under ->co) that has not yet been locally applied.
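
The predicate can be pictured with a small sketch: a buffered write message is applied only once every causally earlier write destined to this site has been applied locally. The `deps` list and `applied` counters below are assumptions for illustration, not the paper's data structures.

```python
# Sketch of an activation check at site s_i. m_deps is assumed to list the
# causally earlier writes destined to s_i as (writer, seq_no) pairs, and
# applied[j] counts how many writes of ap_j have been applied at s_i.
def can_apply(m_deps, applied):
    return all(applied[j] >= seq for (j, seq) in m_deps)

# Example: a write depending on the 2nd write of ap_0 stays buffered until then.
assert can_apply([(0, 2)], applied={0: 1}) is False
assert can_apply([(0, 2)], applied={0: 2}) is True
```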

14 System Model. Implementing the Activation Predicate. Figure: Can the write for M_3 be applied at s_1? Only after the earlier write for M is applied. The dependency "s_1 is a destination of M" is needed as meta-data on M_3. Meta-data grows as the computation progresses; the objective is to reduce the size of the meta-data.

15 Algorithms. Overview of Algorithms. Two algorithms [3, 4] implement causal consistency in a partially replicated distributed shared memory system: Full-Track and Opt-Track (a message- and space-optimal algorithm). Both implement the ->co relation and adopt the activation predicate. A special case of Opt-Track for full replication, Opt-Track-CRP (optimal), has lower message size, time, and space complexity than the Baldoni et al. algorithm [2].

16 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track. Algorithm 1 is for a non-fully replicated system. Each application process performing a write operation writes only to a subset of all the sites. Each site s_i needs to track the number of write operations performed by every ap_j to every site s_k, denoted Write_i[j][k]. The Write clock piggybacked on messages generated by Multicast(m) must not be merged with the local Write clock at message reception, but only at a later read operation that reads the value carried by the message.

17 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track data structures.
1. Write_i, the Write clock. Write_i[j, k]: the number of updates sent by ap_j to site s_k that causally happened before under the ->co relation.
2. Apply_i, an array of integers. Apply_i[j]: the number of updates written by ap_j that have been applied at site s_i.
3. LastWriteOn_i<variable id, Write>, a hash map of Write clocks. LastWriteOn_i<h>: the Write clock value associated with the last write operation on variable x_h locally replicated at site s_i.
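
The per-site state can be sketched as below; this is a hedged Python illustration of the three structures just listed (field names follow the slides, the class itself is not from the paper).

```python
# Sketch of the Full-Track state kept at site s_i, for n sites.
class FullTrackSite:
    def __init__(self, i, n):
        self.i = i
        # Write[j][k]: number of updates sent by ap_j to site s_k that causally
        # happened before the current local state under ->co.
        self.Write = [[0] * n for _ in range(n)]
        # Apply[j]: number of updates written by ap_j already applied at s_i.
        self.Apply = [0] * n
        # LastWriteOn[h]: the Write clock piggybacked by the last write applied
        # locally on variable x_h; merged into self.Write only when x_h is read.
        self.LastWriteOn = {}
```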

18 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track.

19 Algorithms. Full-Track Algorithm: partially replicated memory. Algorithm 1: Full-Track. The activation predicate is implemented.

20 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. In Algorithm 1, each message corresponding to a write operation piggybacks an O(n^2) matrix. Algorithm 2 uses propagation constraints to further reduce the message size and storage cost. It exploits the transitive dependency of causal deliveries of messages (ideas from the KS algorithm [5]). Each site keeps a record of the most recently received message from each other site, along with the list of its destinations. Opt-Track is optimal in terms of log space and message space overhead, and achieves a further optimality: no redundant destination information is recorded.

21 Algorithms. Opt-Track Algorithm: partially replicated memory. Propagation Constraints: two situations for destination information to be redundant. Figure: Meta-data information = "s_2 is a destination of m"; the causal futures of the relevant apply and return events are shown in dotted lines. The information must not exist in the causal future of the corresponding apply(w) events, as marked in the figure (Propagation Constraint 1 and Propagation Constraint 2).

22 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track data structures.
1. clock_i: a local counter at site s_i for write operations performed by ap_i.
2. Apply_i, an array of integers. Apply_i[j]: the number of updates written by ap_j that have been applied at site s_i.
3. LOG_i = {<j, clock_j, Dests>}, the local log. Each entry indicates a write operation in the causal past.
4. LastWriteOn_i<variable id, LOG>, a hash map of LOGs. LastWriteOn_i<h>: the piggybacked LOG from the most recent update applied at site s_i for the locally replicated variable x_h.
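
A hedged sketch of these structures in Python follows; the entry layout mirrors the slide, but the classes themselves are illustrative rather than the paper's definitions.

```python
# Sketch of the Opt-Track state kept at site s_i.
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class LogEntry:
    writer: int      # j: the site whose write this entry records
    clock: int       # clock_j value of that write
    dests: Set[int]  # destinations for which the information is still needed

@dataclass
class OptTrackSite:
    i: int
    n: int
    clock: int = 0                                     # local write counter
    Apply: List[int] = field(default_factory=list)     # Apply[j], length n
    LOG: List[LogEntry] = field(default_factory=list)  # writes in the causal past
    LastWriteOn: Dict[int, List[LogEntry]] = field(default_factory=dict)

    def __post_init__(self):
        if not self.Apply:
            self.Apply = [0] * self.n
```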

23 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. Figure: Write processing at site s_i.

24 Algorithms. Opt-Track Algorithm: partially replicated memory. Algorithm 2: Opt-Track. Figure: Read and receive processing at site s_i.

25 Algorithms. Opt-Track Algorithm: partially replicated memory. Procedures used in Opt-Track. Figure: PURGE and MERGE functions at site s_i.
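
The slide only shows the figure. As a rough, hedged sketch of the MERGE idea in the spirit of the KS-style optimizations (not the paper's exact pseudocode): when two logs are combined, entries for the same write keep only the destinations still present in both copies, since a destination missing from either copy is already known to be redundant; entries implied by others can then be purged. The sketch reuses the LogEntry type from the previous sketch.

```python
# Hedged sketch of a MERGE over two logs of LogEntry objects, keyed by
# (writer, clock). This is an illustration of the idea, not the algorithm.
def merge_logs(log_a, log_b):
    merged = {}
    for e in list(log_a) + list(log_b):
        key = (e.writer, e.clock)
        if key in merged:
            # A destination absent from either copy is already known redundant,
            # so only destinations present in both copies are kept.
            merged[key] = merged[key] & set(e.dests)
        else:
            merged[key] = set(e.dests)
    return [LogEntry(w, c, d) for (w, c), d in merged.items()]
```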

26 Algorithms. Opt-Track Algorithm: partially replicated memory. The Propagation Constraints: an example. Figure: Meta-data information = "P_6 is a destination of M_{5,1}". Part (a) is a space-time diagram over P_1 through P_6 showing how the information that P_6 is a destination of the multicast at event (5,1) propagates as piggybacked information and in the logs; part (b) tabulates each message with its destinations and the piggybacked M_{5,1}.Dests, illustrating the propagation of explicit information and the inference of implicit information.

27 Algorithms. Opt-Track Algorithm: partially replicated memory. If the destination list becomes empty, then... Figure: Illustration of why it is important to keep a record even if its destination list becomes empty.

28 Algorithms. Opt-Track-CRP: fully replicated memory. Algorithm 3: Opt-Track-CRP. A special case of Algorithm 2 for full replication, with the same optimizations. Every write operation is sent to exactly the same set of sites, so there is no need to keep a destination list with each write. Each write operation is represented as <i, clock_i> at site s_i, bringing the cost of a write operation down from O(n) to O(1). The local log holds d entries, where d is the number of write operations in the local log; the local log always gets reset after each write, and each read adds at most one new entry to the local log. A sketch of this bookkeeping follows.
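
A minimal sketch of this bookkeeping in Python, assuming the multicast transport lives elsewhere; method names are illustrative, not the paper's.

```python
# Sketch of Opt-Track-CRP bookkeeping at site s_i: a write is identified by the
# pair (site id, local clock), each log entry is O(1), the log is reset after
# every write, and a read adds at most one new entry.
class OptTrackCRPSite:
    def __init__(self, i):
        self.i = i
        self.clock = 0
        self.LOG = []                     # list of (j, clock_j) write ids

    def on_write(self):
        self.clock += 1
        w = (self.i, self.clock)
        deps = list(self.LOG)             # dependencies multicast with the write
        self.LOG = [w]                    # reset: the new write subsumes the log
        return w, deps

    def on_read(self, applied_write_id):
        if applied_write_id not in self.LOG:
            self.LOG.append(applied_write_id)   # at most one new entry per read
```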

29 Algorithms. Opt-Track-CRP: fully replicated memory. Further Improved Scalability. In Algorithm 2, keeping entries with an empty destination list is important; in the fully replicated case this cost can also be reduced. Figure: a space-time diagram over sites s_1, s_2, s_3 showing that, in fully replicated systems, the local log is reset after each write (e.g., LOG_3 holds {w} after s_3 reads the value written by w, and is reset to {w'} once s_3 sends its own write w').

30 Algorithms. Opt-Track-CRP: fully replicated memory. Algorithm 3: Opt-Track-CRP. Figure: There is no need to maintain the destination list for each write operation in the local log.

31 Complexity Measures and Performance. Parameters.
n: number of sites in the system.
q: number of variables in the system.
p: replication factor, i.e., the number of sites at which each variable is replicated.
w: number of write operations performed in the system.
r: number of read operations performed in the system.
d: number of write operations stored in the local log (used only in the Opt-Track-CRP algorithm).

32 Complexity Measures and Performance. Complexity. Table: Complexity measures of the causal memory algorithms.
Message count: Full-Track: ((p-1) + (n-p)/n)w + (2(n-p)/n)r; Opt-Track: ((p-1) + (n-p)/n)w + (2(n-p)/n)r; Opt-Track-CRP: (n-1)w; optP [2]: (n-1)w.
Message size: Full-Track: O(n^2 pw + nr(n-p)); Opt-Track: O(n^2 pw + nr(n-p)), amortized O(npw + r(n-p)); Opt-Track-CRP: O(nwd); optP: O(n^2 w).
Time complexity: Full-Track: write O(n^2), read O(n^2); Opt-Track: write O(n^2 p), read O(n^2); Opt-Track-CRP: write O(d), read O(d); optP: write O(n), read O(n).
Space complexity: Full-Track: O(max(n^2, npq)); Opt-Track: O(max(n^2, npq)), amortized O(max(n, pq)); Opt-Track-CRP: O(max(d, q)); optP: O(nq).

33 Complexity Measures and Performance. Message Count as a Function of w_rate. Define the write rate as w_rate = w/(w+r). Partial replication gives a lower message count than full replication if
pw + (2(n-p)/n)r < nw,   (1)
which simplifies to w > 2r/n, i.e., w_rate > 2/(n+2).   (2)
A quick numeric check follows.
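
The sketch below checks this break-even condition numerically, using the same simplified message-count expressions as the inequality above; it is an illustration, not a simulation of the algorithms.

```python
# Break-even write rate for partial vs. full replication, per the condition above.
def breakeven_write_rate(n):
    return 2.0 / (n + 2)              # partial wins once w_rate exceeds this

def message_counts(n, p, w, r):
    partial = p * w + 2 * r * (n - p) / n   # p-way multicasts plus remote reads
    full = n * w                            # broadcast per write (approximation)
    return partial, full

print(breakeven_write_rate(10))             # ~0.167 for n = 10
print(message_counts(10, 3, w=80, r=20))    # (268.0, 800) -> partial wins
print(message_counts(10, 3, w=10, r=90))    # (156.0, 100) -> full wins
```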

34 Complexity Measures and Performance. Message Count: Partial Replication vs. Full Replication. Figure: message count as a function of the write rate w/(w+r) for n = 10, with curves for p = 1, 3, 5, 7, and 10, comparing partial replication against full replication.

35 Complexity Measures and Performance. Message Meta-Data Structures. Figures: the message meta-data structures used in partial replication, and in full replication (SM only).

36 Complexity Measures and Performance. Simulation Methodology. The time interval T_e between two consecutive events ranges from 5 ms to 2000 ms, and the propagation time T_t from 100 ms to 3000 ms. The number of processes n is varied from 5 up to 40. w_rate is set to 0.2, 0.5, and 0.8, respectively. The replica factor rate p/n for partial replication is 0.3. The message meta-data size m_s is the total size of all the meta-data transmitted over all the processes. Each simulation execution runs 600n operation events in total.

37 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.2.

38 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.5.

39 Complexity Measures and Performance. Meta-Data Space Overhead in Partial Replication. Figure: average message size (KB) vs. number of processes for SM, FM, and RM messages under Full-Track and Opt-Track, at w_rate = 0.8.

40 Complexity Measures and Performance. Ratio of Message Overhead of Opt-Track to Full-Track. Figure: the ratio of total message meta-data space overhead of Opt-Track to that of Full-Track as a function of n, for write rates 0.2, 0.5, and 0.8, in the partial replication protocols.

41 Complexity Measures and Performance. Meta-Data Size in Partial Replication. With increasing n, the ratio decreases rapidly; for 40 processes, the Opt-Track overheads are only a small percentage of the Full-Track overheads across the different write rates. In the Full-Track protocol, the average message space overheads of SM and RM increase quadratically with n, whereas the increasingly lower overheads of SM and RM in Opt-Track are linear in n.

42 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.2.

43 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.5.

44 Complexity Measures and Performance. Meta-Data Space Overhead in Full Replication. Figure: average message size (bytes) vs. number of processes for SM messages under optP and Opt-Track-CRP, at w_rate = 0.8.

45 Complexity Measures and Performance. Ratio of Message Overhead of Opt-Track-CRP to optP. Figure: the ratio of total message meta-data space overhead of Opt-Track-CRP to that of optP as a function of n, for write rates 0.2, 0.5, and 0.8, in the full replication protocols.

46 Complexity Measures and Performance. Message Count: Partial Replication vs. Full Replication. Total message size overhead = total message count x (meta-data size + replicated data size). Figure: total message count for full replication (Opt-Track-CRP) vs. partial replication (Opt-Track).

47 Complexity Measures and Performance. Message Size: Partial Replication vs. Full Replication. Figure: total message size for full replication (Opt-Track-CRP) and partial replication (Opt-Track): (a) n = 40, p = 12, w_rate = 0.5, in the worst case; (b) n = 40, p = 12, w_rate = 0.5, in the real case.

48 Approximate Causal Consistency. For applications where the data size is small (e.g., wall posts in Facebook), the size of the meta-data can be a problem. Meta-data overheads can be further reduced at the risk of some (rare) violations of causal consistency: as dependencies age, with high probability the messages they represent have been delivered, and the dependencies need not be carried around and stored. The amount of violations can be made arbitrarily small by controlling a parameter called credits.

49 Approximate Causal Consistency. Approx-Opt-Track. Notion of Credits: Case 1. Figure: reducing the meta-data at the cost of some possible violations of causal consistency; the amount of violations can be made arbitrarily small by controlling a tunable parameter (credits).

50 Approximate Causal Consistency. Approx-Opt-Track. Notion of Credits: Case 2. Figure: illustration of meta-data reduction when credits are exhausted.

51 Approximate Causal Consistency. Approx-Opt-Track. The notion of credits is integrated into the Opt-Track algorithm to give Approx-Opt-Track [6], which can fine-tune the amount of causal consistency by trading off the size of the meta-data overhead. Three instantiations of credits are given: hop count, time-to-live, and metric distance. Violation error rate: R_e = n_e / m_c, where n_e is the number of messages applied by the remote replicated sites with a violation of causal consistency, and m_c is the total number of transmitted messages. Meta-data saving rate: R_s = 1 - m_s(cr finite) / m_s(Opt-Track), where m_s is the message meta-data size, i.e., the total size of all the meta-data transmitted over all the processes. A sketch of the hop-count credit mechanism follows.
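
As a hedged sketch of the hop-count instantiation (illustrative code, not the paper's pseudocode): each log entry carries a credit that is decremented on every transmission hop, and the entry is dropped once its credit is exhausted.

```python
# Sketch of hop-count credits: decrement on each transmission, drop exhausted
# entries; dropping early is what can (rarely) let a causal violation slip by.
def age_out(log_entries):
    survivors = []
    for entry in log_entries:
        entry["credit"] -= 1          # one more hop consumed
        if entry["credit"] > 0:
            survivors.append(entry)   # still young enough to carry around
    return survivors

# Example with initial credit cr = 2: the entry survives exactly one re-send.
log = [{"writer": 5, "clock": 1, "dests": {6}, "credit": 2}]
log = age_out(log)    # credit 2 -> 1, kept
log = age_out(log)    # credit 1 -> 0, dropped
print(log)            # []
```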

52 Approximate Causal Consistency. Performance. Simulation Methodology in Approx-Opt-Track. The time interval T_e between two consecutive events ranges from 5 ms to 2000 ms, and the propagation time T_t from 100 ms to 3000 ms. The number of processes n is varied from 5 up to 40. w_rate is set to 0.2, 0.5, and 0.8, respectively. The replica factor rate p/n for partial replication is 0.3, and the number of variables q is 100. The hop count credit (cr) denotes the hop count available before an entry's meta-data ages out and is removed. The initial credit cr is specified from one up to a critical value cr_0, at which there is no message transmission violating causal consistency. Each simulation execution runs 600n operation events in total.

53 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.2.

54 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.5.

55 Approximate Causal Consistency. Performance. Violation Error Rates. Figure: violation error rate vs. initial credits for different numbers of peers, at w_rate = 0.8.

56 Approximate Causal Consistency. Performance. Critical Initial Credits. Figure: summary of the critical values of cr_0 and cr_0.5%.

57 Approximate Causal Consistency. Performance. Impact of the Initial cr on R_e. cr_0 is the major critical initial credit (R_e = 0); cr_0.5% is the minor critical initial credit (R_e at most 0.5%). cr_0.5% does not appear to increase much with n: by setting the initial cr to a small but sufficient finite value, most dependencies age and can be removed without violating causal consistency once the associated meta-data has been transmitted across a few hops, even for a large number of processes. The correlation coefficients of cr_0 and n are high, and the major critical credit values increase with n to avoid causal violations.

58 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.2.

59 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.5.

60 Approximate Causal Consistency. Performance. Average Meta-Data Size (KB). Figure: average meta-data size (KB) vs. initial credits for different numbers of peers, at w_rate = 0.8.

61 Approximate Causal Consistency. Performance. Critical Average Message Meta-Data Size m_ave (KB). Figure: summary of the critical average message meta-data sizes.

62 Approximate Causal Consistency. Performance. Critical Average Message Meta-Data Size. With an increasing number of processes, m_ave increases linearly, and m_ave becomes smaller with a higher w_rate for larger numbers of processes. A read operation invokes the MERGE function to merge the piggybacked log list, so a higher read rate increases the likelihood of an enlarged log. Furthermore, although a write operation increases the explicit information, it also triggers the PURGE function, which deletes redundant information.

63 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.2.

64 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.5.

65 Approximate Causal Consistency. Performance. Message Meta-Data Saving Rate. Figure: meta-data size saving rate vs. initial credits for different numbers of peers, at w_rate = 0.8.

66 Approximate Causal Consistency. Performance. Critical Message Meta-Data Size Saving Rates R_s. Figure: summary of the critical message meta-data size saving rates.

67 Approximate Causal Consistency. Performance. Critical Message Meta-Data Size Saving Rate. Focusing on the case of 40 processes: R_s is around 40%-60% at a very slight cost of violating causal consistency, and R_s reaches around 5%-20% without violating causality order, across the different write rates. This indicates that if the initial credit allocation is just a small number, then by the time the corresponding meta-data is removed, the associated message has (very likely) already reached its destination.

68 Conclusions and Future Work. Conclusions. Opt-Track has better network capacity utilization and better scalability than Full-Track for causal consistency under partial replication; its meta-data overhead is linear in n, and these improvements grow with more write-intensive workloads. For 40 processes, the overheads of Opt-Track are only a small percentage of those of Full-Track across the different write rates. Opt-Track-CRP performs better than optP (Baldoni et al. 2006) under full replication; for 40 processes, the overheads of Opt-Track-CRP are only a fraction of those of optP across the different write rates. We showed the advantages of partial replication and provided the conditions under which partial replication incurs less overhead than full replication. A modification of Opt-Track, called Approx-Opt-Track, provides approximate causal consistency by reducing the size of the meta-data: by controlling a parameter cr, the level of potential inaccuracy can be traded off against the size of the meta-data.

69 Conclusions and Future Work. Future Work. Reduce the size of the meta-data for maintaining causal consistency in partially replicated systems. Develop a dynamic, data-driven replication mechanism (optimize the replication mechanism: which file? how many replicas? where? when? i.e., the replica factor rate p/n becomes a variable). Design a hierarchical causal consistency protocol for partial replication: a client-cluster model (two-level architecture) for causal consistency under partial replication.

70 Conclusions and Future Work. References I.
[1] M. Ahamad, G. Neiger, J. Burns, P. Kohli, and P. Hutto. Causal memory: definitions, implementation and programming. Distributed Computing, 9(1):37-49, 1995.
[2] R. Baldoni, A. Milani, and S. Tucci-Piergiovanni. Optimal propagation based protocols implementing causal memories. Distributed Computing, 18(6), 2006.
[3] Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. OPCAM: optimal algorithms implementing causal memories in shared memory systems. In Proceedings of the 2015 International Conference on Distributed Computing and Networking (ICDCN 2015), Goa, India, January 4-7, 2015, pages 16:1-16:4.

71 Conclusions and Future Work. References II.
[4] Min Shen, Ajay D. Kshemkalyani, and Ta Yuan Hsu. Causal consistency for geo-replicated cloud storage under partial replication. In IPDPS Workshops, IEEE, 2015.
[5] A. Kshemkalyani and M. Singhal. Necessary and sufficient conditions on information for causal message ordering and their optimal implementation. Distributed Computing, 11(2):91-111, April 1998.
[6] Ta Yuan Hsu and Ajay D. Kshemkalyani. Performance of approximate causal consistency for partially replicated systems. In Workshop on Adaptive Resource Management and Scheduling for Cloud Computing (ARMS-CC), ACM, 2015.

72 Conclusions and Future Work. Questions. Thank You!


More information

Consistency and Replication. Why replicate?

Consistency and Replication. Why replicate? Consistency and Replication Today: Consistency models Data-centric consistency models Client-centric consistency models Lecture 15, page 1 Why replicate? Data replication versus compute replication Data

More information

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer?

Parallel and Distributed Systems. Programming Models. Why Parallel or Distributed Computing? What is a parallel computer? Parallel and Distributed Systems Instructor: Sandhya Dwarkadas Department of Computer Science University of Rochester What is a parallel computer? A collection of processing elements that communicate and

More information

Trade- Offs in Cloud Storage Architecture. Stefan Tai

Trade- Offs in Cloud Storage Architecture. Stefan Tai Trade- Offs in Cloud Storage Architecture Stefan Tai Cloud computing is about providing and consuming resources as services There are five essential characteristics of cloud services [NIST] [NIST]: http://csrc.nist.gov/groups/sns/cloud-

More information

Unit 2 Packet Switching Networks - II

Unit 2 Packet Switching Networks - II Unit 2 Packet Switching Networks - II Dijkstra Algorithm: Finding shortest path Algorithm for finding shortest paths N: set of nodes for which shortest path already found Initialization: (Start with source

More information

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016

Consistency in Distributed Storage Systems. Mihir Nanavati March 4 th, 2016 Consistency in Distributed Storage Systems Mihir Nanavati March 4 th, 2016 Today Overview of distributed storage systems CAP Theorem About Me Virtualization/Containers, CPU microarchitectures/caches, Network

More information

DISTRIBUTED SHARED MEMORY

DISTRIBUTED SHARED MEMORY DISTRIBUTED SHARED MEMORY COMP 512 Spring 2018 Slide material adapted from Distributed Systems (Couloris, et. al), and Distr Op Systems and Algs (Chow and Johnson) 1 Outline What is DSM DSM Design and

More information

Lecture 6 Consistency and Replication

Lecture 6 Consistency and Replication Lecture 6 Consistency and Replication Prof. Wilson Rivera University of Puerto Rico at Mayaguez Electrical and Computer Engineering Department Outline Data-centric consistency Client-centric consistency

More information

Consistency and Replication. Why replicate?

Consistency and Replication. Why replicate? Consistency and Replication Today: Introduction Consistency models Data-centric consistency models Client-centric consistency models Thoughts for the mid-term Lecture 14, page 1 Why replicate? Data replication:

More information

Self-healing Data Step by Step

Self-healing Data Step by Step Self-healing Data Step by Step Uwe Friedrichsen (codecentric AG) NoSQL matters Cologne, 29. April 2014 @ufried Uwe Friedrichsen uwe.friedrichsen@codecentric.de http://slideshare.net/ufried http://ufried.tumblr.com

More information

A Verification Approach for GALS Integration of Synchronous Components

A Verification Approach for GALS Integration of Synchronous Components GALS 2005 Preliminary Version A Verification Approach for GALS Integration of Synchronous Components F. Doucet, M. Menarini, I. H. Krüger and R. Gupta 1 Computer Science and Engineering University of California,

More information

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei

Non-blocking Array-based Algorithms for Stacks and Queues. Niloufar Shafiei Non-blocking Array-based Algorithms for Stacks and Queues Niloufar Shafiei Outline Introduction Concurrent stacks and queues Contributions New algorithms New algorithms using bounded counter values Correctness

More information

Chapter 7 CONCLUSION

Chapter 7 CONCLUSION 97 Chapter 7 CONCLUSION 7.1. Introduction A Mobile Ad-hoc Network (MANET) could be considered as network of mobile nodes which communicate with each other without any fixed infrastructure. The nodes in

More information

CS6450: Distributed Systems Lecture 11. Ryan Stutsman

CS6450: Distributed Systems Lecture 11. Ryan Stutsman Strong Consistency CS6450: Distributed Systems Lecture 11 Ryan Stutsman Material taken/derived from Princeton COS-418 materials created by Michael Freedman and Kyle Jamieson at Princeton University. Licensed

More information

Draft Standard for Verilog. Randomization and Constraints. Extensions

Draft Standard for Verilog. Randomization and Constraints. Extensions Draft Standard for Verilog Randomization and Constraints Extensions Copyright 2003 by Cadence Design Systems, Inc. This document is an unapproved draft of a proposed IEEE Standard. As such, this document

More information

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi 1 Lecture Notes 1 Basic Concepts Anand Tripathi CSci 8980 Operating Systems Anand Tripathi CSci 8980 1 Distributed Systems A set of computers (hosts or nodes) connected through a communication network.

More information

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs 1 Anand Tripathi CSci 8980 Operating Systems Lecture Notes 1 Basic Concepts Distributed Systems A set of computers (hosts or nodes) connected through a communication network. Nodes may have different speeds

More information

This talk is about. Data consistency in 3D. Geo-replicated database q: Queue c: Counter { q c } Shared database q: Queue c: Counter { q c }

This talk is about. Data consistency in 3D. Geo-replicated database q: Queue c: Counter { q c } Shared database q: Queue c: Counter { q c } Data consistency in 3D (It s the invariants, stupid) Marc Shapiro Masoud Saieda Ardekani Gustavo Petri This talk is about Understanding consistency Primitive consistency mechanisms How primitives compose

More information

Fault-Tolerant Computer Systems ECE 60872/CS Recovery

Fault-Tolerant Computer Systems ECE 60872/CS Recovery Fault-Tolerant Computer Systems ECE 60872/CS 59000 Recovery Saurabh Bagchi School of Electrical & Computer Engineering Purdue University Slides based on ECE442 at the University of Illinois taught by Profs.

More information

Fork Sequential Consistency is Blocking

Fork Sequential Consistency is Blocking Fork Sequential Consistency is Blocking Christian Cachin Idit Keidar Alexander Shraer May 14, 2008 Abstract We consider an untrusted server storing shared data on behalf of clients. We show that no storage

More information

Eventual Consistency Today: Limitations, Extensions and Beyond

Eventual Consistency Today: Limitations, Extensions and Beyond Eventual Consistency Today: Limitations, Extensions and Beyond Peter Bailis and Ali Ghodsi, UC Berkeley Presenter: Yifei Teng Part of slides are cited from Nomchin Banga Road Map Eventual Consistency:

More information

Application of SDN: Load Balancing & Traffic Engineering

Application of SDN: Load Balancing & Traffic Engineering Application of SDN: Load Balancing & Traffic Engineering Outline 1 OpenFlow-Based Server Load Balancing Gone Wild Introduction OpenFlow Solution Partitioning the Client Traffic Transitioning With Connection

More information

CSE 5306 Distributed Systems. Consistency and Replication

CSE 5306 Distributed Systems. Consistency and Replication CSE 5306 Distributed Systems Consistency and Replication 1 Reasons for Replication Data are replicated for the reliability of the system Servers are replicated for performance Scaling in numbers Scaling

More information

Distributed Synchronization: outline

Distributed Synchronization: outline Distributed Synchronization: outline Introduction Causality and time o Lamport timestamps o Vector timestamps o Causal communication Snapshots This presentation is based on the book: Distributed operating-systems

More information

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA

Priya Narasimhan. Assistant Professor of ECE and CS Carnegie Mellon University Pittsburgh, PA OMG Real-Time and Distributed Object Computing Workshop, July 2002, Arlington, VA Providing Real-Time and Fault Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS Carnegie

More information

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma

Overlay and P2P Networks. Unstructured networks. Prof. Sasu Tarkoma Overlay and P2P Networks Unstructured networks Prof. Sasu Tarkoma 20.1.2014 Contents P2P index revisited Unstructured networks Gnutella Bloom filters BitTorrent Freenet Summary of unstructured networks

More information

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems

2. Time and Global States Page 1. University of Freiburg, Germany Department of Computer Science. Distributed Systems 2. Time and Global States Page 1 University of Freiburg, Germany Department of Computer Science Distributed Systems Chapter 3 Time and Global States Christian Schindelhauer 12. May 2014 2. Time and Global

More information

INF-5360 Presentation

INF-5360 Presentation INF-5360 Presentation Optimistic Replication Ali Ahmad April 29, 2013 Structure of presentation Pessimistic and optimistic replication Elements of Optimistic replication Eventual consistency Scheduling

More information

Byzantine Consensus in Directed Graphs

Byzantine Consensus in Directed Graphs Byzantine Consensus in Directed Graphs Lewis Tseng 1,3, and Nitin Vaidya 2,3 1 Department of Computer Science, 2 Department of Electrical and Computer Engineering, and 3 Coordinated Science Laboratory

More information

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:

More information

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen

Overlay and P2P Networks. Unstructured networks. PhD. Samu Varjonen Overlay and P2P Networks Unstructured networks PhD. Samu Varjonen 25.1.2016 Contents Unstructured networks Last week Napster Skype This week: Gnutella BitTorrent P2P Index It is crucial to be able to find

More information

Job Re-Packing for Enhancing the Performance of Gang Scheduling

Job Re-Packing for Enhancing the Performance of Gang Scheduling Job Re-Packing for Enhancing the Performance of Gang Scheduling B. B. Zhou 1, R. P. Brent 2, C. W. Johnson 3, and D. Walsh 3 1 Computer Sciences Laboratory, Australian National University, Canberra, ACT

More information

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers

Clock Synchronization. Synchronization. Clock Synchronization Algorithms. Physical Clock Synchronization. Tanenbaum Chapter 6 plus additional papers Clock Synchronization Synchronization Tanenbaum Chapter 6 plus additional papers Fig 6-1. In a distributed system, each machine has its own clock. When this is the case, an event that occurred after another

More information

What Came First? The Ordering of Events in

What Came First? The Ordering of Events in What Came First? The Ordering of Events in Systems @kavya719 kavya the design of concurrent systems Slack architecture on AWS systems with multiple independent actors. threads in a multithreaded program.

More information

Device-to-Device Networking Meets Cellular via Network Coding

Device-to-Device Networking Meets Cellular via Network Coding Device-to-Device Networking Meets Cellular via Network Coding Yasaman Keshtkarjahromi, Student Member, IEEE, Hulya Seferoglu, Member, IEEE, Rashid Ansari, Fellow, IEEE, and Ashfaq Khokhar, Fellow, IEEE

More information

Assignment 5. Georgia Koloniari

Assignment 5. Georgia Koloniari Assignment 5 Georgia Koloniari 2. "Peer-to-Peer Computing" 1. What is the definition of a p2p system given by the authors in sec 1? Compare it with at least one of the definitions surveyed in the last

More information

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting

Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Architecture-Dependent Tuning of the Parameterized Communication Model for Optimal Multicasting Natawut Nupairoj and Lionel M. Ni Department of Computer Science Michigan State University East Lansing,

More information

Providing Real-Time and Fault Tolerance for CORBA Applications

Providing Real-Time and Fault Tolerance for CORBA Applications Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing

More information