Replication comes with consistency cost. Reasons for replication: better performance and availability. Replication transforms client-server ...


Continuous Consistency and Availability in Replication
Haifeng Yu
CPS 212, Fall 2002

Replication Comes with Consistency Cost
- Reasons for replication: better performance and availability
- Replication transforms client-server communication into server-server communication:
  - Decreased performance
  - Decreased availability
[Figure: client-server communication versus server-server communication among replicas]

Strong and Optimistic Consistency
- Traditionally, two choices for consistency level:
  - Strong consistency: replicas strictly in sync
  - Optimistic consistency: no guarantee at all
- Each model comes with its associated tradeoffs
[Figure: consistency versus availability/performance/scalability, with Optimistic and Strong at opposite ends]

Problems with Binary Choice
- Strong consistency incurs prohibitive overheads for many WAN apps
  - Replication may even decrease performance, availability and scalability relative to a single server!
- Optimistic consistency provides no consistency guarantee at all
  - Results in upset users: unbounded reservation conflicts
  - Can render the app unusable: if traffic data is more than 1 hour stale, it is probably of little use
- Applications cannot tune the consistency level to their environment
  - They need to adapt to client, service and network characteristics

Continuous Consistency
- Consistency is continuous rather than binary for many WAN apps
  - These apps can benefit from exploiting the consistency spectrum between strong and optimistic consistency
- Many ways to quantify consistency:
  - Staleness (e.g., TTL in web caching: invalidate once the TTL expires)
  - Limit on the number of locally buffered writes
[Figure: continuous consistency spans the spectrum between Optimistic and Strong on the availability/performance/scalability axis; a replica buffers updates before sending them to other replicas]
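Below is a minimal sketch of the two ways of quantifying consistency listed on the Continuous Consistency slide. It assumes a single replica that tracks its own staleness and its own buffer of writes not yet propagated. The class BoundedReplica and its fields are hypothetical and are not taken from any system described in these slides. The transport that actually ships updates to other replicas is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BoundedReplica:
    """Hypothetical replica enforcing two continuous-consistency bounds."""
    ttl: float                  # staleness bound: max age (s) of the local copy
    max_buffered_writes: float  # max unpropagated local writes (float("inf") = no limit)
    last_sync: float = 0.0      # time the local copy was last synchronized
    buffer: list = field(default_factory=list)  # writes accepted but not yet propagated

    def can_serve_read(self, now: float) -> bool:
        # A read is served only while the local copy is within the staleness bound.
        return now - self.last_sync <= self.ttl

    def try_write(self, update) -> bool:
        # Accept a write locally only while the buffer has room; once it is full,
        # the replica must propagate first (or reject the write).
        if len(self.buffer) >= self.max_buffered_writes:
            return False
        self.buffer.append(update)
        return True

    def propagate(self, now: float) -> list:
        # Hand buffered writes to the (elided) transport and mark the copy fresh.
        sent, self.buffer = self.buffer, []
        self.last_sync = now
        return sent


r = BoundedReplica(ttl=60.0, max_buffered_writes=5)
print([r.try_write(i) for i in range(7)])  # five True, then two False
print(r.can_serve_read(now=30.0), r.can_serve_read(now=90.0))  # True False
print(r.propagate(now=100.0))  # [0, 1, 2, 3, 4]
```

Setting both bounds to 0 pushes this replica toward the strong end of the spectrum: no write is accepted without first coordinating with the others. Making both bounds unbounded (float("inf")) approaches optimistic consistency.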

Applications?
- Applications: web caching, airline reservation, distributed games, shared editor
- Non-applications: some scientific computing problems, banking systems, any application that has binary output
- An application's nature determines whether continuous consistency is applicable

Trading Consistency for Performance
- Airline reservation: replicas running at Berkeley, Utah, Duke
[Figure: throughput (updates/sec) across the consistency spectrum from Strong to Optimistic (x-axis 0% to 100%) [Yu 02, TOCS]]

The Cost of Increased Performance
- Increased performance comes with a cost (here, reservation conflicts)
- Adaptively trade consistency for performance based on client, network, and service conditions
[Figure: reservation conflict rate (0% to 25%) across the consistency spectrum (0% to 100%)]

Consistency Model vs. Protocol
- A continuous consistency model is a spec
- A protocol is anything that can enforce the spec
  - Corollary: a strong consistency protocol is a protocol for any model
- Many protocols exist for a specific model; some are good, others are not

Designing a Continuous Consistency Model
- A model is a spec, so quantifying consistency (in a bad way) is trivial
- Only applications know their definition of consistency
  - Airline reservation vs. distributed games
- What is a good continuous consistency model?
  - One that can be used by diverse apps

Practical Distributed Consensus and Leader Election
- What does continuous consistency mean here?
  - Allow at most k decision values
  - Allow at most k leaders
- Helps overcome some impossibilities
  - A unique decision value requires a majority (more than ½ of the nodes)
  - Allowing k decision values lets any partition with more than 1/(k + 1) of the nodes decide
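The ½-majority and 1/(k + 1) thresholds on the consensus slide can be checked with a few lines of arithmetic. This sketch assumes the partitions are disjoint and that node counts are known. The function names are hypothetical. Suppose k + 1 disjoint partitions each held more than n/(k + 1) nodes. Together they would then hold more than n nodes, which is impossible. So at most k partitions can pass the test, and at most k decision values (or leaders) can arise. With k = 1 the test reduces to the usual majority rule.

```python
def can_decide(partition_size: int, n: int, k: int) -> bool:
    """True if a partition holds more than 1/(k + 1) of all n nodes.

    Uses the integer form partition_size * (k + 1) > n to avoid rounding.
    k = 1 is the usual strict-majority rule.
    """
    return partition_size * (k + 1) > n


def deciding_partitions(partition_sizes: list[int], k: int) -> list[int]:
    """Indices of the disjoint partitions allowed to decide (or elect a leader).

    At most k indices can be returned: k + 1 partitions each holding more
    than n / (k + 1) nodes would need more than n nodes in total.
    """
    n = sum(partition_sizes)
    return [i for i, size in enumerate(partition_sizes) if can_decide(size, n, k)]


sizes = [4, 3, 3]  # 10 nodes split into three partitions
for k in (1, 2, 3):
    print(k, deciding_partitions(sizes, k))  # 1 [], 2 [0], 3 [0, 1, 2]
```

Take 10 nodes split into partitions of 4, 3 and 3:
- With k = 1, no partition can decide.
- With k = 2, only the 4-node partition can decide.
- With k = 3, all three partitions can decide.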

Group Membership Service
- Def: keep track of which nodes belong to which group
- Traditionally, group membership services maintain only a single group
  - Primary-partition membership services
  - Correspond to strong consistency
- Recently, partitionable membership services
  - Still an active area of research
  - Correspond to optimistic consistency
- Continuous consistency: allow at most k groups
  - Again, helps overcome the ½ majority limitation

Continuous Consistency Summary
- WAN replication needs dynamically tunable consistency
- Tradeoff between consistency and performance
- How to design a continuous consistency model
- Continuous consistency in other contexts
- Next: availability

What is Availability?
- No well-accepted availability metric for Internet services
- The uptime metric can be misleading for Internet services
  - A server may be inaccessible because of a network partition
- Available: "present or ready for immediate use" (Webster's Collegiate Dictionary)
  - What does "immediate" mean? A time-out
- User satisfaction is not binary

Performability
- What if a partial result is returned before the time-out?
- What if the result is sent back after an hour, or a day?
- Availability is related to performance
- Performability = reward function (quality and timeliness of result)
- Determining the reward function is hard!
- Availability = (accepted accesses) / (submitted accesses)
  - Implicit time-out in the definition of an Internet service
- We use this user-observed availability in our study

Effects of Replication on Availability
- Single server:
  - Client-server network failure rate: about 2% [Chandra et al., USITS 01]
  - Server rejects due to server failure: about 0.1% [MS press release, Jan 01]
- With replication:
  - Network failure rate: < 2%
  - Replica rejects (when communication to maintain consistency fails): > 0.1%
- Consistency may force a replica to reject an otherwise acceptable request
[Figure: network failure rate and replica rejection rate under replication]
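Below is a minimal sketch of the user-observed availability metric, Availability = (accepted accesses) / (submitted accesses). It assumes each access is logged with three things: its submission time, its response time (if any), and whether it was accepted. The Access record is hypothetical. So is the explicit timeout parameter, which stands in for the implicit time-out in the definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Access:
    """One logged client access (hypothetical log format)."""
    submitted_at: float
    responded_at: Optional[float]  # None if no response arrived (e.g. network failure)
    accepted: bool                 # False if a server or replica rejected the request


def user_observed_availability(accesses: list, timeout: float) -> float:
    """Availability = (accepted accesses) / (submitted accesses).

    An access counts as accepted only if it was accepted *and* answered
    within `timeout` seconds, making the implicit time-out explicit.
    """
    if not accesses:
        raise ValueError("availability is undefined with no submitted accesses")
    accepted = sum(
        1
        for a in accesses
        if a.accepted
        and a.responded_at is not None
        and a.responded_at - a.submitted_at <= timeout
    )
    return accepted / len(accesses)


log = [
    Access(0.0, 0.5, True),    # served quickly
    Access(1.0, 20.0, True),   # served, but only after 19 s
    Access(2.0, None, False),  # lost to a network failure
    Access(3.0, 3.1, False),   # rejected by a replica
]
print(user_observed_availability(log, timeout=10.0))  # 0.25
print(user_observed_availability(log, timeout=30.0))  # 0.5
```

Only the first access is counted as accepted under a 10-second time-out. Raising the time-out to 30 seconds also admits the slow response, which is one reason availability is tied to performance.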

Limitations of Strong Consistency
[Figure: replicas and clients separated by a network partition]
- Option 1: both sides accept reads
- Option 2: one side accepts reads; the other side rejects reads

Effects of Continuous Consistency (allow each replica to buffer 5 writes)
- Option 1: both sides accept reads
- New Option 1: both sides accept reads; one side accepts its first 10 writes, the other its first 5 writes

Effects of Continuous Consistency (continued)
- Option 2: one side accepts reads; the other side rejects reads
- New Option 2: one side accepts reads; the other side accepts its first few reads and its first 5 writes

Impact of Consistency on Availability is Inherent
- A hard bound always exists
- We always know the two end points, but may not know the exact shape of the curve
[Figure: hard upper bound on availability (0% to 100%) across the consistency spectrum (0% to 100%)]

Effects of Protocol
- Achieved availability also depends on the protocol
  - Designing better protocols is the job of system designers
[Figure: availability achieved by Protocol A and Protocol B, both below the upper bound]

Availability Optimizations
- The technique should not be tied to the consistency model
- Focus on two techniques:
  - Retiring replicas
  - Aggressive write propagation
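The partition options on the Limitations and Effects slides can be replayed with a small simulation. It rests on two assumptions. First, each replica cut off from its peers independently accepts writes until it holds buffer_limit unpropagated writes. Second, a fixed set of replicas may keep serving reads. The function serve_during_partition and its parameters are hypothetical. Setting buffer_limit = 0 corresponds to strong-consistency Option 1: both sides serve reads, but no partitioned replica can accept a write without contacting the others.

```python
from collections import Counter


def serve_during_partition(requests, buffer_limit, reads_allowed):
    """Accept/reject decisions for requests at replicas cut off from their peers.

    requests:      list of (replica_id, op), op being "read" or "write"
    buffer_limit:  writes each replica may accept without contacting its peers
                   (0: no write can be accepted while partitioned)
    reads_allowed: replica ids on the side(s) still permitted to serve reads
    """
    buffered = Counter()  # unpropagated writes accepted so far, per replica
    decisions = []
    for replica, op in requests:
        if op == "read":
            decisions.append(replica in reads_allowed)
        elif op == "write":
            if buffered[replica] < buffer_limit:
                buffered[replica] += 1
                decisions.append(True)
            else:
                decisions.append(False)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return decisions


requests = [("C", "write")] * 7 + [("C", "read")]
print(serve_during_partition(requests, buffer_limit=5, reads_allowed={"A", "C"}))
# [True, True, True, True, True, False, False, True]
print(serve_during_partition(requests, buffer_limit=0, reads_allowed={"A", "C"}))
# [False, False, False, False, False, False, False, True]
```

With a 5-write buffer, replica C accepts its first five writes and then rejects the rest, matching "accept first 5 writes". Under the strict setting it accepts only reads. Removing C from reads_allowed models the side that rejects reads in Option 2.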

Retiring Replicas
[Figure: replicas and clients separated by a network partition, as in "Limitations of Strong Consistency"]
- Option 1: both sides accept reads
- Option 2: one side accepts reads; the other side rejects reads
- Obviously, such a decision may not be optimal unless we have future knowledge
  - Importance of prediction
- Even with future knowledge, it is hard
- In option 2, all replicas must reach an agreement
  - Leader election, while we are experiencing partitions
  - One option: voting
  - What if we don't have a majority?

Aggressive Write Propagation
- Applicable to continuous consistency
- Continuous consistency gives us buffers that can be utilized in case of network partition
- Keep the buffer empty:
  - We cannot predict the occurrence of network partitions
  - Propagate writes more aggressively
  - Cut down the amount of inconsistency accumulated in times of good connectivity

Effects of Aggressive Propagation
- Baseline: propagate writes only when necessary (lazily)
- Aggressive: propagate when necessary and every 3 seconds
[Figure: availability of Baseline and Aggressive propagation versus the upper bound, for replicas with measured faultload [Yu 01, SOSP]]

More Aggressive Propagation
- Aggressive write propagation does not work in all cases
- Availability optimizations can incur more communication
  - Best availability is achieved when we use a strong consistency protocol
  - This speaks to the availability/performance tradeoff

Availability of Other Systems
- Consensus and leader election: block without a majority
- Group membership: blocks without a majority
- Relaxing consistency enables them to make progress
- Open question: but will these systems still be useful?
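Below is a toy single-replica model of the Baseline and Aggressive policies. It makes four assumptions:
- Writes arrive at known times.
- Nothing can be propagated during the partition interval.
- The baseline propagates only when its buffer is full.
- The aggressive policy also flushes every flush_period seconds while connected.

writes_rejected and all of its parameters are hypothetical. The numbers it produces are illustrative, not measurements.

```python
def writes_rejected(write_times, buffer_limit, partition, flush_period=None):
    """Count writes one replica must reject (toy model, hypothetical parameters).

    write_times:  sorted arrival times of local writes
    buffer_limit: max unpropagated writes the replica may hold
    partition:    (start, end); nothing can be propagated for start <= t < end
    flush_period: None for the lazy baseline (propagate only when the buffer is
                  full); otherwise a positive period for extra periodic flushes
    """
    start, end = partition

    def connected(t):
        return not (start <= t < end)

    buffered = 0
    rejected = 0
    next_flush = flush_period
    for t in write_times:
        # Aggressive policy: apply periodic flushes that fell due up to this
        # write; a flush scheduled during the partition cannot reach the peers.
        while flush_period is not None and next_flush <= t:
            if connected(next_flush):
                buffered = 0
            next_flush += flush_period
        if buffered < buffer_limit:
            buffered += 1                   # room left: accept and buffer the write
        elif connected(t):
            buffered = min(1, buffer_limit)  # full but connected: propagate, then buffer
        else:
            rejected += 1                   # full and partitioned: must reject
    return rejected


times = list(range(20))  # one write per second for 20 seconds
print(writes_rejected(times, 5, partition=(10, 20)))                  # 10
print(writes_rejected(times, 5, partition=(10, 20), flush_period=3))  # 6
```

The lazy baseline enters the partition with a full buffer, so it rejects all 10 writes that arrive while partitioned. Flushing every 3 seconds leaves room for 4 of those writes, so only 6 are rejected. The cost is extra communication while connected.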

Availability Summary
- Availability definition
- Inherent impact of consistency on availability
- Availability also depends on consistency protocols
- Availability optimizations:
  - Replica retirement
  - Aggressive write propagation

Why Can We Easily Approach the Upper Bound?
- Simple protocols in our study can approach the upper bound closely
  - Remember: reaching the upper bound in general needs future knowledge
- This is related to the characteristics of the faultloads we measured and simulated
  - Most partitions are singleton partitions
  - Most transitions are: fully-connected → singleton partition → fully-connected
- These characteristics are consistent with the Internet's hierarchical architecture

Dual Effects of Replication Scale on Availability
- Consistency may force a replica to reject a request
- Adding more replicas lowers the network failure rate but raises the replica rejection rate
- Availability = (1 - Network Failure Rate) * (1 - Replica Rejection Rate)
- Too large or too small a replication scale can hurt availability

Optimal Replication Scale
- Optimal replication scale: adding more replicas can hurt availability!
  - The increase in replica rejection rate outweighs the decrease in network failure rate
- The optimal replication scale depends on:
  - Consistency level
  - Network failure rate among replicas
[Figure: availability upper bound versus number of replicas, for failure rate = 1% and 5%, with numerical error = 0 and 250]
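Below is a toy model of Availability = (1 - Network Failure Rate) * (1 - Replica Rejection Rate). It shows why an optimal replication scale can exist, under two hypothetical assumptions:
- A client fails only when every replica is unreachable, each independently with probability client_fail.
- Under strict consistency (no numerical error allowed), a replica rejects a request whenever any of its peers is unreachable, each independently with probability peer_fail.

These independence assumptions are far simpler than the faultloads measured and simulated in the study. The function names are hypothetical.

```python
def availability(n_replicas, client_fail, peer_fail):
    """Availability = (1 - network failure rate) * (1 - replica rejection rate).

    Toy assumptions: the client fails only if all n_replicas are unreachable,
    each independently with probability client_fail; a replica rejects a
    request if any of its n_replicas - 1 peers is unreachable, each
    independently with probability peer_fail (strict consistency).
    """
    network_failure_rate = client_fail ** n_replicas
    rejection_rate = 1 - (1 - peer_fail) ** (n_replicas - 1)
    return (1 - network_failure_rate) * (1 - rejection_rate)


def optimal_scale(client_fail, peer_fail, max_replicas=10):
    """Number of replicas (1..max_replicas) that maximizes availability."""
    return max(range(1, max_replicas + 1),
               key=lambda n: availability(n, client_fail, peer_fail))


for peer_fail in (0.05, 0.01, 0.001):
    print(peer_fail, optimal_scale(client_fail=0.05, peer_fail=peer_fail))
# 0.05 1
# 0.01 2
# 0.001 3
```

With client_fail = 0.05 and peer_fail = 0.05, a single server is best, because the first added replica raises the rejection rate by more than it lowers the network failure rate. Making the replicas more reliable relative to each other (peer_fail = 0.001) moves the optimum to 3 replicas. In this model, too few replicas leave network failures dominant, and too many let rejections dominate.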
