Reliable Distribution of Data Using Replicated Web Servers

1 Reliable Distribution of Data Using Replicated Web Servers
Moreno Marzolla, Dipartimento di Informatica, Università Ca' Foscari di Venezia, via Torino 155, Mestre (Italy)

2 Talk Outline
Introduction
Fault-tolerant Data Retrieval
Reliability Evaluation
Conclusions and Future Work
Moreno Marzolla HADIS'05, Copenhagen, aug 22,

3 Introduction
Accessing large documents over a network is challenging for several reasons: performance, security, and reliability.
We consider here the reliability issue: how to efficiently fetch large, read-only data files over unreliable media.

4 What do we mean by unreliable media?
Network failure; server failure.

5 Reliability Model
Links (or servers) may fail at any moment.
Failed components simply do not deliver any data (i.e., no Byzantine failures): they deliver correct data until they crash.
Failures may be transient or permanent.

6 Usage scenario
We consider the problem of downloading large documents from Web servers.
Documents are fully replicated among different, geographically distributed Web servers.
Data is accessed using the standard HTTP/1.1 protocol.

7 Possible solution: Data Redundancy
Add redundancy (e.g., parity information) to the data delivered.
The client computes missing data from the redundant information.
Example: a RAID-like solution with Dataset 1, Dataset 2, Dataset 3, and Parity.
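
The parity idea on this slide can be illustrated with a minimal sketch. It assumes byte-wise XOR parity over three equally sized datasets, as in RAID-5; the data values and the helper name `xor_blocks` are illustrative and do not come from the talk.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Illustrative data only: three datasets of equal length.
datasets = [b"alpha123", b"bravo456", b"delta789"]
parity = xor_blocks(datasets)

# Suppose Dataset 2 (index 1) is lost: XOR the survivors with the parity block.
recovered = xor_blocks([datasets[0], datasets[2], parity])
print(recovered == datasets[1])
```

A single parity block recovers exactly one missing dataset, and the client has to do the XOR work itself.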

8 Problems
Traditional RAID (RAID-5, parity-based) only tolerates a single dataset failure.
This can be improved if different RAID layouts are hierarchically combined, or if sophisticated Error Correcting Codes are employed.
In case of failures, the client needs to compute the missing information.
This can be CPU-intensive, and it is not applicable if the client has limited computing power (e.g., a mobile device).

9 Proposed Approach
We consider a document $W$ of size $|W|$ which is replicated among $N$ Web servers $S_0, S_1, \ldots, S_{N-1}$.
The user selects a parameter $K$, $1 \le K \le N$.
We prepare requests $R_0, R_1, \ldots, R_{N-1}$ to be sent to $S_0, S_1, \ldots, S_{N-1}$ respectively, such that any $K$ replies are sufficient to reconstruct $W$.

10 Example
[Figure: request layouts for N=5 with K=5, K=4, K=3, and K=2]
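
The layouts in the figure are not recoverable from this transcript. As a rough sketch, the code below prints what the round-robin assignment of Algorithm 1 (slide 29) would produce for N=5, at the level of fragment indexes, and checks by brute force that any K replies cover the whole document. The function names `fragment_layout` and `any_k_replies_suffice` are hypothetical.

```python
from itertools import combinations

def fragment_layout(n, k):
    """For each server, list the fragment indexes it is asked for."""
    requests = [[] for _ in range(n)]
    t = 0
    for fragment in range(n):
        # Each fragment is assigned to N-K+1 consecutive requests (round-robin).
        for _ in range(n - k + 1):
            requests[t].append(fragment)
            t = (t + 1) % n
    return requests

def any_k_replies_suffice(n, k):
    """Check every choice of K servers: do their replies cover all N fragments?"""
    layout = fragment_layout(n, k)
    return all(
        set().union(*(layout[s] for s in servers)) == set(range(n))
        for servers in combinations(range(n), k)
    )

for k in (5, 4, 3, 2):
    print(f"N=5, K={k}: {fragment_layout(5, k)} ok={any_k_replies_suffice(5, k)}")
```

The check works because each fragment is held by N-K+1 distinct servers, so losing N-K replies can never remove every copy of a fragment.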

11 Some properties
The size (number of bytes) of each request is $|W|(N-K+1)/N$.
The total size of all requests is then $|W|(N-K+1)$.
(Almost) computation-free on the client side.
Trivial deployment of replicas: it's just the same file.
Feedback-free.
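
The two sizes follow from Algorithm 1 (slide 29). The derivation below is a short sketch that assumes $N$ divides $|W|$ exactly; the numbers in the worked example are illustrative only.

```latex
% Each of the N fragments has size |W|/N and is added to N-K+1 requests.
% The round-robin index t spreads the N(N-K+1) fragment copies evenly,
% so every request receives exactly N-K+1 fragments.
\[
  |R_j| = (N-K+1)\,\frac{|W|}{N},
  \qquad
  \sum_{j=0}^{N-1} |R_j| = (N-K+1)\,|W| .
\]
% Worked example (illustrative numbers): N = 5, K = 3, |W| = 100 MB.
\[
  |R_j| = 3 \cdot \frac{100}{5} = 60~\text{MB},
  \qquad
  \sum_{j=0}^{4} |R_j| = 3 \cdot 100 = 300~\text{MB}.
\]
```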

12 Analysis
We consider a very simple model.
From the user's perspective, on a given connection either data is coming or not.
Each connection is modeled as a two-state, continuous-time Markov chain:
State 0: Idle (no data coming).
State 1: Active (data coming at rate Bw).

13 System model
[Figure: system model with N connections delivering data at rates $Bw_0, Bw_1, Bw_2, \ldots, Bw_{N-1}$]

14 Analysis / 1
Let $T_{N,K}(W)$ be the time needed to download $W$ from $N$ Web servers with parameter $K$.
We want to compute $\Pr(T_{N,K}(W) \le t)$, the probability of downloading $W$ from at least $K$ out of $N$ servers in time at most $t$.

15 Analysis / 2
We write:

16 Analysis / 3
Let $D_j$ denote the minimum time needed to download request $R_j$ from server $S_j$.
Let $O_j(t)$ denote the cumulative time spent in state 1 by server $S_j$ during the time interval $[0,t)$.

17 Analysis / 4
Then, we have:
$I_P$ is the indicator function for predicate $P$: $I_P = 1$ iff $P$ is true.
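
The equation from this slide did not survive in the transcript, so the following is not a reconstruction of it. It is a hedged sketch of one way to compute "at least K out of N servers finish in time", under the added assumption that the N connections are independent. The inputs are the per-server probabilities $p_j = \Pr(O_j(t) \ge D_j)$; the function name `prob_at_least_k` and the example values are hypothetical.

```python
def prob_at_least_k(success_probs, k):
    """Probability that at least k of the independent events occur."""
    # dist[m] = probability that exactly m of the events seen so far occurred.
    dist = [1.0]
    for p in success_probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, q in enumerate(dist):
            nxt[m] += q * (1.0 - p)   # this server did not finish in time
            nxt[m + 1] += q * p       # this server finished in time
        dist = nxt
    return sum(dist[k:])

# Hypothetical per-server probabilities of finishing request R_j by time t.
p = [0.99, 0.95, 0.90, 0.60, 0.50]
for k in range(1, 6):
    print(k, prob_at_least_k(p, k))
```

The dynamic program avoids enumerating all subsets of servers, which matters little for N=5 but keeps the cost at O(N^2) for larger N.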

18 Analysis / 5
The distribution of $O_j(t)$ is the Operational Time Distribution of the associated Markov chain.
$\Pr(O_j(t) < D_j)$ can be evaluated numerically using algorithms developed by Rubino and Sericola [IEEE Trans. Comp., 1993].
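
The Rubino and Sericola algorithm is not reproduced here. As a simpler stand-in, $\Pr(O_j(t) < D_j)$ can be estimated by Monte Carlo simulation of the two-state chain. The sketch below assumes exponential sojourn times with hypothetical rates `fail_rate` (Active to Idle) and `repair_rate` (Idle to Active), both strictly positive, and assumes the connection starts in the Active state.

```python
import random

def operational_time(t, fail_rate, repair_rate, rng, start_active=True):
    """Simulate one path on [0, t) and return the time spent in state 1 (Active)."""
    if t <= 0:
        return 0.0
    now, active, busy = 0.0, start_active, 0.0
    while True:
        sojourn = rng.expovariate(fail_rate if active else repair_rate)
        remaining = t - now
        if sojourn >= remaining:          # the observation window ends first
            return busy + (remaining if active else 0.0)
        if active:
            busy += sojourn
        now += sojourn
        active = not active               # two states: always switch to the other

def estimate_prob_not_done(t, d, fail_rate, repair_rate, runs=20000, seed=1):
    """Monte Carlo estimate of Pr(O(t) < d)."""
    rng = random.Random(seed)
    misses = sum(operational_time(t, fail_rate, repair_rate, rng) < d
                 for _ in range(runs))
    return misses / runs

# Hypothetical numbers: 8 time units of active transfer needed within a window of 10.
print(estimate_prob_not_done(t=10.0, d=8.0, fail_rate=0.1, repair_rate=1.0))
```

Simulation only gives an estimate with sampling error, which is why a numerical algorithm is preferable when precise values are needed.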

19 Settings

20 Parameters

21 Results / 1 (5 fast & good)

22 Results / 2 (4 fast & good, 1 slow & poor)

23 Results / 3 (2 fast & good, 3 slow & poor)

24 Results / 4 (2 fast & poor, 2 slow & good, 1 slow & very poor)

25 Conclusions
We proposed a simple solution that provides a high degree of fault tolerance for data retrieval using the standard Web infrastructure.
It is feedback-free.
It automatically selects the K fastest servers without the need for complex protocols.
Almost no computation is required on the client or server side (suitable for thin clients).

26 What's next?
How do we select the value for K?
From past measurements... you know that no fewer than K servers can be reached... but we need something better.
We need a compromise between reliability and redundancy:
K=1: maximum reliability, but wastes bandwidth.
K=N: minimum reliability, maximum net efficiency.
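
The trade-off can be tabulated with a small sketch. It relies on the round-robin assignment of Algorithm 1 (slide 29), where each fragment is requested from N-K+1 servers, and assumes N divides the document size. The function name `tradeoff`, the field names, and the example size are hypothetical.

```python
def tradeoff(n, doc_size):
    """For each K, report tolerated failures and bandwidth cost."""
    rows = []
    for k in range(1, n + 1):
        copies = n - k + 1                      # servers asked for each fragment
        rows.append({
            "K": k,
            "tolerated_failures": n - k,        # replies that may be missing
            "bytes_requested": copies * doc_size,
            "efficiency": 1 / copies,           # useful bytes / requested bytes
        })
    return rows

# Illustrative: N = 5 servers, 1000-byte document.
for row in tradeoff(5, 1000):
    print(row)
```

The table only quantifies the two extremes named on the slide; it does not by itself say which K to pick.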

28 Applications
Data delivery over wireless networks.
At least K out of N servers can be reached.

29 Definition of requests
Algorithm 1: Computation of R_0, R_1, ..., R_{N-1}
Require: K, 1 <= K <= N
Ensure: R_i is the request for server S_i
fragsize := |W| / N
t := 0
R_0 := R_1 := ... := R_{N-1} := {}
for i = 0 to N-1 do
    W_i := W[i * fragsize, (i+1) * fragsize - 1]
    for j = 1 to N-K+1 do
        R_t := R_t + W_i
        t := (t + 1) mod N
    end for
end for
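
A minimal Python sketch of Algorithm 1 at the byte level follows. It assumes N divides the document size exactly. It also assumes that each request is sent as an HTTP/1.1 `Range` header listing several byte ranges; the slides only state that HTTP/1.1 is used, so that mapping is an assumption. The function names `build_requests` and `range_header` are hypothetical.

```python
def build_requests(doc_size, n, k):
    """Return, per server, a list of inclusive (first_byte, last_byte) ranges."""
    if not 1 <= k <= n:
        raise ValueError("K must satisfy 1 <= K <= N")
    if doc_size % n != 0:
        raise ValueError("this sketch assumes N divides the document size")
    fragsize = doc_size // n
    requests = [[] for _ in range(n)]
    t = 0
    for i in range(n):
        fragment = (i * fragsize, (i + 1) * fragsize - 1)
        for _ in range(n - k + 1):
            requests[t].append(fragment)   # R_t := R_t + W_i
            t = (t + 1) % n
    return requests

def range_header(ranges):
    """Format byte ranges as the value of an HTTP/1.1 Range header."""
    return "bytes=" + ",".join(f"{first}-{last}" for first, last in ranges)

# Illustrative: 1000-byte document, N = 5 servers, K = 4.
for server, ranges in enumerate(build_requests(1000, 5, 4)):
    print(f"S_{server}: Range: {range_header(ranges)}")
```

A server that honors a multi-range request typically answers with a multipart body, so a real client would also need to split the reply back into fragments.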
