Analysis of Web Caching Architectures: Hierarchical and Distributed Caching


Pablo Rodriguez, Christian Spanner, Ernst W. Biersack

Abstract — In this paper we compare the performance of different caching architectures. We derive analytical models to evaluate hierarchical and distributed caching in terms of the client's perceived latency, bandwidth usage, load on the caches, and disk space. Hierarchical caching has short connection times and low bandwidth usage; however, it requires high-performance caches at the top levels of the hierarchy. On the other hand, distributed caching has short transmission times and does not require intermediate cache levels. With distributed caching the bandwidth usage in the network is higher; however, the traffic is better distributed, with more bandwidth used in the less congested lower network levels. Additionally, we study a hybrid scheme where a certain number of caches cooperate at every level of the caching hierarchy using distributed caching. We find that a well-configured hybrid scheme can combine the advantages of both hierarchical and distributed caching, reducing the retrieval latency, the bandwidth usage, and the load on the caches. Depending on the hybrid caching architecture, the current load on the caches/network, and the document size, there is an optimal number of caches that should cooperate at each network level to minimize the overall retrieval latency. Based on analytical and experimental results, we propose small modifications of the existing cache-sharing schemes to better determine the degree of cooperation at every caching level.

I. INTRODUCTION

The World Wide Web provides simple access to a wide range of information and services. As a result, the Web has become the most successful application on the Internet. However, the exponential growth in demand experienced during the last years has not been followed by the necessary upgrades of the network and servers; therefore, clients experience frustrating delays when accessing Web pages.
In order to keep the Web attractive, the experienced latencies must be kept under a tolerable limit. One way to reduce the experienced latencies, the server's load, and the network congestion is to store multiple copies of the same Web documents in geographically dispersed Web caches.

Pablo Rodriguez, Christian Spanner, and Ernst W. Biersack are with Institut EURECOM, Sophia Antipolis, France ({rodrigue, spanner, erbi}@eurecom.fr). Pablo Rodriguez is supported by the European Commission in the form of a TMR (Training and Mobility for Researchers) fellowship. EURECOM's research is partially supported by its industrial partners: Ascom, Cegetel, France Telecom, Hitachi, IBM France, Motorola, Swisscom, Texas Instruments, and Thomson CSF. Preliminary results of this work were presented at the 4th International Web Caching Workshop, San Diego, March 1999. Submitted to Transactions on Networking, May 1999.

As most of the Web content is rather static, the idea of Web caching at the application layer is not new. The first WWW browsers, e.g., Mosaic [7], were able to cache Web objects for later reference, thus reducing the bandwidth used for Web traffic and the latency experienced by the users. Web caching rapidly evolved from a local cache used by a single browser to a shared cache serving all the clients of a certain institution. Unfortunately, since the number of clients connected to a single cache can be rather small and the amount of information available on the Web is rapidly increasing, the cache's performance can be quite modest. The hit rate, i.e., the percentage of requests that can be served from previously cached document copies, at an institutional cache only ranges between 30% and 50% [8]. The hit rate of a Web cache can be increased significantly by sharing the interests of a large community [6]; the more people accessing the cache, the higher the probability that a given document has already been requested and is present in the cache.
The most common approach to grouping a large client community is setting up a caching hierarchy with institutional, regional, and national caches [6]. Recently, several researchers have proposed to let institutional caches cooperate directly in a distributed caching scheme with no intermediate cache levels [5][23]. With hierarchical caching, caches are placed at different network levels. At the bottom level of the hierarchy there are the client caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is then forwarded to the regional cache, which in turn forwards unsatisfied requests to the national cache. If the document is not found at any cache level, the national cache contacts the origin server directly. When the document is found, either at a cache or at the origin server, it travels down the hierarchy, leaving a copy at each of the intermediate caches. Further requests for the same document travel up the caching hierarchy until the document is hit at some cache level. Hierarchical caching is already a fact of life in much of the Internet [2]. However, there are several problems associated with a caching hierarchy: i) every hierarchy level may introduce additional

delays [23][6], ii) higher-level caches may become bottlenecks and have long queuing delays, and iii) multiple document copies are stored at different cache levels. With distributed caching no intermediate caches are set up, and there are only institutional caches that serve each other's misses. In order to decide from which institutional cache to retrieve a missed document, institutional caches keep metadata information about the content of every other cooperating institutional cache. To make the distribution of the metadata information more efficient and scalable, a hierarchical distribution can be used [5][23]. However, the hierarchy is only used to distribute information about the location of the documents and not to store document copies.

In this paper we develop analytical models to study the performance of a pure hierarchical scheme and compare it with the performance of a pure distributed scheme. Then, we consider a hybrid scheme where caches cooperate at every level of a caching hierarchy using distributed caching. We contrast hierarchical, distributed, and hybrid caching according to the latency incurred to retrieve a document, the bandwidth usage inside every network provider, the load on the caches, and the disk space requirements. We find that hierarchical caching has lower connection times than distributed caching; thus, placing redundant copies at intermediate cache levels reduces the connection time. We also find that distributed caching has lower transmission times than hierarchical caching since most of the traffic flows through the less congested low network levels. In addition to the latency analysis, we study the bandwidth used by hierarchical and distributed caching. We find that distributed caching has higher bandwidth usage than hierarchical caching; however, the network traffic is better balanced, using more bandwidth in the low network levels. We derive models to calculate the storage capacity needed to store all documents requested at a certain cache level (i.e., infinite cache capacity) depending on the document request distribution and the number of incoming requests.
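The hierarchical lookup and copy-on-the-way-down behavior described in the introduction can be sketched as follows. This is a minimal illustration with hypothetical cache classes, not the paper's model: each cache forwards a miss to its parent and stores a copy of the document as the response travels back down.

```python
# Sketch of hierarchical caching: a miss is forwarded to the parent cache,
# and the document leaves a copy at every intermediate cache on the way down.

class Cache:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # next level up; None for the national cache
        self.store = {}               # url -> document

    def get(self, url, origin):
        if url in self.store:
            return self.store[url], self.name          # hit at this level
        if self.parent is not None:
            doc, hit_level = self.parent.get(url, origin)
        else:
            doc, hit_level = origin[url], "origin"      # national miss -> origin server
        self.store[url] = doc                           # leave a copy on the way down
        return doc, hit_level

origin = {"/a": "payload-a"}
national = Cache("national")
regional = Cache("regional", parent=national)
institutional = Cache("institutional", parent=regional)

doc, level = institutional.get("/a", origin)    # first request: served by the origin
doc2, level2 = institutional.get("/a", origin)  # repeat request: institutional hit
```

Note how the first request populates all three cache levels, which is exactly the source of problem iii) above: the same document ends up stored three times.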
We find that the capacity needed to store all accessed documents at the top-level cache of a caching hierarchy is several tens of GBytes, while the capacity needed at an institutional cache is several GBytes (Section IV-F). With distributed caching there are only institutional caches and no additional disk space is required at intermediate network levels. However, a large-scale deployment of distributed caching encounters several problems (e.g., high connection times, higher bandwidth usage, administrative issues). On a smaller scale, where close institutional caches are interconnected through fast links and bandwidth is abundant, distributed caching performs very well with no additional intermediate cache levels. Our analysis of the hybrid scheme shows that the latency experienced by the clients varies depending on the number of caches that cooperate at every network level. Based on analytical results we determine the optimal number of caches that should cooperate at every network level to minimize the latency experienced by the clients. We find that a hybrid scheme with the optimal number of cooperating caches at every level can combine the advantages of hierarchical and distributed caching, reducing the latency, the bandwidth usage, and the load on the caches. We then perform experiments to better understand the parameters that should determine the degree of cooperation between caches at every level. We study different schemes to select one cache among all the caches that contain a document copy. We find that a round-trip-time (RTT) based selection often provides an incorrect estimate of the fastest cache. Web caching happens at the application level, and thus a network-level measurement, i.e., the RTT, is not enough to estimate the actual retrieval latency from a cache. Distant caches with high RTT that are lightly loaded may offer lower retrieval latencies than closer caches with small RTT that are highly congested.
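The RTT-versus-application-level-rate argument above can be made concrete with a toy example. The candidate caches and their measurements below are entirely hypothetical; the point is only that total retrieval latency combines RTT and transfer time, so picking by RTT alone can select a badly loaded nearby cache.

```python
# Two candidate copy holders: a nearby but congested cache and a distant
# but lightly loaded one. Numbers are illustrative.
candidates = [
    {"name": "near-busy", "rtt_ms": 5,  "rate_kbps": 40},   # small RTT, low rate
    {"name": "far-idle",  "rtt_ms": 60, "rate_kbps": 800},  # large RTT, high rate
]
doc_kb = 100   # hypothetical document size

def latency_ms(c):
    # retrieval latency = RTT + transfer time of the document
    return c["rtt_ms"] + doc_kb * 8 / c["rate_kbps"] * 1000

by_rtt = min(candidates, key=lambda c: c["rtt_ms"])      # network-level selection
by_rate = max(candidates, key=lambda c: c["rate_kbps"])  # application-level selection
```

For this document size the distant, lightly loaded cache wins by a wide margin, which mirrors the paper's observation that an application-level rate estimate is the better selection criterion for all but the smallest documents.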
Thus, the degree of cooperation between caches at every level should be determined dynamically depending on the hybrid caching architecture, the document's size, and the current cache/network load. Based on experimental results we find that a cache-sharing scheme that selects the fastest cache based on the application-level rate of the caches can achieve latency savings of up to 8% compared to a selection based on the RTT. The rest of the paper is organized as follows. In Section I-A we discuss previous work and different approaches to hierarchical and distributed caching. In Section II we describe our model for analyzing hierarchical and distributed caching. In Section III we provide a latency analysis of hierarchical and distributed caching. In Section IV we present a numerical comparison of both caching architectures, also considering bandwidth, disk space, and cache load. In Section V we analyze the hybrid scheme. In Section VI we propose small variations of the current cache-sharing schemes to dynamically determine the number of cooperating caches. In Section VII we summarize our findings and conclude the paper.

A. Related Work

Hierarchical Web caching was first proposed in the Harvest project [6] to share the interests of a large community of clients. In the context of hierarchical caching, the Harvest group also designed the Internet Cache Protocol (ICP) [26], which supports discovery and retrieval of documents from neighboring caches as well as parent caches. Another approach to distributed caching is the Cache Array Routing Protocol (CARP) [24], which divides the URL space among an array of loosely coupled caches and lets each cache store only the documents whose URLs are hashed to it. In [2], Karger et al. propose an approach similar to CARP which uses DNS servers to resolve the URL space and allows popular content to be replicated in several caches. Povey and Harrison also proposed a distributed Internet cache [5]. In their scheme, upper-level caches are replaced by directory servers which contain location hints about the documents kept at every cache. A metadata hierarchy is used to make the distribution of these location hints more efficient and scalable. Tewari et al. proposed a similar approach to implement a fully distributed Internet cache where location hints are replicated locally at the institutional caches [23]. In the central directory approach (CRISP) [], a central mapping service ties together a certain number of caches. In Summary Cache [9], Cache Digest [9], and the Relais project [4], caches exchange messages indicating their content and keep local directories to facilitate finding documents in other caches. Concerning a hybrid scheme, ICP allows for cache cooperation at every level of a caching hierarchy; the document is fetched from the parent/neighbor cache holding a copy that has the lowest RTT [26]. Rabinovich et al. [6] proposed limiting the cooperation between neighbor caches to avoid obtaining documents from distant or slower caches when they could have been retrieved directly from the origin server at a lower cost.

II. THE MODEL

A. Network Model

As shown in Figure 1, the Internet connecting the server and the receivers can be modeled as a hierarchy of ISPs, each ISP with its own autonomous administration.

Fig. 1. Network topology: clients attached to institutional networks, which connect through regional and national networks and an international path to the origin servers.
We make the reasonable assumption that the Internet hierarchy consists of three tiers of ISPs: institutional networks, regional networks, and national backbones. All of the clients are connected to the institutional networks; the institutional networks are connected to the regional networks; the regional networks are connected to the national networks. The national networks are also interconnected, sometimes by transoceanic links. We focus on a model with two national networks, one containing all of the clients and the other containing the origin servers.

Fig. 2. The tree model and cache placement: national, regional, and institutional caches attached through access links, with the origin server reached over the international path.

We model the underlying network topology as a full O-ary tree, as shown in Figure 2. Let O be the nodal outdegree of the tree. Let H be the number of network links between the root node of a national network and the root node of a regional network; H is also the number of links between the root node of a regional network and the root node of an institutional network. Let z be the number of links between the origin server and the root node of the national network (i.e., the international path). Let l be the level of the tree, 0 ≤ l ≤ 2H + z, where l = 0 is the level of the institutional caches and l = 2H + z is the origin server. We assume that bandwidth is homogeneous within each ISP, i.e., each link within an ISP has the same transmission rate. Let C_I, C_R, and C_N be the transmission rates of the links at the institutional, regional, and national networks, and let C be the bottleneck rate on the international path.

B. Document Model

Denote by N the total number of documents in the WWW and by S the size of a certain document. We assume that documents change periodically every update period Δ; thus, documents are removed from the caches every Δ seconds.
Requests for document i, 1 ≤ i ≤ N, at an institutional cache are Poisson with average request rate λ_{I,i}. We consider that each document is requested independently of other documents, so we neglect any source of correlation between requests for different documents. Let λ_I be the request rate from an institutional cache for all N documents, λ_I = Σ_{i=1}^{N} λ_{I,i}. The popularity of the documents follows a Zipf distribution [4][27]; that is, if we rank all N documents in order of their popularity, the i-th most popular document has a request rate λ_{I,i} given by

λ_{I,i} = λ_I · Ω / i^α,

where α is a constant that determines how skewed the Zipf distribution is, and Ω is given by

Ω = ( Σ_{i=1}^{N} 1/i^α )^{-1}.

At the regional and national caches, requests for very popular documents are filtered by the lower-level caches. To compare hierarchical and distributed caching we consider homogeneous client communities. Thus, requests for document i are uniformly distributed among all O^{2H} institutional caches, resulting in a total request rate λ_{tot,i} = λ_{I,i} · O^{2H} for document i, which is also Poisson. If we considered heterogeneous client communities, the hit rates for both hierarchical and distributed caching would be reduced, since the geographical locality of reference is smaller (see Section IV-B). For some important parameters for which we want to obtain absolute numbers, i.e., hit rate and disk space, we also consider heterogeneous client communities (Sections IV-B, IV-F).

C. Hierarchical Caching

Caches are usually placed at the access points between two different networks to reduce the cost of traversing a new network. As shown in Figure 2, we make this assumption for all of the network levels. In one country there is one national network with one national cache, there are O^H regional networks, each with one regional cache, and there are O^{2H} local networks, each with one institutional cache. Caches are placed at height 0 of the tree (level 1 in the cache hierarchy), height H of the tree (level 2 in the cache hierarchy), and height 2H of the tree (level 3 in the hierarchy). Caches are connected to their ISPs via access links. We assume that the capacity of the access link at every level is equal to the network link capacity at that level, i.e., C_I, C_R, C_N, and C for the respective levels.
The hit rates for documents at the institutional, regional, and national caches are given by hit_I, hit_R, and hit_N.

D. Distributed Caching

In the distributed caching scheme, caches are placed only at the institutional level of Figure 2 and no intermediate copies are stored in the network. Institutional caches keep a copy of every requested document. To share document copies among institutional caches, the intermediate network caches are replaced by a metadata hierarchy which contains location hints about the content of the institutional caches [5]. To avoid lookup latencies at the metadata hierarchy, location hints are replicated locally at every institutional cache [23]. We assume that location hints are instantaneously updated at every institutional cache.

III. LATENCY ANALYSIS

In this section we model the expected latency to obtain a document in a caching hierarchy and in a distributed caching scheme. We use an analysis similar to the one presented in [7]. The total latency T to fetch a document can be divided into two parts, the connection time T_c and the transmission time T_t. The connection time T_c is the time from the instant the document is requested by the client until the first data byte is received. The transmission time T_t is the time needed to transmit the document. Thus, the average total latency is given by

E[T] = E[T_c] + E[T_t].

A. Connection Time

The connection time depends on the number of network links from the client to the cache containing the desired document copy. Let L be the number of links that a request travels to hit a document in the caching hierarchy. We assume that the operating system in the cache gives priority to establishing TCP connections. Let d denote the per-hop propagation delay. The connection time in a caching hierarchy, T_c^h, is given by

E[T_c^h] = 4d · Σ_{l ∈ {0, H, 2H, 2H+z}} P(L = l) · (l + 1),

where the 4d term is due to the three-way handshake of a TCP connection, which increases the number of links traversed before any data packet is sent.
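The distributed scheme with locally replicated location hints, as described in Section II-D, can be sketched as follows. This is an illustration under simplifying assumptions (instantaneous hint updates, as the model assumes; class names and the shared in-memory directory are hypothetical stand-ins for the metadata hierarchy).

```python
# Sketch of distributed caching: every institutional cache consults a
# replicated directory of location hints (url -> holders); a miss is fetched
# from a sibling cache if one holds a copy, otherwise from the origin server.

class DistCache:
    directory = {}        # url -> set of cache names; stands in for replicated hints
    peers = {}            # name -> DistCache instance

    def __init__(self, name):
        self.name = name
        self.store = {}
        DistCache.peers[name] = self

    def get(self, url, origin):
        if url in self.store:
            return "local"                               # local hit
        holders = DistCache.directory.get(url)
        if holders:                                      # hint hit: fetch from a sibling
            peer = DistCache.peers[next(iter(holders))]
            self.store[url] = peer.store[url]
            source = "peer:" + peer.name
        else:                                            # global miss: origin server
            self.store[url] = origin[url]
            source = "origin"
        DistCache.directory.setdefault(url, set()).add(self.name)  # hints updated instantly
        return source

c1, c2 = DistCache("inst-1"), DistCache("inst-2")
origin = {"/a": "doc-a"}
first = c1.get("/a", origin)     # global miss
second = c2.get("/a", origin)    # served by sibling inst-1
third = c2.get("/a", origin)     # now cached locally
```

Unlike the hierarchical sketch, no copy is ever stored above the institutional level; only the location hints travel through the hierarchy.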
To account for the distance between the client and the institutional cache, we consider one additional link in the connection time. Now let L be the network level such that the tree rooted at level L is the smallest tree containing a document copy. The connection time in distributed caching, T_c^d, is given by

E[T_c^d] = 4d · Σ_{l=0}^{2H} P(L = l) · (2l + 1) + 4d · P(L = 2H + z) · (2H + z + 1).

In distributed caching a request first travels up to network level l and then travels down to the institutional cache holding a document copy, thus traversing 2l links.
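The two connection-time formulas can be evaluated numerically as below. The distribution P(L = l) here is a made-up example for one document, and the topology parameters are illustrative, not the paper's.

```python
# Numeric sketch of E[T_c^h] and E[T_c^d]. P maps level l to P(L = l);
# d is the per-hop delay and the 4d factor models TCP's three-way handshake.
H, z, d = 3, 5, 0.005     # illustrative topology, 5 ms per hop

def tc_hierarchical(P):
    # in the hierarchy, copies can only sit at levels 0, H, 2H or the origin (2H+z)
    return 4 * d * sum(P.get(l, 0.0) * (l + 1) for l in (0, H, 2 * H, 2 * H + z))

def tc_distributed(P):
    # a miss travels up to level l and back down to a sibling cache: 2l links
    inside = sum(P.get(l, 0.0) * (2 * l + 1) for l in range(2 * H + 1))
    return 4 * d * inside + 4 * d * P.get(2 * H + z, 0.0) * (2 * H + z + 1)

P = {0: 0.4, 3: 0.2, 6: 0.2, 11: 0.2}   # hypothetical P(L = l), sums to 1
```

For this mid-popularity document the hierarchical scheme comes out ahead, matching the paper's finding that intermediate copies shorten the expected path.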

We now proceed to calculate the distribution of L, which is the same for hierarchical and distributed caching. To obtain P(L = l) we use P(L = l) = P(L ≥ l) − P(L ≥ l + 1). Note that P(L ≥ l) is the probability that the number of links traversed to reach the document is equal to l or higher. To calculate P(L ≥ l), let τ denote the time into the update interval [0, Δ] at which a request occurs. The random variable τ is uniformly distributed over the interval; thus we have

P(L ≥ l) = (1/Δ) ∫_0^Δ P(L ≥ l | τ) dτ,    (1)

where P(L ≥ l | τ) is the probability that there was no request for document i during the interval [0, τ] in the subtree rooted at level l:

P(L ≥ l | τ) = e^{−O^l λ_{I,i} τ}.    (2)

Combining equations (1) and (2) we get

P(L ≥ l) = (1 − e^{−O^l λ_{I,i} Δ}) / (O^l λ_{I,i} Δ).    (3)

B. Transmission Time

In this section we calculate the transmission time to send a document in a caching hierarchy and in distributed caching. The transmission time of a document depends on the network level L up to which a request travels. Requests that travel through low network levels experience low transmission times; requests that travel up to high network levels experience large transmission times. We make the realistic assumption that the caches operate in cut-through mode rather than store-and-forward mode, i.e., when a cache begins to receive a document it immediately transmits the document to the subsequent cache (or client) while the document is still being received. We expect capacity misses to be a secondary issue for large-scale cache architectures because caches with huge effective storage capacities are becoming common. We therefore assume that each cache has infinite storage capacity. We now proceed to calculate the transmission time for hierarchical caching, E[T_t^h], and for distributed caching, E[T_t^d]. They are given by:

E[T_t^h] = Σ_{l ∈ {0, H, 2H, 2H+z}} E[T_t^h | L = l] · P(L = l)

E[T_t^d] = Σ_{l=0}^{2H+z} E[T_t^d | L = l] · P(L = l),

where E[T_t^h | L = l] and E[T_t^d | L = l] are the expected transmission times at a certain network level for hierarchical and distributed caching.
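The closed form (3) can be sanity-checked against a direct numeric integration of (1)–(2). All parameter values below are illustrative.

```python
# Verify P(L >= l) = (1 - exp(-O^l * lam * Delta)) / (O^l * lam * Delta)
# against a midpoint-rule integration of (1/Delta) * int_0^Delta e^{-O^l lam tau} dtau.
import math

O, lam, Delta, l = 4, 1e-4, 86400.0, 2   # outdegree, per-document rate, 24 h period

rate = O**l * lam
closed = (1 - math.exp(-rate * Delta)) / (rate * Delta)

steps = 100000
numeric = sum(math.exp(-rate * (k + 0.5) * Delta / steps) for k in range(steps)) / steps
```

The two values agree to many decimal places, which is just the statement that (3) is the average of (2) over a uniformly random request instant τ.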
To calculate E[Tt h jl = l] and E[Tt d jl = l] we first determine the aggregate request arrival rate at every network level l for caching l h, and caching d l. For caching, the aggregate request arrival rate at every network level l h is filtered by the hit rates at the lower caches. Thus, the aggregate request arrival rate generated by caching at a link between the levels l and l + of the network tree is given by h l = 8 >< >: O l I (? hit I ) O l I (? hit R ) O 2H I (? hit N ) I l = < l < H H l < 2H 2H l < 2H + z The hit rates at every network level can be calculated using the popularity distribution of the different documents, (i.e., Zipf) and the distribution of L. hit l = NX i= I;i I P (L l) For caching, the aggregate request arrival rate at a link between levels l and l + is filtered by the documents already hit in any institutional cache belonging to the subtree rooted at level l, hit l. Besides, in caching, the traffic between levels l and l + needs to be increased by the percentage of requests that were not satisfied in all the other neighbor caches and that are hit in the subtree rooted at level l, hit N? hit l. Thus, every subtree rooted at level l receives an additional traffic equal to O l I (hit N? hit l ). Therefore, the request rate between levels l and l + in caching is given by d l = O l I ((? hit l ) + (hit N? hit l )) for l < 2H and by O 2H I (? hit N ) for 2H l < 2H + z. To calculate the transmission time we model the routers and the caches on the network path from the sending to the receiving host as M/D/ queues. The arrival rate at a given network level for and caching is given by l h and l d. The service rate is given by the link s capacity at every network level (i.e., C I, C R, C N, and C). We assume that the most congested network link from level l to the clients, is the link at level l. Delays on network levels lower than l are neglected. Let S ~ be the average document size of all N documents. 
M/D/1 queuing theory [3] gives

E[T_t^h | l] = ( S̃ / (C_l − λ_l^h S̃) ) · ( 1 − λ_l^h S̃ / (2 C_l) )
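The M/D/1 expression above can be written as a small function. The numbers below are illustrative; the formula is only valid while the utilization ρ = λ S̃ / C_l stays below 1.

```python
# E[T_t | l] for an M/D/1 link: S is the mean document size (bits),
# C the link capacity (bit/s), lam the arrival rate (documents/s).

def md1_delay(S, C, lam):
    rho = lam * S / C
    assert rho < 1, "link overloaded: utilization must be < 1"
    return (S / (C - lam * S)) * (1 - rho / 2)
```

As a sanity check, with no competing traffic (lam = 0) the expression reduces to the plain transmission time S / C, and the delay grows sharply as ρ approaches 1, which is what drives the congested-network results of Section IV.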

for the delay at every network level in a caching hierarchy, and

E[T_t^d | l] = ( S̃ / (C_l − λ_l^d S̃) ) · ( 1 − λ_l^d S̃ / (2 C_l) )

for the delay at every network level in a distributed caching scheme.

IV. HIERARCHICAL VS. DISTRIBUTED CACHING: A NUMERICAL COMPARISON

In this section we pick typical values for the different parameters of the model to obtain quantitative results. The following parameters are fixed for the remainder of the paper, except where stated differently. The network tree is modeled with an outdegree O = 4 and a distance between caching levels H = 3, yielding O^H = 64 regional and O^{2H} = 4096 institutional caches. The distance z from the top node of the national network to the origin server is set to a high value, modeling the situation where the cost to access the origin servers is very high. We consider N = 25 million Web documents [3], which follow a Zipf distribution with α between 0.64 and 0.8 [4]. We take the total request rate generated by all the institutional caches, O^{2H} λ_I, from the measurements in [5], and fix the average size S̃ of a Web document following [5]. We vary the update period Δ between several days and one month, and we account for a fraction of non-cacheable documents [23]. The hit rates can be calculated with the formulas derived in Section III-B. Table I summarizes the hit rates at the different caching levels for homogeneous client communities and several update intervals.

TABLE I
Percentage of documents hit at different caching levels for several update intervals, α = 0.8.

          Δ = 7 days   Δ = 10 days   Δ = 15 days
hit_I        40%           42%           44%
hit_R        59%           61%           63%
hit_N        78%           79%           80%

We see that the hit rate achieved at an institutional cache is about 40%, at a regional cache about 60%, and at the national cache about 80%. The values for the institutional and regional caches are quite similar to those reported for many real caches [23][8]. However, the hit rate at the national cache is higher than in a real caching architecture.
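Hit rates of the kind shown in Table I can be reproduced by combining the Zipf model of Section II-B with the distribution of L from Section III-A. The parameters below are deliberately smaller than the paper's (fewer documents, an assumed per-cache rate), so the absolute numbers will not match the table; the sketch only shows the mechanics of the computation.

```python
# hit_l = sum_i (lam_{I,i} / lam_I) * P(L <= l), with Zipf popularities and
# P(L >= l) from the closed form (3). Illustrative parameters throughout.
import math

N, alpha, O, H = 100000, 0.8, 4, 3
lam_I, Delta = 0.5, 7 * 86400.0          # assumed per-cache rate, 7-day update period

omega = 1.0 / sum(1.0 / i**alpha for i in range(1, N + 1))   # Zipf normalization

def p_ge(l, lam_i):
    # P(L >= l) from equation (3)
    r = O**l * lam_i * Delta
    return (1 - math.exp(-r)) / r

def hit(l):
    # fraction of requests satisfied within the subtree rooted at level l
    s = 0.0
    for i in range(1, N + 1):
        lam_i = lam_I * omega / i**alpha
        s += (lam_i / lam_I) * (1 - p_ge(l + 1, lam_i))
    return s
```

Whatever the parameter values, the hit rate must increase with the level, hit_0 < hit_H < hit_2H, since each larger subtree aggregates more requests, which is the qualitative pattern of Table I.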
If we consider heterogeneous client communities at the national cache, i.e., α = 0.64, we obtain hit rates at the national cache of about 70%, which are closer to the real ones.

A. Connection Time

The connection time is the first part of the perceived latency when retrieving a document and depends on the network distance to the document. In Figure 3 we show the connection time for hierarchical and distributed caching for a generic document with popularity λ_tot. (To consider a generic document, we drop the sub-index i and use λ_tot instead of λ_{tot,i}.) We only consider an update period Δ = 24 hours; if we considered larger update periods, documents would receive more requests per period and the curves for hierarchical and distributed caching would both be displaced towards the left of the x-axis, but the outcome of the comparison would remain the same.

Fig. 3. Expected connection time E[T_c] for hierarchical and distributed caching depending on the document's popularity λ_tot. Δ = 24 hours, d = 5 msec.

First, we observe that for a very unpopular document (small λ_tot) both hierarchical and distributed caching experience high connection times because the request needs to travel to the origin server. As the number of requests for a document increases, the average connection time decreases, since there is a higher probability to hit the document at caches closer than the origin server. For all documents whose λ_tot ranges between one request per day and one request per minute, a hierarchical caching scheme gives shorter connection times than a distributed caching scheme: document copies placed at the regional and national caches in a caching hierarchy reduce the expected network distance to hit a document.

When the document is very popular, a distributed caching scheme can benefit from copies stored at close neighbor caches, reducing the connection time. In a hierarchical caching scheme, however, the document still has to be fetched from the regional cache in case of a miss at the institutional cache. Thus, for very popular documents, the distributed scheme has shorter connection times than the hierarchical scheme. However, since the probability that a very popular document is not hit at the local institutional cache is very small, the benefit of distributed caching is hardly appreciable.

B. Transmission Time

The second part of the overall latency is the time it takes to transmit the Web document. To calculate the transmission time, we first show the distribution of the traffic generated by distributed caching, λ_l^d, and hierarchical caching, λ_l^h, at every network level (Figure 4).

Fig. 4. Network traffic generated by hierarchical and distributed caching at every tree level. Δ = 24 hours.

We observe that distributed caching practically doubles the bandwidth used on the lower levels of the network tree and uses more bandwidth in most of the national network. However, the traffic on the most congested links, around the root node of the national network, is reduced to half. Distributed caching uses all possible network shortcuts between institutional caches, generating more traffic in the less congested low network levels.

Having presented the traffic generated at every network level, we calculate the transmission time for two different scenarios: i) the national network is not congested, and ii) the national network is highly congested. We set the institutional network capacity to C_I = 100 Mbps and consider the same network link capacities in the regional and national networks, C_N = C_R. We do not fix the network link capacities in the regional or national networks to certain values, but consider only the degree of congestion under which these links operate (i.e., we vary the utilization ρ = λ_{2H}^h S̃ / C_N of the top national network links in the hierarchical caching scheme). The international path is always very congested, with a utilization of λ_I O^{2H} (1 − hit_N) S̃ / C = 0.95.

Fig. 5. Expected transmission time E[T_t] for hierarchical and distributed caching. Δ = 24 hours. (a) Not-congested national network, ρ = 0.3. (b) Congested national network, ρ = 0.8.

In Figure 5(a) we show the transmission time for the case where the national network is not congested (ρ = 0.3). The only bottleneck on the path from the client to the

origin server is the international path. We observe that the performance of hierarchical and distributed caching is very similar because there are no highly congested links. In Figure 5(b) we contrast the performance of hierarchical and distributed caching in a more realistic scenario where the national network is congested, ρ = 0.8. We observe that both hierarchical and distributed caching have higher transmission times than in the case where the national network is not congested (Figure 5(a)). However, the increase in the transmission time is much higher for hierarchical caching than for distributed caching. Distributed caching gives shorter transmission times than hierarchical caching because many requests travel through lower network levels. Similar results are obtained when the access link of the national cache is very congested: in this situation the transmission time in distributed caching remains unchanged, while the transmission time in hierarchical caching increases considerably [2].

C. Total Latency

The total latency is the sum of the connection time and the transmission time. For large documents, the transmission time dominates; for small documents, the transmission time is very small and the connection time is more relevant. Next, we present the total latency for a single document of varying size S in both hierarchical and distributed caching (recall that S̃ denotes the average document size). We study the total latency in the scenario where the top nodes of the national network are highly congested.

Fig. 6. Average total latency depending on the document size S. The national network is congested, ρ = 0.8.

In Figure 6 we observe that hierarchical caching gives lower latencies for documents smaller than 2 KBytes because hierarchical caching has lower connection times than distributed caching. However, distributed caching gives lower latencies for larger documents because distributed caching has lower transmission times than hierarchical caching. The size threshold depends on the degree of congestion in the national network.
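The existence of a crossover size follows directly from the two latency components: hierarchical caching wins on the size-independent connection time, distributed caching on the per-byte transmission time. The numbers below are made up purely to illustrate the trade-off; they are not taken from the paper's figures.

```python
# Illustrative total-latency crossover between the two schemes.
def total(tc_ms, per_kb_ms, size_kb):
    # total latency = connection time + size-proportional transmission time
    return tc_ms + per_kb_ms * size_kb

tc_h, tc_d = 40.0, 70.0    # ms: hierarchical connects faster (assumed values)
tx_h, tx_d = 2.0, 1.2      # ms per KB: distributed transmits faster (assumed values)

# document size at which the two total latencies are equal
crossover_kb = (tc_d - tc_h) / (tx_h - tx_d)
```

Below the crossover the hierarchical scheme wins, above it the distributed scheme wins; raising the congestion of the national network increases tx_h relative to tx_d and therefore pushes the crossover towards smaller documents, as the text observes.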
The higher the congestion, the lower is the size-threshold from which distributed caching has lower latencies than hierarchical caching. D. Bandwidth Usage Next, we consider the bandwidth usage in the regional and national network for both caching architectures. Note that the bandwidth used on the international path is the same for both architectures, since the number of documents hit in the global caching architecture is the same for both schemes. We calculate the bandwidth usage as the expected number of links traversed to distribute one packet to the clients. Figure 7(a) shows the bandwidth usage of hierarchical and distributed caching in the regional network. We see that distributed caching uses more bandwidth than hierarchical caching in the regional network, since many documents are retrieved from distant institutional caches, traversing many links. However, recall that the network traffic is better distributed, with most of the bandwidth being used on lower, less-congested levels of the network tree. Hierarchical caching uses fewer links in the regional network since documents are hit in the regional cache, which is at a closer network distance than many institutional caches. A similar behavior occurs in the national network (Figure 7(b)); distributed caching uses more bandwidth than hierarchical caching. However, the bandwidth usage of distributed caching is lower in the national network than in the regional network, since many institutional caches can be reached by traversing only few national network links. Fig. 7. Bandwidth usage on the regional and national network, depending on the document request rate. Δ = 24 hours. E. Cache Load In order to estimate the load on the caches, we calculate the filtered request rate at every cache level for hierarchical and distributed caching. For a caching hierarchy, the filtered request rates are given by λ^h_{I,i} = λ_{I,i}, λ^h_{R,i} = λ_{I,i} (1 − P(L_I)) O^H, λ^h_{N,i} = λ_{I,i} (1 − P(L_H)) O^{2H}. In distributed caching, the request rate λ^d_{I,i} at the institutional cache is given by the local request rate λ_{I,i} at the institutional cache, plus the external request rate from other cooperating institutional caches. The external request rate from other cooperating institutional caches can be calculated using the percentage of requests that are not hit locally in an institutional cache and that are hit in any other institutional cache, (P(L_{2H}) − P(L_I)). Recall that the percentage of requests that are satisfied among the O^{2H} institutional caches is shared between all the institutional caches, thus reducing the external request rate that a single institutional cache needs to handle. The load at an institutional cache in distributed caching is given by: λ^d_{I,i} = λ_{I,i} + (P(L_{2H}) − P(L_I)) λ_{I,i}. Fig. 8. Document request rate at the different caches with respect to the institutional request rate λ_I. Δ = 24 hours. In Figure 8, we calculate the load in the caches for both caching architectures, λ^h_{l,i} and λ^d_{I,i} respectively. We see that the request rates at the regional and national cache of a caching hierarchy are very high. For popular documents, the request rate at the national and regional caches is reduced since these documents are already hit at lower cache levels. With distributed caching, the request rate at the institutional cache is increased due to the external request rate; however, this increase is not very significant since the external request rate is shared among all cooperating caches. Considering all N documents with different popularities, we can estimate the total incoming request rates at every cache level. In Table II we show the load at every cache level of a caching hierarchy for different network topologies.

TABLE II
TOTAL REQUEST RATE AT EVERY HIERARCHY LEVEL FOR DIFFERENT NETWORK TOPOLOGIES, Δ = 5 DAYS.

            institutional   regional    national
O=4, H=3    0.2 req/s       6.4 req/s   22 req/s
O=3, H=3    0.2 req/s       2.7 req/s   25 req/s

Again, we see that upper level caches need to support much higher request rates. For the situation where every cache has a smaller number of children caches, i.e., 27 children caches (O = 3, H = 3), the processing power required at high level caches is reduced, even though it is still quite high.
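To make the request-rate filtering concrete, the sketch below evaluates the expressions above for both architectures. The hit probabilities P(L_I), P(L_H), P(L_{2H}) and the local request rate are assumed values chosen for illustration, not measurements from the paper.

```python
# Sketch of the filtered request rates. The hit probabilities and the
# local rate below are illustrative assumptions.

def hierarchical_rates(lam_I, O, H, P_LI, P_LH):
    """Per-document request rates at each tier of a caching hierarchy
    with nodal outdegree O and H hops between cache levels, i.e. O**H
    institutional caches per regional cache and O**(2*H) per national
    cache."""
    lam_h_I = lam_I
    lam_h_R = lam_I * (1 - P_LI) * O**H        # misses of O**H institutional caches
    lam_h_N = lam_I * (1 - P_LH) * O**(2 * H)  # misses filtered by two levels
    return lam_h_I, lam_h_R, lam_h_N

def distributed_rate(lam_I, P_LI, P_L2H):
    """Institutional-cache rate in distributed caching: local requests
    plus the per-cache share of the external requests hit among the
    cooperating institutional caches."""
    return lam_I + (P_L2H - P_LI) * lam_I

lam_I, O, H = 0.2, 4, 3  # assumed local rate; topology O=4, H=3
print(hierarchical_rates(lam_I, O, H, P_LI=0.3, P_LH=0.5))
print(distributed_rate(lam_I, P_LI=0.3, P_L2H=0.6))
```

Even with modest hit probabilities, the national cache sees a request rate several orders of magnitude above the institutional rate, while the distributed scheme only adds a small external load per cache.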

F. Disk Space In this section we analyze the disk requirements of hierarchical and distributed caching. In a caching hierarchy, a copy of every requested document is stored at each level of the hierarchy. In distributed caching, only local caches keep copies of the requested documents. In order to compare the disk space required in both caching schemes, we calculate the expected disk space E[D_h] needed to store one document in the caching hierarchy with a given request rate. We calculate E[D_h] as the average Web document size S̃ times the average number of copies present in the caching infrastructure. The average number of copies in the caching infrastructure can be calculated using the probability that a new document copy is created at every cache level. Thus, the average disk space needed to store one document in hierarchical caching is given by E[D_h] = S̃ O^{2H} (1 − e^{−λ^h_{I,i} Δ}) + S̃ O^H (1 − e^{−λ^h_{R,i} Δ}) + S̃ (1 − e^{−λ^h_{N,i} Δ}), and in distributed caching is given by E[D_d] = S̃ O^{2H} (1 − e^{−λ^d_{I,i} Δ}). Fig. 9. Hierarchical to distributed disk space ratio E[D_h]/E[D_d] for one document, depending on the document request rate. Figure 9 shows the ratio of the disk space used by hierarchical to distributed caching for different document popularities. We see that hierarchical caching uses three times as much disk space for unpopular documents, since document copies are present at the three levels of the caching hierarchy instead of being only at the institutional caches. However, for popular documents the amount of disk space used in hierarchical and distributed caching is very similar. Given the distribution of document requests at level l for hierarchical caching, λ^h_{l,i}, and the average Web document size S̃, we can derive the infinite cache size D^h_{l,∞} at every hierarchy level, i.e., the disk space needed to keep a copy of every requested document while the document is up-to-date. For distributed caching, the infinite disk space is the same as the infinite disk space needed at the institutional caches of a caching hierarchy, D^h_{I,∞}. Since documents expire after Δ time units, to have an infinite disk space, a cache needs to keep all requested documents during the period Δ where they are up-to-date. Squid recommends an average period of 3 days for a document to stay in the cache [25]. The infinite cache size for a cache on level l is given by D^h_{l,∞} = S̃ Σ_{i=1}^{N} (1 − e^{−λ^h_{l,i} Δ}). To model different client communities with different Zipf parameters α and a generic request rate λ_l, we calculate the request rate λ^h_{l,i} for document i at level l following a Zipf distribution with parameter α. Table III presents the infinite cache space for different request rates λ^h_l, different Zipf parameters α, and different update periods Δ.

TABLE III
INFINITE CACHE SIZE AT DIFFERENT LEVELS OF A CACHING HIERARCHY. AVERAGE DOCUMENT SIZE S̃.

λ^h_l, α              Δ = 5 days   Δ = days    Δ = 5 days
req/s, α =            GB           7 GB        GB
2 req/s, α = .64      GB           58 GB       369 GB
req/s, α =            GB           558 GB      759 GB

We have chosen values of α and λ^h_l that can easily correspond to those of an institutional, a regional, and a national cache of a caching hierarchy [4] [8] []. As the update period Δ increases, the disk requirements also increase, since the probability that a document is requested in a period Δ increases. In the limit, for an infinite update period and a large enough client population, the cache would need a disk space equal to N · S̃ KBytes to store all N requested documents. From Table III we see that a cache with a request rate equal to req/sec needs about GBytes of disk space to keep all requested documents during 5 days. As the request rate increases, there are more documents that are requested during a period Δ and the needed

disk space increases as well. For a cache with a request rate equal to req/sec, the disk space required to store all documents is about several hundreds of GBytes. These analytical values are very close to those reported in different trace-driven simulations with similar parameters λ^h_l, α, and Δ [4] [] [8]. As we have seen in this section, the performance requirements of high level caches concerning disk space and cache load are very high, which makes it very difficult to deploy a centralized top-level cache. As a result, many top-level caches use several machines to distribute their load and disk space. These machines cooperate in a distributed fashion [], using some of the existing cache sharing schemes [24] [9] [9]. Distributed caching can decrease the retrieval latency of large documents and reduce the bandwidth usage at high network levels. However, a full deployment of distributed caching encounters several problems. When a document is fetched from a neighbor institutional cache, the experienced latency depends not only on the bandwidth of the requesting institutional cache, but also on the bandwidth of the neighbor cache that is contacted. Thus, in a distributed caching scheme, investing in higher capacity links will not result in any benefit when documents are hit in neighbor caches with smaller connection capacities. In order to increase the local hit rates, local ISPs could increase the disk space of their caches and thus store more documents. However, in distributed caching, the more documents a given institutional cache stores, the higher the number of external requests it will receive from other neighbor institutional caches. Thus, investing in larger disks to save bandwidth and reduce latency can eventually result in more incoming traffic and longer queuing delays at the local cache. Additionally, distributed caching has longer connection times and higher bandwidth usage in the low network levels.
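The disk-space expressions of Section F can be evaluated numerically. The sketch below implements E[D_h] and E[D_d] directly; the document size, update period, and per-document request rates are assumed values chosen only to illustrate the popular/unpopular contrast.

```python
import math

# Numeric sketch of the per-document disk-space formulas. All numeric
# inputs below are illustrative assumptions.

def disk_hierarchical(S_kb, O, H, lam_h_I, lam_h_R, lam_h_N, delta):
    """Expected storage for one document across the three tiers of a
    hierarchy: a copy exists at a cache with prob. 1 - e^(-lambda*delta),
    and there are O**(2H) institutional, O**H regional, 1 national cache."""
    return S_kb * (O**(2 * H) * (1 - math.exp(-lam_h_I * delta))
                   + O**H * (1 - math.exp(-lam_h_R * delta))
                   + (1 - math.exp(-lam_h_N * delta)))

def disk_distributed(S_kb, O, H, lam_d_I, delta):
    """Distributed caching keeps copies only at institutional caches."""
    return S_kb * O**(2 * H) * (1 - math.exp(-lam_d_I * delta))

args = dict(S_kb=10.0, O=4, H=3, delta=24 * 3600)  # assumed S~ and Delta

# Unpopular document (tiny per-cache rate): the hierarchy stores extra
# copies at the regional and national tiers, so its ratio exceeds 1.
dh = disk_hierarchical(lam_h_I=1e-7, lam_h_R=1e-5, lam_h_N=1e-3, **args)
dd = disk_distributed(lam_d_I=1e-7, **args)
print(dh / dd)  # > 1: hierarchical needs more disk for unpopular documents
```

For popular documents all the exponential terms approach 1, so the ratio tends to (O^{2H} + O^H + 1)/O^{2H} ≈ 1, matching the observation that the two schemes use similar disk space for popular content.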
Nevertheless, distributed caching can be used on a smaller scale where caches are interconnected over short distances with plentiful bandwidth, i.e., among the caches on a campus or in a metropolitan area. V. A HYBRID CACHING SCHEME In this section we consider a hybrid caching scheme where a certain number of caches k cooperate at every network level of a caching hierarchy. With hybrid caching, when a document cannot be found in a cache, the cache checks whether the document can be found in any of the cooperating caches. If multiple caches happen to have a document copy, the neighbor/parent cache that provides the lowest latency is selected. If the document is not found among the caches at a given level, the request is then forwarded to the immediate parent cache or to the origin server. Next, we study the optimal number of caches that should cooperate at every network level to minimize the connection time, the transmission time, and the total retrieval latency. The analysis of the hybrid scheme is very similar to the one already presented for hierarchical and distributed caching. Due to space limitations we omit the detailed analysis for hybrid caching, which can be found in [2]. For hybrid caching, the probability that a document is found among k cooperating institutional caches can be calculated using Formula 3, replacing the O_l institutional caches by k cooperating caches. At the regional level, to calculate the probability that a document is found among k cooperating regional caches, O_l should be replaced by O^H k. A. Connection Time Next, we present the connection time in a hybrid scheme depending on the number of cooperating caches k at every level. The number of cooperating caches at every cache level can range from 1 (no cooperation) to O^H = 64 (all neighbor caches in the same cache level cooperate). In Figure 10, we show the average connection time for all N Web documents, depending on the number k of cooperating caches. Fig. 10. Connection time depending on the number of cooperating caches at every cache level in a hybrid scheme. We observe that when the number of cooperating caches is very small, the connection time is high. The probability that a document is found in a few close neighbor caches is very small, thus most of the requests are satisfied by the parent cache at a distance of H hops. When the number of cooperating caches increases, the connection time decreases up to a minimum. This is due to the fact that the probability to hit a document at neighbor caches,

which are closer than the parent cache increases. However, when the number of cooperating caches increases over a certain threshold k_c = 4, the connection time increases very fast because documents are requested from cooperating caches at distant network levels. There is, therefore, an optimum number of caches that should cooperate at every cache level to minimize the connection time. The optimum number of cooperating caches k_c that minimizes the connection time is given by the number of caches that are at closer network distances than the parent cache, k_c = O^{⌊H/2⌋}. In Figure 11, we present the connection time for a hybrid scheme with the optimum number of cooperating caches k_c, hierarchical caching, and distributed caching. We observe that a hybrid scheme with k_c cooperating caches has lower connection times than distributed caching, and even lower connection times than hierarchical caching for a large range of document popularities. Fig. 11. Connection time in a hybrid caching scheme with the optimal number of cooperating caches k_c. B. Transmission Time Next, we analyze the transmission time in a hybrid scheme and calculate the optimum number k_t of cooperating caches at every network level that minimizes the transmission time. In Figure 12 we plot the transmission time in a hybrid scheme for all N Web documents, depending on the number of cooperating caches at every cache level. We consider the case where the top links of the national network are not congested (ρ = 0.3), that is, the only bottleneck in the path from the origin server to the client is the international path, and the case where the top links of the national network are congested (ρ = 0.8). Similar results are also obtained for the case where the access link of the national cache is congested [2]. Fig. 12. Average transmission time depending on the number of cooperating caches in a hybrid scheme. National network not congested, ρ = 0.3, and congested, ρ = 0.8. We observe that for the case where the national network is not congested, varying the number of cooperating caches k at every cache level hardly influences the transmission time. However, when the national network is congested, the transmission time strongly depends on the number of cooperating caches at every cache level. If the number of cooperating caches is very small, there is a low probability that the document can be retrieved from close neighbor caches. Since the parent cache contains all documents requested by its children caches, there is a higher probability that the document is retrieved from the parent cache through the highly congested top-level links. As the number of cooperating caches increases, the probability to hit the document at close neighbor caches connected by fast links increases, and thus the transmission times are lower. If the number of cooperating caches increases over a certain threshold k_t = 16, the transmission time increases again because documents are hit in distant neighbor caches which are connected through the highly congested top-level links. The optimum number of cooperating caches k_t that minimizes the experienced transmission time depends on the number of cooperating caches that can be reached while avoiding the congested links. In the case where the top-level links of the national network are congested, the optimum number of cooperating caches at every cache level is k_t = 16. This value corresponds to the number of regional caches that can cooperate without traversing the national top-level links, k_t = O^{H−1}. In Figure 13, we present the transmission time E[T_t] for a hybrid scheme with the optimum number of cooperating caches k_t, hierarchical caching, and distributed caching. We observe that a hybrid scheme with k_t cooperating caches has lower transmission times than hierarchical caching. We also observe that a hybrid scheme has even lower transmission times than distributed caching, since it reduces the traffic around the high network levels even more [2]. Fig. 13. Transmission time for a hybrid caching scheme with the optimal number of cooperating caches k_t. National network is congested, ρ = 0.8. Thus, by dynamically choosing the number of cooperating caches, a hybrid scheme can have connection times as low as hierarchical caching, and transmission times as low as distributed caching. C. Total Latency Depending on the document size there is an optimum number of caches that should cooperate at every cache level to minimize the total latency. For small documents the optimum number of cooperating caches is close to k_c, since choosing k_c cooperating caches minimizes the connection time. For large documents the optimum number of cooperating caches is close to k_t, since choosing k_t cooperating caches minimizes the transmission time. For any document size, the optimum number of cooperating caches k_opt that minimizes the total retrieval latency is such that k_c ≤ k_opt ≤ k_t. In Figure 14 we plot the optimum number of caches k_opt that should cooperate at every network level to minimize the total retrieval latency, depending on the document size. We choose the case where the top-level links of the national network are highly congested, thus the optimum number of caches that minimizes the transmission time is k_t = 16. In Figure 14 we observe that k_opt ranges between k_c = 4 and k_t = 16. For documents smaller than several KBytes, only k_c = 4 caches should cooperate at every cache level. For documents larger than several hundreds of KBytes, k_t = 16 caches should cooperate at every cache level. Fig. 14. Optimum number of cooperating caches k_opt, depending on the document size S. National network is congested, ρ = 0.8. In Figure 15 we show the total retrieval latency for a large document (S = 2 KB) and the optimal number of cooperating caches k_opt = k_t = 16. Fig. 15. Total latency to retrieve a large document in a hybrid caching scheme with the optimum number of cooperating caches k_opt = k_t = 16. National network is congested, ρ = 0.8. S = 2 KB. We see that a hybrid scheme with the optimal number of cooperating caches at every cache level has lower overall retrieval latencies than hierarchical and distributed caching for a large range of

document popularities. D. Bandwidth Having caches cooperate at every cache level not only influences the latency experienced by the clients, but also modifies the bandwidth usage inside the network. In Figures 16(a) and 16(b), we compare the bandwidth usage of a hybrid scheme for different degrees of cooperation with the bandwidth usage of hierarchical and distributed caching. Fig. 16. Bandwidth usage on the national and regional network for the hierarchical, distributed, and hybrid caching schemes. We see that for the national and the regional network, a hybrid architecture uses less bandwidth than a distributed scheme. For the case where k = k_c = 4 caches cooperate at every level, the bandwidth usage of a hybrid architecture is even lower than the bandwidth usage of a caching hierarchy, since k_c = 4 cooperating caches minimize the connection time in terms of network distance. When the top-level links are congested, k = k_t = 16 caches should cooperate to minimize the transmission time. However, in this situation the bandwidth usage increases slightly compared to a pure hierarchical configuration, since distant caches may be contacted to reduce the transmission time. E. Cache Load A hybrid architecture redistributes the load at every cache level. Institutional and regional caches receive more requests from neighboring cooperating caches, and the national cache receives fewer requests since more requests are satisfied among cooperating caches. The average request rates at the different hierarchy levels can be calculated as the aggregate request rate not hit in the children caches plus the external requests coming from the cooperating caches. Fig. 17. Document request rate at the different caches with respect to the institutional request rate λ_I for hierarchical caching and for hybrid caching with k = k_c = 4 cooperating caches (crossed lines). Δ = 24 hours. Figure 17 shows the load at every cache level for hierarchical and hybrid caching with k = 4. We see that a hybrid scheme clearly reduces the request rate for popular documents at the national and regional caches compared to hierarchical caching. The load at the regional and institutional caches increases slightly, since every cooperating cache only serves a part of all the requests hit among the k cooperating caches. For higher number of cooperating


More information

Parallel-Access for Mirror Sites in the Internet

Parallel-Access for Mirror Sites in the Internet 1 -Access for Mirror Sites in the Internet Pablo Rodriguez Andreas Kirpal Ernst W. Biersack Institut EURECOM 2229, route des Crêtes. BP 193 694, Sophia Antipolis Cedex, FRANCE frodrigue, kirpal, erbig@eurecom.fr

More information

Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet 1

Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet 1 Beyond ierarchies: esign Considerations for istributed Caching on the Internet 1 Renu Tewari, Michael ahlin, arrick M. Vin, and Jonathan S. Kay epartment of Computer Sciences The University of Texas at

More information

Seminar on. By Sai Rahul Reddy P. 2/2/2005 Web Caching 1

Seminar on. By Sai Rahul Reddy P. 2/2/2005 Web Caching 1 Seminar on By Sai Rahul Reddy P 2/2/2005 Web Caching 1 Topics covered 1. Why Caching 2. Advantages of Caching 3. Disadvantages of Caching 4. Cache-Control HTTP Headers 5. Proxy Caching 6. Caching architectures

More information

Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks

Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks RESEARCH Open Access Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks Kaouther Abrougui 1,2*, Azzedine Boukerche 1,2 and Hussam Ramadan 3 Abstract

More information

2.993: Principles of Internet Computing Quiz 1. Network

2.993: Principles of Internet Computing Quiz 1. Network 2.993: Principles of Internet Computing Quiz 1 2 3:30 pm, March 18 Spring 1999 Host A Host B Network 1. TCP Flow Control Hosts A, at MIT, and B, at Stanford are communicating to each other via links connected

More information

Routing Basics ISP/IXP Workshops

Routing Basics ISP/IXP Workshops Routing Basics ISP/IXP Workshops 1 Routing Concepts IPv4 Routing Forwarding Some definitions Policy options Routing Protocols 2 IPv4 Internet uses IPv4 addresses are 32 bits long range from 1.0.0.0 to

More information

Internet Architecture & Performance. What s the Internet: nuts and bolts view

Internet Architecture & Performance. What s the Internet: nuts and bolts view Internet Architecture & Performance Internet, Connection, Protocols, Performance measurements What s the Internet: nuts and bolts view millions of connected computing devices: hosts, end systems pc s workstations,

More information

A New Architecture for HTTP Proxies Using Workstation Caches

A New Architecture for HTTP Proxies Using Workstation Caches A New Architecture for HTTP Proxies Using Workstation Caches Author Names : Prabhakar T V (tvprabs@cedt.iisc.ernet.in) R. Venkatesha Prasad (vprasad@cedt.iisc.ernet.in) Kartik M (kartik_m@msn.com) Kiran

More information

Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers

Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers ABSTRACT Jing Fu KTH, Royal Institute of Technology Stockholm, Sweden jing@kth.se Virtual routers are a promising

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

RED behavior with different packet sizes

RED behavior with different packet sizes RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,

More information

70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET

70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET 70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET one of these packets arrives to a packet switch, what information in the packet does the switch use to determine the link onto which the packet is forwarded?

More information

Sustainable traffic growth in LTE network

Sustainable traffic growth in LTE network Sustainable traffic growth in LTE network Analysis of spectrum and carrier requirements Nokia white paper Sustainable traffic growth in LTE network Contents Executive summary 3 Introduction 4 Capacity

More information

FINAL Tuesday, 20 th May 2008

FINAL Tuesday, 20 th May 2008 Data Communication & Networks FINAL Exam (Spring 2008) Page 1 / 23 Data Communication & Networks Spring 2008 Semester FINAL Tuesday, 20 th May 2008 Total Time: 180 Minutes Total Marks: 100 Roll Number

More information

Performance Modeling and Evaluation of Web Systems with Proxy Caching

Performance Modeling and Evaluation of Web Systems with Proxy Caching Performance Modeling and Evaluation of Web Systems with Proxy Caching Yasuyuki FUJITA, Masayuki MURATA and Hideo MIYAHARA a a Department of Infomatics and Mathematical Science Graduate School of Engineering

More information

More on Testing and Large Scale Web Apps

More on Testing and Large Scale Web Apps More on Testing and Large Scale Web Apps Testing Functionality Tests - Unit tests: E.g. Mocha - Integration tests - End-to-end - E.g. Selenium - HTML CSS validation - forms and form validation - cookies

More information

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems Tiejian Luo tjluo@ucas.ac.cn Zhu Wang wangzhubj@gmail.com Xiang Wang wangxiang11@mails.ucas.ac.cn ABSTRACT The indexing

More information

Today: World Wide Web! Traditional Web-Based Systems!

Today: World Wide Web! Traditional Web-Based Systems! Today: World Wide Web! WWW principles Case Study: web caching as an illustrative example Invalidate versus updates Push versus Pull Cooperation between replicas Lecture 22, page 1 Traditional Web-Based

More information

Cisco Virtualized Workload Mobility Introduction

Cisco Virtualized Workload Mobility Introduction CHAPTER 1 The ability to move workloads between physical locations within the virtualized Data Center (one or more physical Data Centers used to share IT assets and resources) has been a goal of progressive

More information

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

More information

Last Class: Consistency Models. Today: Implementation Issues

Last Class: Consistency Models. Today: Implementation Issues Last Class: Consistency Models Need for replication Data-centric consistency Strict, linearizable, sequential, causal, FIFO Lecture 15, page 1 Today: Implementation Issues Replica placement Use web caching

More information

Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network

Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network Jian Ni, Danny H. K. Tsang, Ivan S. H. Yeung, Xiaojun Hei Department of Electrical & Electronic Engineering Hong Kong University

More information

Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge of Applications and Network

Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge of Applications and Network International Journal of Information and Computer Science (IJICS) Volume 5, 2016 doi: 10.14355/ijics.2016.05.002 www.iji-cs.org Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge

More information

Chapter 4 NETWORK HARDWARE

Chapter 4 NETWORK HARDWARE Chapter 4 NETWORK HARDWARE 1 Network Devices As Organizations grow, so do their networks Growth in number of users Geographical Growth Network Devices : Are products used to expand or connect networks.

More information

Network Load Balancing Methods: Experimental Comparisons and Improvement

Network Load Balancing Methods: Experimental Comparisons and Improvement Network Load Balancing Methods: Experimental Comparisons and Improvement Abstract Load balancing algorithms play critical roles in systems where the workload has to be distributed across multiple resources,

More information

Interdomain Routing Design for MobilityFirst

Interdomain Routing Design for MobilityFirst Interdomain Routing Design for MobilityFirst October 6, 2011 Z. Morley Mao, University of Michigan In collaboration with Mike Reiter s group 1 Interdomain routing design requirements Mobility support Network

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Ewa Kusmierek Yingfei Dong David Du Poznan Supercomputing and Dept. of Electrical Engineering Dept. of Computer Science Networking Center

More information

Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems

Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems Virgilio ili A. F. Almeida 1 o Semestre de 2009 Introdução: Semana 5 Computer Science

More information

CS 421: COMPUTER NETWORKS SPRING FINAL May 21, minutes

CS 421: COMPUTER NETWORKS SPRING FINAL May 21, minutes CS 421: COMPUTER NETWORKS SPRING 2015 FINAL May 21, 2015 150 minutes Name: Student No: Show all your work very clearly. Partial credits will only be given if you carefully state your answer with a reasonable

More information

Testing & Assuring Mobile End User Experience Before Production Neotys

Testing & Assuring Mobile End User Experience Before Production Neotys Testing & Assuring Mobile End User Experience Before Production Neotys Henrik Rexed Agenda Introduction The challenges Best practices NeoLoad mobile capabilities Mobile devices are used more and more At

More information

There are 10 questions in total. Please write your SID on each page.

There are 10 questions in total. Please write your SID on each page. Name: SID: Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 to the Final: 5/20/2005 There are 10 questions in total. Please write

More information

CS244a: An Introduction to Computer Networks

CS244a: An Introduction to Computer Networks Do not write in this box MCQ 9: /10 10: /10 11: /20 12: /20 13: /20 14: /20 Total: Name: Student ID #: CS244a Winter 2003 Professor McKeown Campus/SITN-Local/SITN-Remote? CS244a: An Introduction to Computer

More information

A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks

A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks Jing Ai, Damla Turgut, and Ladislau Bölöni Networking and Mobile Computing Research Laboratory (NetMoC) Department of Electrical

More information

Simulation Studies of the Basic Packet Routing Problem

Simulation Studies of the Basic Packet Routing Problem Simulation Studies of the Basic Packet Routing Problem Author: Elena Sirén 48314u Supervisor: Pasi Lassila February 6, 2001 1 Abstract In this paper the simulation of a basic packet routing problem using

More information

Saumitra M. Das Himabindu Pucha Y. Charlie Hu. TR-ECE April 1, 2006

Saumitra M. Das Himabindu Pucha Y. Charlie Hu. TR-ECE April 1, 2006 Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Saumitra M. Das Himabindu Pucha Y. Charlie Hu TR-ECE-6-7 April 1, 26 School of Electrical and Computer Engineering

More information

Performance Modeling

Performance Modeling Performance Modeling EECS 489 Computer Networks http://www.eecs.umich.edu/~zmao/eecs489 Z. Morley Mao Tuesday Sept 14, 2004 Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica 1 Administrivia

More information

Distributed File Systems Part II. Distributed File System Implementation

Distributed File Systems Part II. Distributed File System Implementation s Part II Daniel A. Menascé Implementation File Usage Patterns File System Structure Caching Replication Example: NFS 1 Implementation: File Usage Patterns Static Measurements: - distribution of file size,

More information

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 )

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) Abstract Traditionally, Content Delivery Networks (CDNs)

More information

Congestion control in TCP

Congestion control in TCP Congestion control in TCP If the transport entities on many machines send too many packets into the network too quickly, the network will become congested, with performance degraded as packets are delayed

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016

Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016 Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016 Answer TWO questions from Part ONE on the answer booklet containing lined writing paper, and answer ALL questions in Part TWO on the multiple-choice

More information

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < : A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks Visvasuresh Victor Govindaswamy,

More information

Computer Science 461 Midterm Exam March 14, :00-10:50am

Computer Science 461 Midterm Exam March 14, :00-10:50am NAME: Login name: Computer Science 461 Midterm Exam March 14, 2012 10:00-10:50am This test has seven (7) questions, each worth ten points. Put your name on every page, and write out and sign the Honor

More information

Design, Implementation and Evaluation of. Resource Management System for Internet Servers

Design, Implementation and Evaluation of. Resource Management System for Internet Servers Design, Implementation and Evaluation of Resource Management System for Internet Servers Kazuhiro Azuma, Takuya Okamoto, Go Hasegawa and Masayuki Murata Graduate School of Information Science and Technology,

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - B COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - A COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks

Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Purdue University Purdue e-pubs ECE Technical Reports Electrical and Computer Engineering 8-1-26 Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Saumitra

More information

Design, Implementation and Evaluation of Resource Management System for Internet Servers

Design, Implementation and Evaluation of Resource Management System for Internet Servers Design, Implementation and Evaluation of Resource Management System for Internet Servers Paper ID: 193 Total number of pages: 14 Abstract A great deal of research has been devoted to solving the problem

More information

Routing Basics ISP/IXP Workshops

Routing Basics ISP/IXP Workshops Routing Basics ISP/IXP Workshops 1 Routing Concepts IPv4 Routing Forwarding Some definitions Policy options Routing Protocols 2 IPv4 Internet uses IPv4 addresses are 32 bits long range from 1.0.0.0 to

More information

Lecture 5: Performance Analysis I

Lecture 5: Performance Analysis I CS 6323 : Modeling and Inference Lecture 5: Performance Analysis I Prof. Gregory Provan Department of Computer Science University College Cork Slides: Based on M. Yin (Performability Analysis) Overview

More information

The Value of Peering. ISP/IXP Workshops. Last updated 23 rd March 2015

The Value of Peering. ISP/IXP Workshops. Last updated 23 rd March 2015 The Value of Peering ISP/IXP Workshops Last updated 23 rd March 2015 1 The Internet p Internet is made up of ISPs of all shapes and sizes n Some have local coverage (access providers) n Others can provide

More information

NETWORKING research has taken a multipronged approach

NETWORKING research has taken a multipronged approach IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 24, NO. 12, DECEMBER 2006 2313 Performance Preserving Topological Downscaling of Internet-Like Networks Fragkiskos Papadopoulos, Konstantinos Psounis,

More information

Democratizing Content Publication with Coral

Democratizing Content Publication with Coral Democratizing Content Publication with Mike Freedman Eric Freudenthal David Mazières New York University NSDI 2004 A problem Feb 3: Google linked banner to julia fractals Users clicking directed to Australian

More information

HTRC Data API Performance Study

HTRC Data API Performance Study HTRC Data API Performance Study Yiming Sun, Beth Plale, Jiaan Zeng Amazon Indiana University Bloomington {plale, jiaazeng}@cs.indiana.edu Abstract HathiTrust Research Center (HTRC) allows users to access

More information