Web Proxy Cache Replacement: Do's, Don'ts, and Expectations

Peter Triantafillou and Ioannis Aekaterinidis
Department of Computer Engineering and Informatics, University of Patras, Greece

Abstract

Numerous research efforts have produced a large number of algorithms and mechanisms for web proxy caches. In order to build powerful web proxies and understand their performance, one must be able to appreciate the impact and significance of earlier contributions and how they can be integrated. To do this we employ a cache replacement algorithm, 'CSP', which integrates key knowledge from previous work. CSP utilizes the communication Cost to fetch web objects, the objects' Sizes, their Popularities, an auxiliary cache, and a cache admission control algorithm. We study the impact of these components with respect to hit ratio, latency, and bandwidth requirements. Our results show that there are clear performance gains when utilizing the communication cost, the popularity of objects, and the auxiliary cache. In contrast, the size of objects and the admission controller have a negligible performance impact. Our major conclusions, which go against those in related work, are that (i) LRU is preferable to CSP for important parameter values, (ii) accounting for the objects' sizes does not improve latency and/or bandwidth requirements, and (iii) the collaboration of nearby proxies is not very beneficial. Based on these results, we chart the problem solution space, identifying which algorithm is preferable and under which conditions. Finally, we develop a dynamic replacement algorithm that continuously utilizes the best algorithm as the problem-parameter values (e.g., the access distributions) change with time.

1. Introduction

Currently there are billions of data objects on the web. Typically, a fraction of these objects is requested more frequently than the rest. A web proxy server is a computer system that stands between web origin servers and (the browsers of) a community of users/clients. It provides a cache, shared among its client community, that stores web objects so that later requests from these clients for the cached objects can be served from the cache rather than from the remote origin server. Since some objects are requested more frequently, web traffic is reduced considerably, the origin server is substantially off-loaded, and the user-observed latencies may be reduced [11, 24]. Proxies are categorized into forward or reverse proxies (the latter are also called surrogate servers). The former can be found in ISP network backbone nodes, intercepting all web traffic. The latter are found in CDN servers, being contacted only for content requests belonging to origin servers that have specific business agreements with the CDN. Cached content in reverse proxies is under the explicit control of the content provider. Despite these differences, both proxy types perform fundamentally the same caching tasks. Thus, in both types of environments, proxies can prove very beneficial and constitute a key part of the fundamental infrastructure of web-based information systems.

Motivation and Goals of this Research

During the past few years a great wealth of knowledge on efficient web proxy cache replacement has been accumulated. This knowledge is in terms of components that replacement functions must include and of auxiliary algorithms and resources.
Briefly, the accumulated knowledge suggests employing a cache replacement function that takes into account the objects' popularities, sizes, and access costs, favoring small, remote, and popular objects. In addition, auxiliary caches and cache admission control can help further by eliminating unfortunate, eager replacements and by maintaining key statistics. Finally, collaborating nearby proxies can further improve performance. With this work we do not intend to contribute yet another algorithm for web proxy cache replacement. Instead, we wish to (i) study and evaluate the impact of integrating the various policies that have prevailed over the last few years; (ii) chart the problem solution space, identifying which algorithms and mechanisms are preferable under which conditions; and (iii) given the above, develop a dynamic algorithm that monitors the current state and employs the algorithm that has been found to offer the best performance for this state. The rest of the paper is organized as follows. In section 2 we present a brief overview of related work in proxy caching. Section 3 describes the trace-driven, simulation-based performance study. In section 4 we present and analyze the results from our experiments. In section 5 we present a dynamic replacement algorithm. In section 6 we present the concluding remarks of the paper.

2. Related Work

Several architectures have been proposed that are based on web proxy cache collaboration. Hierarchically organized proxy caches (e.g., Harvest, Squid [6,7]) were proposed to improve overall performance. The key idea lies in the cooperation of proxies, as cache misses are served by higher-level caches. [10] proposes a scalable data-location service, called the hint hierarchy, which allows each cache to locate the source (proxy cache or remote server) of each object that requires the minimum number of hops in order to be accessed. In [20] an approach based on the Summary Cache is proposed, which allows data sharing among a large number of caches. CRISP [8,9] is a collaborative cache that is scalable. A design similar to CRISP, for similar environments, but for video data proxy caching, is adopted in the MiddleMan system [16].

Cache replacement schemes for web proxies play a key role in the proxy's performance. They can be categorized as follows: i) Traditional replacement policies and their extensions. The bulk of these are based on LRU. Some replacement algorithms that belong to this category are: Size-adjusted LRU, Least Frequently Used (LFU), SIZE [3], LRU-MIN [3], LRU-THOLD [22], and LRU-K [5]. ii) Key-based policies. The idea is to sort objects based upon a primary key, break ties based on a secondary key, and so on. Such policies are LOG2-SIZE [3] and HYPER-G [3]. iii) Function-based replacement policies. The idea is to employ a general function of several factors. The algorithm decides which object to evict from the cache based on the specific function value associated with each cached object. Such algorithms are: SLRU [1], PSS [1], LNC-R-W3-U [2], LRV [4], Greedy Dual-Size (GDS) [19], and a generalization of Greedy Dual-Size called Greedy Dual* [21].

3. Performance Study

3.1. Study Setup

We have conducted a performance study with the goal of evaluating the impact of several key components and mechanisms of a cache replacement scheme. We concentrate on the performance of the proxy cache; accordingly, we do not take into account delays experienced during the interaction between the clients' browsers and the proxy.

Modeling Communication Cost

In general, the access cost (measured in time units) of accessing an object from a web server includes several components ([23]). The total cost includes DNS resolution times, overheads due to the TCP protocol (e.g., connection establishment times), and (proxy) server RAM access and I/O times. In addition, the access costs depend on link bandwidths and router (processing and buffering) capacities. Since the Internet itself is a collection of different servers, routers, and links with very different performance characteristics, and given the trend of continuous infrastructure improvements (which may also explain why the literature contains conflicting data as to the contribution of some of the above components to overall performance [15]), the task of modeling web object access costs is a formidable one and is outside the scope of this paper. A recent study [15] identified the bottleneck as lying within the Internet itself. So we focus on the communication cost component of the total access cost. We assume the existence of well-configured and efficient DNS, proxies, web servers, and routers, as well as efficient transport protocols (with TCP-state caching).
The majority of researchers take advantage of the latency information included in web traces collected from various sites. After detailed examination of web traces [17], it has been found that the communication latency to retrieve the same object varies significantly even over short time spans (a fact also acknowledged by [19]). This, of course, leads to erroneous decisions when studying and/or comparing the impact of different cache replacement algorithms on mean latency. For our purposes we wish to employ a simple communication cost model that satisfies four requirements: i) it reflects the cost differences when fetching web objects of different sizes, ii) it is parametric and sensitive to the load in the Internet, iii) it reflects the fact that different proxy cache replacement algorithms have different impacts on the Internet load, due to the different hit ratios they achieve, and iv) it reflects the different characteristics of the links traversed in typical scenarios. We believe that a model satisfying these requirements can be simple enough, on the one hand, and also powerful enough to allow the proper evaluation of different replacement algorithms with respect to their latency performance. For the above reasons we have used an M/M/1-based analytical model for estimating latency. Its detailed description is available online [27]. However, we stress that, with respect to our charting of the problem solution space, the results which refer to latency can be replaced with any other results from other researchers, as better answers to the latency modeling problem emerge, or downgraded (especially given that most agree the latency benefits are inferior to those one would hope for [25,26]).
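To make the role of such a model concrete, the following is a minimal sketch of an M/M/1-style latency estimate for a missed object fetched over a shared bottleneck link. It is not the exact model of [27]; the parameter names and the way the proxy's miss traffic feeds into link utilization are our own illustrative assumptions.

```python
# Minimal sketch of an M/M/1-style latency estimate for fetching a missed
# object over a bottleneck link. This is NOT the exact model of Appendix A
# [27]; parameter names and the feedback of miss traffic into utilization
# are illustrative assumptions.

def mm1_fetch_latency(object_size_bytes: float,
                      link_capacity_bps: float,
                      background_load_bps: float,
                      proxy_miss_traffic_bps: float) -> float:
    """Estimated time (seconds) to pull one object through a shared link."""
    # Service rate in objects/sec for an object of this size.
    service_rate = link_capacity_bps / (8.0 * object_size_bytes)
    # Link utilization combines background Internet load with the extra
    # traffic caused by this proxy's cache misses (requirement iii above).
    utilization = (background_load_bps + proxy_miss_traffic_bps) / link_capacity_bps
    if utilization >= 1.0:
        raise ValueError("link is saturated; the M/M/1 formula does not apply")
    # M/M/1 mean response time: W = 1 / (mu - lambda) = 1 / (mu * (1 - rho)).
    return 1.0 / (service_rate * (1.0 - utilization))

# Example: a 20 KB object on a 10 Mbps link that is 60% utilized overall.
print(mm1_fetch_latency(20_000, 10e6, 5e6, 1e6))
```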

Creating Synthetic Workloads

We have used real web traces to drive our proxy servers. However, in this paper we report performance results using the SURGE tool [13] for generating workloads, since it allows us the flexibility to test the sensitivity of our results with respect to different values of system parameters (such as the skewness of the access distribution) found in different traces. At any rate, our results with the SURGE tool are very similar to those we obtained from the real traces. In particular, SURGE's models can be varied to explore expected future demands or other alternative conditions. Each generated trace contains 140,000 requests and corresponds to one-day-long workloads observed in real traces. There are approximately 35,000 unique objects in each trace. We used the following distributions for file size, popularity, and temporal locality. i) Object sizes. These size distributions can be heavy-tailed, meaning that a proxy server must deal with highly variable object sizes. A lognormal distribution was used for the body and a Pareto distribution for the tail. ii) Popularity. This property of the workload reflects the probability of referencing particular objects. The popularity distribution for web objects has been shown to follow Zipf's Law [14], with parameter θ varying from 0 (uniform distribution) to 1 (more skewed). iii) Temporal locality. Based on studies of real web traces, we have chosen to model the distribution of stack distances using the lognormal distribution. In order to study the impact of varying the parameter θ of the Zipf distribution on proxy cache performance, we generated three sets of traces corresponding to three different values of θ: θ=1.0, 0.8, and 0.6. Studies show that expected θ values range from 0.6 to 0.8 [14]. However, related work also considers θ values up to one [12,13].

Modeling Proxy Collaboration

The overall performance can be improved by introducing collaboration among proxy caches. More precisely, in the case of a local cache miss, if a cooperating nearby cache has the object, then it will be retrieved from that cache rather than from the object's distant home site. Cooperating caches share their contents by propagating their directories to all participating caches every T seconds. In order to study the benefits attainable from collaborating proxies, we employed the following model, based on two parameters. First, a percentage of the objects in the traces driving each proxy was common to all proxies; unless stated otherwise, this percentage was set at 50%. Second, the total probability mass of the common objects was varied; in the traces used for this paper, 60% and 80% of all proxy accesses refer to the common objects.

Cache Replacement Mechanisms

In our study we measured and compared the performance of several replacement policies in a cooperative environment. These policies are the well-known LRU, the Cost Size Popularity (CSP) algorithm, CP-SQRT(S), CP-LOG(S), the CP algorithm, and the CS algorithm. A detailed description of how these algorithms work follows after an explanation of the basic mechanisms used.

Caching Gain

For every object in the cache we compute its caching gain, which is a mathematical expression involving (for the case of the CSP replacement algorithm) the size, the popularity, and the communication cost to retrieve the object. Objects with smaller caching gain are more likely candidates for eviction.
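As an illustration of the popularity component of such a synthetic workload, the following sketch draws object references from a Zipf-like distribution with parameter θ; it covers only popularity, while SURGE also models object sizes and stack-distance temporal locality as described above.

```python
import bisect
import random

# Minimal sketch: draw object references from a Zipf-like popularity
# distribution P(rank i) ~ 1 / i^theta, as used for the synthetic traces
# (theta = 0.6, 0.8, 1.0). Object sizes and temporal locality are modeled
# separately and are not shown here.

def make_zipf_sampler(num_objects: int, theta: float):
    weights = [1.0 / (i ** theta) for i in range(1, num_objects + 1)]
    total = sum(weights)
    cumulative = []
    acc = 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    cumulative[-1] = 1.0  # guard against floating-point round-off

    def sample() -> int:
        """Return the rank (1 = most popular) of the next requested object."""
        return bisect.bisect_left(cumulative, random.random()) + 1

    return sample

# Example: a 140,000-request trace over 35,000 unique objects with theta = 0.8.
sample = make_zipf_sampler(35_000, 0.8)
trace = [sample() for _ in range(140_000)]
```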
Popularity Modeling/Estimation

We can approximate an object's popularity, since it is not known in advance, by computing the object's request rate λ_i, which indicates how popular the object is. Since λ_i = 1/MTTR_i, where MTTR_i is the Mean Time To Re-access the object, we can compute MTTR using a well-known technique found in the literature (such as the one in [18]). More precisely, MTTR is computed as a weighted sum of the interarrival times between consecutive accesses.

Admission Control and the Use of an Auxiliary Cache

The main task of an admission control policy is to decide which objects should be cached and which should not. Studies have shown that the admission control policy works well, especially when combined with a small auxiliary cache that acts as an additional filter. The auxiliary cache contains metadata information about each object and is also needed in order to compute the MTTR of each object.

Putting Everything Together

For objects fetched from the web for the 1st time, the proxy simply enters a record in the auxiliary cache with the object id and its reference timestamp. When an object is referenced for the 2nd time, its MTTR value is computed and the admission controller is called to determine whether the object should be cached. This decision is based on the caching gain function associated with the replacement algorithm. The replacement algorithm determines all the objects that would be evicted to make room for the new object. This action is taken only if the admission controller determines that the new object has a greater caching gain than the sum of the caching gains of the objects that are candidates for eviction. Note that when using an auxiliary cache, we may experience one lost hit (since on the first reference only metadata is cached).
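A minimal sketch of this bookkeeping follows. It assumes an exponentially weighted form of the "weighted sum of interarrival times" for MTTR (the exact weighting in the paper follows [18]) and a caching_gain() callback supplied by the replacement policy; names and the ALPHA weight are illustrative.

```python
import time

class AuxiliaryCache:
    """Metadata-only cache: tracks the last reference time and an estimated
    MTTR (Mean Time To Re-access) per object. MTTR is updated as a weighted
    sum of interarrival times; the exponential weight ALPHA is an assumption,
    the exact weighting used in the paper follows [18]."""

    ALPHA = 0.5  # weight of the newest interarrival sample (illustrative)

    def __init__(self):
        self.meta = {}  # object_id -> (last_access_time, mttr_estimate)

    def record_access(self, object_id, now=None):
        now = time.time() if now is None else now
        if object_id not in self.meta:
            # 1st reference: only metadata is stored (this is the "lost hit").
            self.meta[object_id] = (now, None)
            return None
        last, mttr = self.meta[object_id]
        gap = now - last
        mttr = gap if mttr is None else self.ALPHA * gap + (1 - self.ALPHA) * mttr
        self.meta[object_id] = (now, mttr)
        return mttr  # known from the 2nd reference onward


def admit(new_obj, victims, caching_gain):
    """Admission control: cache the new object only if its caching gain
    exceeds the summed gain of the objects it would evict. `caching_gain`
    is a hypothetical callback provided by the policy (e.g., CSP)."""
    return caching_gain(new_obj) > sum(caching_gain(v) for v in victims)
```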

The LRU Algorithm

LRU deletes as many of the least recently used objects as is necessary to have sufficient space for the newly accessed object. LRU employs neither an auxiliary cache nor admission control.

The CSP (Cost Size Popularity) Algorithm

This algorithm takes into account the size, the communication access cost, and the popularity of an object. For every object i, its caching gain is

CG_i = Cost_i / (Size_i × MTTR_i)

If the admission controller permits it, the replacement algorithm evicts from the cache the object(s) with the lowest values of caching gain. We also examined some extensions of CSP, obtained by changing the caching gain function, in order to observe how i) the size term and ii) the popularity term influence the replacement algorithm's performance. Detailed results are available online at [28].
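A minimal sketch of the resulting eviction rule, building on the admission sketch given earlier (illustrative code, not the implementation used in our experiments): objects are ranked by CG_i, and the lowest-gain objects are proposed as victims until enough space would be freed.

```python
def csp_caching_gain(obj):
    """CG_i = Cost_i / (Size_i * MTTR_i): favors remote (high-cost), small,
    and popular (small-MTTR) objects. `obj` is assumed to carry the fields
    cost, size, and mttr collected via the auxiliary cache."""
    return obj.cost / (obj.size * obj.mttr)


def choose_victims(cached_objects, needed_space):
    """Propose the cached objects with the lowest caching gain, in order,
    until at least `needed_space` bytes would be freed."""
    victims, freed = [], 0
    for obj in sorted(cached_objects, key=csp_caching_gain):
        if freed >= needed_space:
            break
        victims.append(obj)
        freed += obj.size
    return victims


def csp_insert(cache, new_obj, free_space):
    """Cache `new_obj` only if the admission test passes (admit() is the
    admission-control sketch given above)."""
    victims = choose_victims(cache, new_obj.size - free_space)
    if admit(new_obj, victims, csp_caching_gain):
        for v in victims:
            cache.remove(v)
        cache.append(new_obj)
        return True
    return False
```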

Performance Metrics

The key performance metrics are: i) the hit ratio, ii) the client response time (latency), measured in milliseconds, and iii) the network bandwidth requirements (web traffic). The network bandwidth requirement is defined as the amount of data retrieved (in Kbytes) over the web during the replay of the trace file.

Experiments and Goals

The following are the key questions: i) What is the relative performance of the CSP algorithm (as it embodies the key techniques which have been found/proposed by related work) against that of the simple and basic LRU algorithm? In particular, how and what do the popularity term, the size term, the auxiliary cache, and the admission control mechanisms contribute to the performance? ii) How do the performance results as measured by the metrics of hit ratio, user latency, and network bandwidth requirements compare against each other? iii) How do variations in the application environment and system configurations (i.e., cache sizes, number of collaborating proxies, skewness of access distributions) impact the answers to the above questions?

4. Performance Results

Unless explicitly stated otherwise, when we refer to CSP we imply the use of CSP with the auxiliary cache and admission control.

Results on Hit Ratio Performance

First, we present the performance of each policy based on the hit ratio metric. It is obvious that as the cache size grows, all algorithms perform better. Since more space is available, more objects can be stored, which means that more hits are generated. Figure 1 shows the performance of LRU and CSP for a 2-proxy configuration and for variable cache size; the θ parameter is 0.8. We also examined the performance as a function of θ. As can be seen in Figure 2, when θ=0.6 there is a very small difference between the two policies for a cache size of 1%. In more uniform workloads (e.g., θ=0.6) the CSP policy seems to have poor performance because it cannot exploit the popularity of each object. LRU, on the other hand, performs much better than CSP, especially with large cache sizes. As Figure 1 shows, even for θ=0.8, when the cache size becomes adequately large LRU outperforms CSP.

Figure 1. Hit Ratio with varying cache size (θ=0.8).

Figure 2. Hit Ratio with varying θ. Cache size is 1% and 30% of the maximum required space.

As can be seen in Figure 2, for large values of θ (θ=1.0) and small cache sizes LRU performs negligibly better compared to θ=0.6; the hit ratio stays under 30%. On the other hand, CSP increases its hit ratio by 76.9% compared to θ=0.6. When considering very large cache sizes (30%), where there is plenty of available space, the difference in performance observed for each algorithm by varying θ is not significant, as can be seen in Figure 2. The main reason for this is that there is enough space to accommodate many objects; thus many hits are generated, which causes the hit ratio to approach its maximum value. By comparing the two algorithms we can definitely say that LRU outperforms CSP for large cache sizes and for more uniform workloads. In more skewed workloads (θ ≥ 0.8) CSP is better, except for very large caches (Figure 1). More precisely, CSP performs better by 75.3% for small cache sizes, where space is at a premium. The main reason is that the popularity factor in CSP pays off, because the access distribution is more skewed. For very large cache sizes, though, where the available space is enough to accommodate many objects, even LRU can keep enough hot objects and derive higher hit ratios. In this case we can see in Figure 1 that LRU performs better by 7.3%. The worse performance of CSP for large cache sizes can be explained by considering the cost of the auxiliary cache, which is necessary in order to reliably compute MTTR for each object. Its drawback is that we lose one hit (the first one) compared to LRU (see footnote 1).

Footnote 1: Suppose that we have a request for object i. It enters the auxiliary cache. On the second request for the same object, it tries to enter the main cache; the result depends on the admission control policy. On the third request for i we may have a hit. If we use the LRU policy, we will have a hit on the second request, because neither the auxiliary cache nor the admission control policy exists. So, in the CSP policy we lose at least one hit for every object that enters the cache.

Results on Latency Performance

A key observation when examining our latency results is that the increase in the hit ratio of CSP versus that of LRU does not yield an analogous decrease in latency, especially for small caches. We saw, for example, for a 2-proxy environment, cache size 1%, and θ=0.8 (Figure 1), an improvement of 75.3% in the hit ratio, while the corresponding decrease in latency (Figure 3) is only 18.6%.

Figure 3. Latency with varying cache size (θ=0.8).

As expected, the mean latency decreases as the cache size increases, as can be seen in Figure 3. For smaller cache sizes and θ=0.8, CSP performs better than LRU by 18.6% (Figure 3), while for larger cache sizes LRU outperforms CSP. Generally, we noticed that the mean latency depends on the replacement policy. For small caches and more skewed distributions, CSP performs better because it tends to keep in the cache objects with large retrieval cost and popularity, and small size. In large caches, where we have the ability to accommodate many objects, LRU performs better than CSP, as one would expect from the above results on hit ratio performance. As θ grows, for small caches we can see in Figure 4 that CSP performs better than LRU, except in the case where θ=0.6 and cache size is 1%. This observation agrees with the results on the hit ratio metric. For larger cache sizes, the LRU algorithm performs better than CSP, as explained above.

Figure 4. Latency with varying θ. Cache size is 1% and 30% of the maximum required space.

We also saw that a very small difference in the hit ratio of CSP and LRU sometimes translates into a relatively big difference in latency. This is attributable to the dual role the size term plays. It helps improve hit ratios, which can improve mean latencies. But it also results in CSP fetching larger objects from the web, a fact that hurts its latency (and network bandwidth requirements) performance.

Results on Network BW Requirements

We also tested how the cache replacement policies perform when network bandwidth is considered. All algorithms perform better as the cache size grows. The reason for this is that in large caches there is plenty of available space to accommodate many objects; thus, fewer objects are retrieved from the web. The same observation holds for the CSP policy when increasing the value of the parameter θ from 0.6 to 0.8. As we noticed above, the performance of CSP improves as θ grows when considering the hit ratio. Higher values of hit ratio imply more hits and thus fewer retrievals from the web. So, when examining the CSP policy, the network bandwidth requirements decrease as the workload becomes more skewed. For small cache sizes we observed a decrease of 5.5% in the total number of Kbytes retrieved from the web, while the decrease for large cache sizes was 1.8%. The situation is reversed when examining the LRU policy: we observed a slight increase in network usage when θ goes from 0.6 to 0.8. This increase was 4.67% for small cache sizes and 2.89% for large ones. This increase in network bandwidth requirements for LRU when θ increases from 0.6 to 0.8 can be intuitively attributed to the fact that LRU fails to capture the more skewed popularities. Recall from Figure 2 that we had also observed a lower hit ratio for θ=0.8 compared to θ=0.6. If we compare the two algorithms, we can see in Figure 5 that for large cache sizes LRU performs better for every value of θ. This can be explained by the fact that the CSP policy is consistently worse for this configuration even for the metrics of latency and hit ratio.

Figure 5. Network bandwidth usage for large and small cache sizes and varying θ.

A key conclusion is that the dramatic improvements enjoyed by CSP when examining the hit ratio metric do not exist for the network bandwidth metric. For example, for θ=0.8, cache size = 1%, and 2 proxies, CSP enjoys a hit ratio that is 75% higher than that of LRU. However, this translates to only about a 4% improvement in terms of network bandwidth requirements. The explanation for this is similar to that given for the low latency improvements above.

Comparing Against the GDS Algorithm

In another thread, we have also measured the performance of another well-known algorithm, GDS [19], against that of CSP and LRU. We have found that in general GDS outperforms CSP and LRU in terms of hit ratio in all cases, except in very skewed workloads (θ=1.0) with small cache size (1%), where CSP is marginally better than GDS. However, our results also show that GDS performs poorly in terms of the latency and network bandwidth metrics. With respect to latency, GDS for all cache sizes performs worse as θ grows. It performs better than LRU and CSP for more uniform workloads, while for θ=1.0 the situation is reversed. When considering network bandwidth requirements, GDS has poor performance even though it enjoys higher hit ratios, because it prefers to store small objects and fetch large ones from the web.
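For reference, GDS [19] keys eviction on a cost/size value plus a global aging ("inflation") term; the following is a minimal sketch of that mechanism as described by Cao and Irani, not the implementation used in our experiments.

```python
# Minimal sketch of GreedyDual-Size [19]: each cached object carries a value
# H = L + cost/size, where L is a global "inflation" value that is raised to
# the H of the last evicted object. Illustrative code only.

class GreedyDualSize:
    def __init__(self):
        self.L = 0.0   # inflation value
        self.h = {}    # object_id -> H value
        self.size = {} # object_id -> size in bytes

    def access(self, object_id, size, cost):
        """Call on every hit or insertion of object_id."""
        self.size[object_id] = size
        self.h[object_id] = self.L + cost / size

    def evict(self):
        """Remove and return the object with the minimum H value."""
        victim = min(self.h, key=self.h.get)
        self.L = self.h[victim]  # objects touched later are 'inflated'
        del self.h[victim]
        del self.size[victim]
        return victim
```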
5. Dynamic Replacement Algorithm

Having obtained the performance results for each studied replacement algorithm as a function of the access-skew distribution, the cache size, and the performance metric of interest, we can go one step further and develop a replacement algorithm which dynamically adjusts in order to continuously provide the best performance. The main idea behind the dynamic replacement algorithm is the selection of the best algorithm based on the conditions under which the proxy is operating. The dynamic algorithm makes the right decision by looking up a performance table which summarizes the performance of the algorithms studied, indicating the preferred algorithm as a function of the value of the θ parameter, the available cache size, and the performance metric of interest. The dynamic replacement algorithm monitors the request stream that arrives at the proxy, tries to estimate its properties, and, by consulting the performance table, chooses the appropriate algorithm.
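A minimal sketch of this selection step, assuming the performance table has already been filled in from the results of section 4; the table entries and parameter breakpoints below are placeholders rather than the measured values.

```python
# Minimal sketch of the dynamic policy selector. The performance table is
# keyed by (theta bucket, cache-size bucket, metric) and names the preferred
# replacement policy; the entries below are placeholders, not the measured
# results of section 4.

PERFORMANCE_TABLE = {
    # (theta_bucket, size_bucket, metric): policy
    ("skewed",  "small", "hit_ratio"): "CSP",
    ("skewed",  "large", "hit_ratio"): "LRU",
    ("uniform", "small", "hit_ratio"): "LRU",
    ("uniform", "large", "hit_ratio"): "LRU",
    # ... analogous rows for "latency" and "bandwidth"
}

def bucket_theta(theta_estimate: float) -> str:
    # The 0.8 breakpoint is illustrative (section 4 distinguishes 0.6/0.8/1.0).
    return "skewed" if theta_estimate >= 0.8 else "uniform"

def bucket_cache_size(disk_size_bytes: float, total_requested_bytes: float) -> str:
    ratio = disk_size_bytes / total_requested_bytes
    return "large" if ratio >= 0.30 else "small"

def choose_policy(theta_estimate, disk_size_bytes, total_requested_bytes, metric):
    key = (bucket_theta(theta_estimate),
           bucket_cache_size(disk_size_bytes, total_requested_bytes),
           metric)
    # Fall back to LRU if the table has no entry for this operating point.
    return PERFORMANCE_TABLE.get(key, "LRU")

# Example: at regular intervals the proxy re-evaluates and, if worthwhile,
# switches its replacement policy.
print(choose_policy(0.85, 100e9, 200e9, "hit_ratio"))  # -> "CSP"
```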

Main Subsystems Description

The primary subsystems of a proxy server employing the dynamic replacement algorithm are the following. The popularity monitoring subsystem monitors the request stream that arrives at the proxy server. Meta-data information is stored for every requested object, in order to be able to closely approximate the value of the parameter θ. The disk size monitoring subsystem monitors the request stream in order to compute the percentage of the maximum required space to which the available disk size of the proxy server corresponds. In other words, if after a short time the aggregate size of all requested objects is, for example, 200 Gbytes and the proxy server employs a cache with a 100 Gbyte hard disk, then the disk size is 50% of the maximum space required to store all requested objects. The final input is given by the proxy administrator, who selects the performance metric of interest: maximizing hit ratio, minimizing latency, or minimizing network bandwidth requirements. Having set the performance metric and estimated the value of the parameter θ and the required cache size, the dynamic algorithm looks up the performance table at regular intervals and, if it is worth doing so, changes the replacement policy, choosing the best one as determined from the results of section 4. For our experiments on the performance of the dynamic algorithm we used the same study setup as described in section 3.1. We generated synthetic workloads with varying values for the θ parameter of the Zipf distribution (θ=0.6, θ=0.8, θ=0.9). The workload contained 450,000 requests and 35,000 unique objects. We tested the dynamic algorithm under 7 different cache sizes, ranging from 1% to 30% of the maximum required cache size needed to store all requested objects.
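As an illustration of how the popularity monitoring subsystem described above could approximate θ, the following sketch fits the Zipf exponent as the slope of log(frequency) versus log(rank) over a monitoring window; the least-squares fit is our own illustrative choice, not necessarily the estimator used in our prototype.

```python
import math

# Minimal sketch of a theta estimate for the popularity monitoring subsystem:
# count requests per object over a window, then fit the Zipf exponent as the
# slope of log(frequency) vs. log(rank) by least squares (illustrative choice).

def estimate_theta(request_stream) -> float:
    counts = {}
    for object_id in request_stream:
        counts[object_id] = counts.get(object_id, 0) + 1
    freqs = sorted(counts.values(), reverse=True)  # rank 1 = most requested
    n = len(freqs)
    if n < 2:
        return 0.0  # not enough distinct objects to fit a slope
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # P(rank i) ~ 1/i^theta, so the slope is approximately -theta

# Example: estimate theta over the synthetic trace generated earlier.
# print(estimate_theta(trace))
```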
6. Contribution and Concluding Remarks

Web proxy cache replacement algorithms have received a great deal of attention from academia. The key knowledge accumulated by related research can be very briefly summarized as follows. Web proxy caches, in order to yield high performance in terms of one or more of the metrics of hit ratio, communication latency, and network bandwidth requirements, must: i) employ a cache replacement policy that takes into account the communication latency to fetch objects from the web, the size of the objects, and the (estimated) popularity of the objects; ii) employ an auxiliary cache, which holds meta-data information for the objects and acts as a filter, and an admission control policy, which further reduces the probability of unfortunate cache replacement decisions; and iii) exploit nearby caches, building collaborative web proxy caches across organization-wide networks, which can offer additional performance improvement. Despite the fact that the above techniques (which have been embodied in our CSP algorithm) were proposed by researchers long ago, most web proxies continue to employ the good old LRU policy. Our results show LRU to be better than the sophisticated CSP algorithm in environments where the access distribution of objects is less skewed and/or in configurations where proxies enjoy large caches. Given that most web proxies use caches on magnetic disks and that there are several web traces showing moderately skewed distributions (with θ values between 0.6 and 0.7) [14], the choice of LRU may seem justified, depending on proxy configuration and application characteristics.

Our results have also shown that measuring the hit ratios of complex replacement policies (involving size, popularity, and communication cost terms) is of lesser importance. In fact, the hit ratio results (and thus any results on simplistic metrics of latency, which rely heavily on hit/miss ratios and average communication delays) can be misleading. This happens because different terms in the multi-term replacement criteria may be working in conflicting ways. Namely, including the size term in the replacement function improves the hit ratio, since smaller objects are favored and, thus, more objects can be cached. In turn, higher hit ratios do improve latencies, in general. However, including the size term also has the effect of fetching larger objects from the web, which hurts latency (and network bandwidth requirements). In this sense, the size term conflicts with the communication cost term, which tries to fetch objects with small latencies. Similarly, we have seen small hit ratio differences translate into large differences in latency and network bandwidth requirements, a phenomenon occurring for large caches and attributable to the fact that CSP replacement causes very large objects to be evicted and later fetched from the web. Another major conclusion is that the collaboration benefits attainable in the system configurations we examined are rather small, both in terms of latency improvement and in terms of network bandwidth requirements improvement (detailed results omitted for space reasons, see [28]). This holds despite the fact that we, as other studies have found, also noticed significant improvements in the total hit ratio, up to 30%, due to the collaborative hits. With respect to the key components of the efficient replacement policy, we have found that the auxiliary cache, the popularity component, and the communication cost component play an important role in the performance of CSP. However, the size term and the admission control seem to affect mostly the hit ratio performance and only in a minor way the latency and network bandwidth requirements performance (detailed results omitted for space reasons, see [28]).

Finally, we took advantage of the previous results, charting the problem solution space with respect to the algorithm with the best performance as a function of popularity distribution, required cache size, and desired performance metric. Having charted the problem space, we developed a dynamic replacement algorithm, which monitors the environment and chooses the right algorithm to apply, based on the characteristics of the proxy environment. In the future we plan to extend the above results and utilize them in order to determine optimal proxy cache placement algorithms and to study web proxy designs for continuous media and mixed-media applications.

References

[1] Charu Aggarwal, Joel L. Wolf and Philip S. Yu, Caching on the World Wide Web, IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 1, January/February.
[2] Junho Shim, Peter Scheuermann and Radek Vingralek, Proxy Cache Algorithms: Design, Implementation and Performance, IEEE Transactions on Knowledge and Data Engineering, vol. 11, no. 4, July/August.
[3] Stephen Williams, Marc Abrams, Charles R. Standridge, Ghaleb Abdulla and Edward A. Fox, Removal Policies in Network Caches for World-Wide Web Documents, in Proceedings of ACM SIGCOMM.
[4] Luigi Rizzo and Lorenzo Vicisano, Replacement Policies for a Proxy Cache, Research Note RN/98/13, Department of Computer Science, University College London.
[5] Elizabeth J. O'Neil, Patrick E. O'Neil and G. Weikum, An Optimal Proof of the LRU-K Page Replacement Algorithm, Journal of the ACM, vol. 46, no. 1.
[6] D. Neal, The Harvest Object Cache in New Zealand, in Proceedings of the 5th International WWW Conference, May.
[7] Anawat Chankhunthod, Peter Danzig and Chuck Neerdaels, A Hierarchical Internet Object Cache, in Proceedings of the USENIX Technical Conference, San Diego, CA, January.
[8] S. Gadde, J. Chase and M. Rabinovich, A Taste of Crispy Squid, in Workshop on Internet Server Performance (WISP'98), Madison, WI, June.
[9] S. Gadde, J. Chase and M. Rabinovich, Reduce, Reuse, Recycle: An Approach to Building Large Internet Caches, in Sixth Workshop on Hot Topics in Operating Systems (HotOS-VI), pages 93-98, May.
[10] Renu Tewari, Michael Dahlin, Harrick M. Vin and Jonathan S. Kay, Design Considerations for Distributed Caching on the Internet, Tech. Report TR98-04, Dept. of Computer Science, University of Texas at Austin.
[11] Mohammad S. Raunak, Prashant Shenoy, Pawan Goyal and Krithi Ramamritham, Implications of Proxy Caching for Provisioning Networks and Servers, in Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2000), pages 66-77, Santa Clara, CA, June 2000.
[12] Martin Arlitt and Carey Williamson, Web Server Workload Characterization: The Search for Invariants, in Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, May.
[13] Paul Barford and Mark Crovella, Generating Representative Web Workloads for Network and Server Evaluation, in Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems.
[14] Lee Breslau, Pei Cao, Li Fan, Graham Phillips and Scott Shenker, Web Caching and Zipf-like Distributions: Evidence and Implications, in Proceedings of IEEE INFOCOM.
[15] Md Ahsan Habib and Marc Abrams, Analysis of Sources of Latency in Downloading Web Pages, in Proceedings of WebNet 2000, San Antonio, USA, November.
[16] Soam Acharya and Brian Smith, MiddleMan: A Video Caching Proxy Server, in Proceedings of the ACM NOSSDAV Conference.
[17] Weekly Access Logs at NLANR's Proxy Caches, available from ftp://ircache.nlanr.net/traces/
[18] Renu Tewari, Harrick M. Vin, Asit Dan and Dinkar Sitaram, Resource-based Caching for Web Servers, in Proceedings of the SPIE/ACM Conference on Multimedia Computing and Networking, January.
[19] Pei Cao and Sandy Irani, Cost-Aware WWW Proxy Caching Algorithms, in Proceedings of USITS.
[20] Li Fan, Pei Cao, Jussara Almeida and Andrei Broder, Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol, in Proceedings of the ACM SIGCOMM'98 Conference, Feb.
[21] Shudong Jin and Azer Bestavros, GreedyDual* Web Caching Algorithm: Exploiting the Two Sources of Temporal Locality in Web Request Streams, in Proceedings of the 5th International Web Caching and Content Delivery Workshop, May.
[22] M. Abrams, C. Standridge, G. Abdulla, S. Williams and E. Fox, Caching Proxies: Limitations and Potentials, in Proceedings of the 1995 World Wide Web Conference, December.
[23] Balachander Krishnamurthy and Craig E. Wills, Analyzing Factors That Influence End-to-End Web Performance, in Proceedings of the 2000 World Wide Web Conference / Computer Networks, May.
[24] M. Rabinovich and O. Spatscheck, Web Caching and Replication, Addison-Wesley.
[25] A. Feldmann, R. Caceres, F. Douglis, G. Glass and M. Rabinovich, Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments, in Proceedings of the IEEE INFOCOM Conference.
[26] T. M. Kroeger, D. D. E. Long and J. C. Mogul, Exploring the Bounds of Web Latency Reduction from Caching and Prefetching, in Proceedings of the USENIX Symposium on Internet Technologies and Systems.
[27] Appendix A, Modeling Communication Cost, available online.
[28] Appendix B, Detailed Performance Results of Cache Replacement Algorithms, available online.


More information

Interactive Branched Video Streaming and Cloud Assisted Content Delivery

Interactive Branched Video Streaming and Cloud Assisted Content Delivery Interactive Branched Video Streaming and Cloud Assisted Content Delivery Niklas Carlsson Linköping University, Sweden @ Sigmetrics TPC workshop, Feb. 2016 The work here was in collaboration... Including

More information

The Multikey Web Cache Simulator: a Platform for Designing Proxy Cache Management Techniques

The Multikey Web Cache Simulator: a Platform for Designing Proxy Cache Management Techniques The Multikey Web Cache Simulator: a Platform for Designing Proxy Cache Management Techniques L.G. Cárdenas, J. Sahuquillo, A. Pont, and J.A. Gil 1 Departamento de Informática de Sistemas y Computadores

More information

Architecture Tuning Study: the SimpleScalar Experience

Architecture Tuning Study: the SimpleScalar Experience Architecture Tuning Study: the SimpleScalar Experience Jianfeng Yang Yiqun Cao December 5, 2005 Abstract SimpleScalar is software toolset designed for modeling and simulation of processor performance.

More information

Cache Controller with Enhanced Features using Verilog HDL

Cache Controller with Enhanced Features using Verilog HDL Cache Controller with Enhanced Features using Verilog HDL Prof. V. B. Baru 1, Sweety Pinjani 2 Assistant Professor, Dept. of ECE, Sinhgad College of Engineering, Vadgaon (BK), Pune, India 1 PG Student

More information

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced?

!! What is virtual memory and when is it useful? !! What is demand paging? !! When should pages in memory be replaced? Chapter 10: Virtual Memory Questions? CSCI [4 6] 730 Operating Systems Virtual Memory!! What is virtual memory and when is it useful?!! What is demand paging?!! When should pages in memory be replaced?!!

More information

Interme diate DNS. Local browse r. Authorit ative ... DNS

Interme diate DNS. Local browse r. Authorit ative ... DNS WPI-CS-TR-00-12 July 2000 The Contribution of DNS Lookup Costs to Web Object Retrieval by Craig E. Wills Hao Shang Computer Science Technical Report Series WORCESTER POLYTECHNIC INSTITUTE Computer Science

More information

Maintaining Mutual Consistency for Cached Web Objects

Maintaining Mutual Consistency for Cached Web Objects Maintaining Mutual Consistency for Cached Web Objects Bhuvan Urgaonkar, Anoop George Ninan, Mohammad Salimullah Raunak Prashant Shenoy and Krithi Ramamritham Department of Computer Science, University

More information

Removing Belady s Anomaly from Caches with Prefetch Data

Removing Belady s Anomaly from Caches with Prefetch Data Removing Belady s Anomaly from Caches with Prefetch Data Elizabeth Varki University of New Hampshire varki@cs.unh.edu Abstract Belady s anomaly occurs when a small cache gets more hits than a larger cache,

More information

Video Streaming Over the Internet

Video Streaming Over the Internet Video Streaming Over the Internet 1. Research Team Project Leader: Graduate Students: Prof. Leana Golubchik, Computer Science Department Bassem Abdouni, Adam W.-J. Lee 2. Statement of Project Goals Quality

More information

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 2, APRIL Segment-Based Streaming Media Proxy: Modeling and Optimization

IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 2, APRIL Segment-Based Streaming Media Proxy: Modeling and Optimization IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 8, NO. 2, APRIL 2006 243 Segment-Based Streaming Media Proxy: Modeling Optimization Songqing Chen, Member, IEEE, Bo Shen, Senior Member, IEEE, Susie Wee, Xiaodong

More information

SF-LRU Cache Replacement Algorithm

SF-LRU Cache Replacement Algorithm SF-LRU Cache Replacement Algorithm Jaafar Alghazo, Adil Akaaboune, Nazeih Botros Southern Illinois University at Carbondale Department of Electrical and Computer Engineering Carbondale, IL 6291 alghazo@siu.edu,

More information

COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System

COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System COOCHING: Cooperative Prefetching Strategy for P2P Video-on-Demand System Ubaid Abbasi and Toufik Ahmed CNRS abri ab. University of Bordeaux 1 351 Cours de la ibération, Talence Cedex 33405 France {abbasi,

More information

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Ewa Kusmierek Yingfei Dong David Du Poznan Supercomputing and Dept. of Electrical Engineering Dept. of Computer Science Networking Center

More information

Study of Piggyback Cache Validation for Proxy Caches in the World Wide Web

Study of Piggyback Cache Validation for Proxy Caches in the World Wide Web The following paper was originally published in the Proceedings of the USENIX Symposium on Internet Technologies and Systems Monterey, California, December 997 Study of Piggyback Cache Validation for Proxy

More information

Shaking Service Requests in Peer-to-Peer Video Systems

Shaking Service Requests in Peer-to-Peer Video Systems Service in Peer-to-Peer Video Systems Ying Cai Ashwin Natarajan Johnny Wong Department of Computer Science Iowa State University Ames, IA 500, U. S. A. E-mail: {yingcai, ashwin, wong@cs.iastate.edu Abstract

More information

Hint-based Acceleration of Web Proxy Caches. Daniela Rosu Arun Iyengar Daniel Dias. IBM T.J.Watson Research Center

Hint-based Acceleration of Web Proxy Caches. Daniela Rosu Arun Iyengar Daniel Dias. IBM T.J.Watson Research Center Hint-based Acceleration of Web Proxy Caches Daniela Rosu Arun Iyengar Daniel Dias IBM T.J.Watson Research Center P.O.Box 74, Yorktown Heights, NY 1598 Abstract Numerous studies show that proxy cache miss

More information

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1]

Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Report on Cache-Oblivious Priority Queue and Graph Algorithm Applications[1] Marc André Tanner May 30, 2014 Abstract This report contains two main sections: In section 1 the cache-oblivious computational

More information

On Caching Search Engine Results

On Caching Search Engine Results Abstract: On Caching Search Engine Results Evangelos P. Markatos Institute of Computer Science (ICS) Foundation for Research & Technology - Hellas (FORTH) P.O.Box 1385 Heraklio, Crete, GR-711-10 GREECE

More information

CLIP: A Compact, Load-balancing Index Placement Function

CLIP: A Compact, Load-balancing Index Placement Function CLIP: A Compact, Load-balancing Index Placement Function Michael McThrow Storage Systems Research Center University of California, Santa Cruz Abstract Existing file searching tools do not have the performance

More information

Locality of Reference

Locality of Reference Locality of Reference 1 In view of the previous discussion of secondary storage, it makes sense to design programs so that data is read from and written to disk in relatively large chunks but there is

More information

CMPSC 311- Introduction to Systems Programming Module: Caching

CMPSC 311- Introduction to Systems Programming Module: Caching CMPSC 311- Introduction to Systems Programming Module: Caching Professor Patrick McDaniel Fall 2014 Lecture notes Get caching information form other lecture http://hssl.cs.jhu.edu/~randal/419/lectures/l8.5.caching.pdf

More information

Web File Transmission by Object Packaging Performance Comparison with HTTP 1.0 and HTTP 1.1 Persistent Connection

Web File Transmission by Object Packaging Performance Comparison with HTTP 1.0 and HTTP 1.1 Persistent Connection Web File Transmission by Performance Comparison with and Hiroshi Fujinoki, Murugesan Sanjay, and Chintan Shah Department of Computer Science Southern Illinois University at Edwardsville Edwardsville, Illinois

More information

Performance Consequences of Partial RED Deployment

Performance Consequences of Partial RED Deployment Performance Consequences of Partial RED Deployment Brian Bowers and Nathan C. Burnett CS740 - Advanced Networks University of Wisconsin - Madison ABSTRACT The Internet is slowly adopting routers utilizing

More information

An Efficient LFU-Like Policy for Web Caches

An Efficient LFU-Like Policy for Web Caches An Efficient -Like Policy for Web Caches Igor Tatarinov (itat@acm.org) Abstract This study proposes Cubic Selection Sceme () a new policy for Web caches. The policy is based on the Least Frequently Used

More information

IN recent years, the amount of traffic has rapidly increased

IN recent years, the amount of traffic has rapidly increased , March 15-17, 2017, Hong Kong Content Download Method with Distributed Cache Management Masamitsu Iio, Kouji Hirata, and Miki Yamamoto Abstract This paper proposes a content download method with distributed

More information

IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS

IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS IMPROVING LIVE PERFORMANCE IN HTTP ADAPTIVE STREAMING SYSTEMS Kevin Streeter Adobe Systems, USA ABSTRACT While HTTP adaptive streaming (HAS) technology has been very successful, it also generally introduces

More information

A Survey of Web Caching Schemes for the Internet

A Survey of Web Caching Schemes for the Internet A Survey of Web Caching Schemes for the Internet Jia Wang Cornell Network Research Group (C/NRG) Department of Computer Science, Cornell University Ithaca, NY 14853-7501 jiawang@cs.cornell.edu Abstract

More information

Cooperative Data Placement and Replication in Edge Cache Networks

Cooperative Data Placement and Replication in Edge Cache Networks Cooperative Data Placement and Replication in Edge Cache Networks Lakshmish Ramaswamy Dept. of Computer Science University of Georgia Athens, GA 362 laks@cs.uga.edu Arun Iyengar IBM TJ Watson Research

More information

Reducing Disk Latency through Replication

Reducing Disk Latency through Replication Gordon B. Bell Morris Marden Abstract Today s disks are inexpensive and have a large amount of capacity. As a result, most disks have a significant amount of excess capacity. At the same time, the performance

More information

A Distributed Architecture of Edge Proxy Servers for Cooperative Transcoding

A Distributed Architecture of Edge Proxy Servers for Cooperative Transcoding A Distributed Architecture of Edge Proxy Servers for Cooperative Transcoding Valeria Cardellini University of Roma Tor Vergata cardellini@ing.uniroma2.it Michele Colajanni University of Modena colajanni@unimo.it

More information

A Packet-Based Caching Proxy with Loss Recovery for Video Streaming

A Packet-Based Caching Proxy with Loss Recovery for Video Streaming A Packet-Based Caching Proxy with Loss Recovery for Video Streaming Kuan-Sheng Hsueh and Sheng-De Wang Department of Electrical Engineering, National Taiwan University {kshsueh, sdwang}@hpc.ee.ntu.edu.tw

More information

Objective-Optimal Algorithms for Long-term Web Prefetching

Objective-Optimal Algorithms for Long-term Web Prefetching Objective-Optimal Algorithms for Long-term Web Prefetching Bin Wu and Ajay D Kshemkalyani Dept of Computer Science University of Illinois at Chicago Chicago IL 667 bwu ajayk @csuicedu Abstract Web prefetching

More information

Yet another redirection mechanism for the World-Wide Web?

Yet another redirection mechanism for the World-Wide Web? Yet another redirection mechanism for the World-Wide Web? Aline Baggio Vrije Universiteit Department of Computer Science De Boelelaan 1081a 1081HV Amsterdam The Netherlands http://www.cs.vu.nl/ baggio/

More information

Streaming Flow Analyses for Prefetching in Segment-based Proxy Caching to Improve Media Delivery Quality

Streaming Flow Analyses for Prefetching in Segment-based Proxy Caching to Improve Media Delivery Quality Streaming Flow Analyses for Prefetching in Segment-based Proxy Caching to Improve Media Delivery Quality Songqing Chen Bo Shen, Susie Wee Xiaodong Zhang Department of Computer Science Mobile and Media

More information

Duplicated Hash Routing: A Robust Algorithm for a Distributed WWW Cache System

Duplicated Hash Routing: A Robust Algorithm for a Distributed WWW Cache System Duplicated Hash Routing: A Robust Algorithm for a Distributed WWW System Eiji Kawai Kadohito Osuga Ken-ichi Chinen Suguru Yamaguchi Graduate School of Information Science, Nara Institute of Science and

More information

Comparison and Analysis of Various Buffer Cache Management Strategies for Database Management System

Comparison and Analysis of Various Buffer Cache Management Strategies for Database Management System Comparison and Analysis of Various Buffer Cache Management Strategies for Database Management System Priti M Tailor 1, Prof. Rustom D. Morena 2 1 Assistant Professor, Sutex Bank College of Computer App.

More information

On Reliable and Scalable Peer-to-Peer Web Document Sharing

On Reliable and Scalable Peer-to-Peer Web Document Sharing On Reliable and Scalable Peer-to-Peer Web Document Sharing Li Xiao and Xiaodong Zhang Department of Computer Science College of William and Mary Williamsburg, VA 23187-879 lxiao, zhang @cs.wm.edu Zhichen

More information