EE6762 BROADBAND NETWORKS Project Report: Caching


Andreas Constantinides
Due date: 05/11/2002

1 Introduction

According to Knuth [1], the basic idea of caching is to maintain high-speed access to h items from a larger collection of d items that cannot all be accessed so quickly. The caching concept has been widely applied, both to computer systems and to the World Wide Web, as caches are among the simplest and most effective ways to improve performance. This report aims to capture a snapshot of these two flavors of caching and of how caching is used to improve the performance of the two systems under study (the computer and the WWW).

2 Computer Caching

2.1 Memory Hierarchy

In computer systems, cache memories are small, high-speed buffer memories used to hold temporarily those portions of the contents of main memory which are (believed to be) currently in use. Figure 1 shows a generic block diagram [2] for a memory hierarchy in a computer system.

Figure 1: Memory hierarchy (CPU, cache, main memory, hard disk)
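To make the three levels concrete, here is a minimal sketch (my illustration, not part of the report) of a fetch walking down the hierarchy; the page names, contents, and dictionary-based levels are toy assumptions.

```python
# Minimal sketch of a memory-hierarchy fetch: try the cache first,
# then main memory, then (infrequently) the disk. All data are toy values.
def fetch(page, cache, main_memory, disk):
    if page in cache:                       # cache hit: the fast path
        return cache[page], "cache hit"
    if page in main_memory:                 # cache miss served by main memory
        cache[page] = main_memory[page]     # fill the cache for next time
        return cache[page], "cache miss (main memory)"
    data = disk[page]                       # rare case: fetch from disk
    main_memory[page] = data
    cache[page] = data
    return data, "cache miss (disk)"

cache, main_memory = {}, {"p1": "data1"}
disk = {"p1": "data1", "p2": "data2"}
print(fetch("p1", cache, main_memory, disk))  # miss, filled from main memory
print(fetch("p1", cache, main_memory, disk))  # hit on the second access
```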

As we can see from Figure 1, the lowest level of the hierarchy is a small, fast memory called a cache. For the hierarchy to function well, a very large proportion of the CPU instruction and operand fetches are expected to be served from the cache. At the next level of the hierarchy is the main memory, which directly serves most of the CPU instruction and operand fetches not satisfied by the cache. In addition, the cache fetches all of its data, some portion of which is passed on to the CPU, from the main memory. At the top level of the hierarchy is the hard disk, which is accessed only in the infrequent cases in which a CPU instruction or operand fetch is not found in main memory. With this memory hierarchy, since the CPU fetches most of the instructions from the cache, it sees a fast memory most of the time. From this point on I will refer to both instructions and operands as pages.

When the CPU is to fetch a page, the page may come from the cache or the main memory. First, the system checks the cache to see if it has the page. If the cache contains the page to be fetched, we have a cache hit; if it does not, we have a cache miss.

2.2 Performance Measures for Caching

The two most important performance measures used in caching are the miss ratio, m, and the average memory access time, T_avg, defined as follows:

m = (number of cache misses) / (number of memory requests)

T_avg ~ t_cache + m * t_main-mem

where t_cache is the time the CPU takes to fetch a page from the cache, and t_main-mem is the time it takes to fetch a page from the main memory into the cache. At this point, a simple numerical example can show the effectiveness of caching. Assume that t_main-mem = 100 ns, t_cache = 10 ns, and a miss ratio m = 0.1. The average memory access time is then:

Without caching: T_avg ~ 100 ns
With caching: T_avg ~ 10 ns + 0.1 * (100 ns) = 20 ns

As we can see, caching reduces the average memory access time by a factor of five.

2.3 Performance Evaluation Methods

Agarwal [3] says that the primary tools for the study of caches are: (1) hardware measurement, (2) analytical models, and (3) trace-driven simulation (TDS). Hardware measurement, an expensive technique, involves instrumenting an existing system and observing the performance of the cache. This scheme is very inflexible because the cache parameters cannot be easily varied, and as such it generates only a posteriori information on a design. Analytical models of caches estimate cache performance quickly at the cost of accuracy, but mathematical models can be used to suggest useful ways of improving cache performance by changing the cache organization or the program structure after studying program-cache interactions.

Of the three performance evaluation methods mentioned above, trace-driven simulation is perhaps the most popular. TDS evaluates a model of a proposed system using previously recorded address traces as the external stimuli. Address traces are streams of addresses generated during the execution of computer programs. TDS involves studying the effects of varying the input trace and model parameters on the behavior of the model outputs. Its advantages include flexibility, accuracy, and ease of use.
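In the spirit of TDS, the following sketch replays a toy address trace against a fully-associative LRU cache while sweeping the cache size; the synthetic trace is a stand-in assumption for the recorded program traces a real study would use.

```python
from collections import OrderedDict

def lru_miss_ratio(trace, cache_size):
    """Replay an address trace against a fully-associative LRU cache."""
    cache, misses = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh recency
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the least recently used
            cache[addr] = None
    return misses / len(trace)

# Toy stand-in for a recorded trace: a tight loop over a small working set
# with occasional long excursions (the excursions flood small LRU caches).
trace = ([100, 101, 102, 103] * 200 + list(range(500, 600))) * 5
for size in (2, 4, 8, 16):
    print(f"cache size {size:2d}: miss ratio = {lru_miss_ratio(trace, size):.3f}")
```

Sweeping a parameter against a fixed trace, as here, is exactly the kind of experiment TDS is used for.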

2.4 Locality of Reference

Przybylski [4] observes that the success of caches in reducing the average memory access time relies on a high probability that a requested page is contained in the cache, that is, that it was used or generated recently. He also observes that program traces exhibit spatial and temporal locality, defined as follows: spatial locality is the likelihood that two items adjacent in main memory will be needed within a short span of time of each other, and temporal locality is the expectation that instructions and data currently in use will be referenced again soon. In fact, caches are successful because programs generally exhibit good spatial and temporal locality.

2.5 Overview of Computer Caching Analytical Models

As the focus of my project is web caching, I will only briefly discuss the analytical computer caching models. According to Przybylski [4], the models found in the literature vary immensely in their complexity and applicability, ranging from the straightforward probabilistic model of just a few terms and parameters that Smith uses [5] to more sophisticated models involving measured and derived input parameters (Agarwal et al. [6]). In general, the greater the need for accuracy, reliability, and a large range of applicability, the more complex the equations and the reasoning behind them. An elegant exception to the rule is the power-law model developed by Singh et al. [7]. The authors of that paper begin by analyzing the number of unique pages referenced as a function of time, a quantity that is independent of the cache size; the miss ratio is just the time derivative of this function evaluated at the appropriate point. This yields a surprisingly accurate model for medium to large fully-associative caches (full associativity being one of the arrangements into which computer caches are organized). The model contains only four parameters, which measure the working set size (defined by Coffman and Denning [8] as the smallest subset of a task's pages that must be in main memory at any given time in order to guarantee the task a specified level of processing efficiency), the temporal and spatial localities of reference, and the interaction between the two. Most significantly, it shows a power-law relationship between the cache size and the miss ratio.
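To see what a Singh-style analysis measures, here is a hedged sketch: it counts unique pages referenced over time in a synthetic, heavy-tailed trace (the Pareto-distributed page IDs and the window size are my assumptions) and uses the growth rate of that curve as a rough miss-ratio estimate for a large cache.

```python
import random

def unique_pages_curve(trace):
    """u(t): number of distinct pages referenced in the first t references."""
    seen, curve = set(), []
    for page in trace:
        seen.add(page)
        curve.append(len(seen))
    return curve

# Synthetic, heavy-tailed page references as a stand-in for a real trace.
rng = random.Random(3)
trace = [int(rng.paretovariate(1.2)) for _ in range(50_000)]
u = unique_pages_curve(trace)

# For a large cache, a reference misses roughly when it touches a brand-new
# page, so the miss ratio tracks the growth rate (discrete derivative) of u(t).
window = 1_000
for t in (2_000, 10_000, 50_000):
    rate = (u[t - 1] - u[t - 1 - window]) / window
    print(f"t={t}: u(t)={u[t - 1]}, approximate miss ratio ~ {rate:.3f}")
```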

3 Web Caching

3.1 Generic WWW Caching System

Wang [9] gives the figure below as a generic WWW caching system.

Figure 2: A generic WWW caching system (clients, cooperating proxy caches, and the Web server)

Wang notes that the World Wide Web (WWW) is one of the most popular applications of the Internet and that it has grown exponentially in size, which results in network congestion and server overloading. Web caching has been recognized as one of the most effective schemes to reduce network traffic and hence minimize user access latency. In the generic caching system above, documents can be cached at the clients, the proxies, and the servers. If a client does not have a valid copy of the requested page in its own browser's cache, it requests the page from its local proxy. When the proxy receives a page request from one of its clients, it first checks whether it has the requested page; if it does, it returns the page to the client. If it does not have the page in its cache, the proxy sends a request to its cooperative proxy caches (assuming, of course, that neighboring proxies cooperate). Upon receiving a request from another proxy, a proxy checks if it has the requested page. If it does, it returns the page to the requesting proxy; if not, it may forward the request further to other proxies. Finally, if none of the cooperative proxies has the page, it is fetched from the web server. (A small sketch of this lookup cascade appears after the lists in Section 3.2.)

3.2 Advantages and Disadvantages of Web Caching

The advantages of using web caching include:

1. Web caching reduces bandwidth consumption, and hence decreases network traffic and lessens network congestion.
2. Web caching reduces access latency because (a) frequently accessed documents are fetched from nearby proxies instead of remote web servers, and (b) thanks to the reduction in network traffic, even documents that are not cached can be retrieved relatively faster than without caching, due to less congestion along the network path and less workload at the server.
3. Web caching reduces the workload of the remote web server by disseminating data among the proxy caches over the wide area network.
4. If the remote server is unavailable for any reason, the client can still obtain a cached copy from the proxy.

Its disadvantages include:

1. The main disadvantage is that a client might be looking at stale data.
2. The access latency may increase in the case of a cache miss, due to extra processing.
3. A single proxy is always a bottleneck.
4. A single proxy is a single point of failure.
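As promised above, here is a hedged sketch of the lookup cascade; the dictionary-based caches and the origin-server call are hypothetical stand-ins for real HTTP machinery, not part of any actual proxy implementation.

```python
# Sketch of the request path in Figure 2: browser cache -> local proxy ->
# cooperating proxies -> origin web server. All names are illustrative.
def handle_request(url, browser_cache, local_proxy, peer_proxies, origin):
    if url in browser_cache:            # valid copy already in the browser
        return browser_cache[url]
    if url in local_proxy:              # local proxy cache hit
        page = local_proxy[url]
    else:
        page = None
        for peer in peer_proxies:       # ask the cooperating proxies
            if url in peer:
                page = peer[url]
                break
        if page is None:                # nobody has it: go to the origin
            page = origin(url)
        local_proxy[url] = page         # cache it for future clients
    browser_cache[url] = page
    return page

origin = lambda url: f"<page for {url}>"
print(handle_request("/index.html", {}, {}, [{}, {}], origin))
```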

3.3 Web Caching Design Considerations

In order to build a caching system that works, the following design considerations should be kept in mind:

1. How are the cache proxies organized: hierarchically, distributed, or hybrid? (caching system architecture)
2. Where should a proxy cache be placed in order to achieve optimal performance? (proxy placement)
3. What can be cached: data, connections, or computation? (caching contents)
4. How does a proxy decide which page to store in its cache and which page to remove from it? (cache placement and replacement)
5. How do proxies cooperate with each other? (proxy cooperation)
6. What kind of data can be shared among cooperative proxies? (data sharing)
7. How does a proxy maintain data consistency?
8. How should a proxy deal with data that is not cacheable? (dynamic data caching)

In the next subsections, I will try to address some of these design considerations.

3.4 Page Replacement Algorithms

3.4.1 Definitions of page replacement algorithms

Coffman [8] gives the following definitions for some page replacement algorithms (a small simulation sketch comparing these policies appears at the end of this subsection):

Least Recently Used (LRU): replace the page whose last reference is furthest in the past.
Belady's optimal algorithm (Belady [10]): replace the page whose next reference is furthest in the future (this algorithm assumes knowledge of the future).
Least Frequently Used (LFU): the page replaced is the one that has received the least use (the smallest number of references).
First In First Out (FIFO): the page replaced is the one that has been in memory for the longest time.

Analysis of these algorithms is usually done by assuming that the Independent Reference Model, defined next, holds for page requests.
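The following is a small, self-contained simulation sketch comparing FIFO, LRU, and Belady's optimal policy (LFU is omitted for brevity); the reference string and cache size are illustrative assumptions, not data from the report.

```python
def simulate(trace, size, choose_victim):
    """Count page faults for a cache of `size` pages under a victim rule."""
    cache, faults = [], 0
    for t, page in enumerate(trace):
        if page in cache:
            continue
        faults += 1
        if len(cache) >= size:
            cache.remove(choose_victim(cache, trace, t))
        cache.append(page)
    return faults

def fifo_victim(cache, trace, t):
    return cache[0]  # the page resident longest (the list keeps load order)

def lru_victim(cache, trace, t):
    # Replace the page whose last reference lies furthest in the past.
    return min(cache, key=lambda p: max(i for i, q in enumerate(trace[:t]) if q == p))

def belady_victim(cache, trace, t):
    # Replace the page whose next reference lies furthest in the future.
    future = trace[t + 1:]
    return max(cache, key=lambda p: future.index(p) if p in future else float("inf"))

trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for name, policy in (("FIFO", fifo_victim), ("LRU", lru_victim), ("OPT", belady_victim)):
    print(f"{name}: {simulate(trace, 3, policy)} faults")
```

On this particular string the clairvoyant policy saves several faults, as expected: no online policy can beat it.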

3.4.2 The Independent Reference Model (IRM) (Coffman [8])

Coffman says that under the independence-reference assumption, the reference string r_1 r_2 ... r_t ... is a sequence of independent random variables with the common stationary distribution {b_1, ..., b_m} such that

Pr[r_t = i] = b_i for all t >= 1 and 1 <= i <= m,

where m is the total number of pages available. As Coffman comments, this model is not a realistic representation of program behavior because it cannot capture locality of reference. However, according to Rao [11], the IRM is widely used because:

- it is analytically tractable;
- it gives a good indication of relative performance;
- it predicts page fault rates reasonably well.

I will now describe some results on the efficiency of page replacement policies obtained through probabilistic analysis. (Recall that another method for analyzing page replacement algorithms, which we mentioned in class, is the competitive analysis first used by Sleator and Tarjan.)

3.4.3 King's Result under LRU Replacement

In 1971, at the IFIP Congress, W. F. King III [12] proposed formulas for the page-fault probability (the probability of a cache miss) of various policies under the Independent Reference Model. His formula for LRU replacement (not reproduced here) is expressed in terms of k = cache size and m = main memory size, and it has exponential computational complexity; according to Flajolet [13], because of this complexity (the underlying Markov chain has roughly m^k states), King's numerical data are limited to m = 9 and k = 7.
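Since King's exact LRU formula is intractable beyond very small m and k, the same quantity is easy to estimate by Monte Carlo simulation instead. The sketch below draws an IRM reference string and measures the LRU fault ratio; the distribution {b_i} and the sizes (m = 9, k = 7, matching the limits quoted above) are illustrative assumptions.

```python
import random
from collections import OrderedDict

def irm_trace(probs, length, seed=0):
    """Draw an i.i.d. reference string r_1, r_2, ... with Pr[r_t = i] = b_i."""
    rng = random.Random(seed)
    pages = list(range(len(probs)))
    return rng.choices(pages, weights=probs, k=length)

def lru_fault_probability(trace, cache_size):
    cache, faults = OrderedDict(), 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)
        else:
            faults += 1
            if len(cache) >= cache_size:
                cache.popitem(last=False)
            cache[page] = None
    return faults / len(trace)

# m = 9 pages with a skewed distribution, cache size k = 7 (King's limits).
b = [0.3, 0.2, 0.15, 0.1, 0.08, 0.07, 0.05, 0.03, 0.02]
trace = irm_trace(b, length=200_000)
print("estimated LRU fault probability:", lru_fault_probability(trace, 7))
```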

3.4.4 Jelenkovic's Asymptotic Results [14]

The main objective of this paper was to obtain an analytic asymptotic characterization of the move-to-front (MTF) search cost distribution function or, equivalently, of the LRU (Least Recently Used) fault probabilities. A finite list of items n = 1, ..., N is considered, requested according to the IRM; each time an item is requested, it is moved to the front of the list (MTF). Asymptotically, the probability distributions of a reference R and of the search cost C for MTF satisfy the following elegant relations. When the request distribution has a heavy tail, P[R = n] ~ c/n^a with a > 1, the limiting search cost distribution satisfies

lim_{n -> infinity} P[C > n] / P[R > n] = K(a),

where K(a) is an explicit constant, built from the factors (1 - 1/a) and Gamma(1 - 1/a)^a, that depends only on the tail exponent a. (An empirical illustration of this relation appears at the end of Section 3.4.) When the request distribution has a light tail, P[R = n] ~ c exp(-lambda n^beta), the search cost distribution is asymptotically characterized through a fluid approximation C^f of C, and the limiting relation holds independently of c, lambda, and beta.

3.4.5 Knuth's Result on Optimal Caching [1]

In this paper from 1985, Knuth considered what happens when a cache is maintained clairvoyantly, that is to say, with perfect knowledge of the future. As you may recall, this is Belady's algorithm, which we defined before as the algorithm that replaces the page whose next reference is furthest in the future. For a uniform reference distribution (IRM), Knuth showed that a cache of size k, applied optimally on an alphabet of size m, is able to avoid faults with probability of order (k/m)^0.5. A possible open problem, related to the one treated by Knuth, would be to analyze Belady's algorithm in the case of non-uniform reference probabilities.

3.4.6 A Note on the Independent Reference Model

As we have seen so far, most (if not all) analytical caching models rely on the Independent Reference Model. The IRM is not designed to capture any locality of reference, and as a result it yields rather pessimistic miss ratios. However, according to Flajolet et al. [13], Baskett and Rafii [15] showed back in 1976 that by introducing virtual probabilities, computed in an appropriate way, one can obtain excellent agreement between observed and predicted performance. This means that an actual caching system can be modeled accurately by the independent reference model with modified access probabilities.
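As a rough empirical companion to the heavy-tail result in Section 3.4.4 (a sketch under assumed parameters, not Jelenkovic's derivation), one can run move-to-front on Zipf-distributed IRM requests and watch the ratio P[C > n]/P[R > n] settle toward a constant:

```python
import random

def zipf_probs(n_items, a):
    weights = [1.0 / (i ** a) for i in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def mtf_search_costs(probs, length, seed=1):
    """Serve IRM requests with move-to-front; record each search cost."""
    rng = random.Random(seed)
    items = list(range(len(probs)))
    lst = list(items)            # current MTF list order
    costs = []
    for req in rng.choices(items, weights=probs, k=length):
        pos = lst.index(req)     # search cost C = position in the list
        costs.append(pos)
        lst.insert(0, lst.pop(pos))
    return costs

N, a = 1_000, 1.5
probs = zipf_probs(N, a)         # heavy tail: P[R = n] ~ c / n^a
costs = mtf_search_costs(probs, 50_000)
for n in (10, 50, 200):
    p_c = sum(c > n for c in costs) / len(costs)  # estimate of P[C > n]
    p_r = sum(probs[n + 1:])                      # P[R > n]
    print(f"n={n}: P[C>n]/P[R>n] ~ {p_c / p_r:.2f}")
```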

3.5 More Analytical Results on Web Caching

Breslau et al. [16] state that the following have been observed in previous studies:

- Under an infinite cache size, the hit ratio for a web proxy is proportional to the log of the client population of the proxy and the log of the number of requests seen by the proxy.
- The hit ratio of a web cache is proportional to the log of the cache size.
- The probability that a document will be referenced k requests after it was last referenced is proportional to 1/k.

In this paper they show that if one assumes that the references in the web access stream are independent and that the reference probabilities of the documents follow Zipf's law, then the observed properties above follow from Zipf's law. Their model is defined as follows. There is some cache that receives a stream of requests for web pages; N is the total number of web pages in the universe. The request probability P_N(i), defined for i = 1, ..., N, follows a cut-off Zipf distribution:

P_N(i) = Omega/i, where Omega = (sum_{i=1}^{N} 1/i)^{-1}.

Each page request is drawn independently from the Zipf distribution, so there are no correlations in the request stream. The other assumption is that no cached pages are invalidated. A simple probabilistic analysis gives the following results.

For an infinite cache (i.e., all requested pages remain in the cache) and a finite request stream: they considered a finite stream of R requests and wished to determine the probability that the next request, the (R+1)-th, is for a page that already resides in the cache. If the (R+1)-th request is for page i, then the probability that this page is in the cache is 1 - (1 - P_N(i))^R, and the hit ratio H(R) follows by summing this over all pages i. The asymptotic behavior of the hit ratio is H(R) ~ Omega ln(Omega R); however, it was found that this approximation underestimates H(R).
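Both Zipf results are easy to probe numerically. The sketch below samples the cut-off Zipf model (the values of N, R, and the trial count are arbitrary choices of mine) to estimate the infinite-cache hit ratio, and also sums the top-C probabilities to anticipate the finite-cache result discussed next.

```python
import math
import random

N, R = 10_000, 5_000
omega = 1.0 / sum(1.0 / i for i in range(1, N + 1))   # Omega = 1 / sum(1/i)
probs = [omega / i for i in range(1, N + 1)]          # P_N(i) = Omega / i
pages = list(range(1, N + 1))
rng = random.Random(2)

# Infinite cache, finite stream: does the (R+1)-th request hit a page
# already requested at least once among the first R requests?
hits, trials = 0, 500
for _ in range(trials):
    seen = set(rng.choices(pages, weights=probs, k=R))
    if rng.choices(pages, weights=probs, k=1)[0] in seen:
        hits += 1
print(f"measured H(R) = {hits / trials:.3f}, "
      f"Omega*ln(Omega*R) = {omega * math.log(omega * R):.3f}")

# Finite cache holding the C most popular pages (the result discussed next):
for C in (100, 1000):
    print(f"H({C}) = {sum(probs[:C]):.3f} vs Omega*ln C = {omega * math.log(C):.3f}")
```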

For a finite cache with a capacity of C web pages subject to an infinitely long request stream, assuming that the cache holds the C most popular pages, the asymptotic hit ratio is H(C) ~ Omega ln C. This result is consistent with the previously observed behavior that the hit ratio increases logarithmically as a function of cache size. Finally, the page request interarrival times satisfy d(k) ~ 1/(k ln N) asymptotically, as expected.

This model has some limitations, though: it does not consider the cache's replacement policy, which plays a critical part in a cache's performance. Another simplification is the use of the IRM, which affects the miss ratios, as we have seen before.

3.6 Cooperative Proxy Caching (Wolman et al. [17])

Another variant of caching that researchers have been looking at is cooperative proxy caching. In this paper, the authors used both trace-based analysis and analytic modeling to show the potential advantages and disadvantages of cooperative proxy caching. With their traces, they quantitatively evaluated the performance improvement potential of cooperation between 200 small-organization proxies within a university environment, and between two large-organization proxies handling 23,000 and 60,000 clients respectively. With their model, they then extended beyond these populations to project cooperative caching behavior in regions with millions of clients. Their main results were as follows:

- There is no reason to design highly scalable cooperative caches, because the scale at which cooperative caching makes sense is sufficiently small (up to a medium-sized city) that reasonable schemes will achieve most of the benefit.
- The largest benefit of cooperative caching is achieved for relatively small populations.
- Performance at the population level at which cooperative caching works effectively is essentially limited by document cacheability.
- Cluster-based analysis of client access patterns indicates that cooperative caching organizations based on mutual interest offer no obvious advantages over randomly assigned or organization-based groupings.

4 Future Directions for Caching Research

4.1 Wireless (Kobayashi and Yu [18])

The authors of this paper first present a brief survey of the statistical properties of web requests that have been reported in the literature (such as document popularity, concentration of references, locality, etc.). Then they construct a new analytical/numerical model that characterizes mobile user behavior in a general state space using a semi-Markov process representation. Based on the mobility model and the resulting request model, they analyze the content access patterns and obtain estimates for:

- the total average latency,
- the hit ratio, and
- the cache capacity and bandwidth resources required for the wired and wireless network.

Finally, they obtain expressions for the dynamic behavior of the aggregate request rate and the aggregate traffic rate.

4.2 Peer-to-Peer Caching to Address Flash Crowds (Stading et al. [19])

In this paper, presented in early 2002, Stading et al. observe that flash crowds can cripple a website's performance. A flash crowd is an unanticipated, massive, rapid increase in the popularity of a resource, such as a web page, that lasts for a short amount of time. The paper introduces Backslash, a web content distribution system based on peer-to-peer caching. This form of caching builds on the concept of cooperative caching that we looked at before. The objective of the Backslash system is to offer fair load distribution in the face of flash crowds, the primary interest being to limit the load on any participating node so as not to overwhelm it, by distributing requests among as many participants as possible. When a resource experiences an uncharacteristically high request load, the Backslash system redirects requests for that resource uniformly to the created caches by using distributed hash tables. In this way, Backslash helps alleviate the effects of flash crowds. This solution is aimed at websites which do not generally expect flash crowds and cannot afford the cost of high-profile content distribution solutions.
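The request-spreading idea can be illustrated with a simple hash-based mapping from a URL to one or more participating peer caches; this is a generic sketch of mine, not the actual Backslash protocol or its DHT, and the peer names are hypothetical.

```python
import hashlib

def caches_for(url, nodes, spread=1):
    """Map a URL (plus a small salt) to `spread` peer caches by hashing.
    Under flash-crowd load, requests can be spread over several replicas."""
    chosen = []
    for salt in range(spread):
        digest = hashlib.sha1(f"{url}#{salt}".encode()).hexdigest()
        chosen.append(nodes[int(digest, 16) % len(nodes)])
    return chosen

peers = [f"peer{i}.example.net" for i in range(8)]
print(caches_for("http://tiny.site/page.html", peers))            # normal load
print(caches_for("http://tiny.site/page.html", peers, spread=4))  # flash crowd
```

Because every participant computes the same hash, requests for a hot resource land on the same small replica set without any central coordination.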

5 Conclusion

This report serves only as a very short introduction to the fascinating field of caching. I have looked at only some aspects of caching (both computer and web), and this enabled me to observe that caching is a hard problem to analyze due to the many different parameters involved. An important thing to note here is that most of the performance evaluation papers I have looked at deal with miss ratios, as an analytical approach to calculating access latency is much more complicated. Another thing we should not forget is the increased complexity of the web caching problem as compared to the computer caching problem: in web caching, pages are of different sizes and have expiration times, which further complicates the already difficult caching problem. However, all the factors I have just mentioned make the caching problem more challenging and interesting, and they definitely open up a lot of research opportunities!

References

[1] Knuth, D. E. An Analysis of Optimum Caching.
[2] Mano, M., Kime, C. Logic and Computer Design Fundamentals.
[3] Agarwal, A. Analysis of Cache Performance for Operating Systems and Multiprogramming.
[4] Przybylski, S. Cache and Memory Hierarchy Design: A Performance-Directed Approach.
[5] Smith, A. J. A Comparative Study of Set Associative Memory Mapping Algorithms and Their Use for Cache and Main Memory.
[6] Agarwal, A., Horowitz, M., Hennessy, J. An Analytical Cache Model.
[7] Singh, J., Stone, H., Thiebaut, D. A Model of Workloads and Its Use in Miss-Rate Prediction for Fully-Associative Caches.
[8] Coffman, E. G. Jr., Denning, P. Operating Systems Theory.
[9] Wang, J. A Survey of Web Caching Schemes for the Internet.
[10] Belady, L. A. A Study of Replacement Algorithms for Virtual Storage Computers.
[11] Rao, G. S. Performance Analysis of Cache Memories.
[12] King, W. F. III. Analysis of Demand Paging Algorithms.
[13] Flajolet, P., Gardy, D., Thimonier, L. Birthday Paradox, Coupon Collectors, Caching Algorithms, and Self-Organizing Search.
[14] Jelenkovic, P. Asymptotic Approximation of the Move-To-Front Search Cost Distribution and Least-Recently-Used Caching Fault Probabilities.
[15] Baskett, F., Rafii, A. The A0 Inversion Model of Program Paging Behavior.
[16] Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S. On the Implications of Zipf's Law for Web Caching.
[17] Wolman, A., Voelker, G., Sharma, N., Cardwell, N., Karlin, A., Levy, H. On the Scale and Performance of Cooperative Web Proxy Caching.
[18] Kobayashi, Y., Yu, S. Performance Models of Web Caching and Prefetching for Wireless Internet Access.
[19] Stading, T., Maniatis, P., Baker, M. Peer-to-Peer Caching Schemes to Address Flash Crowds.
