Web Caching Evaluation from Wikipedia Request Statistics

Size: px

Start display at page:

Download "Web Caching Evaluation from Wikipedia Request Statistics"

Amice Willa Douglas
6 years ago
Views:

1 Web Caching Evaluation from Wikipedia Request Statistics Gerhard Hasslinger Mahmoud Kunbaz Frank Hasslinger Thomas Bauschert Deutsche Telekom Chemnitz Univ. of Tech., Germany Darmstadt Univ. of Tech. Chemnitz Univ. of Tech., Germany Darmstadt, Germany Chair of Communication Networks Darmstadt, Germany Chair of Communication Networks telekom.de s2014.tu-chemnitz.de stud.tu-darmstadt.de etit.tu-chemnitz.de Abstract Wikipedia is one of the most popular information platforms on the Internet. The user access pattern to Wikipedia pages depends on their relevance in the current worldwide social discourse. We use publically available statistics about the top most popular pages on each day to estimate the efficiency of caches for support of the platform. While the data volumes are moderate, the main goal of Wikipedia caches is to reduce access times for page views and edits. We study the impact of most popular pages on the achievable cache hit rate in comparison to Zipf request distributions and we include daily dynamics in popularity. Keywords Wikipedia daily top-1000 statistics, Zipf distributed requests, web caching strategies, hit rate, simulation I. INTRODUCTION A. Content delivery and caching systems on the web The steadily ongoing growth of traffic and improvements in the performance for web services rely on an infrastructure of distributed server and caching systems, which are integrated into content delivery networks (CDNs) and cloud architectures [4][10][17][19][20][22]. Caching enables shorter transport paths of the data to the users, which also s shorter delays and higher availability when a source can be chosen within distributed and replicated content delivery systems. Large caching architectures for support of a global user population often include several hierarchical levels of caches combining core data center locations with smaller edge caching sites. B. Statistics about the Wikipedia web site and user requests Wikipedia is a well known web sites which offers a large collection of articles as a steadily updated online dictionary. Wikipedia includes over 40 million pages, making 7473 volumes of 700 pages in an online ebook version or 23 TByte of storage according to the Size_of_Wikipedia page [25] already in The total number of Wikipedia requests per day is in the range of million with peak rates going beyond request per second. In this work, we analyze statistical data with regard to the caching efficiency. Wikipedia is supported by the MediaWiki distributed cache server architecture from at least four main locations [25]. While the data and traffic volume is smaller than for global video and IPTV services [4][24], low delay and fast page loading times are a main concern for Wikipedia because the quality of experience is an important asset to make a web site attractive for the users. In 2015, a speedup of the median Wikipedia page load time from s down to s is reported on Wikimedia page Global_traffic_routing [25]. Therefore preloading of first paint overview information was introduced before the complete page content is transmitted. Moreover, in peak periods of high user request rates, cache servers help to avoid congestion. Otherwise, users are redirected to an error page 404.php, which is among the top-10 most requested pages in December C. Efficient cache support for content delivery on the web Our focus is not especially on the Wikimedia cache servers, but more general on conclusions that we can draw from the Wikipedia request statistics on the cache hit rate and on caching polices for optimizing the hit rate. Whether reduced delay, traffic load or other objectives are in the focus, the hit rate is a common basic measure of the cache performance. Hit rates depend on the user access pattern, where a concentration of requests on a small set of popular pages supports caching performance. The size of a cache and the policy for inserting and replacing objects in the cache have main impact on the hit rate [3] [18]. Based on top-1000 statistics, we can model caching behavior only with limited size. However, small size caches are relevant and allow conclusions also for similar performance characteristics of larger caches. There is a general tradeoff between the size and the forwarding speed of caching systems. Large TByte caches on cheap SSD storage for the mass of data are usually augmented by several cache levels with much faster and smaller DRAM and SRAM memory, which can process the data at high line rates. Thus caches of different type and size are relevant in high performance CDN/cloud servers [21]. Largely different sizes of web objects were also relevant in the past, but seem less important today, since large web objects are subdivided into data chunks and cache storage is steadily growing. Thus we assume a unique size of cached objects, corresponding to data chunks. We use simulation for evaluating caching efficiency, because analytical results on cache hit rates are available only in a few special cases [5][8][12]. We start the main part in Section II by adapting Zipf distributions to the top-1000 Wikipedia requests. In Section III, we study the dynamics in page popularity visible through daily changes. Section IV briefly summarizes web caching strategies, whose performance is evaluated in Section V with regard to Wikipedia request pattern, followed by the conclusions. II. DAILY WIKIPEDIA TOP-1000 STATISTICS & ZIPF REQUESTS A. Evaluations of Wikipedia Top-K Request Statistics For studying the performance of caching systems under realistic access pattern of a popular web platform, we refer to the / IFIP

2 daily top-1000 page request statistics made available by Wikimedia since August 2015 [25]. All our evaluations are for English Wikipedia pages on <en.wikipedia.org> [14]. Most relevant characteristics for the evaluation of caching from such data are the distribution of the frequencies of requests of the top-k most popular pages, and the rate of change in the popularity of objects from a day to the next one. Figure 1 shows the number of requests to the top-k most popular English Wikipedia pages for K = 1, 10, 100, A considerable variability is visible, which is higher for smaller sets of top-k objects. Requests to the top page vary by a factor of almost 40 between million per day, whereas requests to the top-1000 are in the range million. Note, that we excluded about a dozen often requested meta pages from all evaluations including the Wikipedia main page, several search pages and the error page 404.php. Those pages attract partly extreme request peaks, which do not fit into Zipf request pattern. On the other hand, including those meta pages would lead to higher but also more variable hit rates. Number of Request per Day 5,0E+07 4,0E+07 3,0E+07 2,0E+07 1,0E+07 0,0E+00 Requests to Top-1000 Pages Requests to Top-100 Pages Requests to Top-10 Pages Requests to Top-1 Page Aug Dec. / Jan Dec. Top-1/-10/-100/-1000 Requests per Day on <en.wikipedia.org> Figure 1: Daily Wikipedia Request Statistics B. Adaptation of Zipf Distributions to the Request Statistics In a first step, we check whether the daily top-1000 request distributions are in the shape of a Zipf distribution. Many studies have confirmed Zipf s law as an appropriate model for access pattern to content on the Internet. They include video platforms like YouTube [2], IP-TV channel selection [4][19], P2P file sharing systems [23], web shops etc. As a consequence, a small set of popular web objects attracts most user requests and thus makes small caches efficient. We consider Zipf distributions of finite support for a content catalogue of N objects with request probabilities z(r) being determined for the objects popularity ranks r {1, 2,, N}: N r 1 z( r) r with 0; z(1) 1/ r 0; (1) where is an adaptive shape parameter and a normalization constant. Access probabilities are becoming more unbalanced for 1. Many case studies based on different sets of web request measurement traces [2][8][16]report a good fit of Zipf adaptations within the paramter range 0.5 > > 1. Figure 2 compares Zipf distributions (dotted line curves) to a set of usual daily Wikipedia page request distributions. The figure gives an impression of typical deviations over the ranks of the top-1000 requests and illustrates how the shape of Zipf distributions is influenced by the parameter. Let W CDF (d,k) = R d (k)/r d (1000) denote the cumulative request distribution function (CDF), where R d (k) is the sum of requests to the top-k pages (k = 1,..., 1000) at a day d = 1,..., 519 in the period from Aug Dec We determine the deviation WZ (d,k) of W CDF (d,k) and a corresponding Zipf adaptation Z CDF (d,k) = r k z d (r) for a day d by the difference in the ranks, i.e., the distance of a pair of curves in the direction of the rank axis in Figure 2: WZ (d,k) = j k k where Z CDF (d, j k ) W CDF (k) < Z CDF (d, j k + 1). CDF of Top-1000 Requests with Zipf Approx. 1,0 0,8 0,6 0,4 0,2 0,0 Top-1000 Req. 2016/04/27 Zipf Approx.: ß = Top-1000 Req. 2016/08/08 Zipf Approx.: ß = Top-1000 Req. 2016/04/23 Top-1000 Req. 2016/02/02 Top-1000 Req. 2016/09/24 Zipf Approx.: ß = Top-1000 Req. 2016/09/26 Zipf Approx.: ß = Ranks Comparison of the CDF of Top-100 Requests and Zipf Distribution Figure 2: Zipf adaptations to daily Wikipedia requests We also obtain the and imum absolute rank deviation (, ) d k k WZ W Z ( ; W Z ( W Z ( d, k). (2) 1000 k1,..., 1000 The parameter of the Zipf adaptation is determined per dayin order to minimize W Z (, where we could find a unique, minimizing for all days without having a proof of this property. Figure 3 shows the and imum deviations of eq. (2) and the parameter in adaptations for each of the 519 days in the considered period. We found Zipf adaptations in the ranges > > , 0.19 < W Z( < 10.7 and 1 W Z( 31 The rank deviation over all ranks and days is 3.2. Figure 3 also shows how the precision of Zipf adaptations varies in terms of the rank deviation over the 519 days, and indicates the daily values of. On most of the days, the imum absolute rank deviations are below 10, where 5 out of 519 days show a perfect match with imum deviations W Z( of ±1. On the other hand, Figure 4 illustrates limitations of Zipf adaptations for the worst case example with W Z( In this case, a single parameter adaptation is insufficient to cover the whole range. Instead, three adaptations are shown for

3 fitting one of the three ranges 1-10, and While the adaptations make a good fit in one of the sub-ranges, they largely deviate in the other sub-ranges. ß Mean & Max. Deviation in Rank The 2nd Content Caching and Delivery in Wireless Networks Workshop (CCDWN) 10 x Max. Deviation from Zipf Approx. Mean Deviation from Zipf Approx. Zipf Approx. Parameter: 10 x ß Aug Dec. / Jan Dec. Adaptation of Daily Top-1000 Requests by a Zipf Distribution z(r)=r Figure 3: Zipf adaptations to daily Wikipedia page requests shows the imum, minimum and the number of new pages per day that appear on the 2 nd, 3 rd,, 519 th day in the Wikipedia statistics. E.g., in case K = 100, 42.4% of the top- 100 pages are replaced on the next day with 36.7% of requests addressing the new pages per day. Both fractions are decreasing for K 1000 towards 27.5% new top-1000 pages attracting 23.7% of requests per day. The imum and minimum values indicate high variability in the daily dynamics, where extremely high dynamics is visible on only a few days which hold the imum above over the range K = 1,, % 80% 60% Max. Fraction of Daily New Top-K Pages Mean Fraction of Daily New Top-K Pages Mean Fraction of Requests to Daily New Top-K Pages Min. Fraction of Daily New Top- K Pages 1,0 0,8 0,6 0,4 0,2 0,0 Top-1000 Request Distribution on 2016/04/21 Zipf Approximation: Tail Part ß = ß = Zipf Approximation: Middle Part ß = -1 Zipf Approximation: Start Part ß = % ß = -1.0 Rank Comparison of the CDF of Daily Top-1000 Requests with Zipf Distributions ß = -1.6 Figure 4: Worst case Zipf adaptation for Wikipedia requests When we consider the number of requests to top-k Wikipedia pages over longer time periods of a week, a month etc. then the variability in the request distributions is smoothing. Corresponding Zipf approximations show small or moderate deviations. The values of requests to the top-1000 pages over all days from Aug Dec can be approximated by a Zipf distribution with = , with and imum absolute rank deviations of 3.06 and 7, respectively. III. DYNAMICS IN DAILY TOP-K WIKIPEDIA PAGE REQUESTS The concentration of requests on a small set of top-k pages as expressed in Zipf-like distributions is one main criterion for caching efficiency. Moreover, the dynamics in the popularity of the pages is also relevant, where high dynamics and a frequent occurrence of new or previously unpopular pages in the top-1000 statistics requires more frequent reloading and makes caches less efficient [3][9][17][20][24]. As an indicator of popularity dynamics, we determine how many pages are newly entering the top-k ranks each day, which were not among the top-k on the previous day. Figure 5 Statistics of new top-k pages per day over the period from Aug Dec Figure 5: Dynamics in Wikipedia pages and their requests Dynamics in request pattern can be studied on longer time scales (weeks, months ), but the daily changes are most relevant for caching efficiency. Considering single pages, we can distinguish the impact of one-timers, i.e. pages that appear once in the top-1000 of a day from constantly present pages. Almost half of the pages that appear in the top-1000 in the 17 months period are seen only once, but they get only 4.2% of all requests in the statistics. The 70 pages that are always present attract 8.7% of the requests. More details are shown in Table 1. Consequently, it seems worthwhile to store some of the long term present pages even in small caches, but the evaluation in the next section shows that the least frequently used (LFU) strategy of caching the most often requested pages is far below optimum, at least for the complete long term statistics. Table 1: Statistics for pages that appear x-times in the daily top-1000 from Aug Dec Number of Days in the Top-1000 Number of Pages % of Pages % of Requests to those Pages 519 (on each day) % % % % % % % %

4 100% 80% 60% 0% Top Page P-1 Top Page P-2 Top Page P-3 Top Page P-4 Top Page P-5 Mean Rate for Top Pages Top Page P-6 Top Page P-7 Top Page P-8 Top Page P-9 Top Page P Peak Rate of Requests to Top-10 Pages on ±10 Days around their Peak Request Day Figure 6: Requests to top pages on days around peak popularity Finally, an evaluation of Wikipedia top-1000 pages illustrates the change of popularity in the top-10 pages, i.e. the 10 pages that attract most requests on a day. We assign the day of their peak popularity in the middle of Figure 6 and consider the ratio of requests relative to the peak on ±10 days around the peak as a percentage of the peak request rate. It becomes visible that peak request rates of single Wikipedia pages seldom hold on over several days, but are mostly spontaneous. We have sorted all 1830 top-10 pages appearing on the 519 days due to the sum of percentages on ±10 days around the peak and picked 10 examples P-1,, P-10 equally spaced over the sorted range. The percentages per day for all top-10 pages are included as dotted line curve. There are only a few pages with long term high request rate like P-1. Most top pages show a request level of 10% - 80% of the peak rate on a few days around the peak. However, 819 out of the 1830 top-10 pages are not among the top-1000 on the day before their peak. Therefore predictions for preloading of pages with rising popularity into the cache are subject to high uncertainty at least on a daily basis, while some yearly correlation due to anniversaries can only partly help. The request fluctuations of Wikipedia pages are completely different from behavior of top objects on video and P2P platforms, which are observed to show a steep rise to peak popularity followed by a slow decrease over time [6][7][23]. IV. PERFORMANCE OF WEB CACHING EFFICIENCY BASED ON WIKIPEDIA STATISTICS We extend previous evaluation methods [11][12] of the hit rate for different caching strategies, which assume independent (IRM) Zipf distributed requests, to include measurement based request pattern derived from the daily Wikipedia statistics. A. Web Caching Strategies: LRU, SG-LRU, LFU The least recently used (LRU) principle provides a basic and usual caching strategy with a simple cache update scheme. LRU puts the currently requested object on top of a cache implemented as a stack and replaces the bottom object with the longest time span since the last request, if the cache is full. However, LRU doesn t provide flexibility to manage the cache content based on object specific properties, which may include transport costs, delays, availability or other preferences from the content providers, ISPs or users perspective [1][10][22]. Therefore alternative caching strategies have been proposed in literature [13][15][16][18], which improve LRU cache hit rates, but are more complex and not widely adopted. In recent work we contributed in this field by proposing Score Gated LRU (SG-LRU) caching schemes, which combine the simple LRU implementation with a score rating per each object, giving full flexibility to adapt the caching strategy to arbitrary cost and benefit criteria by setting corresponding object scores [11][12]. We include scores as an additional criterion to admit objects to an LRU cache. LRU always puts the currently requested object on top of the cache, whereas SG- LRU admits a new object to the cache only if it s score is higher than the score of the cache bottom object. Otherwise, the bottom object is put on top, simply by shifting the cache top pointer to this object in a double linked list implementation. We experienced that the SG-LRU variant is sufficient to collect objects with highest scores in the cache, even if it takes longer than maintaining a sorted list of objects according to scores, which requires high update effort [12]. Our evaluations of caching strategies also include the least frequently used (LFU) principle as a reference, which counts previous requests per object and prefers most often requested objects in the cache. If a request counter per object is used as score function then SG-LRU behaves similar to LFU. In the sequel, we study SG-LRU with a request count over a sliding window of the last W requests, where the window size W can be adapted as a memory back log. SG-LRU with sliding window scores resembles LRU in the special case of window size W = 1 and behaves similar to LFU for sufficiently large window size. In this way, SG-LRU covers the LRFU spectrum [15] of cache strategies with limited memory between LRU and LFU, but we stay with the simple LRU update scheme instead of more expensive sorting methods, e.g. heap sort [15]. V. EVALUATION OF CACHE EFFICIENCY A. Cache simulation under daily changing request pattern We extend a cache simulator developed in [11][12] for daily changing requests due to measurements of top-1000 Wikipedia requests per day in the time from Aug Dec On each day, we assume constant request probabilities R d (k) /R d to the pages, where R d (k) is the number of requests to the page on rank k in the top-1000 statistics on day d and R d denotes the total number of top-1000 requests on that day. We simulate the cache load on each day by performing R d independent requests to the top-1000 pages of the day with request probabilities R d (k) /R d, i.e. we assume daily changing independent request models IRM 1,, IRM 519. On a new day, the cache starts with the content of the end of the previous day. It is left to the (SG-)LRU caching strategy to reoptimize the cache content for the request distribution on the next day. In order to control the precision of simulated cache hit rates, we use the second order statistics, i.e. the variance of the computed hit rates over several times scales, i.e. for request se-

5 quences of different length [11]. The estimated standard deviation of the hit rate is below for the simulation of a daily workload with at least 16 million top-1000 requests. It is even smaller for the value over the complete period, confirming sufficient significance for hit rate results on 3 digits. Figure 7 shows hit rates for all days in the 17 months lasting period for SG-LRU caches with a size of M = 100 pages and for different window sizes with regard to past requests. The lowest curve is for the pure LRU cache hit rate or SG-LRU with W = 1. In this case, our simulations on each day fit to the Che approximation [5][8] with relative deviation of less than 1. Although this result again confirms the Che approximation to be remarkably accurate [8], and despite of the mathematical explanation and asymptotic properties derived in [8], no bounds are known on its accuracy. Therefore simulations seem indispensible to obtain LRU hit rates including a concrete estimate of their accuracy within confidence intervals. Figure 7 includes curves on SG-LRU strategies with a count of past requests over windows of size W = 400, W = 4000 and W = , respectively. The SG-LRU hit rates over all 519 days are shown in Table 2. The case W = almost coincides with to the upper bound given by 591 M ( k) 591 R / d 1 k1 d R d1 d h (3) where k=1 M R d (k) / R d is the upper bound of the cache hit rate for IRM d requests on a single day, i.e. by putting the top-m most requested Wikipedia pages into the cache for that day, and h computes a value over all requests in the considered time period. 60% 50% 30% 10% Optimum IRM DSG-LRU for W = SG-LRU Simulation for W = 4000 SG-LRU Simulation for W = 400 LRU Simulation Che Approximation Hit Rate for a Cache of Size M = 100 on each Day D for Wikipedia Top-1000 Requests from Aug Dec Figure 7: Cache hit rate based on Wikipedia request statistics Table 2: Mean cache hit rates over 519 days SG-LRU: Window Size W = Upper LRU bound (3) Hit Rate 24.4% 28.5% 36.6% 38.2% 38.4% We can only take daily changes into account, without information on dynamics and correlation of requests during a day. Correlated requests on shorter time frames can lead to higher caching efficiency, but the dynamics on time frames of days is experienced to be more important in caching studies of measurement traces [24]. Implementations by cache providers usually combine continuous updates per request with a complete daily update of the whole cache during low traffic hours at night, including prefeteching of new popular objects based on prediction and/or available information about request pattern changes for the next day [17][20]. Finally it is remarkable that Figure 7 shows peaks on several days with cache hit rates about 1.5-fold above the in all curves. Those peaks in the hit rate coincide with peaks in the total number of top-1000 as well as top-10 requests in Figure 1 on the same days. This indicates a favourable property of caches to increase their efficiency with spontaneous peaks in the request load. B. Impact of Cache Size on Cache Hit Rates The cache size is a key factor for the achievable hit rate. In Table 3, the LFU hit rate stands for keeping the M most requested pages constantly in the cache over the almost 1.5 year period. LRU is better, but also stays essentially below the optimum hit rate according to eq. (3), which is almost fully exploited by SG-LRU with an appropriate window size W. Table 3: Cache hit rate depending on the cache size Cache Size M LFU Hit Rate 4.1% 6.1% 10.7% 16.2% 24.3% LRU Hit Rate 4.4% 7.6% 15.1% 24.4% 38.2% SG-LRU Hit Rate 14.0% 19.3% 28.5% 38.2% 50.3% C. Evaluation for regional caches with lower request load The latter results for daily changing request distributions in Table 3 and Figure 7 are still close to the bound of eq. (3). Actually, the number R d of requests per day goes far beyond the phase for adapting the cache content to new request pattern of a day. We extend the simulation results by distributing the request load equally over C caches, each of which serves R d /C independent requests per day. Thus, we run the same simulation as for the results in Figure 7, but only for R d /C instead of R d requests per day. With decreasing number of requests per day per cache, the adaptation phase to new popular pages becomes more relevant. In case C = we have R d /C 1100 daily requests per cache, while 27.5% of top-1000 web pages are changing in the same time span. Higher dynamics in the popularity of pages makes the request pattern less predictable and, as a consequence, reduces the cache hit rate. The results for SG-LRU in Figure 8 start at the daily LFU hit rate level of eq. (3) for small C. This level is hold until C 200, i.e. as long as there are more than requests per day, before hit rates essentially decline for larger C. The curves are obtained for an optimized window size W as the only parameter of the SG-LRU strategy. A request count statistics for

6 50% 30% 10% SG-LRU for Cache Size M = 200 SG-LRU for Cache Size M = 100 SG-LRU for Cache Size M = 50 Hit Rates for C Caches serving R D /C /C Requests per Day with Daily Changing Wikipedia Top-1000 Request Pattern Figure 8: Cache hit rate for daily changing request pattern one day corresponds to W = R d /C, which indicates a reasonable first estimate for W. We run simulations with several values of W approaching the optimum, which is in the order of one million for C = 10, going down a few hundred for C = With decreasing W, SG-LRU is developing towards LRU. We also have to run simulations at least C/100-fold in order to get results within appropriate confidence intervals [11]. CONCLUSIONS AND OUTLOOK Our main evaluation results of Wikipedia s daily top-1000 request statistics show that the Zipf law of eq. (1) provides a fairly good match of most daily top-1000 request distributions, that largest deviations from Zipf laws coincide with days showing high peak request rates, which also lead to higher cache hit rates than for the basically favourable Zipf case, that small caches of 1% of the content catalogue already show considerable cache performance, see Table 3, that neither LFU nor LRU can exploit the cache efficiently, whereas SG-LRU variants including request counts exploit the upper bound of eq. (3) for daily changing IRM requests, that Wikipedia request pattern is dynamic with about 27.5% of new pages appearing in each day s top-1000, but such dynamics is negligible for the cache hit rate, if a load of or more requests is served per day. Naturally, this work based on daily top-1000 statistics ignores the larger portion of requests to pages beyond the top- 1000, thus restricting conclusions to small caches for the most popular pages. We aim at studying cache performance with complete request measurement traces on other web platforms to gain broader experience on how web cache performance depends on usual request pattern for web content. ACKNOWLEDGEMENTS We would like to thank the operators of the Wikimedia platform for their effort of providing daily updated statistical data about requests to Wikipedia. Our work has received funding from the European Union s Horizon 2020 innovation programme in the SSICLOPS project under grant agreement No < This work reflects only the authors views and the European Commission is not responsible for any use that may be made of the information it contains. REFERENCES [1] A. Araldo, D. Rossi, F. Martignon, Cost-aware caching: Caching more (costly items) for less (ISPs operational expenditures), IEEE Transactions on Parallel and Distributed Systems (TPDS), 27 (2016) [2] L. Breslau et al., Web caching and Zipf-like distributions: Evidence and implications, Proc. IEEE Infocom (1999) [3] M. Busari and C. Williamson, On the sensitivity of Web proxy cache performance to workload characteristics, IEEE Infocom (2001) [4] M. Cha et al., Watching TV over an IP network, Proc. 8 th SIGCOMM Conf. on Internet Measurement (IMC), Athens, Greece (2008) [5] H. Che, Y. Tung, and Z. Wang, Hierarchic web caching systems: modeling, design and experimental results, IEEE JSAC 20(7) (2002) [6] B. Cohen, Incentives build robustness in BitTorrent (2003) 1-5 [7] F. Figueiredo et al., TrendLearner: Early prediction of popularity trends of user generated content (2014) < [8] C. Fricker, P. Robert and J. Roberts, A versatile and accurate approximation for LRU cache performance, IEEE Proc. 24 th International Teletraffic Congress, Kraków, Poland (2012) [9] M. Garetto, E. Leonardi and S. Traverso, Efficient analysis of caching strategies under dynamic content popularity, Proc. IEEE Infocom (2012) [10] G. Hasslinger and F. Hartleb, Content delivery and caching from a network provider s perspective, Special Issue on Internet based Content Delivery, Computer Networks 55 (Dec. 2011) [11] G. Hasslinger, K. Ntougias and F. Hasslinger, Performance and Precision of Web Caching Simulations Including a Random Generator for Zipf Request Pattern, Proc. 18 th MMB Conf., Münster, Germany, Springer LNCS 9629 (2016) [12] G. Hasslinger, K. Ntougias, F. Hasslinger and O. Hohlfeld: Performance Evaluation for New Web Caching Strategies Combining LRU with Score Based Object Selection, Proc. International Teletraffic Congress ITC-28, Würzburg, Germany (2016) [13] M. Hefeeda, O. Saleh, Traffic modeling and proportional partial caching for P2P systems, IEEE/ACM Trans. on Networking 16/6 (2008) [14] M. Kunbaz, Dynamics in the Popularity of User Request on IP Platforms and their Impact on Caching Efficiency, Research Project Report, TU Chemnitz (2016) 1-54 [15] D. Lee et al., LRFU: A spectrum of policies that subsumes the least recently used and least frequently used policies, IEEE Transactions on Computers 50/12 (2001) [16] N. Megiddo and S. Modha, Outperforming LRU with an adaptive replacement cache algorithm, IEEE Computer (Apr. 2004) 4-11 [17] PeerApp Inc. < [18] S. Podlipnik and L. Böszörmenyi, A survey of web cache replacement strategies, ACM Computer Surveys (2003) [19] T. Qiu et al., Modeling channel popularity dynamics in a large IPTV system, Proc. 11 th ACM SIGMETRICS, Seattle, WA, USA (2009) [20] Qwilt Inc. < [21] G. Rossini, D. Rossi, Coupling caching & forwarding: Benefits, analysis and implementation, Proc. ACM ICN, New York, USA (2014) [22] R.K. Sitaraman et al., Overlay networks: An Akamai perspective, Chapter 16 in Advanced content delivery, streaming and cloud services, Eds. M. Pathan, et al., Wiley (2014) [23] D. Stutzbach, S. Zhao, R. Rejaie, Characterizing files in modern Gnutella networks: A measurement study, Multimedia Systems 13/1 (2007) [24] S. Traverso et al., Unraveling the impact of temp. and geo. locality in content caching systems, IEEE Trans. on Multimedia 17 (2015) [25] Wikipedia statistics and information available on

Performance Evaluation for New Web Caching Strategies Combining LRU with Score Based Object Selection

Performance Evaluation for New Web Caching Strategies Combining LRU with Score Based Object Selection Gerhard Hasslinger a, Konstantinos Ntougias b, Frank Hasslinger c and Oliver Hohlfeld d a Deutsche