
Modeling Request Patterns in VoD Services with Recommendation Systems

Samarth Gupta and Sharayu Moharir

Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India, 400076
sharayum@ee.iitb.ac.in

arXiv:1609.02391v3 [cs.NI] 3 May 2017

Abstract. Video on Demand (VoD) services like Netflix and YouTube account for ever increasing fractions of Internet traffic. It is estimated that this fraction will cross 80% in the next three years. Most popular VoD services have recommendation engines which recommend videos to users based on their viewing history, thus introducing time-correlation in user requests. Understanding and modeling this time-correlation in user requests is critical for network traffic engineering. The primary goal of this work is to use empirically observed properties of user requests to model the effect of recommendation engines on request patterns in VoD services. We propose a Markovian request model to capture the time-correlation in user requests and show that our model is consistent with the observations of existing empirical studies.

Most large-scale VoD services deliver content to users via a distributed network of servers, as serving user requests via geographically co-located servers reduces latency and network bandwidth consumption. The content replication policy, i.e., determining which contents to cache on the servers, is a key resource allocation problem for VoD services. Recent studies show that low start-up delay is a key Quality of Service (QoS) requirement of users of VoD services. This motivates the need to pre-fetch (fetch before contents are requested) and cache content likely to be requested in the near future. Since pre-fetching leads to an increase in the network bandwidth usage, we use our Markovian model to explore the trade-offs and feasibility of implementing recommendation-based pre-fetching.
1 Introduction

Internet usage patterns are shifting towards content distribution and sharing, with Video-on-Demand (VoD) services like Netflix [21] and YouTube [27] accounting for over 50% of all Internet traffic. This fraction is expected to cross 80% by 2019 [7]. Most popular VoD services provide recommendations to users which heavily influence their viewing patterns. More specifically, recommendations lead to correlation in the videos requested by a user across time. The primary goal of this work is to model the viewing patterns of users of VoD services with recommendation engines. An accurate model of usage patterns is a crucial ingredient in the design of resource allocation algorithms which effectively manage Internet traffic and ensure high Quality of Service (QoS) to the users.

Meeting the QoS demands of users is critical for a VoD service to retain and expand its customer base. A recent study by Akamai [14] found that users start leaving if a video takes more than two seconds to start streaming. Moreover, for each additional second of start-up delay, the rate of abandonments increases by approximately 5.8%. The probability of a user returning to the VoD service within one day after watching a failed video is 8% versus 11% after watching a normal one. Evidently, frequent start-up delays can lead to a loss of customers, thus reducing the revenue of the VoD service.

Most large-scale VoD services serve their users via Content Delivery Networks (CDNs) which have multiple servers/caches with storage and service capabilities spread across the world. Efficient use of the available storage resources, e.g., serving user requests via geographically co-located servers, can enhance the QoS for the user. More specifically, a frequent cause of start-up delay is that videos requested by users are not available on geographically co-located servers, and have to be fetched from other servers after they are requested. The delay in start-up is caused by the large geographical/network distance between the users and the servers which cache the requested content. The goal of reducing start-up delay motivates caching policies that are aggressive in adapting the content stored on the local servers in order to minimize the probability of delayed start-up. One possible solution is to pre-fetch (fetch before videos are requested) and cache videos that are likely to be requested in the near future [9, 13, 15, 17, 22].
Since pre-fetching leads to an increase in the bandwidth consumption of the CDN, there is a trade-off between bandwidth usage of the network and the quality of service provided to the users. We explore this trade-off in this work.

1.1 Contributions

The contributions of this work can be summarized as follows.

Modeling the request process: In a preliminary version of this work [11], we propose a Markovian model which captures the time-correlation in user requests in VoD services due to the presence of recommendation systems. We show that our model is consistent with empirically observed properties of request patterns in such VoD services [5, 6, 16, 29]. A limitation of this model is that it imposes the constraint that the recommendations are symmetric (i.e., Video A recommends Video B implies Video B recommends Video A). In this work, we generalize our model to allow for non-symmetric recommendation relationships.

Performance evaluation of caching policies: We study a caching policy which pre-fetches videos likely to be requested in the future in order to minimize the chance of delayed start-up. More specifically, while a user is watching a video, our policy pre-fetches a pre-determined number of the corresponding recommended videos to the local cache, thus reducing the probability that the next request from this user experiences any start-up delay.

As discussed above, pre-fetching content reduces start-up delay but leads to increased bandwidth consumption. Via simulations, we explore this trade-off as a function of the relative costs of bandwidth consumption and delayed start-up. Our results characterize when pre-fetching content can lead to a reduction in the overall cost of service, even with the increased bandwidth usage.

1.2 Organization

The rest of the paper is organized as follows. In Section 2, we discuss existing literature on empirical studies of viewing patterns in VoDs with recommendation systems. In Section 3, we define our Markovian request model and discuss its properties. We describe our CDN setting in Section 4 and discuss the proposed caching scheme in Section 5. In Section 6, we evaluate the performance of the proposed policy via simulations. We present our conclusions in Section 8.

2 Literature Review

2.1 Request Patterns in VoDs with Recommendations

We first summarize the observations of empirical studies of the effect of recommendation systems on users' viewing patterns [5, 6, 16, 29]. These studies have been conducted either by crawling the YouTube webpage [29], or via the YouTube API [29], or by collecting browsing data from university networks [16, 29]. The studies represent the relationship between videos using a directed graph, where nodes represent videos and each node has a directed edge to all the corresponding recommended videos. They focus on the properties of the graph [5], the effect of the placement/rank of a video in the recommendation list of another video [16, 29], and the effect of recommendations on the overall video popularity profile [6].

Small-World Recommendation Graph. The key insight obtained in [5] is that the graph representing the YouTube recommendation network is small-world. We use the following definitions to formally define small-world networks.
(i) Characteristic Path Length: The characteristic path length of a network is defined as the mean distance between two nodes, averaged over all pairs of nodes. (ii) Clustering Coefficient: The clustering coefficient of a network is defined as the average fraction of pairs of neighbors of a node that are also neighbors of each other. Small-world networks are a class of networks that are highly clustered (high clustering coefficient), like regular lattices, yet have small characteristic path lengths, like random graphs [26]. Compared to random graphs with the same average degree, small-world networks are characterized by high clustering coefficients and similar path lengths. In [5], the authors use these two characteristics to conclude that the YouTube recommendation graph is small-world.
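As a concrete illustration of the two definitions above, the following minimal Python sketch computes both quantities for a small undirected graph stored as an adjacency dictionary. The function names, the toy graph, and the convention of assigning a clustering fraction of 0 to nodes with fewer than two neighbors are assumptions made for illustration; they are not taken from [5] or [26].

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbor pairs that are themselves linked."""
    fractions = []
    for node, nbrs in adj.items():
        if len(nbrs) < 2:
            fractions.append(0.0)  # assumed convention for nodes with < 2 neighbors
            continue
        pairs = list(combinations(sorted(nbrs), 2))
        linked = sum(1 for a, b in pairs if b in adj[a])
        fractions.append(linked / len(pairs))
    return sum(fractions) / len(fractions)


def characteristic_path_length(adj):
    """Mean shortest-path distance over all pairs of distinct, mutually reachable nodes."""
    total, count = 0, 0
    for src in adj:
        # Breadth-first search gives hop distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1
    return total / count


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(clustering_coefficient(toy), characteristic_path_length(toy))
```

A graph would be called small-world in the sense above if the first number is much larger than that of a random graph with the same average degree while the second number is comparable.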

Content Popularity Profiles. It has been observed that content popularity for VoD services without recommendation systems is heavy-tailed and can often be well-fitted with the Zipf distribution, defined as follows: the popularity of the $i$-th most popular video is proportional to $i^{-\beta}$, where $\beta$ is a positive constant called the Zipf parameter. Typical values of $\beta$ for VoD services lie between 0.6 and 2 [1]. Empirical studies have concluded that content popularity for VoD services with recommendation systems, e.g., YouTube, can be well-fitted with the Zipf distribution for the popular videos, while the popularity of the less popular videos decreases faster than the rate predicted by the Zipf distribution [6].

Click Through Rate. The Click Through Rate (CTR) for position $r$ in the recommendation list of Video $i$ is defined as the fraction of times a user requests the video in position $r$ in the recommendation list of Video $i$ right after watching Video $i$. In [29], the authors found that the mean of the CTR follows the Zipf distribution as a function of $r$. In addition, Figure 3 in [16] shows that the CDF of the CTR is concave.

Chain Count. Chain count is defined as the average number of consecutive videos a user requests by clicking on videos in the recommendation list before requesting a video which is not in the list of recommended videos for the video currently being watched. For YouTube, the chain count is estimated to be between 1.3 and 2.4 in [16].

Degree Distribution. The degree distribution of the recommendation graph has been found to follow the power law. More specifically, the number of nodes with degree $k$ is approximately proportional to $k^{-3}$ [23].

2.2 Pre-fetching based caching schemes

Caching schemes which use pre-fetching have been shown to be beneficial for TV-on-demand and VoD services [9, 13, 15, 17, 22].
To the best of our knowledge, none of the existing works have attempted to model the request arrival process for VoD services with recommendation systems; instead, they use trace data to evaluate the performance of the proposed policies. Another key difference between the existing literature and this work is that we study the trade-off between bandwidth usage and quality of service, while most of the existing works (except [9]) focus only on the improvement in quality of service (cache hit-ratio) obtained by pre-fetching content. In [13], the authors use trace data from a campus network gateway to analyze the performance of pre-fetching content to serve YouTube requests. A key observation in [13] is that it is not necessary to pre-fetch complete videos to avoid start-up delays. Fetching a fraction of the video is often sufficient, as the rest of the video can be fetched while the users watch the initial part of the video.

In [15], the authors compare the performance of pre-fetching+caching and the Least Recently Used (LRU) caching scheme, which does not pre-fetch content, for Hulu (a VoD service) on a university network. In [9], trace data from a Swedish TV service provider is used to evaluate the benefits of pre-fetching episodes of shows that a specific user is watching in order to reduce latency. In [22], the authors study the setting where the requests arrive according to a known Markov process. They propose an MDP based pre-fetching scheme and prove its optimality. Although our work also assumes that the underlying request process is Markovian, unlike [22], our caching policy works without the knowledge of the transition probabilities. This is an important distinction, since for VoD services like YouTube with massive content catalogs, content popularity is often time-varying and unknown [20]. In [17], the authors study a pre-fetching and caching scheme for HTTP-based adaptive video streaming. Their scheme aims to maximize the cache hit-ratio assuming the bandwidth between the local cache and the central server is limited.

3 Our Request Model

In this section, we discuss our model for the request process for VoD services with recommendation systems.

3.1 Model Definition

We construct a directed graph $G(V, E)$, where the set $V$ consists of all the videos offered by the VoD service and an edge $e = \{i, j\} \in E$ implies that Video $j$ is one of the recommended videos for Video $i$. We then assign weights to edges. Each user's request process is a random walk on this weighted graph and therefore, the request arrival process is Markovian and can be completely described by a transition probability matrix. We use a subset of the properties discussed in Section 2.1 to construct this matrix and verify that the remaining properties discussed in Section 2.1 are satisfied by our Markovian model.
Motivated by the fact that empirical studies like [5] have found that this graph is small-world, and that the degree distribution follows the power law [23], we use the Barabasi-Albert model [2] to generate a random small-world graph. Refer to Figure 1 for a formal definition of the Barabasi-Albert model. Since the Barabasi-Albert model generates an undirected graph, we replace each edge by two directed edges to obtain a directed graph on the set of videos. This means that if $v_i$ recommends $v_j$, our model assumes that $v_j$ also recommends $v_i$. This is motivated by the fact that YouTube uses the relatedness score [8] for each pair of videos to determine homepage recommendations. The relatedness score of two videos is proportional to the number of times the two videos are co-watched in a session. Therefore, by definition, if $v_i$ is closely related to $v_j$, then $v_j$ is closely related to $v_i$. Users can request videos via multiple sources. We divide them into two categories:

1: Initialize: Generate a connected graph of $m$ nodes $(v_1, v_2, \ldots, v_m)$. Let $v = m + 1$.
2: Introduce a new node $n_v$ which connects to $m$ existing nodes. These $m$ edges from $n_v$ are added sequentially as follows. The probability that each of the $m$ edges from the new node goes to an existing node $n_i$ is $p_i = K_i / \sum_j K_j$, where $K_i$ is the current degree of node $n_i$.
3: $v = v + 1$. If $v < n$, go to Step 2.

Fig. 1. Barabasi-Albert Model: an algorithm to generate a random small-world graph with a degree distribution following the power law.

The first set of requests comes via the recommendations made by the VoD service when the user is watching a video. We introduce a quantity $P_{cont}$, defined as the probability that a user requests a recommended video after he/she finishes watching the current video. Formally, after watching a video, each user requests one of the recommended videos with probability $P_{cont}$, independent of all previous requests. By definition, the expected chain count (defined in Section 2.1) is given by $1/(1 - P_{cont})$. The value of $P_{cont}$ should be between 0.2 and 0.7 to be consistent with the chain count values observed in [16].

The second set of requests comes from all other sources on the Internet, including the VoD homepage, the user's social networking page, etc. To model the second type of requests, we add a dummy node $n_0$ to the graph $G$. This dummy node represents all other sources of requests and is connected to every other node in $G$ via two directed edges, one in each direction.

The next step is to assign transition probabilities corresponding to each edge in this directed graph $G(V, E)$. Let $P_{i,j}$ be the probability that a user makes the transition from node $n_i$ to node $n_j$. By definition, $P_{i,j} = 0$ if $\{i, j\} \notin E$. Recall that $P_{cont}$ is the probability that a user requests one of the recommended videos after watching the current video.
If not, we assume that the user goes to node $n_0$, which represents all other sources of video requests. Therefore, by definition, $P_{i,0} = 1 - P_{cont}$ for all $i > 0$. Motivated by the fact that for VoD services without recommendations, content popularity follows the Zipf distribution (as discussed in Section 2.1), we set $P_{0,j} \propto j^{-\beta}$ for a positive constant $\beta$ called the Zipf parameter. Typical values of $\beta$ for VoD services lie between 0.6 and 2 [1].

To assign transition probabilities to edges between a video and its recommended videos, we use the distance between two videos as a measure of similarity in the content of the two videos. For each $\{i, j\} \in E$ with $i, j > 0$, $P_{i,j} \propto P_{cont} \cdot (d(i, j))^{-\kappa}$, where $d(i, j) = |i - j|$ and $\kappa$ is a positive constant. Since each row of the transition matrix must sum to one, the constant of proportionality is such that the transition probabilities from Video $i$ to its recommended videos sum to $P_{cont}$. We use the $P_{i,j}$'s to determine the order in which the recommended videos are presented to the user. For Video $i$, we assume that the recommended videos are ordered in decreasing order of $P_{i,j}$.

Remark 1. Our model is characterized by five parameters, namely, the total number of videos $n$, the size $m$ of the graph used in the first step of the Barabasi-Albert model (Figure 1), the Zipf parameter $\beta$, the probability $P_{cont}$ that a user requests one of the recommended videos after watching the current video, and $\kappa$.

Remark 2. Another way to assign transition probabilities from a video to its recommended videos is to pick a permutation of the set of recommended videos and assign transition probabilities according to the Zipf law. By construction, this Markov chain will satisfy the property that the mean CTR follows the Zipf distribution and therefore will be consistent with properties observed in Section 2.1. Unlike the model we propose, in this construction, the probability of requesting the $i$-th ranked recommendation is the same across all videos.

3.2 Properties

Our model uses the empirically observed properties that the recommendation graph is small-world, its degree distribution follows the power law, content popularity in the absence of recommendations follows the Zipf distribution, and the chain count is between 1.3 and 2.4. In this section, we verify that our Markovian model satisfies the remaining properties discussed in Section 2.1.

Content Popularity Profile. The popularity of a video is the fraction of total requests for the video. Since the requests are generated by a finite state irreducible Discrete Time Markov Chain (DTMC), this is equal to the steady state probability of requesting the video. We therefore compute the content popularity profile of our model by calculating the stationary distribution of the Markov Chain. Figure 2 illustrates the content popularity profile for a system consisting of 2000 videos as a function of the Zipf parameter $\beta$. Figure 3 shows how the final distribution varies with $P_{cont}$.
We see that, as desired, the content popularity profile follows the Zipf distribution for the popular videos and decreases faster than predicted by the Zipf distribution for the unpopular videos. We thus conclude that the content popularity profile for our model is consistent with the observations in [6].

Click Through Rate. As discussed in Section 2.1, the median Click Through Rate (CTR) follows the Zipf distribution. To verify this for our model, we compute the probability of requesting the $r$-th ranked recommended video for each video. We plot the median of this quantity across all videos as a function of $r$ in Figure 4. We see that the median CTR can be approximated by the Zipf distribution. Our model is therefore consistent with the observations in [29]. Varying $\kappa$ allows us to change the slope of the median CTR.
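The construction of Section 3.1 and the two checks described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation behind Figures 2 to 4: the function names, the complete-graph seed, sampling the $m$ attachment targets without repetition, adding nodes up to and including $n$, the use of power iteration, and the small parameter values ($n = 60$, $m = 4$) are all choices made here for illustration. Popularity is renormalized over the video states only, which is also an assumption.

```python
import random
from statistics import median


def barabasi_albert(n, m, rng):
    """Undirected preferential-attachment graph on nodes 1..n (after Figure 1); needs m >= 2."""
    # Assumed seed: a complete graph on the first m nodes.
    adj = {v: set(range(1, m + 1)) - {v} for v in range(1, m + 1)}
    for v in range(m + 1, n + 1):
        # One pool entry per edge endpoint, so sampling is proportional to current degree.
        pool = [u for u, nbrs in adj.items() for _ in nbrs]
        targets = set()
        while len(targets) < m:  # assumed: m distinct targets per new node
            targets.add(rng.choice(pool))
        adj[v] = set(targets)
        for u in targets:
            adj[u].add(v)
    return adj


def transition_matrix(adj, p_cont, beta, kappa):
    """Row-stochastic matrix over states 0..n, where state 0 is the dummy node."""
    n = len(adj)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    zipf = [j ** -beta for j in range(1, n + 1)]
    z = sum(zipf)
    for j in range(1, n + 1):
        P[0][j] = zipf[j - 1] / z              # P_{0,j} proportional to j^(-beta)
    for i, nbrs in adj.items():
        P[i][0] = 1.0 - p_cont                 # leave the recommendation chain
        w = {j: abs(i - j) ** -kappa for j in nbrs}
        s = sum(w.values())
        for j, wj in w.items():
            P[i][j] = p_cont * wj / s          # P_{i,j} proportional to d(i,j)^(-kappa)
    return P


def stationary(P, iters=300):
    """Approximate the stationary distribution by repeated multiplication pi <- pi P."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(size)) for j in range(size)]
    return pi


def median_ctr(P, adj, max_rank):
    """Median over videos of the probability of requesting the rank-r recommendation."""
    ranked = {i: sorted((P[i][j] for j in nbrs), reverse=True) for i, nbrs in adj.items()}
    return [median(row[r] for row in ranked.values() if len(row) > r)
            for r in range(max_rank)]


rng = random.Random(0)
adj = barabasi_albert(n=60, m=4, rng=rng)
P = transition_matrix(adj, p_cont=0.4, beta=0.8, kappa=0.8)
pi = stationary(P)
# Popularity profile: stationary mass of each video, renormalized over videos, by rank.
popularity = sorted((p / (1.0 - pi[0]) for p in pi[1:]), reverse=True)
ctr = median_ctr(P, adj, max_rank=3)
```

Plotting `popularity` and `ctr` on log-log axes gives the kind of curves discussed above. One sanity check that follows from the construction: since every video returns to the dummy node with probability $1 - P_{cont}$, the stationary mass of the dummy node is $(1 - P_{cont})/(2 - P_{cont})$, i.e., 0.375 for $P_{cont} = 0.4$.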

Fig. 2. Content popularity profile for our model with m = 20, $P_{cont}$ = 0.4, Number of videos (n) = 2000 and κ = 0.8. [Plot: log10(Probability of requesting the video) vs. log10(Rank of video); final distributions for β = 0.6 and β = 0.8.]

Fig. 3. Content popularity profile for our model with m = 20, Zipf parameter (β) = 0.8, Number of videos (n) = 2000 and κ = 0.8. [Plot: log10(Probability of requesting the video) vs. log10(Rank of video); curves for $P_{cont}$ = 0.4 and $P_{cont}$ = 0.6.]

CDF of Click Through Rate. As mentioned in Section 2.1, in [16], the authors compute the Cumulative Distribution Function (CDF) of the Click Through Rate (CTR). To evaluate the CDF, we compute the CTR for the $r$-th ranked video in the recommendation list as follows:

$$\mathrm{CTR}(r) = \sum_{i=1}^{n} \pi(i)\, P_{i,\, r\text{-th ranked recommended video}},$$

where $\pi(i)$ denotes the stationary probability of Video $i$. We plot the CDF of the CTR as a function of the position $r$ in Figure 5. Qualitatively, Figure 5 shows the same trend as observed in Figure 3 in [16].

Fig. 4. Median Click through rate (CTR) as a function of position of video in the recommendation list for Number of videos (n) = 2000, m = 20, Zipf parameter β = 0.8, and $P_{cont}$ = 0.4. [Plot: log10(Click through rate) vs. log10(Position); curves for κ = 0.8 and κ = 0.6.]

Fig. 5. CDF of CTR vs position in recommendation list for Number of videos (n) = 2000, m = 20, Zipf parameter β = 0.8, κ = 0.8, and $P_{cont}$ = 0.4. [Plot: CDF of Click through rate vs. Position.]

4 CDN Setting

We consider a Content Delivery Network (CDN) consisting of a central server which stores the entire catalog of contents offered by the VoD service, assisted by a local cache with limited storage capacity (Figure 6). Content can be fetched from the central server and replicated on the local cache to serve user requests. The motivation behind such a network architecture is to serve most of the user requests via the local cache, thus reducing the load on the network backbone, and therefore reducing the overall bandwidth consumption of the network. In addition, the local cache can serve user requests with a lower start-up delay due to its geographical/network proximity to the users.

4.1 Request Model

We assume that the local cache serves $u$ users concurrently and that the requests from each user are generated i.i.d. across users according to the Markovian process described in Section 3. We assume that the service time of each request is an Exponential random variable with mean 1.

4.2 Cost Model

We divide the cost of serving requests into two parts:

(i) Cost of Bandwidth Usage: Each time a video is fetched from the central server and replicated on the local cache, the CDN pays a fetching cost denoted by $C_{Fetch}$.
(ii) Cost of Delayed Start-up: Each time the requested content is not available in the local cache and has to be fetched from the central server after the request is made, the CDN pays an additional start-up delay cost denoted by $C_{Delay}$. This captures the cost of deterioration in the quality of service provided to the users.

Without loss of generality, we normalize $C_{Fetch} = 1$ and let $C_{Delay} = \gamma\, C_{Fetch}$, where $\gamma$ is the start-up delay penalty. Let $\mathrm{Cost}(t)$ denote the total cost of serving requests that arrive before time $t$, $F(t)$ be the number of fetches from the central server to the local cache made before time $t$, and $D(t)$ be the number of delayed start-ups by time $t$. Then we have that

$$\mathrm{Cost}(t) = F(t) + \gamma\, D(t).$$

The goal is to design content caching policies to minimize the total cost of serving user requests.

Fig. 6. An illustration of a Content Delivery Network (CDN) with a central server, a local cache and two users. [Diagram: users connected to the local cache, which is connected via the Internet to the central server.]

5 Caching Policies

We propose a caching policy which uses the fact that user requests are being generated according to a Markov process to determine which contents to cache. We refer to this policy as the PreFetch policy. The key idea of the PreFetch policy is to pre-fetch the top $r$ recommended videos as soon as a user requests a specific video, thus reducing the chance that the next request from this user will have to face any start-up delay. The policy uses the Least Recently Used (LRU) metric to purge stored content in order to make space to store the fetched content. We use the following definitions in the formal definition of the PreFetch policy:

Definition 1. A video is said to be in use if it is being used to serve an active request. A video is referred to as a tagged video if it is one of the top $r$ (where $r$ is a pre-determined integer $\geq 1$) recommendations for any one of the videos currently in use.

Refer to Figure 7 for a formal definition of the PreFetch policy.

 1: Input: An integer $r \geq 1$.
 2: Initialize: Set of cached videos $C = \emptyset$, set of tagged videos $T = \emptyset$, set of videos in use $U = \emptyset$, set of cached videos currently not in use or tagged $V = C \setminus (T \cup U)$.
 3: On arrival (request for Video $i$) do
 4:   if Video $i \notin C$ then
 5:     if $|C| <$ cache size then
 6:       fetch Video $i$; $C = C \cup \{\text{Video } i\}$
 7:     else if $V \neq \emptyset$ then
 8:       fetch Video $i$; replace the Least Recently Used (LRU) video in $V$ with Video $i$
 9:     else
10:       remove a video $\in T$, chosen uniformly at random, and replace it with Video $i$
11:     end if
12:     Update $C$, $V$, $T$ and $U$.
13:   end if
14:   if top $r$ recommendations of Video $i$ not in cache then
15:     pre-fetch missing recommended videos
16:     for each pre-fetched video do
17:       if $|C| <$ cache size then
18:         add video to the cache, update $C$
19:       else if $V \neq \emptyset$ then
20:         replace LRU video in $V$ with fetched video
21:       else
22:         remove a video $\in T$, chosen uniformly at random, and replace it with fetched video
23:       end if
24:       Update $C$, $V$, $T$ and $U$.
25:     end for
26:   end if

Fig. 7. PreFetch: a caching policy which adapts the content stored on the cache to ensure that the top $r$ recommended videos for the videos currently being viewed are pre-fetched to the cache in order to reduce the chance of start-up delay for the next request.

Remark 3. We assume that the storage capacity of the local cache is large enough to store more videos than the number of users it serves simultaneously.
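The steps of Figure 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the code used for the simulations: the class and attribute names are hypothetical, a request from a user is taken to release that user's previous video (service times are not modeled), videos in use are never evicted, and the random eviction step draws from cached tagged videos that are not in use. The counters `fetches` and `delays` play the roles of $F(t)$ and $D(t)$ from Section 4.2.

```python
import random
from collections import OrderedDict


class PreFetchCache:
    """Sketch of PreFetch: demand fetch plus pre-fetch of the top-r recommendations."""

    def __init__(self, capacity, recs, r, seed=0):
        self.capacity = capacity
        self.recs = recs              # video -> ranked list of recommended videos
        self.r = r
        self.rng = random.Random(seed)
        self.cache = OrderedDict()    # cached videos, least recently used first
        self.watching = {}            # user -> video currently in use
        self.fetches = 0              # F(t): fetches from the central server
        self.delays = 0               # D(t): requests served with a start-up delay

    def _in_use(self):
        return set(self.watching.values())

    def _tagged(self):
        return {v for w in self._in_use() for v in self.recs[w][:self.r]}

    def _admit(self, video):
        """Fetch `video` into the cache, evicting as in Figure 7 when the cache is full."""
        self.fetches += 1
        if len(self.cache) >= self.capacity:
            in_use, tagged = self._in_use(), self._tagged()
            free = [v for v in self.cache if v not in in_use and v not in tagged]
            if free:
                victim = free[0]      # LRU video among those neither in use nor tagged
            else:
                # Only tagged or in-use videos remain; evict a random one not in use.
                victim = self.rng.choice([v for v in self.cache if v not in in_use])
            del self.cache[victim]
        self.cache[video] = None

    def request(self, user, video):
        self.watching[user] = video   # assumed: the user's previous video is released
        if video in self.cache:
            self.cache.move_to_end(video)
        else:
            self.delays += 1          # fetched after the request: delayed start-up
            self._admit(video)
        for rec in self.recs[video][:self.r]:
            if rec not in self.cache:
                self._admit(rec)      # pre-fetch: costs bandwidth but no delay

    def cost(self, gamma):
        return self.fetches + gamma * self.delays


# Hypothetical recommendation lists and a single user, with cache size 3 and r = 1.
recs = {1: [2, 3], 2: [1, 3], 3: [1, 2], 4: [1, 2]}
policy = PreFetchCache(capacity=3, recs=recs, r=1)
for video in [1, 2, 4, 3]:
    policy.request("user-0", video)
print(policy.fetches, policy.delays, list(policy.cache))
```

In this toy run the request for Video 2 is served without delay because it was pre-fetched when Video 1 was requested, which is the effect the policy is designed to produce.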

Remark 4. The PreFetch caching policy can be implemented without the knowledge of the relative popularity of various videos. The only information required to implement the PreFetch policy is the list of recommended videos corresponding to each video in the catalog, which is always known to the VoD service.

As discussed in [13], a possible generalization of the PreFetch policy is to pre-fetch only an initial fraction of each recommended video instead of the entire video, and to fetch the remaining part of the video only after the request is made. If there exists an $\alpha < 1$ such that, while the user watches the first $\alpha$ fraction of the video, the remaining $(1 - \alpha)$ fraction of the video can be fetched, the CDN can provide uninterrupted service to the user without any start-up delay by pre-fetching only the first $\alpha$ fraction of the video.

In the next section, we compare the performance of our PreFetch policy with the popular Least Recently Used (LRU) caching policy. The LRU policy has been traditionally used for caching [25] and has been widely studied for decades. Refer to Figure 8 for a formal definition of the LRU policy.

1: On arrival (request for Video $i$) do
2:   if Video $i$ not present in the cache then
3:     fetch Video $i$; replace the Least Recently Used (LRU) cached video with Video $i$
4:   end if

Fig. 8. Least Recently Used (LRU): a caching policy.

6 Simulation Results

In this section, we compare the performance of the LRU policy and the PreFetch policy. Our goal is to understand if exploiting the time correlation between requests from a user by pre-fetching recommended videos can lead to better performance. In addition, we study how the performance of the two caching policies depends on the request arrival process and on various system parameters like the number of users using a local server ($u$), the size of the cache, and the fraction of video pre-fetched ($\alpha$). Requests arrive according to the request model discussed in Section 3.
We assume that the VoD service has a content catalog consisting of 1000 videos. We use the Barabasi-Albert model (Figure 1) to generate the recommendation graph with m = 20. We fix κ = 0.8 (defined in Section 3) for all the results presented in this section. We assume that the service time of each request is an Exponential random variable with a mean of one time unit. We assume all videos are of unit size. For each set of system parameters, we simulate the system for $10^5$ time units.
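For reference, the LRU baseline of Figure 8 combined with the cost accounting of Section 4.2 can be sketched as below. This is a minimal sketch, assuming unit-size videos as stated above; the function name and the short request trace are hypothetical and are not simulation data from this paper. Under LRU every fetch happens after the request is made, so $F(t) = D(t)$ and each miss costs $1 + \gamma$.

```python
from collections import OrderedDict


def lru_cost(requests, capacity, gamma):
    """Return (fetches, delays, cost) for LRU with Cost = F + gamma * D and C_Fetch = 1."""
    cache = OrderedDict()             # cached videos, least recently used first
    fetches = delays = 0
    for video in requests:
        if video in cache:
            cache.move_to_end(video)  # hit: refresh recency, no cost
            continue
        fetches += 1                  # miss: fetch from the central server ...
        delays += 1                   # ... after the request, so start-up is delayed
        if len(cache) >= capacity:
            cache.popitem(last=False) # evict the least recently used video
        cache[video] = None
    return fetches, delays, fetches + gamma * delays


trace = [1, 2, 1, 3, 4, 1]            # hypothetical request trace
print(lru_cost(trace, capacity=3, gamma=1))
print(lru_cost(trace, capacity=3, gamma=11))
```

Because a pre-fetch costs 1 while a miss costs $1 + \gamma$, a policy that pre-fetches can only beat this baseline when enough of its pre-fetched videos are actually requested, which is the trade-off the following experiments explore.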

6.1 Cost vs. Start-up delay penalty (γ)

Fig. 9. Cost vs Start-up delay penalty (γ) for a system with Number of videos = 1000, m = 20, Zipf parameter (β) = 0.8, Cache size = 200 and 1 User. As γ increases, PreFetch outperforms the LRU policy. [Plot: Cost per request vs. Start-up delay penalty (γ); curves for PreFetch and LRU with $P_{cont}$ = 0.4 and $P_{cont}$ = 0.6.]

Fig. 10. The optimal number of recommendations to pre-fetch (r) vs Start-up delay penalty (γ) for a system with Number of videos = 1000, m = 20, Zipf parameter (β) = 0.8, Cache size = 200 and 1 User. The optimal number of recommendations to pre-fetch (r) increases with start-up delay penalty γ. [Plot: Optimal number of recommendations to pre-fetch vs. Start-up delay penalty (γ); curves for $P_{cont}$ = 0.4 and $P_{cont}$ = 0.6.]

In Figure 9, we compare the performance of the PreFetch policy and the LRU policy as a function of the start-up delay penalty (γ). Recall that $P_{cont}$ is the probability that the next video requested by the user is one of the recommended videos. The PreFetch policy pre-fetches the top $r$ ($\geq 1$) recommendations of a video from the central server to the cache the moment a video is requested, thus ensuring that there is no start-up delay if the user requests one of the top $r$ recommended videos. In addition, the total cost of service is the sum of the cost of bandwidth usage and the cost due to start-up delay. In Figure 9, for each value of the start-up delay penalty (γ), we use the empirically optimized value of $r$ which leads to the lowest cost of service. The optimal value of the number of recommendations to pre-fetch ($r$) increases with the start-up delay penalty (γ), as shown in Figure 10.

We observe that for low values of the start-up delay penalty (γ), LRU outperforms the PreFetch policy. As the start-up delay penalty (γ) increases, PreFetch outperforms the LRU policy. This illustrates the trade-off between bandwidth usage, i.e., the number of pre-fetches, and quality of service, i.e., reducing start-up delay.

6.2 Cost vs. Number of users (u)

Fig. 11. Cost vs Number of users for a system with Number of videos = 1000, Start-up delay penalty (γ) = 1, Zipf parameter (β) = 0.8, $P_{cont}$ = 0.4 and Cache size = 200. LRU outperforms the PreFetch policy. [Plot: Cost per request vs. Number of users; curves for PreFetch and LRU.]

Fig. 12. Cost vs Number of users for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, Zipf parameter (β) = 0.8, $P_{cont}$ = 0.4 and Cache size = 200. The PreFetch policy outperforms the LRU policy. [Plot: Cost per request vs. Number of users; curves for PreFetch and LRU.]

Fig. 13. Hit rate vs Number of users for a system with Number of videos = 1000, Zipf parameter (β) = 0.8, $P_{cont}$ = 0.4 and Cache size = 200. Cache hit rates are significantly improved by using the PreFetch scheme for all values of u and γ. [Plot: Hit rate (%) vs. Number of users; curves for LRU, PreFetch with γ = 1, and PreFetch with γ = 63.]

In Figures 11 and 12, we compare the performance of the two policies, where the value of $r$ used by the PreFetch policy is empirically optimized for each value of $u$ and γ. We see that as the number of users increases from 1 to 5, there is a sharp drop in the cost for both the LRU and the PreFetch policy. Since all the users access videos according to the same Markov process, when there are multiple users accessing the cache, the probability that the popular videos and their top recommendations are present in the cache increases. This reduces the number of cache misses and the number of pre-fetches for the most popular videos, thus reducing the overall cost of service.

As seen in Figure 11, when the start-up delay penalty γ is low, the LRU caching policy outperforms the (optimized) PreFetch policy for all values of $u$. For γ = 63 (Figure 12), the PreFetch policy outperforms the LRU caching policy. Our simulations show that for $\gamma \leq 11$, the optimal number of recommendations ($r$) to cache is 1. The optimal value of $r$ is between 4 and 6 for γ = 63. Figure 13 illustrates that cache hit rates are higher for the PreFetch policy than for the LRU policy for all values of $u$ and γ considered.

6.3 Cost vs. $P_{cont}$

Recall that $P_{cont}$ denotes the probability that the next video is accessed via the recommendation list. We vary the value of $P_{cont}$ between 0.2 and 0.6 (to be consistent with the observations in [16]) and evaluate the performance of the LRU policy and the optimal PreFetch policy for γ = 11 and γ = 63.
In Figure 14, we see that LRU outperforms the (optimized) PreFetch policy for low values of P_cont and that PreFetch outperforms LRU as P_cont increases. Increasing P_cont increases the probability that the next video is accessed via the recommendation list, and therefore the probability that a pre-fetched video is actually requested. We conclude that if the start-up delay penalty is not very high (γ = 11), then for low values of P_cont the excess bandwidth usage due to pre-fetching outweighs the benefits of reducing start-up delay.
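The trade-off described above can be illustrated with a small sketch. It assumes a simplified cost model that is not taken from this section: every fetch from the central server costs one unit of bandwidth, and every request that misses the cache incurs an additional penalty γ. The function name `simulate_cost`, the toy request traces and the toy recommendation lists are hypothetical and only serve the illustration.

```python
from collections import OrderedDict


def simulate_cost(requests, recommendations, cache_size, r, gamma):
    """Return (cost per request, hit rate) for an LRU cache that pre-fetches
    the top-r recommendations of each requested video (r = 0 is plain LRU).

    Assumed cost model: 1 unit per fetch plus gamma per cache miss.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    fetches = 0
    misses = 0

    def touch(video):
        # Mark the video as most recently used and evict if the cache overflows.
        cache[video] = True
        cache.move_to_end(video)
        if len(cache) > cache_size:
            cache.popitem(last=False)

    for video in requests:
        if video not in cache:
            misses += 1   # the user experiences start-up delay
            fetches += 1  # the video is fetched on demand
        touch(video)
        # Pre-fetch the top-r recommendations that are not cached yet.
        for rec in recommendations.get(video, [])[:r]:
            if rec not in cache:
                fetches += 1
                touch(rec)

    n = len(requests)
    return (fetches + gamma * misses) / n, 1 - misses / n


# A user who always follows the top recommendation (high P_cont).
chain_recs = {0: [1], 1: [2], 2: [3], 3: [4]}
print(simulate_cost([0, 1, 2, 3], chain_recs, cache_size=4, r=1, gamma=1))
print(simulate_cost([0, 1, 2, 3], chain_recs, cache_size=4, r=0, gamma=1))

# A user who never follows a recommendation (low P_cont).
unused_recs = {0: [1], 2: [3], 4: [5], 6: [7]}
print(simulate_cost([0, 2, 4, 6], unused_recs, cache_size=4, r=1, gamma=1))
print(simulate_cost([0, 2, 4, 6], unused_recs, cache_size=4, r=0, gamma=1))
```

In the first trace every pre-fetch is used, so pre-fetching lowers the cost under this assumed model. In the second trace no pre-fetch is used, so pre-fetching only adds bandwidth cost.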

Fig. 14. Cost vs P_cont for a system with Number of videos = 1000, Start-up delay penalty (γ) = 11, Zipf parameter (β) = 0.8 and Cache size = 200. For low values of P_cont, the excess bandwidth usage due to pre-fetching outweighs the benefits of reducing start-up delay, and for higher values of P_cont, pre-fetching leads to reduced cost of service.

Fig. 15. Cost vs P_cont for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, Zipf parameter (β) = 0.8 and Cache size = 200. The PreFetch policy outperforms LRU for all values of P_cont considered.

Figure 15 illustrates that the PreFetch policy outperforms LRU for γ = 63 for all values of P_cont considered. In addition, the performance of the PreFetch policy relative to the LRU policy improves as P_cont increases. In Figure 16, we plot the optimal value of r as a function of P_cont. We conclude that with increasing P_cont, it is beneficial to pre-fetch more videos from the recommendation list. Figures 17 and 18, corresponding to γ = 11 and γ = 63 respectively, illustrate that the cache hit rate is higher for the PreFetch policy than for the LRU policy.

Fig. 16. Optimal number of recommendations to pre-fetch (r) vs P_cont for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, Zipf parameter (β) = 0.8 and Cache size = 200. The optimal number of recommendations to pre-fetch increases with P_cont.

Fig. 17. Hit rate vs P_cont for a system with Number of videos = 1000, Start-up delay penalty (γ) = 11, Zipf parameter (β) = 0.8 and Cache size = 200. The PreFetch policy has a higher hit rate, and the difference between the hit rates of the PreFetch policy and the LRU policy increases with increasing P_cont.

Fig. 18. Hit rate vs P_cont for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, Zipf parameter (β) = 0.8 and Cache size = 200. The PreFetch policy has a higher hit rate, and the difference between the hit rates of the PreFetch policy and the LRU policy increases with increasing P_cont.

6.4 Cost v/s Zipf parameter (β)

Fig. 19. Cost vs Zipf parameter (β) for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, P_cont = 0.4 and Cache size = 200. The performance of both the LRU policy and the PreFetch policy improves with increasing β.

As discussed in the context of Figure 2, increasing the value of the Zipf parameter β makes the overall content popularity more lopsided, i.e., a smaller fraction of the videos accounts for the same fraction of the total requests. Therefore, the performance of both the LRU policy and the PreFetch policy improves with increasing β (Figure 19), as the small pool of popular videos is available in the local cache more often under both policies. We focus on β values between 0.6 and 2 since typical values of β lie in that range for most VoD services [4, 10, 12, 18, 19, 24, 28]. For start-up delay penalty γ > 11, the PreFetch policy outperforms the LRU policy for all β between 0.6 and 2. The optimal r for γ = 63 falls between 4 and 6 for these values of β. Figure 20 illustrates that cache hit rates increase with increasing β.
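The effect of β on how lopsided the popularity profile is can be checked numerically. The following is a minimal sketch, assuming the standard Zipf form in which the video of rank k is requested with probability proportional to k^(−β); the helper names `zipf_pmf` and `videos_needed` are hypothetical.

```python
def zipf_pmf(n, beta):
    """Request probabilities for ranks 1..n under a Zipf law with parameter beta."""
    weights = [rank ** -beta for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def videos_needed(n, beta, fraction):
    """Smallest number of top-ranked videos whose combined share of requests
    reaches the given fraction."""
    cumulative = 0.0
    for count, p in enumerate(zipf_pmf(n, beta), start=1):
        cumulative += p
        if cumulative >= fraction:
            return count
    return n


# How many of 1000 videos account for half of all requests, for several beta?
for beta in (0.6, 0.8, 1.0, 1.5, 2.0):
    print(beta, videos_needed(1000, beta, 0.5))
```

The count shrinks as β grows, which is why a cache of fixed size covers a larger share of requests for larger β.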

Fig. 20. Hit rate vs Zipf parameter (β) for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, P_cont = 0.4 and Cache size = 200. The hit rates for both the LRU policy and the PreFetch policy improve with increasing β.

6.5 Cost v/s Cache size

Fig. 21. Cost vs Cache size for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, P_cont = 0.4 and Zipf parameter (β) = 0.8. The performance of both policies improves with increasing cache size.

We expect the performance of all policies to improve as the cache size increases. In Figures 21 and 22, we see that the PreFetch policy performs considerably better than the LRU policy for all cache sizes considered.

6.6 Cost v/s Fraction to prefetch (α)

In the simulation results discussed so far, we pre-fetch complete videos. We now explore the possibility of pre-fetching only a fraction of the video and fetching the remaining part only after the request is made. If there exists an α < 1 such that while the user watches the first α fraction of the video, the remaining (1 − α) fraction of the video can be fetched, the CDN can provide

uninterrupted service to the user without any start-up delay by pre-fetching only the first α fraction of the video. Pre-fetching only a fraction of each video reduces the bandwidth usage, thus reducing the overall cost of service, as shown in Figures 23 and 24. Since the bandwidth usage per pre-fetch is reduced, the CDN can pre-fetch more recommendations at the same cost (Figure 25), which leads to improved cache hit rates (Figure 26).

Fig. 22. Hit rate vs Cache size for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, P_cont = 0.4 and Zipf parameter (β) = 0.8. The hit rates for both policies improve with increasing cache size.

Fig. 23. Cost vs Fraction to pre-fetch (α) for a system with Number of videos = 1000, Start-up delay penalty (γ) = 11, P_cont = 0.4, Zipf parameter (β) = 0.8 and Cache size = 200. The cost of service increases with α as larger fractions of videos need to be pre-fetched to avoid start-up delay.

7 Alternative Model

In Section 3, we used the Barabasi-Albert model to generate the recommendation graph G(V, E). The Barabasi-Albert model generates an undirected graph. In the model described in Section 3, we replace each edge of this graph by two directed edges to get the recommendation relationships. As a result, if v_i recommends

v_j, then v_j also recommends v_i. However, the recommendation links in a VoD service may not always be bidirectional. In this section, we explore a directed graph model that can be used to capture this property.

Fig. 24. Cost vs Fraction to pre-fetch (α) for a system with Number of videos = 1000, Start-up delay penalty (γ) = 63, P_cont = 0.4, Zipf parameter (β) = 0.8 and Cache size = 200. The cost of service increases with α as larger fractions of videos need to be pre-fetched to avoid start-up delay.

Fig. 25. Optimal number of recommendations to pre-fetch (r) vs Fraction to pre-fetch (α) for a system with Number of videos = 1000, P_cont = 0.4, Zipf parameter (β) = 0.8 and Cache size = 200. The optimal number of recommendations to pre-fetch decreases with α as larger fractions of videos need to be pre-fetched to avoid start-up delay.

7.1 Model Definition

Motivated by the fact that the degree distribution of the recommendation graph of VoD services follows the power law [23], instead of using the Barabasi-Albert model, we use a variation of the directed random graph model proposed in [3], for which the in-degree distribution follows the power law. The random graph is generated iteratively by adding one node at a time. We start with an initial graph of m nodes. Each new node is connected to the graph via m edges. When a new node is introduced, edges are added to the graph sequentially until there are m edges involving the new node, as follows:

- With probability p_out, a link from the new node to an existing node v is created. Node v is chosen randomly in proportion to the in-degree of v.
- With probability p_in, a link from an existing node v to the new node is created. Node v is chosen randomly in proportion to the out-degree of v.
- Otherwise, a link between two existing nodes u and v is created. The originating node (node u) and the target node (node v) are chosen randomly in proportion to the in-degree of u and the out-degree of v, respectively.

A formal description of the algorithm is given in Figure 27. Once the recommendation graph is generated, we assign transition probabilities to the various edges as discussed in Section 3. This completes the definition of our directed graph model.

Fig. 26. Hit rate vs Fraction to pre-fetch (α) for a system with Number of videos = 1000, P_cont = 0.4, Zipf parameter (β) = 0.8 and Cache size = 200. Hit rates decrease with increasing α because the number of recommendations to pre-fetch decreases as α increases.

7.2 Properties

Similar to the model proposed in Section 3, the alternative model also uses the empirically observed properties that the content popularity in the absence of recommendations follows Zipf's distribution and that the chain count is between 1.3 and 2.4 to assign transition probabilities. Next, we verify whether this model, which uses a directed random graph to generate the recommendation graph, satisfies the rest of the empirical properties observed in Section 2.1.

Degree distribution. The graph G(V, E) generated as described in Figure 27 has a power-law in-degree distribution, and the average number of out-links from a node is more than m · p_out. Figure 28 illustrates the in-degree distribution for a graph of 10,000 nodes.
The slope of the curve can be changed by changing the values of p_in and p_out.

Small World nature. We evaluate the clustering coefficient and average path length for this graph with 2000 nodes, p_in = 0.4 and p_out = 0.4. It is observed

that this graph has a larger clustering coefficient and a shorter average path length than the Barabasi-Albert graph of the same size. We thus conclude that the construction in Section 7.1 generates a small-world graph.

Content popularity profile. We generate the content popularity profile of our model by calculating the stationary distribution of the Markov chain (as in Section 3). Figure 29 shows the content popularity profile in our model. We see that the content popularity profile follows the Zipf distribution for the popular videos and decreases faster than predicted by the Zipf distribution for the unpopular videos. Therefore, we conclude that the content popularity profile for the alternative model is consistent with the observations in Section 2.1.

Click through rate. As in Section 3, we plot the median Click Through Rate (CTR) for the directed graph model in Figure 30. We see that the median CTR can be approximated by the Zipf distribution. Our model is therefore consistent with the observations in Section 2.1.

1: Initialize: Generate a connected graph of m nodes (v_1, v_2, ..., v_m). Let v = m + 1.
2: Introduce a new node n_v in the graph. Until the node gets connected with m existing nodes in the graph, either via in-links or out-links, new links are sequentially added as follows:
   - With probability p_out, a link from n_v to n_i for i < v is added. The probability p_i of choosing node n_i is given by $p_i = K_i / \sum_j K_j$, where K_i is the current in-degree of node n_i.
   - With probability p_in, a link from n_i to n_v is added. The probability q_i of choosing node n_i is given by $q_i = L_i / \sum_j L_j$, where L_i is the current out-degree of node n_i.
   - With probability 1 − p_out − p_in, a link from n_i to n_j is added between two existing nodes. The probability p_i of choosing node n_i and the probability q_j of choosing node n_j are given by $p_i = K_i / \sum_j K_j$ and $q_j = L_j / \sum_k L_k$.
3: v = v + 1. If v < n, goto Step 2.

Fig. 27. A directed random graph model which generates a random small-world directed graph with a degree distribution following the power law.
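The steps of Figure 27 can be sketched in code as follows. This is a minimal illustration and not the authors' implementation. It makes several assumptions that the text does not specify: the initial connected graph is a directed ring over the first m nodes, self-loops and duplicate links are rejected and redrawn, and m ≥ 2. The function name `directed_recommendation_graph` is hypothetical.

```python
import random


def directed_recommendation_graph(n, m, p_out, p_in, seed=None):
    """Generate a set of directed edges (src, dst) over nodes 0..n-1,
    following the three link-creation cases of Figure 27."""
    if m < 2 or n < m:
        raise ValueError("need n >= m >= 2")
    if not (0 < p_out + p_in <= 1):
        raise ValueError("need 0 < p_out + p_in <= 1")
    rng = random.Random(seed)
    edges = set()
    in_deg = [0] * n   # K_i in Figure 27
    out_deg = [0] * n  # L_i in Figure 27

    def add_edge(src, dst):
        # Assumption: self-loops and duplicate links are rejected.
        if src == dst or (src, dst) in edges:
            return False
        edges.add((src, dst))
        out_deg[src] += 1
        in_deg[dst] += 1
        return True

    # Assumed initial graph: a directed ring, so every initial node has
    # non-zero in-degree and out-degree.
    for i in range(m):
        add_edge(i, (i + 1) % m)

    def pick(weights, count):
        # Choose one of the existing nodes 0..count-1 in proportion to its weight.
        return rng.choices(range(count), weights=weights[:count])[0]

    for v in range(m, n):
        neighbours = set()  # distinct existing nodes linked to the new node
        while len(neighbours) < m:
            x = rng.random()
            if x < p_out:
                i = pick(in_deg, v)   # target chosen by in-degree
                if add_edge(v, i):
                    neighbours.add(i)
            elif x < p_out + p_in:
                i = pick(out_deg, v)  # source chosen by out-degree
                if add_edge(i, v):
                    neighbours.add(i)
            else:
                # Link between two existing nodes, weighted as in Figure 27.
                add_edge(pick(in_deg, v), pick(out_deg, v))
    return edges


graph = directed_recommendation_graph(n=200, m=5, p_out=0.4, p_in=0.4, seed=0)
print(len(graph), "directed links")
```

Because a link from n_i to n_v can be added after a link from n_v to n_i already exists, the construction produces a mix of one-directional and bi-directional recommendation links.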

Fig. 28. In-degree distribution for the directed graph model of size 10,000 nodes with p_in = 0.4 and p_out = 0.4.

Fig. 29. Content popularity profile for the model with m = 40, p_out = 0.4, p_in = 0.4, Zipf parameter (β) = 0.8, number of videos (n) = 2000 and κ = 0.8.

Fig. 30. CDF of CTR vs position in recommendation list for Number of videos (n) = 2000, m = 40, p_in = 0.4, p_out = 0.4, Zipf parameter β = 0.8, κ = 0.8, and P_cont = 0.4.

Fraction of bi-directional links. From a small experiment on a sub-graph of the YouTube recommendation graph, we observed that about 30% of the recommendation links are bi-directional. Figure 31 shows the relationship between the model parameter p_in and the fraction of bi-directional links in the graph. We conclude that the value of p_in can be tuned to obtain the desired fraction of bi-directional links. This flexibility does not exist in the model proposed in Section 3 and is therefore a key point of difference between that model and the alternative model discussed in this section. A drawback of small values of p_in is that they distort the final popularity profile away from its scale-free nature, as shown in Figure 32.

Fig. 31. Fraction of bi-directional links as a function of model parameter p_in for a graph of size 2000 nodes and p_out = p_in.

Fig. 32. Content popularity for the model with m = 40, p_out = 0.2, p_in = 0.2, Zipf parameter (β) = 0.8, number of videos (n) = 2000 and κ = 0.8. The content popularity gets distorted from the Zipf law for small values of p_in.
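The fraction of bi-directional links discussed above can be measured on any directed edge list. The following is a minimal sketch, assuming the fraction is defined as the share of directed links whose reverse link is also present and that the graph has no self-loops. The text does not state the exact definition used for Figure 31, and the function name `bidirectional_fraction` is hypothetical.

```python
def bidirectional_fraction(edges):
    """Share of directed links (src, dst) for which (dst, src) also exists.

    Assumes the edge list contains no self-loops.
    """
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    reciprocated = sum(1 for (src, dst) in edge_set if (dst, src) in edge_set)
    return reciprocated / len(edge_set)


# Toy recommendation graph: 0 <-> 1 is bi-directional, 1 -> 2 and 2 -> 3 are not.
toy_links = [(0, 1), (1, 0), (1, 2), (2, 3)]
print(bidirectional_fraction(toy_links))
```

Sweeping p_in in a graph generator and evaluating this quantity for each generated graph would give a curve of the kind described for Figure 31.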

7.3 Caching Simulations

In this section, we study the performance of the LRU and PreFetch policies when requests arrive according to the model described in Section 7.1. We use the same CDN and simulation setting as described in Sections 4 and 6, respectively. In Figure 33, we compare the performance of the PreFetch policy and the LRU policy as a function of the start-up delay penalty (γ), using the empirically optimized value of r (the number of recommendations to pre-fetch), i.e., the value that leads to the lowest cost of service. The optimal number of recommendations to pre-fetch (r) increases with the start-up delay penalty (γ), as shown in Figure 34. Note that these plots exhibit qualitatively similar behavior to Figures 9 and 10.

Fig. 33. Cost vs Start-up delay penalty (γ) for a system with number of videos = 1000, m = 40, Zipf parameter (β) = 0.8, cache size = 200, P_cont = 0.4, p_in = 0.4, p_out = 0.4 and 1 user. As γ increases, PreFetch outperforms the LRU policy.

The dependence on other parameters, such as the number of users using a local server (u), the cache size and the fraction of each video pre-fetched (α), is qualitatively similar to the results in Section 6.

8 Conclusions

In this work, we propose a Markovian model for request arrivals in VoD services with recommendation engines, which captures the time-correlation in user requests and is consistent with empirically observed properties. Low start-up delay is a key QoS requirement of users of VoD services. In addition, minimizing the bandwidth consumption of the network is key to reducing the cost of service. Given the trade-off between these two goals, we show that the time-correlation in user requests can be used to design caching policies which outperform popular policies like LRU that do not exploit this time-correlation.
More specifically, we show that our caching policy PreFetch which employs recommendation based pre-fetching outperforms the LRU policy in terms of the
