Analysis of Web Caching Architectures: Hierarchical and Distributed Caching


Pablo Rodriguez, Christian Spanner, Ernst W. Biersack

Abstract — In this paper we compare the performance of different caching architectures. We derive analytical models to evaluate hierarchical and distributed caching in terms of the client's perceived latency, bandwidth usage, load on the caches, and disk space. Hierarchical caching has short connection times and low bandwidth usage; however, it requires high-performance caches at the top levels of the hierarchy. On the other hand, distributed caching has short transmission times and does not require intermediate cache levels. With distributed caching the bandwidth usage in the network is higher; however, the traffic is better distributed, with more bandwidth used in the less congested lower network levels. Additionally, we study a hybrid scheme where a certain number of caches cooperate at every level of the caching hierarchy using distributed caching. We find that a well-configured hybrid scheme can combine the advantages of both hierarchical and distributed caching, reducing the retrieval latency, the bandwidth usage, and the load on the caches. Depending on the hybrid caching architecture, the current load on the caches/network, and the document size, there is an optimal number of caches that should cooperate at each network level to minimize the overall retrieval latency. Based on analytical and experimental results, we propose small modifications of the existing cache-sharing schemes to better determine the degree of cooperation at every caching level.

I. INTRODUCTION

The World Wide Web provides simple access to a wide range of information and services. As a result, the Web has become the most successful application on the Internet. However, the exponential growth in demand experienced during the last years has not been followed by the necessary upgrades of the network and servers; therefore, clients experience frustrating delays when accessing Web pages.
In order to keep the Web attractive, the experienced latencies must be kept under a tolerable limit. One way to reduce the experienced latencies, the server's load, and the network congestion is to store multiple copies of the same Web documents in geographically dispersed Web caches.

Pablo Rodriguez, Christian Spanner, and Ernst W. Biersack are with Institut EURECOM, Sophia Antipolis, France ({rodrigue, spanner, erbi}@eurecom.fr). Pablo Rodriguez is supported by the European Commission in the form of a TMR (Training and Mobility for Researchers) fellowship. EURECOM's research is partially supported by its industrial partners: Ascom, Cegetel, France Telecom, Hitachi, IBM France, Motorola, Swisscom, Texas Instruments, and Thomson CSF. Preliminary results of this work were presented at the 4th International Web Caching Workshop, San Diego, March 1999. Submitted to Transactions on Networking, May 1999.

As most of the Web content is rather static, the idea of Web caching at the application layer is not new. The first WWW browsers, e.g., Mosaic [7], were able to cache Web objects for later reference, thus reducing the bandwidth used for Web traffic and the latency experienced by the users. Web caching rapidly evolved from a local cache used by a single browser to a shared cache serving all the clients of a certain institution. Unfortunately, since the number of clients connected to a single cache can be rather small and the amount of information available on the Web is rapidly increasing, the cache's performance can be quite modest. The hit rate, i.e., the percentage of requests that can be served from previously cached document copies, at an institutional cache only ranges between 30% and 50% [8]. The hit rate of a Web cache can be increased significantly by sharing the interests of a large community [6]; the more people accessing the cache, the higher the probability that a given document has already been requested and is present in the cache.
The most common approach to grouping a large client community is setting up a caching hierarchy with institutional, regional, and national caches [6]. Recently, several researchers have proposed to let institutional caches cooperate directly in a distributed caching scheme with no intermediate cache levels [5][23]. With hierarchical caching, caches are placed at different network levels. At the bottom level of the hierarchy there are the client caches. When a request is not satisfied by the client cache, it is redirected to the institutional cache. If the document is not found at the institutional level, the request is then forwarded to the regional cache, which in turn forwards unsatisfied requests to the national cache. If the document is not found at any cache level, the national cache contacts the origin server directly. When the document is found, either at a cache or at the origin server, it travels down the hierarchy, leaving a copy at each of the intermediate caches. Further requests for the same document travel up the caching hierarchy until the document is hit at some cache level. Hierarchical caching is already a fact of life in much of the Internet [2]. However, there are several problems associated with a caching hierarchy: i) every hierarchy level may introduce additional

delays [23][6], ii) higher-level caches may become bottlenecks and have long queuing delays, and iii) multiple document copies are stored at different cache levels. With distributed caching no intermediate caches are set up, and there are only institutional caches that serve each other's misses. In order to decide from which institutional cache to retrieve a missed document, institutional caches keep metadata information about the content of every other cooperating institutional cache. To make the distribution of the metadata information more efficient and scalable, a hierarchical distribution can be used [5][23]. However, the hierarchy is only used to distribute information about the location of the documents and not to store document copies.

In this paper we develop analytical models to study the performance of a pure hierarchical scheme and compare it with the performance of a pure distributed scheme. Then, we consider a hybrid scheme where caches cooperate at every level of a caching hierarchy using distributed caching. We contrast hierarchical, distributed, and hybrid caching according to the latency incurred to retrieve a document, the bandwidth usage inside every network provider, the load on the caches, and the disk space requirements. We find that hierarchical caching has lower connection times than distributed caching; thus, placing redundant copies at intermediate cache levels reduces the connection time. We also find that distributed caching has lower transmission times than hierarchical caching since most of the traffic flows through the less congested low network levels. In addition to the latency analysis, we study the bandwidth used by hierarchical and distributed caching. We find that distributed caching has higher bandwidth usage than hierarchical caching; however, the network traffic is better balanced, using more bandwidth in the low network levels. We derive models to calculate the storage capacity needed to store all documents requested at a certain cache level (i.e., infinite cache capacity) depending on the document request distribution and the number of incoming requests.
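The hierarchical lookup and copy-on-the-way-down behavior described in the introduction can be sketched as follows. This is a minimal illustration with hypothetical cache classes, not the paper's model: each cache forwards a miss to its parent and stores a copy of the document as the response travels back down.

```python
# Sketch of hierarchical caching: a miss is forwarded to the parent cache,
# and the document leaves a copy at every intermediate cache on the way down.

class Cache:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # next level up; None for the national cache
        self.store = {}               # url -> document

    def get(self, url, origin):
        if url in self.store:
            return self.store[url], self.name          # hit at this level
        if self.parent is not None:
            doc, hit_level = self.parent.get(url, origin)
        else:
            doc, hit_level = origin[url], "origin"      # national miss -> origin server
        self.store[url] = doc                           # leave a copy on the way down
        return doc, hit_level

origin = {"/a": "payload-a"}
national = Cache("national")
regional = Cache("regional", parent=national)
institutional = Cache("institutional", parent=regional)

doc, level = institutional.get("/a", origin)    # first request: served by the origin
doc2, level2 = institutional.get("/a", origin)  # repeat request: institutional hit
```

Note how the first request populates all three cache levels, which is exactly the source of problem iii) above: the same document ends up stored three times.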
We find that the capacity needed to store all accessed documents at the top-level cache of a caching hierarchy is several tens of GBytes, while the capacity needed at an institutional cache is several GBytes (Section IV-F). With distributed caching there are only institutional caches and no additional disk space is required at intermediate network levels. However, a large-scale deployment of distributed caching encounters several problems (e.g., high connection times, higher bandwidth usage, administrative issues). On a smaller scale, where close institutional caches are interconnected through fast links and bandwidth is abundant, distributed caching performs very well with no additional intermediate cache levels. Our analysis of the hybrid scheme shows that the latency experienced by the clients varies depending on the number of caches that cooperate at every network level. Based on analytical results we determine the optimal number of caches that should cooperate at every network level to minimize the latency experienced by the clients. We find that a hybrid scheme with the optimal number of cooperating caches at every level can combine the advantages of hierarchical and distributed caching, reducing the latency, the bandwidth usage, and the load on the caches. We then perform experiments to better understand the parameters that should determine the degree of cooperation between caches at every level. We study different schemes to select one cache among all the caches that contain a document copy. We find that a round-trip-time (RTT) based selection often provides an incorrect estimate of the fastest cache. Web caching happens at the application level, and thus a network-level measurement, i.e., the RTT, is not enough to estimate the actual retrieval latency from a cache. Distant caches with high RTT that are lightly loaded may offer lower retrieval latencies than closer caches with small RTT that are highly congested.
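The RTT-versus-application-level-rate argument above can be made concrete with a toy example. The candidate caches and their measurements below are entirely hypothetical; the point is only that total retrieval latency combines RTT and transfer time, so picking by RTT alone can select a badly loaded nearby cache.

```python
# Two candidate copy holders: a nearby but congested cache and a distant
# but lightly loaded one. Numbers are illustrative.
candidates = [
    {"name": "near-busy", "rtt_ms": 5,  "rate_kbps": 40},   # small RTT, low rate
    {"name": "far-idle",  "rtt_ms": 60, "rate_kbps": 800},  # large RTT, high rate
]
doc_kb = 100   # hypothetical document size

def latency_ms(c):
    # retrieval latency = RTT + transfer time of the document
    return c["rtt_ms"] + doc_kb * 8 / c["rate_kbps"] * 1000

by_rtt = min(candidates, key=lambda c: c["rtt_ms"])      # network-level selection
by_rate = max(candidates, key=lambda c: c["rate_kbps"])  # application-level selection
```

For this document size the distant, lightly loaded cache wins by a wide margin, which mirrors the paper's observation that an application-level rate estimate is the better selection criterion for all but the smallest documents.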
Thus, the degree of cooperation between caches at every level should be determined dynamically depending on the hybrid caching architecture, the document's size, and the current cache/network load. Based on experimental results we find that a cache-sharing scheme that selects the fastest cache based on the application-level rate of the caches can achieve latency savings of up to 8% compared to a selection based on the RTT. The rest of the paper is organized as follows. In Section I-A we discuss previous work and different approaches to hierarchical and distributed caching. In Section II we describe our model for analyzing hierarchical and distributed caching. In Section III we provide a latency analysis of hierarchical and distributed caching. In Section IV we present a numerical comparison of both caching architectures, also considering bandwidth, disk space, and cache load. In Section V we analyze the hybrid scheme. In Section VI we propose small variations of the current cache-sharing schemes to dynamically determine the number of cooperating caches. In Section VII we summarize our findings and conclude the paper.

A. Related Work

Hierarchical Web caching was first proposed in the Harvest project [6] to share the interests of a large community of clients. In the context of hierarchical caching, the Harvest group also designed the Internet Cache Protocol (ICP) [26], which supports discovery and retrieval of documents from neighboring caches as well as parent caches. Another approach to distributed caching is the Cache Array Routing Protocol (CARP) [24], which divides the URL space among an array of loosely coupled caches and lets each cache store only the documents whose URLs are hashed to it. In [2], Karger et al. propose an approach similar to CARP which uses DNS servers to resolve the URL space and allows popular content to be replicated in several caches. Povey and Harrison also proposed a distributed Internet cache [5]. In their scheme, upper-level caches are replaced by directory servers which contain location hints about the documents kept at every cache. A metadata hierarchy is used to make the distribution of these location hints more efficient and scalable. Tewari et al. proposed a similar approach to implement a fully distributed Internet cache where location hints are replicated locally at the institutional caches [23]. In the central directory approach (CRISP) [], a central mapping service ties together a certain number of caches. In Summary Cache [9], Cache Digest [9], and the Relais project [4], caches exchange messages indicating their content and keep local directories to facilitate finding documents in other caches. Concerning a hybrid scheme, ICP allows for cache cooperation at every level of a caching hierarchy; the document is fetched from the parent/neighbor cache holding a copy that has the lowest RTT [26]. Rabinovich et al. [6] proposed limiting the cooperation between neighbor caches to avoid obtaining documents from distant or slower caches when they could have been retrieved directly from the origin server at a lower cost.

II. THE MODEL

A. Network Model

As shown in Figure 1, the Internet connecting the server and the receivers can be modeled as a hierarchy of ISPs, each ISP with its own autonomous administration.

Fig. 1. Network topology: clients attached to institutional networks, which connect through regional and national networks and an international path to the origin servers.
We make the reasonable assumption that the Internet hierarchy consists of three tiers of ISPs: institutional networks, regional networks, and national backbones. All of the clients are connected to the institutional networks; the institutional networks are connected to the regional networks; the regional networks are connected to the national networks. The national networks are also interconnected, sometimes by transoceanic links. We focus on a model with two national networks, one containing all of the clients and the other containing the origin servers.

Fig. 2. The tree model and cache placement: national, regional, and institutional caches attached through access links, with the origin server reached over the international path.

We model the underlying network topology as a full O-ary tree, as shown in Figure 2. Let O be the nodal outdegree of the tree. Let H be the number of network links between the root node of a national network and the root node of a regional network; H is also the number of links between the root node of a regional network and the root node of an institutional network. Let z be the number of links between the origin server and the root node of the national network (i.e., the international path). Let l be the level of the tree, 0 ≤ l ≤ 2H + z, where l = 0 is the level of the institutional caches and l = 2H + z is the origin server. We assume that bandwidth is homogeneous within each ISP, i.e., each link within an ISP has the same transmission rate. Let C_I, C_R, and C_N be the transmission rates of the links at the institutional, regional, and national networks, and let C be the bottleneck rate on the international path.

B. Document Model

Denote by N the total number of documents in the WWW and by S the size of a certain document. We assume that documents change periodically every update period Δ; thus, documents are removed from the caches every Δ seconds.
Requests for document i, 1 ≤ i ≤ N, at an institutional cache are Poisson with average request rate λ_{I,i}. We consider that each document is requested independently of other documents, so we neglect any source of correlation between requests for different documents. Let λ_I be the request rate from an institutional cache for all N documents, λ_I = Σ_{i=1}^{N} λ_{I,i}. The popularity of the documents follows a Zipf distribution [4][27]; that is, if we rank all N documents in order of their popularity, the i-th most popular document has a request rate λ_{I,i} given by

λ_{I,i} = λ_I · Ω / i^α,

where α is a constant that determines how skewed the Zipf distribution is, and Ω is given by

Ω = ( Σ_{i=1}^{N} 1/i^α )^{-1}.

At the regional and national caches, requests for very popular documents are filtered by the lower-level caches. To compare hierarchical and distributed caching we consider homogeneous client communities. Thus, requests for document i are uniformly distributed among all O^{2H} institutional caches, resulting in a total request rate λ_{tot,i} = λ_{I,i} · O^{2H} for document i, which is also Poisson. If we considered heterogeneous client communities, the hit rates for both hierarchical and distributed caching would be reduced, since the geographical locality of reference is smaller (see Section IV-B). For some important parameters for which we want to obtain absolute numbers, i.e., hit rate and disk space, we also consider heterogeneous client communities (Sections IV-B, IV-F).

C. Hierarchical Caching

Caches are usually placed at the access points between two different networks to reduce the cost of traversing a new network. As shown in Figure 2, we make this assumption for all of the network levels. In one country there is one national network with one national cache, there are O^H regional networks, each with one regional cache, and there are O^{2H} local networks, each with one institutional cache. Caches are placed at height 0 of the tree (level 1 in the cache hierarchy), height H of the tree (level 2 in the cache hierarchy), and height 2H of the tree (level 3 in the hierarchy). Caches are connected to their ISPs via access links. We assume that the capacity of the access link at every level is equal to the network link capacity at that level, i.e., C_I, C_R, C_N, and C for the respective levels.
The hit rates for documents at the institutional, regional, and national caches are given by hit_I, hit_R, and hit_N.

D. Distributed Caching

In the distributed caching scheme, caches are placed only at the institutional level of Figure 2 and no intermediate copies are stored in the network. Institutional caches keep a copy of every requested document. To share document copies among institutional caches, the intermediate network caches are replaced by a metadata hierarchy which contains location hints about the content of the institutional caches [5]. To avoid lookup latencies at the metadata hierarchy, location hints are replicated locally at every institutional cache [23]. We assume that location hints are instantaneously updated at every institutional cache.

III. LATENCY ANALYSIS

In this section we model the expected latency to obtain a document in a caching hierarchy and in a distributed caching scheme. We use an analysis similar to the one presented in [7]. The total latency T to fetch a document can be divided into two parts, the connection time T_c and the transmission time T_t. The connection time T_c is the time from the instant the document is requested by the client until the first data byte is received. The transmission time T_t is the time needed to transmit the document. Thus, the average total latency is given by

E[T] = E[T_c] + E[T_t].

A. Connection Time

The connection time depends on the number of network links from the client to the cache containing the desired document copy. Let L be the number of links that a request travels to hit a document in the caching hierarchy. We assume that the operating system in the cache gives priority to establishing TCP connections. Let d denote the per-hop propagation delay. The connection time in a caching hierarchy, T_c^h, is given by

E[T_c^h] = 4d · Σ_{l ∈ {0, H, 2H, 2H+z}} P(L = l) · (l + 1),

where the 4d term is due to the three-way handshake of a TCP connection, which increases the number of links traversed before any data packet is sent.
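The distributed scheme with locally replicated location hints, as described in Section II-D, can be sketched as follows. This is an illustration under simplifying assumptions (instantaneous hint updates, as the model assumes; class names and the shared in-memory directory are hypothetical stand-ins for the metadata hierarchy).

```python
# Sketch of distributed caching: every institutional cache consults a
# replicated directory of location hints (url -> holders); a miss is fetched
# from a sibling cache if one holds a copy, otherwise from the origin server.

class DistCache:
    directory = {}        # url -> set of cache names; stands in for replicated hints
    peers = {}            # name -> DistCache instance

    def __init__(self, name):
        self.name = name
        self.store = {}
        DistCache.peers[name] = self

    def get(self, url, origin):
        if url in self.store:
            return "local"                               # local hit
        holders = DistCache.directory.get(url)
        if holders:                                      # hint hit: fetch from a sibling
            peer = DistCache.peers[next(iter(holders))]
            self.store[url] = peer.store[url]
            source = "peer:" + peer.name
        else:                                            # global miss: origin server
            self.store[url] = origin[url]
            source = "origin"
        DistCache.directory.setdefault(url, set()).add(self.name)  # hints updated instantly
        return source

c1, c2 = DistCache("inst-1"), DistCache("inst-2")
origin = {"/a": "doc-a"}
first = c1.get("/a", origin)     # global miss
second = c2.get("/a", origin)    # served by sibling inst-1
third = c2.get("/a", origin)     # now cached locally
```

Unlike the hierarchical sketch, no copy is ever stored above the institutional level; only the location hints travel through the hierarchy.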
To account for the distance between the client and the institutional cache, we consider one additional link in the connection time. Now let L be the network level such that the tree rooted at level L is the smallest tree containing a document copy. The connection time in distributed caching, T_c^d, is given by

E[T_c^d] = 4d · Σ_{l=0}^{2H} P(L = l) · (2l + 1) + 4d · P(L = 2H + z) · (2H + z + 1).

In distributed caching a request first travels up to network level l and then travels down to the institutional cache holding a document copy, thus traversing 2l links.
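The two connection-time formulas can be evaluated numerically as below. The distribution P(L = l) here is a made-up example for one document, and the topology parameters are illustrative, not the paper's.

```python
# Numeric sketch of E[T_c^h] and E[T_c^d]. P maps level l to P(L = l);
# d is the per-hop delay and the 4d factor models TCP's three-way handshake.
H, z, d = 3, 5, 0.005     # illustrative topology, 5 ms per hop

def tc_hierarchical(P):
    # in the hierarchy, copies can only sit at levels 0, H, 2H or the origin (2H+z)
    return 4 * d * sum(P.get(l, 0.0) * (l + 1) for l in (0, H, 2 * H, 2 * H + z))

def tc_distributed(P):
    # a miss travels up to level l and back down to a sibling cache: 2l links
    inside = sum(P.get(l, 0.0) * (2 * l + 1) for l in range(2 * H + 1))
    return 4 * d * inside + 4 * d * P.get(2 * H + z, 0.0) * (2 * H + z + 1)

P = {0: 0.4, 3: 0.2, 6: 0.2, 11: 0.2}   # hypothetical P(L = l), sums to 1
```

For this mid-popularity document the hierarchical scheme comes out ahead, matching the paper's finding that intermediate copies shorten the expected path.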

We now proceed to calculate the distribution of L, which is the same for hierarchical and distributed caching. To obtain P(L = l) we use P(L = l) = P(L ≥ l) − P(L ≥ l + 1). Note that P(L ≥ l) is the probability that the number of links traversed to reach the document is equal to l or higher. To calculate P(L ≥ l), let τ denote the time into the update interval [0, Δ] at which a request occurs. The random variable τ is uniformly distributed over the interval; thus we have

P(L ≥ l) = (1/Δ) ∫_0^Δ P(L ≥ l | τ) dτ,    (1)

where P(L ≥ l | τ) is the probability that there was no request for document i during the interval [0, τ] in the subtree rooted at level l:

P(L ≥ l | τ) = e^{−O^l λ_{I,i} τ}.    (2)

Combining equations (1) and (2) we get

P(L ≥ l) = (1 − e^{−O^l λ_{I,i} Δ}) / (O^l λ_{I,i} Δ).    (3)

B. Transmission Time

In this section we calculate the transmission time to send a document in a caching hierarchy and in distributed caching. The transmission time of a document depends on the network level L up to which a request travels. Requests that travel through low network levels experience low transmission times; requests that travel up to high network levels experience large transmission times. We make the realistic assumption that the caches operate in cut-through mode rather than store-and-forward mode, i.e., when a cache begins to receive a document it immediately transmits the document to the subsequent cache (or client) while the document is still being received. We expect capacity misses to be a secondary issue for large-scale cache architectures because caches with huge effective storage capacities are becoming common. We therefore assume that each cache has infinite storage capacity. We now proceed to calculate the transmission time for hierarchical caching, E[T_t^h], and for distributed caching, E[T_t^d]. They are given by:

E[T_t^h] = Σ_{l ∈ {0, H, 2H, 2H+z}} E[T_t^h | L = l] · P(L = l)

E[T_t^d] = Σ_{l=0}^{2H+z} E[T_t^d | L = l] · P(L = l),

where E[T_t^h | L = l] and E[T_t^d | L = l] are the expected transmission times at a certain network level for hierarchical and distributed caching.
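The closed form (3) can be sanity-checked against a direct numeric integration of (1)–(2). All parameter values below are illustrative.

```python
# Verify P(L >= l) = (1 - exp(-O^l * lam * Delta)) / (O^l * lam * Delta)
# against a midpoint-rule integration of (1/Delta) * int_0^Delta e^{-O^l lam tau} dtau.
import math

O, lam, Delta, l = 4, 1e-4, 86400.0, 2   # outdegree, per-document rate, 24 h period

rate = O**l * lam
closed = (1 - math.exp(-rate * Delta)) / (rate * Delta)

steps = 100000
numeric = sum(math.exp(-rate * (k + 0.5) * Delta / steps) for k in range(steps)) / steps
```

The two values agree to many decimal places, which is just the statement that (3) is the average of (2) over a uniformly random request instant τ.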
To calculate E[Tt h jl = l] and E[Tt d jl = l] we first determine the aggregate request arrival rate at every network level l for caching l h, and caching d l. For caching, the aggregate request arrival rate at every network level l h is filtered by the hit rates at the lower caches. Thus, the aggregate request arrival rate generated by caching at a link between the levels l and l + of the network tree is given by h l = 8 >< >: O l I (? hit I ) O l I (? hit R ) O 2H I (? hit N ) I l = < l < H H l < 2H 2H l < 2H + z The hit rates at every network level can be calculated using the popularity distribution of the different documents, (i.e., Zipf) and the distribution of L. hit l = NX i= I;i I P (L l) For caching, the aggregate request arrival rate at a link between levels l and l + is filtered by the documents already hit in any institutional cache belonging to the subtree rooted at level l, hit l. Besides, in caching, the traffic between levels l and l + needs to be increased by the percentage of requests that were not satisfied in all the other neighbor caches and that are hit in the subtree rooted at level l, hit N? hit l. Thus, every subtree rooted at level l receives an additional traffic equal to O l I (hit N? hit l ). Therefore, the request rate between levels l and l + in caching is given by d l = O l I ((? hit l ) + (hit N? hit l )) for l < 2H and by O 2H I (? hit N ) for 2H l < 2H + z. To calculate the transmission time we model the routers and the caches on the network path from the sending to the receiving host as M/D/ queues. The arrival rate at a given network level for and caching is given by l h and l d. The service rate is given by the link s capacity at every network level (i.e., C I, C R, C N, and C). We assume that the most congested network link from level l to the clients, is the link at level l. Delays on network levels lower than l are neglected. Let S ~ be the average document size of all N documents. 
M/D/1 queuing theory [3] gives

E[T_t^h | l] = ( S̃ / (C_l − λ_l^h S̃) ) · ( 1 − λ_l^h S̃ / (2 C_l) )
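The M/D/1 expression above can be written as a small function. The numbers below are illustrative; the formula is only valid while the utilization ρ = λ S̃ / C_l stays below 1.

```python
# E[T_t | l] for an M/D/1 link: S is the mean document size (bits),
# C the link capacity (bit/s), lam the arrival rate (documents/s).

def md1_delay(S, C, lam):
    rho = lam * S / C
    assert rho < 1, "link overloaded: utilization must be < 1"
    return (S / (C - lam * S)) * (1 - rho / 2)
```

As a sanity check, with no competing traffic (lam = 0) the expression reduces to the plain transmission time S / C, and the delay grows sharply as ρ approaches 1, which is what drives the congested-network results of Section IV.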

for the delay at every network level in a caching hierarchy, and

E[T_t^d | l] = ( S̃ / (C_l − λ_l^d S̃) ) · ( 1 − λ_l^d S̃ / (2 C_l) )

for the delay at every network level in a distributed caching scheme.

IV. HIERARCHICAL VS. DISTRIBUTED CACHING: A NUMERICAL COMPARISON

In this section we pick typical values for the different parameters of the model to obtain quantitative results. The following parameters are fixed for the remainder of the paper, except where stated differently. The network tree is modeled with an outdegree O = 4 and a distance between caching levels H = 3, yielding O^H = 64 regional and O^{2H} = 4096 institutional caches. The distance z from the top node of the national network to the origin server is set to a high value, modeling the situation where the cost to access the origin servers is very high. We consider N = 25 million Web documents [3], which follow a Zipf distribution with α between 0.64 and 0.8 [4]. We take the total request rate generated by all the institutional caches, O^{2H} λ_I, from the measurements in [5], and fix the average size S̃ of a Web document following [5]. We vary the update period Δ between several days and one month, and we account for a fraction of non-cacheable documents [23]. The hit rates can be calculated with the formulas derived in Section III-B. Table I summarizes the hit rates at the different caching levels for homogeneous client communities and several update intervals.

TABLE I
Percentage of documents hit at different caching levels for several update intervals, α = 0.8.

          Δ = 7 days   Δ = 10 days   Δ = 15 days
hit_I        40%           42%           44%
hit_R        59%           61%           63%
hit_N        78%           79%           80%

We see that the hit rate achieved at an institutional cache is about 40%, at a regional cache about 60%, and at the national cache about 80%. The values for the institutional and regional caches are quite similar to those reported for many real caches [23][8]. However, the hit rate at the national cache is higher than in a real caching architecture.
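Hit rates of the kind shown in Table I can be reproduced by combining the Zipf model of Section II-B with the distribution of L from Section III-A. The parameters below are deliberately smaller than the paper's (fewer documents, an assumed per-cache rate), so the absolute numbers will not match the table; the sketch only shows the mechanics of the computation.

```python
# hit_l = sum_i (lam_{I,i} / lam_I) * P(L <= l), with Zipf popularities and
# P(L >= l) from the closed form (3). Illustrative parameters throughout.
import math

N, alpha, O, H = 100000, 0.8, 4, 3
lam_I, Delta = 0.5, 7 * 86400.0          # assumed per-cache rate, 7-day update period

omega = 1.0 / sum(1.0 / i**alpha for i in range(1, N + 1))   # Zipf normalization

def p_ge(l, lam_i):
    # P(L >= l) from equation (3)
    r = O**l * lam_i * Delta
    return (1 - math.exp(-r)) / r

def hit(l):
    # fraction of requests satisfied within the subtree rooted at level l
    s = 0.0
    for i in range(1, N + 1):
        lam_i = lam_I * omega / i**alpha
        s += (lam_i / lam_I) * (1 - p_ge(l + 1, lam_i))
    return s
```

Whatever the parameter values, the hit rate must increase with the level, hit_0 < hit_H < hit_2H, since each larger subtree aggregates more requests, which is the qualitative pattern of Table I.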
If we consider heterogeneous client communities at the national cache, i.e., α = 0.64, we obtain hit rates at the national cache of about 70%, which are closer to the real ones.

A. Connection Time

The connection time is the first part of the perceived latency when retrieving a document and depends on the network distance to the document. In Figure 3 we show the connection time for hierarchical and distributed caching for a generic document with popularity λ_tot. (To consider a generic document, we drop the sub-index i and use λ_tot instead of λ_{tot,i}.) We only consider an update period Δ = 24 hours; if we considered larger update periods, documents would receive more requests per period and the curves for hierarchical and distributed caching would both be displaced towards the left of the x-axis, but the outcome of the comparison would remain the same.

Fig. 3. Expected connection time E[T_c] for hierarchical and distributed caching depending on the document's popularity λ_tot. Δ = 24 hours, d = 5 msec.

First, we observe that for a very unpopular document (small λ_tot) both hierarchical and distributed caching experience high connection times because the request needs to travel to the origin server. As the number of requests for a document increases, the average connection time decreases, since there is a higher probability to hit the document at caches closer than the origin server. For all documents whose λ_tot ranges between one request per day and one request per minute, a hierarchical caching scheme gives shorter connection times than a distributed caching scheme: document copies placed at the regional and national caches in a caching hierarchy reduce the expected network distance to hit a document.

When the document is very popular, a distributed caching scheme can benefit from copies stored at close neighbor caches, reducing the connection time. In a hierarchical caching scheme, however, the document still has to be fetched from the regional cache in case of a miss at the institutional cache. Thus, for very popular documents, the distributed scheme has shorter connection times than the hierarchical scheme. However, since the probability that a very popular document is not hit at the local institutional cache is very small, the benefit of distributed caching is hardly appreciable.

B. Transmission Time

The second part of the overall latency is the time it takes to transmit the Web document. To calculate the transmission time, we first show the distribution of the traffic generated by distributed caching, λ_l^d, and hierarchical caching, λ_l^h, at every network level (Figure 4).

Fig. 4. Network traffic generated by hierarchical and distributed caching at every tree level. Δ = 24 hours.

We observe that distributed caching practically doubles the bandwidth used on the lower levels of the network tree and uses more bandwidth in most of the national network. However, the traffic on the most congested links, around the root node of the national network, is reduced to half. Distributed caching uses all possible network shortcuts between institutional caches, generating more traffic in the less congested low network levels.

Having presented the traffic generated at every network level, we calculate the transmission time for two different scenarios: i) the national network is not congested, and ii) the national network is highly congested. We set the institutional network capacity to C_I = 100 Mbps and consider the same network link capacities in the regional and national networks, C_N = C_R. We do not fix the network link capacities in the regional or national networks to certain values, but consider only the degree of congestion under which these links operate (i.e., we vary the utilization ρ = λ_{2H}^h S̃ / C_N of the top national network links in the hierarchical caching scheme). The international path is always very congested, with a utilization of λ_I O^{2H} (1 − hit_N) S̃ / C = 0.95.

Fig. 5. Expected transmission time E[T_t] for hierarchical and distributed caching. Δ = 24 hours. (a) Not-congested national network, ρ = 0.3. (b) Congested national network, ρ = 0.8.

In Figure 5(a) we show the transmission time for the case where the national network is not congested (ρ = 0.3). The only bottleneck on the path from the client to the

origin server is the international path. We observe that the performance of hierarchical and distributed caching is very similar because there are no highly congested links. In Figure 5(b) we contrast the performance of hierarchical and distributed caching in a more realistic scenario where the national network is congested, ρ = 0.8. We observe that both hierarchical and distributed caching have higher transmission times than in the case where the national network is not congested (Figure 5(a)). However, the increase in the transmission time is much higher for hierarchical caching than for distributed caching. Distributed caching gives shorter transmission times than hierarchical caching because many requests travel through lower network levels. Similar results are obtained when the access link of the national cache is very congested: in this situation the transmission time in distributed caching remains unchanged, while the transmission time in hierarchical caching increases considerably [2].

C. Total Latency

The total latency is the sum of the connection time and the transmission time. For large documents, the transmission time dominates; for small documents, the transmission time is very small and the connection time is more relevant. Next, we present the total latency for a single document of varying size S in both hierarchical and distributed caching (recall that S̃ denotes the average document size). We study the total latency in the scenario where the top nodes of the national network are highly congested.

Fig. 6. Average total latency depending on the document size S. The national network is congested, ρ = 0.8.

In Figure 6 we observe that hierarchical caching gives lower latencies for documents smaller than 2 KBytes because hierarchical caching has lower connection times than distributed caching. However, distributed caching gives lower latencies for larger documents because distributed caching has lower transmission times than hierarchical caching. The size threshold depends on the degree of congestion in the national network.
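The existence of a crossover size follows directly from the two latency components: hierarchical caching wins on the size-independent connection time, distributed caching on the per-byte transmission time. The numbers below are made up purely to illustrate the trade-off; they are not taken from the paper's figures.

```python
# Illustrative total-latency crossover between the two schemes.
def total(tc_ms, per_kb_ms, size_kb):
    # total latency = connection time + size-proportional transmission time
    return tc_ms + per_kb_ms * size_kb

tc_h, tc_d = 40.0, 70.0    # ms: hierarchical connects faster (assumed values)
tx_h, tx_d = 2.0, 1.2      # ms per KB: distributed transmits faster (assumed values)

# document size at which the two total latencies are equal
crossover_kb = (tc_d - tc_h) / (tx_h - tx_d)
```

Below the crossover the hierarchical scheme wins, above it the distributed scheme wins; raising the congestion of the national network increases tx_h relative to tx_d and therefore pushes the crossover towards smaller documents, as the text observes.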
The higher the congestion, the lower is the size-threshold from which distributed caching has lower latencies than hierarchical caching. D. Bandwidth Usage Next, we consider the bandwidth usage in the regional and national network for both caching architectures. Note that the bandwidth used on the international path is the same for both architectures, since the number of documents hit in the global caching architecture is the same for both schemes. We calculate the bandwidth usage as the expected number of links traversed to distribute one packet to the clients. Figure 7(a) shows the bandwidth usage of hierarchical and distributed caching in the regional network. We see that distributed caching uses more bandwidth than hierarchical caching in the regional network, since many documents are retrieved from distant institutional caches, traversing many links. However, recall that the network traffic is better distributed, with most of the bandwidth being used on lower, less-congested levels of the network tree. Hierarchical caching uses fewer links in the regional network since documents are hit in the regional cache, which is at a closer network distance than many institutional caches. A similar behavior occurs in the national network (Figure 7(b)); distributed caching uses more bandwidth than hierarchical caching. However, the bandwidth usage of distributed caching is lower in the national network than in the regional network, since many institutional caches can be reached by traversing only few national network links. Fig. 7. Bandwidth usage on the regional and national network, depending on the document request rate. Δ = 24 hours. E. Cache Load In order to estimate the load on the caches, we calculate the filtered request rate at every cache level for hierarchical and distributed caching. For a caching hierarchy, the filtered request rates are given by λ^h_{I,i} = λ_{I,i}, λ^h_{R,i} = λ_{I,i} (1 − P(L_I)) O^H, λ^h_{N,i} = λ_{I,i} (1 − P(L_H)) O^{2H}. In distributed caching, the request rate λ^d_{I,i} at the institutional cache is given by the local request rate λ_{I,i} at the institutional cache, plus the external request rate from other cooperating institutional caches. The external request rate from other cooperating institutional caches can be calculated using the percentage of requests that are not hit locally in an institutional cache and that are hit in any other institutional cache, (P(L_{2H}) − P(L_I)). Recall that the percentage of requests that are satisfied among the O^{2H} institutional caches is shared between all the institutional caches, thus reducing the external request rate that a single institutional cache needs to handle. The load at an institutional cache in distributed caching is given by: λ^d_{I,i} = λ_{I,i} + (P(L_{2H}) − P(L_I)) λ_{I,i}. Fig. 8. Document request rate at the different caches with respect to the institutional request rate λ_I. Δ = 24 hours. In Figure 8, we calculate the load in the caches for both caching architectures, λ^h_{l,i} and λ^d_{I,i} respectively. We see that the request rates at the regional and national cache of a caching hierarchy are very high. For popular documents, the request rate at the national and regional caches is reduced since these documents are already hit at lower cache levels. With distributed caching, the request rate at the institutional cache is increased due to the external request rate; however, this increase is not very significant since the external request rate is shared among all cooperating caches. Considering all N documents with different popularities, we can estimate the total incoming request rates at every cache level. In Table II we show the load at every cache level of a caching hierarchy for different network topologies.

TABLE II
TOTAL REQUEST RATE AT EVERY HIERARCHY LEVEL FOR DIFFERENT NETWORK TOPOLOGIES, Δ = 5 DAYS.

            institutional   regional    national
O=4, H=3    0.2 req/s       6.4 req/s   22 req/s
O=3, H=3    0.2 req/s       2.7 req/s   25 req/s

Again, we see that upper level caches need to support much higher request rates. For the situation where every cache has a smaller number of children caches, i.e., 27 children caches (O = 3, H = 3), the processing power required at high level caches is reduced, even though it is still quite high.
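To make the request-rate filtering concrete, the sketch below evaluates the expressions above for both architectures. The hit probabilities P(L_I), P(L_H), P(L_{2H}) and the local request rate are assumed values chosen for illustration, not measurements from the paper.

```python
# Sketch of the filtered request rates. The hit probabilities and the
# local rate below are illustrative assumptions.

def hierarchical_rates(lam_I, O, H, P_LI, P_LH):
    """Per-document request rates at each tier of a caching hierarchy
    with nodal outdegree O and H hops between cache levels, i.e. O**H
    institutional caches per regional cache and O**(2*H) per national
    cache."""
    lam_h_I = lam_I
    lam_h_R = lam_I * (1 - P_LI) * O**H        # misses of O**H institutional caches
    lam_h_N = lam_I * (1 - P_LH) * O**(2 * H)  # misses filtered by two levels
    return lam_h_I, lam_h_R, lam_h_N

def distributed_rate(lam_I, P_LI, P_L2H):
    """Institutional-cache rate in distributed caching: local requests
    plus the per-cache share of the external requests hit among the
    cooperating institutional caches."""
    return lam_I + (P_L2H - P_LI) * lam_I

lam_I, O, H = 0.2, 4, 3  # assumed local rate; topology O=4, H=3
print(hierarchical_rates(lam_I, O, H, P_LI=0.3, P_LH=0.5))
print(distributed_rate(lam_I, P_LI=0.3, P_L2H=0.6))
```

Even with modest hit probabilities, the national cache sees a request rate several orders of magnitude above the institutional rate, while the distributed scheme only adds a small external load per cache.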

F. Disk Space In this section we analyze the disk requirements of hierarchical and distributed caching. In a caching hierarchy, a copy of every requested document is stored at each level of the hierarchy. In distributed caching, only local caches keep copies of the requested documents. In order to compare the disk space required in both caching schemes, we calculate the expected disk space E[D_h] needed to store one document in the caching hierarchy with a given request rate. We calculate E[D_h] as the average Web document size S̃ times the average number of copies present in the caching infrastructure. The average number of copies in the caching infrastructure can be calculated using the probability that a new document copy is created at every cache level. Thus, the average disk space needed to store one document in hierarchical caching is given by E[D_h] = S̃ O^{2H} (1 − e^{−λ^h_{I,i} Δ}) + S̃ O^H (1 − e^{−λ^h_{R,i} Δ}) + S̃ (1 − e^{−λ^h_{N,i} Δ}), and in distributed caching is given by E[D_d] = S̃ O^{2H} (1 − e^{−λ^d_{I,i} Δ}). Fig. 9. Hierarchical to distributed disk space ratio E[D_h]/E[D_d] for one document, depending on the document request rate. Figure 9 shows the ratio of the disk space used by hierarchical to distributed caching for different document popularities. We see that hierarchical caching uses three times as much disk space for unpopular documents, since document copies are present at the three levels of the caching hierarchy instead of being only at the institutional caches. However, for popular documents the amount of disk space used in hierarchical and distributed caching is very similar. Given the distribution of document requests at level l for hierarchical caching, λ^h_{l,i}, and the average Web document size S̃, we can derive the infinite cache size D^h_{l,∞} at every hierarchy level, i.e., the disk space needed to keep a copy of every requested document while the document is up-to-date. For distributed caching, the infinite disk space is the same as the infinite disk space needed at the institutional caches of a caching hierarchy, D^h_{I,∞}. Since documents expire after Δ time units, to have an infinite disk space, a cache needs to keep all requested documents during the period Δ where they are up-to-date. Squid recommends an average period of 3 days for a document to stay in the cache [25]. The infinite cache size for a cache on level l is given by D^h_{l,∞} = S̃ Σ_{i=1}^{N} (1 − e^{−λ^h_{l,i} Δ}). To model different client communities with different Zipf parameters α and a generic request rate λ_l, we calculate the request rate λ^h_{l,i} for document i at level l following a Zipf distribution with parameter α. Table III presents the infinite cache space for different request rates λ^h_l, different Zipf parameters α, and different update periods Δ.

TABLE III
INFINITE CACHE SIZE AT DIFFERENT LEVELS OF A CACHING HIERARCHY. AVERAGE DOCUMENT SIZE S̃.

λ^h_l, α              Δ = 5 days   Δ = days    Δ = 5 days
req/s, α =            GB           7 GB        GB
2 req/s, α = .64      GB           58 GB       369 GB
req/s, α =            GB           558 GB      759 GB

We have chosen values of α and λ^h_l that can easily correspond to those of an institutional, a regional, and a national cache of a caching hierarchy [4] [8] []. As the update period Δ increases, the disk requirements also increase, since the probability that a document is requested in a period Δ increases. In the limit, for an infinite update period and a large enough client population, the cache would need a disk space equal to N · S̃ KBytes to store all N requested documents. From Table III we see that a cache with a request rate equal to req/sec needs about GBytes of disk space to keep all requested documents during 5 days. As the request rate increases, there are more documents that are requested during a period Δ and the needed

disk space increases as well. For a cache with a request rate equal to req/sec, the disk space required to store all documents is about several hundreds of GBytes. These analytical values are very close to those reported in different trace-driven simulations with similar parameters λ^h_l, α, and Δ [4] [] [8]. As we have seen in this section, the performance requirements of high level caches concerning disk space and cache load are very high, which makes it very difficult to deploy a centralized top-level cache. As a result, many top-level caches use several machines to distribute their load and disk space. These machines cooperate in a distributed fashion [], using some of the existing cache sharing schemes [24] [9] [9]. Distributed caching can decrease the retrieval latency of large documents and reduce the bandwidth usage at high network levels. However, a full deployment of distributed caching encounters several problems. When a document is fetched from a neighbor institutional cache, the experienced latency depends not only on the bandwidth of the requesting institutional cache, but also on the bandwidth of the neighbor cache that is contacted. Thus, in a distributed caching scheme, investing in higher capacity links will not result in any benefit when documents are hit in neighbor caches with smaller connection capacities. In order to increase the local hit rates, local ISPs could increase the disk space of their caches and thus store more documents. However, in distributed caching, the more documents a given institutional cache stores, the higher the number of external requests it will receive from other neighbor institutional caches. Thus, investing in larger disks to save bandwidth and reduce latency can eventually result in more incoming traffic and longer queuing delays at the local cache. Additionally, distributed caching has longer connection times and higher bandwidth usage in the low network levels.
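The disk-space expressions of Section F can be evaluated numerically. The sketch below implements E[D_h] and E[D_d] directly; the document size, update period, and per-document request rates are assumed values chosen only to illustrate the popular/unpopular contrast.

```python
import math

# Numeric sketch of the per-document disk-space formulas. All numeric
# inputs below are illustrative assumptions.

def disk_hierarchical(S_kb, O, H, lam_h_I, lam_h_R, lam_h_N, delta):
    """Expected storage for one document across the three tiers of a
    hierarchy: a copy exists at a cache with prob. 1 - e^(-lambda*delta),
    and there are O**(2H) institutional, O**H regional, 1 national cache."""
    return S_kb * (O**(2 * H) * (1 - math.exp(-lam_h_I * delta))
                   + O**H * (1 - math.exp(-lam_h_R * delta))
                   + (1 - math.exp(-lam_h_N * delta)))

def disk_distributed(S_kb, O, H, lam_d_I, delta):
    """Distributed caching keeps copies only at institutional caches."""
    return S_kb * O**(2 * H) * (1 - math.exp(-lam_d_I * delta))

args = dict(S_kb=10.0, O=4, H=3, delta=24 * 3600)  # assumed S~ and Delta

# Unpopular document (tiny per-cache rate): the hierarchy stores extra
# copies at the regional and national tiers, so its ratio exceeds 1.
dh = disk_hierarchical(lam_h_I=1e-7, lam_h_R=1e-5, lam_h_N=1e-3, **args)
dd = disk_distributed(lam_d_I=1e-7, **args)
print(dh / dd)  # > 1: hierarchical needs more disk for unpopular documents
```

For popular documents all the exponential terms approach 1, so the ratio tends to (O^{2H} + O^H + 1)/O^{2H} ≈ 1, matching the observation that the two schemes use similar disk space for popular content.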
Nevertheless, distributed caching can be used on a smaller scale where caches are interconnected over short distances with plentiful bandwidth, i.e., among the caches on a campus or in a metropolitan area. V. A HYBRID CACHING SCHEME In this section we consider a hybrid caching scheme where a certain number of caches k cooperate at every network level of a caching hierarchy. With hybrid caching, when a document cannot be found in a cache, the cache checks whether the document can be found in any of the cooperating caches. If multiple caches happen to have a document copy, the neighbor/parent cache that provides the lowest latency is selected. If the document is not found among the caches at a given level, the request is then forwarded to the immediate parent cache or to the origin server. Next, we study the optimal number of caches that should cooperate at every network level to minimize the connection time, the transmission time, and the total retrieval latency. The analysis of the hybrid scheme is very similar to the one already presented for hierarchical and distributed caching. Due to space limitations we omit the detailed analysis for hybrid caching, which can be found in [2]. For hybrid caching, the probability that a document is found among k cooperating institutional caches can be calculated using Formula 3, replacing the O_l institutional caches by k cooperating caches. At the regional level, to calculate the probability that a document is found among k cooperating regional caches, O_l should be replaced by O^H k. A. Connection Time Next, we present the connection time in a hybrid scheme depending on the number of cooperating caches k at every level. The number of cooperating caches at every cache level can range from 1 (no cooperation) to O^H = 64 (all neighbor caches in the same cache level cooperate). In Figure 10, we show the average connection time for all N Web documents, depending on the number k of cooperating caches. Fig. 10. Connection time depending on the number of cooperating caches at every cache level in a hybrid scheme. We observe that when the number of cooperating caches is very small, the connection time is high. The probability that a document is found in a few close neighbor caches is very small, thus most of the requests are satisfied by the parent cache at a distance of H hops. When the number of cooperating caches increases, the connection time decreases up to a minimum. This is due to the fact that the probability to hit a document at neighbor caches,

which are closer than the parent cache increases. However, when the number of cooperating caches increases over a certain threshold k_c = 4, the connection time increases very fast because documents are requested from cooperating caches at distant network levels. There is, therefore, an optimum number of caches that should cooperate at every cache level to minimize the connection time. The optimum number of cooperating caches k_c that minimizes the connection time is given by the number of caches that are at closer network distances than the parent cache, k_c = O^{⌊H/2⌋}. In Figure 11, we present the connection time for a hybrid scheme with the optimum number of cooperating caches k_c, hierarchical caching, and distributed caching. We observe that a hybrid scheme with k_c cooperating caches has lower connection times than distributed caching, and even lower connection times than hierarchical caching for a large range of document popularities. Fig. 11. Connection time in a hybrid caching scheme with the optimal number of cooperating caches k_c. B. Transmission Time Next, we analyze the transmission time in a hybrid scheme and calculate the optimum number k_t of cooperating caches at every network level that minimizes the transmission time. In Figure 12 we plot the transmission time in a hybrid scheme for all N Web documents, depending on the number of cooperating caches at every cache level. We consider the case where the top links of the national network are not congested (ρ = 0.3), that is, the only bottleneck in the path from the origin server to the client is the international path, and the case where the top links of the national network are congested (ρ = 0.8). Similar results are also obtained for the case where the access link of the national cache is congested [2]. Fig. 12. Average transmission time depending on the number of cooperating caches in a hybrid scheme. National network not congested, ρ = 0.3, and congested, ρ = 0.8. We observe that for the case where the national network is not congested, varying the number of cooperating caches k at every cache level hardly influences the transmission time. However, when the national network is congested, the transmission time strongly depends on the number of cooperating caches at every cache level. If the number of cooperating caches is very small, there is a low probability that the document can be retrieved from close neighbor caches. Since the parent cache contains all documents requested by its children caches, there is a higher probability that the document is retrieved from the parent cache through the highly congested top-level links. As the number of cooperating caches increases, the probability to hit the document at close neighbor caches connected by fast links increases, and thus the transmission times are lower. If the number of cooperating caches increases over a certain threshold k_t = 16, the transmission time increases again because documents are hit in distant neighbor caches which are connected through the highly congested top-level links. The optimum number of cooperating caches k_t that minimizes the experienced transmission time depends on the number of cooperating caches that can be reached while avoiding the congested links. In the case where the top-level links of the national network are congested, the optimum number of cooperating caches at every cache level is k_t = 16. This value corresponds to the number of regional caches that can cooperate without traversing the national top-level links, k_t = O^{H−1}. In Figure 13, we present the transmission time E[T_t] for a hybrid scheme with the optimum number of cooperating caches k_t, hierarchical caching, and distributed caching. We observe that a hybrid scheme with k_t cooperating caches has lower transmission times than hierarchical caching. We also observe that a hybrid scheme has even lower transmission times than distributed caching, since it reduces the traffic around the high network levels even more [2]. Fig. 13. Transmission time for a hybrid caching scheme with the optimal number of cooperating caches k_t. National network is congested, ρ = 0.8. Thus, by dynamically choosing the number of cooperating caches, a hybrid scheme can have connection times as low as hierarchical caching, and transmission times as low as distributed caching. C. Total Latency Depending on the document size there is an optimum number of caches that should cooperate at every cache level to minimize the total latency. For small documents the optimum number of cooperating caches is close to k_c, since choosing k_c cooperating caches minimizes the connection time. For large documents the optimum number of cooperating caches is close to k_t, since choosing k_t cooperating caches minimizes the transmission time. For any document size, the optimum number of cooperating caches k_opt that minimizes the total retrieval latency is such that k_c ≤ k_opt ≤ k_t. In Figure 14 we plot the optimum number of caches k_opt that should cooperate at every network level to minimize the total retrieval latency, depending on the document size. We choose the case where the top-level links of the national network are highly congested, thus the optimum number of caches that minimizes the transmission time is k_t = 16. In Figure 14 we observe that k_opt ranges between k_c = 4 and k_t = 16. For documents smaller than several KBytes, only k_c = 4 caches should cooperate at every cache level. For documents larger than several hundreds of KBytes, k_t = 16 caches should cooperate at every cache level. Fig. 14. Optimum number of cooperating caches k_opt, depending on the document size S. National network is congested, ρ = 0.8. In Figure 15 we show the total retrieval latency for a large document (S = 2 KB) and the optimal number of cooperating caches k_opt = k_t = 16. Fig. 15. Total latency to retrieve a large document in a hybrid caching scheme with the optimum number of cooperating caches k_opt = k_t = 16. National network is congested, ρ = 0.8. S = 2 KB. We see that a hybrid scheme with the optimal number of cooperating caches at every cache level has lower overall retrieval latencies than hierarchical and distributed caching for a large range of

document popularities. D. Bandwidth Having caches cooperate at every cache level not only influences the latency experienced by the clients, but also modifies the bandwidth usage inside the network. In Figures 16(a) and 16(b), we compare the bandwidth usage of a hybrid scheme for different degrees of cooperation with the bandwidth usage of hierarchical and distributed caching. Fig. 16. Bandwidth usage on the national and regional network for the hierarchical, distributed, and hybrid caching schemes. We see that for the national and the regional network, a hybrid architecture uses less bandwidth than a distributed scheme. For the case where k = k_c = 4 caches cooperate at every level, the bandwidth usage of a hybrid architecture is even lower than the bandwidth usage of a caching hierarchy, since k_c = 4 cooperating caches minimize the connection time in terms of network distance. When the top-level links are congested, k = k_t = 16 caches should cooperate to minimize the transmission time. However, in this situation the bandwidth usage increases slightly compared to a pure hierarchical configuration, since distant caches may be contacted to reduce the transmission time. E. Cache Load A hybrid architecture redistributes the load at every cache level. Institutional and regional caches receive more requests from neighboring cooperating caches, and the national cache receives fewer requests since more requests are satisfied among cooperating caches. The average request rates at the different hierarchy levels can be calculated as the aggregate request rate not hit in the children caches plus the external requests coming from the cooperating caches. Fig. 17. Document request rate at the different caches with respect to the institutional request rate λ_I for hierarchical caching and for hybrid caching with k = k_c = 4 cooperating caches (crossed lines). Δ = 24 hours. Figure 17 shows the load at every cache level for hierarchical and hybrid caching with k = 4. We see that a hybrid scheme clearly reduces the request rate for popular documents at the national and regional caches compared to hierarchical caching. The load at the regional and institutional caches increases slightly, since every cooperating cache only serves a part of all the requests hit among the k cooperating caches. For higher number of cooperating


More information

Parallel-Access for Mirror Sites in the Internet

Parallel-Access for Mirror Sites in the Internet 1 -Access for Mirror Sites in the Internet Pablo Rodriguez Andreas Kirpal Ernst W. Biersack Institut EURECOM 2229, route des Crêtes. BP 193 694, Sophia Antipolis Cedex, FRANCE frodrigue, kirpal, erbig@eurecom.fr

More information

Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet 1

Beyond Hierarchies: Design Considerations for Distributed Caching on the Internet 1 Beyond ierarchies: esign Considerations for istributed Caching on the Internet 1 Renu Tewari, Michael ahlin, arrick M. Vin, and Jonathan S. Kay epartment of Computer Sciences The University of Texas at

More information

Seminar on. By Sai Rahul Reddy P. 2/2/2005 Web Caching 1

Seminar on. By Sai Rahul Reddy P. 2/2/2005 Web Caching 1 Seminar on By Sai Rahul Reddy P 2/2/2005 Web Caching 1 Topics covered 1. Why Caching 2. Advantages of Caching 3. Disadvantages of Caching 4. Cache-Control HTTP Headers 5. Proxy Caching 6. Caching architectures

More information

Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks

Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks RESEARCH Open Access Efficient load balancing and QoS-based location aware service discovery protocol for vehicular ad hoc networks Kaouther Abrougui 1,2*, Azzedine Boukerche 1,2 and Hussam Ramadan 3 Abstract

More information

2.993: Principles of Internet Computing Quiz 1. Network

2.993: Principles of Internet Computing Quiz 1. Network 2.993: Principles of Internet Computing Quiz 1 2 3:30 pm, March 18 Spring 1999 Host A Host B Network 1. TCP Flow Control Hosts A, at MIT, and B, at Stanford are communicating to each other via links connected

More information

Routing Basics ISP/IXP Workshops

Routing Basics ISP/IXP Workshops Routing Basics ISP/IXP Workshops 1 Routing Concepts IPv4 Routing Forwarding Some definitions Policy options Routing Protocols 2 IPv4 Internet uses IPv4 addresses are 32 bits long range from 1.0.0.0 to

More information

Internet Architecture & Performance. What s the Internet: nuts and bolts view

Internet Architecture & Performance. What s the Internet: nuts and bolts view Internet Architecture & Performance Internet, Connection, Protocols, Performance measurements What s the Internet: nuts and bolts view millions of connected computing devices: hosts, end systems pc s workstations,

More information

A New Architecture for HTTP Proxies Using Workstation Caches

A New Architecture for HTTP Proxies Using Workstation Caches A New Architecture for HTTP Proxies Using Workstation Caches Author Names : Prabhakar T V (tvprabs@cedt.iisc.ernet.in) R. Venkatesha Prasad (vprasad@cedt.iisc.ernet.in) Kartik M (kartik_m@msn.com) Kiran

More information

Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers

Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers ABSTRACT Jing Fu KTH, Royal Institute of Technology Stockholm, Sweden jing@kth.se Virtual routers are a promising

More information

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT

ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT ADAPTIVE AND DYNAMIC LOAD BALANCING METHODOLOGIES FOR DISTRIBUTED ENVIRONMENT PhD Summary DOCTORATE OF PHILOSOPHY IN COMPUTER SCIENCE & ENGINEERING By Sandip Kumar Goyal (09-PhD-052) Under the Supervision

More information

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks

A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 8, NO. 6, DECEMBER 2000 747 A Path Decomposition Approach for Computing Blocking Probabilities in Wavelength-Routing Networks Yuhong Zhu, George N. Rouskas, Member,

More information

RED behavior with different packet sizes

RED behavior with different packet sizes RED behavior with different packet sizes Stefaan De Cnodder, Omar Elloumi *, Kenny Pauwels Traffic and Routing Technologies project Alcatel Corporate Research Center, Francis Wellesplein, 1-18 Antwerp,

More information

70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET

70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET 70 CHAPTER 1 COMPUTER NETWORKS AND THE INTERNET one of these packets arrives to a packet switch, what information in the packet does the switch use to determine the link onto which the packet is forwarded?

More information

Sustainable traffic growth in LTE network

Sustainable traffic growth in LTE network Sustainable traffic growth in LTE network Analysis of spectrum and carrier requirements Nokia white paper Sustainable traffic growth in LTE network Contents Executive summary 3 Introduction 4 Capacity

More information

FINAL Tuesday, 20 th May 2008

FINAL Tuesday, 20 th May 2008 Data Communication & Networks FINAL Exam (Spring 2008) Page 1 / 23 Data Communication & Networks Spring 2008 Semester FINAL Tuesday, 20 th May 2008 Total Time: 180 Minutes Total Marks: 100 Roll Number

More information

Performance Modeling and Evaluation of Web Systems with Proxy Caching

Performance Modeling and Evaluation of Web Systems with Proxy Caching Performance Modeling and Evaluation of Web Systems with Proxy Caching Yasuyuki FUJITA, Masayuki MURATA and Hideo MIYAHARA a a Department of Infomatics and Mathematical Science Graduate School of Engineering

More information

More on Testing and Large Scale Web Apps

More on Testing and Large Scale Web Apps More on Testing and Large Scale Web Apps Testing Functionality Tests - Unit tests: E.g. Mocha - Integration tests - End-to-end - E.g. Selenium - HTML CSS validation - forms and form validation - cookies

More information

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems

A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems A Privacy Preserving Model for Ownership Indexing in Distributed Storage Systems Tiejian Luo tjluo@ucas.ac.cn Zhu Wang wangzhubj@gmail.com Xiang Wang wangxiang11@mails.ucas.ac.cn ABSTRACT The indexing

More information

Today: World Wide Web! Traditional Web-Based Systems!

Today: World Wide Web! Traditional Web-Based Systems! Today: World Wide Web! WWW principles Case Study: web caching as an illustrative example Invalidate versus updates Push versus Pull Cooperation between replicas Lecture 22, page 1 Traditional Web-Based

More information

Cisco Virtualized Workload Mobility Introduction

Cisco Virtualized Workload Mobility Introduction CHAPTER 1 The ability to move workloads between physical locations within the virtualized Data Center (one or more physical Data Centers used to share IT assets and resources) has been a goal of progressive

More information

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator

Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Large-Scale Network Simulation Scalability and an FPGA-based Network Simulator Stanley Bak Abstract Network algorithms are deployed on large networks, and proper algorithm evaluation is necessary to avoid

More information

Last Class: Consistency Models. Today: Implementation Issues

Last Class: Consistency Models. Today: Implementation Issues Last Class: Consistency Models Need for replication Data-centric consistency Strict, linearizable, sequential, causal, FIFO Lecture 15, page 1 Today: Implementation Issues Replica placement Use web caching

More information

Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network

Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network Hierarchical Content Routing in Large-Scale Multimedia Content Delivery Network Jian Ni, Danny H. K. Tsang, Ivan S. H. Yeung, Xiaojun Hei Department of Electrical & Electronic Engineering Hong Kong University

More information

Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge of Applications and Network

Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge of Applications and Network International Journal of Information and Computer Science (IJICS) Volume 5, 2016 doi: 10.14355/ijics.2016.05.002 www.iji-cs.org Scheduling of Multiple Applications in Wireless Sensor Networks Using Knowledge

More information

Chapter 4 NETWORK HARDWARE

Chapter 4 NETWORK HARDWARE Chapter 4 NETWORK HARDWARE 1 Network Devices As Organizations grow, so do their networks Growth in number of users Geographical Growth Network Devices : Are products used to expand or connect networks.

More information

Network Load Balancing Methods: Experimental Comparisons and Improvement

Network Load Balancing Methods: Experimental Comparisons and Improvement Network Load Balancing Methods: Experimental Comparisons and Improvement Abstract Load balancing algorithms play critical roles in systems where the workload has to be distributed across multiple resources,

More information

Interdomain Routing Design for MobilityFirst

Interdomain Routing Design for MobilityFirst Interdomain Routing Design for MobilityFirst October 6, 2011 Z. Morley Mao, University of Michigan In collaboration with Mike Reiter s group 1 Interdomain routing design requirements Mobility support Network

More information

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup

Chapter 4. Routers with Tiny Buffers: Experiments. 4.1 Testbed experiments Setup Chapter 4 Routers with Tiny Buffers: Experiments This chapter describes two sets of experiments with tiny buffers in networks: one in a testbed and the other in a real network over the Internet2 1 backbone.

More information

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming

Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Loopback: Exploiting Collaborative Caches for Large-Scale Streaming Ewa Kusmierek Yingfei Dong David Du Poznan Supercomputing and Dept. of Electrical Engineering Dept. of Computer Science Networking Center

More information

Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems

Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems Análise e Modelagem de Desempenho de Sistemas de Computação: Component Level Performance Models of Computer Systems Virgilio ili A. F. Almeida 1 o Semestre de 2009 Introdução: Semana 5 Computer Science

More information

CS 421: COMPUTER NETWORKS SPRING FINAL May 21, minutes

CS 421: COMPUTER NETWORKS SPRING FINAL May 21, minutes CS 421: COMPUTER NETWORKS SPRING 2015 FINAL May 21, 2015 150 minutes Name: Student No: Show all your work very clearly. Partial credits will only be given if you carefully state your answer with a reasonable

More information

Testing & Assuring Mobile End User Experience Before Production Neotys

Testing & Assuring Mobile End User Experience Before Production Neotys Testing & Assuring Mobile End User Experience Before Production Neotys Henrik Rexed Agenda Introduction The challenges Best practices NeoLoad mobile capabilities Mobile devices are used more and more At

More information

There are 10 questions in total. Please write your SID on each page.

There are 10 questions in total. Please write your SID on each page. Name: SID: Department of EECS - University of California at Berkeley EECS122 - Introduction to Communication Networks - Spring 2005 to the Final: 5/20/2005 There are 10 questions in total. Please write

More information

CS244a: An Introduction to Computer Networks

CS244a: An Introduction to Computer Networks Do not write in this box MCQ 9: /10 10: /10 11: /20 12: /20 13: /20 14: /20 Total: Name: Student ID #: CS244a Winter 2003 Professor McKeown Campus/SITN-Local/SITN-Remote? CS244a: An Introduction to Computer

More information

A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks

A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks A Cluster-Based Energy Balancing Scheme in Heterogeneous Wireless Sensor Networks Jing Ai, Damla Turgut, and Ladislau Bölöni Networking and Mobile Computing Research Laboratory (NetMoC) Department of Electrical

More information

Simulation Studies of the Basic Packet Routing Problem

Simulation Studies of the Basic Packet Routing Problem Simulation Studies of the Basic Packet Routing Problem Author: Elena Sirén 48314u Supervisor: Pasi Lassila February 6, 2001 1 Abstract In this paper the simulation of a basic packet routing problem using

More information

Saumitra M. Das Himabindu Pucha Y. Charlie Hu. TR-ECE April 1, 2006

Saumitra M. Das Himabindu Pucha Y. Charlie Hu. TR-ECE April 1, 2006 Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Saumitra M. Das Himabindu Pucha Y. Charlie Hu TR-ECE-6-7 April 1, 26 School of Electrical and Computer Engineering

More information

Performance Modeling

Performance Modeling Performance Modeling EECS 489 Computer Networks http://www.eecs.umich.edu/~zmao/eecs489 Z. Morley Mao Tuesday Sept 14, 2004 Acknowledgement: Some slides taken from Kurose&Ross and Katz&Stoica 1 Administrivia

More information

Distributed File Systems Part II. Distributed File System Implementation

Distributed File Systems Part II. Distributed File System Implementation s Part II Daniel A. Menascé Implementation File Usage Patterns File System Structure Caching Replication Example: NFS 1 Implementation: File Usage Patterns Static Measurements: - distribution of file size,

More information

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 )

HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) HSM: A Hybrid Streaming Mechanism for Delay-tolerant Multimedia Applications Annanda Th. Rath 1 ), Saraswathi Krithivasan 2 ), Sridhar Iyer 3 ) Abstract Traditionally, Content Delivery Networks (CDNs)

More information

Congestion control in TCP

Congestion control in TCP Congestion control in TCP If the transport entities on many machines send too many packets into the network too quickly, the network will become congested, with performance degraded as packets are delayed

More information

CHAPTER 5 PROPAGATION DELAY

CHAPTER 5 PROPAGATION DELAY 98 CHAPTER 5 PROPAGATION DELAY Underwater wireless sensor networks deployed of sensor nodes with sensing, forwarding and processing abilities that operate in underwater. In this environment brought challenges,

More information

Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016

Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016 Networked Systems (SAMPLE QUESTIONS), COMPGZ01, May 2016 Answer TWO questions from Part ONE on the answer booklet containing lined writing paper, and answer ALL questions in Part TWO on the multiple-choice

More information

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks

RECHOKe: A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) < : A Scheme for Detection, Control and Punishment of Malicious Flows in IP Networks Visvasuresh Victor Govindaswamy,

More information

Computer Science 461 Midterm Exam March 14, :00-10:50am

Computer Science 461 Midterm Exam March 14, :00-10:50am NAME: Login name: Computer Science 461 Midterm Exam March 14, 2012 10:00-10:50am This test has seven (7) questions, each worth ten points. Put your name on every page, and write out and sign the Honor

More information

Design, Implementation and Evaluation of. Resource Management System for Internet Servers

Design, Implementation and Evaluation of. Resource Management System for Internet Servers Design, Implementation and Evaluation of Resource Management System for Internet Servers Kazuhiro Azuma, Takuya Okamoto, Go Hasegawa and Masayuki Murata Graduate School of Information Science and Technology,

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - B COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

PLEASE READ CAREFULLY BEFORE YOU START

PLEASE READ CAREFULLY BEFORE YOU START Page 1 of 20 MIDTERM EXAMINATION #1 - A COMPUTER NETWORKS : 03-60-367-01 U N I V E R S I T Y O F W I N D S O R S C H O O L O F C O M P U T E R S C I E N C E Fall 2008-75 minutes This examination document

More information

Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks

Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Purdue University Purdue e-pubs ECE Technical Reports Electrical and Computer Engineering 8-1-26 Mitigating the Gateway Bottleneck via Transparent Cooperative Caching in Wireless Mesh Networks Saumitra

More information

Design, Implementation and Evaluation of Resource Management System for Internet Servers

Design, Implementation and Evaluation of Resource Management System for Internet Servers Design, Implementation and Evaluation of Resource Management System for Internet Servers Paper ID: 193 Total number of pages: 14 Abstract A great deal of research has been devoted to solving the problem

More information

Routing Basics ISP/IXP Workshops

Routing Basics ISP/IXP Workshops Routing Basics ISP/IXP Workshops 1 Routing Concepts IPv4 Routing Forwarding Some definitions Policy options Routing Protocols 2 IPv4 Internet uses IPv4 addresses are 32 bits long range from 1.0.0.0 to

More information

Lecture 5: Performance Analysis I

Lecture 5: Performance Analysis I CS 6323 : Modeling and Inference Lecture 5: Performance Analysis I Prof. Gregory Provan Department of Computer Science University College Cork Slides: Based on M. Yin (Performability Analysis) Overview

More information

The Value of Peering. ISP/IXP Workshops. Last updated 23 rd March 2015

The Value of Peering. ISP/IXP Workshops. Last updated 23 rd March 2015 The Value of Peering ISP/IXP Workshops Last updated 23 rd March 2015 1 The Internet p Internet is made up of ISPs of all shapes and sizes n Some have local coverage (access providers) n Others can provide

More information

NETWORKING research has taken a multipronged approach

NETWORKING research has taken a multipronged approach IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 24, NO. 12, DECEMBER 2006 2313 Performance Preserving Topological Downscaling of Internet-Like Networks Fragkiskos Papadopoulos, Konstantinos Psounis,

More information

Democratizing Content Publication with Coral

Democratizing Content Publication with Coral Democratizing Content Publication with Mike Freedman Eric Freudenthal David Mazières New York University NSDI 2004 A problem Feb 3: Google linked banner to julia fractals Users clicking directed to Australian

More information

HTRC Data API Performance Study

HTRC Data API Performance Study HTRC Data API Performance Study Yiming Sun, Beth Plale, Jiaan Zeng Amazon Indiana University Bloomington {plale, jiaazeng}@cs.indiana.edu Abstract HathiTrust Research Center (HTRC) allows users to access

More information