Shared Running Buffer Based Proxy Caching of Streaming Sessions


Shared Running Buffer Based Proxy Caching of Streaming Sessions
Songqing Chen, Bo Shen, Yong Yan, Sujoy Basu
Mobile and Media Systems Laboratory, HP Laboratories Palo Alto
HPL-2003-47, March 10th, 2003*
E-mail: sqchen@cs.wm.edu, {boshen, yongyan, basus}@hpl.hp.com
Keywords: shared running buffer, proxy caching, patching, streaming media delivery, VOD
* Internal Accession Date Only. Approved for External Publication.
Department of Computer Science, College of William and Mary, Williamsburg, VA 23188
Copyright Hewlett-Packard Company 2003

Shared Running Buffer Based Proxy Caching of Streaming Sessions

Songqing Chen*, Bo Shen, Yong Yan and Sujoy Basu
Department of Computer Science, College of William and Mary, Williamsburg, VA 23188; sqchen@cs.wm.edu
Mobile and Media Systems Lab, 1501 Page Mill Road, Palo Alto, CA 94304; {boshen, yongyan, basus}@hpl.hp.com

Abstract

With the falling price of memory, an increasing number of multimedia servers and proxies are now equipped with a large memory space. Caching media objects in the memory of a proxy helps to reduce the network traffic, the disk I/O bandwidth requirement, and the data delivery latency. The running buffer approach and its alternatives are representative techniques for caching streaming data in memory. There are two limits in the existing techniques. First, although multiple running buffers for the same media object co-exist in a given processing period, data sharing among multiple buffers is not considered. Second, user access patterns are not insightfully considered in the buffer management. In this paper, we propose two techniques based on shared running buffers (SRB) in the proxy to address these limits. Considering user access patterns and characteristics of the requested media objects, our techniques adaptively allocate memory buffers to fully utilize the currently buffered data of streaming sessions, with the aim of reducing both the server load and the network traffic. Experimentally comparing with several existing techniques, we show that the proposed techniques achieve significant performance improvement by effectively using the shared running buffers.

Key Words: Shared Running Buffer, Proxy Caching, Patching, Streaming Media, VOD.

1 Introduction

The building block of a content delivery network is a server-proxy-client system. In this system, the server delivers the content to the client through a proxy. The proxy can choose to cache the object so that subsequent requests to the same object can be served directly from the proxy without contacting the server. Proxy caching strategies have therefore been the focus of many studies.
Much work has been done in caching static web content to reduce network load and end-to-end latency. Typical examples of such work include CERN httpd [6], Harvest [4] and Squid [4]. The caching of streaming media content presents a different set of challenges: (i) The size of a streaming media object is usually orders of magnitude larger than traditional web contents. For example, a two-hour-long MPEG video requires about 1.4 GB of disk space, while traditional web objects are on the order of kilobytes; (ii) The demand for continuous and timely delivery of a streaming media object is more rigorous than that of text-based web pages. Therefore, a lot of resources have to be reserved for delivering the streaming media data to a client. In practice, even a relatively small number of clients can overload a media server, creating bottlenecks by demanding high disk bandwidth on the server and high network bandwidth to the clients. To address these challenges, researchers have proposed different methods to cache streaming media objects via partial caching, patching or proxy buffering. In the partial caching approach, either a prefix [8] or segments [2] of a media object instead of the whole object is/are cached. Therefore, less storage space is required. For on-going streaming sessions, patching can be used so that later sessions for the same object can be served simultaneously. For proxy buffering, either a fixed-size running buffer [3] or an interval [9] can be used to buffer a running window of an on-going streaming session in the memory of the proxy. Among these techniques, partial caching uses the disk resource on the proxy; patching uses the storage resource on the client side, and theoretically no memory resource is required at the proxy; proxy buffering uses the memory resource on the proxy. However, neither running buffer caching nor interval caching uses the memory resource to the full extent. More detailed analysis of these techniques can be found in Section 2.

In this paper, we first propose a new memory-based caching algorithm for streaming media objects using Shared Running Buffers (SRB). In this algorithm, the memory space on the proxy is allocated adaptively based on the user access pattern and the requested media objects themselves. Streaming sessions are cached in running buffers.

* Work performed as an intern at HP Labs, summer 2002.
This algorithm dynamically caches media objects in the memory while delivering the data to the client, so that the bottleneck of the disk and/or network I/O is relieved. More importantly, similar sessions can share different runs of the on-going sessions. This approach is especially useful when requests to streaming objects are highly temporally localized. The SRB algorithm (i) adaptively allocates memory space according to the user access pattern; (ii) enables maximal sharing of the cached data in memory; (iii) optimally reclaims memory space when requests terminate; (iv) applies a near-optimal replacement policy in real time. Based on the SRB media caching algorithm, we further propose an efficient media delivery algorithm, the Patching SRB (PSRB), which further improves the performance of the media data delivery without the necessity of caching.

Simulations are conducted based on synthetic workloads of web media objects with mixed lengths as well as workloads with accesses to lengthy media objects in the VOD environment. In addition, we use an access log of a media server in a real enterprise intranet to further conduct simulations. The simulation results indicate that the performance of our algorithms is superior to previous solutions.

The rest of the paper is organized as follows. In Section 2, the related work is surveyed. Section 3 describes the optimal memory-based caching algorithms we propose. To test the performance of the proposed algorithms, we use synthetic workloads as well as a real workload from an enterprise media server. Some statistical analysis of these workloads is provided in Section 4. Performance evaluations are conducted in Section 5. We then make concluding remarks in Section 6.

2 Related Work

In this section, we survey previous work related to the caching of streaming media content. Three types of methods are investigated. First, we briefly discuss streaming media caching methods that use the storage resource on the proxy. These methods cache media objects for streaming sessions that are not overlapped in time. The cached data persists in the cache even after session termination. Usually, a long-term storage resource such as disk storage is used. Second, we investigate caching methods that use the storage resource on the client device. These methods share streaming sessions that are overlapped in time. However, the proxy does not buffer data for any session. Lastly, we study caching methods that use the storage resource on the proxy to buffer portions of media objects for streaming sessions that are overlapped in time. Note that the buffered data persists in the cache only while the sessions are alive. Therefore, a short-term storage resource such as RAM is used.

2.1 Partial Caching

Storing the entire media object in the proxy may be inefficient if mostly small portions of very large media objects are accessed. This is particularly true if the cached streaming media object is not popular. The first intuition is to cache portions of the media object. These partial caching systems always use the storage on the proxy. Some early work on the storage for media objects can be found in [2, 9]. Two typical types of partial caching have been investigated. Prefix caching [8] stores only the first part (prefix) of the popular media object. When a client requests a media stream whose prefix is cached, the proxy delivers the prefix to the client while it requests the remainder of the object from the origin server. By caching a (large enough) prefix, the start-up latency perceived by the client is reduced. The challenge lies in the determination of the prefix size. Items such as round-trip delay, server-to-proxy latency, video-specific parameters (e.g., size, bit rate, etc.), and the retransmission rate of lost packets can be considered in calculating the appropriate prefix length. Alternatively, media objects can be cached in segments. This is particularly useful when clients only view portions of the objects. The segments in which clients are not interested will not be cached. Wu et al. [2] use segments with exponentially incremental sizes to model the fact that clients usually start viewing a streaming media object from the beginning and are more and more likely to terminate the session toward the end of the object. A combination of fixed-length and exponential-length segment-based caching is considered in the RCache and Silo project []. Lee et al. [] use a context-aware segmentation so that segments of user interest are cached.

2.2 Session Sharing

The delivery of a streaming media object takes time to complete. We call this delivery process a streaming session. Sharing is possible among sessions that overlap. To this end, various kinds of patching schemes have been proposed for sharing along the time line. This type of work is typically seen in research related to video on demand (VOD) systems. Patching algorithms [3] use storage on the client device so that a client can listen to multiple channels. Greedy patching always patches to the existing full streaming session, while grace patching restarts a new full streaming session at some appropriate point in time. Optimal patching [7] further assumes sufficient storage resource at the client side to receive as much data as possible while listening to as many channels as possible. Figure 1 shows the basic ideas of the patching techniques through examples. In these figures, media position indicates the time offset at which the media object is delivered to the client.

Figure 1: Left to right, (a) Greedy Patching, (b) Grace Patching, (c) Optimal Patching.

For illustration purposes, we assume instantaneous delivery of media content at a constant rate equal to the presentation rate of the media object. That is, if a request is served immediately, the client receives n time units of data after n time units since the request is initiated. The solid lines represent a streaming session between the server and the proxy. The goal of these algorithms is to reduce the server-to-proxy traffic. A greedy patching example is shown in Figure 1(a). A full server-to-proxy session is established at the first request arrival for a certain media object. All subsequent requests for the same media object are served as patches to this full session before it terminates, at which point another full session is established for the next request arrival. It is obvious that the patching session can be excessively long, as shown for the last two requests in Figure 1(a). It is more efficient to start a new session even before the previous full session terminates. The grace patching example shown in Figure 1(b) indicates that upon the arrival of the fifth request, a new full server-to-proxy session is started. Therefore, the patching session of the last request is significantly shorter than in the greedy patching example. The optimal point in time to start a new full session can be derived mathematically given a certain request arrival pattern [2]. The optimal patching scheme is introduced in [7]. The basic idea is that later sessions patch to as many on-going sessions as possible. The on-going sessions can be full sessions or patching sessions. Figure 1(c) shows an example scenario. The last request patches to the patching session started for the second-last request as well as the full session started for the first request. This approach achieves the maximum server-to-proxy traffic reduction.
On the other hand, it requires extra resources in client storage and proxy-to-client bandwidth, as well as complex scheduling at the proxy. Note that no storage resource is required on the proxy for these session-sharing algorithms. At the same time, since the clients have to listen to multiple channels and store content before its presentation time, client-side storage is necessary. One special type of the session-sharing approaches is the batching approach [, ]. In this approach, requests are grouped and served simultaneously via multicast. Therefore, requests arriving earlier have to wait. Hence, a certain delay is introduced to the early-arriving requests.
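To make the contrast between greedy and grace patching concrete, the server-to-proxy cost of each can be sketched as below. This is an illustrative model only: the unit-rate streams, instantaneous delivery, and the `threshold` parameter used to decide when grace patching restarts a full stream are assumptions of this sketch, not details taken from the schemes' original papers.

```python
def greedy_cost(arrivals, length):
    """Total server-to-proxy data units under greedy patching.

    A full session starts at the first arrival; each later request
    patches the prefix it missed until that full session ends, at
    which point the next arrival starts a new full session.
    """
    total = 0.0
    full_start, full_end = None, float("-inf")
    for t in sorted(arrivals):
        if t >= full_end:            # previous full session is over
            full_start, full_end = t, t + length
            total += length          # a new full stream
        else:
            total += t - full_start  # patch of the missed prefix
    return total


def grace_cost(arrivals, length, threshold):
    """Grace patching: restart a full session once the patch would
    exceed `threshold`, even if the old full session is still running."""
    total = 0.0
    full_start, full_end = None, float("-inf")
    for t in sorted(arrivals):
        if t >= full_end or t - full_start > threshold:
            full_start, full_end = t, t + length
            total += length
        else:
            total += t - full_start
    return total
```

With arrivals clustered at the start of a session plus stragglers near its end, grace patching restarts a full stream early, so the stragglers need only short patches, and its total cost drops below greedy patching's.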

2.3 Proxy Buffering

To further improve the caching performance for streaming media, memory resources on the proxy are used. This is more and more practical given that the price of memory keeps falling. For memory-based caching, running buffer and interval caching methods have been studied.

Figure 2: (a) Running Buffer, (b) Interval Caching.

Dynamic caching [3] uses a fixed-size running buffer to cache a running window of a streaming session. It works as follows. When a request arrives, a buffer of a predetermined length is allocated to cache the media data the proxy fetches from the server, in the expectation that closely following requests will use the data in the memory instead of fetching it from other sources (e.g., the disk, the origin server, or other cooperative caches). Figure 2(a) shows an example of running buffer caching. A fixed-size buffer is allocated upon the arrival of request R1. The buffer is filled by the session started by R1 and runs to the end of the stream (the parallel vertical lines indicate the buffer fullness and the location of the buffered data relative to the beginning of the media object). Subsequent requests R2, R3 and R4 are served from the buffer since they arrive within the time period covered by the buffer. Request R5 arrives beyond the range covered by the first buffer, hence a second buffer of the same predetermined size is allocated. Subsequently, R6 is served from the second buffer. In a different approach, interval caching [8, 9] further considers request arrival patterns so that the memory resource is used more efficiently. Interval caching considers closely arrived requests for the same media object as a pair and orders their arrival intervals globally. The subsequent allocation of memory always favors smaller intervals. Effectively, more requests are served given a fixed amount of memory. Figure 2(b) shows an example of the interval caching method.
Upon the arrival of request R2, an interval is formed, and a buffer of the size equivalent to the interval is allocated. The buffer is filled by the session started by R1 from this point on. The session initiated by R2 only needs to receive the first part of the data from the server (the solid line starting from R2); it can receive the rest of the data from the buffer. The same applies when R3 and R4 arrive. The situation changes upon the arrival of R5. Since the interval between R4 and R5 is smaller than that between R3 and R4, the buffer initially allocated for the interval of R3 and R4 (not filled up yet) is released, and the space is re-allocated for the new interval of R4 and R5. Now the session for R4 has to run to the end of the stream. Note that these two approaches use the memory resource on the proxy, so that the client is relieved from the buffer requirement as in the patching schemes discussed in the previous section. In addition, one proxy-to-client channel suffices for these buffering schemes.
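The global ordering that interval caching performs can be sketched as follows. The representation used here (one buffer per interval, buffer size equal to the interval, unit-rate streams, no re-allocation dynamics) is a simplifying assumption of this sketch, not the full scheme.

```python
def allocate_intervals(arrivals_by_object, memory):
    """Order consecutive-request intervals globally (smallest first)
    and allocate one buffer per interval until memory runs out.

    Returns the list of (object, interval_start, interval_size)
    that received a buffer.
    """
    intervals = []
    for obj, times in arrivals_by_object.items():
        times = sorted(times)
        # every pair of consecutive arrivals forms one interval
        for a, b in zip(times, times[1:]):
            intervals.append((b - a, obj, a))
    intervals.sort()                 # smaller intervals are favored
    chosen = []
    for size, obj, start in intervals:
        if size <= memory:
            memory -= size
            chosen.append((obj, start, size))
    return chosen
```

For example, with arrivals {R1=0, R2=2, R3=3} for object x and {R1=0, R2=5} for object y and 4 units of memory, the 1-unit and 2-unit intervals of x are allocated and the 5-unit interval of y is not.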

3 Shared Running Buffer (SRB) Media Caching Algorithm

Buffering streaming content in memory has been shown to have great potential to alleviate contention on the streaming server, so that a larger number of sessions can be served. It has also been shown that the two existing memory caching approaches, running buffer caching [3] and interval caching [8, 9], do not make effective use of the limited memory resource. Motivated by the limits of the current memory buffering approaches, we design a Shared Running Buffer (SRB) based caching algorithm for streaming media to better utilize memory. In this section, with the introduction of several new concepts, we first describe the basic SRB media caching algorithm in detail. Then, we present an extension to the SRB: the Patching SRB (PSRB).

3.1 Related Definitions

The algorithm first considers buffer allocation in a time span T starting from the first request. We denote R_i^j as the j-th request to media object i, and T_i^j as the arrival time of this request. Assume that there are n request arrivals within the time span T and R_i^n is the last request arrived in T. For the convenience of representation without losing precision, T_i^1 is normalized to 0 and T_i^j (where 1 < j ≤ n) is a time relative to T_i^1. Based on the above, the following concepts are defined to capture the characteristics of the user request pattern.

1. Interval Series: An interval is defined as the difference in time between two consecutive request arrivals. We denote I_i^k as the k-th interval for object i. An Interval Series consists of a group of intervals. Within the time T, if n = 1, the interval I_i^1 is defined as ∞; otherwise, it is defined as:

    I_i^k = T_i^{k+1} - T_i^k,  if 1 ≤ k < n;
    I_i^k = T - T_i^n,          if k = n.        (1)

Since I_i^n represents the time interval between the last request arrival and the end of the investigated time span, it is also called the Waiting Time.

2. Average Request Arrival Interval (ARAI): The ARAI is defined as (Σ_{k=1}^{n-1} I_i^k) / (n - 1) when n > 1. The ARAI does not exist when n = 1, since that indicates only one request arrival within the time span T, and thus we set it to ∞.
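The two definitions above translate directly into code. This is a small sketch: the function names are mine, and arrival times are assumed to be non-empty and to lie in the normalized span [0, T] with T = 1.

```python
import math

def interval_series(arrival_times, span=1.0):
    """Interval series of Eq. (1): I^k = T^{k+1} - T^k for 1 <= k < n,
    and I^n = span - T^n (the Waiting Time). With a single arrival
    (n = 1), the interval is defined as infinity."""
    t = sorted(arrival_times)
    n = len(t)
    if n == 1:
        return [math.inf]
    return [t[k + 1] - t[k] for k in range(n - 1)] + [span - t[-1]]

def arai(arrival_times, span=1.0):
    """Average Request Arrival Interval: the mean of the first n - 1
    intervals (the Waiting Time is excluded); infinity when there is
    only one arrival."""
    series = interval_series(arrival_times, span)
    if len(series) == 1:
        return math.inf
    return sum(series[:-1]) / (len(series) - 1)
```

For arrivals at 0, 0.2 and 0.5 in a unit span, the series is [0.2, 0.3, 0.5] (the last entry being the Waiting Time) and the ARAI is 0.25.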
For the buffer management, three buffer states and three timing concepts are defined respectively as follows.

1. Construction State & Start-Time: When an initial buffer is allocated upon the arrival of a request, the buffer is filled while the request is being served, in the expectation that the data cached in the buffer can serve closely following requests for the same object. The size of the buffer may be adjusted to cache less or more data before it is frozen. Before the buffer size is frozen, the buffer is in the Construction State. The Start-Time S_i^j of a buffer B_i^j, the j-th buffer allocated for object i, is defined as the arrival time of the last request before the buffer size is frozen. The requests arriving in a buffer's Construction State are called the resident requests of this buffer, and the buffer is called the resident buffer of these requests.

Note that if no buffer exists for a requested object, a first buffer with superscript j = 1 is allocated. Subsequent buffer allocations use monotonically increasing j's even if the immediately preceding buffer has been released. Therefore, only after all buffers of the object run to the end is it possible to reset j and start from 1 again.

2. Running State & Running-Distance: After the buffer freezes its size, it serves as a running window of a streaming session and moves along with the streaming session. Therefore, the state of the buffer is called the Running State. The Running-Distance of a buffer is defined as the distance in time between the start-time of a running buffer and the start-time of its preceding running buffer. We use D_i^j to denote the Running-Distance of B_i^j. Note that for the first buffer allocated to an object, D_i^1 is equal to the length of object i: L_i, assuming a complete viewing scenario. Since we are encouraging sharing among buffers, clients served from B_i^j are also served from any preceding buffers that are still in the running state. This requires that the running-distance of B_i^j equal the time difference with the closest preceding buffer in the running state. Mathematically, we have:

    D_i^j = L_i,            if j = 1;
    D_i^j = S_i^j - S_i^m,  if j > 1,        (2)

where m < j and S_i^m is the start time of the closest preceding buffer in the running state.

3. Idle State & End-Time: When the running window reaches the end of the streaming session, the buffer enters the Idle State, which is a transient state that allows the buffer to be reclaimed. The End-Time of a buffer is defined as the time when the buffer enters the idle state and is ready to be reclaimed. The End-Time of the buffer B_i^j, denoted as E_i^j, is defined as:

    E_i^j = S_i^j + L_i,                              if j = 1;
    E_i^j = min(S_i^latest + D_i^j, S_i^j + L_i),     if j > 1.        (3)

S_i^latest denotes the start time of the latest running buffer for object i. Here, E_i^j is dynamically updated upon the forming of new running buffers. The detailed updating procedure of E_i^j is described in the following section.
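Eqs. (2) and (3) can be checked with a small sketch. Each buffer is represented only by its start time, and the closest preceding running buffer is assumed to be the immediately preceding one in the list (an assumption of this sketch; in general, buffer m is the closest preceding buffer still in the running state).

```python
def running_distance(starts, j, length):
    """Eq. (2): D^1 = L; D^j = S^j - S^m for j > 1, where S^m is the
    start time of the closest preceding running buffer.  `starts`
    holds S^1..S^j for one object's buffers (j is 1-based)."""
    if j == 1:
        return length
    return starts[j - 1] - starts[j - 2]

def end_time(starts, j, length):
    """Eq. (3): E^1 = S^1 + L; for j > 1,
    E^j = min(S_latest + D^j, S^j + L), re-evaluated whenever a new
    running buffer (a new latest start time) is formed."""
    s_j = starts[j - 1]
    if j == 1:
        return s_j + length
    d_j = running_distance(starts, j, length)
    return min(starts[-1] + d_j, s_j + length)
```

With starts S^1 = 0, S^2 = 4, S^3 = 6 and L = 20, buffer 2's end-time is S^3 + D^2 = 6 + 4 = 10, illustrating how the arrival of B^3 extends E^2, as in Figure 5.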
3.2 Shared Running Buffer (SRB) Algorithm

For an incoming request to object i, the SRB algorithm works as follows:

1) If the latest running buffer of the object is caching the prefix of the object, the request is served directly from all the existing running buffers of the object.
2) Otherwise,
   (a) If there is enough memory, a new running buffer of a predetermined size T is allocated. The request is served from the new running buffer and all existing running buffers of the object.
   (b) If there is not enough memory, the SRB buffer replacement algorithm (see Section 3.2.3) is invoked to either re-allocate an existing running buffer to the request or serve this request without caching.
3) Update the End-Times of all existing buffers of the object based on Eq. (3).

During the process of the SRB algorithm, parts of a running buffer can be dynamically reclaimed, as described in Section 3.2.2, due to request terminations, and the buffer is dynamically managed based on the user access pattern through a lifecycle of three states, as described in Section 3.2.1.
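The three steps reduce to a simple dispatcher. The buffer representation and the returned action labels below are placeholders of this sketch, not the paper's interfaces; step 3 (updating the End-Times via Eq. (3)) would follow whichever branch is taken.

```python
def handle_request(buffers, mem_free, init_size):
    """Decide how to serve a new request for one object, following
    the three steps of the SRB algorithm.  `buffers` is the object's
    list of running buffers, e.g. dicts like {'caching_prefix': bool};
    `init_size` is the predetermined initial buffer size T."""
    if buffers and buffers[-1]['caching_prefix']:
        # 1) the latest buffer still holds the prefix:
        #    serve from all existing running buffers
        return 'serve_from_existing'
    if mem_free >= init_size:
        # 2a) allocate a new running buffer of size T and serve from
        #     it plus all existing running buffers
        return 'allocate_new_buffer'
    # 2b) not enough memory: invoke SRB replacement (Section 3.2.3)
    return 'invoke_replacement'
```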

Figure 3: SRB Memory Allocation: the Initial Buffer Freezes its Size. (a), (b), (c).

3.2.1 SRB Buffer Lifecycle Management

Initially, a running buffer is allocated with a predetermined size of T. Starting from the Construction State, the buffer then adjusts its size by going through a three-state lifecycle management process as described in the following.

• Case 1: the buffer is in the Construction State. The proxy makes a decision at the end of T as follows.

If ARAI = ∞, which indicates that there is only one request arrival so far, the initial buffer enters the Idle State (case 3) immediately. For this request, the proxy acts as a bypass server, i.e., content is passed to the client without caching. This scheme gives preference to more frequently requested objects in the memory allocation. Figure 3(a) illustrates this situation. The shadow indicates the allocated initial buffer, which is reclaimed at T.

If I_i^n > ARAI (I_i^n is the waiting time), the initial buffer is shrunk to the extent that the most recent request can still be served from the buffer. Subsequently, the buffer enters the Running State (case 2). This running buffer then serves as a shifting window and runs to the end. Figure 3(b) illustrates an example. Part of the initial buffer is reclaimed at the end of T. This scheme performs well for periodically arriving request groups.

If I_i^n ≤ ARAI, the initial buffer maintains the Construction State and continues to grow to the length of T', where T' = T - I_i^n + ARAI, expecting that a new request will arrive very soon. At T', the ARAI and I_i^n are recalculated and the above procedure repeats. Eventually, when requests to the object become less frequent, the buffer freezes its size and enters the Running State (case 2). In the extreme case, the full length of the media object is cached in the buffer. In this case, the buffer also freezes and enters the running state (a static running).
For the most frequently accessed objects, this scheme ensures that requests to these objects are served from the proxy directly. Figure 3(c) illustrates this situation. The initial buffer has been extended beyond the size of T for the first time. The buffer expansion is bounded by the available memory in the proxy. When the available memory is exhausted, the buffer freezes its size and enters the running state regardless of future request arrivals.
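The end-of-T decision for a buffer in the Construction State can be summarized as follows. The action labels are descriptive names of this sketch; the growth rule T' = T - I^n + ARAI follows the text above.

```python
import math

def construction_decision(span, waiting, arai_value):
    """Decision at the end of the (possibly extended) span T for a
    buffer in the Construction State, given the Waiting Time I^n and
    the ARAI.  Returns the action and, when extending, the new span T'."""
    if math.isinf(arai_value):
        # only one arrival so far: release the buffer, serve the
        # request in bypass mode (no caching)
        return ('release', None)
    if waiting > arai_value:
        # arrivals have become sparse: shrink so the most recent
        # request is still covered, freeze, enter the Running State
        return ('shrink_and_run', None)
    # a new request is expected soon: keep constructing and grow the
    # buffer to T' = T - I^n + ARAI, then re-evaluate at T'
    return ('extend', span - waiting + arai_value)
```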

• Case 2: the buffer is in the Running State. After a buffer enters the running state, it starts running away from the beginning of the media object, and subsequent requests can no longer be served completely from the running buffer. In this case, a new buffer of an initial size T is allocated and goes through its own lifecycle starting from case 1. Subsequent requests are served from the new buffer as well as its preceding running buffers. Figure 4 illustrates the maximal data sharing in the SRB algorithm. The requests R^{k+1} to R^n are served simultaneously from B^1 and B^2. Late requests are served from all existing running buffers. Note that except for the first buffer, the other buffers do not run to the end of the object.

Figure 4: Multiple Running Buffers.

The Running-Distance and End-Time are determined based on Eq. (2) and Eq. (3), respectively, for any buffer entering the Running State. In addition, the End-Times of its preceding running buffers need to be modified according to the arrival time of the latest request, as shown in Eq. (3). Figure 5 shows an example scenario for the updating of End-Times. Note that E^2 is updated when B^3 is started, at which time the running distance of B^2 is extended (dashed line). When a buffer runs to its End-Time, it enters the Idle State (case 3).

• Case 3: the buffer is in the Idle State. When a buffer enters the Idle State, it is ready for reclamation.

In the above algorithm, the time span T (which is the initial buffer size) is determined based on the object length. Typically, a Scale factor (e.g., 1/2 to 1/32) of the original length is used. To prevent an extremely large or small buffer size, the buffer size is bounded by an upper bound High-Bound and a lower bound Low-Bound. These bounds are dependent on the streaming rate, to allow the initial buffer to cache a reasonable portion (e.g., 1 minute to 10 minutes) of media objects.
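The initial-size rule amounts to a clamp. The Scale and bound values used below are illustrative, not the paper's settings:

```python
def initial_buffer_seconds(object_len, scale, low_bound, high_bound):
    """Initial buffer span T: a Scale fraction of the object length,
    clamped to [Low-Bound, High-Bound].  The bounds would be chosen
    per streaming rate so the buffer covers a reasonable portion
    (on the order of minutes) of the object."""
    return min(max(object_len * scale, low_bound), high_bound)
```

For a two-hour (7200 s) object with Scale = 1/16 and bounds of 60 s and 600 s, T is 450 s; short objects hit the lower bound and very long ones the upper bound.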
The algorithm requires that the client be able to listen to multiple channels at the same time: once a request is posted, the client should be able to receive data from all the ongoing running buffers of that object simultaneously.
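As a concrete illustration of this multi-channel requirement, the sketch below (our own simplification, not the paper's formulation; `buffer_windows` and the look-ahead rule are assumptions) determines which running buffers a client must listen to: every buffer whose window covers the client's playback point, plus every buffer holding data ahead of it, which the client stores locally until needed.

```python
# Sketch: each running buffer B_i currently holds the media window
# [low_i, high_i) in playback seconds. A client at `playback_pos` listens to
# every buffer covering its playback point, plus every buffer holding data
# ahead of it (buffered locally until presentation time).

def channels_for(playback_pos, buffer_windows):
    """Return indices of the running buffers the client listens to."""
    return [i for i, (low, high) in enumerate(buffer_windows)
            if low <= playback_pos < high or playback_pos < low]
```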

Figure 5: Example of Updating the End-Time

3.2.2 SRB Dynamic Reclamation

Memory reclamation of a running buffer is triggered by two different types of session terminations: complete session termination and premature session termination.

In complete session termination, a session terminates only when the delivery of the whole media object is completed. Assuming that R_1 is the first request being served by a running buffer, when R_1 reaches the end of the media object, the resident buffer of R_1 is reclaimed as follows.

• If the resident buffer is the only buffer running for the media object, the resident buffer enters the Idle State. In this state, the buffer maintains its content until all the resident requests reach the end of the session, at which time the buffer is released.

• If the resident buffer is not the only running buffer, that is, there are succeeding running buffers, the buffer enters the Idle State and maintains its content until its End-Time. Note that the End-Time may have been updated by succeeding running buffers.

Premature session termination is more complicated. In this case, a request that arrives later may terminate earlier. Consider a group of consecutive requests R_1 to R_n, where the session for one of the requests, say R_j (1 < j < n), terminates before everyone else. The situation is handled as follows.

• If R_j is served from the middle of its resident buffer, that is, there are preceding and succeeding requests served from the same running buffer, the resident buffer maintains its current state and R_j is deleted from all its associated running buffers. Figure 6(a) and (a') show the buffer situation before and after R_j is terminated, respectively.

• If R_j is served from the head of its resident buffer as shown in Figure 6(b), the request is deleted from all of its associated running buffers. The resident buffer enters the Idle State for

a time period of I. During this time period, the content within the buffer is moved from R_j+1 to the head. At the end of the time period I, the buffer space from the tail to the last served request is released and the buffer enters the Running State again, as shown in Figure 6(b').

• If R_j is served at the tail of a running buffer, two scenarios are further considered. After deleting R_j from the request list of its resident buffer, if the request list is not empty, then nothing further is done. Alternatively, the algorithm can choose to shrink the buffer to the extent that R_j-1 is still served from it, assuming R_j-1 is a resident request of the same buffer; in this case, the End-Time of the succeeding running buffers needs to be adjusted. If R_j is at the tail of the last running buffer, as shown in Figure 6(c), the buffer is shrunk to the extent that R_j-1 is the last request served from the buffer, and R_j is deleted from the request list. Subsequently, the buffers run as shown in Figure 6(c').

Figure 6: SRB Memory Reclamation: Different Situations of Session Termination

3.2.3 SRB Buffer Replacement Policies

The replacement policy is important because the available memory is still scarce compared to the size of video objects, so using the limited resource efficiently is critical to achieving the best performance gain. In this section, we propose popularity-based replacement policies for the SRB media caching algorithm. The basic idea is described as follows:

• When a request arrives while there is no available memory, all the objects that have ongoing streams in memory are ordered according to their popularities calculated over a certain past time period. If the object being demanded has a higher popularity than the least popular object in memory, then the latest running buffer of the least popular object is released, and the space is re-allocated to the new request.

Those requests without running buffers do not buffer their data at all; in this case, theoretically, they are assumed to have no memory consumption. Alternatively, the system can choose to start a memoryless session in which the proxy bypasses the content to the client without caching. This is called the non-replacement policy. We evaluate the performance of both policies by simulation in a later section; it turns out that the straightforward non-replacement policy may achieve similar performance given a long enough system running time.

3.3 Patching SRB (PSRB) Media Delivering Algorithm

Since the proxy has a finite amount of memory, it is possible that the proxy serves concurrent sessions as a bypass server without caching. The SRB algorithm prohibits the sharing of such sessions, which may lead to excessive server accesses if there are intensive request arrivals for many different objects. To solve this problem, the SRB algorithm can be extended to a patching SRB (PSRB) algorithm that enables the sharing of such bypass sessions. This is related to the session sharing algorithms discussed in Section 2.2. It is important to note that the PSRB scheme makes the memory-based SRB algorithm work with the memoryless patching algorithm.

Figure 7: PSRB Caching Example

Figure 7 illustrates a PSRB scenario. The first running buffer B_1 has been formed for requests R_1 to R_5. No buffer is running for R_6 since it does not have a close neighboring request. However, a patching session has been started to retrieve from the content server the prefix absent in B_1.
At this time, request R_6 is served from both the patching session and B_1 until the missing prefix is patched. Then, R_6 is served from B_1 only (the solid line for R_6 stops in the figure). When R_7 and R_8 arrive and form the second running buffer B_2, they are served from B_1 and B_2

as described in the SRB algorithm. In addition, they are also served from the patching session initiated for R_6. Note that the patching session for R_6 is transient; we can think of it as a running buffer session with zero buffer size. As is evident from Figure 7, the filling of B_2 does not cause server traffic between positions a and b (no solid line between a and b), since B_2 is filled from the patching session for R_6. Thus, sharing the patching session further reduces the number of server accesses for R_7 and R_8. By using more client-side storage, PSRB tries to maximize the data sharing among concurrent sessions in order to minimize the server-to-proxy traffic.

4 Workload Analysis

To evaluate the performance of the proposed algorithms and to compare them with prior solutions, we conduct simulations based on several workloads. Both synthetic workloads and a real workload extracted from enterprise media server logs are used.

We design three synthetic workloads. The first simulates accesses to media objects in the Web environment, in which the length of the videos varies from short to long. The second simulates video access in a video-on-demand (VOD) environment, in which only longer streams are served. Both workloads assume complete-viewing client sessions. We use WEB and VOD as the names of these workloads. These workloads assume a Zipf-like distribution for the popularity of the media objects: p_i = f_i / Σ_{j=1..N} f_j, where f_i = 1/i^α. They also assume request inter-arrivals to follow a Poisson distribution: p(x; λ) = e^{-λ} λ^x / x!, for x = 0, 1, 2, ...

In the case of video access in the Web environment, client accesses to videos may be incomplete, that is, a session may terminate before the whole media object is delivered. We simulate this scenario by designing a partial-viewing workload based on the WEB workload. In this workload, called PARTIAL, 80% of the sessions terminate before 20% of the accessed object has been delivered.

For the real workload, we obtain logs from HP Corporate Media Solutions, covering a ten-day period in April 2001.
During these ten days, there were two servers running Windows Media Server™, serving content to clients around the world within the HP intranet. The content consists of videos covering keynote speeches at various corporate and industry events, messages from the company's management, product announcements, training videos, product development courses for employees, etc. This workload is called REAL. A detailed analysis of the overall characteristics of logs from the same servers covering different time periods can be found in [6].

4.1 Workload Characteristics

Table 1 lists some statistics of the four workloads. For the WEB workload, there is a total of 4 unique media objects stored on the server. The length of the media objects ranges from 2 minutes to 2 hours. The request inter-arrivals follow a Poisson distribution with λ = 4 seconds, and the Zipf-like distribution of object popularities uses α = 0.47 (according to [7]). The media objects are coded and streamed at 26 Kbps. The total number of requests is 88. The simulation covers roughly 24 hours. The Low-Bound and High-Bound for the initial buffer size are set to 2 MB and 6 MB, respectively.

For the PARTIAL workload, 80% of the requests view only the first 20% of the requested streaming media objects and then prematurely terminate. Both the partial-viewing requests and the partially-viewed objects are randomly distributed. Other parameters of the synthesized trace are identical to those

of WEB, as shown in Table 1.

Table 1: Workload Statistics

Workload   Number of   Number of   Total size   Object length   Poisson   Zipf    Simulation
name       requests    objects     (GB)         (min)           λ         α       duration (days)
WEB        88          4                        2~120           4         0.47    1
PARTIAL    88          4                        2~120           4         0.47    1
VOD        73                      49           60~120          6         0.73    7
REAL       9           43          2            6~3             -         -       10

For the VOD workload, the unique objects stored on the server account for a total size of 49 GB. The length of the objects ranges from 1 hour to 2 hours. The request inter-arrivals follow a Poisson distribution with λ = 6 seconds, and the Zipf-like distribution for object popularity uses α = 0.73. The media objects are coded and streamed at 2 Mbps. The total number of requests is 73. The simulation covers roughly a week. The Low-Bound and High-Bound for initial buffer allocation are set to 6 MB and 28 MB, respectively.

For the REAL workload, there is a total of 43 objects with a total size of 2 GB. There are 9 request arrivals in a time span of roughly ten days. Figure 8(a) shows the distribution of the absolute length of viewing time (in minutes): 83.8% of the sessions last less than 10 minutes. Figure 8(b) shows the cumulative distribution of the percentage of viewing time with respect to the full object length: 6.24% of the requests access less than 10% of the media object, and only 9.74% of the requests access the full objects.

Figure 8: REAL workload (left to right): (a) distribution of absolute viewing time; (b) distribution of viewing percentage.
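Synthetic workloads of the kind described above can be reproduced with a short generator. The sketch below is a minimal illustration, not the authors' tool: it draws object popularity from the Zipf-like distribution p_i = f_i / Σ f_j with f_i = 1/i^α, and spaces requests with exponentially distributed inter-arrival times, which yields Poisson-distributed arrival counts per interval.

```python
import random

# Minimal sketch of a synthetic workload generator in the spirit of
# Section 4: Zipf-like object popularity plus Poisson request arrivals.

def zipf_probabilities(n_objects, alpha):
    # f_i = 1 / i^alpha, normalized so the p_i sum to one
    f = [1.0 / (i ** alpha) for i in range(1, n_objects + 1)]
    total = sum(f)
    return [x / total for x in f]

def generate_requests(n_requests, n_objects, alpha, mean_interarrival, seed=0):
    """Return a list of (arrival_time, object_id) pairs sorted by time."""
    rng = random.Random(seed)
    probs = zipf_probabilities(n_objects, alpha)
    t = 0.0
    trace = []
    for _ in range(n_requests):
        t += rng.expovariate(1.0 / mean_interarrival)   # exponential gaps
        obj = rng.choices(range(n_objects), weights=probs)[0]
        trace.append((t, obj))
    return trace
```

With the parameters of Table 1 (e.g., α = 0.47 and λ = 4 seconds for WEB), the generator produces a trace in the same spirit as the synthetic workloads, though object lengths and bitrates would still need to be attached per object.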

4.2 Potentially Cacheable Content

To find out the amount of potentially cacheable content, we evaluate the number of overlapped sessions at each time instance during the simulation. Specifically, at one point in time, if a session is streaming a media object that at least one other session is also streaming, it is called a shared session. Note that shared sessions may be streaming different portions of the media object. If a session is streaming a media object that no other session is streaming, it is called a not-shared session. Figures 9 and 10 show the accumulated number of sessions in the two categories during full or part of the simulations for the four workloads.

Figure 9: Potentially Cacheable Content: WEB (left) and VOD (right) workloads

Figure 10: Potentially Cacheable Content: PARTIAL (left) and REAL (right) workloads

At first sight of these figures, it seems that the potentially cacheable contents of the different workloads

in decreasing order should be WEB, VOD, PARTIAL and REAL. Be aware, however, that only WEB and PARTIAL are shown for their whole durations in the figures, while VOD and REAL are shown for only part of their full durations, which are too long to fit in these figures. Recall that the duration of VOD is 7 days, while the duration of REAL is ten days. Combining these factors, we expect that the WEB workload presents the greatest amount of potentially cacheable content, followed by the VOD, REAL and PARTIAL workloads, in that order.

4.3 Shared Sessions

During the course of a streaming session of an object, when there are other requests accessing the same object, that portion of the streaming session is shared. The actual caching benefit depends on the number of sharing requests: for example, the benefit of caching an ongoing session simultaneously shared by two requests differs from that of one shared by three. To further evaluate the benefit the proxy system can gain by caching shared sessions, we obtain the distribution of the number of requests on the shared sessions. This statistic is collected as follows. At each time instance (every second), we record the number of ongoing sessions that are shared by each number of sharing requests. These counts are accumulated in bins indexed by the number of sharing requests. At the end of the simulation, the count in each bin is divided by the total simulation duration in seconds, yielding a histogram of the average number of shared sessions at any time instance. Figure 11(a) shows the distribution over the full range of the number of sharing requests, and (b) shows the range from 2 to 20 in more detail. For a given point (x, y) on the curves, y indicates the average number of ongoing sessions that are shared by x requests.

Figure 11: Shared request histogram: (a) full range, (b) detail.
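The collection procedure just described can be sketched in a few lines. The snapshot representation below is our own simplification: each per-second sample lists the object id of every ongoing session, and an object streamed by k >= 2 concurrent sessions contributes one shared session with k sharing requests to bin k.

```python
from collections import Counter

# Sketch of the shared-session statistic collection of Section 4.3.
# `samples` is a list of per-second snapshots; each snapshot is a list with
# one object id per ongoing session (our own representation).

def shared_session_histogram(samples):
    """Return {k: average number of sessions shared by exactly k requests}."""
    bins = Counter()
    for snapshot in samples:
        for obj, k in Counter(snapshot).items():
            if k >= 2:                # not-shared sessions are left out
                bins[k] += 1
    return {k: count / len(samples) for k, count in bins.items()}
```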
We find that the number of sharing requests per shared session ranges predominantly from 2 to 20. Since the sessions in the VOD workload last the longest, it is expected that its average number of shared sessions is the largest among the group of workloads. PARTIAL is the partial-viewing case of WEB; thus the number of shared sessions in PARTIAL should be smaller than that in the WEB workload.

5 Performance Evaluation

5.1 Evaluation Metrics

We have implemented an event-driven simulator to model a proxy's memory caching behavior. Since the object hit ratio or byte hit ratio is not suitable for evaluating the caching performance of streaming media, we use the server traffic reduction rate (shown as "bandwidth reduction" in the figures) to evaluate the performance of the proposed caching algorithms. If the algorithms are employed on a server, this parameter indicates the disk I/O traffic reduction rate.

Using the SRB or PSRB algorithms, a client needs to listen to multiple channels for maximal sharing of the data cached in the proxy's memory. We measure the traffic between the proxy and the client in terms of the average client channel requirement: the average number of channels the clients listen to during their sessions. Since the clients listen to earlier ongoing sessions, storage is necessary at the client side to buffer data before its presentation. We use the average client storage requirement, as a percentage of the full size of the media object, to indicate the storage requirement on the client side.

The effectiveness of the algorithms is studied by simulating different scale factors for the allocation of the initial buffer size and varying memory cache capacities. The streaming rate is assumed to be constant for simplicity. The simulations are conducted on an HP x4000 workstation.

For each simulation, we compare a set of seven algorithms in three groups. The first group contains the buffering schemes, which include dynamic buffering and interval caching. The second group contains the patching algorithms, namely greedy patching, grace patching and optimal patching. The third group contains the two shared running buffer algorithms proposed in this paper.

5.2 Performance on Synthetic Workloads

We consider the complete-viewing scenario for streaming media caching in both the Web and VOD environments. We also simulate the partial-viewing scenario for the Web environment. There are no partial-viewing cases for media delivery in the VOD environment.

5.2.1
Complete Viewing Workload of Web Media Objects

First, we evaluate the caching performance with respect to the initial buffer size. With a fixed memory capacity of 1 GB, the initial buffer size is varied by setting the scale factor from 1 down to 1/32 of the media object length. For each scale factor, initial buffers of different sizes are allocated for media objects of different lengths. The server traffic reduction, the average client channel requirement and the average client storage requirement are recorded in the simulation. The results are plotted in Figure 12.

Figure 12(a) shows the server traffic reduction achieved by each algorithm. Note that PSRB achieves the best reduction, and SRB achieves the next best except for optimal patching. RB caching achieves the least reduction. As expected, the performance of the three patching algorithms does not depend on the scale factor used for allocating the initial buffer; neither does that of interval caching, since it allocates buffers based on access intervals.
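The metrics recorded in these simulations reduce to simple per-session accounting. The sketch below shows one way to compute them; the field names and calling conventions are our assumptions, not the simulator's actual interface.

```python
# Sketch of the three evaluation metrics of Section 5.1, computed from
# per-session accounting records (field names are assumptions).

def server_traffic_reduction(bytes_demanded, bytes_from_server):
    """Percentage of demanded bytes served without contacting the server."""
    return 100.0 * (1.0 - bytes_from_server / bytes_demanded)

def avg_client_channels(channel_counts):
    """Average number of channels a client listens to during its session."""
    return sum(channel_counts) / len(channel_counts)

def avg_client_storage_pct(peak_buffered_bytes, object_sizes):
    """Average client-side storage as a % of the full media object size."""
    pcts = [100.0 * b / s for b, s in zip(peak_buffered_bytes, object_sizes)]
    return sum(pcts) / len(pcts)
```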

Figure 12: WEB workload (left to right): (a) Server Traffic Reduction, (b) Average Client Channel and (c) Storage Requirement (%) with 1 GB Proxy Memory

For the running buffer schemes, we notice some variation in performance with respect to the scale factor; in general, the variations are limited. To a certain extent, the bandwidth reduction gain is a trade-off between the number of running buffers and the size of the running buffers. A larger buffer means that more requests can be served from it, but it also leaves less memory for other requests, which in turn leads to more server accesses once no proxy memory is available. Conversely, a smaller buffer may serve fewer requests but leaves more proxy memory to allocate to other requests.

Figure 12(b) and (c) show the average channel and storage requirements on the client. Note that optimal patching achieves its better server traffic reduction at the cost of requiring the largest number of client channels. Comparatively, PSRB and SRB require 30~60% fewer client channels while achieving similar or better server traffic reduction. PSRB allows session sharing even when memory space is not available; it is therefore expected that PSRB achieves the maximal server traffic reduction, while also requiring the most client-side storage and client channels. SRB, on the other hand, achieves 6 percentage points less traffic reduction than PSRB, but its requirements on client channels and storage are significantly lower.
Figure 13: WEB workload (left to right): (a) Bandwidth Reduction, (b) Average Client Channel and (c) Storage Requirement (%) with the scale of 1/4

We now investigate the performance of the algorithms with respect to various memory capacities on the proxy. In this simulation, we use a fixed scale factor of 1/4 for the initial buffer size. Figure 13(a) indicates a flat traffic reduction rate for the three patching algorithms. This is expected, since no proxy memory resource is utilized in patching. On the other hand, all the other algorithms investigated achieve higher traffic reduction as the memory capacity increases. It is important to note that the two proposed SRB algorithms achieve better traffic reduction than the interval caching and running buffer schemes.

In Figure 13(b), the client channel requirement of the PSRB algorithm decreases as the cache capacity increases. This is again expected, since more clients are served from the proxy buffers instead of from proxy patching sessions. When the cache capacity reaches 4 GB, PSRB requires only 30% of the client channels needed by the optimal patching scheme. PSRB also requires less client storage at this cache capacity, as indicated in Figure 13(c). And yet, PSRB achieves more than 10 percentage points more traffic reduction than the optimal patching scheme. The SRB algorithm generally achieves the second best traffic reduction, with an even smaller penalty on client channel and storage requirements.

5.2.2 Complete Viewing Workloads of VOD Objects

To further evaluate the performance of the proposed algorithms in a video delivery environment with relatively longer streaming sessions, we use the VOD workload. Compared with the results obtained for the WEB workload, the simulation on the VOD workload demonstrates similar trade-offs with respect to varying scale factors, as is evident from Figure 14.
Figure 14: VOD workload (left to right): (a) Bandwidth Reduction, (b) Average Client Channel and (c) Storage Requirement (%) with 1 GB Proxy Memory

With a fixed scale factor for the initial buffer size, Figure 15(a) illustrates performance gains similar to those achieved for the WEB workload. In addition, similar penalties are imposed on each algorithm, as is evident from Figure 15(b) and (c). Note that the performance gain for the VOD workload is larger than that for the WEB workload. This is mainly due to the longer average streaming session in the VOD workload: a longer streaming session provides more opportunity for multiple clients to share it.

Figure 15: VOD workload (left to right): (a) Bandwidth Reduction, (b) Average Client Channel and (c) Storage Requirement (%) with the Scale of 1/4

5.2.3 Partial Viewing of the Web Workload

In streaming media delivery over the Web, it is possible that some clients terminate a session after watching only the beginning of the media object. It is important to evaluate the performance of the proposed algorithms under this situation. Using the PARTIAL workload as defined in Section 4, we perform the same simulations and evaluate the same set of parameters.

Figure 16(a) shows characteristics similar to those in Figure 12. PSRB and SRB still achieve better traffic reduction. The conclusion holds that PSRB uses 60% of the client channels of optimal patching while achieving a traffic reduction several percentage points better.

Figure 16: PARTIAL workload (left to right): (a) Bandwidth Reduction and (b) Average Client Channel Requirement with 1 GB Proxy Memory

In the event that a session terminates before reaching the end of the requested media object, it is possible that the client has already downloaded a future part of the media stream that is no longer needed. To characterize this wasted delivery from the proxy to the client, we record the average client waste: the percentage of wasted bytes with respect to the total prefetched data. Figure 17 shows the

client waste statistic. Note that for the PARTIAL and REAL workloads, since both contain premature session terminations, prefetched data that is not used in the presentation is not counted as bytes hit in the calculation of the server traffic reduction.

Figure 17: PARTIAL workload (left to right): (a) Average Client Storage Requirement (%) and (b) Client Waste (%) with 1 GB Proxy Memory

As shown in Figure 17(b), PSRB and SRB have a substantial portion of their prefetched data wasted (about 42% in the case of PSRB), compared with 0% for interval caching. Since the wasted bytes are not counted as hits, this lowers the traffic reduction rates of PSRB and SRB relative to interval caching. From another perspective, interval caching does not promote sharing among buffers; hence the client listens to only one channel and there is no buffering of future data, so nothing is wasted in proxy-to-client delivery upon premature session termination.

We now again investigate the caching performance with a fixed scale factor for the initial buffer size, in Figure 18. Compared with Figure 12(a), the gaps between the traffic reduction curves of PSRB, SRB and interval caching become much smaller in general. This reinforces the observation above that PSRB and SRB may lead to more wasted bytes in partial-viewing cases. In addition, grace patching achieves almost no traffic reduction, showing its incapability in dealing with the partial-viewing situation. The reason might be that the new session started by grace patching, which is supposed to be a complete session, often terminates once 20% of the media object has been delivered. Since the duration of such a session is short, it is less likely that a new request for the same media object is received during it. In Figure 18(b), PSRB demonstrates a monotonic decrease of the average client channel requirement as the memory capacity increases. This is due to the fact that there are fewer zero-sized running buffers with increasing proxy memory capacity.
Similarly, as shown in Figure 19, the client storage requirement and the average client waste also decrease, since less patching is required.
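The average client waste metric used throughout this subsection can be computed per session as sketched below. This is a minimal illustration; the (prefetched, consumed) bookkeeping is our assumption about what the simulator records.

```python
# Sketch of the average client waste metric: wasted prefetched bytes as a
# percentage of all prefetched bytes, over sessions that may end prematurely.

def avg_client_waste(sessions):
    """`sessions`: list of (prefetched_bytes, consumed_prefetched_bytes)."""
    prefetched = sum(p for p, _ in sessions)
    if prefetched == 0:
        return 0.0                    # e.g. interval caching buffers no future data
    wasted = sum(p - c for p, c in sessions)
    return 100.0 * wasted / prefetched
```

The zero-prefetch branch mirrors the interval caching case discussed above, where the client listens to a single channel and never buffers future data, so no delivery can be wasted.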