On Achieving Fairness in the Joint Allocation of Buffer and Bandwidth Resources: Principles and Algorithms


Yunkai Zhou and Harish Sethu (corresponding author)

Abstract—Fairness in network traffic management can improve the isolation between traffic streams, offer a more predictable performance, eliminate certain kinds of transient bottlenecks and may serve as a critical component of a strategy to achieve certain guaranteed services such as delay bounds and minimum bandwidths. While fairness in bandwidth allocation over a shared link has been studied extensively, the desired eventual goal is overall fairness in the use of all the resources in the network. This paper is concerned with achieving fairness in the joint allocation of buffer and bandwidth resources. Although a large variety of buffer management strategies have been proposed in the research literature, a provably fair and practical algorithm based on a rigorously defined theoretical framework does not exist. In this paper, we describe such a framework and a new, provably fair, and practical strategy for the joint allocation of buffer and bandwidth resources using the max-min notion of fairness. Through simulation experiments using real gateway traffic and video traffic traces, we demonstrate the improved fairness of our strategy in comparison to several popular buffer management algorithms. Joint management of buffer and bandwidth resources involves both an entry policy into the buffer and an exit policy through the output link. Our study reveals that, even though algorithms such as WFQ and DRR that can serve as fair exit policies have received significantly more attention, a fair entry policy is more critical than a fair exit policy to the overall fairness goal when buffer resources are constrained.

Index Terms—Fairness, fair scheduling, resource allocation, buffer management, max-min, RED

I. INTRODUCTION

A. Background and Motivation

Fairness is an intuitively desirable property in the allocation of resources in a network shared among multiple flows of traffic from different users.
Even when the network is overprovisioned, as is the case in parts of the Internet core today, strict fairness in traffic management can improve the isolation between traffic streams, offer a more predictable performance and also improve performance by eliminating some transient bottlenecks. Fair allocation of network resources is especially critical in wireless networks and access networks, where the demand for resources is frequently greater than the availability. Fair scheduling policies can also be used to guarantee certain quality-of-service (QoS) requirements such as delay bounds and minimum bandwidths. These policies are likely to play a critical role in future packet-switched networks in supporting applications such as video conferencing and Internet TV stations through controlling the interactions among various traffic streams with different requirements.

Several formal notions of fairness have been proposed to address the question of what is fair in the allocation of a single shared resource among multiple requesting entities. These include, among others, max-min fairness [1-3], proportional fairness [4], utility max-min fairness [5] and minimum potential delay [6]. During the last several years, a variety of algorithms that seek to realize these formal notions have been proposed and implemented to achieve fair allocation of bandwidth on a shared output link [1, 2, 5-10]. However, bandwidth on a link is only one among several kinds of resources shared by multiple flows in a typical network.

(This work was supported in part by NSF CAREER Award CCR-9984161 and U.S. Air Force Contract F30602-00-2-0501. Y. Zhou is with the MSN Division, Microsoft Corporation, 1 Microsoft Way, Redmond, WA 98052-8300. Tel: 425-722-7049; Fax: 425-936-7329; E-mail: yunkaiz@microsoft.com. H. Sethu is with the Department of Electrical and Computer Engineering, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104-2875. Tel: 215-895-5876; Fax: 215-895-1695; E-mail: sethu@ece.drexel.edu.)
As flows of traffic traverse through a network, they share with other flows a variety of resources such as links, buffers and router CPUs in their path. The allocation policies with respect to each of these resources can have a significant impact on the overall performance and QoS achieved by flows. Even though fair scheduling of bandwidth over a link has received the most attention, overall fairness in the use of all the resources in the network is ultimately the desired goal. Several researchers, for example, have already recognized the importance of joint allocation of buffer and bandwidth resources [11-14]. Buffer allocation policies in switches and routers are directly related to congestion avoidance and flow control policies, with a direct impact on end-user applications. Fair allocation of buffer resources in routers and switches takes on additional significance with the increasing prevalence of multimedia applications that use UDP instead of TCP and choose to avoid end-to-end congestion avoidance policies.

This paper is concerned with achieving fairness in the joint allocation of buffer and bandwidth resources in a network. A management policy for a shared buffer consists of two components. The entry scheduler determines which data from which flows are permitted into the buffer and which are not. The entry scheduler is also responsible for pushout, i.e., the discarding of data from the shared buffer in order to accommodate new arriving traffic. The exit scheduler dequeues traffic from the shared buffer and transmits it onto the output link. It is the combination of both the entry and the exit schedulers that determines the overall fairness in the allocation of the buffer and bandwidth resources.

SUBMITTED TO COMPUTER NETWORKS

Over the last couple of decades, researchers have proposed and analyzed a variety of entry policies [15-22]. For example, Random Early Detection (RED) [17] is widely used in current Internet routers. A review of various RED-based buffer management algorithms may be found in [23]. Another popular class of entry schedulers is based on modifications to Fair Buffer Allocation (FBA) [19], a review of which may be found in [24]. Most of these have attempted to maximize performance or achieve congestion avoidance, although several of them have also tried to be fair by one measure or another. A precise and formal notion of fairness in buffer allocation, however, has not yet been developed. Thus, there is currently no theoretical framework around which one can design practical and fair buffer allocation algorithms, and there also are no formal means of evaluating the various buffer allocation policies already proposed. This paper seeks to provide such a framework to define fairness in the joint allocation of buffer and bandwidth resources, and to facilitate the design of provably fair buffer management strategies.

B. Contributions

The primary contribution of this paper is a new, provably fair and practical algorithm for the joint allocation of buffer and bandwidth resources. This algorithm is based on a framework that provides a simple but powerful generalization of any of several notions of fairness previously defined for the allocation of a single shared resource or a set of resources viewed as a single entity. Our contribution begins with the definition of an ideally fair strategy, Fluid-flow Fair Buffering (FFB), for the joint allocation of buffer and bandwidth resources using the max-min notion of fairness. FFB is an ideally fair but unimplementable resource allocation strategy, just as Generalized Processor Sharing (GPS) [2] is an ideally fair but unimplementable scheduling discipline for allocating bandwidth among flows sharing a link.
FFB is intended to serve research efforts in the design of practical and fair buffer allocation strategies in a manner analogous to the role served by GPS for almost a decade in the design, analysis and measurement of scheduling disciplines for allocating bandwidth on a shared link. Our practical algorithm, Packet-by-packet Fair Buffering (PFB), is an implementable approximation of the FFB algorithm. We analytically prove that PFB achieves a close bounded approximation to the FFB algorithm. We use real gateway traffic and video traffic traces to compare the fairness of PFB against several combinations of popular entry and exit policies and demonstrate the improved fairness with PFB. Our results show that the entry policy used in PFB can significantly improve fairness even in combination with an unfair exit policy such as First-Come-First-Served (FCFS) when the buffer size is finite. Our results reveal that when buffer resources are constrained, a fair entry policy is more critical than a fair exit policy to the overall fairness goal.

C. Organization

This paper is organized as follows. In Section II, we introduce the system model considered in this paper. In Section III, we describe the concepts of cumulative resource dividends and demands, and also the concept of stationary intervals of time over which one can apply notions of fairness in a system with multiple resources. We conclude the section with the statement of the Generalized Principle of Fairness (GPF) for use in systems with more than one shared resource. In Section IV, we illustrate the application of GPF and define what is fair in the joint allocation of buffer and bandwidth resources based on the max-min notion of fairness. In this section, we also present the ideally fair but unimplementable FFB strategy. In Section V, we present the PFB strategy, a novel and practical buffer allocation strategy, and prove that it closely approximates FFB.

[Fig. 1. The system model: flows with input rates I_1(t), ..., I_N(t) pass through an entry scheduler into a shared buffer of capacity C(t) and occupancy B(t) at admission rates A_1(t), ..., A_N(t); an exit scheduler drains them at departure rates D_1(t), ..., D_N(t) onto an output link of rate R(t).]
In Section VI, we present simulation results using real gateway traffic and video traffic to demonstrate the improved fairness of PFB in comparison to popular combinations of buffer and bandwidth management strategies. Finally, Section VII concludes the paper with a summary.

II. SYSTEM MODEL

In the system model considered here, a shared buffer is fed by N flows, labeled 1, 2, ..., N, all destined to the same output link. Let R(t) be the maximum link speed at time instant t and let C(t) be the capacity of the shared buffer at time instant t. Both values are defined to be functions of time in order to accommodate general situations, such as a higher-level allocation scheme that may change the available capacity of the link or the buffer. We assume that all flows belong to the same service priority class, and w_i is the weight associated with flow i. In reservation-based networks, the reserved rate of a flow may be used as its weight. Fig. 1 illustrates our system model and some of the notation used in this paper.

An entry scheduler regulates the entry of traffic from the N flows into the shared buffer. The entry scheduler determines which data from which flows are permitted into the buffer and which are not. The entry scheduler is also responsible for pushout, i.e., the discarding of data from the shared buffer in order to accommodate new arriving traffic from another flow. An exit scheduler dequeues traffic from the shared buffer and transmits it onto the output link. The exit scheduler, as in scheduling algorithms for the allocation of bandwidth on a link, determines the sequence in which traffic from various flows will exit through the output link. Let S denote the system under consideration. Let I_i(t) be the rate at which data arrives in flow i at time instant t seeking entry into the shared buffer. This is the only input into the system S. Consider a buffer allocation policy P, a combination of the entry and the exit schedulers' policies.
Define the admission rate A_i^{S,P}(t), at time instant t, as the rate at which data from flow i gets accepted into the shared buffer of system S under the allocation policy P. Traffic that is not admitted into the shared

buffer is dropped. Note that A_i^{S,P}(t) can be negative, such as when the net rate of acceptance into the buffer is negative due to pushouts. A_i^{S,P}(t) ≤ I_i(t) holds for all i and t. Define the departure rate, D_i^{S,P}(t), as the actual rate at which traffic belonging to flow i departs the shared buffer through the output link of system S under the allocation policy P. At time instant t, let B_i^{S,P}(t) be the queue length or the buffer occupancy of flow i in the shared buffer in system S under the allocation policy P. At any given time instant t ≥ t_0,

    B_i^{S,P}(t) = B_i^{S,P}(t_0) + ∫_{t_0}^{t} ( A_i^{S,P}(τ) − D_i^{S,P}(τ) ) dτ.   (1)

Throughout this paper, the sum of a quantity over all flows is denoted by dropping the subscript for the flow in the notation. For example, I(t) is the sum of the input rates of all of the N flows, i.e., I(t) = Σ_{i=1}^{N} I_i(t). A^{S,P}(t), B^{S,P}(t) and D^{S,P}(t) are defined similarly. Of course, D^{S,P}(t) ≤ R(t), and B^{S,P}(t) ≤ C(t). Note that, as mentioned before, the buffer allocation strategy is completely determined by the actions of the entry and the exit schedulers, which together determine A_i^{S,P}(t) and D_i^{S,P}(t). Also note that the queue length of a flow in the shared buffer is completely determined by the admission rate, the departure rate and the initial queue length, as given by (1). Defining what is fair in buffer allocation in system S over a certain interval of time (t_1, t_2), therefore, is the same as defining the conditions on A_i^{S,P}(t) and D_i^{S,P}(t) for all t in (t_1, t_2), such that P is fair.

III. GENERALIZED PRINCIPLE OF FAIRNESS

Several different notions of fairness have been proposed in the research literature for the allocation of a single shared resource among a set of requesting entities [6]. We begin our discussion by introducing a notation that allows a representation of any of these notions of fairness. Consider N traffic flows, labeled 1, 2, ..., N, with a weight w_i associated with flow i. Let R be the size of the resource shared among these N flows and let d_i be the demand corresponding to flow i.
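For intuition, the occupancy recurrence in (1) can be written as a discrete-time update. The sketch below is our own illustration (the step size dt and the function name are not from the paper):

```python
def update_occupancy(b, admit_rate, depart_rate, dt):
    """One discrete-time step of recurrence (1): the occupancy B_i
    grows by the admitted volume and shrinks by the departed volume
    over a step of length dt.  admit_rate may be negative when a
    pushout discards already-queued data."""
    return b + (admit_rate - depart_rate) * dt

# A flow admitted at 10 units/s and drained at 4 units/s for 0.5 s
# grows by 3 units; a pushout (negative admission) shrinks the queue.
print(update_occupancy(0.0, 10.0, 4.0, 0.5))   # 3.0
print(update_occupancy(5.0, -2.0, 1.0, 1.0))   # 2.0
```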
For the sake of convenience, throughout this paper we use vectors to indicate values corresponding to a set of flows. We denote a vector by the indexed value in a pair of square brackets. For instance, we denote the demand vector as [d_i]. Therefore, given the demand vector [d_i], the weight vector [w_i], and the total available resource R, any given notion of fairness may be represented as,

    [a_i] = F(R, [d_i], [w_i])   (2)

where a_i is the allocation for flow i based on the notion of fairness defined by the function F. The function F is different for different notions of fairness such as max-min fairness, proportional fairness or utility max-min fairness. Given a notion of fairness F, an ideal scheduling policy, denoted by G_F(S), is one that exactly achieves this notion of fairness in system S. For example, if F represents the function corresponding to the max-min fair policy with respect to the bandwidth [3], and L represents a work-conserving system with a single shared link, G_F(L) will denote the GPS policy [2], the ideally fair scheduling policy for max-min fairness. Note that for each system S, there might exist many fair allocation policies, each corresponding to a different performance level. For example, a policy that results in nothing being allocated to any flow may also be considered a fair policy as per the notion of max-min fairness. We denote by G_F(S) the set of all ideally fair allocation policies for system S as per fairness notion F. Our theoretical framework is presented in the context of a prioritized set of resources in [25]. For the sake of clarity and completeness, we describe the framework in detail in this section using the simpler system model considered in this paper.

A. Resource Dividends and Demands

Consider a set of flows using a shared set of resources in a certain system S. Each flow in the system, depending upon the application that generates the flow, has a certain desired goal, which we generically refer to as the utility sought by the flow. Over any given interval of time, the cumulative utility is merely the utility considered over that interval of time.
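One concrete instance of the function F in (2) is the classical weighted max-min allocation computed by progressive filling. The sketch below is our own illustrative implementation of that textbook procedure, not an algorithm from this paper:

```python
def max_min_fair(R, demands, weights):
    """Weighted max-min allocator in the shape of (2):
    [a_i] = F(R, [d_i], [w_i]).  Progressive filling: repeatedly split
    the unallocated resource among unsatisfied flows in proportion to
    their weights, capping each flow at its own demand."""
    alloc = [0.0] * len(demands)
    unsat = {i for i, d in enumerate(demands) if d > 0}
    remaining = float(R)
    while unsat and remaining > 1e-12:
        total_w = sum(weights[i] for i in unsat)
        share = {i: remaining * weights[i] / total_w for i in unsat}
        capped = {i for i in unsat if alloc[i] + share[i] >= demands[i]}
        if not capped:
            # No flow reaches its demand: hand out the shares and stop.
            for i in unsat:
                alloc[i] += share[i]
            break
        for i in capped:
            # Satisfied flows keep exactly their demand; the surplus
            # returns to the pool for the next round.
            remaining -= demands[i] - alloc[i]
            alloc[i] = float(demands[i])
        unsat -= capped
    return alloc

# R = 10 shared equally among demands [2, 8, 10]: the small flow is
# fully satisfied and the rest split the remainder evenly.
print(max_min_fair(10.0, [2, 8, 10], [1, 1, 1]))  # [2.0, 4.0, 4.0]
```

Note how the allocation depends on the demand vector: a flow never receives more than it asks for, which is exactly the property the framework generalizes to resource dividends and demands below.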
Note that the definitions of utility and cumulative utility may be very different in different contexts. For example, in the scheduling of bandwidth over a single shared link as accomplished by fair scheduling algorithms such as DRR [8], the utility may be defined as the bandwidth achieved by a flow; the cumulative utility achieved by a flow over an interval of time would be defined as the amount of its data transmitted through the shared link during the interval. For real-time applications with guaranteed delay requirements, one may define the cumulative utility over an interval as the fraction of packets that are successfully delivered within the specified guaranteed delay over this interval of time. It is important to note that, in this paper, we do not impose any particular notion of how cumulative utility over an interval should be defined. Our only assumption in this regard is that the cumulative utility over any interval achieved by a flow is always non-negative and does not decrease with an increase in the amount of any resource allocated to it.

Consider a policy P for the allocation of the shared set of resources. Over a time interval (t_1, t_2), denote by U_i^{S,P}(t_1, t_2) the cumulative utility achieved by flow i under allocation policy P in system S. Consider an allocation policy, None(i), which grants none of the shared resources to flow i. By our notation, U_i^{S,None(i)}(t_1, t_2) is the cumulative utility achieved by flow i during time interval (t_1, t_2) with the allocation policy None(i). The difference in the cumulative utilities achieved by a flow with and without the use of the allocated portion of the shared set of resources, i.e., the difference between U_i^{S,P}(t_1, t_2) and U_i^{S,None(i)}(t_1, t_2), represents the benefit accrued to the flow due to this shared set of resources. The following formally defines this concept.

Definition 1: The Cumulative Resource Dividend (denoted by CRDIV_i^{S,P}(t_1, t_2)) of flow i in system S under the allocation policy P over an interval of time (t_1, t_2) is defined as,

    CRDIV_i^{S,P}(t_1, t_2) = U_i^{S,P}(t_1, t_2) − U_i^{S,None(i)}(t_1, t_2).   (3)
Now, a notion of fairness in the allocation of the shared resources should specify a distribution of these cumulative resource dividends among the flows. However, such a notion of

fairness cannot be developed without also defining a notion of the demands placed on the shared set of resources by the flows. For example, it is only sensible that flows which have no need for the shared set of resources, i.e., with no demand for them, should not unnecessarily be allocated any of these resources. This principle is a trivial generalization of already existing notions of fairness in the allocation of a single resource. The demand of a flow for the shared set of resources can be expressed in terms of the benefit, or the cumulative resource dividend, that the flow desires from an allocation of the shared set of resources. Any flow would like a biased allocation policy that grants all of the shared set of resources exclusively to it. Therefore, the demand of a flow is really the benefit accrued to the flow, i.e., the cumulative resource dividend of the flow, when all of the shared set of resources is allocated exclusively to the flow. Let All(i) be an allocation policy that allocates all of the shared resources, in entirety and exclusively, to flow i. The notion of the demand of a flow can now be formally defined as follows.

Definition 2: The Cumulative Resource Demand (denoted by CRDEM_i^S(t_1, t_2)) of flow i in system S over an interval of time (t_1, t_2) is defined as,

    CRDEM_i^S(t_1, t_2) = U_i^{S,All(i)}(t_1, t_2) − U_i^{S,None(i)}(t_1, t_2).   (4)

Note that the cumulative resource demand is independent of the allocation policy P. Note also that the cumulative resource demand of a flow is no less than the cumulative resource dividend of the flow under any allocation policy. In the scheduling of bandwidth over a single shared link, a flow gets no throughput at all with policy None(i), since the link is the only resource contributing to the utility. Thus, over any time interval, all of the bandwidth allocated to a flow represents the benefit accrued to the flow from the shared resource.
In this case, the cumulative resource dividend of a flow over a given interval of time with a scheduling policy is the same as the total amount of data from the flow scheduled for transmission by the policy during this interval. Similarly, the cumulative resource demand of a flow over a certain interval of time is just the total amount of data that the flow could transmit during the interval if it did not have to compete with any other flow.

B. The Generalized Principle of Fairness

Based on the definitions of the cumulative resource demand and the cumulative resource dividend over any given interval of time, the shared resources can now be allocated according to any given notion of fairness F applied to the cumulative resource dividends with respect to the cumulative resource demands. This would ensure that each flow receives, as per the notion of fairness F, a fair share of the dividend from the shared set of resources. However, one cannot apply such a notion of fairness over any arbitrary interval of time, and this significantly hinders a simple extension of the notion of fairness from the single-resource case to that for a system with a link and a buffer resource. For example, a notion of fairness such as the principle of max-min fairness likewise cannot be applied to any arbitrary interval of time in the allocation of bandwidth on a link among competing flows. In this case, a flow is considered active at any given instant of time if and only if it is backlogged [8], and active over a given interval of time if and only if it is active at each instant of time during this interval. The principle of max-min fairness may only be applied over intervals of time during which no flow changes its state from being active to not being active, or vice versa. In our study, we refer to such an interval of time, over which one can apply a notion of fairness, as a stationary interval. In extending a notion of fairness to the system model S discussed in Section II, we will have to extend the concept of the active/inactive state of a flow and the concept of a stationary interval.
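In the single-link special case just described, both the dividend and the demand reduce to byte counts. The toy calculation below, with rates, byte totals and helper names invented purely for illustration, makes the distinction concrete:

```python
def single_link_demand(arrival_rate, link_rate, t1, t2):
    """CRDEM over (t1, t2) on a single link: what the flow could send
    with the link entirely to itself (policy All(i)), bounded by its
    own arrival rate or the link speed, whichever is smaller."""
    return min(arrival_rate, link_rate) * (t2 - t1)

def single_link_dividend(bytes_scheduled):
    """CRDIV over the same interval: with None(i) the flow sends
    nothing, so the dividend is simply what the policy transmitted."""
    return float(sum(bytes_scheduled))

# A flow arriving at 12 Mb/s on a 10 Mb/s link over 2 s has a demand
# of 20 Mb; if the scheduler transmitted bursts totaling 5 Mb for it,
# its dividend is 5 Mb and its demand goes unsatisfied.
demand = single_link_demand(12.0, 10.0, 0.0, 2.0)
dividend = single_link_dividend([2.0, 3.0])
print(demand, dividend)  # 20.0 5.0
```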
Consider a system with two distinct resources, one of which is the preferred resource. In the system under consideration, the link is the preferred resource; flows use the buffer resource only if the link resource is not available for immediate use. Our framework for the definition of fairness in the system under consideration is based on a simple, common-sense and therefore axiomatic approach whereby we allocate the preferred resource fairly, and then allocate the non-preferred resource fairly among the flows with unsatisfied demands. As per any given notion of fairness, a fair allocation in a system with two distinct resources is one which, firstly, fairly allocates the preferred resource among all the competing flows and then fairly allocates the other resource among the flows that still have unsatisfied demands. Denote by S′ an identical system to the one under consideration, but without the buffer resource. This is a system with just a single shared output link. We assume that, as per any given notion of fairness F, the ideally fair allocation strategy is known for system S′. Based on the earlier discussion, only when the demand of a flow in system S′ cannot be satisfied does it compete with other flows for the buffer resource. In other words, a flow should be considered in competition for the buffer resource if and only if, in the absence of the buffer resource, the flow is not satisfied with a fair allocation of the link resource. Therefore, an active flow with respect to the buffer resource is one whose demand, in system S′, would not be met under the ideally fair allocation policy, G_F(S′). The following definitions formalize this thought.

Definition 3: With respect to the buffer resource, a flow i is active during an interval of time (t_1, t_2) as per the notion of fairness F, if and only if, over each subinterval of time (τ_1, τ_2) such that t_1 ≤ τ_1 ≤ τ_2 ≤ t_2, the cumulative resource demand of flow i in system S′ is greater than the cumulative resource dividend it would achieve in system S′ under the ideally fair allocation policy, G_F(S′).
In other words, flow i is active with respect to the buffer resource over the time interval (t_1, t_2) if and only if,

    CRDEM_i^{S′}(τ_1, τ_2) > CRDIV_i^{S′,G_F(S′)}(τ_1, τ_2)

for all time intervals (τ_1, τ_2) such that t_1 ≤ τ_1 ≤ τ_2 ≤ t_2.

Definition 4: With respect to the buffer resource, a flow i is inactive during an interval of time (t_1, t_2) as per the notion of fairness F, if and only if, over each subinterval of time (τ_1, τ_2) such that t_1 ≤ τ_1 ≤ τ_2 ≤ t_2, the cumulative resource demand of flow i in system S′ is equal to the cumulative resource

dividend it would achieve in system S′ under the ideally fair allocation policy, G_F(S′). In other words, flow i is inactive with respect to the buffer resource over a time interval (t_1, t_2) if and only if,

    CRDEM_i^{S′}(τ_1, τ_2) = CRDIV_i^{S′,G_F(S′)}(τ_1, τ_2)

for all time intervals (τ_1, τ_2) such that t_1 ≤ τ_1 ≤ τ_2 ≤ t_2.

Note that it is possible that a flow is neither active nor inactive with respect to the buffer resource over a certain interval of time, since the above definitions are based on conditions that are required to be satisfied in each subinterval of time within the given interval. For example, consider two contiguous intervals of time. In the first interval, assume that a certain flow is active with respect to the buffer resource, while in the second interval the flow is inactive with respect to it. Then, in the combined interval of time consisting of both the above two intervals, the flow is neither active nor inactive with respect to the buffer resource. Thus, during any given interval, a flow may be said to be in one of three states with respect to the buffer resource: active, inactive or neither. In our case of a system with more than one resource, if a flow does not need the less preferred resource, then it implies that the flow is satisfied and is not in active competition with other flows. Generalizing the concept used in the allocation of a single resource, one may define fairness with respect to a resource over an interval only when the set of flows competing for the resource stays constant during the interval. We are now ready to present the concept of a stationary interval in our system, and the Generalized Principle of Fairness.

Definition 5: In a system S with two distinct sets of resources, one of which is the preferred resource set, a certain interval of time is a stationary interval if and only if each flow is either active or inactive (but not neither) with respect to the non-preferred resource set over this entire interval.

Generalized Principle of Fairness (GPF): Consider a system S and an allocation policy P.
P is fair as per a notion of fairness F, if and only if, over all stationary intervals of time, the cumulative resource dividends achieved by the flows are distributed fairly, as per the notion of fairness F, with respect to the cumulative resource demands requested by the flows.

Note that, if a flow i is neither active nor inactive over a certain time interval (t1, t2), this interval can be divided into a contiguous sequence of subintervals, during each of which flow i is either active or inactive. Thus, even though GPF defines fairness only over stationary intervals, any given interval of time may be broken down into a sequence of contiguous stationary intervals, and so GPF may be used to define a fair allocation over any given interval.

IV. APPLICATION TO BUFFER-LINK SYSTEM MODEL

A. What is Fair?

In this section, we illustrate the application of GPF to the system model described in Section II under specific notions of fairness and of the cumulative utility achieved by a flow. We will use max-min fairness as the notion of fairness. Throughout the rest of this paper, we also use the total amount of data from a flow transmitted over the output link during any given interval of time as the cumulative utility achieved by the flow over this interval. Thus, the cumulative utility of a flow in system S over any interval of time is given by,

    U_i^{S,P}(t1, t2) = ∫_{t1}^{t2} D_i^{S,P}(τ) dτ    (5)

for any allocation policy P. Consider the allocation policy None(i). In the absence of this set of shared resources (both the link and the buffer), the cumulative utility is obviously 0. With an allocation policy P, therefore, the cumulative resource dividend over an interval for each flow is exactly the cumulative utility achieved by the flow over the interval. The cumulative resource demand of flow i is the cumulative utility it gets using the allocation policy All(i), which allocates the entire buffer and the output link exclusively to this flow. Thus, applying (5) in (3) and (4), we have,

    CRDIV_i^{S,P}(t1, t2) = ∫_{t1}^{t2} D_i^{S,P}(τ) dτ    (6)

    CRDEM_i^S(t1, t2) = ∫_{t1}^{t2} D_i^{S,All(i)}(τ) dτ    (7)

for any flow i and any allocation policy P.
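Before applying the max-min notion below, it may help to make the weighted max-min fair operator concrete. The following water-filling routine is our own illustrative sketch (the function name and interface are assumptions, not from the paper): it splits a total amount among flows, never giving a flow more than its demand, and growing allocations in proportion to the weights.

```python
def max_min_fair(demands, weights, total):
    """Weighted max-min fair allocation by water-filling (illustrative sketch).

    Repeatedly splits the unallocated amount in proportion to the weights
    of the still-unsatisfied flows; flows whose demand is met drop out."""
    alloc = [0.0] * len(demands)
    active = set(range(len(demands)))
    remaining = float(total)
    while active and remaining > 1e-12:
        share = remaining / sum(weights[i] for i in active)
        done = {i for i in active if demands[i] - alloc[i] <= share * weights[i]}
        if done:
            for i in done:
                remaining -= demands[i] - alloc[i]
                alloc[i] = demands[i]
            active -= done
        else:  # no flow can be fully satisfied: exhaust the remainder
            for i in active:
                alloc[i] += share * weights[i]
            remaining = 0.0
    return alloc
```

For example, `max_min_fair([10, 4, 2], [1, 1, 1], 12)` yields `[6.0, 4.0, 2.0]`: the two small demands are met in full and the leftover goes to the large one.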
Recall that to define the state of a flow as active or inactive with respect to the buffer resource, we need to consider the system without the buffer resource, S′. Note that the ideally fair allocation policy in the system S′ defined earlier, given the max-min notion of fairness, is GPS [2]. Now, a flow i is said to be active with respect to the buffer resource, or simply active, over an interval of time (t1, t2) if and only if, over each subinterval (τ1, τ2) such that t1 ≤ τ1 ≤ τ2 ≤ t2,

    ∫_{τ1}^{τ2} D_i^{S,All(i)}(τ) dτ > ∫_{τ1}^{τ2} D_i^{S′,GPS}(τ) dτ.    (8)

Similarly, a flow i is said to be inactive with respect to the buffer resource, or simply inactive, over an interval of time (t1, t2) if and only if, during each subinterval (τ1, τ2),

    ∫_{τ1}^{τ2} D_i^{S,All(i)}(τ) dτ = ∫_{τ1}^{τ2} D_i^{S′,GPS}(τ) dτ.    (9)

A stationary interval is one during which each flow is either active or inactive, and an allocation policy P is fair if and only if, over all stationary intervals (t1, t2),

    [CRDIV_i^{S,P}(t1, t2)] = F_MMF( Σ_i CRDIV_i^{S,P}(t1, t2), [CRDEM_i^S(t1, t2)], [w_i] )    (10)

where F_MMF represents the policy of Max-Min Fairness.

B. An Ideally Fair Allocation Strategy

Based on the framework developed above and the notion of max-min fairness, we now discuss Fluid-flow Fair Buffering

(FFB), an ideally fair work-conserving strategy for the joint allocation of buffer and bandwidth resources. As in the ideally fair GPS scheduler, the FFB algorithm also assumes that traffic can be divided into infinitesimally small quantities and is schedulable at this granularity. With fluid-flow traffic, a protocol where traffic may be allowed to bypass the buffer if the buffer is empty is equivalent to one in which traffic always has to pass through the buffer. This is because traffic that is forced to pass through the buffer even though the buffer may be empty spends only an infinitesimal amount of time in the buffer in such a hypothetical system.

Recall that a buffer allocation policy contains two parts: an entry policy and an exit policy. It can be readily verified that FFB has to use GPS as the exit policy in order for it to achieve max-min fairness. This is because FFB is the fair algorithm for the joint allocation of buffer and bandwidth resources, and therefore it still needs to provide fairness in bandwidth allocation when buffer allocation is not an issue (such as if the buffer is of infinite capacity). With an infinite buffer, FFB is fair if and only if its exit policy is fair, implying that its exit policy should be GPS.

We now proceed to discuss the entry policy in FFB. In the FFB algorithm, we maintain for each flow i an acceptance counter (AC_i) which indicates the amount of data accepted into the shared buffer from flow i. When some data from flow i are accepted into the buffer, AC_i is incremented by the amount of data accepted; when some data from flow i are pushed out from the shared buffer, AC_i is decremented by the amount of data pushed out. Note that AC_i is not decremented when some data from flow i exit the buffer and get transmitted through the shared output link; otherwise, AC_i would simply be the buffer occupancy of flow i. Therefore, it is possible that a flow has a large value of the acceptance counter in comparison to other flows even while its buffer occupancy is relatively low.
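The bookkeeping rules above can be summarized in a few lines. This is our own minimal sketch (the class and method names are ours, not the paper's); it shows how the acceptance counter deliberately diverges from the buffer occupancy:

```python
class FlowAccounting:
    """Per-flow acceptance counter AC_i versus buffer occupancy (sketch)."""

    def __init__(self):
        self.ac = 0         # acceptance counter AC_i (bytes)
        self.occupancy = 0  # bytes currently buffered for this flow

    def on_accept(self, size):
        """Data accepted into the shared buffer: both quantities grow."""
        self.ac += size
        self.occupancy += size

    def on_pushout(self, size):
        """Data pushed out of the shared buffer: both quantities shrink."""
        self.ac -= size
        self.occupancy -= size

    def on_transmit(self, size):
        """Data leaves via the output link: occupancy drops, AC does not."""
        self.occupancy -= size
```

After accepting 1500 bytes and transmitting 1000 of them, a flow's occupancy is 500 while its AC remains 1500, which is exactly the "large counter, low occupancy" situation described above.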
Denote by AC_i(t) the value of the acceptance counter of flow i at time instant t. If at time instant τ the input rates of all flows become 0, then AC_i(τ) indicates the total amount of data that will have been transmitted from flow i after all of its data in the buffer at time τ are also transmitted. Thus, the acceptance counter of a flow represents its potential cumulative dividend and, therefore, represents the quantity that the entry policy should attempt to be fair about in order to achieve a fair distribution of the cumulative dividends with respect to the demands. By the max-min fair notion of fairness, therefore, the FFB strategy should ensure that the acceptance counters of all flows conform to the weighted max-min fair allocation with respect to the demands.

In summary, the ideally fair FFB algorithm as per the max-min notion of fairness uses the GPS server as the exit policy and ensures a weighted max-min distribution of the acceptance counters as the entry policy during each stationary interval of time.

V. PACKET-BY-PACKET FAIR BUFFERING

It is obvious that the FFB scheduler, which assumes fluid-flow behavior, is not implementable with real traffic that is packetized. In this section, we present Packet-by-packet Fair Buffering (PFB), a practical and implementable approximation to the ideally fair FFB scheduler.

 1 Initialize: /* Invoked when the system starts */
 2   FlowList ← NULL;
 3 Enqueue: /* Invoked whenever a packet arrives */
 4   p ← ArrivingPacket;
 5   i ← Flow(p); /* Flow of packet p */
 6   if (ExistsInFlowList(i) = FALSE) then
 7     Find flow k with minimum AC_j/w_j, j ∈ FlowList;
 8     AC_i ← w_i · AC_k/w_k;
 9     Append flow i to FlowList;
10   end if;
11   if (EmptySpaceInBuffer ≥ Size(p)) then
12     Accept p into buffer;
13     AC_i ← AC_i + Size(p);
14   else
15     Pushout(p); /* Pushout packets to accommodate p */
16   end if;
17 Dequeue: /* Always running */
18   Use any real approximation of GPS, except that
19   after each packet p is transmitted, invoke Transmit(p)
20 Transmit(p): /* Invoked whenever a packet departs */
21   i ← Flow(p); /* Flow of packet p */
22   if (QueueIsEmpty(i) = TRUE) then
23     Remove flow i from FlowList;
24   end if;

Fig. 2. Pseudo-code of Packet-by-packet Fair Buffering.

A. The PFB Algorithm

The pseudo-code of the PFB algorithm is presented in Figs. 2 and 3. The PFB algorithm maintains a linked list, called FlowList, which consists of all the flows with packets waiting in the shared buffer. When a flow has no packets waiting in the shared buffer, it is removed from the FlowList (accomplished by lines 22–24 and 50–52) and other flows are not affected. In the Dequeue procedure, an implementable fair scheduling algorithm such as SCFQ [10], SPFQ [9] or DRR [8] may be used that can achieve long-term fairness with a bounded value of the relative fairness bound as defined in [3, 10].

The Enqueue procedure (lines 3–16) is invoked whenever a packet arrives at an input port of the shared buffer. Assume that a packet p from flow i arrives. If flow i does not already exist in FlowList, it is appended to the tail of the FlowList and its normalized acceptance counter is set to the current minimum of the normalized acceptance counters of the active flows (lines 6–10). This is similar to the idea of the potential function in [9], where a newly backlogged flow has an initial potential equal to the system potential, which is no more than the minimum of the potentials of all existing flows. It should be noted that the acceptance counter of a newly backlogged flow is not initialized to 0; otherwise the system would be biased in favor of a new flow when the buffer is congested. Consider a system with two flows of equal weight, 1 and 2, where each flow has the same acceptance counter of a relatively high value since both have been backlogged for some time. Assume that the buffer is full, and at this moment flow 3, also of the same weight, arrives. This begins a new stationary interval. If AC_3 is initialized to a comparatively low value, data from flow 3 will not be pushed out for some time; instead, it is data from flows 1 and 2 that will be pushed out. The system

would thus be unfair in this stationary interval because it would punish flows for being previously backlogged. On the other hand, if AC_3 is initialized to be the same as AC_1 and AC_2, as described in Fig. 2, the data being pushed out will be equally distributed amongst all three flows, thus achieving fairness in the new stationary interval. Note that flow 3 in the above example may be a bursty input source, and the initialization of acceptance counters in PFB ensures that bursty traffic is not treated in a biased manner in comparison to flows with steady traffic. Also note that, in the case when there are no existing flows in the system, the acceptance counter of a newly arriving flow is initialized to 0.

The algorithm subsequently checks if there is enough empty space in the shared buffer to accommodate packet p (lines 11–16). If so, packet p is accepted into the shared buffer, and the acceptance counter of flow i is incremented by the size of packet p. If there does not exist enough buffer space for packet p, the Pushout procedure shown in Fig. 3 is invoked to push out some packets from other flows to accommodate packet p.

25 Pushout(p):
26   i ← Flow(p); /* Flow of packet p */
27   SpaceNeeded ← Size(p) − EmptySpaceInBuffer;
28   Accepted ← FALSE;
29   while (Accepted ≠ TRUE)
30     Among all unmarked flows, find flow k with
31       the largest AC_j/w_j, j ∈ FlowList;
32     if (AC_k/w_k = AC_i/w_i) then
33       Discard p;
34       Unmark all marked flows and data;
35       return;
36     end if;
37     if (Occupancy(k) < SpaceNeeded) then
38       Mark flow k and all data from flow k;
39       SpaceNeeded ← SpaceNeeded − Occupancy(k);
40     else
41       Mark flow k and at least SpaceNeeded worth
42         of data from flow k;
43       Accepted ← TRUE;
44     end if;
45   end while;
46   for each marked flow j
47     Unmark flow j;
48     Pushout all marked data from flow j;
49     Decrement AC_j by the amount of data pushed out;
50     if (QueueIsEmpty(j) = TRUE) then
51       Remove flow j from FlowList;
52     end if;
53   end for;
54   Accept p into buffer;
55   AC_i ← AC_i + Size(p);

Fig. 3. Pseudo-code of the Pushout procedure in the PFB algorithm.
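To make the entry policy of Figs. 2 and 3 concrete, the following is our own simplified, runnable sketch (the class and method names are ours; pushing out whole packets from the tail of a victim's queue, and omitting the exit policy entirely, are simplifying assumptions):

```python
from collections import deque

class PFBBuffer:
    """Illustrative sketch of the PFB entry policy (not the paper's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        # flow id -> {'ac': acceptance counter, 'w': weight, 'queue': packet sizes}
        self.flows = {}

    def accept(self, flow_id, size, weight=1.0):
        """Enqueue (Fig. 2): returns True if the packet enters the buffer."""
        f = self.flows.get(flow_id)
        if f is None:
            # New flow: AC set to the current minimum normalized AC (lines 6-10).
            min_norm = min((g['ac'] / g['w'] for g in self.flows.values()),
                           default=0.0)
            f = self.flows[flow_id] = {'ac': weight * min_norm,
                                       'w': weight, 'queue': deque()}
        if self.capacity - self.used >= size:
            f['queue'].append(size)
            self.used += size
            f['ac'] += size
            return True
        return self._pushout(flow_id, size)

    def _pushout(self, flow_id, size):
        """Pushout (Fig. 3): evict data from the largest-normalized-AC flows."""
        f = self.flows[flow_id]
        needed = size - (self.capacity - self.used)
        victims, marked = [], set()
        while True:
            norm, k = max((g['ac'] / g['w'], j)
                          for j, g in self.flows.items() if j not in marked)
            if norm <= f['ac'] / f['w']:
                return False            # tie with arriving flow: discard p
            occ = sum(self.flows[k]['queue'])
            if occ < needed:            # mark all of flow k, keep looking
                victims.append((k, occ)); marked.add(k); needed -= occ
            else:                       # mark just enough of flow k
                victims.append((k, needed)); break
        for k, amount in victims:       # actual pushout (lines 46-53)
            g, removed = self.flows[k], 0
            while removed < amount and g['queue']:
                removed += g['queue'].pop()   # whole packets, from the tail
            g['ac'] -= removed
            self.used -= removed
            if not g['queue']:
                del self.flows[k]
        f['queue'].append(size)
        self.used += size
        f['ac'] += size
        return True
```

In a small run, a flow whose normalized acceptance counter has grown largest is the one whose data are evicted when a later packet needs room, mirroring the marking logic of lines 29–45.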
In order to achieve a max-min distribution of the acceptance counters that is as close as possible to the ideal, based on the sizes of the packets involved, one may need to push out packets from more than one flow. In our approximation of the ideally fair algorithm, however, for the sake of computational efficiency, we select exactly one flow and attempt to push out as much data from this flow as is necessary to accommodate the newly arrived packet. Pushout from another flow occurs only when the flow that is selected first does not have sufficient data that can be pushed out to accommodate the arriving packet.

When the Pushout procedure is invoked, it first finds a flow, say k, with the largest value of the normalized acceptance counter. If the normalized values of the acceptance counters are maintained in a heap, this can be done in O(log N) time with respect to the number of flows. If flows i and k have the same value of the normalized acceptance counter, i.e., AC_i/w_i = AC_k/w_k, packet p is discarded (lines 32–36). Otherwise, the Pushout procedure attempts to push out some data from flow k to accommodate packet p. If flow k has enough data, the Pushout procedure returns after packet p is accepted, and AC_i and AC_k are appropriately updated (lines 41–43). When the buffer occupancy of flow k is not large enough, all data from flow k are pushed out, and the Pushout procedure continues with the next flow with the largest value of the normalized acceptance counter. Note that in PFB, data to be pushed out are only marked at first (lines 38 and 41), and the actual pushout is executed only if packet p can be accepted (lines 46–53), thus making PFB a work-conserving algorithm. Note that in a lightly loaded system of a shared buffer and a shared link, the entry policy of the PFB algorithm reduces to that of a simple scheduler which accepts all traffic.

B. Fairness Analysis

With packetized traffic, as opposed to fluid-flow traffic, short-term fairness may be degraded but the algorithm will still achieve long-term fairness.
The following theorem proves an upper bound on the lag of the PFB algorithm with respect to the ideally fair FFB algorithm.

Theorem 1: Consider a certain stationary interval (t1, t2). Consider two identical systems with the same initial conditions and the same input traffic sequence, except that one uses the FFB entry policy and the other uses the PFB entry policy. For any flow i and for any time instant t ∈ (t1, t2), we have,

    AC_i^F(t) − AC_i^P(t) ≤ M

where AC_i^F(t) and AC_i^P(t) are the acceptance counters corresponding to FFB and PFB respectively, and M is the maximum packet size.

Proof: Note that since (t1, t2) is a stationary interval, no flow changes state during this interval, and therefore, during (t1, t2) no flow becomes empty in the shared buffer due to pushout. Note that a difference in the actions of the entry policies of PFB and FFB occurs only when the buffer is full. Therefore, we only need to consider the situation in which a buffered packet is pushed out or an arriving packet is discarded. Consider a certain flow i. For the sake of convenience, denote by T^F(t) the set of flows with the largest normalized acceptance counter at time instant t under FFB, i.e., the set of flows j with the largest value of AC_j^F(t)/w_j. The set T^P(t) is similarly defined for the PFB algorithm. Assume time instant t_0 is the first time that AC_i^F(t) and AC_i^P(t) differ from each other, i.e., AC_i^F(t) = AC_i^P(t) for all t < t_0. Let t_0^+ be the instant of time at which the execution of the pushout or the discard completes in response to an event at time t_0 (we assume a negligible length of time to complete such an execution). It can be verified that AC_i^F(t_0^+) becomes larger than AC_i^P(t_0^+) in only one of the two following situations.

• A packet p from flow i arrives at time t_0. At this instant, there exists a certain amount of space available in the shared buffer but it is not large enough to accommodate all of p. If flow i belongs to both T^F(t_0) and T^P(t_0), part of packet p is accepted under FFB, while under PFB, the entire packet p is discarded. In this case, 0 ≤ AC_i^F(t_0^+) − AC_i^P(t_0^+) ≤ M. Note that after FFB accepts part of packet p, flow i belongs to T^F(t_0^+).

• At time instant t_0, a packet p from a flow other than i arrives and the buffer does not have enough space. Assume that both T^F(t_0) and T^P(t_0) contain more than one flow, and that flow i belongs to both sets. Then the amount pushed out from flow i under FFB is less than the size of packet p, since data from multiple flows are pushed out. In the case of PFB, however, only one flow is selected for pushout, and it is possible that flow i is chosen. In this case, 0 ≤ AC_i^F(t_0^+) − AC_i^P(t_0^+) ≤ M. Note that, again, we have i ∈ T^F(t_0^+), while in the case of PFB, there is another flow with a larger normalized acceptance counter, i.e., i ∉ T^P(t_0^+).

Note that in both the above situations, we have i ∈ T^F(t_0^+). Next, we proceed to show that, for any time instant τ within a stationary interval, if flow i belongs to T^F(τ^−),

    AC_i^F(τ^+) − AC_i^P(τ^+) ≤ AC_i^F(τ^−) − AC_i^P(τ^−).    (11)

It is sufficient to consider only the time instants when acceptance counters change, i.e., when new packets arrive. Assume a packet p arrives at time τ and that the flow i under consideration belongs to T^F(τ^−). If packet p is from flow i, it will be discarded under FFB, and thus AC_i^F(τ^+) = AC_i^F(τ^−). On the other hand, under PFB, packet p will either be accepted or discarded, i.e., the acceptance counter of flow i will not decrease. Therefore, (11) is satisfied. If packet p comes from a flow other than i, some data from flow i will be pushed out under FFB, since i ∈ T^F(τ^−). In PFB, on the other hand, AC_i^P(τ^+) = AC_i^P(τ^−), since, as mentioned above, i ∉ T^P(τ^−). Again, (11) is satisfied. Note that in both cases, it is always true that i ∈ T^F(τ^+).
Therefore, by induction, we may conclude that once the difference AC_i^F(t) − AC_i^P(t) becomes greater than 0, it only decreases with increasing time, at least until it becomes negative. In addition, as shown above, when AC_i^F(t) − AC_i^P(t) becomes positive, its maximum possible value is M, the size of the largest packet. Therefore, for any time instant t ∈ (t1, t2), AC_i^F(t) − AC_i^P(t) ≤ M, thus bounding the difference between the practical and the ideally fair schedulers.

C. Computational Efficiency

Theorem 2: The computational complexity of the Enqueue procedure in PFB is O(log N), where N is the number of flows in the system.

Proof: Note that PFB can maintain the FlowList in sorted order, based on the normalized value of the acceptance counter. The work complexity is O(log N) when using a heap to maintain the sorted list and using virtual queues to manage the different flows in the shared buffer. In addition, to accommodate a packet, at most M/m packets need to be marked and pushed out, where M and m are the maximum and minimum packet sizes, respectively. Therefore, the while loop (lines 29–45) and the for loop (lines 46–53) will be executed, in the worst case, M/m times. The complexity of the Pushout procedure, therefore, is O((M/m) log N) or, since M/m is a constant with respect to the number of flows, simply O(log N). Note that the computational complexity of the Dequeue procedure is simply the complexity of the fair scheduler used to implement the exit policy. For example, if DRR [8] is used, the per-packet dequeuing complexity will be O(1) with respect to the number of flows.

VI. MEASURE OF FAIRNESS AND SIMULATION RESULTS

In this section, we present a measure of fairness in the joint allocation of buffer and bandwidth resources, and using this measure we compare the fairness of PFB against some representative entry and exit policies using real gateway traffic traces and video traffic traces.

A. Measure of Fairness

In measuring fairness in the joint allocation of buffer and bandwidth resources, we extend the basic premise of the Absolute Fairness Bound (AFB) defined in the context of allocating bandwidth on a link [3, 26].
The AFB captures the upper bound on the difference between the normalized service received by a flow under the policy being measured and that received by the same flow under the ideally fair policy. Let G_F(S, P) be an ideally fair buffer allocation policy for system S based on the notion of fairness F, such that its total cumulative utility is identical to that of P, i.e.,

    Σ_i ∫_{t1}^{t2} D_i^{S,P}(τ) dτ = Σ_i ∫_{t1}^{t2} D_i^{S,G_F(S,P)}(τ) dτ.

Note that the definition of G_F(S, P) is slightly different from the definition of G_F(S) in Section III. G_F(S) is defined as the ideally fair allocation for system S as per fairness notion F. As we mentioned before, for each system S there might exist many fair allocation policies, each with a different total cumulative dividend (total throughput in this case). G_F(S, P) is the one with exactly the same total cumulative dividend as that of policy P; in other words, G_F(S, P) ∈ G_F(S). We will use G_F(S, P) to evaluate the fairness of policy P since the two have the same total cumulative dividends.

Note that in our study of fairness in buffer allocation, we make no assumption about whether or not the allocation policy being measured is work-conserving with respect to the shared set of resources. Therefore, a normalizing quantity based on performance is necessary in extending the notion of the fairness measure to our case of buffer allocation. This normalization should allow us to use our fairness measure in a valid comparison between various buffer allocation strategies. We now define the normalized Absolute Fairness Measure over an interval of time as follows:

Definition 6: In a system S with a shared buffer, a shared output link and a given input traffic arrival pattern, the normalized Absolute Fairness Measure, nAFM^{S,P}(t1, t2), of an allocation policy P over an interval of time (t1, t2) is defined as,

    nAFM^{S,P}(t1, t2) = max_i | ∫_{t1}^{t2} (D_i^{S,P}(τ)/w_i) dτ − ∫_{t1}^{t2} (D_i^{S,G_F(S,P)}(τ)/w_i) dτ | / Σ_j ∫_{t1}^{t2} D_j^{S,P}(τ) dτ.

Note that the above measure depends on the input traffic arrival pattern, and therefore, an algorithm will naturally have different upper bounds, nAFB, for different input traffic patterns. Also note that the fairness measured as above will approach 1.0 with any real algorithm when the size of the time interval, t2 − t1, is extremely small. At the same time, for most real buffer allocation strategies, the fairness measured as above will approach 0 when the size of the time interval considered is very large. Thus, a valid comparison between various allocation algorithms can be made using the above measure only if the sizes of the time intervals being considered are identical. Therefore, a meaningful measure of fairness for a given input pattern is not a single number but a function of τ, the size of the time interval over which fairness is measured. In our simulation study, we use the observed maximum of nAFM^{S,P}(t, t + τ) over all t to indicate the fairness of the allocation policy P for each interval of size τ. Also note that the normalized fairness measure depends on the values of the flow weights. This is consistent with the definitions of the fairness measure and the fairness bound in other works such as [3, 8, 26]. It is often assumed that either the total weight of all flows is 1, or the minimum weight is 1.

B. Simulation Setup

Our simulation model consists of a shared buffer fed by 8 input traffic sources. Traffic from these 8 flows is headed to the same shared output link via the shared buffer. In our first set of simulation experiments, we use real gateway traffic traces. In our second set of simulation experiments, we use video traffic traces¹.
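In a discrete simulation, the measure of Definition 6 reduces to a computation over per-flow service totals in the interval. The following is our own sketch (the function name and interface are assumptions): it takes the data each flow actually sent under P and the data it would have sent under G_F(S, P), which by construction have equal totals, and normalizes the largest weighted-service gap by the total service.

```python
def nafm(service_p, service_ideal, weights):
    """normalized Absolute Fairness Measure over one interval (sketch).

    service_p[i]:     data sent by flow i under policy P in (t1, t2)
    service_ideal[i]: data flow i would send under G_F(S, P) in (t1, t2)
    """
    total = sum(service_p)
    gaps = (abs(p / w - g / w)
            for p, g, w in zip(service_p, service_ideal, weights))
    return max(gaps) / total
```

For instance, if two equally weighted flows send 600 and 400 bytes under P while the ideal split is 500 and 500, the measure over that interval is 100/1000 = 0.1. Sweeping a window of size τ over the trace and taking the maximum, as in our simulation study, turns this into the function of τ plotted in Fig. 4.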
In our study, we have implemented five different entry policies, including the PFB entry policy, and three different exit policies. Note that the PFB algorithm for buffer and bandwidth management uses the PFB entry policy and a fair packet scheduler such as DRR as the exit policy. The entry policies we simulate are chosen to be representative and include the following: (i) Drop From Longest Queue (DFLQ), which pushes out packets belonging to the flow with the longest queue in the buffer whenever the shared buffer is full, and accepts all packets otherwise; (ii) Static Threshold (ST), which assigns an equal fixed buffer occupancy threshold (i.e., 1/8 of the total buffer size) to each flow, no flow being allowed to occupy more than this threshold; (iii) Random Early Detection (RED) [17], which drops arriving packets with a probability that is a dynamic function of the average buffer occupancy; (iv) Fair Buffering Random Early Detection (FB-RED) [27], which is a variant of RED that uses the bandwidth-delay product of a flow to determine the probability with which a packet from the flow is dropped; and finally, (v) the PFB entry policy.

These five entry policies can be categorized into two groups: one including DFLQ, ST and PFB, and the other including RED and FB-RED. This is because both RED and FB-RED are intended to be congestion avoidance algorithms, and are therefore assumed to work in situations where the shared buffer is never full (packets are dropped before the buffer gets full). In our simulation studies, all parameters of the RED algorithm follow the recommendations in [28]. The shared buffer size used in the simulations is selected in such a way that the buffer can accommodate a moderate traffic burst, and yet the buffer is congested during the majority of the simulation time.

¹Here we show only the simulation results using real traffic traces so that the performance of the entry and exit policies under evaluation can be better illustrated in real scenarios. Synthetic traffic has also shown similar results.
Three exit policies are also implemented: (i) First-Come First-Served (FCFS), which dequeues packets in the order of their arrival times; (ii) Longest Queue First (LQF), which schedules packets from the flow with the longest queue in the shared buffer; and (iii) Deficit Round-Robin (DRR) [8], a simple and popular fair round-robin scheduler. In our implementation, the DRR quantum is set equal to the maximum packet size. In the scheduling of bandwidth over a link, both FCFS and LQF have an absolute fairness bound of infinity, i.e., both are unfair given the max-min fair notion of fairness. DRR, on the other hand, is the representative fair algorithm used here.

C. Gateway Traffic Traces

In this study, we use real traffic recorded at Internet gateways as the input traffic [29]². Fig. 4 plots the observed maximum value of nAFM^{S,P}(t, t + τ) against τ for different pairs of entry and exit scheduling policies³. Specifically, Fig. 4(a) plots the observed maximum value of nAFM^{S,P}(t, t + τ) against τ when the entry scheduling policy is RED or FB-RED, while Fig. 4(b) plots the same for the DFLQ, ST and PFB entry policies. From Fig. 4(b) it is seen that among all examined combinations of entry and exit policies, five have a fairness measure approaching zero as τ increases. These five combinations, which are able to provide long-term fairness, are ST with DRR, DFLQ with DRR, and all three combinations with PFB. To better illustrate the differences between these five combinations, a logarithmic plot is also presented in Fig. 4(c). Fig. 4 shows that using only a fair exit scheduler such as DRR is not enough to guarantee overall fairness. Note that buffer allocation strategies with RED and FB-RED as the entry policy fail to achieve fairness, while those with PFB succeed. One interesting observation is that, when PFB is used in combination with unfair exit schedulers such as LQF and FCFS, the fairness achieved is actually very close to that with DRR as the exit scheduler.
In other words, a fair entry policy in combination with a highly unfair exit policy leads to acceptable overall fairness; however, an unfair entry policy even with a fair exit policy cannot guarantee overall fairness. This conclusion that emerges

²The traces are obtained from the Passive Measurement and Analysis project at the National Laboratory for Applied Network Research (NLANR).
³Here one cycle is the average time to transmit a packet into the shared buffer.