End-to-end Distortion Estimation for RD-based Robust Delivery of Pre-compressed Video

Rui Zhang, Shankar L. Regunathan and Kenneth Rose
Department of Electrical and Computer Engineering
University of California, Santa Barbara, CA 93106 *

* This work is supported in part by the NSF under grants no. MIP-9707764, EIA-9986057 and EIA-0080134, the University of California MICRO Program, Conexant Systems, Inc., Dolby Laboratories, Inc., Lucent Technologies, Inc., Medio Stream, Inc., and Qualcomm, Inc.

Abstract

Applications in which packetized video is streamed over the Internet must be designed to achieve robustness to packet loss as well as compression efficiency. Whenever possible, the ideal solution to this problem is to jointly optimize the adaptation of the compression and error protection strategies to the network status, so as to minimize the expected end-to-end distortion of the reconstructed video at the receiver. However, in the case of pre-compressed video streaming, compression is performed without knowledge of the network condition and, conversely, delivery is performed without access to the original signal. It is hence difficult for the transmitter to estimate and minimize the end-to-end distortion during delivery. This paper addresses this problem by deriving an algorithm which enables the transmitter, or other intermediate nodes, to estimate the overall end-to-end distortion while delivering pre-compressed video. This estimate fully accounts for the effects of (prior) quantization, packet loss and error propagation, as well as error concealment. The accuracy of the estimate is demonstrated by simulation results. The algorithm requires storage of minimal side-information that is computed during compression. The algorithm is of low complexity, and is applicable to virtually all coding techniques, including the standard (predictive) video coders. The paper also discusses the use of this estimate to adapt a variety of packet-loss resilience techniques for pre-compressed video streaming. The considerable potential gains of this approach are illustrated via the example of an FEC-based streaming video system.

1 Introduction

Internet-based packetized video streaming applications have attracted tremendous attention in recent years. The unreliable packet delivery through the Internet requires that video streaming systems provide robustness to loss as well as compression efficiency. While standard source-channel coding algorithms [1] [2] can be used to optimize the delivery of live content, they are incompatible with applications which stream pre-compressed video. The main difficulty is due to the fact that network conditions are unknown during the compression stage.

As an illustration, consider an application that delivers Video on Demand. The raw video content is compressed offline, and is stored on the server. Network condition parameters such as bandwidth, packet loss probabilities and delay jitter vary widely based on the characteristics of the available links between the server and the client (receiver). They have significant effects on system performance. Clearly, optimization of the error resilience strategy during delivery has no access to the original video. This represents a major difficulty in estimating and minimizing the end-to-end distortion, which quantifies the difference between the original signal and the decoder reconstructed signal (after loss and error concealment). Further, practical restrictions on server complexity preclude the use of complex algorithms that perform requantization of the source bitstream, or perform other modifications at the source-syntax level. Instead, adaptation should be based on simple transport-level tools, such as Forward Error Correction (FEC) or Automatic Retransmission reQuest (ARQ). Note that similar constraints apply to other streaming video applications such as transcoding at an intermediate node, and Internet multicast.

The problem of robust streaming of pre-compressed video has been addressed in [3] [4] [5] [6].
In [4], it was recognized that the ideal resilience strategy at the server is one which adapts to the actual bandwidth and packet loss statistics of the network in order to minimize the expected end-to-end distortion of the reconstructed video at the receiver. A Lagrangian Rate-Distortion (R-D) framework was proposed to achieve the optimal adaptation strategy. However, the practical usefulness of this framework is limited in the absence of a convenient method to compute the overall reconstruction distortion. The task of computing end-to-end distortion is complicated by many inter-related factors. They include (prior) quantization, the effective packet loss statistics, which are a function of the network condition and the error resilience strategy, and error concealment. Further, the use of inter-frame prediction in video coders results in spatial and temporal error propagation, and hence additional inter-dependencies between packets. The problem of distortion estimation was rendered tractable in prior work by either neglecting the effect of inter-frame error propagation [5] [6], or by ignoring error concealment [4]. However, the consequent inaccuracy in the distortion estimates can result in poor adaptation strategies.

The main contribution of this paper is an efficient algorithm that enables the transmitter to estimate the expected overall end-to-end distortion at the receiver. The algorithm takes into account the effects of quantization, inter-dependencies among packets through prediction and error propagation, and error concealment. The algorithm requires a small amount of side-information that is easily computed during compression, and stored at the server. In addition to its accuracy, the estimate provides another advantage: it is linearly dependent on the packet loss statistics, and thus allows for low-complexity R-D optimization of packet-loss resilience strategies.

The paper is organized as follows: Section 2 introduces notation and derives the decoder distortion estimate. Simulation results demonstrate its accuracy. Section 3 discusses the integration of this estimate within an RD framework for optimizing streaming efficiency and robustness. The potential for substantial performance gains is illustrated using the example of an FEC-based robust delivery system.

2 End-to-end Distortion Estimation for Pre-compressed Video

In this section, we analyze the problem of end-to-end distortion estimation for a system that delivers pre-compressed video. We then derive a first order estimation algorithm as an efficient solution.
2.1 End-to-end Distortion

Without loss of generality, we assume that the compressed video is packetized into independent groups of packets (GOP). The expected distortion of each GOP can be calculated independently, as there is no dependency across GOPs. However, packets within one GOP may depend on each other due to prediction. Thus, the distortion for all packets in one GOP must be calculated jointly.

Let there be $N$ source packets per GOP. Let $p_i$ denote the effective packet loss rate (PLR) of packet $i$. Note that $p_i$ is a function of both the network conditions and the resilience strategy used for this packet. The resilience strategy could involve retransmission of the packet, or the use of error correction codes. The PLR vector for the entire GOP is given by $P = \{p_0, p_1, \ldots, p_i, \ldots, p_{N-1}\}$. Since each packet can be either received correctly or considered lost, there is a total of $2^N$ possible events for each GOP. The event vector of the entire GOP is represented by $B^{(k)} = \{b_0^{(k)}, b_1^{(k)}, \ldots, b_i^{(k)}, \ldots, b_{N-1}^{(k)}\}$, where $k$ denotes the index of the event ($k = 1, 2, \ldots, 2^N$), and the binary random variable $b_i^{(k)}$ denotes the status of the $i$th packet in the $k$th event. The packet is received correctly if $b_i^{(k)} = 0$, and is lost if $b_i^{(k)} = 1$. The probability of the $k$th event vector is given by

$$p^{(k)} = \prod_{i=0}^{N-1} (1 - p_i)^{(1 - b_i^{(k)})} \, p_i^{\,b_i^{(k)}}.$$

Let $f$ denote the value of some pixel in the original video. Let $\tilde{f}$ denote the corresponding reconstructed pixel at the receiver. Note that $\tilde{f}$ is in fact a random variable at the transmitter, since it depends on the effects of packet loss, concealment and error propagation, which are unknown to the transmitter. However, it is important to note that the decoder reconstruction is completely determined given the event vector of the entire GOP. Thus, the decoder reconstruction under the $k$th event, $\tilde{f}^{(k)}$, can be exactly computed. The end-to-end distortion of this pixel under the $k$th event is given by $d^{(k)} = (f - \tilde{f}^{(k)})^2$.
The overall distortion of the GOP under the $k$th event is

$$D^{(k)} = \sum_{f \in \mathrm{GOP}} d^{(k)}. \tag{1}$$

At the compression stage, the encoder can compute $D^{(k)}$ for $k = 1, 2, \ldots, 2^N$, and store these quantities as side-information at the server. During delivery, the probability of occurrence of event $B^{(k)}$ is given by $p^{(k)}$. Therefore, the expected overall distortion at the receiver is given by

$$E\{D(P)\} = \sum_{k=1}^{2^N} p^{(k)} D^{(k)} = \sum_{k=1}^{2^N} \left( \prod_{i=0}^{N-1} (1 - p_i)^{(1 - b_i^{(k)})} \, p_i^{\,b_i^{(k)}} \right) D^{(k)}. \tag{2}$$

Note that this estimate is exact (i.e., without approximation). It considers all possible error events, and takes into account the effects of compression, loss, error propagation and error concealment.

In practical applications, this estimate has two major drawbacks. First, $2^N$ real values ($D^{(k)}$) need to be stored as side-information for each GOP. This imposes a large storage requirement. Second, the expected distortion is a complicated function of the individual packet loss rates, as shown in (2). Therefore, the use of this metric to optimize error resilience strategies involves a high computational complexity.
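The enumeration behind (2) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and `toy_event_distortion` is an invented stand-in for the stored values $D^{(k)}$, which in practice would come from running the decoder (with concealment) on each loss pattern.

```python
from itertools import product


def event_probability(event, plr):
    """Probability of one loss pattern; event[i] == 1 means packet i is lost."""
    prob = 1.0
    for b, p in zip(event, plr):
        prob *= p if b else (1.0 - p)
    return prob


def expected_distortion_exact(plr, event_distortion):
    """Evaluate Eq. (2) by summing over all 2^N loss patterns of the GOP."""
    n = len(plr)
    return sum(
        event_probability(event, plr) * event_distortion(event)
        for event in product((0, 1), repeat=n)
    )


def toy_event_distortion(event, quant=1.0, loss_penalty=10.0, leak=0.5):
    """Hypothetical D^(k): one packet per frame, quantization error `quant`,
    a concealment error `loss_penalty` per lost packet, and propagated error
    that decays by `leak` for every correctly received packet."""
    total, propagated = 0.0, 0.0
    for lost in event:
        if lost:
            propagated = loss_penalty + propagated
        else:
            propagated = leak * propagated
        total += quant + propagated
    return total


if __name__ == "__main__":
    print(expected_distortion_exact([0.05, 0.05, 0.10], toy_event_distortion))
```

The loop over `product((0, 1), repeat=n)` makes the exponential cost visible: both the number of stored distortions and the work per evaluation grow as $2^N$.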

[Figure 1: two panels plotting PSNR (dB). Panel (a): PSNR vs. packet loss rate (PLR), 5% to 20%, with curves Actual, SODE, FODE and ADDE. Panel (b): PSNR vs. PLR cases 1 to 5, with curves Actual, Multi-FODE, Single-FODE and ADDE.]

Figure 1. PSNR vs. packet loss rate for model accuracy. QCIF sequence carphone. (a) Single-layer bitstream at 32 kbps for 10 fps. (b) Three-layer bitstream at 32/64/96 kbps for 10 fps. The packet loss rates for the three layers in (b) are: case 1 (0%, 5%, 10%), case 2 (1%, 3%, 5%), case 3 (3%, 8%, 15%), case 4 (5%, 10%, 95%), and case 5 (5%, 95%, 95%).

2.2 First-order Approximation through Partial Derivatives

The objective of this section is to derive a simple approximation of the end-to-end distortion estimate. At the cost of a slight loss of accuracy, this approximation allows for a substantial reduction in the amount of side-information and in computational complexity.

We propose to use the first order Taylor expansion of (2). Assume we expand the GOP distortion of (2) about a particular reference PLR vector, $\bar{P} = \{\bar{p}_0, \bar{p}_1, \ldots, \bar{p}_i, \ldots, \bar{p}_{N-1}\}$. For example, $\bar{P}$ could correspond to the case when the loss probability is zero for all packets in the GOP. For a PLR vector $P$ which is close to the reference PLR, it is reasonable to approximate the expected distortion of (2) via the first order Taylor series expansion. Thus, we have

$$E\{D(P)\} \approx E\{D(\bar{P})\} + \sum_{i=0}^{N-1} \left. \frac{\partial E\{D(P)\}}{\partial p_i} \right|_{P = \bar{P}} (p_i - \bar{p}_i) = E\{D(\bar{P})\} + \sum_{i=0}^{N-1} \gamma_i \, (p_i - \bar{p}_i), \tag{3}$$

where

$$\gamma_i = \left. \frac{\partial E\{D(P)\}}{\partial p_i} \right|_{P = \bar{P}} \tag{4}$$

is the partial derivative of the expected distortion with respect to the PLR of packet $i$. The value of $E\{D(\bar{P})\}$ is easily pre-computed for any given reference PLR $\bar{P}$ via (2). Similarly, $\gamma_i$ may be easily pre-computed for each packet (due to space constraints, the details are omitted here).

The number of reference PLRs determines the amount of side-information needed for this First Order Distortion Estimation (FODE) model. If $m$ reference PLRs are used, we need to store $m(N+1)$ quantities for each GOP, which represents a significant reduction in side-information over the exact approach.
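The paper omits how $\gamma_i$ is computed. One possible route, offered here as an assumption rather than the authors' procedure, follows from the fact that (2) is affine in each individual $p_i$: the partial derivative in (4) then equals the expected distortion with packet $i$ forced lost minus the expected distortion with packet $i$ forced received, with all other packets held at the reference PLR. The Python sketch below uses hypothetical names and an invented toy distortion model in place of real decoder measurements.

```python
from itertools import product


def exact_expected_distortion(plr, event_distortion):
    """Eq. (2): sum over all 2^N loss patterns (1 = lost, 0 = received)."""
    total = 0.0
    for event in product((0, 1), repeat=len(plr)):
        prob = 1.0
        for b, p in zip(event, plr):
            prob *= p if b else (1.0 - p)
        total += prob * event_distortion(event)
    return total


def fode_side_info(ref_plr, event_distortion):
    """Return (E{D(P_ref)}, [gamma_0, ..., gamma_{N-1}]): N + 1 numbers."""
    base = exact_expected_distortion(ref_plr, event_distortion)
    gammas = []
    for i in range(len(ref_plr)):
        lost = list(ref_plr)
        lost[i] = 1.0      # packet i always lost
        kept = list(ref_plr)
        kept[i] = 0.0      # packet i always received
        gammas.append(
            exact_expected_distortion(lost, event_distortion)
            - exact_expected_distortion(kept, event_distortion)
        )
    return base, gammas


def fode_estimate(plr, ref_plr, base, gammas):
    """Eq. (3): first-order estimate, linear in the PLR vector."""
    return base + sum(g * (p - r) for g, p, r in zip(gammas, plr, ref_plr))


def toy_event_distortion(event, quant=1.0, loss_penalty=10.0, leak=0.5):
    """Hypothetical D^(k) with error propagation; not measured decoder data."""
    total, propagated = 0.0, 0.0
    for lost in event:
        if lost:
            propagated = loss_penalty + propagated
        else:
            propagated = leak * propagated
        total += quant + propagated
    return total


if __name__ == "__main__":
    ref = [0.0, 0.0, 0.0]
    base, gammas = fode_side_info(ref, toy_event_distortion)
    plr = [0.05, 0.05, 0.10]
    print("FODE :", fode_estimate(plr, ref, base, gammas))
    print("exact:", exact_expected_distortion(plr, toy_event_distortion))
```

With the all-zero reference, each $\gamma_i$ reduces to the distortion when only packet $i$ is lost minus the loss-free distortion, so the side-information can be gathered with $N + 1$ decoder runs instead of $2^N$.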
The choice of the number of reference PLRs is further discussed in the simulation section. Further, note that the expected distortion depends linearly on the PLRs, and all inter-dependencies have been decoupled through the partial derivatives $\gamma_i$.

2.3 Simulation Results

This subsection demonstrates the accuracy of FODE through simulations. The source video bitstreams were generated by the standard H.263+ codec [8]. We consider both the single-layer coding system and the scalable coding system. The decoder uses the adjacent lower layer reconstruction if any enhancement layer packet is lost, or replaces the lost base layer packet with information in the previous frame. We implemented FODE to pre-calculate the partial derivatives as side-information. Using this side-information, we estimate the distortion values for different PLR vectors. We compared these estimates to the actual distortion of the reconstructed video at the receiver. The actual distortion was averaged over 50 realizations of the network under the same PLR conditions. An additional comparison was made to the Acyclic Dependent Distortion Estimation (ADDE) proposed in [4], where the effect of error concealment is neglected.

Figure 1 shows the simulation results representing the estimation accuracy under different PLR distributions. Figure 1(a) gives the results for the QCIF sequence carphone in a single-layer system. For the single-layer system, we only use the all-zero reference PLR for the Taylor expansion, $\bar{P} = \{0, 0, \ldots, 0, \ldots, 0\}$. We also simulated the performance of a Second Order Distortion Estimation (SODE). These results demonstrate the high accuracy of FODE in comparison to ADDE. The importance of accounting for the effect of error concealment is obvious. The second order correction of SODE enables slightly better estimates than FODE at large packet loss rates, but requires more side-information and complexity.

Figure 1(b) presents the results for a three-layer system, for both the single-FODE model, where only the all-zero reference PLR is used, and the multi-FODE model, where additional reference PLRs are used. These additional reference PLRs are needed to account for the case where enhancement layer packets are discarded at the transmitter to conserve bits. The reference PLRs used in the multi-FODE model are: $\bar{P}_0 = \{(0,0,0), \ldots, (0,0,0), \ldots, (0,0,0)\}$, $\bar{P}_1 = \{(0,0,1), \ldots, (0,0,1), \ldots, (0,0,1)\}$, and $\bar{P}_2 = \{(0,1,1), \ldots, (0,1,1), \ldots, (0,1,1)\}$. The results demonstrate the accuracy of FODE in scalable coders. Note that multi-FODE gives a better approximation than single-FODE when the enhancement-layer packets are discarded (as in case 4 and case 5 in Figure 1(b)). However, these gains are achieved at the cost of more side-information.

In summary, the simulation results show that FODE is efficient in approximating the expected overall reconstruction distortion at the receiver. While a single reference PLR is sufficient for non-scalable coding systems, multiple reference PLRs may be needed for scalable coding systems.

3 RD-based Robust Delivery of Pre-compressed Video

In this section, FODE is integrated into the RD framework to optimize error-resilient schemes for delivery of pre-compressed video. The potential performance gains are illustrated using the example of a scalable encoder and FEC-based unequal error protection.
3.1 Optimized Delivery Schemes within an RD Framework

Any adaptive error-resilience scheme provides a set of policy choices, $\pi_i \in \{\pi^{(0)}, \pi^{(1)}, \ldots, \pi^{(S)}\}$, for each packet. Depending on the resilience scheme, the policy choices could be the number of retransmissions, or the strength of the error correction code. The effective loss rate, $p_i$, for the $i$th packet is a function of both the network condition and the policy choice. The cost of the policy choice, $c_i(\pi_i)$, is usually the total number of bits needed to send the original source packet. The policy vector for a group of (source) packets (GOP) is defined as $\Pi = \{\pi_0, \pi_1, \ldots, \pi_i, \ldots, \pi_{N-1}\}$. The corresponding PLR vector and cost vector are denoted by $P(\Pi)$ and $C(\Pi)$. The expected end-to-end distortion for a GOP can be estimated using FODE as

$$E\{D(P(\Pi))\} \approx E\{D(\bar{P})\} + \sum_{i=0}^{N-1} \gamma_i \, (p_i(\pi_i) - \bar{p}_i). \tag{5}$$

The total cost for the GOP is given by

$$C(\Pi) = \sum_{i=0}^{N-1} c_i(\pi_i). \tag{6}$$

The optimal adaptive delivery scheme should then choose the policy that minimizes the expected distortion $E\{D(P(\Pi))\}$ while satisfying a constraint on the cost $C(\Pi)$. This problem can be recast as an unconstrained minimization of the Lagrangian, with multiplier $\lambda$,

$$E\{D(P(\Pi))\} + \lambda C(\Pi) \approx E\{D(\bar{P})\} + \sum_{i=0}^{N-1} \left[ \gamma_i \, (p_i(\pi_i) - \bar{p}_i) + \lambda c_i(\pi_i) \right].$$

Note that the distortion estimate provided by FODE depends linearly on the PLR. Thus, theoretically, the policies can be chosen independently for each packet to minimize the Lagrangian cost, and practically the optimization can be done at any level with any structure at the convenience of the adaptation scheme. This results in low computational complexity of the optimization procedure.

3.2 Simulation Results

For the simulations, we consider a system of layered coding with unequal transport prioritization to demonstrate the superiority of our algorithm. The system consists of fully standard-compatible layered source coding for pre-compression of the video signal, and unequal error protection through FEC on the packets of different layers at the time of delivery. A systematic Reed-Solomon (RS) code is adopted to generate redundant packets to combat packet loss [2] [5].
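As a rough illustration of the per-packet Lagrangian selection of Section 3.1 applied to RS-based protection, the Python sketch below picks, for each packet, the $(n, k)$ option that minimizes $\gamma_i (p_i(\pi_i) - \bar{p}_i) + \lambda c_i(\pi_i)$. Several elements are assumptions not taken from the paper: packet erasures are modeled as independent with probability `eps`, the cost of an $(n, k)$ policy is taken as `packet_bits * n / k`, and all function names and numbers are hypothetical.

```python
from math import comb


def rs_effective_plr(n, k, eps):
    """Residual loss rate of a source packet protected by a systematic
    RS(n, k) code, assuming i.i.d. packet erasures with probability eps.
    The packet is unrecoverable only if it is lost itself and at least
    n - k of the other n - 1 packets in the block are lost as well."""
    tail = sum(
        comb(n - 1, j) * eps**j * (1.0 - eps) ** (n - 1 - j)
        for j in range(n - k, n)
    )
    return eps * tail


def rs_options(packet_bits, k, eps, max_n):
    """Policy choices for one packet as (label, effective PLR, cost in bits)."""
    return [
        ((n, k), rs_effective_plr(n, k, eps), packet_bits * n / k)
        for n in range(k, max_n + 1)
    ]


def choose_policies(gammas, ref_plr, policies, lam):
    """Minimize gamma_i * (p_i - ref_i) + lam * c_i separately per packet,
    which is possible because the FODE estimate is linear in the PLRs."""
    chosen = []
    for gamma, p_ref, options in zip(gammas, ref_plr, policies):
        best = min(options, key=lambda o: gamma * (o[1] - p_ref) + lam * o[2])
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    gammas = [200.0, 1.0]          # hypothetical FODE sensitivities
    ref_plr = [0.0, 0.0]           # all-zero reference PLR
    policies = [rs_options(1000, k=4, eps=0.1, max_n=6) for _ in gammas]
    for label, plr, cost in choose_policies(gammas, ref_plr, policies, lam=0.01):
        print(label, round(plr, 6), cost)
```

In this toy setting the packet with the large $\gamma_i$ receives the strongest code while the insensitive packet is sent unprotected, which is the unequal error protection behavior the framework is meant to produce. Sweeping `lam` trades total rate against expected distortion.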
A five-layer bitstream for the QCIF sequence carphone is generated. Three online delivery schemes are compared. The first is the RD optimized scheme using our multi-FODE model (M-FODE-RD). The second uses only the single-FODE model (S-FODE-RD). Both of them dynamically select the best error protection $(n, k)$ code, given a fixed $k$, so as to minimize the RD cost for packets in each layer. The third scheme uses fixed unequal error protection (UEP) for each layer, with more protection for lower layers, through RS codes (Fixed-N). While the first two schemes can adapt to any rate constraint, the Fixed-N scheme can be used only for certain target bit rates. The performance of the unprotected source bitstream is also presented as a reference.

[Figure 2: PSNR (dB) vs. bit rate in kbps (0 to 550), with curves M-FODE-RD, S-FODE-RD, Fixed-N and Src.]

Figure 2. PSNR vs. total bit rate for different delivery schemes. QCIF sequence carphone, 10 fps, 5-layer bitstream at 16/64/112/240/496 kbps.

The three delivered bitstreams generated by those schemes go through the same time-varying channels, with PLR in the range of 1% to 20%. Figure 2 shows the decoder distortion for each of them versus the bit rate. The results illustrate that the FODE-RD schemes achieve substantial gains while maintaining more flexibility than the Fixed-N scheme. Note that the multi-FODE-RD scheme yields only small gains over the single-FODE-RD scheme. This indicates that single FODE may be sufficient for most practical applications.

4 Conclusion

The estimation of the end-to-end distortion is a fundamental issue in RD-optimized adaptive delivery of pre-compressed video over lossy packet networks. We proposed an algorithm to accurately calculate the overall end-to-end distortion, which takes into account all the effects of the encoder's compression algorithm, the inter-dependencies among packets, the changing network statistics, the delivery schemes and the error concealment used by the decoder. Its accuracy is demonstrated through simulation results. Moreover, it only requires minimal side-information. The distortion estimate can be used to optimize various robust adaptation strategies for delivery of pre-compressed video. Fairly low complexity in the optimization procedure is achieved due to our linear estimation model. The potential performance gains are illustrated using the example of a delivery system that combines scalable coding with FEC-based error protection.

References

[1] W. Tan and A. Zakhor, "Real-time Internet video using error resilient scalable compression and TCP-friendly transport protocol," IEEE Transactions on Multimedia, vol. 1, no. 2, pp. 172-186, June 1999.
[2] M. Gallant and F. Kossentini, "Rate-distortion optimized layered coding with unequal error protection for robust Internet video," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 357-372, Mar. 2001.
[3] G. J. Conklin, G. S. Greenbaum, K. O. Lillevold, A. F. Lippman, and Y. A. Reznik, "Video coding for streaming media delivery on the Internet," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 269-81, Mar. 2001.
[4] P. A. Chou and Z. Miao, "Rate-distortion optimized streaming of packetized media," submitted to IEEE Transactions on Multimedia, Feb. 2001.
[5] B. Girod, K. Stuhlmuller, M. Link and U. Horn, "Packet loss resilient internet video streaming," Proceedings of the SPIE, Visual Communications and Image Processing 99, San Jose, CA, USA, vol. 3653, pp. 833-44, Jan. 1999.
[6] W. Tan and A. Zakhor, "Video multicast using layered FEC and scalable compression," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 373-86, Mar. 2001.
[7] ITU-T, Rec. H.263, "Video coding for low bitrate communications," version 2 (H.263+), Jan. 1998.
[8] H.263+ Codec. http://spmg.ece.ubc.ca/