IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 23, NO. 11, NOVEMBER


Efficient Multiview Depth Coding Optimization Based on Allowable Depth Distortion in View Synthesis

Yun Zhang, Member, IEEE, Sam Kwong, Fellow, IEEE, Sudeng Hu, and Chung-Chieh Jay Kuo, Fellow, IEEE

Abstract: Depth video is used as the geometrical information of 3D world scenes in 3D view synthesis. The number of depth levels does not match the number of disparity levels in view synthesis. As a result, the relationship between depth distortion and rendering position error can be modeled as a many-to-one mapping function, in which different depth distortion values may be projected to the same geometrical distortion in the synthesized virtual view image. Based on this property, we present an allowable depth distortion (ADD) model for 3D depth map coding. We then propose an ADD-based rate-distortion model for the mode decision and motion/disparity estimation modules, aiming at minimizing view synthesis distortion under a given bit rate constraint. In addition, we propose an ADD-based depth bit reduction algorithm to further reduce the depth bit rate while maintaining the quality of the synthesized images. Experimental results in intra depth coding show that the proposed overall algorithm achieves Bjontegaard delta peak signal-to-noise ratio gains of 1.58 and 2.68 dB on average for half- and integer-pixel rendering precisions, respectively. The proposed algorithms are also highly efficient for inter depth coding when evaluated with different metrics.

Index Terms: 3D video, depth coding, view synthesis, depth no-synthesis-error, rate-distortion optimization, allowable depth distortion.

I. INTRODUCTION

Three-dimensional video (3DV) has attracted increasing attention recently. It can provide immersive vision, real 3D depth perception and new visual experiences for new types of multimedia applications, such as three-dimensional television (3DTV) and free-viewpoint television (FTV). Multiview depth video is one of the most important data types in 3DV [1]. It provides the geometrical information of the 3D world scene and allows arbitrary views to be rendered with high quality and low complexity [2]. To lower the complexity of the video clients, the depth videos are generated and encoded at the server and transmitted to the clients, instead of being generated from the multiview video at the clients [1].

Manuscript received November 28, 2013; revised April 28, 2014; accepted August 20, 2014. Date of publication September 8, 2014; date of current version October 2, 2014. This work was supported in part by the National Natural Science Foundation of China under Grant , Grant , and Grant , in part by the Shenzhen Emerging Industries through the Strategic Basic Research Project under Grant JCYJ , and in part by the Natural Science Foundation of Guangdong Province under Grant S . The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Béatrice Pesquet-Popescu. Y. Zhang is with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen , China, and also with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: yun.zhang@siat.ac.cn). S. Kwong is with the Department of Computer Science, City University of Hong Kong, Hong Kong, and also with the City University of Hong Kong Shenzhen Research Institute, Shenzhen , China (e-mail: cssamk@cityu.edu.hk). S. Hu and C.-C. J. Kuo are with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA USA (e-mail: sudenghu@usc.edu; cckuo@sipi.usc.edu). Color versions of one or more of the figures in this paper are available online at . Digital Object Identifier /TIP
However, depth videos have a large data volume, and it grows with the number of views. So, in addition to an efficient color encoder, a high-efficiency depth encoding algorithm is needed to reduce storage space and transmission bandwidth requirements. To tackle this problem, Multiview Video Coding (MVC) [3] and its coding optimizations [4], [5] can be used to encode the depth video. In that case, the depth video is treated as the luminance component of a color video. However, depth videos have correlations and properties that differ from those of the luminance component of color videos. For example, depth maps are generally smooth, but they may contain noise and temporal inconsistency introduced by depth estimation. Moreover, depth videos serve as the geometrical information for view rendering in a 3D video system. Thus, depth coding distortion is projected into geometrical errors inside or among video objects. For example, ringing artifacts in depth video may lead to edge corruption, while blocking artifacts may introduce false contours. Because of these differences between color and depth videos, traditional MVC algorithms and tools designed for color video cannot be applied directly to depth map coding if good synthesized video quality is to be achieved.

Currently, the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) has been established to develop more advanced 3DV coding technology [6]. Many researchers have also worked on depth coding, and a number of techniques have been proposed. The quality of virtual view images is sensitive to coding distortion at the boundaries of the depth video. Therefore, a boundary reconstruction filter [7] and a trilateral filter [8] were used to preserve sharp depth boundaries. Nguyen et al. [9] presented weighted mode filtering to reduce coding artifacts at edges in reduced-resolution depth coding. Some depth maps are generated by depth estimation algorithms, so they may contain noise and temporal inconsistency, which lowers depth coding efficiency. Thus, spatial and temporal smoothing filters [10] were proposed to reduce high-frequency prediction residues and improve compression efficiency. Zhu et al. [11] proposed a filtering scheme for the depth encoder that compensates for view synthesis distortion by additionally transmitting the filter coefficients. Silva et al. [12] based their work on depth visual sensitivity in the human 3D visual system. They proposed a depth-adaptive preprocessing filter that suppresses depth details within the Just Noticeable Difference in Depth (JNDD) [13], which improves the depth compression ratio. Since depth videos usually contain less texture, down- and up-sampling algorithms [14] were developed for reduced-resolution depth coding. Because depth maps are used for view synthesis, Zhao et al. [15] proposed a Depth No-Synthesis-Error (D-NOSE) model to smooth the depth map, thereby improving intra depth coding efficiency. However, the D-NOSE model was not incorporated into the coding process, and spatial redundancies were not fully exploited. In addition, it can hardly guarantee that both filtering and quantization errors stay within the D-NOSE range, especially when a large Quantization Parameter (QP) is used. These algorithms can be considered pre- or post-refinements of the depth video.

Many depth coding algorithms [16]–[20] have also been studied extensively. Pan et al. [16] exploited the mode correlation between the depth and its corresponding color video to reduce the number of mode candidates for fast depth coding. In [17], motion information, including motion vectors, modes, and reference frame indices, was shared between the color and depth videos to reduce depth coding bits. In [18], view synthesis prediction was adopted in depth coding to improve the accuracy of depth inter-view prediction.
Since depth maps are used for view synthesis in a 3D video system, some depth coding algorithms [19], [20] aim to maximize virtual image quality. Lee et al. [19] proposed a depth coding scheme that copied the collocated block pixels directly and transmitted only a flag when the corresponding color differences were sufficiently small. In [20], Kang et al. presented an intra prediction algorithm that refined intra prediction modes at depth edges, preserving boundaries for higher view synthesis quality. Since depth map quality is ultimately judged by view synthesis image quality, the distortion term of the Rate-Distortion (RD) cost function in depth coding should account for view synthesis distortion. To estimate the view synthesis distortion, linear models [21], [22] and a power model [23] were used to model the relation between depth distortion and view synthesis distortion. Chung et al. [24] estimated the view synthesis distortion in the frequency domain. Structural similarity was also introduced [25] to measure synthesized view distortion for better perceptual quality. These schemes can be regarded as global estimation schemes, since they are applied to the entire depth image. In [26], each depth map was divided into two kinds of regions according to their characteristics and their effects on view synthesis. Linear models with different slopes were then derived to guide region-selective depth coding. The statistical dependence between the depth error and the distortion of the synthesized virtual view changes from frame to frame. Yuan et al. [27] therefore derived a polynomial model and proposed a depth map coding method that modifies the Lagrangian multiplier at the frame level to maximize the quality of the synthesized images. To keep complexity low, these models estimate the view synthesis distortion without running the view synthesis process. In [27], partial re-rendering was performed to obtain a more accurate view synthesis distortion.
The quality of synthesized virtual view images depends not only on the quality of the depth images but also on the quality of the color images. Therefore, depth and color coding can be jointly optimized under a total bit rate constraint. In [29], joint bit allocation algorithms between the depth and color channels were presented to maximize the quality of rendered virtual view images under a total bit constraint. Hu et al. [30] presented a joint rate control scheme that maximizes the combined quality of both the original color videos and the rendered images. In [31], depth distortion and color distortion were shown to be additive in their global contribution to view synthesis distortion. Oh et al. [32] found that local depth distortion acts jointly with the corresponding color spatial texture when it is mapped to view synthesis distortion.

In our previous work [26], we exploited the region-selective properties of a depth map and divided it into two kinds of regions: Color Texture Area corresponding Depth (CTAD) and Color Smooth Area corresponding Depth (CSAD). Depth distortion in CSAD has less impact on the quality of the synthesized image than depth distortion in CTAD. Optimal QPs and Lagrangian multipliers were therefore adjusted to give higher priority to CTAD and lower priority to CSAD when allocating depth bits and applying coding techniques. Thus, [26] is a region-selective coding algorithm that encodes CSAD and CTAD differently. In this paper, by contrast, we consider that the number of depth levels is usually larger than the number of disparity levels in view synthesis. Several different depth values are therefore projected to one disparity, i.e., the mapping is many-to-one. This means that some small distortions in the depth map do not lead to any rendering position error, i.e., no-synthesis-error [15]. It also means that multiple different depth distortions can be projected to the same non-zero rendering position error.
These depth redundancies are called Allowable Depth Distortion (ADD) in view synthesis, and they exist at each depth value and pixel. In this work, we exploit the ADD redundancies by designing a new depth distortion criterion based on a piecewise function. We use it to optimize the RD cost function for mode decision and Motion Estimation/Disparity Estimation (ME/DE) at the macroblock (MB) level. We also exploit the ADD to reduce the depth coding bits.

The paper is organized as follows. In Section II, we derive a many-to-one function for the ADD in view synthesis that models the relationship between view synthesis distortion and depth distortion. Section III presents the ADD model for coding optimization and proposes two depth coding techniques based on it: a Rate Distortion Optimization (RDO) algorithm and a Depth Bit Reduction (DBR) algorithm. Section IV evaluates and compares the performance of the proposed algorithms under different settings. Finally, Section V concludes the paper.

II. ALLOWABLE DEPTH DISTORTION (ADD) IN DEPTH IMAGE BASED RENDERING

In [15], Zhao et al. proposed the D-NOSE model to exploit allowable depth distortion that causes no rendering position error in view synthesis, i.e., the no-synthesis-error case. However, [15] did not consider a second kind of allowable depth distortion, in which multiple different depth distortions are projected to the same non-zero rendering position error. In this section, we analyze the ADD redundancies in view synthesis, covering both cases.

In Depth Image Based Rendering (DIBR), the pixels of a virtual view image can be rendered from the pixels of its neighboring reference view images using the depth and the camera parameters [2]:

$$\mathbf{p}_2 = z_1 A_2 R_2 R_1^{-1} A_1^{-1}\mathbf{p}_1 - A_2 R_2 R_1^{-1}\mathbf{t}_1 + A_2\mathbf{t}_2, \qquad (1)$$

Here, $\mathbf{p}_2 = [a, b, c]^T$ and $\mathbf{p}_1 = [x, y, 1]^T$ are the two corresponding pixels in the rendered and real view images, respectively, and $z_1$ is the depth of $\mathbf{p}_1$. $A_1$ and $A_2$ are $3\times 3$ matrices holding the intrinsic parameters of the real and virtual cameras, respectively. $[R_1, \mathbf{t}_1]$ and $[R_2, \mathbf{t}_2]$ are the extrinsic parameters of the two cameras, where $R_1$ and $R_2$ are the rotation matrices and $\mathbf{t}_1$ and $\mathbf{t}_2$ are the translation vectors. The disparity offsets between $\mathbf{p}_1$ and $\mathbf{p}_2$ in the horizontal and vertical directions, $(d_x, d_y)$, can be calculated as

$$d_x = \left\lfloor x - \frac{a}{c} \right\rceil, \qquad d_y = \left\lfloor y - \frac{b}{c} \right\rceil, \qquad (2)$$

where $\lfloor\cdot\rceil$ is a rounding operation. The rounding depends on the disparity accuracy, i.e., the rendering precision of the view synthesis, and can be written as

$$f(x) = \frac{\left\lfloor (x + k_f)\, 2^m \right\rfloor}{2^m}, \qquad (3)$$

Here, $\lfloor\cdot\rfloor$ denotes the floor operation. $k_f = 2^{-m-1}$ is the compensation factor that rounds fractions down or up. $m$ is the rendering precision, equal to 0, 1, or 2 for integer-, half-, or quarter-pixel precision, respectively.
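The rounding in (2) and (3) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the view synthesis software used in the paper. `render_round` and `disparity_offsets` are hypothetical names, and (3) is read as rounding to the nearest multiple of $2^{-m}$ with $k_f = 2^{-m-1}$.

```python
import math


def render_round(x: float, m: int) -> float:
    """Round x to the nearest multiple of 2**-m, written in the form of Eq. (3):
    f(x) = floor((x + k_f) * 2**m) / 2**m with k_f = 2**-(m + 1).
    m = 0, 1, 2 selects integer-, half- or quarter-pixel precision."""
    if m not in (0, 1, 2):
        raise ValueError("rendering precision m must be 0, 1 or 2")
    k_f = 2.0 ** -(m + 1)  # half of one rendering step
    scale = 2 ** m
    return math.floor((x + k_f) * scale) / scale


def disparity_offsets(x, y, a, b, c, m):
    """Eq. (2): rounded offsets between p1 = [x, y, 1]^T and p2 = [a, b, c]^T,
    where p2 is normalized by its homogeneous coordinate c."""
    return render_round(x - a / c, m), render_round(y - b / c, m)


if __name__ == "__main__":
    # The same real-valued disparity snaps to different grids for m = 0, 1, 2.
    for m in (0, 1, 2):
        print(m, render_round(3.3, m))
    print(disparity_offsets(10.0, 5.0, 11.4, 10.0, 2.0, m=1))
```

For example, a real disparity of 3.3 pixels becomes 3.0, 3.5, or 3.25 at integer-, half-, or quarter-pixel precision. Finer precision therefore means more disparity levels.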
Suppose the virtual and real cameras are parallel to each other, well calibrated, and share the same intrinsic parameters, i.e., $A_1 = A_2$, $R_1 = R_2$, and $\mathbf{t}_1 - \mathbf{t}_2 = [L, 0, 0]^T$, where $L$ is the spacing of the camera array along the baseline. The vertical rendering disparity $d_y$ is then zero, and the horizontal rendering disparity $d_x$ is

$$d_x = \left\lfloor \frac{f_x L}{Z} \right\rceil, \qquad (4)$$

where $f_x$ is the horizontal focal length and $Z$ is the physical depth. For depth maps in MPEG-3DV, a non-linear quantization scheme converts the physical depth $Z$ into an $n$-bit depth value ranging from 0 to $2^n - 1$ [1], where $n$ is the bit width of the depth value. The inverse quantization from depth value $v$ to depth $Z$ is [1]

$$Z = Q^{-1}(v) = \frac{1}{\dfrac{v}{2^n - 1}\left(\dfrac{1}{Z_{near}} - \dfrac{1}{Z_{far}}\right) + \dfrac{1}{Z_{far}}}, \qquad (5)$$

where $Z_{near}$ and $Z_{far}$ are the distances from the camera to the nearest and farthest depth planes of the scene, respectively. Substituting (5) into (4) gives

$$d_x = \left\lfloor L f_x (C_1 v + C_2) \right\rceil, \qquad (6)$$

where $C_1 = \dfrac{1}{2^n - 1}\left(\dfrac{1}{Z_{near}} - \dfrac{1}{Z_{far}}\right)$ and $C_2 = \dfrac{1}{Z_{far}}$.

Fig. 1. Example of mapping depth value to disparity.

In a 3DV system, the depth video is encoded and its bitstream is transmitted to the clients for arbitrary virtual view rendering. In lossy depth coding, quantization introduces coding distortion. Suppose the encoder introduces a coding distortion $\Delta v$ to the depth value $v$. The decoded value $v + \Delta v$ is then used for view synthesis on the client's machine. Compared with rendering from the original depth $v$, the disparity difference $\Delta d_x$ is

$$\Delta d_x = g(v, \Delta v) = \left\lfloor L f_x (C_1 v + C_2) \right\rceil - \left\lfloor L f_x \big(C_1 (v + \Delta v) + C_2\big) \right\rceil. \qquad (7)$$

Because of the rounding operation $\lfloor\cdot\rceil$, $\Delta d_x$ sometimes stays the same even when $\Delta v$ changes, i.e., several different values of $\Delta v$ are mapped to one $\Delta d_x$. The relation between $\Delta d_x$ and $\Delta v$ is therefore a many-to-one mapping. Let $s$ be the dynamic disparity range (the number of levels from the maximum to the minimum disparity) between the rendered image and the reference image.
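The following is a minimal Python sketch of (5)–(7). It assumes a hypothetical parallel camera setup ($L = 1$, $f_x = 1000$, $Z_{near} = 40$, $Z_{far} = 120$, in arbitrary but consistent units) that does not correspond to any test sequence. It shows the many-to-one behavior directly: at $v = 0$, the first few depth distortions $\Delta v$ leave $\Delta d_x$ at zero.

```python
import math


def render_round(x: float, m: int) -> float:
    """Eq. (3): round x to the nearest multiple of 2**-m."""
    return math.floor((x + 2.0 ** -(m + 1)) * 2 ** m) / 2 ** m


def depth_coeffs(z_near: float, z_far: float, n: int = 8):
    """C1 and C2 of Eq. (6), obtained from the inverse quantization in Eq. (5)."""
    c1 = (1.0 / z_near - 1.0 / z_far) / (2 ** n - 1)
    c2 = 1.0 / z_far
    return c1, c2


def depth_to_z(v: int, z_near: float, z_far: float, n: int = 8) -> float:
    """Eq. (5): physical depth Z of an n-bit depth value v."""
    c1, c2 = depth_coeffs(z_near, z_far, n)
    return 1.0 / (c1 * v + c2)


def disparity(v, L, fx, z_near, z_far, n=8, m=0):
    """Eq. (6): rounded horizontal disparity for depth value v."""
    c1, c2 = depth_coeffs(z_near, z_far, n)
    return render_round(L * fx * (c1 * v + c2), m)


def g(v, dv, L, fx, z_near, z_far, n=8, m=0):
    """Eq. (7): disparity difference caused by the depth distortion dv."""
    return (disparity(v, L, fx, z_near, z_far, n, m)
            - disparity(v + dv, L, fx, z_near, z_far, n, m))


if __name__ == "__main__":
    cam = dict(L=1.0, fx=1000.0, z_near=40.0, z_far=120.0)  # hypothetical setup
    # v = 0 maps to Z_far and v = 255 maps to Z_near.
    print(depth_to_z(0, 40.0, 120.0), depth_to_z(255, 40.0, 120.0))
    # Several consecutive depth distortions share the same disparity difference.
    print([g(0, dv, **cam) for dv in range(6)])
```

For this setup, $\Delta v = 0, 1, 2$ all give $\Delta d_x = 0$, and $\Delta v = 3, 4, 5$ all give the same one-pixel shift. This is the behavior that the ADD model exploits.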
Therefore, the number of disparity levels at a given rendering precision $m$ is $s \cdot 2^m$, while the number of depth levels (i.e., the number of levels of $v$) is $2^n$. We define a mapping coefficient $C_{MM}$ as

$$C_{MM} = \frac{2^n}{s \cdot 2^m}, \qquad (8)$$

where $C_{MM}$ is the average number of depth levels corresponding to the same disparity level. In the latest 3D depth representation for 3D video, $n$ is usually 8 [1]. The number of disparity levels ($s$) is usually less than 30, and most of them are smaller than 10. In this case, with half-pixel rendering (i.e., $m = 1$), $C_{MM}$ ranges from 4 to 12. This implies that 4 to 12 different values of $v$ are mapped to one disparity $d_x$. As shown in Fig. 1, multiple points $v + \Delta v$ with $\Delta v \in [-\Delta v^-, \Delta v^+]$, rather than a single $v$, are mapped to the same disparity $d$. Here, $\Delta v$ is the depth distortion of depth value $v$, and $-\Delta v^-$ and $\Delta v^+$ are the lower and upper bounds of $\Delta v$.
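As a rough numerical check of (8), the hypothetical Python sketch below evaluates $C_{MM}$ for an illustrative camera setup ($L = 1$, $f_x = 1000$, $Z_{near} = 40$, $Z_{far} = 120$). It also counts how many of the 256 depth values fall on each half-pixel disparity level. The empirical average is close to $C_{MM}$ but not identical, because the two end levels are only partially filled.

```python
import math
from collections import Counter


def c_mm(n: int, s: float, m: int) -> float:
    """Eq. (8): average number of depth levels per disparity level."""
    return 2 ** n / (s * 2 ** m)


def disparity_level_counts(L, fx, z_near, z_far, n=8, m=1):
    """Number of the 2**n depth values that land on each rounded disparity
    level (level index = rounded disparity * 2**m), using Eqs. (3), (5), (6)."""
    c1 = (1.0 / z_near - 1.0 / z_far) / (2 ** n - 1)
    c2 = 1.0 / z_far
    step = 2 ** m
    return Counter(
        math.floor((L * fx * (c1 * v + c2) + 0.5 / step) * step)
        for v in range(2 ** n)
    )


if __name__ == "__main__":
    L, fx, z_near, z_far = 1.0, 1000.0, 40.0, 120.0   # hypothetical camera setup
    s = L * fx * (1.0 / z_near - 1.0 / z_far)        # disparity range in pixels
    counts = disparity_level_counts(L, fx, z_near, z_far, m=1)
    print("C_MM from Eq. (8):", round(c_mm(8, s, 1), 2))
    print("empirical depth levels per disparity level:", round(256 / len(counts), 2))
```

The formula only depends on the ratio of depth levels to disparity levels, which is why the sketch treats $s$ as a real number rather than forcing it to an integer.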

Two important features follow from this many-to-one mapping. First, suppose both the original depth value $v$ and the distorted depth value $v + \Delta v$ lie within the ADD range $[v - \Delta v^-, v + \Delta v^+]$, i.e., $\Delta v \in [-\Delta v^-, \Delta v^+]$. Then the disparity difference $\Delta d_x$ in (7) is zero, i.e., there is no synthesis error. Second, if a depth distortion $\Delta v_k$ causes a non-zero disparity difference $\Delta d_x$, there is an allowable distortion range around $\Delta v_k$, $\Delta v \in [\Delta v_k - \Delta v^-, \Delta v_k + \Delta v^+]$, that leads to the same $\Delta d_x$. Depth distortion that meets either condition does not change $\Delta d_x$ and is thus allowable in view synthesis. We therefore call it the allowable depth distortion in view synthesis and denote it ADD.

The upper bound $\Delta v^+$ is the rightmost value of $\Delta v$ that is still mapped to $d$; its right neighbor is mapped to the next disparity level, $d + 1/2^m$. Therefore, for a given depth value $v$, the upper bound $\Delta v^+$ is the value that satisfies

$$\left|g(v, \Delta v^+ + 1)\right| = \frac{1}{2^m}, \quad g(v, \Delta v^+) = 0, \quad v + \Delta v^+ \in [0, 2^n - 1].$$

Similarly, the lower bound $\Delta v^-$ is the value that satisfies

$$\left|g(v, -\Delta v^- - 1)\right| = \frac{1}{2^m}, \quad g(v, -\Delta v^-) = 0, \quad v - \Delta v^- \in [0, 2^n - 1].$$

Fig. 2. ADD for different 3D video sequences.

Fig. 3. Flowchart of analyzing depth effect in view synthesis.

The lower bound of $v$, $v - \Delta v^-_v$, equals the upper bound of $v_{-1}$ plus 1, i.e., $v - \Delta v^-_v = v_{-1} + \Delta v^+_{v_{-1}} + 1$, where $v_{-1}$ denotes a depth value mapped to the preceding disparity level. The number of values of $\Delta v$ mapped to one $\Delta d_x$ is called the ADD depth interval $W_{DI}$, which equals $\Delta v^+ + \Delta v^- + 1$. For the parallel camera setting, $W_{DI}$ is the $\Delta v$ at which $L f_x C_1 \Delta v$ approaches $1/2^m$ [15]:

$$W_{DI} = \Delta v : L f_x C_1 \Delta v \to \frac{1}{2^m}, \qquad (9)$$

where $x \to x_0$ indicates that the variable $x$ approaches the constant $x_0$. Therefore, $W_{DI}$ is an integer that is generally independent of $v$, and it can be expressed as

$$W_{DI} = \left\lfloor \frac{1}{2^m L f_x C_1} - \zeta \right\rfloor + 1, \qquad (10)$$

where $\zeta$ is a positive constant approaching 0. Because of $\zeta$, $W_{DI}$ may differ slightly between depth values $v$.
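The bounds $\Delta v^-$ and $\Delta v^+$ can also be found by brute force: start from $v$ and step down and up until the rounded disparity changes. The Python sketch below does this for a hypothetical camera setup and compares the level sizes with the closed form (10) as written here, using a tiny $\zeta$. For this setup, interior levels hold 15 or 16 depth values, while (10) returns 16. This shows how $W_{DI}$ can differ by one from level to level. Function names are illustrative only.

```python
import math
from collections import Counter


def disparity_level(v, L, fx, z_near, z_far, n=8, m=0):
    """Integer index of the rounded disparity of depth value v (Eqs. (3), (6));
    the disparity itself is this index divided by 2**m."""
    c1 = (1.0 / z_near - 1.0 / z_far) / (2 ** n - 1)
    c2 = 1.0 / z_far
    step = 2 ** m
    return math.floor((L * fx * (c1 * v + c2) + 0.5 / step) * step)


def add_range(v, level, n=8):
    """(dv_minus, dv_plus): how far v can move down or up while staying on the
    same disparity level, clipped to the depth range [0, 2**n - 1]."""
    d = level(v)
    lo = hi = v
    while lo > 0 and level(lo - 1) == d:
        lo -= 1
    while hi < 2 ** n - 1 and level(hi + 1) == d:
        hi += 1
    return v - lo, hi - v


def interior_widths(level, n=8):
    """Depth values per disparity level, excluding the two end levels,
    which are truncated by the depth range."""
    counts = Counter(level(v) for v in range(2 ** n))
    keys = sorted(counts)
    return [counts[k] for k in keys[1:-1]]


def w_di_closed_form(L, fx, z_near, z_far, n=8, m=0, zeta=1e-9):
    """Eq. (10): floor(1 / (2**m * L * fx * C1) - zeta) + 1."""
    c1 = (1.0 / z_near - 1.0 / z_far) / (2 ** n - 1)
    return math.floor(1.0 / (2 ** m * L * fx * c1) - zeta) + 1


if __name__ == "__main__":
    cam = dict(L=1.0, fx=1000.0, z_near=40.0, z_far=120.0)  # hypothetical setup

    def level(v):
        return disparity_level(v, m=0, **cam)

    dv_minus, dv_plus = add_range(100, level)
    print("v = 100:", dv_minus, dv_plus, "W_DI =", dv_minus + dv_plus + 1)
    print("interior level sizes:", sorted(set(interior_widths(level))))
    print("Eq. (10):", w_di_closed_form(m=0, **cam))
```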
Equation (10) shows that the depth interval $W_{DI}$ depends on the camera setup. It decreases as the view synthesis precision ($m$), the focal length ($f_x$), and the camera baseline ($L$) increase. It also changes when the nearest or farthest depth planes change. Fig. 2 shows the depth interval $W_{DI}$ for 3DV sequences when integer-pixel rendering precision ($m = 0$) is used in view synthesis. Details of the test sequences are given in Section IV. $W_{DI}$ is sequence dependent: it ranges from 6 to 25, and most values are larger than 10. This implies considerable redundancy in the depth.

III. ADD MODEL AND DEPTH CODING OPTIMIZATIONS

A. Proposed ADD Model for Depth Coding

The quantization error from uniform scalar quantization in video coding can be modeled as White Noise (WN) [33]. We therefore add zero-mean WN to the depth map to analyze the ADD and its impact on view synthesis. Fig. 3 shows the flowchart of this analysis. In this paper, we assume that the color and depth videos are encoded separately. We optimize depth coding while the color video is either original or already encoded, so the color video can be regarded as unchanged by depth coding or processing. Therefore, WN is added only to the depth videos, and the color videos are identical in the two rendering processes. We use three WN injection patterns:

1) Global WN injection: white noise with different variances is added to the entire depth map.
2) Global WN injection with ADD control: the WN magnitude is clipped to the ADD range of each pixel, i.e., $\Delta v \in [-\Delta v^-, \Delta v^+]$.
3) Sequential injection: the two independent WNs above are injected one after the other.

Fig. 4 shows the relationship between depth distortion ($D_d$) and view synthesis distortion ($D_{VS}$) with and without ADD control. The x-axis is the average Mean Squared Error (MSE) of the depth images, and the y-axis is the MSE of the synthesized virtual view images.
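The three WN injection patterns can be sketched as below. This is a hypothetical Python illustration with the following assumptions:

- The noise is Gaussian.
- The depth map is a flat list of 8-bit values.
- `level` is any function that maps a depth value to its disparity level.

The demo uses a toy mapping in which every 12 consecutive depth values share one level ($W_{DI} = 12$). In the third pattern, the ADD range is taken around the once-distorted value, following the second feature of the ADD described in Section II.

```python
import random


def clip(x, lo, hi):
    return max(lo, min(hi, x))


def add_bounds(v, level, vmax=255):
    """[v - dv_minus, v + dv_plus]: all depth values on the same level as v."""
    d = level(v)
    lo = hi = v
    while lo > 0 and level(lo - 1) == d:
        lo -= 1
    while hi < vmax and level(hi + 1) == d:
        hi += 1
    return lo, hi


def inject_wn(depth, variance, rng, vmax=255):
    """Pattern 1: zero-mean white noise (Gaussian here) added to every pixel."""
    sigma = variance ** 0.5
    return [clip(round(v + rng.gauss(0.0, sigma)), 0, vmax) for v in depth]


def inject_wn_add(depth, variance, level, rng, vmax=255):
    """Pattern 2: the same noise, clipped to each pixel's ADD range so that
    the disparity level (hence the rendering position) does not change."""
    sigma = variance ** 0.5
    out = []
    for v in depth:
        lo, hi = add_bounds(v, level, vmax)
        out.append(clip(round(v + rng.gauss(0.0, sigma)), lo, hi))
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(0)

    def level(v):
        return v // 12  # toy mapping, W_DI = 12 (hypothetical)

    depth = [rng.randrange(256) for _ in range(10000)]
    type1 = inject_wn(depth, 10.0, rng)
    type2 = inject_wn_add(depth, 10.0, level, rng)
    type3 = inject_wn_add(type1, 10.0, level, rng)  # pattern 1, then pattern 2
    print(mse(depth, type1), mse(depth, type2), mse(depth, type3))
```

Clipping the noisy value rather than redrawing it keeps the sketch simple. A real experiment might prefer resampling so that the noise stays white within the allowed range.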
NoADD denotes the $D_{VS}$&$D_d$ relation when zero-mean WN with different variances is injected without control. For ADD_0 to ADD_15, zero-mean WN with variance 0, 5, 10, or 15 is first added to the depth to simulate the depth coding distortion caused by quantization. Another independent WN with different variances is then added under ADD control, with all of its distortion clipped to $[-\Delta v^-, \Delta v^+]$. Connecting the rightmost points of ADD_0 to ADD_15 gives the blue curve with star symbols. This blue line is an upper bound: no further distortion (x-axis) can be added to the depth without changing the virtual view image quality (y-axis).

Fig. 4. Relationship between $D_d$ and $D_{VS}$ with and/or without ADD control. (a) Balloons, (b) Doorflowers.

Fig. 5. RD analysis on the ADD based depth coding.

Fig. 4 shows the following:
1) $D_{VS}$ increases monotonically with $D_d$ on the NoADD curves, and the relation is generally linear or logarithmic. The slopes differ across sequences and usually increase as the video texture becomes more complex.
2) The blue upper bound curve is almost parallel to the red NoADD curve.
3) Curves ADD_0 to ADD_15 are almost horizontal: $D_{VS}$ barely changes as $D_d$ changes. Depth distortion added under ADD control therefore has little effect on $D_{VS}$. Such distortion causes no error in 3D warping. However, depth-dependent hole filling and merging after rendering still cause some differences in the synthesized virtual view [15]. This effect is small in the simulation, and the slopes of ADD_0 to ADD_15 are approximately zero.
4) ADD_0 to ADD_15 are four parallel lines between the NoADD curve and the upper bound. If the variance of the first-round WN injection is varied continuously, the $D_{VS}$&$D_d$ points cover the whole area between the NoADD curve and the upper bound.

Similar results are obtained when distortion is measured with Mean Absolute Difference (MAD).
From the $D_{VS}$&$D_d$ observations above, we derive a new model between depth bit rate ($R$) and view synthesis distortion ($D_{VS}$), i.e., an $R$&$D_{VS}$ model, by combining the $D_{VS}$&$D_d$ and $R$&$D_d$ relations. Fig. 5 sketches the ADD based RD model. The x-axis is the depth distortion ($D_d$), the positive y-axis is the view synthesis distortion ($D_{VS}$), and the negative y-axis is the depth bit rate ($R$). The first quadrant shows the $D_{VS}$&$D_d$ relationship, and the fourth (bottom) quadrant shows the $R$&$D_d$ relationship.

In the first quadrant, the red line is the linear $D_{VS}$&$D_d$ relationship without ADD control, obtained from the NoADD curve in Fig. 4 by linear approximation. The red line is also the lower bound of the $D_{VS}$&$D_d$ relation. A point on the red line means the performance is almost the same as that of the original encoder. A point in the region to the left of the red line means the depth coding performance may degrade. The angle between the red line and the x-axis is denoted $\theta$, so the slope of the red line is $\tan\theta$. The blue line is parallel to the red line and is the upper bound of $D_{VS}$&$D_d$ under ADD control. It is obtained from the blue upper bound curve in Fig. 4 by linear approximation. A $D_{VS}$&$D_d$ point on the blue line has smaller view synthesis distortion than the red line at the same bit rate or depth distortion. Equivalently, it needs fewer bits for the same view synthesis distortion. The yellow region between the red and blue lines contains the $D_{VS}$&$D_d$ points that some coding or processing scheme could achieve. To make this precise, we define the set $\{R(K), D_d(K), D_{VS}(K)\}$ as the RD performance of an algorithm at point $K \in \{A, B, C, D, E, F, G\}$ in Fig. 5. Here $R(K)$ is the depth coding bit rate, $D_d(K)$ is the depth distortion, and $D_{VS}(K)$ is the view synthesis distortion.

Suppose the depth video is encoded with a traditional depth coding algorithm, such as the original Joint Multiview Video Coding (JMVC) reference software. Let its depth bit rate be $R(A)$ and its depth distortion $D_d(A)$, where $D_d(A) = D_d(C)$. With ADD control disabled, the linear $D_{VS}$&$D_d$ relationship maps $D_d(C)$ to $D_{VS}(C)$. The set $\{R(A), D_d(C), D_{VS}(C)\}$ thus describes the RD performance of the traditional algorithm. Encoding with the same encoder but different coding parameters, e.g., larger QPs, gives another set $\{R(B), D_d(D), D_{VS}(D)\}$, where $D_d(D) = D_d(B)$. Moving from the first set to the second saves $\Delta R = R(A) - R(B)$ in bit rate. The cost is a depth quality degradation of $\Delta D_d = D_d(D) - D_d(C)$ and a view synthesis quality degradation of $\Delta D_{VS} = D_{VS}(D) - D_{VS}(C)$. Both sets describe the same depth encoder.

Since depth video is used to synthesize virtual view images, depth coding optimization should account for view synthesis distortion. With the ADD model, the depth distortion can be allocated according to the ADD information, moving the operating point from C into the inner yellow region, e.g., to point F. The view synthesis distortion difference $D_{VS}(F) - D_{VS}(C)$ is smaller than $D_{VS}(D) - D_{VS}(C)$. View synthesis distortion is therefore reduced while the same bit rate reduction $\Delta R = R(A) - R(B)$ is kept. Hence F has better RD performance than C and D. In fact, different optimization techniques may move point C to E, F, G, or H, all of which are better than C in terms of $R$&$D_{VS}$ performance. A move from C to G means the ADD based optimization reduces the view synthesis distortion from $D_{VS}(C)$ to $D_{VS}(G)$ at the same bit rate $R(A)$.
A move from C to E means both the view synthesis distortion and the bit rate are reduced. The closer the end point (G, E, F, or H) is to the blue line, the better the RD performance. The blue line is the upper bound of RD performance for ADD based optimization techniques. The maximum potential Peak Signal-to-Noise Ratio (PSNR) gain at the same bit rate is reached when the $D_{VS}$&$D_d$ point lies on this upper bound:

$$\Delta PSNR_{VS,\max} = 10\log_{10}\frac{D_{VS} + \Delta D_{VS,\max}}{D_{VS}}, \qquad (11)$$

where $D_{VS}$ is the view synthesis distortion of the original coding scheme and $\Delta D_{VS,\max}$ is the additional view synthesis distortion reduction achieved by a new scheme. From the NoADD $D_{VS}$&$D_d$ relationship, the maximum view synthesis distortion difference is $\Delta D_{VS,\max} = \Delta D_{d,\max}\tan\theta$. Here $\Delta D_{d,\max}$ is the MSE of $\Delta v_{ij}$, where $\Delta v_{ij} \in [-\Delta v^-, \Delta v^+]$ is an additional quantization error at position $(i, j)$. In the new scheme, the maximum PSNR is reached when $\Delta v_{ij}$ takes its maximum or minimum value, $\Delta v^+$ or $-\Delta v^-$. Assume $\Delta v^- = \Delta v^+$, so both are approximately half the depth interval, $W_{DI}/2$. Then $\Delta D_{d,\max}$ is the MSE of $\Delta v_{ij} = \pm W_{DI}/2$, i.e., $\Delta D_{d,\max} = W_{DI}^2/4$. Substituting this into (11) gives the maximum potential PSNR gain:

$$\Delta PSNR_{VS,\max} = 10\log_{10}\frac{D_{VS} + \left(W_{DI}^2/4\right)\tan\theta}{D_{VS}}. \qquad (12)$$

Take the Doorflowers sequence as an example. $W_{DI}$ is 12. From the NoADD curve in Fig. 4(b), $\tan\theta$ is approximately 0.35, measured as the ratio of the y-axis dynamic range to the x-axis dynamic range. $D_{VS}$ is 6.5 when the original PSNR of the synthesized image is 40 dB. The potential gain $\Delta PSNR_{VS,\max}$ is therefore up to 4.68 dB, which is a large coding gain.

In color video coding, the objective is to minimize distortion under a bit rate constraint. In depth coding, however, the depth is used for virtual view synthesis. The objective is thus to minimize the view synthesis distortion $D_{VS}$ under a depth bit rate constraint, or equivalently, to minimize the depth bit rate for a given view synthesis image quality.
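Plugging the Doorflowers numbers into (12) reproduces the quoted bound. The sketch assumes 8-bit images (peak value 255) when converting the 40 dB PSNR to an MSE. This gives the $D_{VS} \approx 6.5$ used in the text.

```python
import math


def mse_from_psnr(psnr_db: float, peak: float = 255.0) -> float:
    """Invert PSNR = 10 * log10(peak**2 / MSE); peak = 255 assumes 8-bit images."""
    return peak ** 2 / 10 ** (psnr_db / 10.0)


def max_psnr_gain(d_vs: float, w_di: int, tan_theta: float) -> float:
    """Eq. (12): upper bound of the PSNR gain when every extra depth error is
    +/- W_DI/2, so the extra depth MSE is W_DI**2 / 4."""
    delta_d_vs = (w_di ** 2 / 4.0) * tan_theta  # Delta D_VS,max = Delta D_d,max * tan(theta)
    return 10.0 * math.log10((d_vs + delta_d_vs) / d_vs)


if __name__ == "__main__":
    print(round(mse_from_psnr(40.0), 4))            # 6.5025
    print(round(max_psnr_gain(6.5, 12, 0.35), 2))   # 4.68
```

The bound grows with the square of $W_{DI}$. Sequences with wide depth intervals therefore have far more headroom than those near $W_{DI} = 6$.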
In other words, the RD model for depth coding optimization should be an $R$&$D_{VS}$ model rather than an $R$&$D_d$ model. In the following subsections, we first build the mathematical relation between $D_{VS}$ and $D_d$ with ADD redundancies. We then present a new ADD based RD model that minimizes $D_{VS}$ in variable block size mode decision, multi-reference frame selection, and ME/DE. Finally, we present an ADD based DBR scheme that further reduces the depth bit rate while maintaining $D_{VS}$.

B. ADD Based Distortion Model

Depth distortion ($D_d$) causes rendering position error ($D_r$) in view synthesis, and $D_r$ in turn causes view synthesis distortion ($D_{VS}$). To analyze the $D_{VS}$&$D_d$ relation and the ADD in view synthesis, we therefore split the analysis into two steps: the $D_{VS}$&$D_r$ relationship and the $D_r$&$D_d$ relationship.

When uncompressed depth maps are used in view synthesis, the virtual view image $I_{V,D_{org}}$ is projected from the pixels of the reference color image $I_T$ with disparity $\mathbf{d}$, i.e., $I_{V,D_{org}} = I_T(\mathbf{d})$. Here $\mathbf{d}$ is a 2D disparity map between $I_T$ and $I_{V,D_{org}}$. It can be written as $\mathbf{d} = \{(d_x(i,j), d_y(i,j)) \mid i \in [0, M),\ j \in [0, N)\}$, where $d_x(i,j)$ and $d_y(i,j)$ are the horizontal and vertical disparities at $(i,j)$, and $M$ and $N$ are the width and height of $I_T$. When distorted depth maps are used, the virtual view image $I_{V,D_{rec}}$ is also projected from $I_T$, but with a different disparity $\mathbf{d} + \Delta\mathbf{r}$. Here $\Delta\mathbf{r} = \{(\Delta r^x_{ij}, \Delta r^y_{ij}) \mid i \in [0, M),\ j \in [0, N)\}$ is the 2D rendering position error caused by the depth distortion, with $\Delta r^x_{ij}$ and $\Delta r^y_{ij}$ the horizontal and vertical errors at $(i,j)$. Thus, the new virtual view image is $I_{V,D_{rec}} = I_T(\mathbf{d} + \Delta\mathbf{r})$. The difference map $D_V$ between the two synthesized virtual view images is then

$$D_V = I_{V,D_{org}} - I_{V,D_{rec}} = I_T(\mathbf{d}) - I_T(\mathbf{d} + \Delta\mathbf{r}). \qquad (13)$$

So the synthesized image difference $D_V$ caused by depth distortion can be expressed as the difference between neighboring pixels in the reference color image $I_T$ [31]. The average view synthesis distortion $D_{VS}$ is therefore

$$D_{VS} = \frac{1}{MN}\sum_{i,j}\left|I_T(i,j) - I_T\!\left(i + \Delta r^x_{ij},\ j + \Delta r^y_{ij}\right)\right|^{\beta}, \qquad (14)$$

where $I_T(i,j)$ is the pixel value of $I_T$ at $(i,j)$, and $\beta$ is 1 for MAD and 2 for MSE.

Fig. 6. Statistical $D_{VS}$&$D_r$ relationship in terms of MSE and MAD. (a) MSE. (b) MAD.

Fig. 7. Piecewise relation between depth distortion and rendering position error.

For parallel camera settings, the disparity is mainly horizontal or vertical, i.e., one of $\Delta r^x_{ij}$ and $\Delta r^y_{ij}$ is approximately zero. To analyze the relation between $D_{VS}$ and rendering position error in (14), we therefore set each $(\Delta r^x_{ij}, \Delta r^y_{ij})$ randomly to one of four pairs, $\{(\Delta r_{ij}, 0), (-\Delta r_{ij}, 0), (0, \Delta r_{ij}), (0, -\Delta r_{ij})\}$, with $\Delta r_{ij} \in \{1, 2, 3, 4, 5, 6, 7\}$, and computed $D_{VS}$. Seven different 3DV sequences were tested. Fig. 6 plots the $D_{VS}$&$D_r$ relationship. The x-axis is the MSE or MAD of $\Delta r_{ij}$, i.e., $D_r$, and the y-axis is the view synthesis distortion $D_{VS}$ measured with MSE or MAD. The symbols mark collected data, and the dash-dot lines are linear fits to the data. Goodness of fit is measured by the correlation coefficient; values closer to 1 indicate a better fit. In Fig. 6, the average correlation coefficients of the linear fits are … and … for MSE and MAD, respectively, which indicates that the data and the fits are highly correlated. We conclude that $D_{VS}$ is linearly related to $D_r$, and model it as [26]

$$D_{VS} = K_1 D_r + K_2, \qquad (15)$$

where $D_r = \frac{1}{MN}\sum_{i,j}|\Delta r_{ij}|^{\beta}$ is the MSE or MAD of the rendering position error $\Delta r_{ij}$. $K_1$ and $K_2$ are constants. $K_1$ is correlated with the color texture and usually increases as the texture gets more complex. It follows from (15) that the linear relationship also holds when $D_{VS}$ and $D_r$ are measured with the Sum of Absolute Differences (SAD) or the Sum of Squared Differences (SSD).
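Equation (14) and the random rendering-position-error experiment can be sketched in Python as follows. This is an illustration only, with these assumptions:

- The image is a plain list of rows, with `img[j][i]` holding $I_T(i,j)$.
- Shifted positions outside the image are clamped to the border, since (14) does not specify boundary handling.
- The synthetic texture merely stands in for a real reference image.

```python
import random


def view_synthesis_distortion(img, rx, ry, beta):
    """Eq. (14): mean of |I_T(i, j) - I_T(i + rx_ij, j + ry_ij)|**beta.
    img[j][i] holds I_T(i, j) (i: column, j: row); shifted positions outside
    the image are clamped to the border (an assumption)."""
    height, width = len(img), len(img[0])
    total = 0.0
    for j in range(height):
        for i in range(width):
            ii = min(width - 1, max(0, i + rx[j][i]))
            jj = min(height - 1, max(0, j + ry[j][i]))
            total += abs(img[j][i] - img[jj][ii]) ** beta
    return total / (width * height)


def random_position_error(width, height, r, rng):
    """Draw each (rx_ij, ry_ij) uniformly from {(r,0), (-r,0), (0,r), (0,-r)}."""
    choices = [(r, 0), (-r, 0), (0, r), (0, -r)]
    rx = [[0] * width for _ in range(height)]
    ry = [[0] * width for _ in range(height)]
    for j in range(height):
        for i in range(width):
            rx[j][i], ry[j][i] = rng.choice(choices)
    return rx, ry


if __name__ == "__main__":
    rng = random.Random(0)
    w, h = 64, 64
    # Synthetic texture, a hypothetical stand-in for a reference color image I_T.
    img = [[(3 * i + 5 * j) % 256 for i in range(w)] for j in range(h)]
    for r in range(1, 8):
        rx, ry = random_position_error(w, h, r, rng)
        # Every pixel is displaced by exactly r, so D_r (MSE) = r**2.
        print(r, r * r, view_synthesis_distortion(img, rx, ry, beta=2))
```

Fitting $D_{VS}$ against $D_r$ from such runs on real sequences is how the linear model (15) is obtained. The synthetic texture here says nothing about the constants $K_1$ and $K_2$.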
Based on the ADD analyses in Section II and Subsection III-A, an example of the relationship between the rendering position error and the depth distortion is illustrated in Fig. 7. The x-axis is the depth distortion at position $(i,j)$, $\Delta v_{ij}$, and the y-axis is the rendering position error at $(i,j)$ in the rendered image, $\Delta r_{ij}$. Projecting the depth distortion $\Delta v_{ij}$ to the rendering position error $\Delta r_{ij}$ is a many-to-one mapping, and this mapping need not be symmetric about the origin. Mathematically, $\Delta r_{ij}$ can be expressed as

$$\Delta r_{ij} = \begin{cases} \left\lceil \dfrac{\Delta v_{ij} - v^{+}_{ij}}{W_{DI}} \right\rceil, & \Delta v_{ij} > v^{+}_{ij} \\[2mm] 0, & -v^{-}_{ij} \le \Delta v_{ij} \le v^{+}_{ij} \\[2mm] -\left\lceil \dfrac{-\left(\Delta v_{ij} + v^{-}_{ij}\right)}{W_{DI}} \right\rceil, & \Delta v_{ij} < -v^{-}_{ij} \end{cases} \tag{16}$$

where $\lceil\cdot\rceil$ is the ceiling operation and $\Delta v_{ij}$ is the difference between the original depth value $v_{ij}$ and the reconstructed depth value $\tilde{v}_{ij}$ at $(i,j)$. If $v^{+}_{ij}$ and $v^{-}_{ij}$ are zero, $W_{DI}$ equals 1 and $\Delta r_{ij}$ equals $\Delta v_{ij}$; in this case, the ADD-based distortion metric is identical to the traditional distortion metric. By definition, the depth distortion $D_d$ and the rendering position error $D_r$ are the MAD or MSE of $\Delta v_{ij}$ and $\Delta r_{ij}$, respectively. Therefore, the $D_{VS}$-$D_d$ relationship is implicitly revealed by combining (15) and (16).

C. ADD Based RD Model for Mode Decision and ME/DE

The RD model in an H.264/AVC-based video codec can be expressed as [26], [34]

$$R(D) = k \ln\left(\frac{\sigma^2}{D}\right), \tag{17}$$

where $D$ is the output distortion, $\sigma^2$ is the variance of an input picture, and $k$ is a constant. Taking the derivative of $R(D)$ with respect to $D$ and setting it to $-1/\lambda_{MODE}$ yields [26]

$$\frac{dR(D)}{dD} \triangleq -\frac{1}{\lambda_{MODE}}. \tag{18}$$

Substituting (17) into (18), we obtain the optimal Lagrangian multiplier [26]

$$\lambda_{MODE} = \frac{D}{k}. \tag{19}$$

Since the depth video can be treated as the Y component of a color video and coded by a traditional hybrid H.264/AVC-based coding standard, the model in (17) also applies to depth coding. Thus, the RD model for depth coding is

$$R_d(D_d) = k_d \ln\left(\frac{\sigma_d^2}{D_d}\right), \tag{20}$$

where $D_d$ is the output depth distortion, $\sigma_d^2$ is the variance of an input depth picture, and $k_d$ is a constant. However, since the reconstructed depth video is used for virtual view rendering, the view synthesis distortion $D_{VS}$ must be taken into account in the new RD model, whereas the compressed depth bit rate $R_d$ is what is actually transmitted. To calculate the new Lagrangian multiplier $\lambda_{VS}$ for view-synthesis-oriented video coding, we take the derivative of $R_d$ with respect to $D_{VS}$ and set it to $-1/\lambda_{VS}$:

$$\frac{dR_d}{dD_{VS}} = \frac{dR_d(D_d)/dD_d}{dD_{VS}/dD_d} \triangleq -\frac{1}{\lambda_{VS}}. \tag{21}$$

According to (16), the rendering position error $\Delta r_{ij}$ can be rewritten as

$$\Delta r_{ij} = \frac{1}{W_{DI}} \Delta v_{ij} + \varepsilon_{ij}, \tag{22}$$

where $\varepsilon_{ij}$ is a zero-mean, uniformly distributed rounding error. By the Law of Large Numbers (LLN), the average of the samples approaches their mathematical expectation when the number of samples is large. When the distortion is measured with MSE, $D_r$ can be expressed as

$$D_r \approx E\left(\Delta r^2\right) = \left(\frac{1}{W_{DI}}\right)^2 E\left(\Delta v^2\right) + 2\,\frac{1}{W_{DI}}\, E\left(\Delta v\,\varepsilon\right) + E\left(\varepsilon^2\right), \tag{23}$$

where $E(\cdot)$ is the mathematical expectation. The depth distortion $\Delta v$ and the rounding error $\varepsilon$ can be regarded as independent variables in the coding process, so $E(\Delta v\,\varepsilon) = E(\Delta v)E(\varepsilon)$. Both the quantization error $\Delta v_{ij}$ and the rounding error $\varepsilon_{ij}$ can be regarded as zero-mean [31]; hence $E(\Delta v\,\varepsilon) = 0$, since $E(\Delta v) = 0$ and $E(\varepsilon) = 0$. Therefore, (23) becomes

$$D_r = \frac{1}{W_{DI}^2} D_d + E\left(\varepsilon^2\right), \tag{24}$$

where $D_d$ and $D_r$ are measured with MSE. Since $E(\varepsilon^2)$ is independent of $D_d$, its derivative $\partial E(\varepsilon^2)/\partial D_d = 0$.
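The piecewise mapping (16), which (22) approximates linearly, can be sketched in Python as follows. It follows the form of (16) given above, with $v^{-}_{ij}$ taken as a non-negative magnitude; in the paper the per-pixel bounds and $W_{DI}$ come from Section II and (10), whereas here they are plain integer arguments and the example values are hypothetical. Integer ceiling division avoids floating-point rounding at interval boundaries.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def rendering_position_error(dv, v_plus, v_minus, w_di):
    """Piecewise mapping (16) from depth distortion dv to rendering position error.

    dv      -- depth distortion v - v_rec, in depth levels
    v_plus  -- allowable positive depth distortion v+ at this pixel (>= 0)
    v_minus -- allowable negative depth distortion magnitude v- (>= 0)
    w_di    -- depth interval width W_DI (depth levels per disparity level)
    """
    if dv > v_plus:
        return ceil_div(dv - v_plus, w_di)
    if dv < -v_minus:
        return -ceil_div(-dv - v_minus, w_di)
    return 0  # inside the allowable range: no rendering position error


if __name__ == "__main__":
    # Hypothetical pixel with W_DI = 4, v+ = 1, v- = 2 (asymmetric, as Fig. 7 allows).
    print([rendering_position_error(dv, 1, 2, 4) for dv in range(-7, 8)])
    # [-2, -1, -1, -1, -1, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
```

The output shows the many-to-one behavior of Fig. 7: the four distortions from -2 to 1 cost nothing, and each further run of $W_{DI} = 4$ depth levels moves the rendered pixel by one position. With $v^{+} = v^{-} = 0$ and $W_{DI} = 1$ the function returns $\Delta v$ unchanged, matching the degenerate case noted after (16).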
Therefore, for mode decision, we apply (15) to (21), which can then be rewritten as

$$\frac{-k_d/D_d}{d\left(K_1 D_r + K_2\right)/dD_d} \triangleq -\frac{1}{\lambda^{VS}_{MODE}}, \tag{25}$$

where $\lambda^{VS}_{MODE}$ is the Lagrangian multiplier for mode decision. Applying (24) to (25), we obtain

$$\lambda^{VS}_{MODE} = \frac{K_1 D_d}{k_d W_{DI}^2}. \tag{26}$$

When the depth video is coded by an H.264/AVC-based video codec, $D$ and $k$ in (19) equal $D_d$ and $k_d$. Then, applying (19) to (26), the Lagrangian multiplier for mode decision is

$$\lambda^{VS}_{MODE} = \frac{K_1 \lambda_{MODE}}{W_{DI}^2}. \tag{27}$$

The objective of depth coding is to minimize the view synthesis distortion at a given depth bit rate, i.e., to minimize the R-$D_{VS}$ cost function. Therefore, we minimize the ADD-based Lagrangian cost function for mode decision:

$$\min J^{VS}_{MODE}, \quad J^{VS}_{MODE} = SSD_{VS} + \lambda^{VS}_{MODE} R_{d,MODE} = K_1\, SSD_r + \frac{K_1 \lambda_{MODE}}{W_{DI}^2} R_{d,MODE} + MNK_2, \tag{28}$$

where $K_1$ and $K_2$ are constants and $R_{d,MODE}$ is the number of bits for encoding the mode and residue. Since $MNK_2$ and $K_1$ are constants, dropping $MNK_2$ and scaling by $W_{DI}^2/K_1$ does not change the minimizer, so the optimization objective in (28) can be rewritten as

$$\min J^{VS}_{MODE}, \quad J^{VS}_{MODE} = W_{DI}^2\, SSD_r + \lambda_{MODE} R_{d,MODE}, \tag{29}$$

where $SSD_r = \sum_{i,j} \Delta r_{ij}^2$ and $\Delta r_{ij}$ is the piecewise function in (16). Equation (29) implies that, to apply ADD-based mode decision in a traditional video codec when encoding depth video, we only need to replace the original distortion term with $W_{DI}^2\, SSD_r$, where $W_{DI}$ is calculated from the camera parameters, baselines, etc., according to (10). Moreover, the new distortion term is independent of the coefficients $K_1$ and $K_2$.

For the ME/DE process, the second-order distortion metric (MSE/SSD) is replaced by a first-order metric such as MAD/SAD to reduce complexity. Therefore, we derive the new RD model for ME/DE in a similar way. From (16), the absolute rendering position error $|\Delta r_{ij}|$ can be calculated as

$$|\Delta r_{ij}| = \frac{1}{W_{DI}} |\Delta v_{ij}| + \zeta_{ij}, \tag{30}$$

where $\zeta_{ij}$ is a rounding error that is uniformly distributed with mean $W_{DI}/2$.
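Because (29) changes only the distortion term, an existing mode-decision loop keeps its structure. The Python sketch below compares the ADD-based cost (29) with the conventional depth-domain cost $SSD_d + \lambda_{MODE} R_{d,MODE}$ for three hypothetical candidate modes of a four-pixel block. The depth values, bit counts, $\lambda_{MODE} = 1$, $W_{DI} = 4$, and $v^{+} = v^{-} = 1$ are all made up for illustration, and the integration into the JMVC mode loop is not shown.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for b > 0."""
    return -(-a // b)


def rendering_position_error(dv, v_plus, v_minus, w_di):
    """Piecewise mapping (16) from depth distortion to rendering position error."""
    if dv > v_plus:
        return ceil_div(dv - v_plus, w_di)
    if dv < -v_minus:
        return -ceil_div(-dv - v_minus, w_di)
    return 0


def add_mode_cost(orig, recon, bits, lam_mode, w_di, v_plus, v_minus):
    """ADD-based mode-decision cost (29): W_DI^2 * SSD_r + lambda_MODE * R_d,MODE."""
    ssd_r = sum(rendering_position_error(o - r, vp, vm, w_di) ** 2
                for o, r, vp, vm in zip(orig, recon, v_plus, v_minus))
    return w_di ** 2 * ssd_r + lam_mode * bits


def plain_mode_cost(orig, recon, bits, lam_mode):
    """Conventional depth-domain cost SSD_d + lambda_MODE * R_d,MODE."""
    return sum((o - r) ** 2 for o, r in zip(orig, recon)) + lam_mode * bits


def best_mode(candidates, cost):
    """Name of the (name, recon, bits) candidate with the smallest cost."""
    return min(candidates, key=lambda c: cost(c[1], c[2]))[0]


# Hypothetical 4-pixel block: W_DI = 4, v+ = v- = 1 everywhere, lambda_MODE = 1.
orig = [100, 100, 100, 100]
v_plus, v_minus, w_di, lam = [1] * 4, [1] * 4, 4, 1.0
candidates = [
    ("A", [100, 100, 100, 100], 12),  # lossless, most bits
    ("B", [99, 99, 99, 99], 10),      # error of one level, inside the ADD range
    ("C", [94, 100, 100, 100], 2),    # one large error, fewest bits
]
print(best_mode(candidates, lambda rec, b: plain_mode_cost(orig, rec, b, lam)))  # A
print(best_mode(candidates, lambda rec, b: add_mode_cost(
    orig, rec, b, lam, w_di, v_plus, v_minus)))                                  # B
```

With these numbers the conventional cost keeps the lossless mode A, whereas (29) selects mode B: its depth error stays inside the allowable range, so it costs nothing in the synthesized view and saves two bits. Mode C loses under both costs because its single large error shifts the rendered pixel by two positions.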
Similarly, when the depth distortion is measured with MAD, $D_r$ can be expressed, according to the LLN, as the mathematical expectation of $|\Delta r_{ij}|$:

$$D_r \approx E\left(|\Delta r|\right) = \frac{1}{W_{DI}} E\left(|\Delta v|\right) + E(\zeta), \tag{31}$$

where $E(|\Delta v|)$ is the MAD of the distorted depth image, i.e., $D_d$, and $E(\zeta) = W_{DI}/2$ is a constant independent of the distortion $D_d$. Applying (31) to (21), we get

$$-\frac{k_d W_{DI}}{K_1 D_d} \triangleq -\frac{1}{\lambda^{VS}_{MOTION}}. \tag{32}$$

By applying (19) to (32), with $D$ and $k$ replaced by $D_d$ and $k_d$, the Lagrangian multiplier for ME/DE is

$$\lambda^{VS}_{MOTION} = \frac{K_1 \lambda_{MOTION}}{W_{DI}}. \tag{33}$$
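As a numerical sanity check of (26), (27), and (33), the Python sketch below solves (21) for $\lambda_{VS}$ with central finite differences, using (20) for $R_d$ and, for $D_{VS}$, the linear model (15) combined with (24) in the MSE case or (31) in the MAD case. All constants are hypothetical; $E(\varepsilon^2) = 1/12$ assumes a unit-width uniform rounding error, but any constant gives the same result because it drops out of the derivative.

```python
import math


def depth_rate(d_d, k_d, sigma2):
    """Depth RD model (20): R_d(D_d) = k_d * ln(sigma_d^2 / D_d)."""
    return k_d * math.log(sigma2 / d_d)


def numeric_lambda_vs(d_vs, d_d, k_d, sigma2, h=1e-4):
    """Solve (21) for lambda_VS = -(dD_VS/dD_d) / (dR_d/dD_d) by central differences.

    d_vs -- function mapping depth distortion D_d to view synthesis distortion D_VS
    """
    dr = (depth_rate(d_d + h, k_d, sigma2) - depth_rate(d_d - h, k_d, sigma2)) / (2 * h)
    dvs = (d_vs(d_d + h) - d_vs(d_d - h)) / (2 * h)
    return -dvs / dr


if __name__ == "__main__":
    K1, K2, k_d, sigma2, w_di, d_d = 1.5, 0.3, 0.8, 400.0, 4, 10.0  # hypothetical
    lam = d_d / k_d                                                  # (19)
    # MSE: D_VS = K1 * (D_d / W_DI^2 + E(eps^2)) + K2, from (15) and (24)
    lam_mode = numeric_lambda_vs(
        lambda d: K1 * (d / w_di ** 2 + 1 / 12) + K2, d_d, k_d, sigma2)
    # MAD: D_VS = K1 * (D_d / W_DI + W_DI / 2) + K2, from (15) and (31)
    lam_motion = numeric_lambda_vs(
        lambda d: K1 * (d / w_di + w_di / 2) + K2, d_d, k_d, sigma2)
    print(lam_mode, K1 * lam / w_di ** 2)  # both close to 1.171875, as in (27)
    print(lam_motion, K1 * lam / w_di)     # both close to 4.6875, as in (33)
```

The finite-difference multipliers agree with the closed forms, which also makes the different exponents of $W_{DI}$ visible: squared for the SSD-based mode decision, linear for the SAD-based ME/DE.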

The new Lagrangian cost function for ME/DE and reference frame selection is

$$\min J^{VS}_{MOTION}, \quad J^{VS}_{MOTION} = SAD_{VS} + \lambda^{VS}_{MOTION} R_{d,MOTION} = K_1\, SAD_r + \frac{K_1 \lambda_{MOTION}}{W_{DI}} R_{d,MOTION} + MNK_2, \tag{34}$$

where $R_{d,MOTION}$ denotes the coding bits of the motion/disparity vectors and reference frame indices. Therefore, the optimization target can be rewritten as

$$\min J^{VS}_{MOTION}, \quad J^{VS}_{MOTION} = W_{DI}\, SAD_r + \lambda_{MOTION} R_{d,MOTION}, \tag{35}$$

where $SAD_r = \sum_{i,j} |\Delta r_{ij}|$ and $\Delta r_{ij}$ is given by (16). This implies that we can replace the original distortion term with $W_{DI}\, SAD_r$ when calculating the RD cost in ME/DE, reference frame selection, and similar processes.

TABLE I. PROPERTIES OF THE TEST 3D VIDEO SEQUENCES
Fig. 8. Flowchart of the proposed ADD based DBR.

D. ADD Based Depth Bit Reduction (DBR)

In the above subsection, the new ADD model is used to minimize the view synthesis distortion in mode decision, reference frame selection, and motion/disparity estimation. Although the view synthesis distortion is minimized in these processes, the bit rate can also be reduced to further improve the RD performance. In depth coding, depth residues are encoded and transmitted to compensate for the differences between the predicted and original signals, and these residues cost depth bits. However, the depth video is used for virtual view rendering in a 3D video system, and it is the view synthesis distortion that is ultimately measured. Because of the ADD in view synthesis, some encoded residues reduce $D_d$ but do not necessarily reduce $D_{VS}$. Such residues cost depth bits without helping to reduce $D_{VS}$; thus, they need not be encoded, and their coding bits can be saved.

The flowchart of the proposed ADD-based DBR is illustrated in Fig. 8. First, the current MB is encoded with the initial QP of the current slice. If the Coded Block Pattern (CBP) of the best mode equals zero, no residual bits are encoded in the current MB. This means no more bits can be saved, so the coding process for the current MB terminates early.
Otherwise, we calculate the rendering position error caused by the depth distortion of the current MB, i.e., the MSE of $\Delta r$, denoted $D_{r1}$, using (16). Then, the QP is increased by a step size $N$, i.e., $QP = QP + N$, the current MB is re-encoded with the new QP, and its rendering position error $D_{r2}$ is recalculated. If $D_{r2}$ is larger than $D_{r1}$, the view synthesis distortion increases with the new QP; the previous QP is then the best one, so we reload the previous best coding information and end the coding process. Otherwise, we further increase the QP and re-encode the current MB until the residual coefficients are all zero (i.e., CBP is zero) or the QP reaches the predefined maximum value, MAX_QP. The ADD-based DBR algorithm is thus a multiple-pass coding scheme that maximizes the R-$D_{VS}$ performance for MBs with non-zero coefficients; meanwhile, the optimal MB mode and ME/DE vectors are selected at each QP. The step size $N$ controls the QP granularity. As $N$ increases, the termination conditions are reached more quickly, which reduces the coding complexity. However, the coding efficiency may decrease as $N$ increases, since the encoder might miss the optimal QP with a coarse step. In this paper, $N$ is set to its minimum value, 1, for all test sequences to maximize the R-$D_{VS}$ performance. For an MB whose CBP is zero, the coding complexity is identical to that of the original JMVC. Statistical analyses on five different 3D depth sequences with different QPs show that, on average, the CBPs of 71% of the MBs in INTRA frames and 96% of the MBs in INTER frames are zero. That is, only 29% of INTRA MBs and 4% of INTER MBs on average require multiple-pass coding in the proposed ADD-based DBR scheme.

IV. EXPERIMENTAL RESULTS AND ANALYSES

To evaluate the coding efficiency of the proposed algorithms, the MVC reference software JMVC 8.3 was used.
TABLE II. BDBR AND BDPSNR COMPARISONS FOR INTRA DEPTH CODING; VIRTUAL VIEW IMAGES WERE RENDERED WITH THE ORIGINAL COLOR IMAGES (UNIT: %/dB)

Eight 3D video sequences, including Kendo, Balloons, Champ.Tower, Pantomime, Dog [35], Doorflowers [36], PoznanStreet [37], and UndoDancer [38], with various motion properties, resolutions, and camera baselines were used; their properties are shown in Table I. Three depth views were encoded in the three-view configuration of the JMVC codec. The two intermediate views synthesized from the original color and depth videos were used as references for the view synthesis quality comparison [32]. For example, the 1st, 3rd, and 5th depth views were encoded as views 0, 1, and 2 in the three-view codec configuration, and the 2nd and 4th views were synthesized as virtual views for image quality evaluation. Among these test sequences, the depth videos of Kendo, Balloons, PoznanStreet, and UndoDancer are available; the remaining depth sequences were generated by the Depth Estimation Reference Software (DERS) [39]. The View Synthesis Reference Software (VSRS) [40] was used for view synthesis, and both integer- and half-pixel rendering precisions were tested. An averaging process was used for view merging and hole filling in view synthesis, the same as the settings in [15]. Basis QPs were set to 16, 20, 24, and 28. Six coding schemes were implemented for comparison: the original JMVC, Zhao's scheme [15] (denoted ZhaoTIP), our previous scheme [26] (denoted ZhangTIP), the proposed ADD-based RDO scheme (denoted ADD_RDO), the proposed DBR scheme (denoted ADD_DBR), and the proposed overall scheme that integrates ADD_RDO and ADD_DBR (denoted ADD_RDO+DBR). The average PSNR of the synthesized images was used for depth image quality evaluation. The PSNR of a synthesized image is calculated as

$$PSNR_{VS,\chi} = 10 \log_{10} \frac{255^2}{MSE_{VS,\chi}}, \tag{36}$$

$$MSE_{VS,\chi} = \frac{1}{MN} \sum_{i=0}^{N-1} \sum_{j=0}^{M-1} \left| V_\chi(i,j) - V_O(i,j) \right|^2, \tag{37}$$

where $V_O(i,j)$ is the rendered image pixel value at $(i,j)$ generated from the original color video and the original depth video.
$V_\chi(i,j)$ is the rendered image pixel value at $(i,j)$ generated from the original color video and the reconstructed depth video, where $\chi$ indicates the depth coding scheme, $\chi \in$ {JMVC, ZhaoTIP [15], ZhangTIP [26], ADD_RDO, ADD_DBR, ADD_RDO+DBR}, and $M$ and $N$ are the width and height of the rendered images, respectively.

When integer-pixel precision was used in view synthesis, the ZhaoTIP scheme improves the coding efficiency for most sequences compared with the original JMVC. As shown in Table II, it achieves a Bjontegaard Delta PSNR (BDPSNR) [41] gain of 0.11 dB, i.e., a 0.11 dB PSNR improvement at the same bit consumption, or a 7.83% Bjontegaard Delta Bit Rate (BDBR) [41] reduction, i.e., 7.83% fewer bits at the same PSNR. Note that in Tables II and III, some BDBR values are not available because the fitting algorithm is not applicable or the gap is too large; they are labeled NA and excluded from the averages. For the high-definition sequences (e.g., PoznanStreet and UndoDancer), the ZhaoTIP scheme is inferior to JMVC, which reveals that its coding gain is unstable and content dependent. The main reason is that ZhaoTIP is a pre-processing scheme for depth video, which smooths the depth video to reduce INTRA prediction residues and thus improve coding efficiency. Although the smoothing distortion can be kept within the ADD, it can hardly guarantee that the total distortion (i.e., the quantization distortion plus the smoothing distortion) still lies within the ADD depth interval $W_{DI}$, especially at larger QPs. The ZhangTIP scheme exploits the regional selectivity of the depth redundancies. It improves the RD performance for all sequences and achieves a 0.61 dB BDPSNR gain on average for the integer-pixel precision setting; in terms of BDBR, it achieves a 16.01% bit rate saving on average.

As for the proposed algorithms, the proposed ADD_DBR scheme not only removes unnecessary depth coding bits but also improves the rendered image quality through optimal mode selection with different QPs. The BDPSNR gain of the proposed ADD_DBR scheme ranges from 0.27 dB to 0.99 dB, 0.63 dB on average, for the integer-pixel precision setting, as shown in Table II. The proposed ADD_RDO algorithm improves BDPSNR by 0.13 dB to 4.89 dB, 2.19 dB on average, compared with the original JMVC, which is a significant gain. By combining the ADD_DBR and ADD_RDO algorithms, the proposed overall algorithm (ADD_RDO+DBR) improves BDPSNR by 0.21 dB to 5.76 dB, 2.68 dB on average. The RD gains of the proposed ADD_DBR and ADD_RDO are additive, which indicates that the two algorithms improve the RD performance in two different aspects. In terms of BDBR, the three proposed algorithms achieve average bit rate savings of 18.74%, 48.19%, and 51.29%, respectively. These significant coding gains indicate that the proposed algorithms effectively exploit the spatial redundancies in INTRA depth coding.

TABLE III. BDBR AND BDPSNR COMPARISONS FOR INTRA DEPTH CODING, WHERE VIRTUAL VIEW IMAGES WERE RENDERED WITH THE CODED COLOR IMAGES (UNIT: %/dB)
Fig. 9. Visual comparisons among the rendered images, 2nd view of the Kendo sequence. (a) Rendered image, (b) enlarged virtual image rendered using the original color and depth maps, (c) to (f) enlarged images rendered from the depth maps coded by JMVC, ZhaoTIP, ZhangTIP, and the proposed ADD_RDO+DBR scheme, respectively.
Fig. 10. Visual comparisons among the rendered images, view 3.5 of the PoznanStreet sequence. (a) Rendered image, (b) enlarged virtual image rendered using the original color and depth maps, (c) to (f) enlarged images rendered from the depth maps coded by JMVC, ZhaoTIP, ZhangTIP, and the proposed ADD_RDO+DBR scheme, respectively.

Additionally, the virtual view images rendered by the different schemes are visually compared. Figs. 9 and 10 show the comparisons for Kendo and PoznanStreet, where (a) is an example rendered image in which the green rectangle is enlarged, and (b) is the enlarged region of the virtual image rendered using the original color and depth maps, which is regarded as the ground truth according to [32]. Subfigures (c) to (f) are enlarged images rendered from the depth maps coded by JMVC, ZhaoTIP, ZhangTIP, and the proposed ADD_RDO+DBR scheme, respectively. For the Kendo sequence, the ground truth is the most consistent with the real view. Comparing (c) to (f) with (b), we observe artifacts at the object boundaries, and this distortion is gradually reduced from (c) to (f); the rendered image quality of the proposed scheme is the best among the compared benchmarks. For the PoznanStreet sequence, some artifacts are visible even in the ground truth, because the original depth maps may contain noise. In this paper, we assume that the original depth map is noise free, and we regard a rendered image as better if it is closer to the ground truth image [32]. The image produced by the proposed algorithm is closer to the ground truth than those from the benchmark schemes, which supports the effectiveness of the proposed algorithm in the visual comparison.

When half-pixel precision is used in view synthesis, the ADD interval $W_{DI}$ and the ADD range are reduced. The RD comparisons for the half-pixel precision setting are given in the bottom part of Table II. In this setting, the ZhaoTIP scheme is inferior to the original JMVC for most sequences, because the quantization error plus the smoothing error exceeds the ADD interval ($W_{DI}$) and range more easily. For Dog and Pantomime, the ZhaoTIP scheme achieves significant RD improvement, mainly because these depth sequences, which were generated by DERS, contain considerable noise. The ZhangTIP scheme achieves a 0.52 dB BDPSNR gain, or a 10.77% bit rate reduction, on average compared with the original JMVC. The proposed ADD_DBR, ADD_RDO, and ADD_RDO+DBR achieve average BDPSNR gains of 0.47 dB, 1.25 dB, and 1.58 dB, respectively. Their bit rate savings are 7.79%, 18.99%, and 23.88%, respectively, roughly half of the bit rate reductions under the integer-pixel precision setting.
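The BDPSNR figures in this section follow [41]. As a rough illustration only, the Python sketch below computes a BDPSNR-style average gap between two four-point RD curves (for example, basis QPs 16, 20, 24, and 28): PSNR is modeled as a cubic in log10 of the bit rate, and the difference between the two cubics is averaged over the overlapping rate range. This is our own minimal re-implementation, not the script used for Tables II to IV; the curve values are hypothetical, and BDBR, which swaps the roles of rate and PSNR, is omitted.

```python
import math


def _interp(xs, ys, x):
    """Evaluate the cubic through four points (xs, ys) at x, in Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def bd_psnr(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average PSNR difference (test minus reference) in dB, Bjontegaard style.

    Each curve has exactly four (bit rate, PSNR) points. A least-squares cubic
    through four points interpolates them exactly, and Simpson's rule integrates
    a cubic exactly, so no numerical fitting library is needed.
    """
    lr = [math.log10(r) for r in rates_ref]
    lt = [math.log10(r) for r in rates_test]
    lo = max(min(lr), min(lt))  # overlapping log-rate interval
    hi = min(max(lr), max(lt))

    def area(xs, ys):
        mid = 0.5 * (lo + hi)
        return (hi - lo) / 6 * (_interp(xs, ys, lo) + 4 * _interp(xs, ys, mid)
                                + _interp(xs, ys, hi))

    return (area(lt, psnr_test) - area(lr, psnr_ref)) / (hi - lo)


if __name__ == "__main__":
    rates = [100.0, 200.0, 400.0, 800.0]  # hypothetical bit rates (kbit/s)
    ref = [30.0, 33.0, 35.5, 37.5]        # hypothetical anchor PSNRs (dB)
    test = [p + 0.5 for p in ref]         # same rates, 0.5 dB better everywhere
    print(round(bd_psnr(rates, ref, rates, test), 6))  # 0.5
```

Working in log10 of the rate is what makes a BDPSNR value comparable across sequences with very different bit rates, such as the HD PoznanStreet and UndoDancer versus the lower-resolution sequences.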
Although the BDPSNR gains are smaller than those of the integer-pixel precision setting owing to the smaller $W_{DI}$, the proposed overall scheme still significantly outperforms JMVC, ZhaoTIP, and ZhangTIP.

Furthermore, the RD performance of the benchmarks and the proposed algorithms was also evaluated when color images with compression distortion were used in view synthesis. The color and depth videos were compressed separately with $QP_C$ and $QP_D$, where $QP_C, QP_D \in \{16, 20, 24, 28\}$. The color video was encoded with the original JMVC, and the depth video was encoded with the tested coding schemes. The decoded color and depth videos with $QP_C = QP_D$ were used to synthesize the virtual view images, and the total color plus depth bits were counted on the x-axis [26], [32] for RD evaluation. Table III shows the BDBR and BDPSNR of ZhaoTIP, ZhangTIP, and the three proposed algorithms compared with the original JMVC. ZhaoTIP reduces the bit rate by 3.04% but degrades BDPSNR by 0.08 dB on average for the integer-pixel precision setting. On the other hand, it increases the bit rate by 12.81% and degrades BDPSNR by 0.19 dB on average for the half-pixel precision setting. Its RD performance is therefore inferior to the original JMVC in average BDPSNR and BDBR. The other benchmark, ZhangTIP, reduces BDBR by 9.17% and 5.28% for integer- and half-pixel precision, respectively, corresponding to average BDPSNR gains of 0.26 dB and 0.16 dB. The proposed ADD_DBR, ADD_RDO, and ADD_RDO+DBR achieve average BDBR reductions of 8.13%, 20.93%, and 23.82%, respectively, for integer-pixel precision compared with the original JMVC, and 3.01%, 7.42%, and 9.62%, respectively, for half-pixel precision. In terms of BDPSNR, they achieve average gains of 0.22 dB, 0.68 dB, and 0.82 dB, respectively, for integer-pixel precision, and 0.15 dB, 0.32 dB, and 0.40 dB, respectively, for half-pixel rendering precision.
We find that 1) more gains are achieved for integer-pixel precision than for half-pixel precision; 2) the proposed ADD_RDO and ADD_RDO+DBR schemes achieve much better coding performance than the benchmarks, including JMVC, ZhaoTIP, and ZhangTIP; and 3) the gains are smaller than when rendering with the original color images, because the color encoder was not optimized and the color bits, which are counted in the bit rate, dilute the gains.

In addition to INTRA depth coding, the RD performance of INTER frame depth coding is also evaluated. Full ME/DE was enabled with a search range of ±64, and the SAD metric was used for both full- and sub-pixel ME/DE search. The number of bi-prediction iterations is 4, and the search range for the iterations is 8. The maximum number of reference frames is 2 for each reference picture list, and the Group-Of-Pictures (GOP) length is 8. Three depth views were encoded with the MVC structure using hierarchical B prediction [1]. Six 3D video sequences with different characteristics and resolutions (Balloons, Kendo, Doorflowers, Dog, UndoDancer, and PoznanStreet) were tested, and their depth videos were encoded. The reconstructed depth images and the original/coded color images were used to render the virtual view images, with integer-pixel precision in view synthesis. The PSNR of the virtual view images was measured between the images rendered from the coded color/depth images and those rendered from the original color/depth images. Five schemes, namely JMVC, ZhangTIP, and the proposed ADD_DBR, ADD_RDO, and ADD_RDO+DBR, were implemented and compared. The upper part of Table IV shows that the ZhangTIP scheme reduces the depth bit rate while improving the view synthesis image quality. Compared with the original JMVC, it achieves a 0.56 dB BDPSNR gain, or an 18.16% BDBR reduction, on average. ADD_DBR achieves a 0.34 dB BDPSNR gain, or a 12.27% BDBR reduction, on average, which is slightly inferior to ZhangTIP. Moreover, the ADD_RDO scheme achieves an average BDPSNR gain of 3.71 dB, or a 58.34% bit rate reduction.
Combining the ADD_DBR and ADD_RDO schemes improves BDPSNR further, to 4.07 dB on average. The coding gain is especially high for UndoDancer, since it has a relatively large $W_{DI}$. Compared with INTRA coding, the ADD_DBR algorithm achieves a smaller BDPSNR gain, because INTER frames usually contain fewer residues and thus leave less room for bit reduction. Besides, the proposed ADD_RDO scheme achieves a larger BDPSNR gain than in INTRA depth coding, because the new RDO model in INTER depth coding enables ME/DE to find the best-matching block with less view synthesis distortion. In general, the RD gains of ADD_DBR and ADD_RDO are additive.

TABLE IV. BDBR/BDPSNR COMPARISON FOR INTER AND INTRA CODING (UNIT: %/dB)
Fig. 11. Computational complexity comparison.

Furthermore, the bottom part of Table IV shows the BDBR and BDPSNR of the four schemes relative to JMVC, where coded color images were used in view synthesis and the total bits (i.e., color plus depth bits) were counted. ZhangTIP achieves a 9.97% BDBR reduction, or a 0.24 dB BDPSNR gain, on average compared with JMVC. The proposed ADD_DBR, ADD_RDO, and ADD_RDO+DBR achieve BDBR reductions of 5.53%, 34.66%, and 35.77%, or average BDPSNR gains of 0.11 dB, 1.05 dB, and 1.11 dB, respectively. In general, ADD_DBR is better than JMVC but inferior to ZhangTIP, whereas the proposed ADD_RDO and ADD_RDO+DBR are much better than ZhangTIP. The RD gains usually shrink as the color bits take a larger proportion of the total bits, because the color video encoder was not optimized.

Fig. 11 shows the average encoding time per GOP for the tested depth coding schemes. Compared with JMVC, ZhangTIP increases the complexity by 9.5% to 10.1%, 9.8% on average, mainly owing to the additional image segmentation and pre-analysis of MAD for the depth video. ADD_DBR increases the coding complexity by 4.9% to 30.8%, 17.0% on average, owing to the multiple-pass coding of blocks with non-zero coefficients. ADD_RDO increases the complexity by 247% on average, because, compared with the traditional SAD, the new ADD-based distortion metric requires additional operations in the software implementation, including one division, one subtraction, one if-else operation, and several load operations.
This is time-consuming because the metric is integrated into ME/DE and called very frequently during RD cost calculation. The overall algorithm ADD_RDO+DBR increases the complexity by about 320%, owing to the multiple-pass coding plus the new distortion metric in RD cost calculation.

V. CONCLUSIONS

In this paper, we formulate the relationship between view synthesis distortion and depth distortion as a many-to-one mapping and then present an Allowable Depth Distortion (ADD) model for depth video coding optimization. Based on this ADD model, we propose a new RD model for mode decision and motion/disparity estimation that minimizes the view synthesis distortion at a given bit rate. In addition, an ADD-based depth bit reduction algorithm is presented to further exploit the ADD redundancies and improve coding efficiency. Experimental results over different video sequences, parameters, and metrics demonstrate the high efficiency of the proposed ADD-based algorithms.

REFERENCES

[1] K. Muller, P. Merkle, and T. Wiegand, "3D video representation using depth maps," Proc. IEEE, vol. 99, no. 4, pp. , Apr.
[2] C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," Proc. SPIE, vol. 5291, pp. , May
[3] A. Vetro, T. Wiegand, and G. J. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard," Proc. IEEE, vol. 99, no. 4, pp. , Apr.
[4] Y. Zhang, S. Kwong, G. Jiang, X. Wang, and M. Yu, "Statistical early termination model for fast mode decision and reference frame selection in multiview video coding," IEEE Trans. Broadcast., vol. 58, no. 1, pp. , Mar.
[5] Y. Zhang, S. Kwong, G. Jiang, and H. Wang, "Efficient multi-reference frame selection algorithm for hierarchical B pictures in multiview video coding," IEEE Trans. Broadcast., vol. 57, no. 1, pp. , Mar.
[6] K. Muller and A. Vetro, AHG Report on 3D Video Coding, document JCT3V-A1001, ITU-T SG16 WP3 & ISO/IEC JTC1/SC29/WG11, Stockholm, Sweden, Jul.
[7] K. J. Oh, A. Vetro, and Y. S. Ho, "Depth coding using a boundary reconstruction filter for 3D video systems," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 3, pp. , Apr.
[8] S. Liu, P. Lai, D. Tian, and C. Chen, "New depth coding techniques with utilization of corresponding video," IEEE Trans. Broadcast., vol. 57, no. 2, pp. , Jun.
[9] V.-A. Nguyen, D. Min, and M. N. Do, "Efficient techniques for depth video compression using weighted mode filtering," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 2, pp. , Feb.
[10] J. Cho, D. Min, D. Kim, and K. Sohn, "3D JBU based depth video filtering for temporal fluctuation reduction," in Proc. 17th IEEE ICIP, Sep. 2010, pp.
[11] L. Zhu, Y. Zhang, X. Wang, and S. Kwong, "View synthesis distortion elimination filter for depth video coding in 3D video broadcasting," Multimedia Tools Appl., Feb. 2014, doi: /s
[12] V. De Silva, W. Fernando, S. Worrall, H. K. Arachchi, and A. Kondoz, "Sensitivity analysis of the human visual system for depth cues in stereoscopic 3D displays," IEEE Trans. Multimedia, vol. 13, no. 3, pp. , Jun.

[13] D. V. S. X. De Silva, E. Ekmekcioglu, W. A. C. Fernando, and S. T. Worrall, "Display dependent preprocessing of depth maps based on just noticeable depth difference modeling," IEEE J. Sel. Topics Signal Process., vol. 5, no. 2, pp. , Apr.
[14] Q. Lu, Y. Yang, R. Ji, Y. Gao, and L. Yu, "Cross-view down/up-sampling method for multiview depth video coding," IEEE Signal Process. Lett., vol. 19, no. 5, pp. 295–298, May 2012.
[15] Y. Zhao, C. Zhu, Z. Chen, and L. Yu, "Depth no-synthesis-error model for view synthesis in 3D video," IEEE Trans. Image Process., vol. 20, no. 8, pp. , Aug.
[16] Z. Pan, Y. Zhang, and S. Kwong, "Fast mode decision based on texture depth correlation and motion prediction for multiview depth video coding," J. Real-Time Image Process., Mar. 2013, doi: 10.1007/s11554-013-0328-3.
[17] J. Seo, D. Park, H.-C. Wey, S. Lee, and K. Sohn, "Motion information sharing mode for depth video coding," in Proc. 3DTV-CON, Tampere, Finland, Jun. 2010, pp. 1–4.
[18] S.-T. Na, K.-J. Oh, C. Lee, and Y.-S. Ho, "Multi-view depth video coding using depth view synthesis," in Proc. IEEE ISCAS, May 2008, pp.
[19] J. Y. Lee, H.-C. Wey, and D.-S. Park, "A fast and efficient multi-view depth image coding method based on temporal and inter-view correlations of texture images," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 12, pp. 1859–1868, Dec. 2011.
[20] M.-K. Kang and Y.-S. Ho, "Depth video coding using adaptive geometry based intra prediction for 3D video systems," IEEE Trans. Multimedia, vol. 14, no. 1, pp. , Feb.
[21] W.-S. Kim, A. Ortega, P. Lai, D. Tian, and C. Gomila, "Depth map distortion analysis for view rendering and depth coding," in Proc. 16th IEEE ICIP, Nov. 2009, pp.
[22] Q. Zhang, P. An, Y. Zhang, and Z. Zhang, "Efficient rendering distortion estimation for depth map compression," in Proc. 18th IEEE ICIP, Sep. 2011, pp.
[23] W.-S. Kim, A. Ortega, P. Lai, D. Tian, and C. Gomila, "Depth map coding with distortion estimation of rendered view," Proc. SPIE, vol. 7543, pp. B B-10, Jan.
[24] T.-Y. Chung, W.-D. Jang, and C.-S. Kim, "Efficient depth video coding based on view synthesis distortion estimation," in Proc. IEEE VCIP, Nov. 2012, pp.
[25] H.-P. Deng, L. Yu, B. Feng, and Q. Lu, "Structural similarity-based synthesized view distortion estimation for depth map coding," IEEE Trans. Consum. Electron., vol. 58, no. 4, pp. , Nov.
[26] Y. Zhang, S. Kwong, L. Xu, S. Hu, G. Jiang, and C.-C. J. Kuo, "Regional bit allocation and rate distortion optimization for multiview depth video coding with view synthesis distortion model," IEEE Trans. Image Process., vol. 22, no. 9, pp. , Sep.
[27] H. Yuan, S. Kwong, J. Lu, and J. Sun, "A novel distortion model and Lagrangian multiplier for depth maps coding," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 3, pp. , Mar.
[28] G. Tech, H. Schwarz, K. Muller, and T. Wiegand, "3D video coding using the synthesized view distortion change," in Proc. PCS, May 2012, pp.
[29] Y. Lu, Q. Huang, S. Ma, D. Zhao, and W. Gao, "Joint video/depth rate allocation for 3D video coding based on view synthesis distortion model," Signal Process., Image Commun., vol. 24, no. 8, pp. , Sep.
[30] S. Hu, S. Kwong, Y. Zhang, and C.-C. J. Kuo, "Rate-distortion optimized rate control for depth map-based 3D video coding," IEEE Trans. Image Process., vol. 22, no. 2, pp. , Feb.
[31] H. Yuan, Y. Chang, J. Huo, F. Yang, and Z. Lu, "Model-based joint bit allocation between texture videos and depth maps for 3D video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 4, pp. , Apr.
[32] B. T. Oh, J. Lee, and D.-S. Park, "Depth map coding based on synthesized view distortion function," IEEE J. Sel. Topics Signal Process., vol. 5, no. 7, pp. , Nov.
[33] L. Xiao, M. Johansson, H. Hindi, S. Boyd, and A. Goldsmith, "Joint optimization of communication rates and linear systems," IEEE Trans. Autom. Control, vol. 48, no. 1, pp. , Jan.
[34] K. Takagi, Y. Takishima, and Y. Nakajima, "A study on rate distortion optimization scheme for JVT coder," Proc. SPIE, vol. 5150, pp. , Jun.
[35] M. Tanimoto, T. Fujii, and N. Fukushima, 1D Parallel Test Sequences for MPEG-FTV, document M15378, ISO/IEC JTC1/SC29/WG11, Archamps, France, Apr.
[36] I. Feldmann et al., HHI Test Material for 3D Video, document M15413, ISO/IEC JTC1/SC29/WG11, Archamps, France, Apr.
[37] M. Domański et al., Poznań Multiview Video Test Sequences and Camera Parameters, document ISO/IEC JTC1/SC29/WG11 MPEG 2009/M17050, Xi'an, China, Oct.
[38] Undo-Dancer Video Sequences. [Online]. Available: accessed Dec.
[39] M. Tanimoto, T. Fujii, M. P. Tehrani, and M. Wildeboer, Depth Estimation Reference Software (DERS) 5.0, document M16923, ISO/IEC JTC1/SC29/WG11, Xi'an, China, Oct.
[40] M. Tanimoto, T. Fujii, and K. Suzuki, View Synthesis Algorithm in View Synthesis Reference Software 3.0 (VSRS 3.0), document M16090, ISO/IEC JTC1/SC29/WG11, Feb.
[41] G. Bjontegaard, Calculation of Average PSNR Differences Between RD-Curves, document VCEG-M33, ITU-T Video Coding Experts Group (VCEG), Austin, TX, USA,

Yun Zhang (M'12) received the B.S. and M.S. degrees in electrical engineering from Ningbo University, Ningbo, China, in 2004 and 2007, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing, China. From 2009 to 2014, he was a Post-Doctoral Research Associate and Visiting Scholar with the Department of Computer Science, City University of Hong Kong, Hong Kong. In 2010, he became an Assistant Professor with the Shenzhen Institute of Advanced Technology, CAS, where he later became an Associate Professor. His research interests are video compression, 3D video processing, and visual perception.

Sam Kwong (M'93–SM'04–F'13) received the B.S. degree in electrical engineering from the State University of New York at Buffalo, Buffalo, NY, USA, in 1983, the M.S. degree in electrical engineering from the University of Waterloo, Waterloo, ON, Canada, in 1985, and the Ph.D. degree from the University of Hagen, Hagen, Germany. From 1985 to 1987, he was a Diagnostic Engineer with Control Data Canada, Montreal, QC, Canada. He joined Bell Northern Research, Ottawa, ON, Canada, as a member of the scientific staff. In 1990, he became a Lecturer with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong, where he is currently a Professor with the Department of Computer Science. His research interests are video and image coding and evolutionary algorithms.

Sudeng Hu received the B.Eng. degree from Zhejiang University, Hangzhou, China, in 2007, and the M.Phil. degree from the Department of Computer Science, City University of Hong Kong, Hong Kong. From 2010 to 2011, he was a Research Associate with the Department of Computer Science, City University of Hong Kong. In 2012, he took an internship with Mitsubishi Electric Research Laboratories, Cambridge, MA, USA. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. His research interests include image and video compression, rate control, scalable video coding, and 3-D video coding.

Chung-Chieh Jay Kuo (F'99) received the B.S. degree from National Taiwan University, Taipei, Taiwan, in 1980, and the M.S. and Ph.D. degrees from the Massachusetts Institute of Technology, Cambridge, MA, USA, in 1985 and 1987, respectively, all in electrical engineering. He is currently the Director of the Multimedia Communications Laboratory and a Professor of Electrical Engineering, Computer Science, and Mathematics with the Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. His research interests include digital image/video analysis and modeling, multimedia data compression, communication and networking, and biological signal/image processing. He has co-authored about 200 journal papers, 850 conference papers, and 10 books. He is a fellow of the American Association for the Advancement of Science and the International Society for Optical Engineers.