View Synthesis using Depth Map for 3D Video


Cheon Lee and Yo-Sung Ho
Gwangju Institute of Science and Technology (GIST)
1 Oryong-dong, Buk-gu, Gwangju, 500-712, Republic of Korea
E-mail: {leecheon, hoyo}@gist.ac.kr   Tel: +82-62-970-2258

Abstract — Three-dimensional (3D) video involves stereoscopic or multi-view images to provide a depth experience through 3D display systems. Binocular cues are perceived by rendering proper viewpoint images obtained at slightly different view angles. Since the number of viewpoints of the multi-view video is limited, 3D display devices should generate arbitrary viewpoint images using the available adjacent view images. In this paper, after we briefly explain a view synthesis method, we propose a new compensation algorithm for view synthesis errors around object boundaries. We describe a 3D warping technique exploiting the depth map for viewpoint shifting and a hole filling method using multi-view images. Then, we propose an algorithm to remove boundary noises that are generated due to mismatches of object edges in the color and depth images. The proposed method reduces annoying boundary noises near object edges by replacing erroneous textures with alternative textures from the other reference image. Using the proposed method, we can generate perceptually improved images for 3D video systems.

I. INTRODUCTION

Considerable improvements in information technologies (IT) enable us to enjoy diverse forms of multimedia services. Recently, 3D video has become an emerging medium that can provide more realistic and natural experiences to users through auto-stereoscopic displays or free viewpoint TV (FTV). One of the main challenges is rendering continuous viewpoint images using 3D video. Thus, the image interpolation method that generates arbitrary viewpoint images using the multi-view video is a key part of 3D display systems.

Since stereoscopic images were introduced by Charles Wheatstone [1], there have been many interesting proposals for three-dimensional video (3DV). Since December 2001, the Moving Picture Experts Group (MPEG) has investigated the 3D audio-visual (3DAV) format. Focusing on the multi-view video, MPEG and the Joint Video Team (JVT) have recently developed an international standard on multi-view video coding (MVC) [2]. As a second phase of the FTV efforts, MPEG started new work on 3D video coding in April 2007. It basically employs multi-view video and its corresponding depth data; thus, it provides functionalities for free viewpoint navigation and 3D display [3]. In order to perform exploration experiments, the MPEG 3DV group collected 10 test sequences and examined the validity of the sequences by viewing them on 3D display systems in April 2008 [4].

The input data of the 3DV system consist of multi-view video sequences and their associated depth data. The multi-view video is captured by multiple camera arrays to provide a wide angle of view. The depth data can be generated either by a computer-vision-based depth estimation method or with a time-of-flight (TOF) depth camera. The first approach compares pixel similarities between multi-view images according to disparity values. The second approach measures the reflection time of rays. In this work, we mainly focus on the computer-vision-based algorithm.

Since the 3DV system involves more than two images, the amount of data increases in proportion to the number of cameras. Hence, the 3DV system needs an efficient video codec. After compressing the 3D video data, the decoder reconstructs the coded data and renders a 3D scene by selecting stereoscopic or multi-view images.
If the rendering device needs arbitrary intermediate viewpoint images, we need to generate them using the view synthesis method in conjunction with the decoded depth data. The 3D warping technique in image-based rendering (IBR) is an appropriate method for arbitrary viewpoint rendering. It generates a nearby viewpoint image by projecting a point from the reference view to the virtual view using the depth data [6]. If we estimate the depth data using a software-based method, the hole problem occurs due to occlusion and disocclusion. Handling the hole problem is one of the difficult tasks in IBR. We explain the view synthesis method and a solution to the hole problem. When we generate the intermediate view image using depth information, we obtain a synthesized image containing boundary noises generated by inaccurate depth values around object edges. We propose a boundary noise removal algorithm for the 3D video system.

The rest of this paper is organized as follows. In Section II, we briefly explain the concept of the 3D video system. In Section III, we describe the view synthesis method using depth data. In Section IV, we define the boundary noise and propose the removal method. In Section V, the improvement of synthesized images using the proposed algorithm is presented. Finally, our conclusions are presented in Section VI.

II. 3D VIDEO SYSTEM

Figure 1 shows a general 3D video system [3] whose inputs are multi-view video sequences and associated depth data. After obtaining the 3D video data, we compress it using a new codec and transmit the bitstream over the channel. Then, the receiver reconstructs all coded data and renders proper view images on the 3D display monitor. If the number of decoded views is less than the number of views necessary for the 3D display, the rendering device needs to generate more views using the view interpolation method.

Fig. 1. 3D Video System

A. Capturing Multi-view Video Data

The 3D video is reproduced from the multi-view video captured by multiple cameras. In order to develop the experimental environments, the subgroup for 3D video coding in MPEG has directed requirements on capturing 3D test materials [7].

1) Camera arrangement: the multi-view cameras should be in a 1-D parallel arrangement with a uniform camera interval of 5~6.5 cm. Other types of camera arrangement are possible, but we focus only on the 1-D parallel case.
2) Rectification: all videos should be rectified to provide parallel views. With rectified multi-view images, we can find the corresponding pixels at the same height of the images.
3) Synchronization: accurate temporal synchronization of the multiple cameras is indispensable.
4) Color consistency: although all cameras are of the same model, color consistency between views can easily be broken by inaccurate camera settings. Therefore, all videos should be corrected by a proper color correction method.
5) Camera calibration: accurate camera parameters obtained by an accurate calibration method are indispensable. Those parameters describe the geometrical information of the cameras.
6) Content: since the goal of 3D video is the standardization of a new codec, contents should be proper for developing good coding algorithms. Sufficient texture variation, moving objects, and complex depth structure should be considered in the scene.

Experts of the subgroup for 3D video coding consider only the 1-D parallel multi-view camera array. Although all 1-D parallel cameras point in the same direction, the captured images are not rectified, as shown in Fig. 2(a). If we use those images as input for stereoscopic displays, we hardly feel any depth impression. Therefore, we should rectify the captured images, as shown in Fig. 2(b) [8][9].

Fig. 2. Overlapped images of three viewpoint images: (a) unrectified case, (b) rectified case

Another important requirement for the 3D test data is color consistency between neighboring views. Even using the same camera model does not guarantee color consistency across different views. Color mismatches between adjacent views deteriorate the performance of several methods, such as depth estimation, view synthesis, and video compression. Hence, we need to correct color inconsistency of the 3D test data.

B. Generation of Depth Map

The 3D video system utilizes the depth data for the generation of arbitrary view images. Conceptually, the depth value describes the distance from the camera to objects in the scene. There are mainly two approaches to obtain the depth data. The hardware-based depth estimation method, such as the TOF depth camera, uses a sensor measuring the reflection time of rays. It provides quite accurate depth data, but it has some constraints: the scene should be captured indoors, and hair should be avoided since it scatters rays. On the other hand, the software-based method has no constraint except for image rectification, while the accuracy of the estimated depth map is quite unstable around object boundaries due to occlusion and disocclusion. Handling occlusion and disocclusion is the well-known issue in depth estimation.

Currently, the 3D video subgroup in MPEG is testing the depth estimation software [10]. It uses two reference images, the left and right views, to generate intermediate depth images. Basically, it employs a block matching method to calculate the cost function, and applies the graph cut algorithm for disparity refinement [11]. Moreover, we can incorporate other methods, such as sub-pel precision and temporal enhancement [12][13]. The sub-pel precision method upsamples the reference images by a factor of 2 or 4 in the horizontal direction to estimate fine and accurate depth values. The temporal enhancement method reduces flickering artifacts in the depth video by determining the static area.
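To make the cost-function idea concrete, the snippet below is a minimal, hypothetical block-matching sketch in Python/NumPy for a rectified stereo pair. It is only a toy illustration of how matching costs could be compared per disparity; the actual reference software additionally applies graph-cut refinement [11], sub-pel precision, and temporal enhancement, none of which are shown here.

import numpy as np

def block_matching_disparity(left, right, max_disp=64, block=8):
    """Toy disparity estimation for a rectified stereo pair: for each block
    of the left image, search horizontally in the right image and keep the
    disparity with the smallest SAD matching cost."""
    h, w = left.shape
    disparity = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h - block, block):
        for x in range(0, w - block, block):
            ref = left[y:y + block, x:x + block].astype(np.float32)
            best_d, best_cost = 0, np.inf
            for d in range(0, min(max_disp, x) + 1):
                cand = right[y:y + block, x - d:x - d + block].astype(np.float32)
                cost = np.abs(ref - cand).sum()          # SAD matching cost
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y:y + block, x:x + block] = best_d
    return disparity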

Fig. 3. 3D video rendering with view synthesis

C. 3D Video Coding

The main objective of the MPEG activity on 3D video coding is the standardization of a new codec dealing with the 3D video data. The new 3D video standard will enable stereo devices to cope with various display types and sizes, and different viewing preferences [14]. In order to achieve the vision of 3D video, multi-view video sequences and associated depth data are considered as inputs because those data can provide a wide angle of the scene. However, since the amount of input data increases in proportion to the number of cameras, an efficient codec for the multi-view video and its depth data is necessary. Currently, the format of 3D video is not yet determined, but the depth data should be encoded simultaneously with the multi-view video. In addition, an important requirement for the new standard is compatibility with existing standards, such as H.264/AVC and MVC.

D. 3D Video Rendering

After reconstructing the 3D video data at the receiver, the 3D display device should render proper images on the monitor. For example, if the display is a stereoscopic device, it selects two viewpoint images among the reconstructed multi-view images. Moreover, if the device is an N-view auto-stereoscopic display, it should select N view images. If the number of reconstructed views is less than N, the decoder should generate more views using a view interpolation method. Similarly, Fig. 3 explains the functionality of free-view navigation. If the 3D display device has only three reconstructed views and associated depth data, it should generate more intermediate images for natural 3D display.

III. VIEW SYNTHESIS USING DEPTH MAP

In this section, we explain the procedure of the view interpolation method. Among the various techniques of IBR, we use the 3D warping technique because it utilizes depth information for viewpoint shifting. The depth value describes the distance between the camera and objects in the scene. Using this geometrical information, we can map corresponding pixels between different viewpoints. When we change the viewpoint, some background regions disappear or appear because of foreground objects. This induces the hole problem. After we explain the 3D warping technique, we explain the hole problem and propose its solution in detail. The whole procedure of the view synthesis method is presented in Figure 4.

A. Pixel Correspondence using Depth Map

Camera parameters describe the relationship between camera coordinates and world coordinates. They consist of one intrinsic parameter A and two extrinsic parameters: the rotation matrix R and the translation vector t. If a point X in the real world is projected to the pixel x of an image, we can formulate it using the projective matrix P = A[R t]. Assuming that all multi-view cameras are calibrated, we can define the pixel correspondences for all images captured by the multiple cameras. When a point M~ in the world coordinates is projected to the camera coordinates, a pixel m~ in the image can be found by

    m~ = P M~ = A [R t] M~                                   (1)

where M~ = [X Y Z 1]^T is a single point in world coordinates and m~ = [x y 1]^T is the homogeneous pixel position. When we find corresponding pixels between the reference and target viewpoint images, we can put a pixel m_r of the reference image back into world coordinates by

    M = R_r^-1 A_r^-1 m~_r d(m_r) - R_r^-1 t_r               (2)

where A_r, R_r, and t_r represent the camera parameters of the reference view, and d(m_r) is the depth value at the position of m_r in the reference image. After this backward projection, we project M into the coordinates of the virtual camera using Eq. (3). As a result, we can find the relationship between the two positions m_r and m_t as follows:

    m~_t = A_t [R_t t_t] M                                   (3)

A popular format for the depth information is the 8-bit single-channel image. Hence, the range representing the depth of field is from 0 to 255, which means that the number of depth planes is at most 256. Using this limitation, we can find 256 homography matrices by defining pixel correspondences between adjacent views. A homography matrix is a 3x3 matrix having eight degrees of freedom (DOF); hence, we can determine it with four corresponding pixels for one depth level. Equation (4) explains the relationship between two corresponding pixels with a homography matrix, where H_d represents the homography matrix for a depth value d:

    m~_t = H_d m~_r                                          (4)
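As a concrete illustration of Eqs. (2) and (3), the following minimal Python/NumPy sketch back-projects one reference pixel into world coordinates and re-projects it into the virtual view. The function name and argument layout are illustrative assumptions, and the metric depth d is assumed to be already converted from the stored 8-bit depth value; this is not the reference software.

import numpy as np

def warp_pixel(x_r, y_r, d, A_r, R_r, t_r, A_t, R_t, t_t):
    """Warp one pixel from the reference view to the virtual (target) view.
    Eq. (2): back-project the pixel to a world point M using its depth d.
    Eq. (3): project M into the target camera and normalize."""
    m_r = np.array([x_r, y_r, 1.0])                                  # homogeneous pixel
    M = np.linalg.inv(R_r) @ (np.linalg.inv(A_r) @ m_r * d - t_r)    # Eq. (2)
    m_t = A_t @ (R_t @ M + t_t)                                      # Eq. (3)
    return m_t[0] / m_t[2], m_t[1] / m_t[2]                          # pixel position in the target view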

Figure 4. Whole procedure of view synthesis: (a) original reference images, (b) depth maps corresponding to the two reference images, (c) generated depth maps using 3D warping and small hole filling, (d) synthesized images at the virtual viewpoint, (e) hole-filled images obtained by referring to the other synthesized image, (f) noise-removed images using the proposed method.

B. Depth Map Warping

Since the 3D warping technique uses corresponding pixels between neighboring views, we can synthesize a virtual image by mapping all pixels between the target view and reference view images. Practically, we use a different trick to map the pixel correspondences between views using depth map warping. Once we obtain the depth map of the target view by applying the 3D warping method to the reference depth map itself, we set the pixel correspondences between the target and reference views in conjunction with the warped depth map. During this process, small holes due to viewpoint shifting are filled using a median filter. This trick reduces the effects of small holes generated by the rounding operation during viewpoint shifting. Figure 4(c) shows both the warped depth map having small holes and the hole-filled depth map containing only wide holes painted in black.

C. Texture Mapping

The next step is texture mapping for the target viewpoint image. Using 3D warping, we obtain depth maps warped from the two reference views. Hence, we can find the mapped pixels between the two reference images and the one target image. With this method, we can map the colors of the target view image. If the calculated position is not a full-pel position in the reference view, we interpolate the pixel intensity using neighboring pixel values. The resultant images are presented in Fig. 4(d), where there are no small holes. Black areas are the newly exposed regions which have no texture in the reference image. We fill in those regions in the next process.
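The depth-map warping and small-hole filling step of Section III-B can be sketched as follows (Python with NumPy and SciPy). The helper warp_fn, which maps a reference pixel and its depth value to a target-view position (for example through the per-depth homographies of Eq. (4)), and the convention that larger 8-bit depth values mean closer objects are assumptions made for illustration.

import numpy as np
from scipy.ndimage import median_filter

def warp_depth_map(depth_r, warp_fn):
    """Warp a reference depth map to the target viewpoint (Section III-B).
    warp_fn maps (x, y, depth) in the reference view to a position (x', y')
    in the target view, e.g. via the per-depth homography H_d of Eq. (4)."""
    h, w = depth_r.shape
    depth_t = np.zeros_like(depth_r)                     # 0 marks a hole
    for y in range(h):
        for x in range(w):
            xt, yt = warp_fn(x, y, depth_r[y, x])
            xt, yt = int(round(xt)), int(round(yt))
            if 0 <= xt < w and 0 <= yt < h:
                # z-buffer test: keep the closer sample (assumed larger value)
                depth_t[yt, xt] = max(depth_t[yt, xt], depth_r[y, x])
    # fill the small cracks left by rounding with a 3x3 median filter;
    # wide holes remain 0 and are handled later by hole filling
    filled = median_filter(depth_t, size=3)
    depth_t[depth_t == 0] = filled[depth_t == 0]
    return depth_t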

Fig. 5. Boundary Noise around Depth Discontinuity

D. Hole Filling

The hole area is the newly exposed region generated by viewpoint shifting. Specifically, when the viewpoint is shifted from left to right in the horizontal direction, regions occluded in the left image appear in the right image around the right side of the object. If we have only one reference image and its depth map, we cannot fill the holes with textures, except by inpainting with neighboring textures. However, since we use multi-view images and their depth maps, we can easily find corresponding textures for the hole region from the other reference image. Using this method, we can fill in the hole area, as shown in Fig. 4(e).

IV. BOUNDARY NOISE REMOVAL

We obtained the hole-filled images by referring to both the left and right views. If the depth map is accurate, we do not need any further processing. However, since we use a depth map estimated by a software-based method, we should eliminate the boundary noises generated by depth mismatches around object boundaries. In this section, we define the boundary noise and propose a boundary noise removal method.

A. Boundary Noises around Depth Discontinuity

The performance of the view synthesis method using 3D warping is highly dependent on the accuracy of the depth information. Hence, false depth values around depth discontinuities create boundary noises. Figure 5 shows an example image of boundary noises which are distributed in the background region near objects. This is because of inaccurate depth values around object boundaries. If a depth value of the foreground object is similar to that of the background, the texture of the foreground object is mapped onto the background region in the target view image. When a user sees these noises through the 3D display, he or she may feel uncomfortable. Hence, we need to eliminate such noises in the synthesized image for more natural 3D image display.

B. Proposed Boundary Noise Removal Method

Figure 6 explains the whole procedure of the proposed method. Since boundary noises appear in the background area near object boundaries in the target view, we can solve the problem by processing background textures. In order to detect the background, we check for significant depth changes and determine the background border adjacent to foreground objects. Along the object border, we set the target region. Then, we replace the noise textures in the target image with textures from the other reference image.

Fig. 6. Procedure of Boundary Noise Removal

Fig. 7. Extracting Background Border: (a) detecting the boundary of the hole segment, (b) extracting the background boundary
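As a simplified sketch of the hole-boundary and background-border extraction illustrated in Fig. 7 (and detailed in Section IV-C below), the following Python/NumPy snippet scans each hole pixel along its row, compares the first valid depth values on either side, and marks the farther side as the background border. The threshold value, the row-wise scan, and the assumption that smaller 8-bit depth values mean farther objects are made for illustration only.

import numpy as np

def background_border(depth_t, hole_mask, threshold=20):
    """Return a mask of background pixels that touch a hole horizontally.
    For each hole pixel, compare the first valid depth values found to its
    left and right; the side with the significantly smaller depth (i.e. the
    farther, background side) is marked as background border."""
    h, w = depth_t.shape
    border = np.zeros((h, w), dtype=bool)
    ys, xs = np.nonzero(hole_mask)
    for y, x in zip(ys, xs):
        left = x - 1                                  # nearest valid neighbor on the left
        while left >= 0 and hole_mask[y, left]:
            left -= 1
        right = x + 1                                 # nearest valid neighbor on the right
        while right < w and hole_mask[y, right]:
            right += 1
        if left < 0 or right >= w:
            continue
        d_left, d_right = int(depth_t[y, left]), int(depth_t[y, right])
        if abs(d_left - d_right) < threshold:
            continue                                  # gradual depth change: no boundary noise expected
        if d_left < d_right:
            border[y, left] = True                    # left side is background (farther)
        else:
            border[y, right] = True
    return border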

Fig. 8. Boundary Noise Removal

C. Detection of Background Boundary

Although the boundary noise appears around object boundaries, we do not consider gradually changing depth values because those depth differences do not generate boundary noises. Hence, we detect significant depth discontinuities using a predetermined depth threshold value. Practically, we determine the contour of the hole regions, as illustrated in the left image of Fig. 7(a). The top right image of Fig. 7(a) is the determined hole boundary. Using the contour, we divide the synthesized image into three regions: the foreground region, the hole region, and the background region. Figure 7(b) explains the steps of extracting the background contour. After defining the boundary contour, we check the depth difference of the two valid depth values and decide which one is located at the background. In Fig. 7(b), if the left depth value is 101 and the right depth value is 155, the left one is located at the background; hence, we decide that the left pixel is in the background contour. In this manner, we can obtain the background boundary, as shown in the right image of Fig. 7(b).

D. Boundary Noise Removal

After determining the background contour along the hole regions, we determine the target regions to filter, as painted in gray in Fig. 7(a). Toward the opposite direction of the hole region, we determine the filtering region along the background border using a predetermined width. The filtering region should not overlap the foreground region. Using the filtering regions, we find corresponding textures in the other reference image. Since we are using the multi-view video, the alternative texture information exists in the other reference view. For the filtering regions, we copy the alternative information from the other reference image. Replacing the textures of the target image with the alternative textures, we obtain the final synthesized image, as shown in Fig. 8(d).
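Continuing the previous sketch, the replacement step of Section IV-D might look as follows. The fixed region width, the direction convention (the filtering region extends away from the hole), and the names are illustrative assumptions; the essential operation is copying co-located texture from the other reference synthesis over the detected background regions. A full implementation would also keep the filtering region from overlapping the foreground, which is only noted as a comment here.

import numpy as np

def remove_boundary_noise(synth, synth_other, border, hole_mask, width=5):
    """Replace likely boundary-noise texture next to each detected
    background-border pixel with the co-located texture taken from the
    other reference synthesis (the gray filtering regions of Fig. 7(a))."""
    out = synth.copy()
    h, w = border.shape
    ys, xs = np.nonzero(border)
    for y, x in zip(ys, xs):
        # the filtering region extends away from the hole along the row
        step = -1 if (x + 1 < w and hole_mask[y, x + 1]) else 1
        for k in range(width):
            xi = x + step * k
            # a full implementation would also stop at the foreground region
            if 0 <= xi < w and not hole_mask[y, xi]:
                out[y, xi] = synth_other[y, xi]       # copy alternative texture
    return out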

V. EXPERIMENTAL RESULTS AND DISCUSSIONS

We used two metrics to evaluate the performance of the proposed method: the PSNR value and subjective visual perception. The main goal of our research is to improve the visual quality of the synthesized image; thus, we compared the quality of two resultant images: the image without boundary noise and the image generated by the conventional method. As an objective measure, we calculated PSNR values for all synthesized images. In order to calculate the PSNR value, we set existing intermediate views as the virtual viewpoints. For example, we generated images at View 1 and View 2 using View 0 and View 3, and then compared PSNR values between the original and generated images. We chose four sequences. Two sequences, Breakdancers and Ballet, are provided by Microsoft Research. The other two sequences, Pantomime and Lovebird1, are selected from the 3D video test sequences provided by MPEG. In the following subsections, we show the results of the synthesized images and their objective quality.

Table 1. Comparison of PSNR for Multi-view Sequences (Unit: dB)

                 Previous method (A)   Proposed method (B)   Delta PSNR (C) = (B) - (A)
Test Data        View1     View2       View1     View2       View1     View2
Breakdancers     30.361    31.692      30.428    32.076      0.067     0.384
Ballet           25.060    25.521      25.101    25.574      0.042     0.053

A. Experimental Results on Multi-view Video Sequences

Microsoft Research has provided multi-view video sequences and their depth maps. The depth maps are generated by the segment-based depth estimation method [15]. The image size is 1024x768, and the multiple camera rig is an arc with eight cameras. All views are calibrated but not rectified. Among the eight views, we selected two reference views to generate intermediate view images. The total number of synthesized frames is 100. Table 1 shows the average PSNR values for each virtual view. As you can notice, the PSNR values are similar except for View 2 of the Breakdancers sequence, which implies that the proposed algorithm does not degrade the quality of the resultant image.

Figure 9 shows a comparison of the visual quality of the Breakdancers sequence. You can easily observe boundary noises around the thumb in Fig. 9(b). These noises are generated by mismatches between the object boundaries in the depth map and those in the color image. After applying the proposed method during view synthesis, the boundary noises are removed cleanly, as shown in Fig. 9(c).

Fig. 9. Synthesized images of Breakdancers (View 1 and View 2): (a) original images, (b) previous method, (c) proposed method

Similarly to the Breakdancers sequence, the Ballet sequence has the same problem in the depth image. Figure 10 shows the synthesized images. When we use the previous method, some boundary noises of the man's hair appear on the wall. However, the proposed method removes those noises from the synthesized images, as shown in Fig. 10(c).

Fig. 10. Synthesized images of Ballet (View 1 and View 2): (a) original images, (b) previous method, (c) proposed method

B. Experimental Results on 3D Video Test Sequences

Since there is no available depth data, we need to generate the depth video using the provided depth estimation reference software (DERS) [16]. Currently, MPEG has developed both the depth estimation software and the view synthesis software as auxiliary software for the 3D video system. Several algorithms, such as sub-pel precision and temporal enhancement, have been integrated into the software. We used the half-pel precision and the temporal enhancement method when we generated the depth maps.

The Pantomime sequence is provided by Nagoya University [17]. It is captured by 80 cameras with a 1-D parallel camera rig. The image size is 1280x960, and the number of frames is 500. All views are rectified and color corrected, and all camera parameters are provided. Another test sequence is Lovebird1, which is provided by ETRI and MPEG Korea [18]. It consists of 12 rectified views with camera parameters. The image size is 1024x768.

Table 2. Comparison of PSNR for 3D Video Test Sequences (Unit: dB)

Test Data    Views             Previous method (A)   Proposed method (B)   Delta PSNR (C) = (B) - (A)
Pantomime    View39  View40    35.112    34.568      35.075    35.488      -0.036    0.920
Lovebird1    View6   View7     31.362    31.646      31.356    31.302      -0.006    -0.344

Table 2 shows a comparison of the PSNR values of the resultant images. Both sequences show relatively high view synthesis quality. In the case of View 40 of the Pantomime sequence, the proposed method improved the quality of the image by 0.92 dB. The other delta PSNR values are very small, while the subjective quality has been improved slightly. Figure 11 shows a comparison of synthesized images of the Lovebird1 sequence. This sequence has many boundary noises around the man's right arm, as shown in Fig. 11(b).

Fig. 11. Synthesized images of Lovebird1: (a) original images, (b) previous method, (c) proposed method

After applying the proposed method to the view synthesis algorithm, we obtained enhanced results, as shown in Fig. 11(c). In Fig. 12, we compare synthesized images of the Pantomime sequence. As you can see, boundary noises are generated around the trousers. After applying the proposed algorithm, we obtained the noise-removed image, as shown in Fig. 12(b).

Fig. 12. Synthesized images of Pantomime: (a) previous method, (b) proposed method

Generally, problems in view synthesis are caused by depth estimation errors around depth discontinuities. In other words, if depth estimation performs well in such areas, the boundary noises will not appear in the synthesized image. However, due to the occlusion regions, this problem is inevitable. Therefore, the proposed algorithm can be a good error compensation tool for the 3D video system, preventing boundary noises.

VI. CONCLUSION

Two key techniques for view synthesis are the 3D warping and hole filling methods. In this paper, we have described the procedure of the view synthesis method, which generates intermediate view images using depth information. Since the number of transmitted views is limited due to channel capacity, the 3D video system employs the view synthesis method to generate intermediate images for 3D rendering. In addition, we proposed the boundary noise removal method, which eliminates boundary noises around object boundaries from the synthesized image. After we detect significant depth discontinuities using the provided depth map, we determine the target region where we replace textures of the synthesized image with alternative textures referring to the other reference image. For the experiments, we used four multi-view sequences and compared the visual perception of the resultant images. In order to check objective quality, we compared PSNR values of the resultant images. Image quality is perceptually improved, while boundary noises are removed. In some cases, the PSNR value increased by 0.92 dB when we applied the proposed filtering method.

ACKNOWLEDGMENT

This research was supported by the MKE, Korea, under the ITRC support program supervised by the IITA (IITA-2009-C1090-0902-0017).

REFERENCES

[1] C. Wheatstone, "Phenomena of Binocular Vision," Philosophical Transactions of the Royal Society of London, pp. 371-394, 1838.
[2] JVT of ISO/IEC MPEG & ITU-T VCEG, "WD 4 Reference Software for MVC," JVT-AD207, Feb. 2009.
[3] ISO/IEC JTC1/SC29/WG11, "Introduction to 3D Video," N9784, May 2008.
[4] ISO/IEC JTC1/SC29/WG11, "Description of Exploration Experiments in 3D Video Coding," N9783, May 2008.
[5] ISO/IEC JTC1/SC29/WG11, "Results of 3D Video Expert Viewing," N9992, July 2008.
[6] L. McMillan, "An Image-based Approach to Three-dimensional Computer Graphics," Technical report, Ph.D. Dissertation, UNC Computer Science TR97-013, 1999.
[7] ISO/IEC JTC1/SC29/WG11, "Call for Contributions on 3D Video Test Material," N9595, Jan. 2008.
[8] R. Hartley, "Theory and Practice of Projective Rectification," International Journal of Computer Vision, vol. 35, no. 2, 1999.
[9] Y.S. Kang and Y.S. Ho, "Geometrical Compensation Algorithm of Multi-view Images for Arc Multi-Camera Array," LNCS, vol. 5353, pp. 543-552, 2008.
[10] ISO/IEC JTC1/SC29/WG11, "Reference Softwares for Depth Estimation and View Synthesis," m15377, April 2008.
[11] Y. Boykov and V. Kolmogorov, "An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision," Proc. International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 359-374, Sept. 2001.
[12] ISO/IEC JTC1/SC29/WG11, "Experimental Results on Depth Estimation and View Synthesis with Subpixel-precision," m15584, July 2008.
[13] ISO/IEC JTC1/SC29/WG11, "Experimental Results on Improved Temporal Consistency Enhancement," m16063, Jan. 2009.
[14] ISO/IEC JTC1/SC29/WG11, "Vision on 3D Video," N10357, Jan. 2009.
[15] L. Zitnick, S.B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality Video View Interpolation using a Layered Representation," SIGGRAPH 2004, Aug. 2004.
[16] ISO/IEC JTC1/SC29/WG11, "Depth Estimation Reference Software (DERS) 4.0," M16605, July 2009.
[17] ISO/IEC JTC1/SC29/WG11, "1-D Parallel Test Sequences for MPEG-FTV," M15378, April 2008.
[18] ISO/IEC JTC1/SC29/WG11, "Contribution for 3D Video Test Material of Outdoor Scene," M15371, April 2008.