106 IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, VOL. 4, NO. 1, MARCH 2014


Depth Map Coding for View Synthesis Based on Distortion Analyses

Feng Shao, Weisi Lin, Senior Member, IEEE, Gangyi Jiang, Member, IEEE, Mei Yu, and Qionghai Dai, Senior Member, IEEE

Abstract—In 3-D video, view synthesis with depth-image-based rendering is employed to generate any virtual view between available camera views. Distortions in the depth map induce geometry changes in the virtual views and thus degrade the performance of view synthesis. This paper proposes a depth map coding method to improve the performance of view synthesis based on distortion analyses. The major technical innovation of this paper is to formulate the maximum tolerable depth distortion (MTDD) and the depth disocclusion mask (DDM), since such depth sensitivity for view synthesis and inter-view redundancy can be well utilized in coding. To be more specific, we define two different encoders (i.e., a base encoder and a side encoder) for the depth maps in the left and right views, respectively. For base encoding, different types of coding units are extracted based on the distribution of MTDD and assigned different quantization parameters for coding. For side encoding, a warped-skip mode is designed to remove inter-view redundancy based on the distribution of DDM. The experimental results show that the proposed scheme not only achieves high view synthesis performance, but also reduces the computational complexity of encoding.

Index Terms—Depth disocclusion mask, depth map coding, maximum tolerable depth distortion, three-dimensional (3-D) video, view synthesis.

I. INTRODUCTION

WITH the advancement of 3-D related technologies [1], e.g., content creation, video coding, network transmission, and stereoscopic display, 3-D video applications have drawn increasing attention during recent years. Especially since 2009, the great success of Avatar has greatly promoted 3-D research and markets [2].
Since a 3-D video system requires an enormous amount of information captured by at least two cameras, efficient storage and transmission is the main challenge.

Manuscript received August 26, 2013; revised November 30, 2013; accepted December 29, 2013. Date of publication January 20, 2014; date of current version March 07, 2014. This work was supported in part by the Natural Science Foundation of China under Grant , Grant , Grant U , and Grant , and in part by the K. C. Wong Magna Fund in Ningbo University. This paper was recommended by Guest Editor B. Yan. F. Shao, G. Jiang, and M. Yu are with the Faculty of Information Science and Engineering, Ningbo University, Ningbo, China (e-mail: shaofeng@nbu.edu.cn; jianggangyi@nbu.edu.cn; yumei@nbu.edu.cn). W. Lin is with the Centre for Multimedia and Network Technology, School of Computer Engineering, Nanyang Technological University, Singapore (e-mail: wslin@ntu.edu.sg). Q. Dai is with the Broadband Networks and Digital Media Lab, Tsinghua University, Beijing, China (e-mail: qhdai@tsinghua.edu.cn). Color versions of one or more of the figures in this paper are available online. Digital Object Identifier /JETCAS

One promising solution is to encode only a limited number of views and synthesize the virtual views by a depth-image-based rendering (DIBR) technique [3]. Recently, the multi-view video plus depth (MVD) scene description format has been standardized by MPEG and ITU-T as an efficient data representation for 3-D systems [4]. For the past several years, dozens of works have concentrated on the design of various multi-view video coding (MVC) methods exploiting temporal and inter-view dependencies, which have been standardized in both the joint multi-view video model (JMVM) [5] and the joint multi-view video coding (JMVC) [6] software. For depth maps, in order to be backward compatible with the MVC standard, they are often treated as gray-scale image sequences, and can be compressed by the JMVM or JMVC reference software.
Since the characteristics of depth maps are very different from those of color texture video, several dedicated depth map coding approaches have been proposed. Morvan et al. proposed a quadtree decomposition scheme to model regions by exploiting the smoothness properties of depth [7]. Oh et al. proposed a depth boundary reconstruction filter to compress the depth map and utilized it as an in-loop filter [8]. Hidalgo et al. proposed a segmentation-based coding method by considering the smooth structure and sharp edges of depth maps [9]. Milani et al. employed over-segmentation to split the depth map into multiple regions, and merged these regions to create an object-based quality-scalable prediction for depth map coding [10]. Nguyen et al. proposed weighted mode filtering to suppress coding artifacts and reconstructed the depth map from a reduced spatial resolution [11]. Besides, it is possible to make use of the correlation between color texture and depth, and some joint depth/texture coding schemes have been proposed to improve coding efficiency [12], [13]. However, the correlation between color texture video and depth map is not as strong as expected, and more importantly, the effects of color texture and depth distortions on view synthesis should be taken into account in depth map coding. It is important to note that the unique property of depth maps is that they are not directly used for display, but only provide supplementary data (i.e., geometric information of the captured scene) for view synthesis. Therefore, in addition to conventional rate-distortion (R-D) criteria, the R-D property of the synthesized view should also be fully utilized in depth map coding. Kim et al. proposed a new R-D criterion to replace the conventional distortion function [14], in order to quantify the effect of depth coding distortion on the synthesized view. Oh et al.
proposed a view synthesis distortion function involving the co-located color texture information [15], and applied it to the optimal macroblock mode decision of depth map coding. Liu et al. proposed a linear distortion model to approximate the view synthesis distortion [16], and determined the optimal bitrate ratio between color texture and depth by minimizing the view synthesis distortion. Yuan et al. proposed a new virtual-view-oriented R-D criterion to replace the mean squared error (MSE) criterion during the R-D optimization process [17]. Tech et al. proposed a new distortion metric to account for the changes of the overall synthesized view distortion [18]. Zhang et al. proposed a regional bit allocation and rate-distortion optimization algorithm by applying different view synthesis distortion models [19]. Related works were also presented in [20], [21]. However, these view-synthesis-oriented R-D criteria cannot completely reflect the view synthesis process, so the resulting improvement of view synthesis is limited. From another perspective, it is advantageous to take the properties of depth sensitivity into account to enhance the performance of view synthesis as well as 3-D display. In addition to being used for view synthesis, depth maps can also assist 3-D displays (e.g., autostereoscopic displays) to enhance depth perception. De Silva et al. proposed the just noticeable depth difference (JNDD) model to represent the quantitative threshold below which the human visual system (HVS) cannot perceive a depth change in 3-D display [22]. Nguyen et al. derived a theoretical upper bound of the geometric error on the mean absolute error in the synthesized view [23]. Zhao et al. proposed the depth no-synthesis-error (D-NOSE) model to represent the allowable depth distortions in view synthesis without introducing any geometric changes [24]. Cheung et al. defined a range of depth values as the don't-care region (DCR), within which any depth value leads to insignificant synthesized view distortion [25].
However, the depth sensitivity for 3-D display (e.g., JNDD) only reflects the perceived depth distance of the scene and cannot be directly reused for view synthesis, while human visual perception redundancies were not considered in establishing the depth sensitivity for view synthesis (e.g., in the D-NOSE and DCR models). Moreover, the current hybrid coding framework (with motion- or disparity-compensated prediction) does not fully exploit the redundancies of 3-D data, and there is room for further improvement. For example, the locations of corresponding samples in different views can be determined by view warping; even though temporal and inter-view correlation has been adopted to determine the skipped blocks of depth maps [26], depth-based view warping can be an effective means to remove inter-view redundancy. Lee et al. proposed to skip some blocks of the depth image at an early stage based on the temporal and inter-view correlation between previously encoded color texture images [27]. Daribo et al. used the 3D-warping technique to produce the right view at the decoder, but the quality of the right view is largely decreased because of the resulting disoccluded regions [28]. Zamarin et al. used the 3D-warping approach to replace disparity compensated prediction in the encoding architecture to improve coding performance [29]. Jager et al. applied warped prediction only to key pictures and replaced intra-coded pictures of the enhancement views [30]. In order to handle the disocclusion in warping, Gautier et al. proposed a depth-based image completion algorithm to fill the disoccluded region [31]. However, directly applying 3D-warping to depth maps usually cannot obtain accurate warped positions, because of the low depth consistency across viewpoints.
In our previous work [32], [33], we focused on joint encoding of MVD data, in which different R-D models were applied to texture video and depth map coding by characterizing the relationship between coding distortion and view synthesis distortion, and optimal bitrates were allocated to texture and depth. However, these methods still have the following limitations: 1) the distortion analysis for depth maps is insufficient, because the characteristics of depth maps are completely different from those of texture video; 2) depth map coding should be specially devised in order to ensure optimal 3-D video coding performance. In this paper, we propose a depth map coding method to improve view synthesis performance based on distortion analyses. We concentrate on depth map coding in this work to improve the performance of view synthesis, since depth maps are only used for view synthesis as nonvisual data and their influence upon 3-D perception is not direct. The main contributions of this work are as follows. 1) A comprehensive analysis of the important factors affecting the quality of the synthesized view is presented; these factors are fully taken into account in depth map coding. 2) By considering depth sensitivity for view synthesis, we derive the maximum tolerable depth distortion (MTDD) model and design a base encoder for depth map coding according to the distribution of MTDD. 3) To eliminate the inter-view redundancy of depth maps as much as possible, we derive the depth disocclusion mask (DDM) model and design a side encoder for depth map coding. The rest of the paper is organized as follows. Problems in 3-D video coding systems are discussed in Section II. Sections III and IV present the definitions of MTDD and DDM, respectively. Then, the proposed method is introduced in Section V, and experimental results are analyzed in Section VI. Finally, conclusions are drawn in Section VII.

II. PROBLEM DESCRIPTION IN 3-D VIDEO SYSTEM

Fig.
1 illustrates a typical 3-D video coding system framework, in which color texture video and depth maps are independently or jointly encoded using different MVC encoders. At the client side, arbitrary virtual views are synthesized from the decoded color texture video and depth maps by DIBR. In this paper, we do not intend to study rate allocation between texture video and depth maps (see our previous work [32], [33]). Since virtual views are synthesized from the two adjacent views (i.e., the left view and the right view) [34], we only consider the two-view MVD format in this work; it can be easily extended to multiple views. For example, for the three-view MVD format (i.e., the cases of I-view, P-view, and B-view), the P-view and B-view can be synthesized from the I-view, in which the indexes for I, P, and B pictures represent hierarchical levels of the prediction structure. In the two-view MVD format, suppose that one view is regarded as the left view and the other as the right view; the typical prediction structure for two-view based 3-D video coding is shown in Fig. 2. Pictures in each view form a hierarchical B picture prediction structure [35]. The left view (as I-view) is encoded with temporal motion compensated prediction (MCP), and the right view (as P-view) is encoded with both temporal MCP and disparity compensated prediction (DCP) between views.

Fig. 1. Framework of 3-D video system.
Fig. 2. Typical prediction structure for MVC.

Since the depth maps can be treated as monochromatic video, they are also encoded with the same prediction structure as in Fig. 2. In DIBR-based view synthesis, 3D-warping is usually used to synthesize the virtual view, and it can be separated into two steps: projection of the reference image into the 3-D world coordinates, followed by projection of the 3-D scene into the target image plane. A pixel $(x, y)$ in the reference image is projected into the 3-D world coordinates [36] as

$P_w = R_r^{-1} \left( z \, K_r^{-1} [x, y, 1]^T - T_r \right)$   (1)

where $z$ is the depth value calculated from the pixel in the depth map, $K_r$ and $R_r$ are the intrinsic and rotation matrices of the reference camera, and $T_r$ is the translation vector of the reference camera. In the next step, the world coordinates are projected into the target camera plane via

$[x', y', z']^T = K_t \left( R_t P_w + T_t \right)$   (2)

where $[x', y', z']^T$ are the homogeneous coordinates of the target image plane, and $K_t$, $R_t$, and $T_t$ are the intrinsic matrix, rotation matrix, and translation vector of the target camera, respectively. The corresponding pixel location in the synthesized image of the target camera is $(x'/z', y'/z')$. The matrices $K_r$, $R_r$, $K_t$, and $R_t$, as well as the translation vectors, are known in advance for specific cameras. As illustrated in Fig. 3, the warped pixel position using a wrong depth value (the black point in the figure) will deviate from its actual position using the original depth (the gray point in the figure), so an error (usually a compression artifact) in the depth map leads to a geometric error in the synthesized virtual view.

Fig. 3. Pixel projection between two views using 3-D warping.
Fig. 4. Effect of compression artifact in homogeneous depth region: (a) uncompressed depth map; (b) synthesized image with uncompressed depth map; (c) compressed depth map; (d) synthesized image with compressed depth map.
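As a concreteness check, the two-step projection of (1) and (2) can be sketched in a few lines of Python with NumPy. This is a minimal sketch, not the reference implementation: the function and parameter names are illustrative, and it assumes the usual pinhole convention in which a rotation matrix satisfies $R^{-1} = R^T$.

```python
import numpy as np

def warp_pixel(x, y, z, K_ref, R_ref, T_ref, K_tgt, R_tgt, T_tgt):
    """Project pixel (x, y) with depth z from the reference camera into
    the target camera plane, following the two-step 3-D warping above.
    Intrinsic/rotation matrices are 3x3; translations are length-3."""
    # Step 1: back-project the reference pixel into 3-D world coordinates.
    p_ref = np.array([x, y, 1.0])
    world = R_ref.T @ (z * np.linalg.inv(K_ref) @ p_ref - T_ref)
    # Step 2: project the world point into the target camera plane.
    p_tgt = K_tgt @ (R_tgt @ world + T_tgt)
    # Homogeneous normalization yields the synthesized pixel location.
    return p_tgt[0] / p_tgt[2], p_tgt[1] / p_tgt[2]
```

With identical cameras the pixel maps to itself; translating the target camera along the baseline shifts the warped position horizontally by an amount inversely proportional to depth, which is the disparity behavior the paper's distortion analysis builds on.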
Moreover, the quality degradation of the synthesized view induced by the position deviation may differ from region to region. That is, the same geometric error results in different visual quality in different regions of the synthesized virtual view. We illustrate the effect of compression artifacts in homogeneous and discontinuous depth regions in Figs. 4 and 5, respectively. From the figures, we can see that, even though the compression artifact in the homogeneous depth region is evident, the resulting synthesized image does not show a significant quality change with respect to the original one, while the difference between the corresponding synthesized images in the discontinuous depth region is significant. Also, depth distortion affects the revealing of disoccluded regions (the locations of the black pixels in the figures are changed). Therefore, it is necessary to quantify the effect of depth distortion (i.e., depth sensitivity) and utilize this property in depth map coding.

Fig. 5. Effect of compression artifact in discontinuous depth region: (a) uncompressed depth map; (b) synthesized image with uncompressed depth map; (c) compressed depth map; (d) synthesized image with compressed depth map.

However, current depth estimation methods are based on stereo matching in essence. As shown in Fig. 6, the estimated depth maps are inconsistent across viewpoints, since inter-view correlation is not fully exploited in depth estimation [37]. As a consequence, depth maps can be locally erroneous with spot noise in some regions. This reduces both the coding performance for depth maps and the quality of the synthesized view. Currently, some depth map preprocessing algorithms have been proposed to enhance the inter-view consistency [37], [38]. In this work, considering that the inter-view correlation of color texture video is usually high, it can assist in measuring the inter-view correlation of depth maps using 3D-warping. In other words, we can skip these coherent regions in coding (i.e., with the proposed warped-skip mode design), instead of aiming to improve prediction accuracy by view synthesis prediction as done in [26]. More importantly, the low depth consistency across viewpoints can be relieved by applying the proposed warped-skip mode design.

Fig. 6. Inter-view inconsistency analysis of depth maps.

From the above analysis, on one hand, inter-view correlation can be well exploited by the 3D-warping process; if this correlation is appropriately utilized, the coded (transmitted) information can be largely reduced. On the other hand, the coding distortion of depth maps leads to geometric errors in the synthesized view. Besides, during this process, some pixel positions in the virtual view are not mapped from the reference view, because some areas exist in the reference view but are invisible in the virtual view, such as occluded/disoccluded regions. Therefore, in order to effectively describe the 3-D video (aiming at a lower transmitted bitrate and higher synthesized quality), the factors above should be taken into account in depth map coding. In this work, we derive characteristic description models for depth maps, and apply these models to depth map coding.

III. MAXIMUM TOLERABLE DEPTH DISTORTION

It is known that depth map distortion will lead to geometric errors in the synthesized view and will affect the quality of the synthesized view. In this work, in order to investigate how, and the extent to which, depth map distortion affects view synthesis, the distortion of the virtual view [measured using the squared differences (SD)] synthesized from the original left color texture image using the original depth map and the distorted depth map is defined as

$D_s(x) = \left[ \mathcal{W}(I_L, D_L)(x_w) - \mathcal{W}(I_L, \tilde{D}_L)(x_w + \Delta x) \right]^2$   (3)

where $\mathcal{W}(A, B)$ denotes the 3D-warping operation that warps A (color texture or depth) to the virtual view using the depth information B, $I_L$ is the left color texture image, $D_L$ ($\tilde{D}_L$) is the original (distorted) left depth map, $x_w$ is the warped pixel position for the synthesized view using the original depth map, and $\Delta x$ is the horizontal geometric error induced by the distorted depth map. A similar formulation to (3) can be obtained for the right view. It has been proven that a linear relationship holds between the geometric error and the distortion of the depth map [14], i.e.,

$\Delta x = \alpha \cdot \Delta d$   (4)

where $\Delta d$ is the depth map distortion and $\alpha$ is a coefficient determined by the following equation:

$\alpha = \frac{f \cdot B}{255} \left( \frac{1}{Z_{near}} - \frac{1}{Z_{far}} \right)$   (5)

where $f$ denotes the focal length of the camera in the horizontal direction, $B$ expresses the baseline distance between the current and the virtual views, and $Z_{near}$ and $Z_{far}$ are the values of the nearest and farthest depth of the scene, respectively. Equation (4) reveals the fact that the distortion of the depth map causes the warped position deviation. Since the parameters $f$, $Z_{near}$, and $Z_{far}$ are fixed for a specific imaging system, $\alpha$ is known once the virtual view is established. The ground-truth geometric error is the one that minimizes the distortion of the synthesized view:

$\Delta x^{*} = \arg\min_{\Delta x} D_s(x)$   (6)

However, it would be impractical to directly calculate the geometric error from the above equations, because: 1) it needs to calculate the warping for each pixel at different position deviations, which requires enormous computation; 2) the ground-truth geometric error does not necessarily exist, because the minimum distortion may be the one where the geometric error equals zero; 3) disocclusion in the synthesized virtual view is hard to measure because there is no point of reference for comparison.
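Under the linear model of (4) and (5), the warping error implied by a given depth distortion is a one-line computation. The sketch below uses illustrative names and assumes 8-bit depth maps (hence the 255 normalization in (5)):

```python
def geometric_error(delta_d, f, baseline, z_near, z_far):
    """Horizontal geometric error delta_x = alpha * delta_d, Eqs. (4)-(5).

    f             : horizontal focal length (pixels)
    baseline      : distance between the current and virtual views
    z_near, z_far : nearest / farthest scene depth
    """
    alpha = (f * baseline / 255.0) * (1.0 / z_near - 1.0 / z_far)
    return alpha * delta_d
```

For example, with a focal length of 1000 pixels, a 5 cm baseline, and a scene spanning 1 m to 10 m, a full-scale depth error of 255 maps to a 45-pixel warping error, illustrating how strongly depth errors in near-camera regions can displace synthesized pixels.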
Considering that the virtual view is synthesized from the left view, and that a certain amount of distortion in the synthesized view can be tolerated by considering human visual perception (e.g., background regions can tolerate more distortion), the distortion of the synthesized view in (3) is redefined by requiring

$D_s(x, \Delta x) \le T, \quad \forall \, \Delta x \in [\Delta x_l(x), \Delta x_u(x)]$   (7)

where $\Delta x_l(x)$ and $\Delta x_u(x)$ are the lower and upper bounds of the geometric error for which the resultant distortion within this range remains lower than a given threshold $T$. In the experiments, the maximum search range for establishing the lower and upper bounds is set to the maximum disparity range between the left and right views. Considering that the synthesized virtual view is eventually perceived by a human, the factors that affect human visual perception, e.g., the contrast sensitivity function, luminance adaptation, and contrast masking, are taken into account in determining the threshold $T$. It is well known that the visual masking effect [e.g., just-noticeable difference (JND)] plays an important role in HVS-oriented signal processing. In this work, the JND threshold in [39] is selected, i.e., $T = T_{JND}$. Only the monocular visual characteristic is considered in the JND threshold, because we only apply the model to the left view in the depth map coding that follows. For the actual binocular visual response process, the binocular JND (BJND) [40] would be a good choice for an efficient coding framework design. Finally, by finding the lower and upper bounds of the geometric error, the MTDD is defined via (4) as

$\mathrm{MTDD}(x) = \left[ \frac{\Delta x_l(x)}{\alpha}, \ \frac{\Delta x_u(x)}{\alpha} \right]$   (8)

In the experiments, according to the definition in (5), $\alpha$ is calculated with the largest baseline (i.e., the distance between the left and right-most cameras), in order to cover the whole range of virtual views. Thus, the MTDD for each view can be obtained by implementing the above operation independently. Besides, the definition of MTDD is per pixel, and this provides useful information about how much distortion can be tolerated in depth map coding.

IV. DEPTH DISOCCLUSION MASK

The above MTDD depicts the depth sensitivity property (i.e., the internal characteristic of depth maps) for view synthesis. For depth map coding, the external characteristic (i.e., inter-view redundancy) should also be exploited. However, as analyzed in the previous section, the inter-view correlation of depth maps is usually weak due to the limitations of available depth estimation methods. In this work, we measure the inter-view correlation of depth maps from the co-located color texture video. The synthesized right color texture image can be obtained from the left color texture image using the original depth map by

$\hat{I}_R = \mathcal{W}(I_L, D_L)$   (9)

However, one prominent issue in the warping process is that holes and disocclusions (defined together as the disoccluded region in this work) inevitably occur due to the viewing-angle difference. As shown in Fig. 7, the background behind the foreground in the original view becomes disoccluded in the virtual view. That is, the original view does not provide any information about this background, so disocclusion gaps occur in the synthesized virtual view.

Fig. 7. Disocclusion problem description.

In this work, in order to determine which pixels will be disoccluded in the synthesized right view, we compare the difference between the original and the synthesized right views. If a pixel is successfully warped, its probability value is set to 1; otherwise, it is set to 0:

$p(x) = \begin{cases} 1, & |I_R(x) - \hat{I}_R(x)| \le T_c \\ 0, & \text{otherwise} \end{cases}$   (10)

where $I_R$ is the original right color texture image and $T_c$ is a threshold controlling the difference strength. The disoccluded pixels are marked as black (value equal to 255) in the synthesized view, so that these pixels can be correctly differentiated by comparing the difference between the original and synthesized views (the wrongly warped pixels can also be differentiated). In the experiments, $T_c$ is set to 10. Since the above definition of probability is per pixel, to be compatible with the block-based coding system, the probability of each coding unit (CU) is calculated by

$P_{CU} = \frac{1}{N \times N} \sum_{x \in CU} p(x)$   (11)

where $N \times N$ denotes the size of the CU. Besides, considering that edges in the depth map have a great impact on view synthesis [41], we first extract the edge CUs of the left view by performing Canny edge detection on the MTDD map. Then, these edge CUs are warped to the right view and marked as edge CUs, and the remaining CUs in the right view are marked as non-edge CUs. Finally, the DDM of the right view is defined as

$\mathrm{DDM}(CU) = \begin{cases} 0 \ (\text{disoccluded}), & P_{CU} < T_p \ \text{or} \ CU \ \text{is an edge CU} \\ 1 \ (\text{warped}), & \text{otherwise} \end{cases}$   (12)

In the experiments, $T_p$ is set to 0.5, which means that if more than half of the pixels in the CU are disoccluded, the CU belongs to the disoccluded region; otherwise, it belongs to the warped region. In the proposed scheme, based on the derived DDM, only the disoccluded region needs to be encoded; the warped region can be skipped in coding and then synthesized at the decoder.
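The per-pixel test (10), block pooling (11), and thresholding (12) can be sketched as follows. This is a schematic reading of the text rather than the reference implementation: the function name, the default CU size, and the treatment of edge CUs (forced into the to-be-coded class) are assumptions.

```python
import numpy as np

def build_ddm(orig_right, synth_right, cu_size=16, t_c=10, t_p=0.5,
              edge_cus=None):
    """Block-level depth disocclusion mask in the spirit of Eqs. (10)-(12).

    A pixel counts as successfully warped when the synthesized right view
    matches the original within t_c; a CU whose warped fraction falls below
    t_p (or that contains warped depth edges) is marked 0 (disoccluded,
    must be encoded), otherwise 1 (warped, can be skipped)."""
    h, w = orig_right.shape
    # Eq. (10): per-pixel warped/not-warped indicator (int32 avoids wraparound).
    p = np.abs(orig_right.astype(np.int32) - synth_right.astype(np.int32)) <= t_c
    ddm = np.ones((h // cu_size, w // cu_size), dtype=np.uint8)
    for by in range(h // cu_size):
        for bx in range(w // cu_size):
            block = p[by*cu_size:(by+1)*cu_size, bx*cu_size:(bx+1)*cu_size]
            frac = block.mean()  # Eq. (11): per-CU warped-pixel fraction
            if frac < t_p or (edge_cus is not None and edge_cus[by, bx]):
                ddm[by, bx] = 0  # Eq. (12): disoccluded region
    return ddm
```

Marking disoccluded pixels with the sentinel value 255 in the synthesized view, as the text describes, makes them fail the $T_c$ test against any plausible original value, which is what drives the classification above.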
Since the DDM of the right view is dependent on the left view, a similar DDM of the left view can be obtained by inversely warping the right view.

V. PROPOSED DEPTH MAP CODING METHOD

To effectively reduce the transmitted bitstream of the compressed depth maps while maintaining high view synthesis quality, a depth map coding method is proposed based on the above characteristic models (i.e., MTDD and DDM). In the proposed method, we define two different encoders (i.e., a base encoder and a side encoder) for the left and right views, respectively. Specifically, for the base encoder, different types of CUs are extracted according to the distribution of the MTDD map and assigned different quantization parameters (QPs). For the side encoder, a warped-skip mode is designed to remove the inter-view redundancy according to the distribution of the DDM map. At the decoder, view reconstruction is performed to reconstruct the right view. Therefore, how to effectively encode the left and right views is the key challenge for the success of the method.

A. Base Encoder for Left View Coding

As discussed above, the depth sensitivity for view synthesis is space-variant. Therefore, different CUs can be represented with different flexibility. To effectively reduce the transmitted bitstream of the depth map, we propose QP selection for coding depth maps according to the distribution of MTDD. In order to facilitate the following process, the MTDD values are mapped to [0, 255]. Firstly, since the edges of the depth map have a great impact on view synthesis [41], the edge CUs are extracted by performing Canny edge detection on the MTDD map. For the remaining non-edge regions, the mean $m$ and variance $v$ of the non-edge CUs are first calculated, and different types of CUs (defined as A1, A2, A3, and A4) in the non-edge regions are obtained by comparing the mean and variance with predefined thresholds. The specific steps are as follows. When $m > T_m$ and $v < T_v$, the CU is defined as type A1, where $T_m$ and $T_v$ are the thresholds of the mean and variance, respectively. When $m > T_m$, the CU can tolerate larger distortion; when $v < T_v$, the CU is relatively smooth. The CU is set to type A2, A3, or A4 for the remaining combinations of the mean and variance comparisons, respectively. Then, different QPs are assigned to the CUs by

$QP_{A_i} = QP_b + \delta_i, \quad i = 1, 2, 3$   (13)

where $QP_b$ is the base QP for coding, and $\delta_i$ controls the QP offset (to be analyzed in the next subsection); for type A4, $QP_b$ is directly used. Fig. 8 describes the flowchart of the proposed coding scheme for the left view. According to the statistical analysis of the mean and variance on the Leaving Laptop and Lovebird1 test sequences, we found the optimum thresholds $T_m$ and $T_v$ for all the possible distributions of the types. As a consequence, $T_m$ and $T_v$ are set to 6.5 and 746 in our experiments. Fig. 9 shows examples of the block types for Leaving Laptop and Lovebird1. It is obvious that type A1 (black regions in the figure) usually corresponds to smooth regions of the color texture image and depth map; thus, depth distortion in these regions does not affect the synthesized virtual view significantly. Type A2 (dark gray regions in the figure) is mainly concentrated in regions with relatively small depth variations and relatively smooth texture. Types A3 (light gray regions in the figure) and A4 (white regions in the figure) are mainly distributed in regions with large depth variations (e.g., depth discontinuities) and complex texture. The proposed base encoder design improves the view synthesis performance, as demonstrated in Section VI-C.

Fig. 8. Flowchart of the proposed left view coding.
Fig. 9. Examples of the block types of the left depth map. (a) Leaving Laptop. (b) Lovebird1.

B. Side Encoder for Right View Coding

For the right view, only the disoccluded region is encoded, so that the transmitted bitrate can be largely saved; the disoccluded information can be reconstructed from the left view at the decoder. This idea has been demonstrated in layered depth images, where it is possible to detect occlusion in rendering [42]. Of course, the quality of the reconstructed right view will suffer some degree of degradation, because the depth map information used in 3D-warping is not the original one; as analyzed in the previous section, depth map distortion leads to geometric distortion in the synthesized view. In the implementation, the warped region does not need to be encoded (i.e., it is skipped in coding).
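A minimal sketch of the base encoder's CU classification and QP assignment is given below. The mean/variance thresholds 6.5 and 746 come from the text, but the exact comparison pattern for types A2/A3 and the offset values $\delta_i$ did not survive extraction, so those are illustrative assumptions here (larger offsets, i.e., coarser quantization, for more distortion-tolerant types).

```python
import numpy as np

def assign_qp(mtdd_block, qp_base, t_m=6.5, t_v=746.0, offsets=(4, 3, 2)):
    """QP selection for a non-edge CU from its MTDD statistics, in the
    spirit of the type A1-A4 rule and Eq. (13). The offsets and the
    A2/A3 comparison order are illustrative placeholders; t_m and t_v
    are the thresholds reported in the text."""
    m, v = float(np.mean(mtdd_block)), float(np.var(mtdd_block))
    if m > t_m and v < t_v:   # A1: high tolerance, smooth -> coarsest QP
        return qp_base + offsets[0]
    if m > t_m:               # A2: high tolerance, non-smooth
        return qp_base + offsets[1]
    if v < t_v:               # A3: low tolerance, smooth
        return qp_base + offsets[2]
    return qp_base            # A4: low tolerance, complex -> base QP
```

The design intent follows the text: regions whose MTDD indicates high tolerance can absorb coarser quantization with little effect on the synthesized view, while A4 regions (depth discontinuities, complex texture) keep the base QP.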
Similar to the SKIP mode in traditional video coding standards, we define a SKIP mode here (termed the warped-skip mode) for the warped region to remove inter-view redundancy; the mode information is transmitted to the decoder, which requires only a few bits. Fig. 10 shows the flowchart of the proposed right view coding. If the DDM-derived flag of a CU equals 1, no further processing is carried out and the CU is marked as warped-skip; otherwise, the CU is processed as normal and encoded with the R-D optimization process. Fig. 11 shows examples of the DDM distribution for Leaving Laptop and Lovebird1, where black and white blocks represent the warped and disoccluded regions, respectively. It is obvious that the information to be encoded is comparatively small, since CUs with the warped-skip mode far outnumber the other CUs. Moreover, the computational complexity of the encoder is largely reduced, because R-D optimization consumes most of the computation resources.

Fig. 10. Flowchart of the proposed right view coding.
Fig. 11. Examples of the DDM map. (a) Leaving Laptop. (b) Lovebird1.

At the decoder, the synthesized right depth map is obtained by warping the decoded left depth map, as given in (14). Then, based on the mode information of each decoded CU, the reconstructed right depth map is obtained by (15), in which the decoded right depth map fills the non-skipped CUs. However, as analyzed in Section II, depth distortion affects the identification of disoccluded regions, and thus some small holes still appear in the reconstructed depth map. For these small holes, we use the total variation (TV) model [43] to inpaint the missing pixel values. In addition, a three-tap low-pass filter is applied to the boundaries of the DDM in both horizontal and vertical directions to eliminate ghost contours.

VI. EXPERIMENTAL RESULTS AND ANALYSES

A. Experimental Setup

In the experiments, we select the MPEG 3-D video test sequences Alt Moabit, Book Arrival, Dog, Leaving Laptop, Lovebird1, and Pantomime. For Alt Moabit, Book Arrival, and Leaving Laptop, which contain 16 views with 6.5 cm spacing between adjacent views, the tenth and eighth views (view10 and view8) are adopted as the left and right views, and the virtual view (view9) is synthesized. For Lovebird1, which contains 12 views with 3.5 cm spacing between adjacent views, the sixth and eighth views (view6 and view8) are adopted as the left and right views, and the virtual view (view7) is synthesized. For Dog and Pantomime, which contain 80 views with 5 cm spacing between adjacent views, the fortieth and forty-second views (view40 and view42) are adopted as the left and right views, and the virtual view (view41) is synthesized. The depth maps of the test sequences are generated by the Depth Estimation Reference Software (DERS) [44]. For all experiments, we used the JMVC software [6], ver. 8.3, to encode the color texture video and depth maps, and the View Synthesis Reference Software (VSRS) [34], version 3.5, to synthesize the virtual view. The detailed encoding settings for the color texture video and depth maps are as follows: the basic QP values are 22, 27, 32, and 37; the temporal GOP size is set to 8; and the total number of encoded frames in each view is 50.

B. Parameter Determination

In the proposed scheme, we determine the optimum parameter values by comparing Bjøntegaard delta PSNR (BD-PSNR) [45] values under different settings. In this experiment, we apply the base encoder to the left and right views simultaneously. Considering that one CU type usually has a dominant impact on view synthesis, for simplicity we fix its parameter and determine the remaining parameters one at a time: each parameter is varied in turn while the others are held at reference values, with the virtual view synthesized under the reference setting as the benchmark. The BD-PSNR of the synthesized virtual view is then calculated for each candidate setting, and the optimum value is the one yielding the maximum BD-PSNR. In the experiments, we pre-encode two GOPs of the Leaving Laptop and Lovebird1 test sequences and fit the resulting curves with a Gaussian function; the optimum value of each parameter is located at the crest of its fitted curve. The same parameter values are then used for all the test sequences.

C.
View-Synthesis R-D Performance Comparison

In order to objectively evaluate the performance of view synthesis, the average peak signal-to-noise ratio (PSNR) is measured, taking as reference the virtual view synthesized from the original color texture video using the original depth maps. The color texture video is encoded using the same QP as the depth

maps by the JMVC encoder. We compare the view-synthesis R-D performance of three schemes (the original JMVC scheme [6], Lee's scheme [27], and the proposed scheme) in Fig. 12, denoted as JMVC, Lee's [27], and Proposed, respectively. The vertical axis of each sub-figure shows the average PSNR of the synthesized virtual view, while the horizontal axis corresponds to the total bitrate of the depth maps. For the original JMVC scheme, the JMVC encoder is applied directly to the depth maps. For Lee's scheme, only the right view is encoded by predicting the skipped blocks from temporal and inter-view correlation; the left view is encoded as normal with the original JMVC encoder. The results show that Lee's scheme is superior to the original JMVC scheme but inferior to the proposed scheme. The reason is that the depth sensitivity for view synthesis is not considered in Lee's scheme, so the proposed scheme achieves a clear overall performance gain. Besides, the performance of Lee's scheme depends strongly on the number of predicted views (in fact, it is better suited to a three-view case, as demonstrated in [27]).

Fig. 12. View synthesis R-D performances of the three schemes. (a) Alt Moabit. (b) Book Arrival. (c) Dog. (d) Leaving Laptop. (e) Lovebird1. (f) Pantomime.
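As a rough illustration (not the authors' code), the BD-PSNR figures used in these comparisons can be reproduced from a few (bitrate, PSNR) points per scheme, e.g. one per basic QP in {22, 27, 32, 37}, with the standard Bjøntegaard procedure: fit a cubic polynomial to PSNR over log10(bitrate) and average the gap between the fitted curves over the overlapping rate range.

```python
import numpy as np

def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjontegaard delta-PSNR of the test scheme over the anchor.

    PSNR is fitted as a cubic polynomial of log10(bitrate) for each
    scheme, and the average vertical gap between the two fitted curves
    is computed over the overlapping bitrate interval.
    """
    la = np.log10(np.asarray(rate_anchor, dtype=float))
    lt = np.log10(np.asarray(rate_test, dtype=float))
    pa = np.polyfit(la, psnr_anchor, 3)   # anchor R-D curve fit
    pt = np.polyfit(lt, psnr_test, 3)     # test R-D curve fit
    lo, hi = max(la.min(), lt.min()), min(la.max(), lt.max())
    int_a = np.polyval(np.polyint(pa), [lo, hi])
    int_t = np.polyval(np.polyint(pt), [lo, hi])
    # average gap = (difference of definite integrals) / interval width
    return ((int_t[1] - int_t[0]) - (int_a[1] - int_a[0])) / (hi - lo)
```

A positive value means the test scheme delivers higher synthesized-view PSNR at the same depth bitrate.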
In order to demonstrate the impact of each component of the proposed scheme, Table I lists the detailed bitrate, synthesized quality, and corresponding BD-PSNR for Lee's scheme, the proposed scheme with the base encoder only (Scheme-1), the proposed scheme with the side encoder only (Scheme-2), and the full proposed scheme (combining Scheme-1 and Scheme-2), with the original JMVC scheme as the benchmark (due to space limitations, Lee, S-1, S-2, and Pro represent the four schemes in this and the following tables). Overall, the proposed scheme performs better than its two constituent schemes (i.e., Scheme-1 and Scheme-2), each of which exploits only the depth sensitivity or only the inter-view redundancy of depth maps. Scheme-2 outperforms Scheme-1 for all test sequences: the estimated depth maps have relatively low inter-view consistency, which lowers the accuracy of conventional inter-view prediction, while Scheme-2 performs better in this regard. The performance of Lee's scheme is superior to Scheme-2 for most test sequences except Dog and Lovebird1, where more blocks are encoded with R-D optimization due to the significant inter-view inconsistency. To further analyze the view synthesis performance, we use the peak signal-to-perceptible-noise ratio (PSPNR) [46] to evaluate the perceptual quality of the virtual view. With the original JMVC scheme as the benchmark again, Tables II and III compare the Bjøntegaard delta PSPNR (BD-PSPNR) and Bjøntegaard delta bitrate (BD-RD) of Lee's [27], Scheme-1, Scheme-2, and Proposed. It is obvious that the proposed scheme provides better view synthesis performance than the other schemes; the overall performance of view synthesis is improved at the same bitrate.

D.
Subjective View-Synthesis Performance Comparison

Even though the purpose of the proposed scheme is to save bits in depth map coding, we do not reallocate the saved bits to texture coding, because this work focuses on depth map coding only. In order to show the impact of depth map coding on view synthesis, we compare the synthesized virtual views of Leaving Laptop and Lovebird1 with and without the proposed scheme. Figs. 13 and 14(a) and (b) show the decoded right depth maps with the original JMVC scheme and the reconstructed right depth maps with the proposed scheme under the

same basic QPs, respectively. The differences between the two depth maps are obvious, owing to the low depth consistency between the left and right views. The corresponding synthesized virtual views using the above depth maps are shown in Figs. 13 and 14(c) and (d), respectively. It is obvious that the two synthesized virtual views are very similar, with only small differences in local regions. Since we use the same texture images and different depth maps (encoded with the original JMVC scheme and with the proposed scheme) to synthesize the virtual view, even better synthesized-view quality would be obtained if the saved bits were allocated to texture coding. This further shows that certain distortions in depth maps are tolerable, and that depth distortion does not significantly degrade the quality of the synthesized virtual view.

TABLE I CODING BITRATE AND SYNTHESIZED QUALITY COMPARISON OF THE FOUR SCHEMES
TABLE II BD-PSPNR PERFORMANCE COMPARISON OF THE SCHEMES
TABLE III BD-RD PERFORMANCE COMPARISON OF THE SCHEMES

In addition, a subjective test is carried out on the virtual view videos. With the virtual view video synthesized from the original texture video and the original depth maps as the reference, the virtual views synthesized from the distorted texture video (JMVC coding) and the distorted depth maps produced by the three depth coding methods (JMVC, Lee's [27], and Proposed) are compared. The subjective tests were conducted in a laboratory designed for subjective quality tests according to the ITU-R BT recommendation. Two different videos are displayed simultaneously on a screen, and each subject is asked to rate the overall quality of the videos on five grades {5, 4, 3, 2, 1}, representing Excellent, Good, Fair, Poor, and Unsatisfactory, respectively.
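For illustration (not the paper's exact analysis), the five-grade scores from a small viewing panel can be aggregated into a Mean Opinion Score with a t-based confidence interval, as commonly recommended for subjective tests; the t value 2.365 below is an assumption corresponding to eight viewers (7 degrees of freedom).

```python
import statistics

def mos_with_ci(scores, t_value=2.365):
    """Mean Opinion Score of 5-point ratings plus a 95% confidence
    interval; t_value = 2.365 assumes n = 8 viewers (7 dof)."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)      # sample standard deviation
    half = t_value * s / n ** 0.5     # half-width of the interval
    return m, (m - half, m + half)
```

For example, the hypothetical ratings [5, 4, 4, 3, 5, 4, 4, 3] give a MOS of 4.0.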
Eight non-expert adult viewers, aged 20 to 25, participated in the subjective evaluation. Fig. 15 shows the Mean Opinion Score (MOS) of the subjective evaluation, where a higher MOS indicates higher visual quality. We observe that the reference video usually has relatively higher visual quality, except for Alt Moabit and Lovebird1, for which the view video synthesized with the original depth maps suffers from serious geometric distortion. The subjective visual qualities of the three depth coding methods show almost no difference.

E. Computational Complexity Analysis

The complexity of the proposed scheme mainly depends on the predetermination of the MTDD and DDM, the encoding process with R-D optimization, and the view reconstruction process. Since predetermining the MTDD and DDM is

performed in an offline mode, an accurate measurement of its execution time is difficult; in practice, the predetermination time is lower than the encoding time. Therefore, we only conduct a qualitative analysis of the encoding computational complexity. Statistical analysis shows that the warped-skip mode accounts for more than 75% of all encoding modes in right view coding (more than 95% for Lovebird1 and Dog), and the encoding time for warped-skip CUs is negligible. Besides, the processing time of the view synthesis is significantly lower than the R-D optimization encoding time. Therefore, the overall encoding computational complexity of the proposed scheme is lower than that of the original JMVC scheme.

Fig. 13. View synthesis results of Leaving Laptop: (a) decoded depth map with the original JMVC scheme (41.68 dB); (b) reconstructed depth map with the proposed scheme (32.58 dB); (c) synthesized virtual view with the original JMVC scheme (39.20 dB); (d) synthesized virtual view with the proposed scheme (39.01 dB).
Fig. 14. View synthesis results of Lovebird1: (a) decoded depth map with the original JMVC scheme (47.41 dB); (b) reconstructed depth map with the proposed scheme (38.68 dB); (c) synthesized virtual view with the original JMVC scheme (36.92 dB); (d) synthesized virtual view with the proposed scheme (37.51 dB).
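The qualitative complexity argument can be made concrete with a back-of-the-envelope model (illustrative only; the per-CU cost ratio below is a hypothetical value, not a measured figure): if a fraction p of CUs take the warped-skip mode, each costing only a small fraction of a full R-D-optimized CU, the relative encoding time is roughly:

```python
def encoding_time_ratio(p_skip, skip_cost=0.02):
    """Approximate encoding time relative to full-RDO coding when a
    fraction p_skip of CUs use the warped-skip mode. skip_cost is the
    assumed per-CU cost of a warped-skip CU relative to a full
    R-D-optimized CU (hypothetical value, not measured)."""
    if not 0.0 <= p_skip <= 1.0:
        raise ValueError("p_skip must be in [0, 1]")
    return (1.0 - p_skip) + p_skip * skip_cost
```

With the reported proportions, 75% warped-skip CUs would leave roughly a quarter of the original encoding time and 95% under a tenth, consistent with the claim that the proposed scheme is less complex than the original JMVC encoder.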

Fig. 15. MOS of the subjective evaluation.

VII. CONCLUSION

This paper has presented a new depth map coding method to improve view synthesis performance based on distortion analyses. The prominent advantage of the proposed method is that we define the MTDD and the DDM to fully exploit the internal characteristic (i.e., the depth sensitivity property) and the external characteristic (i.e., the inter-view redundancy) of depth maps for view synthesis. Specifically, different types of CUs are extracted based on the distribution of the MTDD and assigned different QPs for base encoding, and a warped-skip mode is designed to remove inter-view redundancy based on the distribution of the DDM for side encoding. Experimental results have confirmed that the proposed method significantly improves the performance of view synthesis. In future work, we plan to tackle the following issues: 1) since the color texture video is directly encoded by the original JMVC encoder in the current implementation, we will study how to extend the proposed depth map coding to color texture video coding (the distortion characteristics of color texture video and depth maps are different); 2) since the current 3-D video coding standard does not include a dedicated depth map encoder, we can embed the MTDD and DDM models into the view synthesis optimization encoding option of the 3-D-HEVC anchor software; 3) more accurate region classification for depth maps should be considered; and 4) a more effective assessment metric is needed in designing the view synthesis distortion criterion.

REFERENCES
[1] K. Muller, P. Merkle, and T. Wiegand, "3-D video representation using depth maps," Proc. IEEE, vol. 99, no. 4, Apr.
[2] A. Gotchev, G. B. Akar, T. Capin, D. Strohmeier, and A. Boev, "Three-dimensional media for mobile devices," Proc.
IEEE, vol. 99, no. 4, Apr.
[3] C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3-D-TV," in Proc. SPIE, San Jose, CA, Jan. 2004, vol. 5291.
[4] WD 3 Reference Software for MVC, ISO/IEC JTC1/SC29/WG11, Busan, Korea, Oct.
[5] Joint Multiview Video Model (JMVM) 7.0, ISO/IEC JTC1/SC29/WG11, Antalya, Turkey, Jan.
[6] Draft Reference Software for MVC, ISO/IEC MPEG & ITU-T VCEG, London, U.K., Jul.
[7] Y. Morvan, D. Farin, and P. H. N. de With, "Depth-image compression based on an R-D optimized quadtree decomposition for the transmission of multiview image," in Proc. IEEE Int. Conf. Image Process., San Antonio, TX, Sep. 2007.
[8] K. J. Oh, A. Vetro, and Y. S. Ho, "Depth coding using a boundary reconstruction filter for 3-D video system," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 3, Mar.
[9] J. R. Hidalgo, J. R. Morros, P. Aflaki, F. Calderero, and F. Marqués, "Multiview depth coding based on combined color depth segmentation," J. Vis. Commun. Image Represent., vol. 23, no. 1, Jan.
[10] S. Milani and G. Calvagno, "A depth image coder based on progressive silhouettes," IEEE Signal Process. Lett., vol. 17, no. 8, Aug.
[11] V. A. Nguyen, D. Min, and M. N. Do, "Efficient techniques for depth video compression using weighted mode filtering," IEEE Trans. Circuits Syst. Video Technol., vol. 23, no. 2, Feb.
[12] I. Daribo, C. Tillier, and B. Pesquet-Popescu, "Motion vector sharing and bit-rate allocation for 3-D video-plus-depth coding," EURASIP J. Adv. Signal Process., vol. 2009, Jan.
[13] J. Zhang, M. M. Hannuksela, and H. Q. Li, "Joint multiview video plus depth coding," in Proc. IEEE Int. Conf. Image Process., Sep. 2010.
[14] W. S. Kim, A. Ortega, P. L. Lai, D. Tian, and C. Gomila, "Depth map distortion analysis for view rendering and depth coding," in Proc. IEEE Int. Conf. Image Process., Cairo, Egypt, Nov. 2009.
[15] B. T. Oh, J. Lee, and D. S. Park, "Depth map coding based on synthesized view distortion function," IEEE J.
Sel. Topics Signal Process., vol. 5, no. 7, Nov.
[16] Y. W. Liu, Q. M. Huang, S. W. Ma, D. B. Zhao, and W. Gao, "Joint video/depth rate allocation for 3-D video coding based on view synthesis distortion model," Signal Process.: Image Commun., vol. 24, no. 8, Sep.
[17] H. Yuan, J. Liu, Z. Li, and W. Liu, "Virtual view oriented distortion criterion for depth map coding," IET Electron. Lett., vol. 48, no. 1, Jan.
[18] G. Tech, H. Schwarz, K. Muller, and T. Wiegand, "3-D video coding using the synthesized view distortion change," presented at the Picture Coding Symp., Krakow, Poland, May.
[19] Y. Zhang, S. Kwong, L. Xu, S. D. Hu, G. Y. Jiang, and C.-C. Jay Kuo, "Regional bit allocation and rate distortion optimization for multiview depth video coding with view synthesis distortion model," IEEE Trans. Image Process., vol. 22, no. 9, Sep.
[20] J. M. Xiao, T. Tillo, and H. Yuan, "Real-time macroblock level bits allocation for depth maps in 3-D video coding," in Advances in Multimedia Information Processing-PCM. New York: Springer, 2012, vol. 7674, Lecture Notes in Computer Science.
[21] F. Shao, M. Yu, G. Y. Jiang, F. C. Li, and Z. J. Peng, "Depth map compression and depth-aided view rendering for 3-D video system," IET Signal Process., vol. 6, no. 3, May.
[22] D. V. S. X. De Silva, W. A. C. Fernando, S. T. Worrall, S. L. P. Yasakethu, and A. M. Kondoz, "Just noticeable difference in depth model for stereoscopic 3-D displays," in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2010.
[23] H. T. Nguyen and M. N. Do, "Error analysis for image-based rendering with depth information," IEEE Trans. Image Process., vol. 18, no. 4, Apr.
[24] Y. Zhao, C. Zhu, Z. Z. Chen, and L. Yu, "Depth no-synthesis-error model for view synthesis in 3-D video," IEEE Trans. Image Process., vol. 20, no. 8, Aug.
[25] G. Cheung, A. Kubota, and A. Ortega, "Sparse representation of depth maps for efficient transform coding," presented at the IEEE Picture Coding Symp., Nagoya, Japan, Dec.
[26] S. Yea and A.
Vetro, "View synthesis prediction for multiview video coding," Signal Process.: Image Commun., vol. 24, no. 1-2, Jan.
[27] J. Y. Lee, H. C. Wey, and D. S. Park, "A fast and efficient multi-view depth image coding method based on temporal and inter-view correlations of texture images," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 12, Dec.
[28] I. Daribo, C. Tillier, and B. Pesquet-Popescu, "Distance dependent depth filtering in 3-D warping for 3DTV," in Proc. IEEE Int. Workshop Multimedia Signal Process., Crete, Greece, Oct. 2007.
[29] M. Zamarin, S. Milani, P. Zanuttigh, and G. M. Cortelazzo, "A novel multi-view image coding scheme based on view-warping and 3-D-DCT," J. Vis. Commun. Image Represent., vol. 21, no. 5-6, Jun.

[30] F. Jager and C. Feldmann, "Warped-skip mode for 3-D video coding," presented at the Picture Coding Symp., Krakow, Poland, May.
[31] J. Gautier, O. Le Meur, and C. Guillemot, "Depth-based image completion for view synthesis," presented at the 3DTV Conf., Antalya, Turkey, May.
[32] F. Shao, G. Y. Jiang, M. Yu, K. Chen, and Y. S. Ho, "Asymmetric coding of multi-view video plus depth based 3-D video for view rendering," IEEE Trans. Multimedia, vol. 14, no. 1, Feb.
[33] F. Shao, G. Y. Jiang, W. S. Lin, M. Yu, and Q. H. Dai, "Joint bit allocation and rate control for coding multi-view video plus depth based 3-D video," IEEE Trans. Multimedia, vol. 15, no. 8, Dec.
[34] 3DV/FTV EE2: Report on VSRS Extrapolation, ISO/IEC JTC1/SC29/WG11, Guangzhou, China, Oct.
[35] Draft Reference Software for MVC, ISO/IEC MPEG & ITU-T VCEG, London, U.K., Jul.
[36] Y. Mori, N. Fukushima, T. Yendo, T. Fujii, and M. Tanimoto, "View generation with 3-D warping using depth information for FTV," Signal Process.: Image Commun., vol. 24, no. 1-2, Jan.
[37] E. Ekmekcioglu, V. Velisavljević, and S. T. Worrall, "Content adaptive enhancement of multi-view depth maps for free viewpoint video," IEEE J. Sel. Topics Signal Process., vol. 5, no. 2, Apr.
[38] M. Kurc, O. Stankiewicz, and M. Domanski, "Depth map inter-view consistency refinement for multiview video," presented at the Picture Coding Symp., Poznan, Poland, May.
[39] X. H. Zhang, W. S. Lin, and P. Xue, "Just-noticeable difference estimation with pixels in images," J. Vis. Commun. Image Represent., vol. 19, no. 1, Jan.
[40] Y. Zhao, Z. Chen, C. Zhu, Y. P. Tan, and L. Yu, "Binocular JND model for stereoscopic images," IEEE Signal Process. Lett., vol. 18, no. 1, Jan.
[41] P. Merkle, Y. Morvan, A. Smolic, D. Farin, K. Muller, P. H. N. de With, and T. Wiegand, "The effect of multiview depth video compression on multiview rendering," Signal Process.: Image Commun., vol. 24, no. 1-2, Jan.
[42] L.
S. Karlsson and M. Sjostrom, "Layer assignment based on depth data distribution for multiview-plus-depth scalable video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 6, Jun.
[43] T. Chen, W. Yin, X. S. Zhou, D. Comaniciu, and T. S. Huang, "Total variation models for variable lighting face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 9, Sep.
[44] Depth Estimation Reference Software (DERS) 3.0, ISO/IEC JTC1/SC29/WG11, Maui, HI, Apr.
[45] Calculation of Average PSNR Differences Between RD-Curves, ITU-T SG16/Q6, Austin, TX.
[46] Perceptual Measurement for Evaluating Quality of View Synthesis, ISO/IEC JTC1/SC29/WG11, Maui, HI, Apr.

Weisi Lin (M'92, SM'98) received the B.Sc. and M.Sc. degrees from Zhongshan University, Guangzhou, China, and the Ph.D. degree from King's College London, London, U.K. He was the Lab Head, Visual Processing, and the Acting Department Manager, Media Processing, at the Institute for Infocomm Research, Singapore. Currently, he is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His areas of expertise include image processing, perceptual modeling, video compression, multimedia communication, and computer vision. He has published 200+ refereed papers in international journals and conferences. He is on the editorial board of the Journal of Visual Communication and Image Representation. Dr. Lin is on the editorial boards of the IEEE TRANSACTIONS ON MULTIMEDIA and the IEEE SIGNAL PROCESSING LETTERS. He served as the Lead Guest Editor for a special issue on perceptual signal processing of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He chairs the IEEE MMTC Special Interest Group on Quality of Experience. He has been elected as a Distinguished Lecturer of APSIPA (2012/3).
He is the Lead Technical Program Chair for the Pacific-Rim Conference on Multimedia (PCM) 2012, and a Technical Program Chair for the IEEE International Conference on Multimedia and Expo (ICME). He is a Chartered Engineer (U.K.), a Fellow of the Institution of Engineering and Technology, and an Honorary Fellow of the Singapore Institute of Engineering Technologists.

Gangyi Jiang received the M.S. degree from Hangzhou University, China, in 1992, and the Ph.D. degree from Ajou University, Gyeonggi-do, South Korea. He is now a Professor in the Faculty of Information Science and Engineering, Ningbo University, Ningbo, China. His research interests mainly include digital video compression and multi-view video coding.

Mei Yu received the M.S. degree from the Hangzhou Institute of Electronics Engineering, Hangzhou, China, in 1993, and the Ph.D. degree from Ajou University, Gyeonggi-do, South Korea. She is now a Professor in the Faculty of Information Science and Engineering, Ningbo University, Ningbo, China. Her research interests include image/video coding and video perception.

Feng Shao received the B.S. and Ph.D. degrees from Zhejiang University, Hangzhou, China, in 2002 and 2007, respectively, both in electronic science and technology. He is currently an Associate Professor in the Faculty of Information Science and Engineering, Ningbo University, Ningbo, China. He was a Visiting Fellow with the School of Computer Engineering, Nanyang Technological University, Singapore, from February 2012 to August. His research interests include 3-D video coding, 3-D quality assessment, and image perception.

Qionghai Dai (SM'05) received the B.S. degree from Shanxi Normal University, Shanxi, China, in 1987, and the M.E. and Ph.D. degrees from Northeastern University, Shenyang, China, in 1994 and 1996, respectively. Since 1997, he has been with the faculty of Tsinghua University, Beijing, China, where he is currently a Professor and the Director of the Broadband Networks and Digital Media Laboratory.
His research areas include video communication, computer vision, and computational photography.


INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO INTERNATIONAL ORGANISATION FOR STANDARDISATION ORGANISATION INTERNATIONALE DE NORMALISATION ISO/IEC JTC1/SC29/WG11 CODING OF MOVING PICTURES AND AUDIO ISO/IEC JTC1/SC29/WG11 MPEG2011/N12559 February 2012,

More information

Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N.

Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N. Quality improving techniques in DIBR for free-viewpoint video Do, Q.L.; Zinger, S.; Morvan, Y.; de With, P.H.N. Published in: Proceedings of the 3DTV Conference : The True Vision - Capture, Transmission

More information

Key-Words: - Free viewpoint video, view generation, block based disparity map, disparity refinement, rayspace.

Key-Words: - Free viewpoint video, view generation, block based disparity map, disparity refinement, rayspace. New View Generation Method for Free-Viewpoint Video System GANGYI JIANG*, LIANGZHONG FAN, MEI YU AND FENG SHAO Faculty of Information Science and Engineering Ningbo University 315211 Ningbo CHINA jianggangyi@126.com

More information

A reversible data hiding based on adaptive prediction technique and histogram shifting

A reversible data hiding based on adaptive prediction technique and histogram shifting A reversible data hiding based on adaptive prediction technique and histogram shifting Rui Liu, Rongrong Ni, Yao Zhao Institute of Information Science Beijing Jiaotong University E-mail: rrni@bjtu.edu.cn

More information

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

Reducing/eliminating visual artifacts in HEVC by the deblocking filter. 1 Reducing/eliminating visual artifacts in HEVC by the deblocking filter. EE5359 Multimedia Processing Project Proposal Spring 2014 The University of Texas at Arlington Department of Electrical Engineering

More information

Optimizing the Deblocking Algorithm for. H.264 Decoder Implementation

Optimizing the Deblocking Algorithm for. H.264 Decoder Implementation Optimizing the Deblocking Algorithm for H.264 Decoder Implementation Ken Kin-Hung Lam Abstract In the emerging H.264 video coding standard, a deblocking/loop filter is required for improving the visual

More information

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD Siwei Ma, Shiqi Wang, Wen Gao {swma,sqwang, wgao}@pku.edu.cn Institute of Digital Media, Peking University ABSTRACT IEEE 1857 is a multi-part standard for multimedia

More information

Conversion of free-viewpoint 3D multi-view video for stereoscopic displays Do, Q.L.; Zinger, S.; de With, P.H.N.

Conversion of free-viewpoint 3D multi-view video for stereoscopic displays Do, Q.L.; Zinger, S.; de With, P.H.N. Conversion of free-viewpoint 3D multi-view video for stereoscopic displays Do, Q.L.; Zinger, S.; de With, P.H.N. Published in: Proceedings of the 2010 IEEE International Conference on Multimedia and Expo

More information

Scene Segmentation by Color and Depth Information and its Applications

Scene Segmentation by Color and Depth Information and its Applications Scene Segmentation by Color and Depth Information and its Applications Carlo Dal Mutto Pietro Zanuttigh Guido M. Cortelazzo Department of Information Engineering University of Padova Via Gradenigo 6/B,

More information

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Project Title: Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding. Midterm Report CS 584 Multimedia Communications Submitted by: Syed Jawwad Bukhari 2004-03-0028 About

More information

Graph-based representation for multiview images with complex camera configurations

Graph-based representation for multiview images with complex camera configurations Graph-based representation for multiview images with complex camera configurations Xin Su, Thomas Maugey, Christine Guillemot To cite this version: Xin Su, Thomas Maugey, Christine Guillemot. Graph-based

More information

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 7, NO. 2, APRIL 1997 429 Express Letters A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation Jianhua Lu and

More information

Novel United Buffer Rate Control Methods for Stereoscopic Video

Novel United Buffer Rate Control Methods for Stereoscopic Video JOURNAL OF SOFTWARE, VOL. 8, NO. 8, AUGUST 2013 2015 Novel United Buffer Rate Control Methods for Stereoscopic Video Yi Liao Faculty of Information Science and Engineering, Ningbo University, Ningbo, China

More information

A Novel Statistical Distortion Model Based on Mixed Laplacian and Uniform Distribution of Mpeg-4 FGS

A Novel Statistical Distortion Model Based on Mixed Laplacian and Uniform Distribution of Mpeg-4 FGS A Novel Statistical Distortion Model Based on Mixed Laplacian and Uniform Distribution of Mpeg-4 FGS Xie Li and Wenjun Zhang Institute of Image Communication and Information Processing, Shanghai Jiaotong

More information

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 24, NO. 5, MAY 2015 1573 Graph-Based Representation for Multiview Image Geometry Thomas Maugey, Member, IEEE, Antonio Ortega, Fellow Member, IEEE, and Pascal

More information

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 11, NO. 1, JANUARY 2001 111 A Low Bit-Rate Video Codec Based on Two-Dimensional Mesh Motion Compensation with Adaptive Interpolation

More information

QUAD-TREE PARTITIONED COMPRESSED SENSING FOR DEPTH MAP CODING. Ying Liu, Krishna Rao Vijayanagar, and Joohee Kim

QUAD-TREE PARTITIONED COMPRESSED SENSING FOR DEPTH MAP CODING. Ying Liu, Krishna Rao Vijayanagar, and Joohee Kim QUAD-TREE PARTITIONED COMPRESSED SENSING FOR DEPTH MAP CODING Ying Liu, Krishna Rao Vijayanagar, and Joohee Kim Department of Electrical and Computer Engineering, Illinois Institute of Technology, Chicago,

More information

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM

EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM TENCON 2000 explore2 Page:1/6 11/08/00 EXPLORING ON STEGANOGRAPHY FOR LOW BIT RATE WAVELET BASED CODER IN IMAGE RETRIEVAL SYSTEM S. Areepongsa, N. Kaewkamnerd, Y. F. Syed, and K. R. Rao The University

More information

A SCALABLE CODING APPROACH FOR HIGH QUALITY DEPTH IMAGE COMPRESSION

A SCALABLE CODING APPROACH FOR HIGH QUALITY DEPTH IMAGE COMPRESSION This material is published in the open archive of Mid Sweden University DIVA http://miun.diva-portal.org to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein

More information

Lossless Compression of Stereo Disparity Maps for 3D

Lossless Compression of Stereo Disparity Maps for 3D 2012 IEEE International Conference on Multimedia and Expo Workshops Lossless Compression of Stereo Disparity Maps for 3D Marco Zamarin, Søren Forchhammer Department of Photonics Engineering Technical University

More information

Fast Mode Decision for Depth Video Coding Using H.264/MVC *

Fast Mode Decision for Depth Video Coding Using H.264/MVC * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 31, 1693-1710 (2015) Fast Mode Decision for Depth Video Coding Using H.264/MVC * CHIH-HUNG LU, HAN-HSUAN LIN AND CHIH-WEI TANG* Department of Communication

More information

A New Fast Motion Estimation Algorithm. - Literature Survey. Instructor: Brian L. Evans. Authors: Yue Chen, Yu Wang, Ying Lu.

A New Fast Motion Estimation Algorithm. - Literature Survey. Instructor: Brian L. Evans. Authors: Yue Chen, Yu Wang, Ying Lu. A New Fast Motion Estimation Algorithm - Literature Survey Instructor: Brian L. Evans Authors: Yue Chen, Yu Wang, Ying Lu Date: 10/19/1998 A New Fast Motion Estimation Algorithm 1. Abstract Video compression

More information

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC Hamid Reza Tohidypour, Mahsa T. Pourazad 1,2, and Panos Nasiopoulos 1 1 Department of Electrical & Computer Engineering,

More information

DISPARITY-ADJUSTED 3D MULTI-VIEW VIDEO CODING WITH DYNAMIC BACKGROUND MODELLING

DISPARITY-ADJUSTED 3D MULTI-VIEW VIDEO CODING WITH DYNAMIC BACKGROUND MODELLING DISPARITY-ADJUSTED 3D MULTI-VIEW VIDEO CODING WITH DYNAMIC BACKGROUND MODELLING Manoranjan Paul and Christopher J. Evans School of Computing and Mathematics, Charles Sturt University, Australia Email:

More information

Context based optimal shape coding

Context based optimal shape coding IEEE Signal Processing Society 1999 Workshop on Multimedia Signal Processing September 13-15, 1999, Copenhagen, Denmark Electronic Proceedings 1999 IEEE Context based optimal shape coding Gerry Melnikov,

More information

Implementation and analysis of Directional DCT in H.264

Implementation and analysis of Directional DCT in H.264 Implementation and analysis of Directional DCT in H.264 EE 5359 Multimedia Processing Guidance: Dr K R Rao Priyadarshini Anjanappa UTA ID: 1000730236 priyadarshini.anjanappa@mavs.uta.edu Introduction A

More information

An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding

An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding Preprint Version (2011) An Independent Motion and Disparity Vector Prediction Method for Multiview Video Coding Seungchul Ryu a, Jungdong Seo a, Dong Hyun Kim a, Jin Young Lee b, Ho-Cheon Wey b, and Kwanghoon

More information

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain

Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Automatic Video Caption Detection and Extraction in the DCT Compressed Domain Chin-Fu Tsao 1, Yu-Hao Chen 1, Jin-Hau Kuo 1, Chia-wei Lin 1, and Ja-Ling Wu 1,2 1 Communication and Multimedia Laboratory,

More information

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER Zong-Yi Chen, Jiunn-Tsair Fang 2, Tsai-Ling Liao, and Pao-Chi Chang Department of Communication Engineering, National Central

More information

Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding

Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Analysis of Depth Map Resampling Filters for Depth-based 3D Video Coding Graziosi, D.B.; Rodrigues, N.M.M.; de Faria, S.M.M.; Tian, D.; Vetro,

More information

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H. EE 5359 MULTIMEDIA PROCESSING SPRING 2011 Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.264 Under guidance of DR K R RAO DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY

More information

AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS

AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS G Prakash 1,TVS Gowtham Prasad 2, T.Ravi Kumar Naidu 3 1MTech(DECS) student, Department of ECE, sree vidyanikethan

More information

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform

Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Optimized Progressive Coding of Stereo Images Using Discrete Wavelet Transform Torsten Palfner, Alexander Mali and Erika Müller Institute of Telecommunications and Information Technology, University of

More information

signal-to-noise ratio (PSNR), 2

signal-to-noise ratio (PSNR), 2 u m " The Integration in Optics, Mechanics, and Electronics of Digital Versatile Disc Systems (1/3) ---(IV) Digital Video and Audio Signal Processing ƒf NSC87-2218-E-009-036 86 8 1 --- 87 7 31 p m o This

More information

Depth Map Boundary Filter for Enhanced View Synthesis in 3D Video

Depth Map Boundary Filter for Enhanced View Synthesis in 3D Video J Sign Process Syst (2017) 88:323 331 DOI 10.1007/s11265-016-1158-x Depth Map Boundary Filter for Enhanced View Synthesis in 3D Video Yunseok Song 1 & Yo-Sung Ho 1 Received: 24 April 2016 /Accepted: 7

More information

Extensions of H.264/AVC for Multiview Video Compression

Extensions of H.264/AVC for Multiview Video Compression MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Extensions of H.264/AVC for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, Anthony Vetro, Huifang Sun TR2006-048 June

More information

An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method

An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method An Efficient Saliency Based Lossless Video Compression Based On Block-By-Block Basis Method Ms. P.MUTHUSELVI, M.E(CSE), V.P.M.M Engineering College for Women, Krishnankoil, Virudhungar(dt),Tamil Nadu Sukirthanagarajan@gmail.com

More information

A 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression

A 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression A 3-D Virtual SPIHT for Scalable Very Low Bit-Rate Embedded Video Compression Habibollah Danyali and Alfred Mertins University of Wollongong School of Electrical, Computer and Telecommunications Engineering

More information

Compression-Induced Rendering Distortion Analysis for Texture/Depth Rate Allocation in 3D Video Compression

Compression-Induced Rendering Distortion Analysis for Texture/Depth Rate Allocation in 3D Video Compression 2009 Data Compression Conference Compression-Induced Rendering Distortion Analysis for Texture/Depth Rate Allocation in 3D Video Compression Yanwei Liu, Siwei Ma, Qingming Huang, Debin Zhao, Wen Gao, Nan

More information

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Jung-Ah Choi and Yo-Sung Ho Gwangju Institute of Science and Technology (GIST) 261 Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, Korea

More information

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images

A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images A Laplacian Based Novel Approach to Efficient Text Localization in Grayscale Images Karthik Ram K.V & Mahantesh K Department of Electronics and Communication Engineering, SJB Institute of Technology, Bangalore,

More information

Focus on visual rendering quality through content-based depth map coding

Focus on visual rendering quality through content-based depth map coding Focus on visual rendering quality through content-based depth map coding Emilie Bosc, Muriel Pressigout, Luce Morin To cite this version: Emilie Bosc, Muriel Pressigout, Luce Morin. Focus on visual rendering

More information

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo School of Electronic Engineering and Computer Science, Queen Mary University of London

More information

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering

More information

DEPTH IMAGE BASED RENDERING WITH ADVANCED TEXTURE SYNTHESIS. P. Ndjiki-Nya, M. Köppel, D. Doshkov, H. Lakshman, P. Merkle, K. Müller, and T.

DEPTH IMAGE BASED RENDERING WITH ADVANCED TEXTURE SYNTHESIS. P. Ndjiki-Nya, M. Köppel, D. Doshkov, H. Lakshman, P. Merkle, K. Müller, and T. DEPTH IMAGE BASED RENDERING WITH ADVANCED TEXTURE SYNTHESIS P. Ndjiki-Nya, M. Köppel, D. Doshkov, H. Lakshman, P. Merkle, K. Müller, and T. Wiegand Fraunhofer Institut for Telecommunications, Heinrich-Hertz-Institut

More information

Quality versus Intelligibility: Evaluating the Coding Trade-offs for American Sign Language Video

Quality versus Intelligibility: Evaluating the Coding Trade-offs for American Sign Language Video Quality versus Intelligibility: Evaluating the Coding Trade-offs for American Sign Language Video Frank Ciaramello, Jung Ko, Sheila Hemami School of Electrical and Computer Engineering Cornell University,

More information

View Synthesis for Multiview Video Compression

View Synthesis for Multiview Video Compression View Synthesis for Multiview Video Compression Emin Martinian, Alexander Behrens, Jun Xin, and Anthony Vetro email:{martinian,jxin,avetro}@merl.com, behrens@tnt.uni-hannover.de Mitsubishi Electric Research

More information

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks

On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks 2011 Wireless Advanced On the Adoption of Multiview Video Coding in Wireless Multimedia Sensor Networks S. Colonnese, F. Cuomo, O. Damiano, V. De Pascalis and T. Melodia University of Rome, Sapienza, DIET,

More information

Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard

Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Analysis of 3D and Multiview Extensions of the Emerging HEVC Standard Vetro, A.; Tian, D. TR2012-068 August 2012 Abstract Standardization of

More information

Template based illumination compensation algorithm for multiview video coding

Template based illumination compensation algorithm for multiview video coding Template based illumination compensation algorithm for multiview video coding Xiaoming Li* a, Lianlian Jiang b, Siwei Ma b, Debin Zhao a, Wen Gao b a Department of Computer Science and technology, Harbin

More information

HEVC based Stereo Video codec

HEVC based Stereo Video codec based Stereo Video B Mallik*, A Sheikh Akbari*, P Bagheri Zadeh *School of Computing, Creative Technology & Engineering, Faculty of Arts, Environment & Technology, Leeds Beckett University, U.K. b.mallik6347@student.leedsbeckett.ac.uk,

More information

An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode

An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode An Information Hiding Algorithm for HEVC Based on Angle Differences of Intra Prediction Mode Jia-Ji Wang1, Rang-Ding Wang1*, Da-Wen Xu1, Wei Li1 CKC Software Lab, Ningbo University, Ningbo, Zhejiang 3152,

More information

Further Reduced Resolution Depth Coding for Stereoscopic 3D Video

Further Reduced Resolution Depth Coding for Stereoscopic 3D Video Further Reduced Resolution Depth Coding for Stereoscopic 3D Video N. S. Mohamad Anil Shah, H. Abdul Karim, and M. F. Ahmad Fauzi Multimedia University, 63100 Cyberjaya, Selangor, Malaysia Abstract In this

More information

Video Compression System for Online Usage Using DCT 1 S.B. Midhun Kumar, 2 Mr.A.Jayakumar M.E 1 UG Student, 2 Associate Professor

Video Compression System for Online Usage Using DCT 1 S.B. Midhun Kumar, 2 Mr.A.Jayakumar M.E 1 UG Student, 2 Associate Professor Video Compression System for Online Usage Using DCT 1 S.B. Midhun Kumar, 2 Mr.A.Jayakumar M.E 1 UG Student, 2 Associate Professor Department Electronics and Communication Engineering IFET College of Engineering

More information

Scalable Bit Allocation between Texture and Depth Views for 3D Video Streaming over Heterogeneous Networks

Scalable Bit Allocation between Texture and Depth Views for 3D Video Streaming over Heterogeneous Networks Scalable Bit Allocation between Texture and Depth Views for 3D Video Streaming over Heterogeneous Networks Jimin XIAO, Miska M. HANNUKSELA, Member, IEEE, Tammam TILLO, Senior Member, IEEE, Moncef GABBOUJ,

More information

3D Mesh Sequence Compression Using Thin-plate Spline based Prediction

3D Mesh Sequence Compression Using Thin-plate Spline based Prediction Appl. Math. Inf. Sci. 10, No. 4, 1603-1608 (2016) 1603 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.18576/amis/100440 3D Mesh Sequence Compression Using Thin-plate

More information

Image Error Concealment Based on Watermarking

Image Error Concealment Based on Watermarking Image Error Concealment Based on Watermarking Shinfeng D. Lin, Shih-Chieh Shie and Jie-Wei Chen Department of Computer Science and Information Engineering,National Dong Hwa Universuty, Hualien, Taiwan,

More information

Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Complexity Reduced Mode Selection of H.264/AVC Intra Coding Complexity Reduced Mode Selection of H.264/AVC Intra Coding Mohammed Golam Sarwer 1,2, Lai-Man Po 1, Jonathan Wu 2 1 Department of Electronic Engineering City University of Hong Kong Kowloon, Hong Kong

More information

High Efficient Intra Coding Algorithm for H.265/HVC

High Efficient Intra Coding Algorithm for H.265/HVC H.265/HVC における高性能符号化アルゴリズムに関する研究 宋天 1,2* 三木拓也 2 島本隆 1,2 High Efficient Intra Coding Algorithm for H.265/HVC by Tian Song 1,2*, Takuya Miki 2 and Takashi Shimamoto 1,2 Abstract This work proposes a novel

More information

PAPER Optimal Quantization Parameter Set for MPEG-4 Bit-Rate Control

PAPER Optimal Quantization Parameter Set for MPEG-4 Bit-Rate Control 3338 PAPER Optimal Quantization Parameter Set for MPEG-4 Bit-Rate Control Dong-Wan SEO, Seong-Wook HAN, Yong-Goo KIM, and Yoonsik CHOE, Nonmembers SUMMARY In this paper, we propose an optimal bit rate

More information

MOTION estimation is one of the major techniques for

MOTION estimation is one of the major techniques for 522 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 4, APRIL 2008 New Block-Based Motion Estimation for Sequences with Brightness Variation and Its Application to Static Sprite

More information

Network Image Coding for Multicast

Network Image Coding for Multicast Network Image Coding for Multicast David Varodayan, David Chen and Bernd Girod Information Systems Laboratory, Stanford University Stanford, California, USA {varodayan, dmchen, bgirod}@stanford.edu Abstract

More information

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding

ISSN: An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding An Efficient Fully Exploiting Spatial Correlation of Compress Compound Images in Advanced Video Coding Ali Mohsin Kaittan*1 President of the Association of scientific research and development in Iraq Abstract

More information

CONTENT ADAPTIVE SCREEN IMAGE SCALING

CONTENT ADAPTIVE SCREEN IMAGE SCALING CONTENT ADAPTIVE SCREEN IMAGE SCALING Yao Zhai (*), Qifei Wang, Yan Lu, Shipeng Li University of Science and Technology of China, Hefei, Anhui, 37, China Microsoft Research, Beijing, 8, China ABSTRACT

More information

Image Quality Assessment Techniques: An Overview

Image Quality Assessment Techniques: An Overview Image Quality Assessment Techniques: An Overview Shruti Sonawane A. M. Deshpande Department of E&TC Department of E&TC TSSM s BSCOER, Pune, TSSM s BSCOER, Pune, Pune University, Maharashtra, India Pune

More information

A content based method for perceptually driven joint color/depth compression

A content based method for perceptually driven joint color/depth compression A content based method for perceptually driven joint color/depth compression Emilie Bosc, Luce Morin, Muriel Pressigout To cite this version: Emilie Bosc, Luce Morin, Muriel Pressigout. A content based

More information

A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality

A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality A Comparison of Still-Image Compression Standards Using Different Image Quality Metrics and Proposed Methods for Improving Lossy Image Quality Multidimensional DSP Literature Survey Eric Heinen 3/21/08

More information

Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264

Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264 Intra-Mode Indexed Nonuniform Quantization Parameter Matrices in AVC/H.264 Jing Hu and Jerry D. Gibson Department of Electrical and Computer Engineering University of California, Santa Barbara, California

More information

Partial Video Encryption Using Random Permutation Based on Modification on Dct Based Transformation

Partial Video Encryption Using Random Permutation Based on Modification on Dct Based Transformation International Refereed Journal of Engineering and Science (IRJES) ISSN (Online) 2319-183X, (Print) 2319-1821 Volume 2, Issue 6 (June 2013), PP. 54-58 Partial Video Encryption Using Random Permutation Based

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information