Fast Mode Decision for Depth Video Coding Using H.264/MVC *

JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 31, 1693-1710 (2015)

Fast Mode Decision for Depth Video Coding Using H.264/MVC *

CHIH-HUNG LU, HAN-HSUAN LIN AND CHIH-WEI TANG*
Department of Communication Engineering
National Central University
Jhongli City, Taoyuan County, 32001 Taiwan
E-mail: {chihhunglu13; u10427}@gmail.com; cwtang@ce.ncu.edu.tw

Multi-view video plus depth (MVD) is one of the 3D video formats. With color and depth videos, virtual views can be synthesized using depth image based rendering. Because of the high complexity of 3D video coding, reducing the time complexity of depth video coding is essential to the success of MVD. Thus, this paper proposes a fast mode decision scheme for depth video coding that speeds up both mode decision and synthesized-view distortion estimation simultaneously. The proposed scheme consists of two stages. First, features of the color and depth videos are used to reduce the candidates for mode decision with high prediction accuracy and negligible degradation of rate-distortion (RD) performance. For a depth macroblock (MB) whose co-located MB in the reconstructed color video has high vertical-texture complexity, the second stage further selects the optimal mode using a synthesized-view distortion directed RD cost to reduce the synthesized-view distortion. For fast estimation of synthesized-view distortions, a mathematical approximation of a previous synthesized-view distortion function that mimics view rendering is proposed. Experimental results with JMVC 6.0.3, the reference software of H.264/MVC, show that the proposed scheme achieves significant time reduction with negligible RD performance degradation.

Keywords: depth video coding, multi-view video coding (MVC), fast mode decision, rate-distortion cost (RD cost), synthesized-view distortions
1. INTRODUCTION

Nowadays, 3D technologies have emerged to meet the demands of entertainment applications such as three-dimensional television (3DTV) and free viewpoint video (FVV) [1-3]. To compress 3D videos efficiently, the Joint Video Team (JVT) finalized the multi-view video coding (MVC) standard [4] in early 2008. The multi-view video plus depth (MVD) format, enabling depth image based rendering (DIBR), is introduced to 3DTV systems and adopted by MPEG-3DV [5-7], where plans for the standardization of 3D video coding (3DVC) [8] are still in progress. After the MVD data is encoded on a 3D server, the compressed data is stored on 3D Blu-ray discs or transmitted over the network to a 3D receiver for display. For a 3D server, the requirements of a personal computer (PC) are indicated in the call for proposals on 3D video coding [9]. Coding of MVD incurs heavy computational overhead since it explores spatial, temporal, inter-view, and even inter-data (between color and depth videos) redundancies. For MVC, mode decisions with motion and disparity estimations lead to a high computational load on 3D servers. Generally, the characteristics of a depth video differ from those of a color video although they represent

Received March 28, 2014; revised July 10, 2014; accepted August 20, 2014. Communicated by Jen-Hui Chuang.
* This work was supported in part by the National Science Council of Taiwan under the Grants NSC-100-2221-E-008-098 and NSC-101-2221-E-008-060.

the same scene. Moreover, it is the color video, not the depth video, that is perceived by users. Hence, the problems of fast mode decision for coding of MVD are not completely identical to those for color video coding. Besides time reduction and RD performance, fast algorithms for depth video coding should also retain the quality of synthesized views.

A depth map and its corresponding color video represent the same scene. Hence, in the fast depth video coding scheme proposed in [10], one macroblock (MB) in a depth video is coded by SKIP if its co-located MB in the color video is coded by SKIP. However, the optimal modes of depth video coding are not always correlated with those of color video coding, since depth and color videos have different texture characteristics. Thus, the fast mode decision for depth video coding in [11] tests both SKIP and Intra modes if the corresponding MB in the color video is coded by either Inter or SKIP mode. Speed-up of depth video coding can also be achieved by assigning different sets of candidates to different regions of a depth map. For non-regions-of-interest with a small rate-distortion (RD) cost of the Inter 16×16 mode, only SKIP mode is tested in depth video coding using H.264/AVC in [12]. In the H.264/AVC-based scheme of [13], SKIP and Intra modes are tested for depth boundaries whose co-located MBs in the color video are coded by Intra modes, for background with stationary co-located MBs in the color video, and for foreground. Since the probability mass function (PMF) of the optimal mode of depth video coding is usually imbalanced in blocks with small variances, the set of candidates for mode decision of depth video coding using H.264/AVC in regions with small variances can be reduced [14, 15].

The quality of synthesized views is related to that of the reconstructed color and depth videos. The quality of reconstructed depth videos can be improved with the aid of post-processing, e.g. a bilateral filter, for artifact removal and edge preservation [16, 17]. Reducing the distortions of synthesized views in the rate-distortion optimization (RDO) of depth video coding is another approach [18-20]. With the help of camera matrices, distortions of synthesized views can be estimated from distortions of depth maps and characteristics of scenes [18]. An accurate function for synthesized-view distortion change, with a renderer model for fast computation, is integrated into the RDO of the reference software of high efficiency video coding [19]. Alternatively, the synthesized-view distortion function can be estimated using both the vertical-texture characteristic of color videos and the coding errors of depth maps, mimicking the rendering process [20].

This paper proposes a fast mode decision algorithm for depth video coding using H.264/MVC. Different from previous work, the first part reduces the candidates for the mode decision of depth video coding by referring to the information of the color and depth videos simultaneously. Both the features of the neighboring MBs and of the current MB are considered. Probabilistic analyses in this paper indicate that the candidate set for fast mode decision, to achieve negligible degradation of RD performance, should include modes in addition to the direct/skip and Intra modes. Next, in order to reduce the distortions of synthesized views, mode decisions are made based on a virtual-view distortion directed RD cost. To speed up the computation, this paper proposes a fast approximation of the distortion function of Oh et al. [20].

The remainder of this paper is organized as follows. Section 2 proposes the fast mode decision algorithm for depth video coding based on probabilistic analyses of mode decisions. Section 3 proposes the fast approximation of distortion estimation of synthesized views. Section 4 presents experimental results and analyses. Conclusions are given in Section 5.

2. THE PROPOSED FAST MODE DECISION FOR DEPTH VIDEO CODING USING MVC

To reduce the candidates for fast mode decision of depth video coding, this section analyzes the probability mass function (PMF) of mode decisions of depth video coding using a 3-view prediction structure (Fig. 1), and proposes a fast mode decision algorithm. The group of pictures (GOP) size is 8. Frames indexed by T0 and T8 are anchor frames; the others are non-anchor frames. S0, S1, and S2 are the base view, the 1st enhancement view, and the 2nd enhancement view, respectively. Tests are conducted using JMVC 6.0.3, the reference software of MVC, for both color and depth video coding. Depth video coding is applied after color video coding. Coding statistics of Book_Arrival (1024×768, 97 frames, search range 96), Kendo (1024×768, 297 frames, search range 96), Poznan_Hall2 (1920×1088, 193 frames, search range 128), and Undo_Dancer (1920×1088, 249 frames, search range 128) from the MPEG 3DV AHG are analyzed. Among these videos, only Undo_Dancer is computer generated. Kendo has high object motion. The data given in Tables 1-5 are averages over the data of three views.

In H.264/MVC, the supported MB modes include SKIP, direct, Inter 16×16, Inter 16×8, Inter 8×16, Inter 8×8, Intra 16×16, and Intra 4×4. Inter 8×8 can be further divided into Inter 8×8, Inter 8×4, Inter 4×8, and Inter 4×4. The time complexity of the direct mode is quite low compared with that of the other coding modes. Lagrangian optimization decides the optimal mode, and the RD cost is calculated by [21]

J_mode(S_k, I_k) = SSD(S_k, I_k) + λ_mode · R(S_k, I_k),    (1)

where S_k and I_k denote the k-th MB and the corresponding MB mode, respectively. λ_mode is the Lagrange parameter for mode decision and varies with the quantization parameter QP. SSD(S_k, I_k) is the sum of squared differences between the reconstructed MB and the original MB, and R(S_k, I_k) is the rate after entropy coding.
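Eq. (1) is a Lagrangian trade-off between distortion and rate. A minimal sketch of how an encoder evaluates it is given below; the `candidates` mapping and the JM-style λ(QP) helper are illustrative assumptions, not JMVC code.

```python
import numpy as np

def lambda_mode_from_qp(qp: int) -> float:
    # Commonly cited JM-style relation for the mode-decision Lagrange
    # multiplier; shown only as an illustrative assumption.
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def rd_cost(orig_mb, recon_mb, rate_bits, lam):
    """RD cost of (1): J = SSD(S_k, I_k) + lambda_mode * R(S_k, I_k)."""
    diff = orig_mb.astype(np.int64) - recon_mb.astype(np.int64)
    return float(np.sum(diff * diff)) + lam * rate_bits

def best_mode(orig_mb, candidates, lam):
    """Pick the candidate mode with minimal RD cost.

    `candidates` maps a mode label to (reconstructed 16x16 MB, rate in
    bits); in a real encoder these come from actually coding the MB in
    that mode.
    """
    return min(candidates, key=lambda m: rd_cost(orig_mb, *candidates[m], lam))
```

For example, a SKIP candidate with small distortion and near-zero rate beats a costly Inter mode once λ grows with QP, which is why the direct/SKIP proportion in the analyses below increases with QP.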
Based on tests using different QPs and videos, the encoding of depth videos using H.264/MVC occupies around half of the total coding time of MVD, and within it the mode decision with motion and disparity estimations occupies a high percentage of the time. Analyses of the PMFs of the optimal mode of color and depth video coding indicate that the probability that the direct mode is optimal increases with QP. Moreover, for both color and depth video coding, the direct mode always occupies the largest proportion of optimal modes, with a probability of at least 50%. The prior probability that either the optimal mode of the co-located MB in the corresponding color video or that of a neighboring MB in a depth video is the direct mode is high. This motivates us to explore the conditional probabilities related to the direct mode (Section 2.1) to speed up the mode decision of depth video coding (Section 2.2).

Fig. 1. The adopted 3-view prediction structure for depth video coding.

2.1 Probabilistic Analyses of Mode Decision of Depth Video Coding Using MVC

A color video and its corresponding depth video represent the same scene. Test results show that, given that the optimal mode of the co-located MB of the corresponding color video is direct, the conditional probability that the optimal mode in depth video coding is direct is always the largest. However, this probability decreases for a dynamic video, e.g. Kendo (63.36% for QP22 and 81.3% for QP37). Inter 16×16, Inter 16×8, Inter 8×16, and Intra 16×16 also occupy a significant proportion of the optimal modes. However, the probability of Inter 8×8 is very small, ranging from 0% to 0.58%, and the probability of Intra 4×4 is sometimes small as well.

In fact, the characteristics of a depth video differ considerably from those of its corresponding color video. Most regions of depth videos are smooth, and only a few contain sharp edges. Thus, the correlation of optimal modes between the current depth MB and its neighboring MBs, including the upper, left, upper-left, and upper-right MBs, is analyzed given that the optimal mode of the current MB is direct. The result shows that the optimal mode of the upper MB has the highest correlation with that of the current MB. Thus the optimal mode of the upper MB can characterize the coding information of the neighborhood. For Book_Arrival, Poznan_Hall2, and Undo_Dancer, the conditional probability that the optimal mode is direct is greater than 90% for small QPs, e.g. QP22, and it increases with QP. Again, the probability is slightly lower for Kendo, 82.84% for QP22 and 91.97% for QP37, due to the dynamic content of Kendo. In Yooh et al. [15], the intra mode decision of depth video coding is sped up in regions with small variances, using an adaptive threshold to avoid distortions at boundaries,

T_intra = 15 · ⌊QP/22⌋ + (QP mod 22) − 5.    (2)

Inspired by this work, the conditional probability that the optimal mode of the current depth MB is direct is analyzed given that the optimal modes of both its co-located MB in the color video and the upper MB in the current depth frame are direct and the variance of intensity in the current MB is small, i.e. not greater than the threshold T1. In the test, T1 ranges from 0 to 100. For Book_Arrival, Poznan_Hall2, and Undo_Dancer, the probability is high for all combinations of T1 and QP. The probability ranges from 94.47% (Kendo) to 97.16% (Book_Arrival) for QP22 and T1 = 0. However, since Kendo has higher object motion, its probability is less than 90% for QP22 and T1 = 10. Interestingly, T_intra is 10 for QP 22 in (2). Thus, this paper adopts (2) to detect regions with small variance in the following analyses.

Given that the optimal modes of both the corresponding MB in the color video and the upper MB in the depth video are direct, and the variance of the current MB in the depth video is small, our analysis shows that the probability that the optimal mode of the current MB is either direct or Inter 16×16 is greater than 97% for all combinations of QP and video. When the optimal modes of both the corresponding MB in the color video and the upper MB in the depth video are direct but the variance of the current depth MB is not small, the conditional probability that the optimal mode is Inter 8×8 is 0.70% and 2.80% for Book_Arrival (QP22) and Kendo (QP22), respectively (Table 1). Let A denote the event that the optimal modes of both the co-located MB in the corresponding color video and the upper MB in the current frame of the depth video are direct and the variance of the current MB is

not small. Let B denote the event that the optimal modes of both the co-located MB in the corresponding color video and the upper MB in the current frame of the depth video are direct, the variance of the current MB is not small, and the optimal mode of the current MB is Inter 8×8.

Table 1. The conditional PMF of the optimal mode in the depth video given that the optimal modes of both the corresponding MB in the color video and the upper MB in the depth video are direct and the variance of the current MB in the depth video is not small.

                Book_Arrival        Kendo
Optimal Mode    QP22     QP37      QP22     QP37
direct          93.03%   92.21%    12.01%   64.84%
Inter 16×16      4.61%    5.79%    33.83%   21.85%
Inter 16×8       0.66%    0.66%    13.52%    3.79%
Inter 8×16       0.49%    0.53%     8.70%    2.85%
Inter 8×8        0.70%    0.05%     2.80%    0.19%
Intra 16×16      0.19%    0.60%    12.95%    5.37%
Intra 4×4        0.32%    0.19%    16.09%    0.04%

                Poznan_Hall2        Undo_Dancer
Optimal Mode    QP22     QP37      QP22     QP37
direct          40.08%   90.40%    44.20%   90.74%
Inter 16×16     27.05%    7.31%    37.60%    7.11%
Inter 16×8       9.41%    0.85%     6.00%    0.73%
Inter 8×16       3.78%    0.69%     4.98%    0.83%
Inter 8×8        0.94%    0.02%     5.45%    0.10%
Intra 16×16     16.82%    0.67%     0.69%    0.11%
Intra 4×4        1.91%    0.07%     1.08%    0.40%

Analyses show that P(A) is 21.57%, 1.51%, 0.41%, and 0.20% for Book_Arrival, Kendo, Poznan_Hall2, and Undo_Dancer encoded with QP22, respectively. Accordingly, P(B) is 0.15% and 0.04% for Book_Arrival (QP22) and Kendo (QP22), respectively. P(A) changes with the video and thus may be high, e.g. for Book_Arrival. However, Inter 8×8 can still be skipped in the fast mode decision since P(B) is very small.

Given that the optimal mode of the upper MB of the depth video is not direct while the optimal mode of its co-located MB in the corresponding color video is, the conditional PMF of the optimal mode of the current MB is shown in Table 2, and the following observations are made. First, the conditional probability of Intra 4×4 with modes 3-8 is very low; hence, these modes can be skipped. Here, the variance of the current MB is not considered, since modes 3-8 of Intra 4×4 are always skipped due to their low probability. Secondly, the conditional probabilities of Inter 8×8 are 2.57% and 1.05% for Book_Arrival (QP22) and Kendo (QP22), respectively, and the conditional probabilities of Intra 4×4 with modes 0 and 1 range from 0.92% to 1.73% for Book_Arrival (QP22) and Kendo (QP22). Although these probabilities are not small, and thus these modes should not be skipped outright, our analysis shows that the prior probabilities that the optimal mode of the upper MB is not direct while that of the co-located MB in the corresponding color video is direct are 8.99% and 21.58% for Book_Arrival (QP22) and

Kendo (QP22), respectively.

Table 2. Conditional PMF of the optimal mode of depth video coding given that the optimal mode of the co-located MB in the corresponding color video is direct but the optimal mode of the upper MB in the depth video is not.

                 Book_Arrival        Kendo
Optimal Mode     QP22     QP37      QP22     QP37
direct           39.09%   45.03%    22.85%   30.32%
Inter 16×16      30.73%   28.85%    27.42%   23.17%
Inter 16×8        4.79%    3.46%     5.10%    2.70%
Inter 8×16        4.06%    2.34%     4.71%    2.11%
Inter 8×8         2.57%    0.12%     1.05%    0.10%
Intra 16×16      11.68%   18.43%    28.91%   40.20%
Intra 4×4
  mode 0          1.26%    0.14%     1.73%    0.07%
  mode 1          0.99%    0.06%     0.92%    0.02%
  mode 2          4.24%    1.53%     6.87%    1.26%
  modes 3-8       0.59%    0.05%     0.44%    0.02%

                 Poznan_Hall2        Undo_Dancer
Optimal Mode     QP22     QP37      QP22     QP37
direct           26.27%   22.07%    51.20%   41.45%
Inter 16×16      35.72%   38.34%    44.46%   39.79%
Inter 16×8        3.18%    4.87%     1.35%    3.15%
Inter 8×16        3.07%    2.11%     1.29%    4.26%
Inter 8×8         0.08%    0.01%     0.41%    0.27%
Intra 16×16      31.08%   32.48%     1.22%   10.77%
Intra 4×4
  mode 0          0.04%    0.01%     0.00%    0.01%
  mode 1          0.01%    0.00%     0.00%    0.00%
  mode 2          0.54%    0.11%     0.07%    0.06%
  modes 3-8       0.00%    0.00%     0.05%    0.09%

Let E denote the event that the optimal mode of the current MB in a depth video is Inter 8×8 and the optimal mode of the co-located MB in the corresponding color video is direct. Accordingly, P(E) is around 0.23% for both Book_Arrival and Kendo. That is, for one depth frame with 3072 MBs, there are about 7 MBs on average whose optimal mode is Inter 8×8. Since Inter 8×8 is much more complex than modes 0 and 1 of Intra 4×4, the fast mode decision should skip Inter 8×8.

Given that the optimal mode of the co-located MB in the corresponding color video is not direct and the variance of intensity in the current depth MB is small, the conditional PMF of the optimal mode of depth video coding is provided in Table 3. On average, Inter 8×8 has the lowest probability among all modes and thus can be eliminated from the set of candidates for fast mode decision.
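The expected-count claim above is a one-line computation: a 1024×768 frame contains (1024/16)·(768/16) = 3072 macroblocks, so P(E) ≈ 0.23% yields about 7 Inter 8×8-optimal MBs per depth frame.

```python
# Macroblocks per 1024x768 frame (16x16 luma MBs), and the expected
# number of Inter 8x8-optimal MBs per depth frame when P(E) ~ 0.23%.
mbs_per_frame = (1024 // 16) * (768 // 16)   # 64 * 48 = 3072
expected_inter8x8 = 0.0023 * mbs_per_frame   # ~7.07 MBs per frame
print(mbs_per_frame, round(expected_inter8x8, 2))
```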
Book_Arrival coded with QP22 has the highest probability of Inter 8×8, 0.08%. Modes 3-8 of Intra 4×4 also occupy a low percentage except for Book_Arrival (QP22), for which the probability of modes 3-8 of Intra 4×4 is 0.59%. However, the prior probability that the optimal mode of the co-located MB in the corresponding color video is not direct and the variance of intensity in the current MB is small is 18.98% for Book_Arrival (QP22). Accordingly, the probability that the optimal mode of the co-located MB in the corresponding color video is not direct, the variance of intensity in the current MB is small, and the optimal

mode of the current MB is one of modes 3-8 of Intra 4×4 is low. Thus, modes 3-8 of Intra 4×4 can also be removed from the set of candidates for fast mode decision of depth video coding.

Finally, we also analyze the conditional PMF of the optimal mode in depth video coding given that the optimal mode of the co-located MB in the corresponding color video is not direct and the variance of intensity in the current MB is not small. This PMF is not as imbalanced as those in Tables 1-3; moreover, no mode has a very low conditional probability. For QP22, the conditional probability of Inter 8×8 ranges from 2.65% to 11.4%, and the conditional probability of Intra 4×4 ranges from 0.56% to 7.42%. Thus, fast mode decision should not be applied in this case.

Table 3. Conditional PMF of the optimal mode of depth video coding given that the optimal mode of the co-located MB in the corresponding color video is not direct and the variance of intensity in the current MB of the depth video is small.

                 Book_Arrival        Kendo
Optimal Mode     QP22     QP37      QP22     QP37
direct           38.25%   31.91%    35.87%   32.03%
Inter 16×16      21.44%   22.23%    20.86%   14.17%
Inter 16×8        2.26%    2.82%     2.05%    1.49%
Inter 8×16        1.94%    1.69%     1.84%    0.79%
Inter 8×8         0.08%    0.00%     0.05%    0.00%
Intra 16×16      24.97%   40.59%    33.32%   50.64%
Intra 4×4
  mode 0          1.67%    0.02%     0.73%    0.01%
  mode 1          0.83%    0.01%     0.37%    0.01%
  mode 2          8.13%    0.71%     4.78%    0.67%
  modes 3-8       0.43%    0.00%     0.13%    0.00%

                 Poznan_Hall2        Undo_Dancer
Optimal Mode     QP22     QP37      QP22     QP37
direct           45.92%   44.87%    58.76%   55.97%
Inter 16×16      22.43%   17.57%    21.78%   20.74%
Inter 16×8        1.33%    2.88%     1.04%    1.89%
Inter 8×16        1.03%    0.62%     0.78%    0.56%
Inter 8×8         0.01%    0.00%     0.02%    0.00%
Intra 16×16      28.04%   34.01%    17.21%   20.77%
Intra 4×4
  mode 0          0.05%    0.00%     0.01%    0.00%
  mode 1          0.01%    0.00%     0.00%    0.00%
  mode 2          1.16%    0.06%     0.40%    0.07%
  modes 3-8       0.01%    0.00%     0.00%    0.00%
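The case analysis of this subsection maps directly onto the five candidate sets formalized in Section 2.2. A minimal Python sketch follows; the mode labels and the threshold of (2) as reconstructed here are illustrative assumptions, not JMVC identifiers.

```python
# Illustrative mode labels for the H.264/MVC candidate modes.
FULL_SET = ["direct", "SKIP", "Inter16x16", "Inter16x8", "Inter8x16",
            "Inter8x8", "Intra16x16", "Intra4x4_m0", "Intra4x4_m1",
            "Intra4x4_m2", "Intra4x4_m3_8"]

def t_intra(qp: int) -> int:
    """Variance threshold of (2): T_intra = 15*(QP // 22) + QP % 22 - 5."""
    return 15 * (qp // 22) + qp % 22 - 5

def candidate_modes(color_mode: str, upper_mode: str,
                    mb_variance: float, qp: int) -> list:
    """Select the tested-mode set for one depth MB (cases (i)-(v))."""
    small_var = mb_variance <= t_intra(qp)   # "small" = not above threshold
    if color_mode == "direct" and upper_mode == "direct":
        if small_var:                                     # case (i)
            return ["direct", "Inter16x16"]
        return [m for m in FULL_SET if m != "Inter8x8"]   # case (ii)
    if color_mode == "direct":                            # case (iii)
        return [m for m in FULL_SET if m != "Intra4x4_m3_8"]
    if small_var:                                         # case (iv)
        return [m for m in FULL_SET
                if m not in ("Inter8x8", "Intra4x4_m3_8")]
    return list(FULL_SET)                                 # case (v)
```

For instance, a smooth depth MB whose color and upper neighbors are direct-coded (case (i)) is tested against only two modes instead of the full set, which is where most of the speed-up comes from.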
2.2 Fast Mode Decision for Depth Video Coding

According to the probabilistic analyses in the previous subsection, the proposed fast mode decision algorithm for depth video coding using MVC is summarized as follows (Fig. 2). Since non-anchor frames are coded by referring to the coding information of anchor frames, the quality of anchor frames is more important. To avoid error propagation, we apply the fast mode decision to non-anchor frames only. For the fast mode decision of depth video coding, one of the following five candidate sets is selected for each MB.

(i) If the optimal modes of both the co-located MB in the corresponding color video and the upper MB in the current depth frame are direct, and the variance of intensity of the current MB of the current depth frame is small, only the direct and Inter 16×16 modes are tested.
(ii) If the optimal modes of both the co-located MB in the corresponding color video and the upper MB in the current depth frame are direct, and the variance of intensity of the current MB in the depth frame is not small, the fast algorithm skips Inter 8×8.
(iii) If the optimal mode of the co-located MB in the corresponding color video is direct and the optimal mode of the upper MB in the depth frame is not direct, the fast algorithm skips modes 3-8 of Intra 4×4.
(iv) If the optimal mode of the co-located MB in the corresponding color video is not direct and the variance of intensity of the current MB of the depth frame is small, Inter 8×8 and modes 3-8 of Intra 4×4 are skipped.
(v) If the optimal mode of the co-located MB in the corresponding color video is not direct, and the variance of intensity of the current MB in the depth frame is not small, the set of candidates for mode decision is not reduced.

Fig. 2. The flow chart of the proposed fast mode decision algorithm for depth video coding.

3. THE PROPOSED FAST APPROXIMATION OF DISTORTION ESTIMATION OF SYNTHESIZED VIEWS

For FVV, it is the color video instead of the depth video that is displayed to end users. Among previous schemes that measure distortions of synthesized views, a rendering

FAST MODE DECISION FOR DEPTH VIDEO CODING 1701 view distortion function measures quality of a synthesized view based on both the errors of depth video coding and the complexity of vertical texture of the co-located MB in the corresponding color video [20]. To speed up mode decision of depth video coding while retain the quality of synthesized views simultaneously, this section proposes a mathematical approximation of the distortion function in [20] for speedup of computation of distortions. The distortion measure is incorporated into the proposed fast mode decision. Since the horizontal shift of warped position in a synthesized view can be derived from coding errors of depth videos by using camera matrices, a distortion function for one MB of a synthesized view is defined by Oh et al. [20]. SSD syntesis f ( C ~, D ) f ( C, D ) ~ ~ ~ ( f ( C, D ) f ( C, D,) 2 2 f ( C, D ) 2 ) (3) where C and D are the th pixel in the original color and depth videos, respectively, C ~ and D ~ are the th pixel in the reconstructed color and depth videos, respectively, and C is the coding error of the th pixel in the color video. f(p C, P D ) is a warping function, synthesizing one pixel in the color video of a virtual view by using the pixel P C in the color video and pixel P D in the depth video. Since this paper targets at speed-up of depth video coding but not color video coding, f(c, L ) 2 does not vary with the mode decision of depth video coding. Accordingly, only ~ ~ ~ 2 f ( C, D ) f ( C, D,) of (3) is estimated. Instead of the intensity of a synthesized pixel, the coding error of the th pixel, D, in the depth video changes the warped position. For further analysis, the following assumptions are made. (i) To estimate the pixel-wise distortion in the synthesized view, only the current pixels C and D are distorted to be C ~ and D ~. 
(ii) By using linear interpolation for warping, the warped position in the synthesized view corresponding to the kth pixel in the original view will be located between (k - 1) and (k + 1). (iii) The MVD is captured using a 1-D parallel camera configuration. The pixel-wise distortion at the kth pixel can then be approximated as the sum of the areas of two triangles with the translational rendering position error \alpha \Delta D_k [20, 22]. Thus, SSD_{synthesis} in (3) is rewritten as [19]

SSD_{synthesis} \approx \sum_k D_{avg}(k) = \frac{\alpha^2}{4} \sum_k \Delta D_k^2 ( |\tilde{C}_{k-1} - \tilde{C}_k| + |\tilde{C}_k - \tilde{C}_{k+1}| )^2    (4)

where D_{avg}(k) is the distortion of the kth pixel in the synthesized view, caused by both \Delta D_k and the vertical texture of the reconstructed color video, and

\alpha = \frac{f \cdot L}{255} ( \frac{1}{Z_n} - \frac{1}{Z_f} )    (5)

where f is the focal length of the camera, L is the baseline between the original and synthesized views, and Z_n and Z_f are the minimal and the maximal depth values in the scene, respectively. In (4), \tilde{C}_k is used in place of C_k since \tilde{C}_k with \tilde{D}_k is warped to the position of C_k in the synthesized view. Moreover, D_{avg}(k) shall be revised to zero if the kth pixel is occluded after warping [20]. However, since the percentage of occluded pixels after warping is usually low, the occlusion case is neglected here.

For mode decision using (4), the SSD_{synthesis} corresponding to each mode is calculated from pixel-wise squared distortions, where both the pixel-wise \Delta D_k and the corresponding pixel-wise feature of vertical texture in the reconstructed color video are needed for pixel-wise multiplication. To reduce the time complexity of (4), the variance of \Delta D_k of each MB in depth videos is analyzed as follows. Most MBs have small variances of \Delta D_k. For the first GOP of Book_Arrival (5th view) and Kendo (5th view) coded with QP = 22 (Table 4), the maximum and minimum of \Delta D_k are 35 and -37, respectively. In this case, 96.62% and 92.61% of MBs have standard deviations of \Delta D_k less than 2 for Book_Arrival and Kendo, respectively. For Book_Arrival and Kendo coded with QP = 37, the maximum and minimum of \Delta D_k are 117 and -119, respectively. In this case, 93.98% and 92.00% of MBs have standard deviations of \Delta D_k less than 5 for Book_Arrival and Kendo, respectively. To reduce the time complexity of (4) for fast mode decision of depth video coding, this section proposes to substitute the mean of \Delta D_k of one MB in a depth video, D_{mean}, for \Delta D_k in (4), and defines the complexity of vertical texture of one color MB as

M = \sum_k ( |\tilde{C}_{k-1} - \tilde{C}_k| + |\tilde{C}_k - \tilde{C}_{k+1}| )^2    (6)

Accordingly, (4) can be approximated by

SSD_{synthesis} \approx \frac{1}{4} \alpha^2 D_{mean}^2 M.    (7)

Table 4. The cumulative distribution function of the variance of the depth coding error.
Variance    Book_Arrival         Kendo
            QP22      QP37      QP22      QP37
1           55.60%    38.69%    69.37%    53.02%
4           96.62%    54.58%    92.61%    66.30%
9           99.75%    83.44%    99.31%    78.23%
16          99.97%    90.22%    99.97%    85.88%
25         100.00%    93.98%   100.00%    92.00%
100        100.00%    99.82%   100.00%    99.75%

Instead of estimating the SSD_{synthesis} of each mode accurately from pixel-wise squared distortions using (4), this paper proposes to approximate SSD_{synthesis} using (7) for fast mode decision. Since the effects of depth and color video coding on the quality of synthesized views are separated in (7), i.e., D_{mean}^2 and M, M is computed only once in the whole process of mode decision of depth video coding, regardless of the number of coding modes to be tested. In SSD_{synthesis}, M plays the role of a weighting factor of D_{mean}^2, where D_{mean}^2 varies with the coding mode and QP of depth

video coding. To incorporate the SSD_{synthesis} of (7) into the RDO of mode decision, the strategy is as follows. If M is large, the quality of synthesized views will be highly sensitive to the distortion caused by depth video coding. Thus, according to (7), the RD cost should focus on the distortion of the depth video, D_{mean}^2, to reduce the distortion of synthesized views. On the contrary, since the distortion from depth video coding has a lower impact on the quality of synthesized views if M is small, the original RD cost of depth video coding is used instead of SSD_{synthesis} alone. Moreover, since most MBs in color videos usually do not have high complexity of vertical texture, the RD cost of depth video coding should consider both rate and distortion to avoid RD performance degradation in most cases. In summary, the RD cost of mode decision of depth video coding is categorized into two cases:

J_{mode}(S_k, I_k) = \begin{cases} D_{mean}^2, & M \ge T_2 \\ SSD(S_k, I_k) + \lambda_{mode} R(S_k, I_k), & M < T_2 \end{cases}    (8)

where the kth MB, S_k, is coded with coding mode I_k, and T_2 is the threshold to detect regions with complex vertical texture in the color video. Based on the RD cost in (8), the RDO selects the optimal mode for depth video coding. To select the value of T_2, Table 5 analyzes the average of M corresponding to each optimal mode of color video coding for different combinations of QP and video. Among all modes, Inter 8×8 has the most significant average value of M, except for Undo_Dancer. In addition to Inter 8×8, Inter 8×16 also has a large average value of M. Since some fast mode decision algorithms of color video coding suggest skipping Inter 8×8 due to its high complexity and low proportion, for each test video, the average of M corresponding to Inter 8×16 in its color video is used as the threshold T_2 to detect MBs with high complexity of vertical texture.
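The chain from (5) to (8) can be sketched compactly. This is an illustrative sketch, not code from the paper: the row-wise gradient over a 16×16 block, the edge padding at MB borders, and all function and parameter names are our assumptions.

```python
import numpy as np

def alpha(f, baseline, z_near, z_far):
    # Eq. (5): alpha = (f * L / 255) * (1/Z_n - 1/Z_f)
    return f * baseline / 255.0 * (1.0 / z_near - 1.0 / z_far)

def texture_complexity(color_rec_mb):
    """Eq. (6): M = sum_k (|C~_{k-1} - C~_k| + |C~_k - C~_{k+1}|)^2,
    taken along each row of the reconstructed color MB (borders edge-padded)."""
    c = np.asarray(color_rec_mb, dtype=float)
    cp = np.pad(c, ((0, 0), (1, 1)), mode="edge")
    grad = np.abs(cp[:, :-2] - cp[:, 1:-1]) + np.abs(cp[:, 1:-1] - cp[:, 2:])
    return float(np.sum(grad ** 2))

def ssd_synthesis_approx(depth_err_mb, m, a):
    # Eq. (7): SSD_synthesis ~ (alpha^2 / 4) * D_mean^2 * M
    d_mean = float(np.mean(depth_err_mb))
    return 0.25 * a * a * d_mean * d_mean * m

def rd_cost(ssd_depth, rate_bits, d_mean, m, t2, lambda_mode):
    # Eq. (8): distortion-only cost in texture-complex regions; otherwise the
    # conventional Lagrangian rate-distortion cost of depth video coding.
    if m >= t2:
        return d_mean * d_mean
    return ssd_depth + lambda_mode * rate_bits
```

Here `t2` would be set per sequence to the average M of color MBs whose optimal mode was Inter 8×16, and `lambda_mode` is the usual H.264 Lagrange multiplier; since M depends only on the reconstructed color MB, `texture_complexity` runs once per MB while only `ssd_synthesis_approx` is re-evaluated per candidate mode.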
In the proposed scheme, the optimal mode of a coded color MB is saved, and the complexity of vertical texture of the reconstructed color video, M, is computed. The adaptive threshold, T_2, to detect high complexity of vertical textures is also determined

Table 5. Average of M corresponding to each optimal mode in color video coding.

Optimal Mode    Book_Arrival         Kendo
                QP22      QP37      QP22      QP37
direct          24034     21723      8724      8469
Inter 16×16     48979     58165     27630     47876
Inter 16×8      63529     74564     37241     48237
Inter 8×16      72880     81844     45286     56007
Inter 8×8      117577    136526     73480     91810
Intra 16×16      9193     12360      4689      8346
Intra 4×4       41799     86559     17861     45008

Optimal Mode    Poznan_Hall2         Undo_Dancer
                QP22      QP37      QP22      QP37
direct           6767      5941     35332     27462
Inter 16×16     17376     38842     79378    132790
Inter 16×8      27916     39377    118181    146578
Inter 8×16      38450     49019    217982    193832
Inter 8×8      116910    173575    208094    173404
Intra 16×16     15976     14835    312704    143620
Intra 4×4       26229    124251    168319    132052

by the average of M of the MBs whose optimal mode is Inter 8×16. The first stage of the proposed algorithm aims at reducing the candidates for mode decision of depth video coding (Section 2.2). The second stage selects the RD cost function and then makes the mode decision. If the complexity of vertical texture of the corresponding MB in the reconstructed color video is greater than T_2, the RD cost of each candidate takes only the distortion into account. Otherwise, the RD cost of each candidate combines both rate and distortion.

4. EXPERIMENTAL RESULTS

Tests are conducted by revising the reference software JMVC 6.0.3. The GOP size is 8. FRExt is turned on. The entropy coding is CABAC. The number of reference frames is 2, consisting of one temporal frame and one inter-view frame. Motion estimation uses fast search. The set of QP values includes 22, 27, 32, and 37, according to Tanimoto et al. [24]. Three views are encoded while two reconstructed views are used for view synthesis by the reference software VSRS 3.0 [25]. The computation platform is equipped with a 2.83 GHz CPU and 2 GB RAM. In addition to Kendo and Undo_Dancer, which have been used for the probabilistic analyses (Section 2.1), Lovebird1, Newspaper, and Poznan_Street are also adopted in the tests. Test conditions are based on the call for proposals on 3DVC [9]. The camera configurations of all video clips are 1-D/parallel and the images are rectified. Lovebird1 contains outdoor scenes and objects that move slowly toward the camera. Newspaper includes indoor scenes and has fade-in. Poznan_Street features both fade-in and fade-out and inconsistent object motions. Table 6 compares the performance of the proposed scheme and the original JMVC 6.0.3. The settings of QPs for color video coding and depth video coding are identical.
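The per-sequence figures reported in Table 6 reduce to simple ratios against the original JMVC 6.0.3. A minimal sketch, with our own function names, of the two relative metrics:

```python
def delta_bitrate_percent(bitrate_proposed, bitrate_jmvc):
    # Bitrate change of the proposed scheme relative to the original
    # JMVC 6.0.3 bitrate, in percent (positive means a bitrate increase).
    return 100.0 * (bitrate_proposed - bitrate_jmvc) / bitrate_jmvc

def time_reduction_percent(time_proposed, time_jmvc):
    # Encoding-time saving relative to the original JMVC 6.0.3, in percent.
    return 100.0 * (time_jmvc - time_proposed) / time_jmvc
```

For example, an encoder that needs 40 s where JMVC needs 100 s yields a 60% time reduction.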
PSNR is computed between two virtual views, where one is synthesized from two uncompressed color videos and the corresponding two uncompressed depth videos, and the other is synthesized from two decompressed color videos and the corresponding two decompressed depth videos. Here, the bitrate is the total bitrate of the color and depth videos. Let the bitrate difference denote the difference between the bitrate of the proposed scheme and that of the original JMVC 6.0.3. ΔBitrate is defined as the ratio of the bitrate difference to the bitrate of the original JMVC 6.0.3. Table 6 indicates that the time reduction of the proposed scheme ranges from 48% to 60%. The degradation of RD performance is nearly negligible, where the average increase of bitrate is around 0.028% and the average PSNR change is 0.011 dB. Compared with the original JMVC, the slight increase of bitrate and decrease of PSNR of the proposed scheme stem from two reasons. First, based on the probabilistic analyses (Section 2.1), the prediction accuracy of the optimal mode by the first stage of the proposed scheme is high except for Kendo. The lower prediction accuracy of Kendo is due to its dynamic content, and thus the bitrate increase of Kendo is larger than those of the other test videos. Second, only a small percentage of decisions are made using the revised distortion at the second stage of the proposed scheme.

The fast mode decision algorithm proposed by Peng et al. [14] is one of the previous fast depth video coding schemes with significant time reduction. Its main idea is that only the direct and Intra modes are tested for MBs with small variances. We implement Peng et al. [14] by revising JMVC 6.0.3. Table 7 compares the time reduction and RD performance of the proposed scheme and Peng et al. [14], where the Bjontegaard delta peak signal-to-noise ratio (BDPSNR) of synthesized color views and the Bjontegaard delta bit rate (BDBR) [25] of depth video coding are evaluated using the original JMVC 6.0.3 as the benchmark. The average BDBR of the proposed scheme and Peng et al. [14] are 0.25% and 3.21%, respectively, where the BDBR of the proposed scheme ranges from 0.36% to 0.7%, and that of Peng et al. [14] ranges from 0.56% to 7.25%. The average BDPSNR of the proposed scheme and Peng et al. [14] are 0.01 dB and 0.08 dB, respectively, where the largest BDPSNR degradations are 0.021 dB (Poznan_Street) and 0.2 dB (Undo_Dancer) for the proposed scheme and Peng et al. [14], respectively.

Table 6. Comparisons of coding performance between the proposed scheme and JMVC 6.0.3.

Sequence        QP         ΔBitrate(%)   ΔPSNR(dB)   ΔTime(%)
Lovebird1       22         0.095         0.067       62.96
                27         0.141         0.010       63.14
                32         0.184         0.008       62.25
                37         0.321         0.012       53.14
                Average    0.186         0.01525     60.37
Newspaper       22         0.179         0.004       51.75
                27         0.207         0.002       57.32
                32         0.136         0.007       59.71
                37         0.156         0.001       60.77
                Average    0.170         0.003       57.39
Poznan_Street   22         0.061         0.006       58.32
                27         0.224         0.011       59.07
                32         0.361         0.025       57.67
                37         0.314         0.014       58.61
                Average    0.240         0.014       58.42
Kendo           22         1.004         0.031       43.31
                27         1.131         0.012       47.06
                32         1.312         0.012       50.46
                37         1.557         0.019       53.06
                Average    1.251         0.0185      48.47
Undo_Dancer     22         0.015         0.024       56.64
                27         0.025         0.016       56.17
                32         0.046         0.001       54.20
                37         0.028         0.003       56.49
                Average    0.028         0.011       55.87

Table 7. Comparisons of coding performance between Peng et al. [14] and the proposed algorithm.

                 Peng et al. [14]                  Proposed Algorithm
Sequence         BDBR(%)  BDPSNR(dB)  ΔTime(%)    BDBR(%)  BDPSNR(dB)  ΔTime(%)
Lovebird1         7.25     0.09        84.3        0.36     0.0006      60.4
Newspaper         0.56     0.02        75.0        0.11     0.004       57.4
Poznan_Street     1.06     0.03        87.7        0.7      0.021       58.4
Kendo             3.47     0.11        79.0        0.54     0.015       48.5
Undo_Dancer       4.82     0.2         86.6        0.26     0.011       55.9
Average           3.21     0.08        82.5        0.25     0.01        56.1

The

average time reduction of the proposed scheme and Peng et al. [14] are 56.1% and 82.5%, respectively. The BDBR and BDPSNR of the proposed scheme outperform those of Peng et al. [14], although the time reduction of the proposed scheme is not comparable to that of Peng et al. [14]. The BDBR of Peng et al. [14] changes significantly with the video content, while the BDBR of the proposed scheme varies only slightly across videos. The RD plots of depth video coding using the original JMVC 6.0.3, the proposed scheme, and Peng et al. [14] are given in Fig. 3. In this figure, the bitrate is the average of the three coded color and depth views, and Y-PSNR is the PSNR of the Y component of a synthesized view. There is almost no discrimination between the RD curves of the proposed scheme and those of the original JMVC 6.0.3 for the five test videos. However, the proposed scheme outperforms Peng et al. [14] for Lovebird1, Kendo, and Undo_Dancer for QP 22 and QP 27. The reasons are as follows. Lovebird1 has a higher percentage of color MBs with large M, and thus a higher percentage of depth MBs have their distortions reduced by the second stage of the proposed scheme. Lovebird1 also has a high percentage of depth MBs with small variances, and thus Peng's scheme degrades the PSNR performance more. Undo_Dancer has a medium percentage of color MBs with large M, and thus the percentage of depth MBs whose distortions are reduced by the proposed scheme is not high. However, Undo_Dancer has a high percentage of depth MBs with small variances, and Peng's scheme degrades the PSNR performance more. Thus the proposed scheme works better on Undo_Dancer. Although Kendo has a lower percentage of depth MBs with small variances and Peng's scheme degrades the PSNR performance of Kendo only slightly, the first stage of the proposed scheme also does not degrade the PSNR performance of Kendo as much as for the other test videos. This is because Kendo has high motion and is coded with a low percentage of direct modes.
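The Bjontegaard deltas used above fit third-order polynomials of PSNR over log10(bitrate) for the two codecs under comparison and average the gap over their overlapping rate range [25]. A minimal BD-PSNR sketch, assuming four rate points per codec as in the tests here; variable names are ours:

```python
import numpy as np

def bd_psnr(rates_ref, psnrs_ref, rates_test, psnrs_test):
    """Average PSNR difference (dB) of the test codec over the reference,
    integrated over the overlapping log10-bitrate interval."""
    lr_ref, lr_test = np.log10(rates_ref), np.log10(rates_test)
    p_ref = np.polyfit(lr_ref, psnrs_ref, 3)    # cubic fit, PSNR vs log-rate
    p_test = np.polyfit(lr_test, psnrs_test, 3)
    lo = max(lr_ref.min(), lr_test.min())       # overlapping rate range
    hi = min(lr_ref.max(), lr_test.max())
    # Integrate both fitted curves over [lo, hi] and average the difference.
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    return (int_test - int_ref) / (hi - lo)
```

A negative return value means the test codec synthesizes views with lower PSNR at the same bitrate; BDBR is computed analogously with the roles of rate and PSNR exchanged.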
Moreover, Kendo has a low percentage of MBs with large M, and thus a low percentage of depth MBs have their distortions reduced by the second stage of the proposed scheme. As a result, the proposed scheme outperforms Peng's scheme on Kendo. Finally, the performance of the proposed scheme is similar to that of Peng's scheme for larger QPs. In fact, a lower percentage of MBs can have their distortions reduced by the second stage of the proposed scheme since a lower percentage of MBs has large M in the reconstructed color video. The percentage of color and depth MBs having the direct mode as the optimal mode is usually much higher for larger QPs, and thus the first stage of the proposed scheme leads to more PSNR performance degradation.

Finally, Table 8 compares the numbers of arithmetic operations of the mode decision in one MB among different distortion measures. Let N and n denote the number of pixels in one MB and the number of candidates for mode decision of depth video coding, respectively, where n is much smaller than N for MVC. P is the probability that M is not less than T_2. The number of operations of each distortion measure is stated as follows: (a) For the traditional SSD, each pixel needs one addition and one multiplication to compute the square of the pixel-wise coding distortion. Summation of the N pixel-wise coding distortions within one MB needs N - 1 additions. This SSD has to be computed for each candidate of one MB; (b) For mode decision using (4) by Tech et al. [19], SSD_{synthesis} is computed for each candidate of one MB. Each pixel needs three additions to get its corresponding texture term, one addition for the pixel-wise distortion of depth video coding, and two multiplications, one of which is for the square. Summation of the N pixel-wise coding distortions within one MB needs N - 1 additions. Three extra multiplications include \alpha^2/4, which is invariant with the mode; (c) For our proposed approximation of Oh et al. [20], each MB

Fig. 3. Comparisons of RD performance (total bitrate of 3 color + 3 depth views versus Y-PSNR of the virtual view) among three different depth video coding schemes: the original JMVC 6.0.3, the proposed scheme, and Peng et al. [14]; (a) Lovebird1; (b) Newspaper; (c) Poznan_Street; (d) Kendo; (e) Undo_Dancer.

needs 3N additions (the pixel-wise differences of intensity in the color video) and N - 1 additions (the summation) for M in (6). The benefit of (6) is that no matter how many modes have to be tested for mode decision, it is computed only once. Then, each mode needs N - 1 additions to get D_{mean} and six multiplications to compute (7); (d) For the proposed

scheme using (8), which combines both the traditional SSD and (7), the numbers of additions and multiplications have to be derived using the probability P. For each MB, the value of M is computed only once, where 3N + (N - 1) additions and N multiplications are needed. If M is greater than or equal to T_2, the D_{mean}^2 corresponding to each mode will be calculated using N - 1 additions and two multiplications. Otherwise, the number of operations is equal to that of the traditional SSD. Since both the number of operations of the traditional SSD and that of (7) are significantly smaller than that of Oh et al. [20], the number of operations of the proposed scheme using (8) is smaller than that of Oh et al. [20].

Table 8. The numbers of additions and multiplications in the mode decision for one MB of different distortion measures.

Method                           Number of Additions                                        Number of Multiplications
SSD                              (N + (N-1)) * n                                            N * n
Oh et al. [20]                   (4N + (N-1)) * n                                           (2N + 3) * n
Approximation of Oh et al. [20]  (N-1) * n + (3N + (N-1))                                   6n + N
Proposed scheme                  (3N + (N-1)) + (N-1) * n * P + (N + (N-1)) * n * (1-P)     N + 2n * P + N * n * (1-P)

5. CONCLUSIONS

This paper proposes a fast mode decision algorithm for depth video coding using MVC. The scheme speeds up mode decisions while retaining the quality of synthesized views. Based on probabilistic analyses, information of the color and depth videos is exploited to reduce the candidates for mode decisions. For MBs with high complexity of vertical texture in the reconstructed color video, the second stage reduces the distortions of synthesized views by making virtual-view-distortion-directed mode decisions. The proposed approximation of a previous distortion function speeds up the computation of virtual-view distortions. Experimental results show that the proposed scheme achieves significant time reduction with negligible degradation of RD performance.

REFERENCES

1. Y. S. Ho and K. J.
Oh, Overview of multi-view video coding, in Proceedings of IEEE International Workshop on Systems, Signals and Image Processing, 2007, pp. 5-12.
2. Y. Morvan, D. Farin, and P. H. N. de With, System architecture for free-viewpoint video and 3D-TV, IEEE Transactions on Consumer Electronics, Vol. 54, 2008, pp. 925-932.
3. P. Merkle, K. Muller, and T. Wiegand, 3D video: acquisition, coding, and display, IEEE Transactions on Consumer Electronics, Vol. 56, 2010, pp. 946-950.
4. ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JMVC 1.0 software, JVT-AA212, 2008.
5. A. Smolic, K. Muller, P. Merkle, N. Atzpadin, C. Fehn, M. Muller, O. Schreer, R. Tanger, P. Kauff, and T. Wiegand, Multi-view video plus depth (MVD) format for

advanced 3D video systems, Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, JVT-W100, 2007.
6. Y. Chen, M. M. Hannuksela, T. Suzuki, and S. Hattori, Overview of the MVC+D 3D video coding standard, Journal of Visual Communication and Image Representation, Vol. 25, 2014, pp. 679-688.
7. L. Luo, R. X. Jing, X. Tian, and Y. W. Chen, Rate-distortion based reference viewpoints selection for multi-view video plus depth, IEEE Transactions on Consumer Electronics, Vol. 59, 2013, pp. 657-665.
8. ISO/IEC JTC1/SC29/WG11, 3D video standardization plans, Doc. N12557, 2012.
9. Call for proposals on 3D video coding technology, ISO/IEC JTC1/SC29/WG11, N12036, 2011.
10. Q. Zhang, P. An, Y. Zhang, L. Shen, and Z. Zhang, Low complexity multiview video plus depth coding, IEEE Transactions on Consumer Electronics, Vol. 57, 2011, pp. 1857-1865.
11. G. Cernigliaro, M. Naccari, F. Jaureguizar, J. Cabrera, E. Pereira, and N. Garcia, A new fast motion estimation and mode decision algorithm for H.264 depth maps encoding in free viewpoint TV, in Proceedings of IEEE International Conference on Image Processing, 2011, pp. 1013-1016.
12. M. Wang, C. Liu, T. Zhang, X. Jin, and S. Goto, Region of interest oriented fast mode decision for depth map coding in DIBR, in Proceedings of IEEE International Colloquium on Signal Processing and its Applications, 2011, pp. 177-180.
13. F. Shao, M. Yu, G. Jiang, F. Li, and Z. Peng, Depth map compression and depth-aided view rendering for a three-dimensional video system, IET Signal Processing, Vol. 6, 2012, pp. 247-254.
14. Z. Peng, M. Yu, G. Jiang, Y. Si, and F. Chen, Virtual view synthesis oriented fast depth video encoding algorithm, in Proceedings of IEEE International Conference on Industrial and Information Systems, 2010, pp. 204-207.
15. D.-H. Yoon and Y.-S.
Ho, Fast depth video coding method using adaptive edge classification, in Proceedings of IEEE Conference on Visual Communications and Image Processing, 2011, pp. 1-4.
16. W. Hu, O. C. Au, S. Lin, W. X. Sun, L. F. Xu, and Y. J. Li, Adaptive depth map filter for blocking artifact removal and edge preserving, in Proceedings of IEEE International Symposium on Circuits and Systems, 2012, pp. 369-372.
17. D. De Silva, W. Fernando, H. Kodikaraarachchi, S. Worrall, and A. Kondoz, A depth map post-processing framework for 3D-TV systems based on compression artifact analysis, IEEE Journal of Selected Topics in Signal Processing, 2011, to appear.
18. W.-S. Kim, A. Ortega, P. Lai, D. Tian, and C. Gomila, Depth map distortion analysis for view rendering and depth coding, in Proceedings of IEEE International Conference on Image Processing, 2009, pp. 721-724.
19. G. Tech, H. Schwarz, K. Muller, and T. Wiegand, 3D video coding using the synthesized view distortion change, in Proceedings of Picture Coding Symposium, 2012, pp. 25-28.
20. B. T. Oh, J. Lee, and D.-S. Park, Depth map coding based on synthesized view distortion function, IEEE Journal of Selected Topics in Signal Processing, Vol. 5, 2011, pp. 1344-1352.

21. T. Wiegand, H. Schwarz, A. Joch, F. Kossentini, and G. J. Sullivan, Rate-constrained coder control and comparison of video coding standards, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, 2003, pp. 688-703.
22. H. Yuan, Y. Chang, J. Huo, F. Yang, and Z. Lu, Model-based joint bit allocation between texture videos and depth maps for 3-D video coding, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 21, 2011, pp. 485-497.
23. Y. P. Su, A. Vetro, and A. Smolic, Common test conditions for multiview video coding, ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT-T207, 2006.
24. M. Tanimoto, T. Fujii, and K. Suzuki, View synthesis algorithm in view synthesis reference software 2.0 (VSRS2.0), ISO/IEC JTC1/SC29/WG11, M16090, 2009.
25. G. Bjontegaard, Calculation of average PSNR differences between RD-curves, ITU-T Q6/SG16, Doc. VCEG-M33, 2001.

Chih-Hung Lu received his M.S. degree in Communication Engineering from National Central University, Jhongli, Taiwan, in July 2012. He has been a Software Engineer with the Advanced Safety Vehicle Project at the Chung-Shan Institute of Science and Technology, Taoyuan, Taiwan, since October 2012. His research interests include image and video processing, video coding, and vision systems for intelligent vehicles.

Han-Hsuan Lin received her M.S. degree in Communication Engineering from National Central University, Jhongli, Taiwan, in July 2014. Her research interest is 3D video coding. She has been an Engineer with the software department of Coretronic Corporation, Hsinchu, Taiwan, since November 2014.

Chih-Wei Tang received her B.S. and M.S. degrees in Computer Science and Information Engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 1995 and 1997, respectively. She received her Ph.D. degree in Electronics Engineering from National Chiao-Tung University, Hsinchu, Taiwan, in 2004.
From 2005 to 2006, she was a Researcher with the Intelligent Robotics Technology Division, Mechanical and System Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan. In 2006, she joined the Department of Communication Engineering, National Central University, Jhongli, Taiwan, where she is currently an Assistant Professor. Her research interests include video coding and image processing.