978 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 11, NO. 4, AUGUST 2015


Fast Mode Decision Using Inter-View and Inter-Component Correlations for Multiview Depth Video Coding

Jianjun Lei, Member, IEEE, Jing Sun, Zhaoqing Pan, Member, IEEE, Sam Kwong, Fellow, IEEE, Jinhui Duan, and Chunping Hou

Abstract: With the development of three-dimensional (3-D) display technologies, 3-D video has attracted more and more interest. Multiview video plus depth (MVD) is one of the most popular representation formats of 3-D video. In an MVD coding system, multiview depth video needs to be coded and transmitted in addition to the texture video. This paper presents a novel fast mode decision (FMD) method for odd views in multiview depth video coding. First, the inter-view and inter-component coding correlations are analyzed to provide efficient reference information. Then, in view of the characteristics of different types of frames, different early termination strategies are proposed. For the nonanchor frame, the early termination criterion is based on the rate-distortion cost information of the even views and the coded block pattern information. For the anchor frame, the criterion is set stricter to maintain the coding accuracy. Experimental results show that the proposed method reduces coding time by 78.07% on average, without significant loss of video quality.

Index Terms: Fast mode decision (FMD), multiview depth video coding, three-dimensional (3-D) video.

I. INTRODUCTION

THREE-DIMENSIONAL (3-D) video is becoming more and more popular due to its improved visual experience with depth perception [1]–[4]. It can be used in various multimedia applications, such as free-viewpoint television (FTV), 3-D television (3-DTV) broadcasting, handheld game consoles, immersive teleconferencing, intelligent surveillance, and so on [5]–[9].
Manuscript received October 10, 2014; revised February 25, 2015 and April 29, 2015; accepted June 09, Date of publication June 16, 2015; date of current version July 31, This work was supported in part by the Natural Science Foundation of China under Grant , Grant , Grant , and Grant , and in part by the Natural Science Foundation of Tianjin under Grant 12JCYBJC Paper no. TII
J. Lei, J. Sun, J. Duan, and C. Hou are with the School of Electronic Information Engineering, Tianjin University, Tianjin , China (jjlei@tju.edu.cn).
Z. Pan is with the Jiangsu Engineering Center of Network Monitoring, School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing , China, and also with the Department of Computer Science, City University of Hong Kong, Kowloon , Hong Kong (zqpan3-c@my.cityu.edu.hk).
S. Kwong is with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong, and also with the City University of Hong Kong Shenzhen Research Institute, Shenzhen 51800, China (cssamk@cityu.edu.hk).
Color versions of one or more of the figures in this paper are available online at
Digital Object Identifier /TII

Multiview video plus depth (MVD) is one of the main 3-D video representation formats for 3-D applications [10], [11]. It consists of multiview texture videos and corresponding depth videos. Multiview video data are acquired by a set of cameras synchronously capturing the same scene from different viewpoints [12]–[14]. The geometrical information of the corresponding texture videos is provided by per-pixel depth videos [15]. As the number of cameras increases, the sheer volume of video data significantly increases the bandwidth requirement. On top of traditional texture video coding, the depth videos make the storage and encoding burden heavier [16]. The most straightforward approach to encoding MVD data is to use two traditional H.264 multiview video coding (MVC) encoders.
MVC was developed as an extension of the H.264/Advanced Video Coding (AVC) standard to remove spatial, temporal, and inter-view redundancies by using motion estimation (ME) and disparity estimation (DE) [17]. Variable block-size ME and DE give MVC excellent coding efficiency. However, these advanced coding tools dramatically increase the computational complexity of MVC, which limits the use of the MVC encoder in real-time applications.

A large number of fast coding methods have been studied in video coding. In terms of the encoding process, these methods mostly follow two main trends: fast ME/DE and fast mode decision (FMD). Common fast ME/DE methods mainly design different search patterns, such as three-step search (TSS) [18], four-step search (FSS) [19], diamond search [20], iterative search [21], and so on. FMD algorithms, on the other hand, focus on reducing the number of candidate modes, because the original exhaustive mode decision process checks all possible modes and then chooses the one with the smallest cost as the best mode.

Wang and Yang proposed a fast mode selection algorithm based on analyses of block details and texture direction, in which the video images were first divided into a flat background region, a complex background region, and a foreground region. The direction features of the objects in the images were also taken into consideration. Then, different sets of effective prediction modes were chosen to get the best mode [22]. Lu and Chen proposed an FMD algorithm called the DC-based FMD (DC-FMD) [23]. First, a self-defined parameter was calculated from the DC coefficient of the SKIP mode and the rate-distortion (RD) cost of Inter mode. Then, according to the parameter, they reduced the modes to be checked. To further exploit the inter-view correlation, Kuo et al. developed an FMD method for the nonanchor frame (NAF).
By comparing the RD costs and motion/disparity vectors of Inter

mode in the temporal domain and the view domain, they determined whether the macroblock (MB) is suitable for encoding in SKIP mode [24]. Yu et al. used the inter-view correlation in training a classification and regression tree (CART), and saved time by using the CART classifier to predecide whether the best mode is the SKIP mode [25]. Han and Lee divided all the possible modes into background modes and object modes. The region partition and the mode information of the MB located by the global disparity vector (GDV) were used to determine the mode of the current MB, thereby saving time [26]. Because the SKIP mode is the most frequently encountered optimal mode in H.264, Zeng et al. proposed a mode correlation-based early termination (MET) method. The RD cost of the SKIP mode in an NAF was compared to a threshold calculated from the anchor frame (AF). If the RD cost was smaller than the threshold, the SKIP mode was set as the best mode immediately; otherwise, exhaustive mode decision was used [27]. Zeng et al. further refined the method by introducing the city-block distance of the predicted motion vector (PMV) [28]. For an NAF, if the early termination criterion was not satisfied, the PMV was used to determine the motion-activity class of the current MB. According to the classification result, different candidate mode sets were chosen.

Because the additional multiview depth videos in MVD have their own characteristics, FMD methods should be studied specifically for depth video coding. The coding methods for multiview depth video fall into two categories: 1) independent coding; and 2) joint coding. The independent depth video coding techniques encode the depth videos using only the characteristics of the depth data [29], [30]. The joint coding methods additionally take the inter-component correlation into consideration.
The mode correlation between color videos and depth maps was investigated in [31], where the mode complexity of an MB in the depth map was analyzed based on the mode information of the corresponding MBs in the coded color video. If the mode complexity is simple, mode checking is terminated after checking the SKIP mode and Inter mode. In [32], Pan et al. exploited the motion vector information of the texture video and the coded block pattern (CBP) to decide whether to use the same mode as the texture video.

For multiview depth video coding in MVD, reference information comes from two sources: inter-view correlation and inter-component correlation. Both can provide reference information for the mode decision of the MB currently being encoded. In this paper, we propose a novel FMD algorithm using inter-view and inter-component correlations for multiview depth video coding. The main contributions of this paper can be summarized as follows.

1) We combine the correlations in the inter-view and inter-component directions to obtain predicted mode information that is closer to the best mode.
2) We adopt different mode decision methods for AFs and NAFs based on their different statistical characteristics, which achieves good time savings while preserving quality.
3) The early termination criterion is derived from statistical analyses of the RD cost under the DIRECT mode of two inter-view depth videos, and is combined with the CBP.

Experimental results demonstrate the effectiveness of our proposed method.

Fig. 1. Variable block partitions for MVC and the corresponding inter modes.

This paper is organized as follows. Section II introduces the basics of mode decision in MVD, as well as the observations and statistical analyses. Section III presents the details of the proposed FMD method. Experimental results and the conclusion are given in Sections IV and V, respectively.

II.
MOTIVATION AND STATISTICAL ANALYSES

MVC removes the spatial and temporal redundancies by adopting variable block-size ME and DE. In the joint multiview video coding (JMVC) reference software, the candidate modes for inter-frame MBs are the DIRECT/SKIP mode, the Inter 16×16, Inter 16×8, Inter 8×16, and Inter 8×8 modes, and several intra modes. The SKIP mode in P frames does not need to perform ME; the motion information is predicted from the neighboring MBs during MB reconstruction. The DIRECT mode in B frames is similar to the SKIP mode, except that the prediction residual data are transmitted. In the proposed method, the two are collectively called the DIRECT mode. The Inter 8×8 mode also includes the Inter 8×4, Inter 4×8, and Inter 4×4 modes, as shown in Fig. 1. To facilitate understanding, a summary of abbreviations is presented in Table I.

The best mode of an encoded MB is usually chosen by minimizing the Lagrangian RD optimization function [33]

$$m^{*} = \arg\min_{m \in M} \left[ D(m) + \lambda_{\mathrm{MODE}} R(m) \right] \qquad (1)$$

where $M$ is the candidate mode set; $D(m)$ is the sum of squared differences between the original MB and its reconstruction, obtained by coding the original MB with the candidate mode $m$; $\lambda_{\mathrm{MODE}}$ is the Lagrangian multiplier for the mode decision; and $R(m)$ is the number of bits needed to encode the MB with the candidate mode $m$.

The hierarchical B picture (HBP) structure is used as the prediction structure in JMVC [34]. There are three types of frames within one group of pictures (GOP) of the HBP structure, namely the intra-coded frame (I-frame), the predicted frame (P-frame), and the bidirectional predicted frame (B-frame). I-frames are all intra-coded. P-frames are either intra-coded or inter-coded using previous I-/P-frames as references for motion-compensated


prediction. B-frames are coded using the bipredictive slice syntax of H.264/MPEG4-AVC [35]. The first frame of each GOP is denoted as the AF; the remaining frames of the GOP are called NAFs.

TABLE I. SUMMARY OF ABBREVIATIONS

Fig. 2. Coding structure of MVD video coding.

TABLE II. TEST CONDITIONS

Multiview depth video carries the depth information of its associated texture video. Hence, the mode selections of the depth video and its associated texture video may be similar, since they share some common object structures. Considering the above aspects, the simplified joint MVD coding structure in [32] is adopted in our study. As shown in Fig. 2, when View 1 of the depth video is coded, the coding information of View 1 of the texture video and View 2 of the depth video is available. Taking both as references provides more accurate reference information. Before the reference information is used, its validity should be tested and verified. We define two parameters to represent the validity as

$$P_{M1} = \frac{N\big((M_{D1} = M_{D2}) \,\&\&\, (M_{D1} = M_{T1})\big)}{N(M_{D2} = M_{T1})} \qquad (2)$$

$$P_{M2} = \frac{N(M_{D2} = M_{T1})}{N(\text{all MBs})} \qquad (3)$$

where $M_{D1}$ is the mode of the MB in depth View 1, $M_{D2}$ is the mode of the MB in depth View 2, and $M_{T1}$ is the mode of the MB in texture View 1. $N(\cdot)$ represents the number of MBs for which the event holds. $P_{M1}$ is the percentage of cases in which all three MBs involved ($D_1$, $D_2$, and $T_1$) share the same mode, given that the two reference MBs ($D_2$ and $T_1$) have the same mode. $P_{M2}$ is the proportion, among all encoded MBs, of MBs whose two reference MBs have the same mode. The test conditions and the results are shown in Tables II and III.

TABLE III. VALIDITY OF REFERENCE INFORMATION (%)

From Table III, it can be seen that NAFs have far more mode similarity than AFs. $P_{M1}$ for NAFs reaches up to 99.73%, and is 98.34% on average.
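The two ratios in (2) and (3) reduce to counting MBs. The following is a minimal Python sketch of that counting, assuming the per-MB best modes of depth View 1, depth View 2, and texture View 1 are available as three equally long lists of mode labels; the function name `validity_ratios` and the string labels are hypothetical and do not come from JMVC.

```python
from typing import Sequence, Tuple


def validity_ratios(
    modes_d1: Sequence[str],
    modes_d2: Sequence[str],
    modes_t1: Sequence[str],
) -> Tuple[float, float]:
    """Return (P_M1, P_M2) as fractions, following (2) and (3).

    Each sequence holds one mode label per collocated MB (hypothetical labels).
    """
    if not (len(modes_d1) == len(modes_d2) == len(modes_t1)):
        raise ValueError("mode lists must have the same length")

    total = len(modes_d1)
    refs_agree = 0  # N(M_D2 = M_T1)
    all_agree = 0   # N((M_D1 = M_D2) && (M_D1 = M_T1))
    for d1, d2, t1 in zip(modes_d1, modes_d2, modes_t1):
        if d2 == t1:
            refs_agree += 1
            if d1 == d2:
                all_agree += 1

    p_m1 = all_agree / refs_agree if refs_agree else 0.0
    p_m2 = refs_agree / total if total else 0.0
    return p_m1, p_m2


# Toy data, not taken from the paper's test sequences.
d1 = ["DIRECT", "DIRECT", "INTER_16x16", "DIRECT"]
d2 = ["DIRECT", "DIRECT", "DIRECT", "INTER_8x8"]
t1 = ["DIRECT", "INTER_16x16", "DIRECT", "INTER_8x8"]
p_m1, p_m2 = validity_ratios(d1, d2, t1)
```

In this toy input the two references agree for three of the four MBs, and the current MB matches them in one of those three cases.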
In contrast, $P_{M1}$ for AFs is 60.76% on average; for the Newspaper sequence in particular, it is only 16.62%. $P_{M2}$ is likewise high in NAFs and low in AFs. Hence, different mode decision strategies should be designed for the two frame types.

It is well known that the DIRECT mode is the simplest mode in the mode decision process, and it dominates the mode distribution, as shown in Table IV. In NAFs, about 96% of MBs are encoded with the DIRECT mode, and 2.65% are encoded with the Inter 16×16 mode. In AFs, the percentages are 46.07% and 15.06% for the DIRECT mode and the Inter 16×16 mode, respectively. That is to say, over 98% of MBs in NAFs, and 61% in AFs, are encoded with the 16×16 MB partition.

In the RD-based mode decision process, the mode with the minimum RD cost is selected as the best mode. Here, the RD cost under the DIRECT mode is analyzed. First, we export the RD cost of every MB after testing the DIRECT mode. Then, the costs in each frame are sorted and plotted. Fig. 3 shows the RD costs of every MB

in one NAF of depth View 1. Fig. 4 shows the similar behavior of the corresponding frame in depth View 2. From Figs. 3 and 4, it is easy to see that a majority of the MBs have very low RD costs under the DIRECT mode (the black parts), and only a few MBs have large RD costs (the yellow parts). Moreover, among the MBs whose best mode is the DIRECT mode, about 99% on average have an RD cost in the black part. The boundary of the black part on the Y-axis (RD cost) lies at about 0.08 times the maximum. At the same time, Views 1 and 2 have very similar cost levels. The boundary calculated from depth View 2 can therefore be used in depth View 1 to decide whether the RD cost of the current MB is in the black part or not.

TABLE IV. MODE DISTRIBUTION (%)

III. PROPOSED FMD METHOD

A. Mode Decision for NAFs

It can be observed from Table III that when the collocated MBs of the current depth MB in depth View 2 and texture View 1 are encoded in the same mode, the current depth MB is very likely to select the same mode as its collocated MBs. In this paper, this kind of mode is called the base mode (BM), which can be expressed as

$$\mathrm{BM} = \begin{cases} M_{D2}, & \text{if } M_{D2} = M_{T1} \\ \text{none}, & \text{otherwise} \end{cases} \qquad (4)$$

where $M_{D2}$ is the mode of the MB in depth View 2, and $M_{T1}$ is the mode of the MB in texture View 1. In general, a depth MB that has a BM is likely to have simple content or uniform motion. Thus, an early termination method based on the characteristics of the RD cost is proposed for the MBs that have a BM. For the MBs without a BM, which are often related to moving objects, the original mode decision method is adopted.
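The base mode in (4) is a simple agreement test between the two collocated reference MBs. A minimal Python sketch follows, assuming modes are represented as plain string labels; the function name `base_mode` and the labels are hypothetical, and `None` stands in for the "none" case of (4).

```python
from typing import Optional


def base_mode(mode_depth_view2: str, mode_texture_view1: str) -> Optional[str]:
    """Return the base mode (BM) per (4), or None when the references disagree."""
    if mode_depth_view2 == mode_texture_view1:
        return mode_depth_view2
    return None


bm_same = base_mode("DIRECT", "DIRECT")       # references agree
bm_diff = base_mode("DIRECT", "INTER_16x16")  # references disagree
```

An MB for which the function returns `None` would fall back to the original mode decision, as described above.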
For MBs whose BM is the DIRECT mode, we test the probability that the DIRECT mode is their best mode, given that their RD cost under the DIRECT mode is smaller than the threshold TH defined as

$$\mathrm{TH} = \alpha \times RD_{\max 2} \qquad (5)$$

where $RD_{\max 2}$ is the maximum RD cost in depth View 2, and $\alpha$ is a scaling parameter set on the basis of our extensive experiments. The RD costs of the early determined MBs in one frame are plotted in Fig. 5. The yellow MBs are those whose best mode in the original JMVC is the DIRECT mode; the red ones are those whose best mode is not. The percentages of yellow MBs among all MBs in the frames are 99.8% (Doorflowers), 99.93% (Newspaper), 99.94% (Lovebird1), and 99.84% (Poznan_Street), respectively. Fig. 6 shows the red MBs of Fig. 5, whose best modes are not the DIRECT mode; the white MBs in Fig. 6 correspond to the red MBs in Fig. 5. It can be seen from the figures that most of the MBs are correctly determined.

To further improve the accuracy of the method, the CBP is combined with the RD cost. The CBP is a syntax element in each encoded MB header. A CBP value of zero means that the MB is well predicted by the current mode [36]. So, the early termination criterion for MBs whose BM is the DIRECT mode can be expressed as

$$\mathrm{RDCost}_{\mathrm{DIRECT}} < \mathrm{TH} \;\;\&\;\; \mathrm{CBP}_{\mathrm{DIRECT}} = 0 \qquad (6)$$

where $\mathrm{RDCost}_{\mathrm{DIRECT}}$ is the RD cost of the DIRECT mode, TH is the threshold defined in (5), and $\mathrm{CBP}_{\mathrm{DIRECT}}$ is the CBP value of the DIRECT mode. For MBs whose BM is the Inter 16×16 mode, the DIRECT mode and the Inter 16×16 mode are tested first. If the CBP of the Inter 16×16 mode is zero, the other inter modes are skipped.

To evaluate the efficiency of the proposed early mode decision criterion, the hit rate (HR) and determination rate (DR) are adopted [37], which are defined as follows:

$$\begin{cases} \mathrm{HR}(X \mid Y) = N(X \mid Y)/N(Y) \\ \mathrm{DR}(Y \mid X) = N(Y \mid X)/N(X) \end{cases} \qquad (7)$$

where $\mathrm{HR}(X \mid Y)$ and $\mathrm{DR}(Y \mid X)$ represent the HR and DR, respectively.
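Returning to the early termination test in (5) and (6), the check can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the value `ALPHA = 0.08` is a placeholder that echoes the roughly 0.08 boundary observed in Section II; it is not presented here as the paper's chosen value of α.

```python
def direct_threshold(rd_max_view2: float, alpha: float) -> float:
    """TH = alpha * RD_max2, as in (5)."""
    return alpha * rd_max_view2


def terminate_on_direct(
    rd_cost_direct: float,
    cbp_direct: int,
    rd_max_view2: float,
    alpha: float,
) -> bool:
    """Criterion (6): the DIRECT RD cost is below TH and the DIRECT CBP is zero."""
    threshold = direct_threshold(rd_max_view2, alpha)
    return rd_cost_direct < threshold and cbp_direct == 0


ALPHA = 0.08  # illustrative placeholder only (assumption, see lead-in)

low_cost_zero_cbp = terminate_on_direct(50.0, 0, 1000.0, ALPHA)
low_cost_nonzero_cbp = terminate_on_direct(50.0, 3, 1000.0, ALPHA)
high_cost_zero_cbp = terminate_on_direct(120.0, 0, 1000.0, ALPHA)
```

Only the first case satisfies both conditions; a nonzero CBP or an RD cost above the threshold sends the MB on to further mode tests.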
$N(\cdot)$ represents the number of MBs for which the corresponding event holds, the event $X$ means that the best mode of the current MB is correctly selected, and the event $Y$ denotes the early mode decision condition. $X \mid Y$ and $Y \mid X$ are two conditional events. A large DR means that more computational complexity can be saved. An HR that is large and close to 100% means that the best mode is correctly predicted and almost no RD performance degradation is caused. The HR and DR of the criterion in NAFs are listed in Table V. From the table, we can observe that the HR of the proposed early mode decision condition ranges from 98.97% to 100%, which means that most MBs are correctly encoded. The DR of the proposed condition ranges from 90.87% to 98.77%, 96.2% on average. These values demonstrate that the proposed early mode decision works efficiently.

B. Mode Decision for AFs

From Table III, it can be seen that $P_{M1}$ and $P_{M2}$ are very low in AFs, so the reference information cannot predict the best mode of the current depth MB well. However, the coding quality of the AFs strongly affects the coding performance of the NAFs. Hence, the FMD method for the AFs should be strict. In


order to utilize the inter-view and inter-component correlations, a region of support (ROS) is defined for the current depth MB, as shown in Fig. 7, and the ROS set is defined as

$$\mathrm{ROS} = \{ t_i,\ i = 1, 2, \ldots, 9;\ d_j,\ j = 1, 2, \ldots, 9 \} \qquad (8)$$

where $t_i$ denotes the texture MBs in View 1 and $d_j$ denotes the depth MBs in View 2. If all MBs in the ROS are encoded in the DIRECT mode, the current depth MB has a high probability of being encoded in the DIRECT mode. Therefore, the best mode $M_d$ of the current depth MB can be determined as

$$M_d = \begin{cases} \text{DIRECT}, & \text{if } \left( \sum_{i=0}^{8} Ft_i + \sum_{j=0}^{8} Fd_j \right) = 0 \\ \text{Non-DIRECT}, & \text{otherwise} \end{cases} \qquad (9)$$

where $Ft_i$ is the DIRECT flag of the corresponding MB in the texture video and $Fd_j$ is the DIRECT flag of the corresponding MB in the depth video. If the corresponding MB is encoded in the DIRECT mode, the flag is equal to 0; otherwise, it is equal to 1.

Fig. 3. RD costs of all MBs under the DIRECT mode in one frame of depth View 1. (a) Doorflowers. (b) Newspaper. (c) Lovebird1. (d) Poznan_Street.

Fig. 4. RD costs of all MBs under the DIRECT mode in one frame of depth View 2. (a) Doorflowers. (b) Newspaper. (c) Lovebird1. (d) Poznan_Street.

Fig. 5. RD costs of the early determined MBs. (a) Doorflowers. (b) Newspaper. (c) Lovebird1. (d) Poznan_Street.

Fig. 6. MBs in red of Fig. 5, whose best modes are not DIRECT mode. (a) Doorflowers. (b) Newspaper. (c) Lovebird1. (d) Poznan_Street.

The HR and DR of the FMD method for AFs are listed in

Table VI. From this table, it can be seen that the HR of the proposed criterion in (9) is 100%, which means that all MBs are correctly encoded. The DR of the proposed condition ranges from 0% to 38.97%, 18.22% on average. The results demonstrate that the FMD method for AFs can maintain the mode decision accuracy while achieving a moderate time saving.

TABLE V. HR AND DR OF THE PROPOSED EARLY TERMINATION FOR NAF (%)

Fig. 7. Example of ROS.

TABLE VI. HR AND DR OF THE MODE DECISION FOR AF (%)

C. Overall Algorithm

Based on the above analyses, the proposed FMD algorithm is summarized in Algorithm 1.

Algorithm 1. Proposed fast mode decision algorithm
 1: Start: calculate the BM.
 2: if BM = DIRECT
 3:     Test DIRECT mode
 4:     if RDCost_DIRECT < TH & CBP_DIRECT = 0
 5:         Go to Step 24
 6:     else
 7:         Test Inter 16×16 mode
 8:         if CBP = 0
 9:             Go to Step 24
10:         else
11:             Test all the other inter modes
12:             Go to Step 24
13: else if BM = Inter 16×16
14:     Test DIRECT mode
15:     Test Inter 16×16 mode
16:     if CBP = 0
17:         Go to Step 24
18:     else
19:         Test all the other inter modes
20:         Go to Step 24
21: else
22:     Test all inter modes
23:     Go to Step 24
24: Test all intra modes and get the best mode with minimal RD cost
25: Process the next MB.

IV. EXPERIMENTAL RESULTS

To evaluate the efficiency of the proposed algorithm, the MVC reference software JMVC8.5 is adopted as the software platform. The test conditions are listed in Table II. The hardware platform is an Intel(R) Core(TM) i CPU at 2.90 GHz, 3.10 GHz, with 4.00 GB RAM and the Microsoft Windows 7 32-bit operating system. Six test sequences, namely Doorflowers, Newspaper, Lovebird1, GT_Fly, Poznan_Street, and Champagne_Tower [38]–[41], are tested and compared in terms of peak signal-to-noise ratio (PSNR), bit rate (BR), Bjontegaard delta PSNR (BDPSNR), Bjontegaard delta BR (BDBR) [42], and total encoding time. The experimental results are summarized in Table VII. In this table, ΔPSNR, ΔBR, and ΔT are computed as

$$\Delta\mathrm{PSNR} = \mathrm{PSNR}_p - \mathrm{PSNR}_o \qquad (10)$$

$$\Delta\mathrm{BR} = \frac{\mathrm{BR}_p - \mathrm{BR}_o}{\mathrm{BR}_o} \times 100\% \qquad (11)$$

$$\Delta T = \frac{T_p - T_o}{T_o} \times 100\% \qquad (12)$$

TABLE VII. PERFORMANCE COMPARISON OF DIFFERENT METHODS

Fig. 8. RD curves of different methods. (a) Doorflowers. (b) Newspaper. (c) Lovebird1. (d) GT_Fly. (e) Poznan_Street. (f) Champagne_Tower.

where the subscript $o$ denotes the original JMVC8.5 and the subscript $p$ denotes the method under comparison: the proposed method, Pan's method [32], or Yeh's method [43]. From Table VII, it can be seen that Yeh's method reduces the encoding time by 32.40% to 89.16%, 66.44% on average. Meanwhile, the PSNR ranges from to dB, dB on average; and the BR ranges from 4.07% to 2.22%, 0.59% on average. The average BDPSNR and BDBR between the original JMVC8.5 and Yeh's method are dB and 2.843%, respectively. Pan's method reduces the computational complexity by 64.01% to 79.82%, 73.30% on average; the PSNR ranges from to dB, dB on average; and the BR ranges from 1.33% to 5.05%, 0.91% on average. The average BDPSNR and BDBR between the original JMVC8.5 and Pan's method are dB and 1.407%, respectively. The proposed method reduces the computational complexity by 70.57% to 83.37%, 78.07% on average. The PSNR ranges from to dB, dB on average. The BR ranges from 1.78% to 0.08%, 0.79% on average. The BDPSNR and BDBR between the original JMVC8.5 and the proposed method are dB and 0.389%, respectively.

From these results, we can conclude that the proposed method achieves a larger computational complexity saving than Yeh's and Pan's methods. At the same time, its PSNR decline is slightly smaller than that of Pan's and Yeh's methods, and its BR increase is also smaller. That is to say, our algorithm is better in both video quality and encoding complexity. To show the RD performance intuitively, the RD curves of the six test sequences are shown in Fig. 8. It can be seen that the proposed method achieves an RD performance similar to that of Yeh's and Pan's methods, and very close to that of the original JMVC. This means that the computational redundancy of mode decision can be effectively removed by the proposed algorithm with nearly the same RD performance as the original JMVC.

V.
CONCLUSION

In this paper, an FMD method is proposed for reducing the computational complexity of multiview depth video coding. The proposed method incorporates the inter-view and inter-component correlations, and analyzes the best-mode similarity between the MB currently being coded and the reference MBs. In view of the different degrees of similarity in different types of frames, different mode decision methods are designed. After analyzing the mode similarity of NAFs, a BM, which can predict the best mode to some extent, is defined. Then, an early termination criterion based on the RD cost and CBP is proposed for NAFs. The threshold of the RD cost is calculated by statistical analysis of the RD information of even and odd depth views. For AFs, a strict early termination method is adopted in order to maintain the coding accuracy. Experimental results show that the proposed FMD algorithm can significantly reduce the complexity of depth video coding in MVD while maintaining almost the same coding quality. The proposed method can also be combined with other fast ME and DE algorithms to reduce the overall complexity.

REFERENCES

[1] A. Vetro, T. Wiegand, and G. J. Sullivan, "Overview of the stereo and multiview video coding extensions of the H.264/MPEG-4 AVC standard," Proc. IEEE, vol. 99, no. 4, pp , Apr
[2] H. Liu, S. Chen, and N. Kubota, "Intelligent video systems and analytics: A survey," IEEE Trans. Ind. Informat., vol. 9, no. 3, pp , Aug
[3] Y. Fang, J. Wang, M. Narwaria, P. L. Callet, and W. Lin, "Saliency detection for stereoscopic images," IEEE Trans. Image Process., vol. 23, no. 6, pp , Jun
[4] H. Liu, M. Yuan, F. Sun, and J. Zhang, "Spatial neighborhood-constrained linear coding for visual object tracking," IEEE Trans. Ind. Informat., vol. 10, no. 1, pp , Feb
[5] C. Zhu, Y. Zhao, L. Yu, and M. Tanimoto, Eds., 3D-TV System with Depth-Image-Based Rendering: Architecture, Techniques and Challenges. New York, NY, USA: Springer,
[6] G. Wang, L. Tao, H. Di, X. Ye, and Y.
Shi, "A scalable distributed architecture for intelligent vision system," IEEE Trans. Ind. Informat., vol. 8, no. 1, pp , Feb
[7] T. Zhang, S. Liu, C. Xu, and H. Lu, "Mining semantic context information for intelligent video surveillance of traffic scenes," IEEE Trans. Ind. Informat., vol. 9, no. 1, pp , Feb
[8] X. Bai, Y. Fang, W. Lin, L. Wang, and B.-F. Ju, "Saliency-based defect detection in industrial images by using phase spectrum," IEEE Trans. Ind. Informat., vol. 10, no. 4, pp , Nov
[9] D. Mukherjee, Q. M. J. Wu, and T. M. Nguyen, "Gaussian mixture model with advanced distance measure based on support weights and histogram of gradients for background suppression," IEEE Trans. Ind. Informat., vol. 10, no. 2, pp , May
[10] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, "Multi-view video plus depth representation and coding," in Proc. IEEE Int. Conf. Image Process., Oct. 2007, pp. I201–I204.
[11] F. L. Lian, Y. C. Lin, C. T. Kuo, and J. H. Jean, "Voting-based motion estimation for real-time video transmission in networked mobile camera systems," IEEE Trans. Ind. Informat., vol. 9, no. 1, pp , Feb
[12] J. Lei, S. Li, C. Zhu, M.-T. Sun, and C. Hou, "Depth coding based on depth-texture motion and structure similarities," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 4, pp , Feb
[13] X. Zhou, Y. Li, B. He, and T. Bai, "GM-PHD-based multi-target visual tracking using entropy distribution and game theory," IEEE Trans. Ind. Informat., vol. 10, no. 2, pp , May
[14] F. Shao, W. Lin, G. Jiang, M. Yu, and Q. Dai, "Depth map coding for view synthesis based on distortion analyses," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 4, no. 1, pp , Mar
[15] D. Kim, D. Min, and K. Sohn, "A stereoscopic video generation method using stereoscopic display characterization and motion analysis," IEEE Trans. Broadcast., vol. 54, no. 2, pp , Jun
[16] A. Smolic et al., "Intermediate view interpolation based on multiview video plus depth for advanced 3D video systems," in Proc. 15th IEEE Int. Conf.
Image Process., Oct. 2008, pp
[17] H. Zeng, C. Cai, and K. Ma, "Fast mode decision for H.264/AVC based on macroblock motion activity," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 4, pp , Apr
[18] R. Li, B. Zeng, and M. L. Liou, "A new three-step search algorithm for block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 4, no. 4, pp , Aug
[19] L. M. Po and W. C. Ma, "A novel four-step search algorithm for fast block motion estimation," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp , Jun
[20] S. Zhu and K. K. Ma, "A new diamond search algorithm for fast block-matching motion estimation," IEEE Trans. Image Process., vol. 9, no. 2, pp , Feb
[21] Z. P. Deng, Y. L. Chan, K. B. Jia, C. H. Fu, and W. C. Siu, "Fast motion and disparity estimation with adaptive search range adjustment in stereoscopic video coding," IEEE Trans. Broadcast., vol. 58, no. 1, pp , Mar
[22] H. Wang and Y. Yang, "Fast mode selection based on texture segmentation and view prediction in JMVC," in Proc. 14th IEEE Int. Conf. Commun. Technol., 2012, pp
[23] G. Lu and L. Chen, "Fast mode decision for H.264 based on DC coefficient," in Proc. 7th IEEE Int. Conf. Inf. Technol., Apr. 2010, pp
[24] T. Y. Kuo, Y. Y. Lai, and Y. C. Lo, "Fast mode decision for non-anchor picture in multiview video coding," in Proc. IEEE Int. Symp. Broadband Multimedia Syst. Broadcast., 2010, pp. 1–5.


Jing Sun received the B.S. degree in telecommunication engineering from Tianjin University, Tianjin, China, in 2012, and is currently pursuing the M.S. degree at the School of Electronic Information Engineering, Tianjin University. Her research interests include 3-D imaging, video processing, and multiview video coding.

Zhaoqing Pan (S'09-M'15) received the B.S. degree in computer science and technology from Yancheng Normal University, Yancheng, China, and the Ph.D. degree in computer science from the City University of Hong Kong, Kowloon, Hong Kong, in 2009 and 2014, respectively. In 2013, he was a Visiting Scholar with the Department of Electrical Engineering, University of Washington, Seattle, WA, USA, for six months. Currently, he is a Professor with the School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, China. His research interests include video compression and 3-D video processing.

Sam Kwong (M'93-SM'04-F'13) received the B.S. and M.S. degrees in electrical engineering from the State University of New York at Buffalo, Buffalo, NY, USA, and the University of Waterloo, Waterloo, ON, Canada, in 1983 and 1985, respectively, and the Ph.D. degree in electrical engineering from the University of Hagen, Hagen, Germany. From 1985 to 1987, he was a Diagnostic Engineer with Control Data Canada, Mississauga, Canada. He joined Bell Northern Research Canada as a Scientific Staff Member. In 1990, he became a Lecturer with the Department of Electronic Engineering, City University of Hong Kong, Kowloon, Hong Kong, where he is currently a Professor with the Department of Computer Science. His research interests are video and image coding, and evolutionary algorithms.

Jinhui Duan received the B.S. degree in telecommunication engineering from Tianjin University, Tianjin, China. Currently, she is pursuing the M.S. degree at the School of Electronic Information Engineering, Tianjin University. Her research interests include 3-D video processing and video coding.

Jianjun Lei (M'11) received the Ph.D. degree in signal and information processing from the Beijing University of Posts and Telecommunications, Beijing, China. Currently, he is a Full Professor with the School of Electronic Information Engineering, Tianjin University, Tianjin, China. From August 2012 to August 2013, he was a Visiting Researcher with the Department of Electrical Engineering, University of Washington, Seattle, WA, USA. His research interests include 3-D video processing, 3-D display, and computer vision.

Chunping Hou received the M.Eng. and Ph.D. degrees in electronic engineering from Tianjin University, Tianjin, China, in 1986 and 1998, respectively. Since 1986, she has been with the Faculty of the School of Electronic and Information Engineering, Tianjin University, where she is currently a Full Professor and the Director of the Broadband Wireless Communications and 3-D Imaging Institute. Her research interests include 3-D image processing, 3-D display, wireless communication, and the design and applications of communication systems.
