
IEEE TRANSACTIONS ON BROADCASTING, VOL. 57, NO. 1, MARCH 2011

Efficient Multi-Reference Frame Selection Algorithm for Hierarchical B Pictures in Multiview Video Coding

Yun Zhang, Sam Kwong, Senior Member, IEEE, Gangyi Jiang, and Hanli Wang, Member, IEEE

Abstract: Multiple reference frame prediction technology is adopted in Multiview Video Coding (MVC) to exploit temporal and inter-view redundancies of multiview videos, resulting in extremely high encoding complexity, because the best reference frame indices and the best reference directions must be searched for each macroblock (MB). In order to reduce MVC coding computations while keeping the coding efficiency, and thus to advance MVC in real-time multimedia broadcasting applications, we propose a Fast Multi-reference Frame Selection Algorithm (FMFSA) for the hierarchical B picture prediction structure in this paper. Due to high spatial correlations within an MB, there is a high probability that smaller MB partition modes select the same reference frame and direction as B16×16 does. Therefore, the reference information of later-checked MB partition modes can be directly set according to the reference information of the previously examined mode. Experimental results on MVC show that the proposed FMFSA can achieve 68.34% to 79.01% total encoding time reduction, while the average bit rate increase and peak signal-to-noise ratio degradation are within 0.54% and 0.04 dB, respectively, for test multiview sequences with various motion properties and camera arrangements.

Index Terms: Hierarchical B picture, multi-reference frame prediction, multiview video coding.

I. INTRODUCTION

MULTIVIEW video is capable of providing viewers with a totally new stereoscopic vision and interactive viewing experience [1].
It would be useful for many new multimedia applications, such as Free-viewpoint TeleVision (FTV), Three Dimensional TeleVision (3DTV) broadcasting, immersive teleconference, virtual reality and games. With the advances in the area of 3D display technology [2], image analysis and depth image based rendering [3], many difficulties that hampered a technical application of FTV or 3DTV so far have been overcome. However, since the data volume of multiview video sequences is proportional to the number of views, it requires huge storage space, wide transmission bandwidth and large computing power. Therefore, it is essential to develop Multiview Video Coding (MVC) algorithms with high compression efficiency and low complexity for real-time video applications, such as live 3D broadcasting, remote control and interactive video communication.

Manuscript received March 22, 2010; revised August 24, 2010; accepted September 07, Date of publication November 11, 2010; date of current version February 23, This work was supported in part by Hong Kong RGC General Research Fund (GRF) Projects (CityU ) and in part by the Natural Science Foundation of China under Grants and
Y. Zhang is with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, and also with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong (e-mail: yunzhang@cityu.edu.hk).
S. Kwong is with the Department of Computer Science, City University of Hong Kong, Hong Kong (e-mail: cssamk@cityu.edu.hk).
G. Jiang is with the Faculty of Information Science and Engineering, Ningbo University, Ningbo, China (e-mail: jianggangyi@nbu.edu.cn).
H. Wang is with the Department of Computer Science and Technology and the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai, China (e-mail: hanliwang@tongji.edu.cn).
Color versions of one or more of the figures in this paper are available online at
Digital Object Identifier /TBC
Many attempts have been made to improve the compression efficiency and lower the complexity of MVC. The Moving Picture Experts Group (MPEG) has surveyed some MVC schemes, such as Group-of-GOP prediction, checkerboard decomposition, sequential view prediction and so on [4]. Merkle et al. proposed an MVC scheme using Hierarchical B Pictures (MVC-HBP) with superior compression efficiency and temporal scalability [5]. This MVC-HBP prediction structure has been adopted into the MVC standardization draft by the Joint Video Team (JVT), which was organized by ISO/IEC MPEG and the ITU-T Video Coding Experts Group (VCEG), and is used in the reference software Joint Multiview Video Coding (JMVC). However, MVC-HBP is quite complex, and fast algorithms are needed to reduce its complexity for practical application [6]-[8]. Peng et al. proposed a fast mode decision algorithm for MVC with dynamic early termination [6]. Li et al. and Shen et al. proposed fast Motion Estimation (ME) and Disparity Estimation (DE) to reduce MVC's complexity [7], [8]. Furthermore, MVC schemes adopt Multi-Reference Frame (MRF) prediction technology to further exploit both temporal and inter-view redundancies and improve coding efficiency. However, adopting MRF prediction increases computational complexity significantly compared with single reference frame prediction. Several methods have been proposed to reduce the complexity of MRF estimation for H.264/AVC [9]-[11]. Su and Sun proposed a fast MRF algorithm that exploits the continuity of motion vectors among different reference frames [9]. Huang et al. proposed a fast MRF algorithm that searches either the previous or every reference frame based on the result of ME from the previous frame [10]. Kuo and Lu reduced the number of reference frames based on the best reference frame selected by the B8×8 mode and the variance of their motion vectors [11]. However, these schemes are mainly proposed for P frames in mono-view H.264/AVC, so they can hardly


be applied to MVC using hierarchical B pictures. In addition, both inter-view and temporal predictions are adopted in MVC, which therefore has different statistical properties from mono-view video coding, where only temporal prediction is employed. Lin and Tang presented a fast decision algorithm to predict the direction of motion compensation prediction or disparity compensation prediction for MVC [12]. Zhu et al. proposed a fast Inter mode decision scheme, in which the inter-view prediction of other variable block size modes is reduced based on B16×16 reference information while encoding inter-view views [13]. However, the multi-reference frame selection of temporal views was not optimized. Zhang et al. proposed adaptive reference selection that reduces coding complexity and improves random accessibility according to the inter-view and temporal correlation of multiview video sequences [14]. However, the encoding time saving ratio is still limited and unstable for multiview videos with different spatio-temporal correlations.

In this paper, we propose an efficient Fast Multi-reference Frame Selection Algorithm (FMFSA) for complexity reduction of MVC. The rest of this paper is organized as follows. The mechanism of multi-reference selection in MVC is reviewed in Section II. Then, the proposed FMFSA for MVC is presented in Section III. Experimental results and analyses are shown in Section IV. Finally, Section V concludes this paper.

II. REVIEW ON MRF SELECTION FOR HIERARCHICAL B PICTURE IN MVC

The MVC-HBP structure [5] is a hybrid of inter-view and temporal prediction. It has been adopted into the MVC standardization draft because it achieves high coding efficiency by applying bi-directional prediction hierarchically, i.e., by adopting hierarchical B pictures [15]. Fig. 1 shows an example of the MVC-HBP prediction structure for an 8-view sequence when the Group-Of-Picture (GOP) length is 12. As can be seen, frames in the GOP are coded with highly complex hierarchical B pictures in order to achieve high compression efficiency, where $\lfloor \cdot \rfloor$ stands for the floor operation.

Fig. 1. MVC-HBP prediction structure in JMVC.

The variable block size mode decision and MRF selection in MVC-HBP are illustrated in Fig. 2. There are two different loop levels for encoding each macroblock (MB). One is the variable block size mode decision loop, in which the best mode is selected by checking the mode candidates one by one. For an MB in B slices, the candidates are the DIRECT mode, the Inter-MB modes (B16×16, B16×8, B8×16, B8×8Frext and B8×8) and the Intra-MB modes (I4MB, I8MB, I16MB and PCM). Each 8×8 block of the B8×8 mode can be further sub-partitioned into smaller blocks, and its prediction modes include SubDIRECT, SubB8×4, SubB4×8 and SubB4×4. The sub-partitioned blocks within a B8×8 block have the same reference frame. The other is the MRF loop, which selects the best reference frame by checking each active reference frame and direction for each variable block size mode. While encoding one MB, the MRF selection is performed for each Inter mode.

Fig. 2. Illustration of mode decision and MRF selection in MVC-HBP.

The MRF selection loop itself contains two inner loop levels. The first is the reference direction loop over List0 (forward), List1 (backward) and the bi-directional iterative direction. List0 and List1 are the memory lists storing the forward and backward reference frames, respectively. In the bi-directional iterative search, all the frames in List0 and List1 are searched for refinement. In the following sections, the forward, backward and bi-directional predictions are denoted as FWD, BWD and BI for short. The second loop runs over all active reference frames (1 to NumberReferenceFrames (NRF)) in each direction. Finally, after comparing the Rate Distortion (RD) costs of forward, backward and bi-directional iterative prediction, the reference information with the smallest RD cost, including reference frame indices and reference direction, is stored. Combined with the variable block size technology, the MRF process in MVC is of extremely high complexity, because the complex MRF selection process is required for each MB and each sub-MB partition.
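The nested loops reviewed in this section can be sketched as follows. This is only an illustrative sketch and not JMVC code: `rd_cost` is a hypothetical callback standing in for ME/DE plus RD cost computation, the mode list is shortened, and modelling the bi-directional iterative search as one evaluation per (List0, List1) index pair is an assumption made for counting purposes.

```python
from itertools import product

# Shortened, illustrative list of Inter modes (sub-partitions omitted).
INTER_MODES = ("B16x16", "B16x8", "B8x16", "B8x8")


def enumerate_candidates(nrf):
    """List every (direction, reference indices) pair checked for one Inter mode."""
    # FWD uses an index into List0, BWD an index into List1.
    single = [(d, (i,)) for d in ("FWD", "BWD") for i in range(nrf)]
    # Assumption: BI is modelled as one candidate per (List0, List1) index pair.
    bi = [("BI", pair) for pair in product(range(nrf), repeat=2)]
    return single + bi


def exhaustive_mrf_selection(rd_cost, nrf=2, modes=INTER_MODES):
    """Outer loop over modes, inner loops over directions and reference frames."""
    best = {}
    evaluations = 0
    for mode in modes:
        best_cost, best_choice = float("inf"), None
        for direction, refs in enumerate_candidates(nrf):
            cost = rd_cost(mode, direction, refs)  # hypothetical cost function
            evaluations += 1
            if cost < best_cost:
                best_cost, best_choice = cost, (direction, refs)
        best[mode] = (*best_choice, best_cost)
    return best, evaluations


def toy_rd_cost(mode, direction, refs):
    """Made-up cost used only to exercise the loops."""
    base = {"FWD": 10.0, "BWD": 12.0, "BI": 9.0}[direction]
    return base + sum(refs)


best, evaluations = exhaustive_mrf_selection(toy_rd_cost)
print(evaluations)   # 32 cost evaluations for 4 modes with NRF = 2
print(best["B8x8"])  # ('BI', (0, 0), 9.0)
```

Even in this reduced model every Inter mode repeats the full direction and reference search, which is the cost that the rest of the paper sets out to remove.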


III. PROPOSED FAST MULTI-REFERENCE SELECTION ALGORITHM BASED ON BLOCK CORRELATION

While encoding one MB in hierarchical B pictures, the optimal mode $m^*$, the optimal reference direction $d^*$ and the optimal reference frame $r^*$ are determined by RD optimization. It can be expressed as

$$(m^*, d^*, r^*) = \arg\min_{m \in M,\ d,\ r} J(c, b \mid m, d, r) \tag{1}$$

where $J$ is the RD cost; $c$ and $b$ are the current and reference block, respectively; and $M$ represents the set of Inter modes. The reference frame indicator $r$ is composed of two elements: the reference index in List0 (RefIdx0) and the reference index in List1 (RefIdx1), $r = \{\text{RefIdx0}, \text{RefIdx1}\}$. $d$ indicates the reference prediction direction, $d \in \{\text{FWD}, \text{BWD}, \text{BI}\}$. Fig. 3 shows an example of inter-view and temporal joint prediction, where Ref1 and Ref2 are reference frames in List0, and Ref3 and Ref4 are in List1. Due to the similar properties and high spatial correlation of the pixels within an MB, smaller MB partition modes, e.g., B8×8 and B16×8, will probably select the same prediction direction and reference indices as B16×16 does. That is, with high probability $r_m = r_{16\times16}$ and $d_m = d_{16\times16}$ for encoding modes $m$ with partition size smaller than 16×16. This probability is statistically verified as follows.

Fig. 3. Illustration of MRF selection for different MB partitions.

A. Statistical Analyses of MRF Selection

Let $A$ be the event that both the reference frame and the prediction direction of the best mode are equal to the reference frame and direction of the B16×16 mode, $B_1$ be the event of selecting DIRECT, SubDIRECT, I4MB, I8MB, I16MB or PCM mode as the best mode, and $B_2$ be the event of selecting other Inter modes. The corresponding probabilities of $A$, $B_1$ and $B_2$ are denoted as $P(A)$, $P(B_1)$ and $P(B_2)$, respectively. It can be seen that $B_1$ and $B_2$ are mutually exclusive and satisfy

$$P(B_1) + P(B_2) = 1. \tag{2}$$

When $B_1$ takes place, we can assume that an arbitrary reference frame and prediction direction are the best, because the modes of $B_1$ do not need MRF selection. In this sense, it can also be considered that $A$ always happens when $B_1$ takes place. Therefore, we have the conditional probability

$$P(A \mid B_1) = 1. \tag{3}$$

When $B_2$ happens, the probability of selecting the reference frame and prediction direction of B16×16 as the best can be represented as the conditional probability $P(A \mid B_2)$. In Fig. 4, the statistical analyses are given for the probabilities $P(A \mid B_2)$ and $P(B_1)$ obtained by searching all reference frames, directions and MB modes. Three multiview video sequences, Breakdancers (fast motion), Ballet (moderate motion) and Doorflowers (slow motion), are analyzed. Fast ME/DE is enabled and the parameter NRF is set to 2. Eight views, including four temporal coded views (even views) and four inter-view/temporal joint coded views (odd views), are encoded. The x-axis of the figures shows the different view sections and the different frames in each view section. The y-axis is the probability $P(A \mid B_2)$ for Fig. 4(a) and the mode probability $P(B_1)$ for Fig. 4(b). It can be observed that $P(A \mid B_2)$ is as high as 75% to 99% for all the frames of the test sequences when the basis Quantization Parameter (QP), denoted by bQP, is 28. In addition, Fig. 4(b) shows the percentage of MBs coded by modes in $B_1$; about 60% to 99% of the MBs are coded by these modes.

Fig. 4. Statistical analyses on probability $P(A \mid B_2)$ and mode probability $P(B_1)$ when bQP is 28: (a) probability $P(A \mid B_2)$; (b) mode probability $P(B_1)$.

TABLE I. AVERAGE PROBABILITY OF $P(A \mid B_2)$, $P(B_1)$ AND HIT RATE $P(A)$ [UNIT: %]

The average conditional probability $P(A \mid B_2)$ and the average mode probability $P(B_1)$ for different bQPs are shown in Table I, where it can be seen that $P(A \mid B_2)$ and $P(B_1)$ increase as bQP increases, and decrease as the motion gets faster. Generally, the average value of $P(A \mid B_2)$ is larger than 66.31% for all the test sequences and bQPs. Similar statistical results can also be found when full ME/DE search is enabled.

According to conditional probability theory, we can obtain

$$P(A) = P(A \mid B_1) P(B_1) + P(A \mid B_2) P(B_2). \tag{4}$$

Then, based on (3) and (4), we can rewrite $P(A)$ as

$$P(A) = P(B_1) + P(A \mid B_2) P(B_2). \tag{5}$$

Taking Breakdancers as an example, the average $P(A \mid B_2)$ and $P(B_1)$ are 66.31% and 81.54%, respectively. The hit rate $P(A)$ then follows from (5), (2) and the statistical data in Table I. Higher values can be obtained for larger bQP and for multiview video sequences with moderate and slow motion, e.g., Ballet and Doorflowers. As shown in the last four rows of Table I (i.e., the $P(A)$ results), very few MBs, i.e., $1 - P(A)$ of them, miss the optimal reference frame or prediction direction.

Fig. 5. Statistical analyses on $P(A \mid B_2)$ and $P(B_1)$ with different NRFs: (a) probability $P(A \mid B_2)$; (b) mode probability $P(B_1)$.

Fig. 5 shows the statistical analyses on $P(A \mid B_2)$ and $P(B_1)$ under the test condition that fast ME/DE is enabled with different NRFs and bQP being 28. It can be observed that the probabilities $P(A \mid B_2)$ and $P(B_1)$ are consistent for different NRFs. In other words, more computational complexity can be saved for larger NRF values. However, according to inter-view and temporal correlation analyses on multiview videos, frames in different views and at different time instants relative to the current encoding frame usually have low dependency and need not be referenced by the current frame [5]. So, in the following experiments evaluating the proposed FMFSA, NRF is set to 2, which is the default setting of JMVC.
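As a quick numerical check of the hit-rate argument, the law of total probability can be evaluated for the two Breakdancers averages quoted above (66.31% and 81.54%). The sketch below assumes the notation $B_1$ for modes that need no MRF selection and $B_2$ for the other Inter modes, and assumes that one of the two figures is $P(A \mid B_2)$ and the other is $P(B_1)$. The printed value is this sketch's own arithmetic, not a number quoted from Table I.

```python
def hit_rate(p_b1, p_a_given_b2):
    """P(A) = P(B1) + P(A|B2) * (1 - P(B1)), using P(A|B1) = 1 and P(B2) = 1 - P(B1)."""
    for p in (p_b1, p_a_given_b2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_b1 + p_a_given_b2 * (1.0 - p_b1)


# Breakdancers averages quoted in the text; which value belongs to which
# symbol is an assumption, but the result is the same either way (see below).
p_a = hit_rate(p_b1=0.8154, p_a_given_b2=0.6631)
print(f"{100 * p_a:.2f}%")  # 93.78%
```

The expression equals $1 - (1 - P(B_1))(1 - P(A \mid B_2))$, which is symmetric in its two arguments, so swapping the assignment of the two percentages does not change the outcome.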
Based on the above analyses, the following two observations can be made.
1) If the B16×16 mode selects one frame as the optimal reference frame, smaller MB partition modes, such as B16×8, B8×16 and B8×8, will select the same reference frame as the optimal one in each prediction direction.
2) Smaller block partition modes will select the same prediction direction, i.e., FWD, BWD or BI, as B16×16 does.
Accordingly, a novel FMFSA is proposed for MVC as follows.

B. Proposed FMFSA and Complexity Analysis

Because both observations hold with high probability, the optimization problem of (1) can be solved in the following two steps. First, check the B16×16 mode with all active reference frames and directions to obtain the optimal reference frame and direction for the B16×16 mode as

$$(d_{16\times16}, r_{16\times16}) = \arg\min_{d,\ r} J(c, b \mid \text{B16×16}, d, r) \tag{6}$$

where $J$ is the RD cost between the current block $c$ and the reference block $b$. Then, when checking the other Inter modes $m$, their reference frame and reference direction are set to $r_{16\times16}$ and $d_{16\times16}$ directly. That is,

$$r_m = r_{16\times16}, \quad d_m = d_{16\times16}. \tag{7}$$

Therefore, the coding complexity can be significantly reduced. The proposed FMFSA algorithm is described as follows.
Step 1) Encode the current MB with DIRECT mode.


Step 2) Encode the current MB with B16×16 mode and obtain the best reference frames in List0 and List1, respectively. Save the prediction directions of the B16×16 mode for later use.
Step 3) Encode the current MB with other Inter modes, using the reference information of B16×16 that is available from Step 2.
Step 4) Encode the current MB with Intra modes.
Step 5) Store the coding parameters with the smallest RD cost and write the coded bitstream. Then go to Step 1 for the next MB.

Fig. 6. Eight views of multiview video sequences: (a) Race1 (KDDI); (b) Ballroom (MERL); (c) Exit (MERL); (d) Doorflowers (HHI); (e) Lovebird1 (ETRI); (f) Ballet (MSR); (g) Breakdancers (MSR); (h) Dog (Nagoya Univ.)

The complexity analysis of the proposed FMFSA is given below. Let be the complexity of B16×16, the complexity of DIRECT, other Inter modes (including B16×8, B8×16, B8×8Frext, B8×8, SubB8×4, SubB4×8 and SubB4×4) and Intra modes (including I4MB, I8MB, I16MB and PCM) can be represented as , and , where , and are positive multiplication factors. Therefore, the total complexity of encoding one MB via JMVC is .

Let and be the complexity of multi-reference search for FWD and BWD, respectively, where reference frames are active in each memory list, i.e., NRF equals . Based on our empirical experience, can be considered approximately equal to , so only is used, i.e., . Let be the complexity of bi-directional iterative search, where is a positive multiplication factor that depends on the number of iterations, the iterative search algorithm and the iterative search range. Because each Inter mode is composed of FWD/BWD multi-reference search and bi-directional iterative search, equals . Let , and be the probabilities of selecting FWD, BWD and BI as the best direction, where . Thus, considering that the complexities of an MB for performing ME/DE on each reference frame are almost the same, the total computational complexity of encoding an MB via FMFSA can be calculated as . On the basis of complexity analyses of the MVC coding process, we obtain , , , when the search range is 96, fast ME/DE is enabled, equals 2, the number of bi-directional iterative searches is 4, and the iterative search range is 8. Hence, the total complexities of encoding one MB via original JMVC and FMFSA are and , respectively. For different multiview video sequences, usually ranges from 5% to 35%, and is 20% on average. Thus, the total complexity of encoding one MB via FMFSA is , which means that 70.0% complexity reduction can be achieved on average.

TABLE II. TEST MULTIVIEW VIDEO SEQUENCES

TABLE III. RATE, PSNR AND ENCODING TIME COMPARISONS AMONG ORIGINAL JMVC, KUO'S SCHEME, ZHU'S SCHEME AND FMFSA

IV. EXPERIMENTAL RESULTS AND ANALYSES

The recent H.264/AVC based MVC reference software JMVC 3.0 [16] is utilized to evaluate the proposed FMFSA algorithm. Fast ME/DE is enabled and the search range is set to 96. The number of bi-prediction iterations is 4 and the search range for iterations is 8. The maximum number of reference frames is 2 and the GOP length is 12. Eight different multiview video test sequences, Race1, Ballroom, Exit, Lovebird1, Doorflowers, Breakdancers, Ballet and Dog, with various motion properties and camera arrangements are adopted. Fig. 6 shows eight views of these test multiview video sequences. Detailed information on the test sequences is given in Table II. Eight views of each multiview video sequence and 61 frames of each view are encoded. Four bQP values, 24, 28, 32 and 36, are used in our experiments.
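With NRF set to 2 as in the setup above, the reference-reuse rule of Steps 2 and 3 in Section III-B can be sketched as follows. This is a minimal sketch under stated assumptions, not the JMVC implementation: `rd_cost` is a hypothetical callback, the DIRECT and Intra checks of Steps 1 and 4 are omitted, sub-partition modes are left out, and treating the bi-directional search as one evaluation per index pair is a simplification, so the evaluation counts are not comparable to the complexity figures of the paper.

```python
from itertools import product

# Illustrative subset of the Inter modes checked after B16x16.
OTHER_INTER_MODES = ("B16x8", "B8x16", "B8x8")


def candidates(nrf):
    """All (direction, reference indices) pairs examined for B16x16 (assumed enumeration)."""
    single = [(d, (i,)) for d in ("FWD", "BWD") for i in range(nrf)]
    bi = [("BI", pair) for pair in product(range(nrf), repeat=2)]
    return single + bi


def fmfsa_select(rd_cost, nrf=2, other_modes=OTHER_INTER_MODES):
    """Full search for B16x16 only; smaller partitions reuse its direction and references."""
    evaluations = 0
    best_cost, best_dir, best_refs = float("inf"), None, None
    # Step 2: exhaustive direction/reference search for B16x16.
    for direction, refs in candidates(nrf):
        cost = rd_cost("B16x16", direction, refs)
        evaluations += 1
        if cost < best_cost:
            best_cost, best_dir, best_refs = cost, direction, refs
    results = {"B16x16": (best_dir, best_refs, best_cost)}
    # Step 3: one evaluation per remaining Inter mode, with the saved choice.
    for mode in other_modes:
        results[mode] = (best_dir, best_refs, rd_cost(mode, best_dir, best_refs))
        evaluations += 1
    # Pick the Inter mode with the smallest cost among those evaluated.
    best_mode = min(results, key=lambda m: results[m][2])
    return best_mode, results, evaluations


def toy_rd_cost(mode, direction, refs):
    """Made-up cost: a direction term, a reference term and a per-mode penalty."""
    base = {"FWD": 10.0, "BWD": 12.0, "BI": 9.0}[direction]
    penalty = {"B16x16": 0.0, "B16x8": 0.5, "B8x16": 0.7, "B8x8": 1.2}[mode]
    return base + sum(refs) + penalty


best_mode, results, evaluations = fmfsa_select(toy_rd_cost)
print(best_mode, evaluations)  # B16x16 11
```

In this toy model the four Inter modes need 8 + 3 = 11 cost evaluations instead of the 4 × 8 = 32 that a full search per mode would require; the real saving depends on the factors discussed in the complexity analysis.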
The coding parameters are kept the same for the original JMVC, Kuo's scheme [11], Zhu's scheme [13] and the proposed FMFSA for a fair comparison. All video coding experiments are performed on a Dell OPTIPLEX GX620 computer with an Intel Pentium IV dual Core 3.20 GHz and 3.19 GHz CPU, 2 GB memory and the Microsoft Windows XP Professional operating system. Table III compares the encoding time, Peak Signal-to-Noise Ratio (PSNR) and bit rate of the test algorithms, where the time saving ratio, PSNR difference and bit rate increment between the original JMVC encoder and the test algorithms are computed as

$$\Delta T = \frac{T_{\text{JMVC}} - T_{\phi}}{T_{\text{JMVC}}} \times 100\%, \quad \Delta \text{PSNR} = \text{PSNR}_{\phi} - \text{PSNR}_{\text{JMVC}}, \quad \Delta R = \frac{R_{\phi} - R_{\text{JMVC}}}{R_{\text{JMVC}}} \times 100\% \tag{8}$$

where $T_{\phi}$, $\text{PSNR}_{\phi}$ and $R_{\phi}$ are the total encoding time, PSNR and bit rate of algorithm $\phi$, $\phi \in \{\text{Kuo's scheme}, \text{Zhu's scheme}, \text{FMFSA}\}$, and $T_{\text{JMVC}}$, $\text{PSNR}_{\text{JMVC}}$ and $R_{\text{JMVC}}$ are the total encoding time, PSNR and bit rate of the original JMVC.

From Table III, Kuo's scheme reduces the total encoding time by 29.69% to 39.70% for even views; meanwhile, the average bit rate increase is within and the average PSNR degradation is within dB. For odd views, Kuo's scheme can achieve more complexity reduction, 44.00% to 56.37% on average; however, the average bit rate increases %, 6.41% on average. That large bit rate increase is due to the fact that Kuo's scheme was proposed for traditional mono-view video coding and does not take inter-view prediction into consideration. When odd views are encoded with Zhu's scheme, 45.15% to 55.96% computational complexity reduction is achieved; meanwhile, the bit rate increase is within 0.33% and the average PSNR degradation is within 0.01 dB. However, Zhu's scheme was proposed for complexity reduction of odd views and is not applicable to even views, i.e., the even views are encoded by the original JMVC and no complexity reduction is achieved for them.

Fig. 7. RD curves of JMVC, Kuo's scheme, Zhu's scheme and the proposed FMFSA.

Fig. 8. Encoding time saving ratio achieved by Kuo's scheme, Zhu's scheme and FMFSA.
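The three comparison measures of (8) are simple ratios against the original JMVC encoder. The sketch below shows one way to compute them; the function name, the argument names and the sample numbers are hypothetical, and the sign convention for the PSNR difference (tested algorithm minus anchor, so that a degradation is negative) is an assumption.

```python
def compare_to_anchor(time_alg, psnr_alg, rate_alg, time_ref, psnr_ref, rate_ref):
    """Return (time saving in %, PSNR difference in dB, bit rate increase in %)."""
    # Positive when the tested algorithm is faster than the anchor.
    time_saving = (time_ref - time_alg) / time_ref * 100.0
    # Assumed sign: negative values mean the tested algorithm loses quality.
    psnr_diff = psnr_alg - psnr_ref
    # Positive when the tested algorithm spends more bits than the anchor.
    rate_increase = (rate_alg - rate_ref) / rate_ref * 100.0
    return time_saving, psnr_diff, rate_increase


# Made-up numbers for illustration only; they are not taken from Table III.
dt, dpsnr, dr = compare_to_anchor(
    time_alg=250.0, psnr_alg=37.98, rate_alg=502.0,
    time_ref=1000.0, psnr_ref=38.00, rate_ref=500.0,
)
print(f"time saving {dt:.2f}%, PSNR diff {dpsnr:.2f} dB, rate increase {dr:.2f}%")
```

Reporting the time saving as a percentage of the anchor's encoding time makes results comparable across sequences whose absolute encoding times differ widely.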
As for the proposed FMFSA, the average bit rate increase is 0.32% for even views and 0.54% for odd views over all eight test multiview video sequences. Meanwhile, the PSNR degradation is 0.02 dB on average and within 0.04 dB for all test sequences and all views. For better observation, Fig. 7 shows the comparison of


RD curves among the test algorithms. The proposed FMFSA retains almost the same RD performance as the original JMVC and Zhu's scheme, and outperforms Kuo's scheme.

As far as the coding complexity is concerned, the proposed FMFSA can reduce the total encoding time by 68.34% to 79.01% for all test sequences. FMFSA achieves 73.67% complexity reduction on average, which is much more than the 41.45% average complexity reduction achieved by Kuo's scheme. For coding odd views, FMFSA achieves 76.68% complexity reduction on average, which is 24.64% more than the complexity reduction achieved by Zhu's scheme. Fig. 8 shows the average encoding time saving ratio for even views and odd views, respectively. From the results, we obtain the following three facts:
1) The proposed FMFSA can achieve 5% more complexity reduction for odd views than for even views, because the odd views originally require more coding effort.
2) The overall results indicate that FMFSA retains reliable and consistent complexity reduction, approximately 70%, for all test sequences, even with various video contents, camera arrangements and motion properties.
3) FMFSA reduces much more computational complexity than Kuo's scheme and Zhu's scheme.
The FMFSA is a flexible framework focusing on MRF selection and can be integrated with other existing fast algorithms, such as fast mode decision and fast ME/DE, to further reduce MVC encoding computations.

V. CONCLUSION

This paper presents an efficient multi-reference frame selection algorithm for hierarchical B pictures that exploits the high correlation of reference frame and direction among variable block size coding modes. Experimental results show that the proposed FMFSA achieves 68.34% to 79.01% total encoding time reduction compared with the original JMVC 3.0. Meanwhile, the average bit rate increase and PSNR degradation of FMFSA are within 0.54% and 0.04 dB, respectively, which keeps the RD performance of the original JMVC more or less intact. In the future, we will study new fast algorithms, including fast mode decision and fast motion estimation based on FMFSA, to further reduce the encoding computations for MVC.

REFERENCES

[1] M. Tanimoto, "Overview of free viewpoint television," Signal Process.: Image Commun., vol. 21, no. 6, pp., Jul.
[2] N. Nithiyanandam, "A three-dimensional digital image display system," IEEE Trans. Broadcast., vol. BC-21, no. 4, p. 53, Dec.
[3] L. Zhang and W. J. Tam, "Stereoscopic image generation based on depth images for 3DTV," IEEE Trans. Broadcast., vol. 51, no. 2, pp., Jun.
[4] Survey of Algorithms Used for Multi-View Video Coding (MVC), ISO/IEC JTC1/SC29/WG11, N6909, Jan.
[5] P. Merkle, A. Smolic, K. Müller, and T. Wiegand, "Efficient prediction structures for multi-view video coding," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 11, pp., Nov.
[6] Z. J. Peng, G. Y. Jiang, and M. Yu, "A fast multiview video coding algorithm based dynamic multi-threshold," in Proc. IEEE ICME'09, Jun. 2009, pp.
[7] X. M. Li, D. B. Zhao, S. W. Ma, and W. Gao, "Fast disparity and motion estimation based on correlations for multiview video coding," IEEE Trans. Consumer Electron., vol. 54, no. 4, pp., Nov.
[8] L. Q. Shen, Z. Liu, S. X. Liu, Z. Y. Zhang, and P. An, "Selective disparity estimation and variable size motion estimation based on motion homogeneity for multi-view coding," IEEE Trans. Broadcast., vol. 55, no. 4, pp., Dec.
[9] Y. P. Su and M. T. Sun, "Fast multiple reference frame motion estimation for H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 3, pp., Mar.
[10] Y. W. Huang, B. Y. Hsieh, S. Y. Chien, et al., "Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 4, pp., Apr.
[11] T. Y. Kuo and H. J. Lu, "Efficient reference frame selector for H.264," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 3, pp., Apr.
[12] J. P. Lin and A. C. Tang, "A fast direction predictor of inter frame prediction for multi-view video coding," in Proc. IEEE ISCAS'09, Taipei, Taiwan, May 2009, pp.
[13] W. Zhu, W. Jiang, and Y. Chen, "A fast inter mode decision for multiview video coding," in Proc. ICIECS'09, Qiangdao, China, Dec. 2009, pp.
[14] Y. Zhang, G. Y. Jiang, M. Yu, and Y. S. Ho, "Adaptive multiview video coding scheme based on spatio-temporal correlation analyses," ETRI Journal, vol. 31, no. 2, pp., Apr.
[15] H. Schwarz, D. Marpe, and T. Wiegand, "Hierarchical B pictures," in JVT of ISO/IEC MPEG & ITU-T VCEG, Poznan, PL, Jul. 2005, Doc. JVT-P014.
[16] Y. Chen, P. Pandit, and S. Yea, "WD1 reference software for MVC (JMVC) 3.0," in JVT of ISO/IEC MPEG & ITU-T VCEG, Busan, Korea, Oct. 2008, Doc. JVT-AC207.

Yun Zhang received the B.S. and M.S. degrees in electrical engineering from Ningbo University, Ningbo, China, in 2004 and 2007, respectively, and the Ph.D. degree in computer science from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. From 2009 to 2010, he was a Visiting Scholar with the Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong. In 2010, he joined the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, as an Assistant Researcher. His research interests are multiview video coding, video object segmentation and content based video processing.

Sam Kwong (M'93-SM'04) received the B.S. and M.S. degrees in electrical engineering from the State University of New York at Buffalo in 1983 and the University of Waterloo, Waterloo, ON, Canada, in 1985, respectively, and the Ph.D. degree from the University of Hagen, Germany. From 1985 to 1987, he was a Diagnostic Engineer with Control Data Canada. He joined Bell Northern Research Canada as a Member of Scientific Staff. In 1990, he became a Lecturer in the Department of Electronic Engineering, City University of Hong Kong, where he is currently a Professor in the Department of Computer Science. His research interests are video and image coding and evolutionary algorithms.

Gangyi Jiang received the M.S. degree from Hangzhou University, Hangzhou, China, in 1992, and the Ph.D. degree from Ajou University, Korea. In 2000, he joined the Faculty of Information Science and Engineering, Ningbo University, China, as a Professor. His research interests include digital video compression and communications, multi-view video coding and image processing.


Hanli Wang (M'08) received the B.S. and M.S. degrees in electrical engineering from Zhejiang University, Hangzhou, China, in 2001 and 2004, respectively, and the Ph.D. degree in computer science from City University of Hong Kong (CityU), Kowloon, Hong Kong. From 2007 to 2008, he was a Research Fellow with the Department of Computer Science, CityU. From 2007 to 2008, he was also a Visiting Scholar with Stanford University, Palo Alto, CA, invited by Prof. C. K. Chui. From 2008 to 2009, he was a Research Engineer with Precoad, Inc., Menlo Park, CA. From 2009 to 2010, he was an Alexander von Humboldt Research Fellow at the University of Hagen, Hagen, Germany. In 2010, he joined the Department of Computer Science & Technology, Tongji University, Shanghai, China, as a Professor. His current research interests include digital video coding, image processing, pattern recognition and video analysis.
