Motion Vector Coding Algorithm Based on Adaptive Template Matching

Wen Yang #1, Oscar C. Au #2, Jingjing Dai #3, Feng Zou #4, Chao Pang #5, Yu Liu 6
# Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Hong Kong
1 eeyangw@ust.hk, 2 eeau@ust.hk, 3 jjdai@ust.hk, 4 fengzou@ust.hk, 5 pcece@ust.hk
MTI/ECE, Applied Science and Technology Research Institute, HK Science Park, Shatin, NT, Hong Kong
6 liuyu@astri.org

Abstract

Motion estimation, together with the corresponding motion compensation, is a core part of modern video coding standards and greatly improves compression efficiency. On the other hand, motion information takes up a considerable portion of the compressed bit stream, especially at low bit rates. In this paper, an efficient motion vector prediction algorithm is proposed to minimize the bits used for coding the motion information. First, a motion vector predictor (MVP) candidate set (CS) containing several scaled spatial and temporal predictors is defined. To increase the diversity of the predictors, one spatial predictor is adaptively changed based on the current distribution of the neighboring motion vectors. After that, an adaptive template matching technique is applied to remove non-effective predictors from the CS so that the bits used for the MVP index are significantly reduced. Because the final MVP is chosen by a minimum motion vector difference criterion, a guessing strategy is further introduced so that in some situations the bits for signaling the MVP index to the decoder can be omitted entirely. The experimental results indicate that the proposed method achieves an average bit rate reduction of 5.9% compared with the H.264 standard.

I. INTRODUCTION

Most state-of-the-art video compression standards, such as MPEG-4 and ITU JVT/H.264 [1], combine many techniques to improve compression efficiency.
For example, block-matching motion estimation (ME) is widely used to exploit the temporal correlation between frames and to compress the video by reducing this redundancy. ME methods search a restricted area of the reference frame for a good prediction of the current block while keeping the complexity low. The relative displacement between the best-match block and the current block is called the motion vector (MV). The corresponding reconstruction, called motion compensation (MC), rebuilds the block from the best-match block and the residue. To perform MC at the decoder, video standards must specify how the MV is coded. H.264 allows sub-block motion estimation for higher prediction accuracy, so each sub-block has its own motion and needs extra bits to signal its own MV; for inter blocks, a large number of bits is therefore spent on motion information. At low bit rates, the share of bits spent on motion vector coding can reach 43%, as shown in Figure 1. There is thus a pressing need for highly efficient motion vector coding schemes.

Fig. 1. Proportion of bits for motion information

To reduce the number of bits needed to represent the MV information, H.264 encodes the MV predictively. For each block, H.264 constructs a motion vector predictor (MVP) as the median of three neighboring MVs,

$$mv_{H.264} = \mathrm{median}(mv_A, mv_B, mv_C) \qquad (1)$$

where $mv_A$, $mv_B$ and $mv_C$ are the MVs of the neighboring blocks A, B and C, respectively (the locations of blocks A, B and C are shown in Figure 2). The motion vector difference (MVD) between the MV of the current block and the MVP is then encoded into the bit stream. Since the MV must be coded without any loss, the coding performance depends mainly on the prediction accuracy.
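As a small illustration of Eqn. 1, the sketch below computes a median predictor and the resulting MVD. It assumes that the median is taken component-wise, as in H.264, and that an MV is an `(x, y)` tuple; the function names `median_mvp` and `mvd` are hypothetical and chosen only for this example.

```python
def median_mvp(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring MVs (Eqn. 1)."""
    # zip groups the three x components, then the three y components;
    # the middle element of each sorted triple is the median.
    return tuple(sorted(comp)[1] for comp in zip(mv_a, mv_b, mv_c))


def mvd(mv, mvp):
    """Motion vector difference that would be written to the bit stream."""
    return (mv[0] - mvp[0], mv[1] - mvp[1])


if __name__ == "__main__":
    predictor = median_mvp((1, 0), (4, 2), (2, -3))
    print("MVP:", predictor)
    print("MVD:", mvd((3, 1), predictor))
```

The closer the predictor is to the true MV, the smaller the MVD components and the fewer bits their entropy code needs, which is why the rest of the paper concentrates on predictor quality.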
The predictor $mv_{H.264}$ is effective in reducing the MV coding bits because it tends to be similar to the MV in most cases; however, it does not always minimize the MVD. If other, more effective MVPs are considered, there is a high chance that the MV coding bits can be reduced further.

Fig. 2. Neighboring blocks of current block

In recent years, MV coding has attracted considerable attention and much research has been conducted [2-6]. In Kim and Ra's work [2], several neighboring MVP candidates were examined and the one producing the minimum bit rate in MVD coding was selected. To obtain the MVP at the decoder, the predictor indexes for the x and y components had to be transmitted. Guillaume Laroche et al. [3] improved on [2] by considering other useful predictors and jointly predicting the x and y components of the MV. They also proposed an MV prediction criterion based on a modified rate-distortion (RD) cost optimization that covers not only the MVD cost but also the index cost. [2] and [3] indeed provide the best results in terms of prediction error, but the required side information is considerable. For this reason, [4] proposed a guessing strategy that saves the index bits in situations where the decoder can detect the MVP by itself. Bongsoo Jung et al. [5] introduced a new macroblock coding mode, pooled zero vector coding, as an efficient representation for the case where all 4×4 blocks in a macroblock (MB) have zero MVDs; in this situation only 5 bits of header information are consumed instead of 32 bits in traditional coding. Finally, in [6], S. Kamp et al. put forward a decoder-side motion estimation that eliminates the coding of the MVD and the reference index altogether. They defined a set of decoded pixels surrounding the current block as the template and performed a matching process for this template to find the MV of the current block. This process is called template matching (TM) and is discussed in detail later.

In this paper, we propose an efficient motion vector coding algorithm. First, an MVP candidate set (CS) containing several spatial and temporal predictors is defined, among which one spatial predictor is adaptively changed based on the current distribution of the neighboring MVs. After that, we apply TM to exclude non-effective predictors from the CS so that the bits used for indexing the final MVP are significantly reduced. Adaptive template width and shape strategies are introduced to increase the TM accuracy.
The final MVP of the current block is selected from the size-reduced CS according to the minimum-MVD criterion. Finally, a guessing strategy is introduced so that in some situations the bits for signaling the MVP index to the decoder can be avoided entirely. Simulation results indicate that the proposed method achieves an average bit rate reduction of 5.9% compared with the H.264 standard.

II. PROPOSED MOTION VECTOR PREDICTION ALGORITHM

A. Motion Vector Predictor Scaling

The MVs of neighboring blocks are usually used as MVP candidates for the current block. However, since H.264 allows multiple reference frames, different MVs may refer to different reference frames, so their temporal distances from the current frame differ. In that case, even if the neighboring blocks follow the same motion as the current block, the MV values may differ considerably; the MVP candidates should therefore be scaled according to their temporal distances. Taking $mv_A$ in Eqn. 1 as an example, let $d_a$ be the temporal distance between block A and its reference frame, and $d_c$ the temporal distance between the current block and its reference frame. The scaled predictor is then

$$mv_{sa} = mv_A \cdot \frac{d_c}{d_a} \qquad (2)$$

Eqn. 2 is likewise used to compute $mv_{sb}$ for block B, $mv_{sc}$ for block C (B and C are identified in Fig. 2), $mv_{scol}$ for the collocated block, which is located at the same position as the current block but in the previous frame, and so on. In the proposed method, every predictor used is the corresponding scaled one.

B. Motion Vector Predictor Candidate Set

As described above, $mv_{H.264}$ considers only the spatial correlation, and the nature of the median gives a false result when the current MV follows the minority motion among the three neighboring MVs, which often happens at object boundaries. To improve the accuracy of the MVP, other temporally and spatially related MVs should also be considered in the MVP candidate set (CS).
In the proposed algorithm, the CS includes four predictors,

$$CS = \{ mv_{sH.264},\; mv_{0,0},\; mv_{scol},\; mv_{snei} \} \qquad (3)$$

where $mv_{sH.264}$ is the same as $mv_{H.264}$ except that the three neighboring MVs are the scaled versions $mv_{sa}$, $mv_{sb}$ and $mv_{sc}$ from Eqn. 2.

$mv_{0,0}$ is the (0,0) motion vector. In many real videos, numerous objects, such as the background, remain stationary with MV (0,0). In addition, when the current block has a random motion, which often happens in zooming or other complex scenes, $mv_{0,0}$ is probably a good predictor.

$mv_{scol}$ is the scaled collocated predictor. It is included because the current block and its collocated block very frequently belong to the same object in most video sequences, so there is a high probability that they undergo a similar motion.

$mv_{snei}$ is the vector farthest from $mv_{sH.264}$ among the three neighboring MVs $mv_{sa}$, $mv_{sb}$ and $mv_{sc}$ [4]:

$$mv_{snei} = \arg\max_{mv_i \in S} \left\| mv_i - mv_{sH.264} \right\|_2, \qquad S = \{ mv_{sa}, mv_{sb}, mv_{sc} \} \qquad (4)$$

$mv_{snei}$ thus changes adaptively with the current distribution of the neighboring MVs. When only one of the neighboring blocks belongs to the object containing the current block, $mv_{sH.264}$ follows the majority, which may have low correlation with the current MV. In such a case, $mv_{snei}$ is more correlated and is likely to provide a better prediction.
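The construction of the candidate set in Eqns. 2 to 4 can be sketched as follows. This is only an illustration under stated assumptions: MVs are `(x, y)` tuples, the scaled vectors are kept as floats (the paper does not say how scaled values are rounded), the median is component-wise, and all function and parameter names are hypothetical.

```python
def scale_mv(mv, d_cur, d_nb):
    """Eqn. 2: scale an MV by the ratio of temporal distances d_cur / d_nb."""
    return (mv[0] * d_cur / d_nb, mv[1] * d_cur / d_nb)


def median_mv(a, b, c):
    """Component-wise median of three vectors."""
    return tuple(sorted(comp)[1] for comp in zip(a, b, c))


def candidate_set(neighbours, collocated, d_cur):
    """Build CS = {mv_sH264, mv_00, mv_scol, mv_snei} (Eqn. 3).

    neighbours: three (mv, temporal_distance) pairs for blocks A, B, C.
    collocated: one (mv, temporal_distance) pair for the collocated block.
    d_cur:      temporal distance of the current block to its reference.
    """
    scaled = [scale_mv(mv, d_cur, d) for mv, d in neighbours]
    mv_sh264 = median_mv(*scaled)
    mv_scol = scale_mv(collocated[0], d_cur, collocated[1])
    # Eqn. 4: the scaled neighbour farthest from the median predictor.
    # The squared distance is used; it has the same arg max as the norm.
    mv_snei = max(
        scaled,
        key=lambda v: (v[0] - mv_sh264[0]) ** 2 + (v[1] - mv_sh264[1]) ** 2,
    )
    return [mv_sh264, (0, 0), mv_scol, mv_snei]


if __name__ == "__main__":
    nb = [((4, 0), 2), ((2, 0), 1), ((-6, 2), 1)]
    print(candidate_set(nb, ((1, 1), 1), d_cur=1))
```

In the usage example, blocks A and B agree after scaling while C moves differently, so the median follows A and B and $mv_{snei}$ picks up the minority motion of C, which is the object-boundary case the text describes.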

Note that at the current stage the CS contains only four predictors. However, because the proposed algorithm uses adaptive template matching to exclude non-effective predictors, the size of the CS and the predictors it contains can be extended further without burdening the bit stream.

C. Adaptive Template Matching and Candidate Set Reduction

The initial CS can contain many MVPs, and the index that specifies the final predictor may require many bits. To solve this problem, the number of predictors in the CS is reduced by the adaptive template matching (ATM) technique. In our implementation, ATM retains the two best predictors in the reduced candidate set (RCS), so only 1 bit is needed for the index. The experimental results in the next section show that keeping two predictors in the RCS is a good trade-off between prediction accuracy and index bit reduction.

Template matching (TM) was originally proposed for texture synthesis, where it uses neighboring information to synthesize the required image or video. It has since also been employed in general video coding, as in [6], [7]. The basic principle of TM is shown in Figure 3 [6]: to derive a good prediction for the current block, a template region (TR) is defined around the target block. Because ATM must run at both the encoder and the decoder without mismatch, and at the decoder only the reconstructed pixels above and to the left of the current block are available, the template is usually L-shaped. The template widths $M_{left}$ and $M_{up}$ are the numbers of pixels by which the TR extends to the left of and above the target block. Usually, the sum of absolute differences (SAD) between the template of the current block and the template of the candidate block pointed to by an MVP is calculated to measure their similarity.

Fig. 3. Template matching
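The template SAD and the reduction of the candidate set to two predictors can be sketched as below. The sketch makes several assumptions that are not in the paper: frames are plain 2-D lists of luma samples indexed `[row][column]`, MVs are integer-pel `(dx, dy)` pairs, the L-shaped template includes the top-left corner, and no picture-boundary handling is done. All names are hypothetical.

```python
def template_coords(x, y, w, h, m_up, m_left):
    """(row, col) positions of the template around a w-by-h block at (x, y).

    m_up == 0 yields a left-only template, m_left == 0 a top-only template.
    """
    top = [(r, c) for r in range(y - m_up, y) for c in range(x - m_left, x + w)]
    left = [(r, c) for r in range(y, y + h) for c in range(x - m_left, x)]
    return top + left


def template_sad(cur, ref, coords, mv):
    """SAD between the current template and the template displaced by mv."""
    dx, dy = mv
    return sum(abs(cur[r][c] - ref[r + dy][c + dx]) for r, c in coords)


def reduce_candidates(cur, ref, coords, candidates, keep=2):
    """Keep the `keep` predictors with the smallest template SAD (the RCS)."""
    ranked = sorted(candidates, key=lambda mv: template_sad(cur, ref, coords, mv))
    return ranked[:keep]
```

Because only already reconstructed pixels are read, a decoder could run the same ranking and arrive at the same two predictors, which is what allows the index to shrink to a single bit.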
If a template is highly correlated with its corresponding block, it is reasonable to assume that the block belonging to the well-matched template also provides a good prediction of the current block. This holds when the template belongs to the same object as the current block. However, when some parts of the TR belong to another object with different motion, the template SAD may be large even for a good MVP. To avoid this situation, we propose adaptive template shape and width criteria based on the likely correlation between the template and the target block, as well as on the similarity inside the template.

The L-shaped template is divided into two parts: the top portion and the left portion. The template may be the full L-shaped template, the left template only, or the top template only. The reason for using only one portion is that when a macroblock is divided into sub-blocks, the correlation between these sub-blocks is relatively low, so including pixels from another sub-block in the current template risks inaccurate matching for the current sub-block. Under this condition, the template shape selection strategy is defined as follows:

    if (MB type == P16×16)
        the current MB uses the L-shaped template;
    else if (MB type == P16×8)    /* upper and lower 16×8 sub-blocks */
        the upper sub-block uses the L-shaped template;
        the lower sub-block uses the left template;
    else if (MB type == P8×16)    /* left and right 8×16 sub-blocks */
        the left sub-block uses the L-shaped template;
        the right sub-block uses the top template;
    else if (MB type == P8×8)     /* four 8×8 sub-blocks, some of which may be
                                     subdivided into 4×8, 8×4 or 4×4 blocks */
        the block at the upper-left corner of the current MB uses the L-shaped template;
        the blocks at the left boundary of the current MB use the left template;
        the blocks at the top boundary of the current MB use the top template;
        other blocks use the L-shaped template.
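The shape selection rule above can be written compactly once a block is identified by its pixel offset inside the macroblock. The sketch below is an illustration only: the string labels for macroblock types and shapes, the offset convention `(bx, by)` and the function name are assumptions made for this example.

```python
def template_shape(mb_type, bx, by):
    """Return 'L', 'left' or 'top' for a block at offset (bx, by) in its MB.

    bx == 0 means the block touches the left MB boundary,
    by == 0 means it touches the top MB boundary.
    """
    if mb_type not in ("P16x16", "P16x8", "P8x16", "P8x8"):
        raise ValueError("unsupported macroblock type: " + mb_type)
    if bx == 0 and by == 0:
        return "L"      # upper-left block: both neighbours lie in other MBs
    if bx == 0:
        return "left"   # only the left neighbour lies outside the current MB
    if by == 0:
        return "top"    # only the top neighbour lies outside the current MB
    return "L"          # interior block: no better choice than both portions


if __name__ == "__main__":
    for mb, off in [("P16x8", (0, 8)), ("P8x16", (8, 0)), ("P8x8", (8, 8))]:
        print(mb, off, template_shape(mb, *off))
```

Written this way, all four macroblock types follow one rule: a template portion is dropped only when it would fall inside a different sub-block of the same macroblock while the other portion does not.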
(For these remaining blocks, both the top and the left neighboring regions belong to other blocks, but there is no alternative to using these neighboring regions as the template.)

If the current block is smooth or textured, we would like the pixels in the template to have similar smoothness or texture characteristics. If the current block has an edge, we would like the template to contain an edge that is an extension of the edge inside the block. In other words, we expect the template to be uniform with or similar to the current block, and to exclude non-uniform or dissimilar pixels. This is essentially an image segmentation problem. Image segmentation has been widely studied and many techniques exist in the literature, but because of the limited TR, the limited computation capability and our special purpose, many image segmentation approaches are not appropriate here. For example, when an edge runs vertically across the top template (or horizontally across the left template), the edge is an extension of the one inside the current block and the pixels on both sides of the edge also occur in the current block, so we want to retain the pixels on both sides of the edge in the template. In contrast, when an edge runs horizontally across the top template (or vertically across the left template), the pixels on the outer side of the edge may have relatively low correlation with the current block, and we want to remove them from the TR. Both situations have edges in the template, yet the edges must be handled quite differently. Many image segmentation algorithms, however, can only treat both situations in the same way, so they do not fit our application.

In the example above, we only want to remove the outer pixels when an edge runs roughly parallel to the target block boundary, so the problem comes down to adaptively deciding the width of the template for the current block. In the proposed method, the differences inside the template are calculated for this purpose. Taking the top template as an example (shown in Figure 4), the criterion for determining the template width is as follows:

Fig. 4. Different rows in template

    The maximum width of a template is 4 and the minimum width is 1
    (i.e., at least one row is included in the TR if the top template is used).

    Calculate the differences between adjacent rows of the template:
        SAD12 = SAD between Row1 and Row2;
        SAD23 = SAD between Row2 and Row3;
        SAD34 = SAD between Row3 and Row4;

    if (SAD12 ≫ SAD23)
        the top template includes only Row1;
        /* When SAD12 is much bigger than SAD23, it is probable that Row1
           belongs to one object (probably the same as the current block)
           while Row2 and Row3 belong to a different object, or that there is
           an edge between Row1 and Row2. Rows 2-4 are therefore excluded. */
    else if (SAD23 ≫ SAD12 || SAD12 ≫ SAD34)
        the top template includes Row1 and Row2;
        /* At this stage, Rows 1-2 have been included in the TR. When SAD23 is
           much larger than SAD12, there is probably an edge between Row2 and
           Row3. When SAD12 is much bigger than SAD34, Row1 and Row2 probably
           belong to a textured or edge region while Row3 and Row4 belong to
           a smooth region. In both situations, Rows 3-4 are excluded. */
    else if (SAD34 ≫ SAD23)
        the top template includes Row1, Row2 and Row3;
        /* At this stage, Rows 1-3 have been included in the TR. When SAD34 is
           much larger than SAD23, there is probably an edge between Row3 and
           Row4, so Row4 is excluded. */
    else
        the top template includes Row1, Row2, Row3 and Row4;

The same strategy is used to determine the width of the left template.

After the shape and width of the template have been determined for the current block, this adaptive template is used to measure the similarity between the template of the current block and that of the candidate block pointed to by each MVP in the CS. Since in most cases the template belongs to the same object as the current block, a candidate predictor that yields a small template SAD, i.e. a good template match, is assumed to yield a small SAD between the corresponding prediction block and the current block as well. Because the adaptive template is more highly correlated with the current block than the traditional fixed template, the matching accuracy increases further. Finally, we exclude the predictors with larger template SAD and keep the two with the smallest template SAD in the RCS.

Unlike [6], which replaces ME entirely by TM to avoid the cost of the MVD and the reference index, the proposed method uses ATM only to reduce the size of the CS. Replacing ME by TM may introduce some problems. First, it imposes a heavy computation load on the decoder. To reduce the computational complexity, the TM search area in [6] was restricted to be much smaller than that of general ME, which reduced the prediction accuracy. Second, when the template and the current block belong to different objects with different motions, using TM instead of ME increases the prediction error and hence the residue entropy. Since our scheme uses ATM only to remove non-effective predictors, the ME accuracy is preserved.

D. Final MVP and Index Coding

Although ATM is effective in selecting good predictors, it cannot be accurate all the time. Therefore, instead of using ATM to choose the final MVP, we use it only to reduce the size of the CS.
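For reference, the template width criterion of Section II-C can be sketched in a few lines. The paper does not quantify "much bigger", so the sketch models it with a hypothetical threshold `ratio` (a > ratio × b); it also assumes that Row1 is the row adjacent to the block, as the comments in the criterion suggest. Function names are illustrative only.

```python
def row_sad(row_a, row_b):
    """Sum of absolute differences between two template rows."""
    return sum(abs(a - b) for a, b in zip(row_a, row_b))


def top_template_width(rows, ratio=2.0):
    """Number of rows (1..4) kept in the top template.

    rows[0] is Row1 (assumed nearest to the block) ... rows[3] is Row4.
    `ratio` is an assumed stand-in for the paper's "much bigger" test.
    """
    sad12 = row_sad(rows[0], rows[1])
    sad23 = row_sad(rows[1], rows[2])
    sad34 = row_sad(rows[2], rows[3])

    def much_bigger(a, b):
        return a > ratio * b

    if much_bigger(sad12, sad23):
        return 1  # edge or object change right after Row1
    if much_bigger(sad23, sad12) or much_bigger(sad12, sad34):
        return 2  # edge after Row2, or Rows 3-4 are smooth unlike Rows 1-2
    if much_bigger(sad34, sad23):
        return 3  # edge between Row3 and Row4
    return 4      # no strong discontinuity: use the full width
```

The same function can be applied to the left template by passing its columns, ordered from the block outwards, in place of the rows.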
The final MVP $pmv_{opt}$ for the current block is determined as

$$pmv_{opt} = \arg\min_{pmv_i \in RCS} D(pmv_i - mv) \qquad (5)$$

where $mv$ is the current MV and $D(\cdot)$ is the function used to measure the vector distance. The final MVD of the current block is therefore $mv - pmv_{opt}$. When all the MVPs in the RCS are identical, no index is needed; otherwise 1 bit is needed to signal the selection of the final MVP. However, in some situations the decoder can derive the final MVP by itself with a guessing strategy [4], in which case the index does not have to be transmitted. The guessing strategy relies on the fact that the final MVP is selected by the minimum-MVD criterion. We also introduce this strategy into our scheme. The encoder/decoder applies it as follows to decide whether to encode/decode the index:

Step 1: Obtain two possible MVs from the calculated/received MVD and the predictors in the RCS: $mv_1 = MVD + mv_{ATM1}$ and $mv_2 = MVD + mv_{ATM2}$;
Step 2: Assuming $mv_1$ is the true motion vector, select the best predictor for $mv_1$ according to Eqn. 5 and calculate the new MVD. If the new MVD equals the original one, set $flag_{ATM1} = 1$, otherwise $flag_{ATM1} = 0$;
Step 3: Assuming $mv_2$ is the true motion vector, repeat Step 2 to determine the value of $flag_{ATM2}$;
Step 4: If ($flag_{ATM1} = 1$ && $flag_{ATM2} = 1$) || ($flag_{ATM1} = 0$ && $flag_{ATM2} = 0$), the index of the final MVP is transmitted/decoded. Otherwise no index is transmitted/decoded, and the final predictor is derived from $flag_{ATM1}$ and $flag_{ATM2}$.

For example, when $mv_{ATM1}$, $mv_{ATM2}$ and the current MV are (1,0), (2,0) and (3,1), respectively, Eqn. 5 selects $mv_{ATM2}$ as the final MVP, and the MVD is (1,1). At the decoder, the steps above give $flag_{ATM1} = 0$ and $flag_{ATM2} = 1$. The decoder thus detects that $mv_{ATM2}$ is the optimal MVP, and no index is needed. However, when $mv_{ATM1}$, $mv_{ATM2}$ and the current MV are (1,0), (2,0) and (2,0), respectively, the decoder finds that both predictors are possible MVPs of the current block, so the index has to be transmitted by the encoder and decoded by the decoder.

III. EXPERIMENTAL RESULTS AND DISCUSSION

The proposed algorithm has been incorporated into version 2.2 of the H.264/AVC key technical area (KTA) reference software, which is based on H.264/AVC JM11.0. The coding efficiency of the proposed method is compared with the H.264 standard using the traditional MV coding method (the anchor). The results of the method in [3] are also listed for comparison. In addition, we created a further method called OnlyATM for comparison. OnlyATM uses ATM to select the final MVP directly, without sending any index. It is implemented to reveal the accuracy of ATM and to show that the proposed scheme is a better trade-off. To ensure fairness, all three algorithms use the same predictor candidate set. The main simulation conditions are summarized in Table I. The Main profile is selected as the platform for simulation, which allows the use of most of the latest H.264/AVC normative tools.

TABLE I
SIMULATION CONDITIONS

Profile                    Main
Prediction structure       IPPP
Entropy coding             CAVLC
Quantization parameters    22, 27, 32, 37
Reference frame            4
Search range               32
RDO                        On
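As a cross-check of the guessing strategy in Section II-D, the four steps and the two worked examples can be reproduced with a short sketch. Several points are assumptions rather than statements from the paper: the distance $D(\cdot)$ is taken to be the sum of absolute components, ties in Eqn. 5 are resolved in favour of the lower index, the RCS holds exactly two predictors, and all function names are hypothetical.

```python
def add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1])


def best_index(mv, rcs):
    """Eqn. 5 with an assumed L1 distance; min() keeps the first of any ties."""
    return min(range(len(rcs)),
               key=lambda i: abs(rcs[i][0] - mv[0]) + abs(rcs[i][1] - mv[1]))


def guess_flags(mvd, rcs):
    """Steps 1-3: flag_i = 1 if predictor i is consistent with the MVD."""
    flags = []
    for pred in rcs:
        mv_i = add(mvd, pred)                          # Step 1
        new_mvd = sub(mv_i, rcs[best_index(mv_i, rcs)])  # Steps 2 and 3
        flags.append(1 if new_mvd == mvd else 0)
    return flags


def encode(mv, rcs):
    """Return (mvd, index); index is None when it need not be transmitted."""
    k = best_index(mv, rcs)
    mvd = sub(mv, rcs[k])
    f1, f2 = guess_flags(mvd, rcs)
    needs_index = (f1 == f2) and rcs[0] != rcs[1]      # Step 4
    return mvd, (k if needs_index else None)


def decode(mvd, rcs, index=None):
    """Recover the MV; derive the predictor from the flags if no index came."""
    if index is None:
        index = guess_flags(mvd, rcs).index(1)
    return add(mvd, rcs[index])


if __name__ == "__main__":
    rcs = [(1, 0), (2, 0)]
    for mv in [(3, 1), (2, 0)]:
        mvd, idx = encode(mv, rcs)
        print(mv, "-> MVD", mvd, "index", idx, "decoded", decode(mvd, rcs, idx))
```

With this tie-breaking, the predictor actually chosen by the encoder always yields a flag of 1, so the decoder's lookup of the first set flag is well defined whenever the index is omitted.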
The test set is composed of six CIF sequences of 150 frames each (only 90 frames for Stefan) and four 720P sequences of 100 frames each, with various representative contents and motions. Several typical quantization parameters (QP) are tested so that the quality lies between 27 and 45 dB, which corresponds to a visual quality in line with most industrial applications.

Table II shows the performance of each algorithm in terms of the Bjontegaard Delta bit rate (BDBR) and the Bjontegaard Delta Peak Signal-to-Noise Ratio (BDPSNR) [8] relative to the anchor. All of the MV coding algorithms achieve some bit rate reduction compared with the anchor, and the proposed method achieves the largest: 4.1% for CIF sequences and 8.5% for 720P sequences on average. The method in [3] needs on average two extra bits per coded MV, which offsets part of the gain from more precise MV prediction; for the Night sequence it even performs worse than the anchor. In OnlyATM, when ATM selects a wrong predictor, the increase in MVD entropy may cost more bits than the index bits that were saved, so the overall bit rate of OnlyATM ends up higher than that of the proposed method. The proposed method performs especially well for sequences with fast motion fields and little texture, such as Foreman, Bus and Spincalendar. In fast-motion sequences, the proposed method provides a more precise MVP than the conventional median predictor; in sequences with little texture, the motion information consumes a relatively large proportion of the total bits, which makes the improvement in MV coding more visible.

Figure 5 further shows the RD curves of the MV coding schemes for several of the test sequences. In these figures, the efficiency of each algorithm is illustrated for every rate point.
Figure 5 shows that the proposed method mostly outperforms the other two MV coding schemes as well as the anchor at each QP. The difference is more obvious at low bit rates, which may be explained by the fact that the motion information takes a more significant portion of the whole bit stream at low bit rates. The other sequences show similar results.

To measure the template matching accuracy, we also calculate the statistical probabilities that the proposed algorithm and OnlyATM select the actual optimal predictor (the predictor in the CS which minimizes the MVD). As shown in Table III, with a fixed template (width 4, L-shape) the probability is about 77% for OnlyATM and 87% for the proposed method, while with an adaptive template the probability rises to about 80% for OnlyATM and 89% for the proposed method. The reason for the higher probability of the proposed method compared with OnlyATM is straightforward: in the proposed method, ATM first selects two predictors and the final MVP is then chosen by the minimum MVD criterion, which increases the precision. The data also indicate that the proposed adaptive template shape and width strategies do help to improve the template matching accuracy. In OnlyATM, the probability of choosing the optimal predictor depends only on the template matching accuracy, so the gain from the adaptive template is more obvious in OnlyATM than in the proposed method. Although the gains in both OnlyATM and the proposed method are modest, they can lead to a bit rate reduction of up to 2% in some video sequences.

TABLE II
PERFORMANCE EVALUATION OF THE MV CODING METHODS

                        Method in [3]         OnlyATM               Proposed
Format  Sequence        BDBR(%)  BDPSNR(dB)   BDBR(%)  BDPSNR(dB)   BDBR(%)  BDPSNR(dB)
CIF     Bus              -2.5     0.14         -4.6     0.25         -5.2     0.29
CIF     Foreman          -0.8     0.04         -3.5     0.16         -5.1     0.23
CIF     Mobile           -0.8     0.04         -3.2     0.17         -3.4     0.19
CIF     Stefan           -0.3     0.02         -3.4     0.18         -3.0     0.16
CIF     Table            -2.0     0.09         -2.9     0.12         -4.4     0.20
CIF     Tempete          -0.2     0.01         -2.9     0.14         -3.1     0.16
CIF     Average          -1.1     0.06         -3.4     0.17         -4.1     0.21
720P    Night             0.6    -0.02         -1.5     0.06         -2.5     0.10
720P    Raven             0.0     0.0          -7.3     0.30         -9.2     0.38
720P    Spincalendar     -5.7     0.13         -8.9     0.24        -11.1     0.32
720P    Jets             -9.0     0.44        -12.8     0.62        -11.3     0.57
720P    Average          -3.5     0.14         -7.6     0.31         -8.5     0.35
        Total average    -2.1     0.13         -5.1     0.23         -5.9     0.27

Fig. 5. RD performance comparison for sequences

TABLE III
TEMPLATE MATCHING ACCURACY COMPARISON

                        OnlyATM              Proposed
Format  Sequence        Fixed    Adaptive    Fixed    Adaptive
CIF     Bus             73.2%    75.0%       88.0%    89.4%
CIF     Foreman         76.3%    79.2%       84.9%    86.1%
CIF     Mobile          74.2%    75.9%       89.7%    90.6%
CIF     Stefan          78.1%    79.8%       91.6%    92.4%
CIF     Table           77.7%    80.2%       85.4%    86.5%
CIF     Tempete         75.0%    76.8%       86.5%    87.4%
CIF     Average         75.7%    77.8%       87.7%    88.7%
720P    Night           73.6%    77.1%       82.5%    84.6%
720P    Raven           83.8%    86.2%       90.0%    90.6%
720P    Spincalendar    77.2%    80.3%       88.7%    91.2%
720P    Jets            81.8%    84.7%       84.9%    89.0%
720P    Average         79.1%    82.1%       86.5%    88.8%
        Total average   77.1%    79.5%       87.2%    88.8%

IV. CONCLUSION

In this paper, a novel motion vector prediction method is proposed to minimize the bits used for MV coding. First, a predictor candidate set is defined to exploit the spatial and temporal correlation in the motion fields. In particular, one spatial predictor can be adaptively changed depending on the current distribution of the neighboring motion vectors. Then a template with adaptive shape and width is determined for the current block and used to select the predictors in the CS whose templates match best, so as to reduce the bits consumed by the predictor index. Furthermore, a guessing strategy is applied to eliminate the index bits entirely when the decoder can derive the predictor itself. Simulation results indicate that by using ATM, the best predictor can be selected with a probability of up to 90%. The results also demonstrate that the proposed scheme provides a significant bit rate reduction compared with the standard as well as with other methods.
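As an illustration of the selection rule summarized above and of the accuracy measure reported in Table III, the following encoder-side sketch contrasts OnlyATM (take the single best template match) with the proposed scheme (keep the two best template matches, then pick the one with the smaller MVD). Everything here is a simplifying assumption: the function names are hypothetical, the template matching costs are passed in as plain numbers, the MVD cost is approximated by the L1 magnitude of the difference rather than by actual entropy-coded bits, and the guessing strategy is not modeled.

```python
def mvd_cost(mv, pred):
    """Proxy for the MVD coding cost: L1 magnitude of mv - pred (assumption)."""
    return abs(mv[0] - pred[0]) + abs(mv[1] - pred[1])


def pick_only_atm(candidates, template_costs):
    """OnlyATM: the candidate with the lowest template matching cost."""
    best = min(range(len(candidates)), key=lambda i: template_costs[i])
    return candidates[best]


def pick_proposed(mv, candidates, template_costs):
    """Proposed: two best template matches first, then minimum MVD among them."""
    order = sorted(range(len(candidates)), key=lambda i: template_costs[i])
    shortlist = [candidates[i] for i in order[:2]]
    return min(shortlist, key=lambda p: mvd_cost(mv, p))


def is_optimal(mv, candidates, chosen):
    """True if `chosen` attains the minimum MVD cost over the whole candidate set."""
    return mvd_cost(mv, chosen) == min(mvd_cost(mv, c) for c in candidates)


# Hypothetical block: actual MV, three candidate predictors, and their
# template matching costs (lower means a better template match).
mv = (5, -2)
candidates = [(0, 0), (4, -2), (6, 1)]
template_costs = [10, 12, 30]

only_atm = pick_only_atm(candidates, template_costs)
proposed = pick_proposed(mv, candidates, template_costs)
```

In this made-up case the template ranking alone picks a suboptimal predictor, while the two-stage rule recovers the optimal one because it sits second in the template ranking. Counting how often `is_optimal` holds over all coded blocks gives a hit rate of the kind tabulated in Table III.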
ACKNOWLEDGMENT

This work has been supported by the Hong Kong Applied Science and Technology Research Institute (ASTRI) in the Future Multimedia Standards Project (ART/037).

REFERENCES

[1] Advanced Video Coding (AVC), 3rd Edition. ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4 Part 10), July 2004.
[2] S. D. Kim and J. B. Ra, "An efficient motion vector coding scheme based on minimum bitrate prediction," IEEE Transactions on Image Processing, vol. 8, no. 8, pp. 1117-1120, Aug. 1999.
[3] G. Laroche, J. Jung, and B. Pesquet-Popescu, "RD optimized coding for motion vector predictor selection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 9, pp. 1247-1257, Sep. 2008.
[4] J. J. Dai, O. C. Au, C. Pang, W. Yang, and F. Zou, "Motion vector coding based on optimal predictor selection," in IEEE Pacific-Rim Conference on Multimedia, Bangkok, Dec. 2009.
[5] B. Jung and B. Jeon, "Pooled zero vector coding for enhanced compression of motion vectors," in Proc. IEEE Asia Pacific Conference on Circuits and Systems, 2008, pp. 1743-1746.
[6] M. E. S. Kamp and M. Wien, "Decoder side motion vector derivation for inter frame video coding," in IEEE International Conference on Image Processing, 2008, pp. 1120-1123.
[7] T. Tan et al., "Intra prediction by template matching," in Proc. ICIP 2006, Atlanta, GA, USA, Oct. 2006.
[8] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," in VCEG Contribution VCEG-M33, Austin, Apr. 2001.