IBM Research Report. Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms

Similar documents
Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

Fast Mode Decision for H.264/AVC Using Mode Prediction

An Improved H.26L Coder Using Lagrangian Coder Control. Summary

Reduced Frame Quantization in Video Coding

An Efficient Mode Selection Algorithm for H.264

ARTICLE IN PRESS. Signal Processing: Image Communication

An Efficient Table Prediction Scheme for CAVLC

A Fast Intra/Inter Mode Decision Algorithm of H.264/AVC for Real-time Applications

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

Advanced Video Coding: The new H.264 video compression standard

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation

H.264/AVC BASED NEAR LOSSLESS INTRA CODEC USING LINE-BASED PREDICTION AND MODIFIED CABAC. Jung-Ah Choi, Jin Heo, and Yo-Sung Ho

Video Coding Using Spatially Varying Transform

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

Title Adaptive Lagrange Multiplier for Low Bit Rates in H.264.

Fast frame memory access method for H.264/AVC

EE Low Complexity H.264 encoder for mobile applications

Reduced 4x4 Block Intra Prediction Modes using Directional Similarity in H.264/AVC

NEW CAVLC ENCODING ALGORITHM FOR LOSSLESS INTRA CODING IN H.264/AVC. Jin Heo, Seung-Hwan Kim, and Yo-Sung Ho

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Block-based Watermarking Using Random Position Key

An Efficient Intra Prediction Algorithm for H.264/AVC High Profile

Xin-Fu Wang et al.: Performance Comparison of AVS and H.264/AVC 311 prediction mode and four directional prediction modes are shown in Fig.1. Intra ch

Rate Distortion Optimization in Video Compression

Motion Vector Coding Algorithm Based on Adaptive Template Matching

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

FAST SPATIAL LAYER MODE DECISION BASED ON TEMPORAL LEVELS IN H.264/AVC SCALABLE EXTENSION

Performance Comparison between DWT-based and DCT-based Encoders

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

Realtime H.264 Encoding System using Fast Motion Estimation and Mode Decision

Fraunhofer Institute for Telecommunications - Heinrich Hertz Institute (HHI)

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations

EE 5359 Low Complexity H.264 encoder for mobile applications. Thejaswini Purushotham Student I.D.: Date: February 18,2010

BLOCK MATCHING-BASED MOTION COMPENSATION WITH ARBITRARY ACCURACY USING ADAPTIVE INTERPOLATION FILTERS

High Efficiency Video Coding (HEVC) test model HM vs. HM- 16.6: objective and subjective performance analysis

Jun Zhang, Feng Dai, Yongdong Zhang, and Chenggang Yan

Stereo Image Compression

ABSTRACT. KEYWORD: Low complexity H.264, Machine learning, Data mining, Inter prediction. 1 INTRODUCTION

STANDARD COMPLIANT FLICKER REDUCTION METHOD WITH PSNR LOSS CONTROL

Rate-distortion Optimized Streaming of Compressed Light Fields with Multiple Representations

Pattern based Residual Coding for H.264 Encoder *

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo

Improved Context-Based Adaptive Binary Arithmetic Coding in MPEG-4 AVC/H.264 Video Codec

Video compression with 1-D directional transforms in H.264/AVC

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING

Bit Allocation for Spatial Scalability in H.264/SVC

Wavelet-Based Video Compression Using Long-Term Memory Motion-Compensated Prediction and Context-Based Adaptive Arithmetic Coding

H.264/AVC Baseline Profile to MPEG-4 Visual Simple Profile Transcoding to Reduce the Spatial Resolution

High Efficient Intra Coding Algorithm for H.265/HVC

Next-Generation 3D Formats with Depth Map Support

VHDL Implementation of H.264 Video Coding Standard

An Optimized Template Matching Approach to Intra Coding in Video/Image Compression

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

IN the early 1980 s, video compression made the leap from

A Novel Partial Prediction Algorithm for Fast 4x4 Intra Prediction Mode Decision in H.264/AVC

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments

BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen

Implementation and analysis of Directional DCT in H.264

A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION

Fast Intra- and Inter-Prediction Mode Decision in H.264 Advanced Video Coding

View Synthesis Prediction for Rate-Overhead Reduction in FTV

IBM Research Report. A Negotiation Protocol Framework for WS-Agreement

FAST MOTION ESTIMATION WITH DUAL SEARCH WINDOW FOR STEREO 3D VIDEO ENCODING

One-pass bitrate control for MPEG-4 Scalable Video Coding using ρ-domain

Optimum Quantization Parameters for Mode Decision in Scalable Extension of H.264/AVC Video Codec

Digital Video Processing

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

Scalable Video Coding

An Efficient Inter-Frame Coding with Intra Skip Decision in H.264/AVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

THE H.264/MPEG-4 Part 10 AVC (or abbreviated as H.264/

A LOW-COMPLEXITY AND LOSSLESS REFERENCE FRAME ENCODER ALGORITHM FOR VIDEO CODING

ERROR-ROBUST INTER/INTRA MACROBLOCK MODE SELECTION USING ISOLATED REGIONS

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

Testing HEVC model HM on objective and subjective way

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

Fast Intra Prediction Algorithm for H.264/AVC Based on Quadratic and Gradient Model

New Techniques for Improved Video Coding

Fast HEVC Intra Mode Decision Based on Edge Detection and SATD Costs Classification

Transcoding from H.264/AVC to High Efficiency Video Coding (HEVC)

Laboratoire d'informatique, de Robotique et de Microélectronique de Montpellier Montpellier Cedex 5 France

SAD implementation and optimization for H.264/AVC encoder on TMS320C64 DSP

Enhanced Hexagon with Early Termination Algorithm for Motion estimation

THE MPEG-2 video coding standard is widely used in

Adaptation of Scalable Video Coding to Packet Loss and its Performance Analysis

Fast Transcoding From H.264/AVC To High Efficiency Video Coding

H.264/AVC und MPEG-4 SVC - die nächsten Generationen der Videokompression

IMPROVED CONTEXT-ADAPTIVE ARITHMETIC CODING IN H.264/AVC

Design of a High Speed CAVLC Encoder and Decoder with Parallel Data Path

A NOVEL SCANNING SCHEME FOR DIRECTIONAL SPATIAL PREDICTION OF AVS INTRA CODING

IN RECENT years, multimedia application has become more

Adaptive Up-Sampling Method Using DCT for Spatial Scalability of Scalable Video Coding IlHong Shin and Hyun Wook Park, Senior Member, IEEE

Efficient Dictionary Based Video Coding with Reduced Side Information

Introduction to Video Encoding

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform

FRAME-RATE UP-CONVERSION USING TRANSMITTED TRUE MOTION VECTORS

Transcription:

RC24748 (W0902-063) February 12, 2009 Electrical Engineering IBM Research Report Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms Yuri Vatis Institut für Informationsverarbeitung Leibniz Universität Hannover Germany Ligang Lu, Ashish Jagmohan IBM Research Division Thomas J. Watson Research Center P.O. Box 218 Yorktown Heights, NY 10598 USA Research Division Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). Copies may be requested from IBM T. J. Watson Research Center, P. O. Box 218, Yorktown Heights, NY 10598 USA (email: reports@us.ibm.com). Some reports are available on the internet at http://domino.watson.ibm.com/library/cyberdig.nsf/home.

INTER MODE SELECTION FOR H.264/AVC USING TIME-EFFICIENT LEARNING-THEORETIC ALGORITHMS Yuri Vatis Institut für Informationsverarbeitung Leibniz Universität Hannover, Germany Ligang Lu, Ashish Jagmohan IBM T. J. Watson Research Center Yorktown Heights, USA ABSTRACT In this paper we present a novel algorithm to speed up the inter mode decision process for the H.264/AVC encoding. The proposed inter mode decision scheme determines the best coding mode between and P8x8 based on learning theoretic classification algorithms to discern between mode classes based on the evaluation of a simple set of features extracted from a motion compensated macroblock. We show that the proposed method can reduce the number of macroblocks for P8x8 mode testing by 80% on average, at the cost of only a small loss of 0.1 db in compression performance. Index Terms Mode decision, learning algorithms, H.264/AVC encoder. 1. INTRODUCTION The latest video coding standard H.264/AVC has provided state of the art tools that can be used to achieve significantly better compression performance than the previous standards over a wide range of bit rates. One of the key tools that H.264/AVC provides for higher coding efficiency is the employment of a large set of coding modes. For instance, in H.264, macroblock prediction can be done in 7 block sizes and intra- and inter-prediction modes. While the addition of coding modes provides greater flexibility to adapt to the signal characteristics for better coding gain, it also increases the encoding computational complexity enormously in searching for the best coding mode. In the H.264/AVC Reference Code [1], the macroblock mode selection is formulated as a rate-distortion optimization process and the Lagrangian minimization technique is used to minimize the cost functional. The computational complexity of the exhaustive search process is very large because it has to encode a macroblock using all possible modes to find the best mode that minimizes the rate-distortion functional. For example, each inter-coding mode needs to go through the entire encoding loop, including the steps of motion estimation, motion compensated prediction, transform, quantization, entropy coding (CABAC), inverse quantization, and inverse transform, to calculate the actual rate and distortion incurred by using this mode. Note that, even with such a high computational complexity method, this mode selection process is still not optimal since it does not take into account the rate-distortion impact of the current macroblock on other blocks which have a direct or indirect prediction dependency on it. Therefore, using a truly optimal mode selection algorithm is unrealistic, and low complexity and effective mode selection algorithms are highly desirable for deploying H.264/AVC in applications. A significant amount of research efforts has been spent on designing fast mode selection algorithms. Notably, Turaga and Chen [2] presented a pattern classification technique based mode decision scheme using maximum-likelihood (ML) criterion, but their features are not suitable for H.264/AVC mode decision. Jagmohan and Ratakonda [3] developed a supervised binary mode classification scheme for intra/inter mode selection using a learning-theoretic algorithm. More recently, Kim and Kuo [4] proposed an intra/inter mode selection scheme by classifying features derived from spatial correlation, temporal correlation, and motion activity. These classification based algorithms are proposed only for intra/inter mode decision. However the computational complexity from the inter coding mode decision is very demanding because it involves the complete inter coding process, especially the motion estimation step. In this paper, we will present a new fast inter coding mode selection scheme based on a decision tree classification algorithm. More specifically, our algorithm will analyze a subset of the prediction residues in the frequency domain and derive features that can be quickly calculated and classified by a tree structure. The test results show that our inter coding mode selection algorithm not only is time efficient but also has comparable performance as the H.264/AVC s exhaustive search mode decision algorithm. In the following, we will first provide a brief overview of the H.264/AVC macroblock coding modes and describe a new decision tree based algorithm for fast inter coding mode selection in Section 2. We will then detail our proposal and describe the feature extraction in Section 3. In Section 4 we will evaluate the performance of our new inter coding mode selection algorithm and compare it with the performance of the state of the art H.264/AVC reference encoder operated in the High-complexity mode. Finally we will summarize our

work and draw conclusions in Section 5. 2. PROPOSED MODE SELECTION FRAMEWORK The H.264/AVC standard provides seven block sizes to subdivide a macroblock in inter coding mode. The prediction for the current macroblock is generated based on data from prior frames. Operated in the High-complexity mode which provides the best quality at a given bit rate, the encoder determines the best inter mode by minimizing the Lagrangian function [5]: J(s, c, MODE QP, λ MODE) (1) SSD(s, c, MODE QP ) λ MODE R(s, c, MODE QP ) MODE [, P16x8, P8x16 and P8x8 modes], where SSD is the sum of absolute differences, s is the original signal, c is the reconstructed signal, QP is the given quantization parameter and λ MODE is the Lagrangian factor. The best inter mode is determined by performing motion estimation for all modes. Similarly, the best reference frame for a particular macroblock or a subblock is determined. Finally, the best inter mode is compared with the SKIP mode and the best intra mode. This results in the best coding efficiency but at the cost of the highest computational complexity, which is often prohibitive especially in real-time applications. Reducing the computational complexity is often done by using a subset of the available inter modes, by limiting the number of searches in the available modes based on greedy algorithms, and by approximating the rate-distortion optimization process with a less complex process. The key idea underlying the proposed mode selection approach is to develop simple heuristics such that macroblocks can be effectively classified into classes of different coding modes, for examples, predictive 16x16 () mode and predictive 8x8 (P8x8) mode. Towards this end, we propose the use of learning-theoretic approaches that operate on a set of simple macroblock features to classify the modes. In particular, our algorithm only needs to extract a set of three transform domain features from a predicted macroblock for the decision to select a certain mode. In this paper, we consider and P8x8 coding modes as the showcase of describing our mode selection method, for simplicity, and also for the following reasons. Firstly, our primary goal is to develop an efficient but also real-time capable encoder. These two modes are a popular illustrative subset of all modes and many real-time encoders, e.g. the IBM H.264/AVC encoder, often use these two modes. Secondly, the learning-theoretic approaches are most suitable for binary decisions. However, the method and the classifier trees can be readily extended to include other subsets based on the mode decision. The decision between intra and inter modes is not regarded in this paper and the reader is referred to [3]. The proposed algorithm can be described in the following steps: 1. Perform the motion compensated prediction for the macroblock to be coded in mode. 2. Based on the extracted features, either terminate here or go to the next step. 3. Perform the motion compensated prediction for the macroblock to be coded in P8x8 mode. 4. Find the best mode based on rate-distortion optimization. Note that for the last step other functions such as SAD or SATD as used in the Low-complexity mode [5] can also be applied, which further reduce the computational complexity. However, the last decision can be done independently of the second step, therefore we set our focus on the feature extraction and use rate-distortion optimization technique as well as common motion estimation techniques in order to compare our encoder with the JVT encoder operated in the Highcomplexity mode. 3. FEATURE EXTRACTION For a given macroblock, the selection between and P8x8 modes is made by employing supervised binary classification [6] using a set of transform domain features of the motion compensated macroblock. Consider a data set {x i } n i1, such that each element x i belongs to one of two classes C 1, C 2. The aim of binary classification is to infer the class to which each x i belongs, on the basis of a set of extracted features {f j (x i )} m j1. Supervised binary classification does this using two phases - a training phase and a test phase. During the training phase, a set of training data {y k } K k1 with known class membership labels (C 1 or C 2 ) is presented to the classifier. The training data and the class labels are used by the classifier to learn a classification function Λ( ) such that classification rules y k C 1 y k C 2 ) Λ ({f j (y k )} j 1 ) Λ ({f j (y k )} j 0 minimize the misclassification rate on {y k }. The form of the classification function Λ is constrained by the type of classifier used, for example, linear discriminant analysis yields classifier functions which are linear in the feature space. During the test phase, the learned classification function is used to classify the unknown data {x i }. The key issues in supervised binary classification are the choice on an appropriate classifier type, and the extraction of a set of features which allows good discrimination between the two classes.

For /P8x8 discrimination, the motion compensated 16x16 macroblock m is used. The center consisting of 8x8 pels of the current macroblock m is denoted as m 8. Denoting further the 8x8 Hadamard transform of m 8 as T H m 8, the following set of features was found to be effective: 1. The macroblock horizontal, vertical and low twodimensional frequency content: f 1 T H m 8 (0, j) j1 i1 j1 T H m 8 (i, 0) i1 2. The macroblock low horizontal high vertical and low vertical high horizontal frequency content: f 2 f 1 i4 j1 i1 j4 T H m 8 (i, j) 16 8 16 8 Hadamard Transform f 1 f 2 f 3 Fig. 1. Decision features. 3. The macroblock high frequency content: f 3 i4 j4 These features are also presented in Figure 1. Intuitively, the mode performs better when no high frequencies are contained in the motion compensated macroblock. The presence of high frequencies in the motion compensated macroblock indicates that different parts of this macroblock move differently and therefore P8x8 mode should be tested. In order to keep the complexity of our algorithm low, the 8x8 Hadamard transform is only applied to the center of the motion compensated macroblock. The same type of the classifier trees as in [7] was used. During the training phase, macroblocks from a large number of video sequences were used as a training set, and the /P8x8 mode decisions selected by the rate-distortion optimized reference H.264/AVC encoder were used as the known class labels. The classifier tree was trained on this data using the tree-partitioning procedure described in [7]. The training was performed for all quantization parameters in the range from 18 to 45, resulting in a reasonable set covering from the visually lossless quality at one end to where no P8x8 is used due to less efficiency in the rate-distortion sense at the other end. Roughly, the structure of the classifier tree could be subdivided into four types, where the importance of the feature f 3 decreased with the increase of the quantization parameter value. The decision thresholds were found to vary with changing quantization parameter. One such exemple tree is presented in Figure 2. Note that the output implies that the P8x8 mode is not tested. The P8x8 output implies that the P8x8 mode is tested and then the /P8x8 decision is made on the basis of rate-distortion cost. f 1 < t 1 f 2 < t 2 f 3 < t 3 P8x8 Fig. 2. Example of a learned classifier tree. For encoding a macroblock from a new video sequence, the feature set {f j } 3 j1 described above is extracted and the learned classifier tree is used to infer the -P8x8 decision. Since the learning is performed completely off-line, the proposed mode selection algorithm requires minimal complexity for making the mode selection. 4. EXPERIMENTAL RESULTS To evaluate the coding efficiency of the new decision tree based fast inter coding mode selection algorithm, we used

Y PSNR [db] 40 38 36 34 32 30 Comparison performance for the evaluated sequence 16x16 16x16 and 8x8 proposed 36.5 36 35.5 35 2200 2400 2600 28 500 1000 1500 2000 2500 3000 3500 4000 bit rate [kbit/s] Fig. 3. Compression performance for proposed encoder, R-D optimized -P8x8 encoder, and encoder. a composite sequence consisting of 2583 frames of the following ten CIF sequences: Formula1, Flowergarden, Mobile&Calendar, Concrete, Husky, Basketball, TableTennis, Stefan, Tempete and Bus. All simulations were performed using the Baseline Profile of H.264/AVC operated in the High-complexity mode. In Figure 3, three operational ratedistortion curves are presented, showing the compession performance of the reference encoder with enabled intra 16x16, intra 4x4, SKIP and modes, the performance of the reference encoder enhanced with additional P8x8 mode and the performance of our encoder. As can be seen, the performance gap between the reference encoder with P8x8 mode and our encoder is approximately 0.1 db while the performance gap between the reference encoder with P8x8 and without P8x8 mode is 0.4 db at same bit rates. QP 8x8 tested 8x8 chosen 20 20.4% 10.9% 24 28.2% 12.3% 28 18.8% 10.1% 32 13.5% 8.0% 36 18.5% 7.0% 40 6.5% 3.5% 44 2.6% 1.3% Table 1. Number of tested and chosen macroblocks in P8x8 mode. Table 1 evaluates the reduction of computational complexity compared to the reference encoder with enabled P8x8 mode. For a representative subset of quantization parameters, the number of tested P8x8 modes as well as selected P8x8 modes is given in columns 2 and 3, respectively. In the range from 20 to 36, the average number of macroblocks, where P8x8 mode was additionally evaluated, is approximately 20% for our proposal compared to 100% for the reference encoder. The number of chosen P8x8 macroblocks is 10% on average compared to approximately 30% for the reference encoder. Obviously, 10% of the chosen macroblocks could cover almost all of the coding gain, which shows that our fast inter mode selection algorithm is very effective and efficient. 5. SUMMARY In this paper, we proposed a novel algorithm for fast inter mode selection for and P8x8 modes based on a decision tree classification algorithm. The extracted features are derived from a motion compensated macroblock transformed into the frequency domain. The P8x8 mode for a macroblock is tested only in the case that the amount of the high frequencies is higher than a particular threshold, indicating that different parts of the macroblock move differently. Experimental results show that our proposed algorithm only needs one fifth of the complexity of the reference encoder for P8x8 mode testing while capturing 80% of its coding efficiency. 6. REFERENCES [1] H.264/AVC reference software version JM14.0, available at http://iphome.hhi.de/suehring/tml/download/old jm/, July 2008. [2] D. Turage and T. Chen, Classification based mode decisions for video over network, IEEE Trans. On Multimedia, vol. 3, no. 1, pp. 41 52, March 2001. [3] A. Jagmohan and K. Ratakonda, Time-efficient learning theoretic algorithms for H.264 mode selection, in Proc. IEEE Int. Conference on Image Processing (ICIP), Singapore, October 2004, pp. 749 752. [4] C. Kim and C.-C. J. Kuo, Feature-based intra-/inter coding mode selection for H.264/AVC, IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 4, pp. 441 453, April 2007. [5] K.-P. Lim, G. Sullivan, and T. Wiegand, Text description of joint model reference encoding methods and decoding concealment methods, in JVT-O097, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, Busan, Korea, April 2005. [6] R. O. Duda, P. E. Hart, and D. G. Stock, Pattern classification, Wiley, 2001. [7] P. Chou, Optimal partitioning for classification and regression trees, IEEE Trans. Pat. Anal. Mach. Intel., vol. 13, no. 4, pp. 340 354, April 1991.