Fast Intra- and Inter-Prediction Mode Decision in H.264 Advanced Video Coding


Mehdi Jafari
Islamic Azad University, S and R Branch, Department of Communication Engineering, P.O.Box 455-775, Tehran, Iran
mjafari@mail.uk.ac.ir

Shohreh Kasaei
Sharif University of Technology, Department of Computer Engineering, P.O.Box 365-957, Tehran, Iran
skasaei@sharif.edu

Abstract

H.264/AVC, the latest video coding standard, achieves better video compression rates since it supports new features such as a large number of intra- and inter-prediction candidate modes. H.264/AVC adopts the rate-distortion optimization (RDO) technique to obtain the best intra- and inter-prediction while maximizing visual quality and minimizing the required bit rate. However, RDO slows down encoding because it exhaustively evaluates all candidate modes. In this paper, in conjunction with an overview of proposed algorithms for fast intra-mode decision in H.264/AVC encoders, we decrease the encoding time by reducing the computational complexity of the cost function and the number of candidate modes. Also, a new algorithm based on the properties of similar predicted pixels and the features of reference pixels is proposed. The proposed algorithms use spatial- and transform-domain features (such as edge information, simple directional properties of the intra-modes, features of reference pixels, and adjacent-block information in the intra- and inter-frame) to select a subset of all candidate modes. Subsequently, the RDO procedure uses this reduced subset to extract the final mode. Experimental results show that our algorithm, compared to RDO and some other fast algorithms, reduces the total encoding time with negligible loss in PSNR and a slightly increased bitrate.

Keywords: H.264/AVC, Intra-prediction, RDO, similar predicted-pixels.

1. Introduction

As recent multimedia applications (using various types of networks) are growing rapidly, video compression requires higher performance as well as new features.
The newest video coding standard was developed by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG as international standard 14496-10 (MPEG-4 Part 10) Advanced Video Coding (AVC) [1, 2]. H.264/AVC has gained more and more attention, mainly due to its high coding efficiency (an average bitrate saving of up to 50% compared to H.263+ and MPEG-4 Simple Profile), minor increase in decoder complexity compared to existing standards, adaptation to delay constraints (the low-delay mode), error robustness, and network friendliness [1, 2]. Table 1 [3] and Figure 1 [4] show performance comparisons among MPEG-2, MPEG-4 (ASP), and H.264/AVC.

To achieve outstanding coding performance, H.264/AVC employs several powerful coding techniques such as the 4x4 integer transform, inter-prediction with variable block-size motion compensation, motion vectors of quarter-pel accuracy, an in-loop deblocking filter, improved entropy coding such as context-adaptive variable-length coding (CAVLC) and context-adaptive binary arithmetic coding (CABAC), enhanced intra-prediction, multiple reference pictures, and so forth. Due to these new features, encoder computational complexity is greatly increased compared to previous standards. This makes H.264/AVC difficult to use in applications with low computational capabilities (such as mobile devices). Reducing its complexity therefore remains a challenging task.

Table 1: Average bit-rate reduction compared to prior coding schemes.

Figure 1: Performance comparison of different video coding standards.

Among many new features, the intra-prediction technique is recognized as one of the main factors contributing to the success of H.264/AVC. H.264/AVC employs the Lagrangian RDO method to find the intra-prediction coding mode with the highest coding efficiency. Figure 2 [1] shows the RDO process. The RDO technique requires a lot of computation since it runs the encoding process with every possible intra-coding mode and calculates the RD cost of each in order to choose the mode with the minimum cost. The intra-prediction mode decision is very complex: the number of RD cost values computed for the luma and chroma of a macroblock is 592 [5]. Therefore, the computational burden of this type of brute-force search is far more demanding than that of any existing video coding algorithm.

Figure 2: Computation of RD cost [1]. (Block diagram: input video, encoding, residual data, integer transform / quantization, variable length coding, rate; inverse integer transform / inverse quantization, distortion; compute RD cost, mode selection.)

To reduce the computational complexity, many algorithms (such as fast motion estimation, fast inter-mode prediction, and fast intra-prediction) have been proposed. Fast mode selection for intra-prediction is considered in this paper. It is a challenging subject in H.264/AVC, since intra-prediction is new in H.264/AVC with respect to other standards such as MPEG-1/2/4 and H.261/H.263, and no earlier work on those standards addresses it. Fast intra-mode decision algorithms using an edge detection histogram and local edge detection are proposed in [1, 6, 7]. However, their preprocessing stages still consume coding time to detect the edge direction and to classify it into a limited set of directions. Those methods are about 20~30% (or 55~65%) faster than the RDO method at the cost of 2% (or 5%) extra bits. Other fast algorithms select the optimal intra-prediction mode using simple directional masks [8], with a time saving of 70%, or statistical-based methods [9], with a time saving of 45%.
Another fast intra-mode decision scheme is proposed in [10], where encoding is approximately 30% faster than with the RDO method. A new fast intra-prediction algorithm based on macroblock properties (FIPAMP) is presented in [11]. This algorithm can achieve a 10% to 40% computation reduction while maintaining PSNR and bit rate performance similar to that of H.264/AVC codecs. In [12], an efficient intra-prediction (EIP) algorithm based on early termination, selective computation of highly probable modes, and partial computation of the cost function is presented. Also, an improved cost function to improve the coding performance is proposed in [13]. In [14], a fast algorithm based on local edge information, obtained by calculating edge feature parameters, and on subsampling of the matching operations is presented. That method reduces the encoding time by about 26% with less than 1.4% extra bits and a sacrifice of no more than 0.7 dB PSNR.

In this paper, we first present our new ideas to reduce the computational complexity of some previously proposed algorithms such as those presented in [1, 6, 7, 14, 19]. Then, we propose a method for fast intra-mode decision in which the number of 4x4 and 16x16 intra-modes for luma and 8x8 intra-modes for chroma is reduced using directional edges and the spatial correlation between the current block and the top/left blocks in an I-frame. The proposed algorithm is based on the fact that, for any intra-mode, some pixels of a block are predicted with the same value, so the differences between them must be zero for that mode. These similar pixels lie along the existing edge in the block.

The cost for each intra-mode is computed from the sum of absolute differences (SAD) of these similar pixels. The similar predicted pixels are extracted from JM7.1 (the JVT reference software [5]). After computing these simple costs, the modes with the minimum and second-minimum cost are used as the two candidate modes. These candidate modes, together with adjacent-block information, are used to decide on a subset of candidate modes for the final decision. In the worst case, only 3 modes for 4x4 luma, 2 modes for 16x16 luma, and 2 modes for 8x8 chroma are left for the final mode decision made by RDO calculations. The proposed method consumes less encoding time by reducing the number of RDO computations and the time needed to obtain directional information. We have verified the proposed algorithm by implementing it in the JM7.1 reference software and comparing it with the full RDO search. Simulation results show that the proposed method reduces the encoding time by up to 47% with a slight loss in PSNR and a negligible increase in the required bitrate.

The remaining parts of the paper are organized as follows. We review the intra-prediction scheme of H.264/AVC and the mode selection method based on the RDO technique in Sections 2 and 3, respectively. Section 4 presents the new and improved methods proposed for fast intra-mode decision. Simulation results are given in Section 5 and, finally, Section 6 concludes the paper.

2. Intra-Prediction in H.264/AVC

H.264/AVC defines a block-based hybrid video codec. It is mostly similar to the previous standards (H.261, H.263, MPEG-1, MPEG-2, and MPEG-4 Part 2 Visual). The elements common to all video coding standards that are present in the current H.264/AVC recommendation are: an MB is 16x16 in size; luma is represented with higher resolution than chroma with 4:2:0 subsampling; motion compensation and block transforms are followed by scalar quantization and entropy coding; motion vectors are predicted from the median of the motion vectors of neighboring blocks; and so on.
Some functional elements such as intra-/inter-prediction, integer transformation, quantization, and entropy coding are enhanced with important changes that distinguish this standard from its predecessors. In common with earlier standards, H.264/AVC does not define the encoder, but defines the syntax of an encoded video bitstream together with the method of decoding the bitstream [6]. The codec combines intra-picture prediction with inter-picture prediction to exploit spatial and temporal redundancy, respectively.

Intra-prediction is based on the observation that adjacent macroblocks tend to have similar properties. Therefore, as a first step in the encoding process for a given macroblock, one may predict the macroblock of interest from the surrounding macroblocks. The difference between the actual macroblock and its prediction is then coded, which requires fewer bits to represent the macroblock of interest. A prediction may be formed for each 4x4 luma block (I4MB), 16x16 luma MB (I16MB), and 8x8 chroma block. The residual between the current MB and its prediction is then transformed, quantized, and entropy coded. For I4MB, which is selected in non-homogeneous areas, there are 9 prediction modes, whereas for I16MB, which is selected in relatively homogeneous areas, there are 4 prediction modes. For 8x8 chroma blocks there are 4 prediction modes, and the same mode is applied to both chrominance components (U and V). Chrominance is encoded in the same way as I16MB. For each macroblock, one prediction mode that defines the partitioning of the macroblock is transmitted. For each mode and partition, one of several prediction directions is transmitted as well. In particular, for I16 only one direction for luma is encoded, while for I4 it is necessary to transmit 16 directions.

2.1 I4MB Prediction Modes

For prediction of 4x4 luminance blocks, the 9 modes consist of a DC prediction (Mode 2) and 8 directional modes, labeled 0, 1, 3, 4, 5, 6, 7, and 8, as shown in Figure 3(a). In Figure 3(b), the block (pixels a to p) is to be predicted using samples A to Q.
Note that samples A to Q from neighboring blocks have already been encoded and may be used for prediction.

Figure 3: (a) Intra-prediction modes for 4x4 luminance blocks. (b) Labeling of prediction samples.

Note that in some cases not all of the samples A-Q are available within the current slice. In order to preserve independent decoding of slices, only samples within the current slice are available for prediction. The DC prediction (mode 2), useful for blocks with little or no local activity, is modified depending on which of the samples A-M are available; the other modes (1-8) may only be used if all of the required prediction samples are available (except that, if E, F, G, and H are not available, their value is copied from sample D). The arrows in Figure 5 indicate the direction of prediction in each mode. For modes 3-8, the predicted samples are formed from a weighted average of the prediction samples A-Q. The encoder may select, for each block, the prediction mode that minimizes the residual between the prediction block P and the block to be encoded.

Figure 4 shows a luminance macroblock in a QCIF frame and a 4x4 luma block that is to be predicted. The samples above and to the left have previously been encoded and reconstructed and are therefore available in the encoder and decoder to form a prediction reference. The prediction block P is calculated based on the samples labeled A-Q.

Figure 4: (a) 71st frame of the original walking-person sequence. (b) Original macroblock. (c) 4x4 luma block to be predicted [6, 7].

Figure 5: 4x4 luma prediction modes: (a) Mode 0 (vertical). (b) Mode 1 (horizontal). (c) Mode 2 (DC). (d) Mode 3 (diagonal down-left). (e) Mode 4 (diagonal down-right). (f) Mode 5 (vertical-right). (g) Mode 6 (horizontal-down). (h) Mode 7 (vertical-left). (i) Mode 8 (horizontal-up) [2].

The 9 prediction modes (0-8) are calculated for the 4x4 block shown in Figure 4. For example, in DC (mode 2) and diagonal down-right (mode 4) prediction, the samples are predicted by the algorithm illustrated in Figure 6. Figure 7 shows the prediction block P created by each of the 9 predictions. The sum of absolute errors (SAE) for each prediction indicates the magnitude of the prediction error.

    // Mode 2 (DC): make DC prediction
    s0 = 0;
    if (block_available_up && block_available_left) {
        // both neighbors present: mean of the 8 samples above and to the left
        s0 = (A + B + C + D + I + J + K + L + 4) / 8;
    } else if (!block_available_up && block_available_left) {
        // upper edge: only the left samples are usable
        s0 = (I + J + K + L + 2) / 4;
    } else if (block_available_up && !block_available_left) {
        // left edge
        s0 = (A + B + C + D + 2) / 4;
    } else { // !block_available_up && !block_available_left
        // top left corner, nothing to predict from
        s0 = 128;
    }
    for (j = 0; j < 4; j++) {
        for (i = 0; i < 4; i++) {
            // store DC prediction
            img->mprr[DC_PRED][i][j] = s0;
        }
    }

    // Mode 4 (DIAG_DOWN_RIGHT_PRED); a..p below are the predicted pixels of Figure 3(b)
    if (block_available_up && block_available_left && block_available_up_left) {
        m             = (L + 2*K + J + 2) / 4;
        i = n         = (K + 2*J + I + 2) / 4;
        e = j = o     = (J + 2*I + Q + 2) / 4;
        a = f = k = p = (I + 2*Q + A + 2) / 4;
        b = g = l     = (Q + 2*A + B + 2) / 4;
        c = h         = (A + 2*B + C + 2) / 4;
        d             = (B + 2*C + D + 2) / 4;
    }

Figure 6: A part of the proposed intra-prediction algorithm in JM7.1 [5].

Figure 7: Prediction blocks (4x4): (a) Mode 0, SAE=325. (b) Mode 1, SAE=340. (c) Mode 2 (DC), SAE=342. (d) Mode 3, SAE=32. (e) Mode 4, SAE=352. (f) Mode 5, SAE=346. (g) Mode 6, SAE=347. (h) Mode 7, SAE=39. (i) Mode 8, SAE=358 [8].
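As a rough illustration of how a prediction block and its SAE are obtained, the following C sketch builds the vertical (mode 0), horizontal (mode 1), and DC (mode 2) predictions for a 4x4 block whose upper and left neighbors are all available, and computes the SAE against the original block. The function names `predict_4x4` and `sae_4x4` and the array layout are assumptions made for this sketch; they are not taken from the JM software.

```c
#include <stdlib.h>

/* Hypothetical helper, not JM code: build the 4x4 prediction for
 * mode 0 (vertical), 1 (horizontal) or 2 (DC), assuming the four samples
 * above (top[0..3] = A..D) and the four samples to the left
 * (left[0..3] = I..L) are all available.
 * Returns 0 on success and -1 for modes this sketch does not cover. */
int predict_4x4(int mode, const int top[4], const int left[4], int pred[4][4])
{
    int dc = 4; /* rounding offset for the mean of 8 samples */
    for (int k = 0; k < 4; k++)
        dc += top[k] + left[k];
    dc /= 8;

    if (mode < 0 || mode > 2)
        return -1;

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            if (mode == 0)
                pred[y][x] = top[x];   /* copy the sample above */
            else if (mode == 1)
                pred[y][x] = left[y];  /* copy the sample to the left */
            else
                pred[y][x] = dc;       /* flat block at the rounded mean */
        }
    }
    return 0;
}

/* Sum of absolute errors between the original block and a prediction. */
int sae_4x4(int orig[4][4], int pred[4][4])
{
    int sae = 0;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            sae += abs(orig[y][x] - pred[y][x]);
    return sae;
}
```

An encoder that ranks modes by SAE would call `predict_4x4` once per mode and keep the mode with the smallest `sae_4x4` result; the six diagonal modes are left out here to keep the sketch short.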

2.2 I16MB Prediction Modes

For regions with less spatial detail (i.e., flat regions), H.264/AVC supports 16x16 intra-coding, in which one of four prediction modes (DC, vertical, horizontal, and plane) is chosen for the prediction of the entire luminance component of the macroblock, as shown in Figure 8 [8]:

Mode 0 (vertical): extrapolation from upper samples (H).
Mode 1 (horizontal): extrapolation from left-hand samples (V).
Mode 2 (DC): mean of upper and left-hand samples (H+V).
Mode 3 (plane): a linear plane function is fitted to the upper and left-hand samples H and V (works well in areas with smoothly varying luminance).

Figure 8: Intra 16x16 prediction modes: (a) Mode 0 (vertical). (b) Mode 1 (horizontal). (c) Mode 2 (DC). (d) Mode 3 (plane) [8].

Figure 9 shows a luminance macroblock with the previously encoded samples at the upper and left-hand edges. The results of prediction (shown in Figure 10) indicate that the best match is given by mode 3. Intra 16x16 mode works best in homogeneous areas of an image.

Figure 9: A 16x16 macroblock.

Figure 10: Intra 16x16 predictions: (a) Mode 0 (vertical), SAE=8990. (b) Mode 1 (horizontal), SAE=10898. (c) Mode 2 (DC), SAE=20. (d) Mode 3 (plane), SAE=6264 [8].

2.3 8x8 Chroma Prediction Modes

H.264/AVC supports four chroma prediction modes for 8x8 chrominance blocks, similar to those of I16MB prediction, except that the order of the mode numbers is different: DC (Mode 0), horizontal (Mode 1), vertical (Mode 2), and plane (Mode 3). The same prediction mode is always applied to both chroma blocks. The chroma prediction is independent of the luma prediction.

Finally, for I-frames, where all MBs are predicted as intra, the H.264/AVC encoder evaluates all mode combinations of luma and chroma and chooses the one that gives the best RDO performance. For P-frames, both intra- and inter-prediction are performed and RDO is used to choose the best prediction. Here, two sequences in QCIF and CIF format are encoded using JM7.1. The results are shown for one I-frame and one P-frame in Figures 11 and 12, respectively.
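Two details of Sections 2.2 and 2.3 are easy to get wrong in code: the DC mode is a rounded mean over whichever neighbors exist, and the chroma mode numbers are ordered differently from the I16MB ones. The C sketch below shows both under stated assumptions: the helper names are hypothetical, neighbors are passed as plain arrays (NULL when unavailable), and the fallback value 128 for a block without neighbors mirrors the 4x4 DC case of Figure 6.

```c
#include <stddef.h>

/* Intra 16x16 luma mode numbers as listed in Section 2.2. */
enum { I16_VERTICAL = 0, I16_HORIZONTAL = 1, I16_DC = 2, I16_PLANE = 3 };

/* Chroma 8x8 mode numbers as listed in Section 2.3 (DC = 0, horizontal = 1,
 * vertical = 2, plane = 3), converted to the luma numbering above.
 * Hypothetical helper for illustration; returns -1 for an invalid mode. */
int chroma_mode_to_i16(int chroma_mode)
{
    static const int map[4] = { I16_DC, I16_HORIZONTAL, I16_VERTICAL, I16_PLANE };
    return (chroma_mode >= 0 && chroma_mode < 4) ? map[chroma_mode] : -1;
}

/* DC value for a 16x16 block: rounded mean of the 16 upper samples (H)
 * and the 16 left-hand samples (V). Pass NULL for a missing neighbor;
 * with no neighbors at all the mid-gray value 128 is returned. */
int dc_16x16(const int *top, const int *left)
{
    int sum = 0, count = 0;
    if (top) {
        for (int k = 0; k < 16; k++)
            sum += top[k];
        count += 16;
    }
    if (left) {
        for (int k = 0; k < 16; k++)
            sum += left[k];
        count += 16;
    }
    if (count == 0)
        return 128;
    return (sum + count / 2) / count; /* add half the divisor to round */
}
```

The plane mode is omitted because its fitted-gradient arithmetic is not described in this section.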

Figure 11: I-frame of the foreman sequence (QCIF). (a) I4MB and I16MB prediction mode decision. (b) I4MB and I16MB divisions.

Figure 12: P-frame of the football sequence (CIF). (a) I4MB and I16MB (yellow) and inter-prediction (blue) prediction mode decision. (b) I4MB, I16MB, and inter-mode divisions.

3. RDO Procedure

The RDO procedure to encode one MB in an I-frame is given below [5].

a) Search for the best intra-mode for a 4x4 luma block among the 9 modes, i.e., the one that produces the minimum rate-distortion cost given by:

$$J(s, c, MODE \mid QP, \lambda_{mode}) = SSD(s, c, MODE \mid QP) + \lambda_{mode} \cdot R(s, c, MODE \mid QP) \tag{1}$$

where QP is the macroblock quantization parameter, $\lambda_{mode} = 0.85 \times 2^{(QP - 12)/3}$ is the Lagrangian multiplier, and MODE indicates one of the 9 prediction modes of a 4x4 luma block. R(.) represents the rate, i.e., the number of bits associated with the chosen MODE. SSD(.) denotes the sum of squared differences between the original 4x4 luminance block, denoted by s, and its reconstruction, denoted by c, computed as:

$$SSD(s, c, MODE \mid QP) = \sum_{x=1, y=1}^{4,4} \left( s(x, y) - c(x, y, MODE \mid QP) \right)^2 \tag{2}$$

b) In contrast to the RDO technique used for the intra 4x4 luma mode decision, determine the best intra-mode for a 16x16 macroblock among the 4 modes by choosing the mode that results in the minimum sum of absolute transformed differences (SATD), given by:

$$SATD = \sum_{(x, y) \in b_k} \left| T\{ I(x, y) - P(x, y) \} \right| \tag{3}$$

where I and P represent the true and predicted pixel values, respectively, and T denotes the Hadamard transform.

c) Compare the RD cost of the two best modes, i.e., the I4MB mode obtained from step (a) and the I16MB mode obtained from step (b), and choose the better one as the macroblock prediction mode.

d) Determine the best intra-mode for the 8x8 chroma block among the 4 modes, as for I16, by minimizing Equation (1).

Also, in the H.264 JVT reference software JM7.3 [5], the full search (FS) algorithm is used to examine all possible intra-prediction modes to find the best ones. The steps for finding the best intra-mode are similar to RDO, but the I16MB decision procedure (part (b)) can be summarized as:

1. Generate 4 prediction MBs according to the 4 modes of I16MB and then calculate their residual MBs. For each residual MB:
2. Perform the Hadamard transform on each 4x4 block.
3. Extract the DC coefficients from these sixteen 4x4 blocks and divide them by 4 to form another 4x4 block. Perform the Hadamard transform on this 4x4 block.
4. Sum the absolute values of all Hadamard transform coefficients; use the sum as the cost.

The best I16MB mode is the one with the smallest cost. For the FS algorithm, parts (a) and (c) are similar to the RDO algorithm. In a P-frame, either intra- or inter-prediction can be selected. For intra-predictive modes the procedure above is used. For inter-predictive modes, motion estimation is performed within a search range over the multiple reference frames. Finally, the best prediction mode among all possible intra-/inter-predictive modes is obtained by minimizing Equation (1), where SSD is defined as:

$$SSD(s, c, MODE \mid QP) = \sum_{x=1, y=1}^{16,16} \left( s_Y(x, y) - c_Y(x, y, MODE \mid QP) \right)^2 + \sum_{x=1, y=1}^{8,8} \left( s_U(x, y) - c_U(x, y, MODE \mid QP) \right)^2 + \sum_{x=1, y=1}^{8,8} \left( s_V(x, y) - c_V(x, y, MODE \mid QP) \right)^2 \tag{4}$$

According to the RDO procedure of intra-prediction in H.264/AVC, the number of mode combinations for luma and chroma blocks in a macroblock is N8 x (16 x N4 + N16), where N8, N4, and N16 denote the number of modes for 8x8 chroma blocks, 4x4 luma blocks, and 16x16 luma blocks, respectively [5]. In other words, for a macroblock to be intra-coded with the best mode in H.264/AVC, the RDO procedure performs 4 x (16 x 9 + 4) = 592 rate-distortion computations. As a result, the complexity of the encoder is extremely high. To reduce the encoding complexity with little RD performance degradation, some fast intra-mode decision methods and new ideas to improve them are proposed in the next sections.

4. Proposed Methods for Fast Intra-Prediction Mode Decision

This section presents some fast intra-prediction algorithms and our ideas for improving them. Also, a new fast algorithm based on similar predicted pixels is presented. It is motivated by several observations drawn from the statistics of different sequences in our experiments:

a) For intra-prediction of luminance samples, the probability of the 4x4 block size is significantly higher than that of the 16x16 block size at usual quantization parameters (20~35). This holds for a wide variety of inputs encoded with JM7.1. Figure 13 shows the total number of 4x4 and 16x16 intra-coded macroblocks at different quantization parameters (QPs). Therefore, fast detection of the 4x4 intra-prediction mode can significantly improve the encoding speed at low QPs, while fast 16x16 intra-prediction does so at large QPs.

b) The prediction modes of each block are correlated with those of neighboring 4x4 luminance blocks. The statistics generated using the JM7.1 encoder [5] show that, for a wide variety of inputs, a large share of neighboring blocks have the same I4 mode. There are four possible types of 4x4 blocks in a frame, based on their location in the frame (see Fig. 14).

Figure 13: Number of 4x4 and 16x16 intra-coded macroblocks at different quantization parameters (x-axis: quantization parameter; y-axis: number of MBs; series: 16x16 MB and 4x4 MB).

A 4x4 block belongs to category A when neither the top nor the left block is present, to category B when the left block is available but the top block is not, to category C when the top block is available but the left block is not, and to category D when both top and left blocks are available. Among these, the 4x4 blocks of category D are by far the most numerous, and most of the computation is spent on deciding the appropriate mode for them. From these observations, for each 4x4 luma block of category D, we obtain two candidate modes from the adjacent blocks, i.e., the upper block and the left block. For category B only the left block, and for category C only the top block, supplies the candidate mode.

Figure 14: Types of 4x4 block based on availability of top and left 4x4 blocks [9].

c) Normally, pixels along the direction of a local edge have similar values. Therefore, a good prediction can be achieved if we predict the pixels using those neighboring pixels that lie in the direction of the edge. Generally, the directional correlation of each block is consistent with the directions of its edges. In [1], a fast intra-mode decision method is proposed that is based on edge detection using the Sobel operator and an edge direction histogram (EDH).

d) The residue values of intra-prediction are usually large compared to those of inter-prediction using motion estimation (ME) techniques.

e) The optimal mode (found by full search) and other good (second or third best) modes most likely lie in similar directions.

f) The direction features of 4x4 blocks are roughly preserved after down-sampling.

g) There are a total of 13 reference pixels for intra-prediction of a 4x4 luma block, located above and to the left of the block. Experimental results show that the reference pixels of a 4x4 luma block are similar to each other with high probability [22].

Based on these observations, we propose a fast intra-prediction mode selection algorithm.
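Observation (b) can be sketched in C as follows. The sketch assumes that category B means "left block only" and category C means "top block only" (category C is not spelled out in the text, so this reading is an assumption), and it additionally merges the two candidates when both neighbors use the same mode, which is a convenience of this sketch rather than something the paper states. All names are hypothetical.

```c
/* Availability categories of a 4x4 block (see Figure 14). */
enum block_category { CAT_A, CAT_B, CAT_C, CAT_D };

/* Classify a block from the availability of its left and top neighbors.
 * Assumed reading: B = left only, C = top only. */
enum block_category classify_block(int left_available, int top_available)
{
    if (left_available && top_available)
        return CAT_D;
    if (left_available)
        return CAT_B;
    if (top_available)
        return CAT_C;
    return CAT_A;
}

/* Collect the candidate modes inherited from the neighbors.
 * out must have room for 2 entries; the return value is the number of
 * candidates written (0 for category A, 1 or 2 otherwise). */
int neighbor_candidates(int left_available, int top_available,
                        int left_mode, int top_mode, int out[2])
{
    int n = 0;
    if (top_available)
        out[n++] = top_mode;
    /* skip the left mode when it duplicates the top mode */
    if (left_available && !(n == 1 && out[0] == left_mode))
        out[n++] = left_mode;
    return n;
}
```

For a category D block whose top neighbor used mode 0 and whose left neighbor used mode 4, `neighbor_candidates(1, 1, 4, 0, out)` fills `out` with {0, 4}.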
In this section some new ideas are combined with the fast mode selection algorithms introduced in [1, 6, 7, 14, 19] to improve their efficiency. Also, a new fast mode selection for I4MB and I16MB based on the analysis of similar predicted pixels is proposed.

4.1 Improved Pan's Method for Fast Decision of I4MB

Pan et al. [1] present a fast mode selection method for intra-prediction in which the average edge direction of a given block is measured. The Sobel operators are first used to obtain the edge vector of each pixel in a block:

$$\vec{D}_{i,j} = \{ dx_{i,j}, dy_{i,j} \} \tag{5}$$

where the Sobel operators are:

$$dx_{i,j} = P_{i-1,j+1} + 2 P_{i,j+1} + P_{i+1,j+1} - P_{i-1,j-1} - 2 P_{i,j-1} - P_{i+1,j-1}$$
$$dy_{i,j} = P_{i+1,j-1} + 2 P_{i+1,j} + P_{i+1,j+1} - P_{i-1,j-1} - 2 P_{i-1,j} - P_{i-1,j+1} \tag{6}$$

and $dx_{i,j}$ and $dy_{i,j}$ represent the degree of difference in the vertical and horizontal directions, respectively. The amplitude and angle of each edge vector are then calculated by:

$$Amp(\vec{D}_{i,j}) = |dx_{i,j}| + |dy_{i,j}| \tag{7}$$

and

$$Ang(\vec{D}_{i,j}) = \frac{180^\circ}{\pi} \arctan\left( \frac{dy_{i,j}}{dx_{i,j}} \right) \tag{8}$$

Ang(.) is fitted into one of the following bins:

$$a_0 = (-103.3^\circ, -76.6^\circ]$$
$$a_1 = (-13.3^\circ, 13.3^\circ]$$
$$a_3 = (35.8^\circ, 54.2^\circ]$$
$$a_4 = (-54.2^\circ, -35.8^\circ]$$
$$a_5 = (-76.6^\circ, -54.2^\circ]$$
$$a_6 = (-35.8^\circ, -13.3^\circ]$$
$$a_7 = (54.2^\circ, 76.7^\circ]$$
$$a_8 = (13.3^\circ, 35.8^\circ] \tag{9}$$

Then, the edge direction histogram of the block is built as:

$$Histo(k) = \sum_{(m, n) \in SET(k)} Amp(\vec{D}_{m,n}) \tag{10}$$

where

$$SET(k) = \{ (i, j) \mid Ang(\vec{D}_{i,j}) \in a_k \} \tag{11}$$

and k = 0, 1, 3, 4, ..., 8 refers to the 8 directional prediction modes. The edge direction histogram (EDH) accumulates the amplitudes of the pixels with similar edge directions. Therefore, the cell k with the maximum amplitude indicates that there is a strong edge along that direction, which is used to decide on the preferable directional mode. Figure 16 shows the edge direction histogram of Figure 15.

Figure 15: An example of 4x4 edge patterns and their preferred intra-prediction directions.
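The per-pixel part of this procedure, i.e., the edge vector of Equations (5)-(6) and the amplitude of Equation (7), can be sketched in integer-only C as below. The sketch assumes the standard 3x3 Sobel kernels, a row-major 8-bit image with P(i, j) = img[i * width + j], and an interior pixel position; the names `sobel_at` and `edge_amp` are hypothetical. The angle binning of Equations (8)-(9) is deliberately left out.

```c
#include <stdlib.h>

/* Edge vector D = {dx, dy} of one pixel. */
struct edge_vec {
    int dx;
    int dy;
};

/* Read pixel P(i, j) from a row-major image with `width` columns. */
static int px(const unsigned char *img, int width, int i, int j)
{
    return (int)img[i * width + j];
}

/* Sobel edge vector at an interior position (i, j); the caller must ensure
 * that all eight neighbors of (i, j) lie inside the image. */
struct edge_vec sobel_at(const unsigned char *img, int width, int i, int j)
{
    struct edge_vec d;
    /* difference between column j+1 and column j-1 */
    d.dx = px(img, width, i - 1, j + 1) + 2 * px(img, width, i, j + 1)
         + px(img, width, i + 1, j + 1)
         - px(img, width, i - 1, j - 1) - 2 * px(img, width, i, j - 1)
         - px(img, width, i + 1, j - 1);
    /* difference between row i+1 and row i-1 */
    d.dy = px(img, width, i + 1, j - 1) + 2 * px(img, width, i + 1, j)
         + px(img, width, i + 1, j + 1)
         - px(img, width, i - 1, j - 1) - 2 * px(img, width, i - 1, j)
         - px(img, width, i - 1, j + 1);
    return d;
}

/* Amplitude |dx| + |dy| used as the histogram weight. */
int edge_amp(struct edge_vec d)
{
    return abs(d.dx) + abs(d.dy);
}
```

Building the histogram then amounts to mapping each pixel's angle to a bin k and adding `edge_amp` to that bin.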

Figure 16: Edge direction histogram of Figure 15.

In Pan's method, there are 4 candidate modes for I4MB (DC (mode 2), the mode with the maximum amplitude in the EDH, and its 2 neighbors) and 2 candidate modes (DC and one directional mode) for each 16x16 luma block and 8x8 chroma block. Here, we improve Pan's method by eliminating the DC mode from the candidates when the direction of the block is obvious and, otherwise, keeping DC among the candidates. To check whether the direction of the block is clear, the diff value given in Equation (12) is compared against a threshold T:

$$diff = \sum_{i=0}^{15} | p_i - avg |, \qquad avg = \left( \sum_{i=0}^{15} p_i + 8 \right) >> 4 \tag{12}$$

The improved Pan's method is as follows:

1. For the edge direction histogram H, find its maximum. The corresponding mode is denoted by M.
2. If diff > T, the RDO procedure is carried out for at most 3 modes (M and its two neighbors).
3. Otherwise, if diff < T, the RDO procedure is carried out for at most two candidate modes: DC and the EDH maximum (M).
4. For I16MB, based on the same observation as above, after down-sampling by a factor of 2, if diff > T only the primary prediction mode decided by the edge direction histogram is considered as a candidate for the best prediction mode. The diff in this case is:

$$diff = \sum_{i=0}^{63} | p_i - avg |, \qquad avg = \left( \sum_{i=0}^{63} p_i + 32 \right) >> 6 \tag{13}$$

5. If diff < T, the maximum prediction mode and the DC mode are chosen. The maximum prediction mode is extracted as for I4MB, but with DC and only 3 directions:

$$a_0 = [-112.5^\circ, -67.5^\circ)$$
$$a_1 = [-22.5^\circ, 22.5^\circ)$$
$$a_3 = [-67.5^\circ, -22.5^\circ) \tag{14}$$

where k = 0, 1, and 3 refer to the vertical, horizontal, and plane prediction modes, respectively.
6. For the 8x8 chroma block, after down-sampling by a factor of 2, the same procedure as for I16MB is used, but with Equ. (1).

Pan's method can reduce the number of RDO calculations from 592 to 132. The number of candidate modes and RDO calculations of the improved method in the worst and best cases is shown in Table 2.

Table 2. Number of candidate modes.
Proposed Method (mn) Block Sze RDO Pan s Method Proposed Method (max) 4x4 (Y) 9 4 2 3 6x6 (Y) 4 2 2 8x8 (U/V) 4 3 or 2 2 Table 2 summarzes the number of canddates selected for RDO calculaton based on edge drecton hstogram. As can be seen from Table 2, the encoder wth the fast mode decson algorthm needs to perform only 33 or 00 RDO calculatons, whch are much less than that of Pan s method (32) and current H.264 vdeo codng, RDO (592).. 4.2 Fast Mode Selecton for 4x4 Luma Block Usng Subsamplng and Edge Informaton To extract the local edge nformaton, the algorthm ntroduced n [4] dvdes a 4x4 block nto four 2x2 blocks. Usng A, B, C, and D to denote the sum of ntensty of all pxels n the correspondng 2x2 blocks, gven as: (3) A = C = 3 j= 0 = 2 Pj = 0 = 0 Pj B = D = 3 j= 2 = 0 3 3 j= 2 = 2 Pj Pj (5) In order to obtan the local edge drecton wthn a 4x4 block, ths work ntroduces two edge feature parameters: vertcal edge parameter F and horzontal edge parameter F as: v h

( A + B) ( C + D) FV = S (6) ( A + C) ( B + D) Fh = S Where S s a scalng factor [4]. Accordng to these two parameters, the edge drecton nformaton wthn the current 4x4 block can be obtaned. Table 3 shows the results. On the other hand, the fast method only chooses the predcton mode along the edge as the canddates of the best predcton modes (CBPM). Accordng to Fv and F h, the determned dfferent CBPMs for the current block correspondng to 7 cases (see Table 3) are obtaned and summarzed n Table 4. Table 3: Edge drecton nformaton. Table 4: CBPMs accordng to F v and F h. Case 2 3 4 5 6 7 CBPMs 2 0,2,2 3,2 4,2 0,5,7 3,4,2,6,8 3,4,2 Snce the number of modes for case 6 and 7 are large and DC mode exsts n all cases, we modfy ths method usng mproved Pan s method to extract the DC mode, so we choose DC f dff <T. Also, after detecton of mode 6 and 7 from the above mentoned procedure we use Pan s method to detect the best method. From the smulaton results, t can be seen that the proposed method has reduces the encodng tme by 5.75% and 6.5% on average for Contaner and Dancer sequences, respectvely. Also, smulated results shows that the proposed method has less than.% used extra bts and no more than 0.2 db PSNR s sacrfced. Also, we modfed ths algorthm for I6MB luma and 8x8 chroma n Subsecton 4.4 to mprove the proposed algorthm. 4.3 An Improved Feature-Based Method Usng Fast Hadamard Transform In [9], to address the feature selecton problem, both spatal and frequency doman features are selected. Intutvely speakng, a good predcton should produce a small value of the sum of absolute dfferences SAD) and sum of absolute transform dfference (SATD), whch can be wrtten as [20]: Where C j denotes the element of C defned as: 4 4 C j = j= SATD = (7) C = T( I P) (8) 2

where I and P denote the current block and its prediction, respectively, and T is a certain 2D orthonormal transform. In this work, for computational simplicity, T is chosen to be the separable Hadamard transform with 4 points along each dimension:

$$T = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \tag{19}$$

For a given 4x4 block, the modes can be ordered according to the magnitudes of their SAD and SATD values, which is called the rank order. Most RDO modes fall in the window of the 3x3 lowest ranks. Statistically, 93-95% of the RDO modes are in such a window. Thus, the RDO mode can be searched among the modes that fall in this window, which allows at most 3 distinct intra-prediction modes, and the search range is narrowed down based on the joint feature (SAD, SATD). It is worthwhile to point out that some candidate modes always existed in the 3x3 window in all experiments performed. However, if there is no candidate mode in this window, the search window can be enlarged from 3x3 to 4x4 to find a possible candidate mode. The computation of the SATD can be improved using a fast Hadamard transform, as described below.

4.3.1 Fast Hadamard Transform Algorithm for SATD

Without any fast algorithm, the Hadamard transform defined in (19) needs 64 additions for each mode, i.e., 576 additions in total for all 9 modes. With a modified version of the fast algorithm presented in [20], only 212 additions and 39 shifts are needed. For example, mode 0 (vertical prediction) gives:

$$P = \begin{bmatrix} a & b & c & d \\ a & b & c & d \\ a & b & c & d \\ a & b & c & d \end{bmatrix} \tag{20}$$

After the computation of TP, all row vectors are zero except the first row, which is [4a, 4b, 4c, 4d], so only 4 shifts are needed. Similarly, the horizontal prediction gives:

$$P = \begin{bmatrix} a & a & a & a \\ b & b & b & b \\ c & c & c & c \\ d & d & d & d \end{bmatrix} \tag{21}$$

for which the only nonzero transform values are [a+b+c+d, a+b-(c+d), a-b-(c-d), a-b+c-d], so only 8 additions are needed. The other modes are handled as in [20], and the final results are presented in Table 5.

Table 5: Number of additions and shifts for the fast Hadamard transform.

Mode    No. of ADDs   No. of shifts
0       0             4
1       8             0
2       0             1
3       28            5
4       32            5
5       36            6
6       36            6
7       36            6
8       36            6
Total   212           39

Using this algorithm, we can reduce the computational complexity of the main feature-based algorithm. We apply this fast transform to the feature-based method. The simulation results show a reduction in encoding time, with approximately the same PSNR and bit rate.
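The butterfly structure behind these operation counts can be sketched in C as follows. This is a minimal illustration, not the modified algorithm of [20]: it assumes the unnormalized 4-point Hadamard matrix with row order [1 1 1 1; 1 1 -1 -1; 1 -1 -1 1; 1 -1 1 -1], applies the transform separably to columns and then rows, and uses hypothetical function names (`hadamard4`, `hadamard_cols`, `satd4x4`).

```c
#include <assert.h>
#include <stdlib.h>

/* 4-point Hadamard butterfly: 8 additions/subtractions instead of 12. */
void hadamard4(const int in[4], int out[4]) {
    assert(in != NULL && out != NULL);
    int s01 = in[0] + in[1], d01 = in[0] - in[1];
    int s23 = in[2] + in[3], d23 = in[2] - in[3];
    out[0] = s01 + s23;  /* a + b + c + d     */
    out[1] = s01 - s23;  /* a + b - (c + d)   */
    out[2] = d01 - d23;  /* a - b - (c - d)   */
    out[3] = d01 + d23;  /* a - b + c - d     */
}

/* Column transform: out = T * in. */
void hadamard_cols(int in[4][4], int out[4][4]) {
    for (int j = 0; j < 4; j++) {
        int col[4], t[4];
        for (int i = 0; i < 4; i++) col[i] = in[i][j];
        hadamard4(col, t);
        for (int i = 0; i < 4; i++) out[i][j] = t[i];
    }
}

/* SATD of the separable 2D transform T * (cur - pred) * T^T (assumed form). */
int satd4x4(int cur[4][4], int pred[4][4]) {
    int d[4][4], tmp[4][4], row[4], sum = 0;
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            d[i][j] = cur[i][j] - pred[i][j];
    hadamard_cols(d, tmp);
    for (int i = 0; i < 4; i++) {
        hadamard4(tmp[i], row);
        for (int j = 0; j < 4; j++) sum += abs(row[j]);
    }
    return sum;
}
```

Applying `hadamard_cols` to a vertical-prediction block whose rows are all (a, b, c, d) leaves only the first row nonzero, equal to (4a, 4b, 4c, 4d), which is the property exploited for mode 0.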

4.4 Early Termination of RDO Calculation

Similar to Pan's method, to increase the speed of the algorithm we use early termination of the RDO calculations for all proposed algorithms, as in [1]. During the intra-coding of any prediction mode, the calculation can be terminated if it can be foreseen that the current mode will not be the best prediction mode. By terminating early the RDO calculations that are deemed to be suboptimal, a great time saving can be achieved. In the RDO, the coding cost consists of two parts: rate and distortion. After calculating the cost of the rate, there may be cases in which the rate cost alone is higher than the coding cost of the best mode among the previous modes. This implies that the current mode will not be the best mode, since its coding cost will not be the smallest. Therefore, in such cases we terminate the RDO calculation, and the calculation of the distortion is skipped.

An MB is encoded by either I4MB or I16MB prediction. In the RDO, the selection between these two coding modes is determined by the coding costs achieved with each of them. After I16MB prediction coding, I4MB prediction coding is applied to the sixteen 4x4 blocks in the MB and the costs of these blocks are accumulated. However, if the accumulated cost before encoding all sixteen 4x4 blocks is already higher than that of I16MB prediction coding, the coding of the remaining 4x4 blocks in the MB is terminated prematurely.

4.5 Fast Mode Selection Based on Similar Predicted Pixels

As an alternative method, we propose a fast intra method that uses similar predicted pixels in combination with the algorithms presented in the previous sections.

4.5.1 I4MB Fast Mode Decision

In Figure 17, the arrows show the direction of the desired mode, and any two dots or squares along that direction show the similar predicted pixels. The difference of these pixels can be used as a measure of the prediction direction. We propose 9 simple masks to obtain the DC and directional information within the block, instead of using accurate edge detectors such as the Sobel operator.
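The rate-first early termination of Section 4.4 can be sketched in C as follows. This is only an illustration under assumptions: the per-mode rate cost (already scaled by the Lagrange multiplier) and the distortion are taken as precomputed integers, whereas a real encoder would compute the distortion only when needed, and the names `ModeCost` and `select_mode_early_term` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical per-mode costs of the RDO cost J = rate_cost + distortion. */
typedef struct {
    int rate_cost;   /* lambda * R, assumed already scaled to integers */
    int distortion;  /* D, assumed non-negative */
} ModeCost;

/* Returns the index of the best mode. A mode whose rate cost alone already
 * reaches the best total cost so far cannot win, so its distortion is never
 * evaluated; *distortion_evals counts the evaluations actually performed. */
int select_mode_early_term(const ModeCost *modes, int n, int *distortion_evals) {
    assert(modes != NULL && n > 0 && distortion_evals != NULL);
    int best_mode = -1;
    int best_cost = INT_MAX;
    *distortion_evals = 0;
    for (int k = 0; k < n; k++) {
        if (modes[k].rate_cost >= best_cost) continue;  /* early termination */
        (*distortion_evals)++;
        int cost = modes[k].rate_cost + modes[k].distortion;
        if (cost < best_cost) {
            best_cost = cost;
            best_mode = k;
        }
    }
    return best_mode;
}
```

The same comparison applies one level up: the accumulated cost of the 4x4 blocks can be checked against the I16MB cost after each block, and the remaining blocks skipped once it is exceeded.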
These masks are obtained from the formulas that are used for mode decision in JM 7.1. Some of these formulas are given in Figure 18 (a part of the algorithm for intra-mode decision in JM 7.1). Using these masks, simple equations for the 9 modes are obtained, as listed in Table 6. For the final mode decision, the proposed algorithm uses the minimum and the second minimum costs shown in Table 6 as the two candidate modes that are consistent with their adjacent block information.

Figure 17. The proposed edge detector for the 4x4 luma block. (a) Vertical right and horizontal down. (b) Vertical and horizontal. (c) Diagonal down left and diagonal down right. (d) Vertical left and horizontal up.

Based on these facts, Figure 19 shows the flowchart of the proposed algorithm. The steps of the proposed method are as follows:

1. Calculate the two most probable intra-prediction modes. These are the minimum (first minimum cost = MC1, with mode M_MC1) and the second minimum (MC2, M_MC2) of the costs evaluated by Table 6.
2. For the 4x4 luma block, the MAD (mean of absolute differences) of its reference pixels is computed; if it is smaller than a threshold, M_MC1 is selected. Go to step 10. This follows from the fact that if the similarity of the reference pixels of a block is high, the difference between the prediction modes will be very small. In this case, it is not necessary to check all 9 prediction modes; one prediction mode is enough [22].
4. If MADH (mean of absolute differences of the horizontal references) is less than a threshold and M_MC1 is a member of the set {mode 0, mode 3, mode 7}, then M_MC1 is selected. Go to step 10.
5. Also, if MADV (mean of absolute differences of the vertical references) is less than a threshold and M_MC1 is a member of the set {mode 1, mode 8}, then M_MC1 is selected. Go to step 10. If the similarity of the horizontal reference pixels of a block is high, the difference between the prediction results obtained with prediction modes 0, 3, and 7 will be very small. Also, if the similarity of the vertical reference pixels of a block is high, the similarity between modes 1 and 8 is high.
6. If the mode of either the top or the left block (M_A, M_B) is M_MC1, then M_MC1 is chosen as the best candidate mode for the current block. Go to step 10.
7. If |MC1 - MC2| is less than a threshold, and MC1 and MC2 are adjacent, then MC1 is chosen. Go to step 10.
8. If |MC1 - MC2| is less than a threshold, and MC1 and MC2 are not adjacent, the improved Pan's method is used. Go to step 10.
9. If |MC1 - MC2| is greater than a threshold, the improved Pan's method is used.
10. Terminate.

As such, in the worst case only three different 4x4 intra-mode costs will be evaluated.

Table 6. Equations to select two candidate modes.

Mode   Mode name          Cost equation
0      Vertical           Cost = |a-m| + |b-n| + |c-o| + |d-p| + |e-m| + |d-l|
1      Horizontal         Cost = |a-d| + |e-h| + |i-l| + |m-p| + |b-d| + |m-o|
2      DC                 Cost = |a-p| + |d-m| + |f-p| + |g-m| + |e-l| + |h-i|
3      Diag. Down/Left    Cost = |d-m| + |c-i| + |h-n| + |j-d| + |g-m|
4      Diag. Down/Right   Cost = |a-p| + |b-l| + |e-o| + |f-p| + |a-k|
5      Vertical Right     Cost = |a-j| + |e-n| + |b-k| + |f-o| + |c-l| + |g-p|
6      Horizontal Down    Cost = |a-g| + |b-h| + |e-k| + |f-l| + |i-o| + |j-p|
7      Vertical Left      Cost = |b-i| + |f-m| + |c-j| + |g-n| + |d-k| + |h-n|
8      Horizontal Up      Cost = |d-f| + |c-e| + |h-j| + |g-i| + |l-n| + |k-m|

4.5.2 I16MB Based on Fast Hadamard Algorithm

As stated in the FS algorithm, the decision for a 16x16 block is based on the SATD computed on sixteen 4x4 sub-blocks. The SATD can be computed using the fast algorithm presented in Section 4.3.1. This technique reduces the computational time efficiently.

4.5.3 I16MB Based on Similar Predicted Pixels (SPP)

The 16x16 block is down-sampled by a factor of 4, and masks similar to those of the predicted pixels for the 4x4 block are used for the horizontal, vertical, plane, and DC modes to decide the direction of the block.

4.5.4 I16MB Based on Horizontal and Vertical Differences (HVD)

Let v and h denote the vertical and horizontal sums of differences between the boundary pixels of the current block and its adjacent blocks, respectively [8]:

$$v = \sum_{i=0}^{15} |u_i - cu_i|, \qquad h = \sum_{i=0}^{15} |l_i - cl_i|$$

where:
u: boundary pixels of the upper MB
cu: upper boundary pixels of the current MB
l: boundary pixels of the left MB
cl: left boundary pixels of the current MB

This method obtains the candidate modes from the two difference values under the following conditions:
1- v - h > T: the candidate modes are the DC mode and the horizontal mode.
2- h - v > T: the candidate modes are the DC mode and the vertical mode.
3- |v - h| < T: the candidate modes are the DC mode and the plane mode.
4- Finally, determine the best mode among the candidate modes by choosing the mode that results in the minimum SATD. To reduce the encoding time, the fast Hadamard procedure can be used for the SATD computation.

    // Make DC prediction:
    a~p = (A + B + C + D + I + J + K + L + 4) >> (BLOCK_SHIFT + 1);
    // Mode DIAG_DOWN_LEFT_PRED (for each 4x4 luma block)
    a = (A + C + 2*(B) + 2) >> 2;
    b = e = (B + D + 2*(C) + 2) >> 2;
    c = f = i = (C + E + 2*(D) + 2) >> 2;
    d = g = j = m = (D + F + 2*(E) + 2) >> 2;
    h = k = n = (E + G + 2*(F) + 2) >> 2;
    l = o = (F + H + 2*(G) + 2) >> 2;
    p = (G + 3*(H) + 2) >> 2;
    // Mode VERT_LEFT_PRED
    a = (A + B + 1) >> 1;
    b = i = (B + C + 1) >> 1;
    c = j = (C + D + 1) >> 1;
    d = k = (D + E + 1) >> 1;
    l = (E + F + 1) >> 1;
    e = (A + 2*B + C + 2) >> 2;
    f = m = (B + 2*C + D + 2) >> 2;
    g = n = (C + 2*D + E + 2) >> 2;
    h = o = (D + 2*E + F + 2) >> 2;
    p = (E + 2*F + G + 2) >> 2;
    // Mode DIAG_DOWN_RIGHT_PRED
    m = (L + 2*K + J + 2) >> 2;
    i = n = (K + 2*J + I + 2) >> 2;

Figure 18: Algorithm for some directional modes in JM 7.1 that is used for mask extraction.
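The HVD candidate selection of Section 4.5.4 can be sketched in C as below. The sketch makes a few assumptions that the text does not fix: the Intra 16x16 modes are numbered 0 (vertical), 1 (horizontal), 2 (DC), and 3 (plane); the boundary case in which neither difference exceeds T (including equality) falls back to DC plus plane; and the function name `hvd_candidates` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed Intra 16x16 mode numbering. */
enum { I16_VERT = 0, I16_HOR = 1, I16_DC = 2, I16_PLANE = 3 };

/* u/l: boundary pixels of the upper/left MB; cu/cl: upper/left boundary
 * pixels of the current MB. Writes the two candidate modes into cand[]. */
void hvd_candidates(const int u[16], const int cu[16],
                    const int l[16], const int cl[16],
                    int T, int cand[2]) {
    assert(T >= 0);
    int v = 0, h = 0;
    for (int i = 0; i < 16; i++) {
        v += abs(u[i] - cu[i]);   /* vertical sum of differences   */
        h += abs(l[i] - cl[i]);   /* horizontal sum of differences */
    }
    cand[0] = I16_DC;             /* DC is a candidate in every case */
    if (v - h > T)      cand[1] = I16_HOR;
    else if (h - v > T) cand[1] = I16_VERT;
    else                cand[1] = I16_PLANE;
}
```

The best mode would then be chosen between the two candidates by the minimum SATD, as stated in condition 4.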

Figure 19. Flowchart of the proposed fast intra-mode decision for 4x4 luminance blocks with low quantization.

4.5.5 Fast Mode Selection for 8x8 Chroma Block

For 8x8 chroma blocks, we apply a method similar to the one used for the 16x16 luma macroblock (Sections 4.5.3 and 4.5.4), except that we also apply down-sampling by a factor of 2. Similar predicted pixels are also used for 8x8 chroma blocks, after sub-sampling by a factor of 2.

5. Experimental Results

Our proposed algorithm was implemented in JM 7.1, provided by JVT, according to the test conditions specified in the VCEG-N81 document, as listed in Table 7 [21]. Simulations were carried out on the recommended sequences with various quantization parameters for the IPPP type and the I-frame-only type. For the IPPP experiments, the total number of frames is 300 for each sequence, and the period of the I-frame is 100. The test platform is a Pentium IV 2.8 GHz with 256 Mbytes of RAM.

Table 7: Experiment conditions.

GOP                      IIIII or IPPP
Codec                    JM 7.1
MV search range          ±16
QP                       10, 16, 24, 28, 36, 40
Number of references     1
Common coding options    Hadamard transform, CABAC, RDO enabled
Size                     CIF and QCIF
Number of frames         300

Comparisons with the case of exhaustive search (RDO) were performed with respect to the change of average PSNR (ΔPSNR), the change of average data bits (ΔBit), and the change of average encoding time (ΔTime). The PSNR is derived from the average PSNRs of the luma component (Y) and the chroma components (U, V) based on the equations below:

$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \tag{22}$$

where

$$\mathrm{MSE} = \frac{4\,\mathrm{MSE}_Y + \mathrm{MSE}_U + \mathrm{MSE}_V}{6} \tag{23}$$

with

$$\mathrm{MSE}_Y = \frac{65025}{10^{\mathrm{PSNR}_Y/10}} \tag{24}$$

$$\mathrm{MSE}_U = \frac{65025}{10^{\mathrm{PSNR}_U/10}} \tag{25}$$

$$\mathrm{MSE}_V = \frac{65025}{10^{\mathrm{PSNR}_V/10}} \tag{26}$$

Therefore, in the rest of this paper we use the overall PSNR value of all three components Y, U, and V, computed with Equation (22).
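As a worked example of Equations (22)-(26), consider hypothetical component values PSNR_Y = 40 dB and PSNR_U = PSNR_V = 50 dB. These numbers are chosen only to keep the arithmetic simple and are not taken from the experiments.

```latex
\begin{align*}
\mathrm{MSE}_Y &= \frac{65025}{10^{40/10}} = 6.5025, \qquad
\mathrm{MSE}_U = \mathrm{MSE}_V = \frac{65025}{10^{50/10}} = 0.65025 \\
\mathrm{MSE} &= \frac{4 \times 6.5025 + 0.65025 + 0.65025}{6}
             = \frac{27.3105}{6} = 4.55175 \\
\mathrm{PSNR} &= 10 \log_{10} \frac{65025}{4.55175}
              = 10 \log_{10}(14285.7) \approx 41.55\ \mathrm{dB}
\end{align*}
```

Because the luma error carries a weight of 4/6, the overall value stays much closer to the luma PSNR than to the chroma PSNR.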

Also, in order to evaluate the time saving of the proposed fast intra-mode decision algorithm, the following calculation is defined. Let T_ref denote the coding time used by the JM 7.1 encoder and T_prop the time taken by the fast intra-prediction algorithm; ΔTime is defined as:

$$\Delta \mathrm{Time}\,\% = \frac{T_{prop} - T_{ref}}{T_{ref}} \times 100 \tag{27}$$

Also, the bitrate increase is defined as:

$$\Delta \mathrm{Bitrate}\,\% = \frac{\mathit{bitrate}_{prop} - \mathit{bitrate}_{ref}}{\mathit{bitrate}_{ref}} \times 100 \tag{28}$$

A group of experiments was carried out on different sequences, and the results are shown below. The experiments were arranged in 8 configurations, as listed in Table 8. The encoding bit rates, the PSNR values, and the time saving factors (as compared with the H.264 RDO method) for 4 test sequences with different quantization parameters are shown in Tables 9~14. Tables 9~11 and 12~14 show the experimental results for the IPPP and IIII sequences, respectively. These tables compare the rate, distortion, and complexity of the proposed algorithms with the RDO procedure. Generally speaking, as can be seen from these tables, we have saved 47~58% of the total encoding time at the expense of only a 0.1~1.5% rate increase on average and a 0.05 dB distortion loss on average for these test sequences.

Figures 20, 21, and 22 show examples of the RD and complexity curves of the sequences Akiyo (Class A), Foreman (Class B), and Stefan (Class C) for IPPP sequences. From these figures, one can see that the proposed fast intra-mode decision scheme gives almost identical RD performance while providing a speed-up factor (ratio of the encoding time using the RDO technique to that of the proposed scheme) of 3-6. In these figures, the RDO, the improved Pan's method, the improved feature-based method, and 3 forms of the proposed fast methods are compared (see Table 8). For the proposed algorithms, the rate-distortion performance loss increases only slightly while the complexity reduction is high. For subjective quality comparison purposes, the reconstructed frames based on the RDO and the proposed methods, captured at a QP of 22, are shown in Figures 23 and 24 for the Foreman and News sequences, respectively.

Table 8. Different methods of experiment.

Category   I4MB                   I16MB, Chroma   Early termination
RDO        RDO                    RDO             No
M1         Pan's method           RDO             Yes
M2         Feature-based method   RDO             Yes
M3         Alg. 4.1               RDO             Yes
M4         Alg. 4.3               RDO             Yes
M5         Alg. 4.5.1             RDO             Yes
M6         Alg. 4.5.1             Alg. 4.5.3      Yes
M7         Alg. 4.5.1             Alg. 4.5.4      Yes

6. Conclusion

In this paper, in conjunction with an overview of previously proposed algorithms based on the EDH, the joint features of SAD/SATD, and the statistical properties of natural video sequences for fast intra-mode decision in H.264/AVC encoders, we decreased the encoding time by reducing the computational complexity of the cost function and the number of candidate modes. Experimental results of the proposed algorithms show that the number of mode combinations for the luma and chroma blocks of an MB that take part in the RDO calculation has been reduced significantly with respect to the original algorithms, with a negligible reduction in PSNR and a negligible increase in bitrate. Finally, in order to further reduce the computational complexity, some new ideas were combined with the strong points of the improved algorithms, and a new algorithm was presented. The simulation results show that the proposed algorithm reduces the number of RDO calculations with respect to the original and improved algorithms, with a negligible loss in PSNR and a negligible bitrate increase. The proposed algorithm can be used for the challenging task of intra-prediction mode decision in H.264/AVC video encoders at a low computational cost.

Table 9. Simulation results for IPPP type sequences. Distortion comparison.

ΔPSNR (dB)         QP:  10       16       22       32       40
Foreman      M1    -0.08    -0.079   -0.077   -0.065   -0.06
             M2    -0.5     -0.2     -0.      0.0      0.
             M3    -0.09    -0.08    -0.073   -0.05    -0.05
             M4    -0.30    -0.27    -0.02    -0.05    -0.0
             M5    -0.02    -0.0     -0.02    -.002    -.00
             M6    -0.3     -0.7     -0.25    -0.04    -0.02
             M7    -0.      -0.04    -0.057   -0.08    -0.30
News         M1    -0.073   -0.07    -0.067   -0.064   -0.062
             M2    -0.047   -0.023   -0.0     0.00     0.0
             M3    -0.06    -0.060   -0.059   -0.50    -0.00
             M4    -0.03    -0.0     -0.00    -0.0     0.0
             M5    -0.020   -0.3     -0.06    -0.03    -0.20
             M6    -0.07    -0.008   -0.00    -0.05    0.00
             M7    -0.00    -0.007   -0.003   -0.04    -0.00
Container    M1    -0.089   -0.083   -0.08    -0.076   -0.074
             M2    -0.5     -0.7     -0.      -0.      0.00
             M3    -0.080   -0.065   -0.067   -0.069   -0.032
             M4    -0.46    -0.2     -0.0     -0.02    -0.03
             M5    -0.0     -0.204   -0.03    -0.03    -0.032
             M6    -0.20    -0.340   -0.036   -0.0     -0.00
             M7    -0.30    -0.00    -0.04    -0.02    -0.040
Silent       M1    -0.032   -0.035   -0.033   -0.032   -0.029
             M2    -0.04    -0.037   -0.023   -0.09    -0.0
             M3    -0.      -0.02    -0.0     -0.02    0.00
             M4    -0.03    -0.0     -0.23    -0.02    -0.03
             M5    -0.324   -0.04    -0.080   -0.00    -0.035
             M6    -0.60    -0.37    -0.04    -0.02    -0.023
             M7    -0.230   -0.32    -0.03    -0.02    -0.07

Table 10: Simulation results for IPPP type sequences. Rate comparison.

ΔBit %             QP:  10       16       22       32       40
Foreman      M1    .650     .540     .536     .354     .230
             M2    .050     .004     0.962    0.987    0.870
             M3    .230     .20      .035     0.670    0.345
             M4    .032     0.890    0.634    0.425    0.478
             M5    .30      0.735    0.97     0.890    0.098
             M6    .325     .230     0.725    0.675    0.0346
             M7    2.00     0.980    0.930    0.427    0.092
News         M1    .534     .00      .022     .030     0.924
             M2    0.924    0.940    0.876    0.830    0.90
             M3    .02      0.932    0.982    0.760    0.567
             M4    .20      0.0942   0.876    0.320    0.34
             M5    .098     0.954    0.897    0.830    0.20
             M6    2.00     0.0845   0.872    0.828    0.62
             M7    2.20     0.932    0.489    0.762    0.42
Container    M1    .803     .902     .090     0.950    0.92
             M2    0.983    0.987    0.732    0.50     0.32
             M3    .673     0.982    0.879    0.340    0.450
             M4    .345     0.94     0.52     0.70     0.342
             M5    2.340    0.980    0.987    0.604    0.324
             M6    .348     0.82     0.742    0.82     0.436
             M7    2.00     0.932    0.980    0.434    0.43
Silent       M1    0.923    0.987    0.875    0.875    0.745
             M2    .624     0.720    0.945    0.742    0.439
             M3    2.00     0.789    0.870    0.65     0.370
             M4    2.30     0.872    0.425    0.576    0.346
             M5    .250     0.99     0.62     0.52     0.367
             M6    2.00     0.832    0.874    0.540    0.23
             M7    2.00     .09      0.982    0.435    0.92

Table 11. Simulation results for IPPP type sequences. Complexity comparison.

ΔTime %            QP:  10       16       22       32       40
Foreman      M1    -37.05   -35.42   -33.49   -32.60   -30.45
             M2    -35.23   -37.34   -43.24   -44.25   -45.5
             M3    -42.23   -39.25   -35.24   -35.02   -35.25
             M4    -39.95   -4.32    -4.23    -43.47   -44.32
             M5    -43.50   -42.23   -42.2    -40.25   -38.25
             M6    -48.3    -47.6    -45.25   -42.25   -42.2
             M7    -49.2    -43.32   -44.23   -47.25   -48.0
News         M1    -4.32    -38.24   -35.24   -3.2     -3.0
             M2    -38.45   -39.25   -40.25   -42.3    -43.23
             M3    -43.25   -4.50    -40.2    -39.34   -38.22
             M4    -4.25    -42.02   -43.02   -43.24   -44.23
             M5    -40.24   -40.34   -39.49   -39.56   -38.97
             M6    -43.23   -42.34   -40.3    -40.2    -39.56
             M7    -49.25   -48.27   -47.34   -47.2    -43.4
Container    M1    -3.03    -35.26   -34.02   -33.03   -34.25
             M2    -33.24   -34.25   -37.22   -39.44   -36.25
             M3    -33.24   -34.25   -37.22   -39.44   -36.25
             M4    -37.88   -39.50   -4.56    -43.23   -44.4
             M5    -4.25    -40.45   -40.37   -39.33   -38.0
             M6    -4.50    -40.23   -40.2    -39.3    -37.50
             M7    -45.50   -47.34   -46.34   -44.20   -40.35
Silent       M1    -36.02   -35.0    -3.02    -30.98   -30.37
             M2    -35.46   -37.89   -37.33   -39.42   -4.32
             M3    -40.2    -39.25   -39.0    -38.23   -38.2
             M4    -37.30   -38.50   -39.23   -4.78    -42.73
             M5    -4.34    -39.55   -39.33   -39.0    -38.28
             M6    -42.20   -4.47    -40.38   -39.59   -38.20
             M7    -45.46   -43.99   -43.22   -42.67   -39.78

Table 12. Simulation results for IIII type sequences. Distortion comparison.

ΔPSNR              QP:  10       16       22       32       40
Foreman      M1    -0.060   -0.065   -0.064   -0.063   -0.05
             M2    -0.45    -0.02    -0.0     0.00     0.0
             M3    -0.08    -0.070   -0.062   -0.05    -0.05
             M4    -0.30    -0.22    -0.022   -0.00    -0.0
             M5    -0.020   -0.0     -0.02    -.00     -.0
             M6    -0.3     -0.07    -0.20    -0.020   -0.02
             M7    -0.      -0.020   -0.030   -0.020   -0.03
News         M1    -0.052   -0.060   -0.042   -0.050   -0.05
             M2    -0.037   -0.030   -0.00    0.00     -0.0
             M3    -0.050   -0.05    -0.042   -0.40    -0.00
             M4    -0.022   -0.02    -0.0     -0.02    0.00
             M5    -0.032   -0.020   -0.024   -0.022   -0.024
             M6    -0.023   -0.09    -0.025   -0.027   0.00
             M7    -0.02    -0.05    -0.02    -0.025   -0.022
Container    M1    -0.059   -0.064   -0.062   -0.056   -0.068
             M2    -0.052   -0.054   -0.20    -0.02    0.00
             M3    -0.82    -0.045   -0.053   -0.048   -0.042
             M4    -0.056   -0.03    -0.042   -0.032   -0.03
             M5    -0.20    -0.323   -0.033   -0.054   -0.067
             M6    -0.230   -0.030   -0.05    -0.042   -0.023
             M7    -0.40    -0.0     -0.24    -0.032   -0.062
Silent       M1    -0.06    -0.054   -0.053   -0.04    -0.035
             M2    -0.050   -0.043   -0.030   -0.023   -0.04
             M3    -0.05    -0.034   -0.042   -0.032   0.020
             M4    -0.043   -0.05    -0.024   -0.064   -0.072
             M5    -0.092   -0.037   -0.056   -0.037   -0.042
             M6    -0.27    -0.302   -0.045   -0.052   -0.04
             M7    -0.260   -0.82    -0.049   -0.053   -0.046

Table 13. Simulation results for IIII type sequences. Rate comparison.

ΔBits %            QP:  10       16       22       32       40
Foreman      M1    .320     .340     .572     .42      .370
             M2    .32      .234     0.985    0.997    0.980
             M3    .30      .40      .244     0.887    0.547
             M4    .42      0.993    0.79     0.562    0.539
             M5    .42      0.852    .07      .090     0.98
             M6    .24      .322     0.834    0.752    0.42
             M7    .90      .080     0.934    0.526    0.92
News         M1    .32      .20      .32      .25      0.956
             M2    0.970    0.967    0.893    0.742    0.932
             M3    .30      .334     0.923    0.950    0.822
             M4    .20      0.0942   0.876    0.320    0.34
             M5    .80      .095     0.985    0.73     0.55
             M6    .350     0.275    0.980    0.690    0.982
             M7    .320     0.978    0.999    0.957    0.689
Container    M1    .203     .002     .042     .095     0.942
             M2    .30      .24      .23      0.98     0.739
             M3    .32      0.990    0.989    0.720    0.730
             M4    .345     0.94     0.52     0.70     0.342
             M5    2.27     0.999    .087     0.824    0.62
             M6    .0       0.98     0.802    0.922    0.82
             M7    .900     0.92     .20      0.84     0.0
Silent       M1    .223     .087     0.92     0.905    0.980
             M2    .750     0.930    .02      0.590    0.60
             M3    .940     0.32     0.590    0.435    0.560
             M4    .920     0.702    0.25     0.736    0.46
             M5    .467     0.929    0.732    0.702    0.62
             M6    .209     0.22     0.744    0.503    0.02
             M7    .90      .00      0.907    0.55     0.435

Table 14. Simulation results for IIII type sequences.

Complexity comparison.

ΔTime %            QP:  10       16       22       32       40
Foreman      M1    -47.5    -46.2    -43.29   -49.0    -47.25
             M2    -50.3    -47.0    -52.45   -54.20   -55.50
             M3    -55.34   -59.25   -45.34   -42.82   -45.5
             M4    -50.05   -5.32    -52.30   -5.27    -54.2
             M5    -53.2    -55.3    -54.34   -52.3    -54.6
             M6    -55.2    -57.30   -54.2    -5.05    -52.0
             M7    -58.0    -5.20    -53.37   -55.85   -57.2
News         M1    -52.24   -58.20   -53.6    -5.90    -50.46
             M2    -49.50   -52.50   -50.5    -52.42   -5.02
             M3    -50.2    -5.23    -50.78   -49.74   -48.89
             M4    -5.34    -52.43   -53.9    -52.0    -52.3
             M5    -5.35    -50.27   -49.9    -50.06   -49.55
             M6    -5.30    -52.04   -5.2     -5.62    -49.96
             M7    -58.45   -56.3    -57.03   -52.53   -50.04
Container    M1    -46.23   -45.32   -44.72   -43.67   -44.39
             M2    -46.42   -48.53   -47.43   -49.64   -46.42
             M3    -45.34   -44.65   -47.39   -49.5    -48.2
             M4    -47.90   -50.20   -5.06    -52.43   -54.0
             M5    -5.85    -50.75   -52.37   -56.2    -48.32
             M6    -52.60   -50.43   -52.32   -56.43   -47.34
             M7    -55.40   -57.64   -56.04   -55.2    -52.30
Silent       M1    -49.2    -46.3    -43.32   -45.38   -43.29
             M2    -49.76   -49.90   -50.03   -49.62   -5.2
             M3    -50.30   -49.50   -50.37   -49.38   -48.92
             M4    -5.22    -58.60   -5.73    -53.38   -54.3
             M5    -50.49   -53.8    -58.28   -59.43   -58.7
             M6    -52.2    -5.3     -50.6    -49.8    -50.23
             M7    -55.6    -50.9    -50.22   -52.67   -56.82

Figure 20: Akiyo sequence (IPPP seq.). (a) R-D performance, (b) computational complexity.

Figure 21: Foreman sequence (IPPP seq.). (a) R-D performance, (b) computational complexity.

Figure 22: Stefan sequence (IPPP seq.). (a) R-D performance, (b) computational complexity.

Figure 23: Quality of reconstructed frames of the Foreman sequence: (a) RDO, (b) M3, (c) M4, (d) M5, (e) M6, (f) M7.

Figure 24: Quality of reconstructed frames of the News sequence: (a) RDO, (b) M3, (c) M4, (d) M5, (e) M6, (f) M7.

References

[1] F. Pan, X. Lin, S. Rahardja, K. P. Lim, Z. G. Li, D. Wu, and S. Wu, "Fast Mode Decision Algorithm for Intra-prediction in H.264/AVC Video Coding," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 15, No. 7, pp. 813-822, July 2005.
[2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.264/AVC Video Coding Standard," IEEE Trans. on Circuits and Systems for Video Technology, Vol. 13, No. 7, pp. 560-576, 2003.
[3] ece.ut.ac.ir/classpages/multimedia/h264.ppt
[4] Envivio, http://www.envivio.com/products/h264.html
[5] Changsung Kim, Qing Li, C.-C. Jay Kuo, "Fast Intra-Prediction Model Selection for H.264 Codec," SPIE International Symposium ITCOM 2003, Orlando, Florida, Sept. 7-11, 2003.
[6] F. Pan, X. Lin, et al., "Fast Mode Decision for Intra-Prediction," ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT 7th Meeting, Pattaya II, Thailand, March 2003.
[7] F. Pan, X. Lin, S. Rahardja, K. P. Lim, and Z. G. Li, "A Directional Field Based Fast Intra-Mode Decision Algorithm for H.264 Video Coding," IEEE Inter. Conf. on Multimedia and Expo, vol. 2, pp. 47-50, June 2004.
[8] J. Kim and J. Jeong, "Fast Intra-Mode Decision in H.264 Video Coding Using Simple Directional Masks," Proc. of SPIE Vol. 5960, pp. 07-079, 2005.
[9] R. Garg, M. Jindal, M. Chauhan, "Statistics Based Fast Intra-Mode Detection," Proc. of SPIE Vol. 5960, pp. 2085-209, 2005.
[10] B. Jeon and J. Lee, "Fast Mode Decision for H.264," ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, JVT 10th Meeting, Waikoloa, Hawaii, December 2003.
[11] C. Yang, L. Po, W. Lam, "A Fast H.264 Intra Prediction Algorithm Using Macroblock Properties," ICIP, pp. 46-464, 2004.
[12] B. Meng, O. C. Au, C. Wong, H. Lam, "Efficient Intra-Prediction Algorithm in H.264," IEEE, pp. 837-840, 2003.
[13] C. Tseng, H. Wang, J. Yang, "Improved and Fast Algorithms for Intra 4x4 Mode Decision in H.264/AVC," IEEE, pp. 228-23, 2004.
[14] C. Hsu, M. Ho, J. Hong, "An Efficient Algorithm for Intra-prediction in H.264," IEEE, pp. 35-36, 2006.
[15] Joint Video Team (JVT), reference software JM 7.1, http://bs.hhi.de/~suehring/tml/download/jm7.1.zip.
[16] M. Jafari, S. Kasaei, "An Efficient Intra Prediction Mode Decision Algorithm for H.263 to H.264 Transcoding," IEEE International Conference on Computer Systems and Applications, pp. 082-089, March 2006.
[17] M. Jafari, S. Kasaei, "Prioritisation of data partitioned MPEG-4 video over GPRS/EGPRS mobile networks," Asian Internet Engineering Conference (AINTEC), Thailand, pp. 68-82, December 2005.
[18] I. Richardson, "H.264/MPEG-4 Part 10 White Paper," available: http://www.vcodex.fsnet.co.uk/resources.html.
[19] C. Kim, H.-H. Shih, and C.-C. J. Kuo, "Feature-based intra-prediction mode decision for H.264," in Proc. IEEE International Conference on Image Processing, 2004.
[20] Chen Chen, Ping-Hao Wu, and Homer Chen, "Transform-domain intra prediction for H.264," IEEE, pp. 497-500, 2005.
[21] G. Sullivan, "Recommended simulation common conditions for H.26L coding efficiency experiments on low resolution progressive scan source material," document VCEG-N81, presented at the 14th VCEG Meeting, Santa Barbara, CA, Sep. 2001.
[22] Jiang Gang-yi, Li Shi-ping, Yu Mei, Li Fu-cui, "An efficient fast mode selection for intra prediction," IEEE Int. Workshop VLSI Design & Video Tech., China, pp. 357-360, May 28-30, 2005.