Fast Coding Unit Partition Search

Satish Lokkoju (1), Dinesh Reddy (2)
Samsung India Software Operations Private Ltd, Bangalore, India
(1) l.satish@samsung.com, (2) dinesh.reddy@samsung.com

Abstract: Quadtree-based encoders perform a brute-force search to find the best partition of a Coding Unit (CU). This search encodes the CU at every possible block size and selects the partition that gives the best compression. Combined with the inherent complexity of the latest encoders, it makes real-time performance (30 fps) at low power extremely difficult to attain. The solution is to perform a low-complexity analysis of the Coding Unit and suggest its partition from the available CU characteristics, without performing a complete encoding to estimate the cost. This paper describes a method that does this using the Sum of Absolute Differences (SAD) and the gradient information of the Coding Unit. We show that the presented method encodes 3x faster than the brute-force algorithm, with a small increase in bitrate (approximately 5% in the worst case) and no change in subjective quality. The complexity-bitrate trade-off and the resulting BD-PSNR values of this method are also presented.

Keywords: Quadtree, HEVC, Coding Unit Partition, SAD, Mode decision.

I. INTRODUCTION

Video encoders have always been among the most resource-consuming processes in modern consumer electronics devices. The demand for more compression to sustain a growing number of streaming solutions has led to an increase in encoder complexity. Although advances in processor technology directly provide more processing power, this is not sufficient to achieve real-time performance with the latest encoders. There is therefore a need for a method that reduces complexity with little or no increase in bitrate. All modern encoders have a number of tools to attain the desired compression.
One of them is the selection of the mode, based on spatial and temporal characteristics such as texture and motion vectors respectively, that gives the best compression. The selection is usually done by brute-force search: every possible mode is encoded, its cost is calculated, and the best one is selected. Because of the huge complexity of this brute-force search, it is the target of our algorithm. A generic multi-depth quadtree-based video codec gives a lot of flexibility in partitioning a block, but it also increases complexity. For example, for a Coding Unit of size 64x64 with a minimum partition size of 4x4, there are 18446744073709551616 ways in which the CU can be partitioned, and a minimum of 340 encoder-decoder cycles, on blocks of different sizes, is needed to obtain the best possible partition. This is clearly a time-consuming exercise. This paper presents a generic method to partition a Coding Unit into blocks based on the spatial and temporal characteristics of the unit, and explains the complexity-bitrate trade-off that can be achieved.

The rest of the paper is organized as follows. Section II gives a brief overview of the work done in this field and of the challenges faced when extending those methods to the latest quadtree-based video encoders. Section III explains the proposed method in detail. Section IV gives the test results, and Section V presents the conclusion and future directions.

II. CODING UNIT PARTITION

Fast mode detection is one of the important ways in which encoder complexity can be reduced without compromising the bitrate. The Coding Unit is partitioned into blocks of different sizes to generate the layout that gives the best compression. The discrete cosine transform (DCT) is one of the main tools present in almost all modern encoders.
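The partition figures quoted in the introduction can be sanity-checked with a short recursion. This is a minimal sketch assuming the simplest counting model, in which each block is either kept whole or split into four independently partitioned quadrants, down to 4x4; the function names are illustrative. Under this model the count comes out at roughly 4.9 x 10^19 layouts, the same astronomical order as the figure above (the exact total depends on how layouts are counted), and a full search visits 1 + 4 + 16 + 64 + 256 = 341 candidate blocks, which matches the 340 cycles quoted above if the 64x64 block itself is not counted.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_layouts(size: int, min_size: int = 4) -> int:
    """Distinct quadtree layouts of a size x size block (simple counting model)."""
    if size == min_size:
        return 1  # a minimum-size block cannot be split further
    # Either keep the block whole (1 way) or split it into four quadrants,
    # each of which is partitioned independently.
    return 1 + count_layouts(size // 2, min_size) ** 4


def count_candidate_blocks(size: int, min_size: int = 4) -> int:
    """Blocks a full brute-force search must evaluate: every node of the full quadtree."""
    if size == min_size:
        return 1
    return 1 + 4 * count_candidate_blocks(size // 2, min_size)


if __name__ == "__main__":
    print(f"layouts of a 64x64 CU: {count_layouts(64):.3e}")
    print("candidate blocks:", count_candidate_blocks(64))
```

The layout count explodes because each level raises the count of the level below to the fourth power, while the number of candidate blocks only grows by a factor of four per level.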
The DCT exploits the spatial redundancy in a Coding Unit, and areas with uniform texture are the best candidates for it. Our problem therefore reduces to finding the uniform areas of a Coding Unit, partitioning them into blocks, and performing the encoding on that layout. Texture analysis has been used to gauge the continuity of the Coding Unit. In [1], filtering with the Sobel operator is used, and the resulting gradient information selects the block size. The problem with this approach is that an area of the Coding Unit with small variations in the surrounding pixels is still marked as non-uniform. The logarithm of the ratio of the energy of the pixels to the energy of a perfectly non-uniform block [2] may also be used to identify uniform areas; this measure is compared against a threshold to calculate the block partition layout. These methods may not be used directly in quadtree-based video codecs, because the relative variations in the energies of blocks of different sizes may be small, making it difficult to differentiate one block from another.

III. PARTITION SIZE DETERMINATION AND TWO PASS SEARCH

The proposed method employs two different techniques, depending on the type of the Coding Unit, i.e. whether it is intra or inter.

978-1-4673-5604-6/12/$31.00 ©2012 IEEE

For intra Coding Units, prediction within the frame is possible, so the spatial characteristics of the Coding Unit decide how it is partitioned. The gradient of the Coding Unit is a good representation of its uniformity (continuity), and it is obtained by Sobel filtering the Coding Unit. The Sobel filter is a derivative operator that provides gradient information in the horizontal or vertical direction, depending on its coefficients:

$$S_x = \begin{pmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{pmatrix}, \qquad S_y = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix}$$

Applied as 2-D convolutions to the pixels $P_{i,j}$, they give

$$G_x = P_{i-1,j+1} + 2P_{i,j+1} + P_{i+1,j+1} - P_{i-1,j-1} - 2P_{i,j-1} - P_{i+1,j-1}$$

$$G_y = P_{i+1,j-1} + 2P_{i+1,j} + P_{i+1,j+1} - P_{i-1,j-1} - 2P_{i-1,j} - P_{i-1,j+1}$$

$$G = |G_x| + |G_y|$$

Once the gradient is obtained for the 64x64 Coding Unit, the uniformity of each block is decided from the standard deviation of its constituent blocks. For example, to decide whether a 16x16 area of the Coding Unit is uniform, the standard deviation over its constituent 8x8 blocks is calculated, and if it is below a threshold, the area is marked as a 16x16 block. The same process is applied iteratively to all block sizes, from 4x4 up to 64x64. The threshold is different for each block size and was determined after extensive tests on a variety of video streams. Because the decision is based on the standard deviation over the constituent blocks, this method overcomes the problem of small variations in the surrounding pixels.

This method handles intra frames, where only spatial prediction, and thus gradient analysis, is used. It cannot be used alone for inter frames, because blocks in an inter frame can use either spatial or temporal prediction; for these, a combination of SAD and gradient analysis is appropriate. First consider the inter blocks in an inter frame. The best-matching block from the previous reference frame is subtracted from the current block, and the residual is transformed and encoded. The best partition cannot be obtained unless all the possible blocks are inter coded and their costs determined. This includes motion vector prediction, motion estimation, transform and entropy coding, which is cumbersome. Our method instead uses a combination of motion vector prediction, SAD estimation and gradient information to arrive at the best block partition, as described below.

The motion vectors of a Coding Unit are closely related to those of its surrounding units. This information can be used to find an approximate location in the reference frame where the best match for the current Coding Unit can be found. Thus, the first step in estimating the inter-mode block partition is to calculate the motion vector prediction from the surrounding Coding Units. The predicted motion vector is added to the position of the collocated Coding Unit in the reference frame, the resulting prediction unit is subtracted from the current Coding Unit, and the SAD is computed. For example, consider block 'r' in Figure 1. Its collocated SAD is calculated by obtaining the motion vector prediction from the surrounding blocks and adding it to the coordinates of the collocated block in the previous frame, in this case 's'.

[Figure 1: predicted motion vector between the (n-1)th frame and the nth frame.]
Fig. 1 shows an example motion vector prediction for calculating the inter-mode SAD.

Standard deviations of the blocks are calculated from the SADs of the constituent blocks in a quadtree fashion. They indicate the continuity of the area, which is the main criterion for deciding the block size, and are compared against a threshold to decide the block size for inter mode.
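The intra gradient analysis and the inter collocated-SAD analysis described above can share one bottom-up quadtree test. This is a minimal sketch, not the authors' implementation, under several assumptions: the per-block measure is the mean of |G_x| + |G_y| (intra) or the collocated SAD divided by the block area (inter); a block is merged only if its four sub-blocks were themselves merged; the values in `THRESHOLDS` are hypothetical placeholders rather than the experimentally tuned thresholds the text mentions; and the predicted motion vector is passed in directly, since its derivation from the surrounding units is left out here.

```python
import numpy as np

MIN_SIZE = 4
# Hypothetical thresholds keyed by the merged block size; the paper tunes these
# experimentally (and, for inter blocks, per quantization parameter).
THRESHOLDS = {8: 4.0, 16: 4.0, 32: 4.0, 64: 4.0}


def sobel_gradient(p: np.ndarray) -> np.ndarray:
    """Per-pixel |Gx| + |Gy| using the Gx, Gy formulas above (border pixels replicated)."""
    q = np.pad(p.astype(np.int32), 1, mode="edge")
    h, w = p.shape

    def at(di: int, dj: int) -> np.ndarray:
        # at(di, dj)[i, j] == P[i + di, j + dj]
        return q[1 + di : 1 + di + h, 1 + dj : 1 + dj + w]

    gx = at(-1, 1) + 2 * at(0, 1) + at(1, 1) - at(-1, -1) - 2 * at(0, -1) - at(1, -1)
    gy = at(1, -1) + 2 * at(1, 0) + at(1, 1) - at(-1, -1) - 2 * at(-1, 0) - at(-1, 1)
    return np.abs(gx) + np.abs(gy)


def collocated_sad(cur, ref, x, y, size, mv):
    """SAD between a block of `cur` and the reference block displaced by the predicted mv."""
    rx = min(max(x + mv[0], 0), ref.shape[1] - size)  # clamp to the frame
    ry = min(max(y + mv[1], 0), ref.shape[0] - size)
    a = cur[y : y + size, x : x + size].astype(np.int32)
    b = ref[ry : ry + size, rx : rx + size].astype(np.int32)
    return int(np.abs(a - b).sum())


def partition(measure, x=0, y=0, size=64, thresholds=THRESHOLDS):
    """Bottom-up layout: an int (block size) for a merged block, else a list of 4 sub-layouts.

    `measure(x, y, size)` returns a scalar activity value for one block."""
    if size == MIN_SIZE:
        return size
    half = size // 2
    corners = [(x + dx, y + dy) for dy in (0, half) for dx in (0, half)]  # z-order
    kids = [partition(measure, cx, cy, half, thresholds) for cx, cy in corners]
    if all(isinstance(k, int) for k in kids):
        spread = np.std([measure(cx, cy, half) for cx, cy in corners])
        if spread < thresholds[size]:
            return size  # constituent blocks are similar: mark as one larger block
    return kids


def intra_layout(cu: np.ndarray):
    """Intra analysis: mean gradient magnitude per block."""
    grad = sobel_gradient(cu)
    return partition(lambda x, y, s: grad[y : y + s, x : x + s].mean())


def inter_layout(cur: np.ndarray, ref: np.ndarray, mv):
    """Inter analysis: collocated SAD per pixel, using a given predicted motion vector."""
    return partition(lambda x, y, s: collocated_sad(cur, ref, x, y, s, mv) / (s * s))
```

For a flat 64x64 block every measure is zero, so `intra_layout` merges all the way up and returns `64`; a strong edge in one corner keeps the root split while the untouched quadrants still merge to `32`.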
The threshold for inter-mode blocks was determined by extensive tests on different streams, taking into account the quantization parameter, which directly affects the quality of the reconstructed reference frame and thus the SAD. Because Coding Units in an inter frame may also be coded with intra modes, the gradient of the Coding Unit is also checked, using the same method as for intra frames. This generates the initial partition layout. Once the block partition layout is determined, inter-mode and intra-mode searches are performed on the constituent blocks of the layout to fine-tune it. This method offers the flexibility of controlling complexity with respect to bitrate. For example, suppose the layout generated after the inter-mode and intra-mode partition search is the one shown in Figure 2.

[Figure 2: Coding Unit partition layout. Legend: 64x64; 32x32 (x, y, z); 16x16 (p, r, s); 8x8 (a, b, c, d).]
Fig. 2 shows an example Coding Unit partition layout with individual blocks color coded.

The brown blocks of the 64x64 Coding Unit are its 32x32 blocks, named x, y and z in Figure 2. Similarly, the green blocks are the 16x16 blocks of the second 32x32 block, named p, r and s, and the four blue 8x8 blocks of the first 16x16 block within it are named a, b, c and d. Once the Coding Unit layout is determined, the search is performed at the block sizes given by the layout. The first block is marked as 32x32 (x in Figure 2), so the search is performed for the best intra mode in an intra frame, or for the best of the intra and inter modes in an inter frame. The search consists of one complete encoder-decoder cycle on the block to find the least cost. Next comes the second block, which combines 8x8 blocks (a, b, c and d) and 16x16 blocks (p, r and s). The first sub-block of the second block has a depth of 3*, corresponding to 8x8, so the brute-force search is performed on this block for all prediction types and modes, and the best mode is selected. The same process is performed on all the 8x8 blocks of this 16x16 block. Finally, these blocks are combined to form a 16x16 block, whose cost is calculated and compared to the sum of the costs of the best modes of the 8x8 blocks. The comparison is thus done at two levels: the best depth given by the analysis module, and the depth one level above it, i.e. the next larger block size. This significantly reduces the time consumed, because fewer brute-force searches are performed. Next is the second sub-block of the second block. Its depth is 2, corresponding to 16x16 (p in Figure 2), so the search is performed at this level and the best mode is chosen. The same process takes place for the third (r in Figure 2) and fourth (s in Figure 2) sub-blocks, which are also 16x16 blocks.
All these 16x16 blocks are then combined to form a 32x32 block, and the search is performed at that level; the costs are compared and the best block size is chosen. Finally, because three blocks of the 64x64 Coding Unit are marked as 32x32 blocks, the search is performed at the 64x64 level and the mode with the least cost is chosen. Its cost is compared to the sum of the costs of the best modes of the individual blocks. This "two pass" search significantly reduces the number of brute-force searches, because initial information about the partition size is obtained from the intra/inter partition size search described above. For the Coding Unit layout shown in Figure 2, the searches performed are:

- three 32x32 searches, for blocks x, y and z, followed by a 64x64 search for the complete Coding Unit, whose cost is compared with the sum of the least costs of the constituent sub-blocks;
- four 8x8 searches, for sub-blocks a, b, c and d, followed by a 16x16 search that covers the four 8x8 blocks; the least cost of the 16x16 block is compared with the sum of the least costs of the constituent 8x8 blocks;
- three 16x16 searches, for p, r and s, followed by a 32x32 search that covers the three 16x16 sub-blocks p, r and s and the four 8x8 sub-blocks a, b, c and d; the least cost of the 32x32 block is compared with the sum of the least costs of the four 8x8 sub-blocks and the three 16x16 sub-blocks.

In total, 13 brute-force searches are performed, compared with the 340 searches needed for a full search. The best case for this algorithm is when the partition search module gives the correct layout map, i.e. when blocks whose best mode is 32x32 are marked with either depth 1 (32x32) or depth 2 (16x16) by the partition search module, because in both cases the search is performed for the 32x32 block. The worst case is when a 32x32 block is marked as 4x4 or 8x8 by the partition search module, which results in less compression. The performance degradation is larger when a 4x4 block is marked as 32x32 than when a 32x32 block is flagged as 8x8 or smaller.
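The two-pass search walked through above can be sketched as a recursion over the initial layout. This is a minimal sketch under stated assumptions: a layout is either a block size (a leaf suggested by the analysis) or a list of four sub-layouts in z-order; `cost(x, y, size)` is a hypothetical stand-in for one complete encoder-decoder search that returns the least cost of a block; the parent comparison uses the best cost already found for each child; and `two_pass_sizes` is a hypothetical parameter naming the leaf sizes for which the second pass one level up is enabled, which is one way to express the trade-off of enabling the two-pass search only for selected block sizes.

```python
from typing import Callable, List, Tuple, Union

Layout = Union[int, List["Layout"]]  # int: suggested block size; list: four sub-layouts (z-order)
CostFn = Callable[[int, int, int], float]


def two_pass_search(layout: Layout, cost: CostFn, x: int = 0, y: int = 0, size: int = 64,
                    two_pass_sizes=frozenset({4, 8, 16, 32})) -> Tuple[float, int]:
    """Return (least cost, number of brute-force searches) for an initial layout."""
    if isinstance(layout, int):
        # First pass: one full mode search at the size suggested by the analysis.
        return cost(x, y, size), 1
    half = size // 2
    corners = [(x, y), (x + half, y), (x, y + half), (x + half, y + half)]
    total, searches = 0.0, 0
    for sub, (cx, cy) in zip(layout, corners):
        c, n = two_pass_search(sub, cost, cx, cy, half, two_pass_sizes)
        total += c
        searches += n
    # Second pass: if any child is a suggested leaf of an enabled size, also try
    # coding this block whole (one level up) and keep the cheaper alternative.
    if half in two_pass_sizes and any(isinstance(sub, int) for sub in layout):
        return min(cost(x, y, size), total), searches + 1
    return total, searches


# Figure 2 layout: x, [[a, b, c, d], p, r, s], y, z (positions of y and z assumed)
FIG2_LAYOUT: Layout = [32, [[8, 8, 8, 8], 16, 16, 16], 32, 32]

if __name__ == "__main__":
    def toy_cost(x: int, y: int, size: int) -> float:
        return float(size)  # hypothetical cost, only to exercise the recursion

    print(two_pass_search(FIG2_LAYOUT, toy_cost))
    print(two_pass_search(FIG2_LAYOUT, toy_cost, two_pass_sizes=frozenset({4, 8, 16})))
```

With any cost function the search count for the Figure 2 layout is 13, matching the text; removing 32 from `two_pass_sizes` drops the 64x64 search and leaves 12.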
From the analysis we performed on different streams, we observe that a larger block with uniform texture is most likely to be marked correctly by the partition search module, which results in better compression. The complexity-bitrate trade-off can be controlled by choosing the block sizes for which the "two pass" search is enabled. For example, the two-pass search can be disabled for 32x32: once a block is marked as 32x32, no further search is performed at the 64x64 level, because the partition search module is sensitive enough to mark 32x32 blocks correctly in most cases. The same technique can be applied to the other block sizes according to the required performance. The results with different settings are discussed in the next section.

IV. RESULTS

The tests are performed on a set of test streams with different motion and texture variations. The results for the high-motion test streams BasketballDrill, BQMall, PartyScene and RaceHorses are presented here. Two configurations, random access and low delay, are used to cover a broad range of use cases; they are similar to the "random access" and "low delay" configurations of the HEVC Test Model (HM) reference software. The random access configuration supports B frames and a large motion search area, whereas the low delay configuration uses only I and P frames with low coding delay. The random access results are shown in Table 1 (full brute-force search) and Table 2 (reduced search), both with hierarchical B coding enabled; the low delay configuration has hierarchical B coding disabled. On average, the random access configuration gives a 1.23 percent increase in bitrate, with a 0.19 dB decrease in Y PSNR and a 0.12 dB decrease in U and V PSNR. The complexity is reduced by 3x on average.

* The quadtree partition of the Coding Unit is recursive. The 64x64 block of the Coding Unit has a depth of 0; similarly, its four 32x32 blocks have a depth of 1, 16x16 blocks have a depth of 2, 8x8 blocks have a depth of 3, and 4x4 blocks have a depth of 4.

Table 1: Full brute-force search for random access configuration.

Sequence         QP (I slice)  kbps      Y PSNR  U PSNR  V PSNR  EncT
BasketballDrill  22            4375.06   39.58   42.86   43.37   1400.04
                 27            2107.17   36.36   40.42   40.46   1280.17
                 32            1000.23   33.39   37.69   37.36   1183.76
                 37            530.21    31.05   35.34   34.86   1128.12
BQMall           22            4379.97   39.57   43.38   44.80   1623.10
                 27            2116.27   36.89   41.44   42.43   1515.38
                 32            1063.15   33.94   39.30   40.03   1432.74
                 37            588.29    31.37   37.38   37.95   1394.86
PartyScene       22            7706.34   37.44   41.24   42.28   1310.17
                 27            3478.21   33.89   38.72   39.60   1152.33
                 32            1600.98   30.68   36.31   37.06   1063.98
                 37            780.18    28.00   34.38   34.95   1011.64
RaceHorses       22            5478.69   38.37   41.14   42.53   1177.40
                 27            2400.65   35.00   38.71   40.27   1039.68
                 32            1119.05   31.96   36.38   38.04   930.99
                 37            554.48    29.54   34.60   36.11   858.30

Table 2: Reduced search (discussed algorithm) for random access configuration (hierarchical B-frames).

Sequence         QP (I slice)  kbps      Y PSNR  U PSNR  V PSNR  EncT
BasketballDrill  22            4496.98   39.36   42.64   43.12   811.84
                 27            2124.82   36.17   40.22   40.23   700.42
                 32            998.37    33.25   37.54   37.18   593.70
                 37            525.37    30.93   35.23   34.75   526.63
BQMall           22            4601.56   39.34   43.20   44.59   944.46
                 27            2182.30   36.60   41.30   42.29   798.63
                 32            1071.64   33.64   39.19   39.91   723.78
                 37            584.62    31.10   37.30   37.85   656.57
PartyScene       22            8105.59   37.26   41.03   42.04   875.11
                 27            3603.63   33.60   38.51   39.38   673.78
                 32            1623.21   30.34   36.17   36.93   538.19
                 37            767.30    27.67   34.28   34.90   456.18
RaceHorses       22            4496.98   39.36   42.64   43.12   811.84
                 27            2124.82   36.17   40.22   40.23   700.42
                 32            998.37    33.25   37.54   37.18   593.70
                 37            525.37    30.93   35.23   34.75   526.63

Table 3: Full brute-force search for low delay configuration.

Sequence         QP (I slice)  kbps      Y PSNR  U PSNR  V PSNR  EncT
BasketballDrill  22            2286.44   40.48   43.28   43.03   283.75
                 27            1231.64   36.84   40.47   39.85   237.81
                 32            616.44    33.38   37.86   36.82   208.19
                 37            313.62    30.52   35.70   34.28   196.41
BQMall           22            4476.75   38.20   41.11   41.81   309.43
                 27            1942.68   33.81   38.98   39.70   255.93
                 32            585.57    29.72   37.33   37.88   205.52
                 37            179.15    26.37   36.24   36.32   175.70
PartyScene       22            2887.81   37.77   39.97   40.49   258.93
                 27            1185.27   33.47   37.04   37.53   201.25
                 32            484.65    29.70   34.60   35.07   165.48
                 37            205.47    26.79   32.68   33.17   144.09
RaceHorses       22            1805.35   39.66   41.10   42.09   170.76
                 27            910.45    35.69   38.21   39.25   144.92
                 32            433.25    32.02   35.54   36.54   124.99
                 37            208.43    29.27   33.36   34.24   113.72

Table 4: Reduced search (discussed algorithm) for low delay configuration.

Sequence         QP (I slice)  kbps      Y PSNR  U PSNR  V PSNR  EncT
BasketballDrill  22            2531.40   40.22   42.93   42.71   138.10
                 27            1329.12   36.62   40.15   39.56   120.27
                 32            659.59    33.08   37.54   36.45   95.66
                 37            330.28    30.23   35.37   33.93   79.72
BQMall           22            4816.47   38.07   40.93   41.64   187.98
                 27            2041.54   33.58   38.90   39.57   161.27
                 32            581.22    29.36   37.33   37.81   108.10
                 37            177.45    25.96   36.28   36.27   78.82
PartyScene       22            3307.51   37.53   39.64   40.12   142.00
                 27            1301.96   33.09   36.86   37.27   110.12
                 32            526.17    29.17   34.40   34.83   77.93
                 37            209.37    26.35   32.50   32.92   58.60
RaceHorses       22            2067.34   39.56   40.97   41.88   101.60
                 27            1018.12   35.53   37.98   39.05   86.41
                 32            476.35    31.81   35.30   36.31   67.75
                 37            221.01    29.02   33.08   33.97   54.78

On the other hand, the low delay configuration results in a 5.88 percent increase in bitrate, with a 0.28 dB decrease in Y PSNR and 0.12 dB and 0.13 dB decreases in U and V PSNR respectively. These results are presented in Tables 3 and 4.
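The table rows can be turned into per-QP bitrate changes, encoder speed-ups and a Bjøntegaard-delta PSNR (BD-PSNR), the metric mentioned in the abstract. This is a minimal sketch using only the BasketballDrill random access rows of Tables 1 and 2; the BD-PSNR function follows the common recipe (a cubic fit of PSNR against log10 of the bitrate for each curve, averaged over the overlapping rate range) and is an assumption about how such figures are computed, not the authors' script, so its output need not match the averages quoted in the text.

```python
import numpy as np

# BasketballDrill, random access, QP 22/27/32/37: (kbps, Y PSNR in dB, EncT)
FULL = np.array([(4375.06, 39.58, 1400.04), (2107.17, 36.36, 1280.17),
                 (1000.23, 33.39, 1183.76), (530.21, 31.05, 1128.12)])   # Table 1
FAST = np.array([(4496.98, 39.36, 811.84), (2124.82, 36.17, 700.42),
                 (998.37, 33.25, 593.70), (525.37, 30.93, 526.63)])      # Table 2


def per_qp_changes(full: np.ndarray, fast: np.ndarray):
    """Percent bitrate change and encoding-time ratio (full / fast) for each QP row."""
    d_rate = 100.0 * (fast[:, 0] - full[:, 0]) / full[:, 0]
    speedup = full[:, 2] / fast[:, 2]
    return d_rate, speedup


def bd_psnr(rate_a, psnr_a, rate_b, psnr_b) -> float:
    """Average PSNR difference (b minus a) in dB over the common log-rate range."""
    la, lb = np.log10(rate_a), np.log10(rate_b)
    fit_a, fit_b = np.polyfit(la, psnr_a, 3), np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    int_a, int_b = np.polyint(fit_a), np.polyint(fit_b)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_b = np.polyval(int_b, hi) - np.polyval(int_b, lo)
    return float((area_b - area_a) / (hi - lo))


if __name__ == "__main__":
    d_rate, speedup = per_qp_changes(FULL, FAST)
    for qp, dr, sp in zip((22, 27, 32, 37), d_rate, speedup):
        print(f"QP {qp}: bitrate {dr:+.2f}%, encoding {sp:.2f}x faster")
    bd = bd_psnr(FULL[:, 0], FULL[:, 1], FAST[:, 0], FAST[:, 1])
    print(f"BD-PSNR: {bd:.2f} dB")
```

A negative BD-PSNR means the reduced search delivers slightly lower quality than the full search at the same bitrate; the per-row ratios show how the encoding time changes at each QP.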

[Figure 3: Bitrate vs. PSNR for the BasketballDrill stream with the random access configuration; Y-PSNR against 10*log(bitrate), curves "Present" and "Original".]

The graph clearly illustrates that the RD curve of the present algorithm closely follows the original.

[Figure 4: Bitrate vs. PSNR for the BasketballDrill stream with the low delay configuration; Y-PSNR against 10*log(bitrate), curves "Present" and "Original".]

The logarithm of the bitrate is used in both graphs because the bitrate varies nonlinearly with QP. The QP values 16, 20, 24 and 28 are used to collect the data.

V. CONCLUSION

The test results clearly show a significant reduction in complexity for a small trade-off in bitrate and PSNR. Complexity decreases by 3x on average, with less than a 5% increase in bitrate on average. The complexity can be reduced further by disabling the two-pass search for the 32x32 and 64x64 block sizes; the average decrease in PSNR in this case is less than 1 dB. In the future, this method can be extended to find the globally best partition by taking into account the overall layout of the Coding Units in a frame or across multiple frames, thus reducing the bitrate.

REFERENCES

[1] Andy C. Yu, "Efficient Block-Size Selection Algorithm for Inter-Frame Coding in H.264/MPEG-4 AVC."
[2] Andy C. Yu and Graham R. Martin, "Advanced Block Size Selection Algorithm for Inter Frame Coding in H.264/MPEG-4 AVC."
[3] Andy C. Yu, King Ngi Ngan and Graham R. Martin, "Efficient Intra- and Inter-Mode Selection Algorithms for H.264/AVC."
[4] Paula Carrillo, Hari Kalva and Tao Pin, "Low Complexity H.264 Video Encoding," Dept. of Computer Science and Technology, Tsinghua University, Beijing, China, and Dept. of Computer Science and Engineering, Florida Atlantic University, Boca Raton, FL, USA.
[5] C. S. Kannangara, I. E. G. Richardson, M. Bystrom, J. Solera, Y. Zhao, A. MacLennan and R. Cooney, "Low Complexity Skip Prediction for H.264 through Lagrangian Cost Estimation."