Fast frame memory access method for H.264/AVC


Tian Song 1a), Tomoyuki Kishida 2, and Takashi Shimamoto 1

1 Computer Systems Engineering, Department of Institute of Technology and Science, Graduate School of Engineering, Tokushima University, Minami-Jyosanjima 2-1, Tokushima City, 770-8506, Japan
2 Department of Electrical and Electronic Engineering, Graduate School of Engineering, Tokushima University, Minami-Josanjima 2-1, Tokushima City, 770-8506, Japan
a) tiansong@ee.tokushima-u.ac.jp

Abstract: This paper presents an efficient memory access interface architecture for an H.264/AVC encoder. In the implementation of an H.264/AVC encoder, bandwidth compression of the frame memory becomes a challenging issue because of several bandwidth-intensive coding tools, such as multiple-frame motion estimation, the deblocking filter, and INTRA mode decision. In this work, by analyzing the memory access patterns of each coding function module of H.264/AVC, an efficient memory access method for the Direct Memory Access (DMA) module is proposed. The proposed method uses a carefully designed memory mapping to decrease the memory response delay. Simulation results show that more than 50% of the memory access cycles can be saved by using the proposed method.

Keywords: H.264/AVC, VLSI, SDRAM, bandwidth compression
Classification: Integrated circuits

References
[1] Joint Video Team of ISO/IEC MPEG and ITU-T VCEG, "Draft ITU-T recommendation and final draft international standard of joint video specification (ITU-T Rec. H.264-ISO/IEC 14496-10 AVC)," March 2003.
[2] U. Bayazit, L. Chen, and R. Rozploch, "A novel memory compression system for MPEG-2 Decoders," Proc. IEEE Int. Conf. Consum. Electron. (ICCE), pp. 56-57, 1998.
[3] J. Tajime and Y. Miyamoto, "A frame memory compression method for H.264 decoders," IEICE general conf., D-11-35, March 2006.
[4] P. Zhang, W. Gao, D. Wu, and D. Xie, "An efficient reference frame storage scheme for H.264 HDTV decoder," Proc. Int. Conf. Multimedia & Expo, pp. 361-364, July 2006.
[5] H. Kim and I. C. Park, "High-performance and low-power memory-interface architecture for video processing applications," IEEE Trans. Circuits Syst. Video Technol., vol. 11, no. 11, pp. 1160-1170, Nov. 2001.

1 Introduction

H.264/AVC [1], which achieves high coding efficiency at variable bit rates, is used in a variety of practical applications. H.264/AVC inherits the MC-DCT based hybrid structure that is also adopted by other traditional standards. It employs inter-frame prediction and an integer DCT to reduce the redundancy of high-frequency gradients. In addition to these traditional algorithms, H.264/AVC introduces several new coding tools that greatly improve the coding efficiency. Among these coding tools, the exhaustive precoding process, named rate-distortion optimization (RDO), accounts for over 80% of the total computational complexity. The RDO process performs multiple-reference-frame motion estimation, 1/4-pixel precision motion estimation, deblocking filtering, and INTRA mode coding with 13 mode types. However, along with the high coding gain, these new coding tools drastically increase the computational complexity as well as the memory bandwidth.

In a typical hardware implementation of an H.264/AVC encoder, reference frames are temporarily stored in external frame memory, commonly SDRAM. As macroblocks are encoded one by one, the coding function modules access the SDRAM to read the current and reference macroblock data for each macroblock. Realtime encoding for H.264/AVC applications at full HD resolution requires about 4.6-5 GB/s of bandwidth. However, only 3.2 GB/s can be achieved with current DDR3 technology. With the increasing demand for H.264/AVC applications, the memory interface has become an important research issue.

Some approaches concentrate on data compression to cut down frame memory consumption [2, 3]. However, these methods reduce memory consumption at the expense of image quality, for example through a simple 5-bit quantization. A memory mapping approach that arranges pixel data access for sub-pixel data has been proposed [4]. However, this proposal increases memory consumption. Another study optimizes the memory interface through memory address generation [5]. However, this method is not suitable for H.264/AVC. In this paper, considering the features of the frame memory access patterns of the function modules, we introduce a novel Direct Memory Access (DMA) method for H.264/AVC.

2 Features of memory access patterns and SDRAM command

In a macroblock-order based encoding engine for H.264/AVC, many coding function modules require pixel data of the current or reference macroblock from frame memory. Many coding functions have to be performed on each macroblock one by one because of the correlation between adjacent macroblocks. Based on the data access features of each coding function module, we classify all memory access patterns into two groups: the ME and MC groups. The motion estimation modules, including integer-pixel, sub-pixel, and multiple-frame motion estimation, always access the frame memory for reference pixels within a certain search range. We classify these memory access patterns into the ME modules group. Another three function modules, INTRA mode decision, motion compensation, and the deblocking filter, always access the macroblock to the left of or above the current coding macroblock to read reference data. We classify these memory access patterns into the MC modules group. The pixel data request patterns for the ME and MC modules groups are shown in Fig. 1.

Fig. 1. Data access request patterns of ME and MC

The ME function modules perform motion estimation within a certain search range (typically ±16). After the motion estimation process for one macroblock, the reference data for the next macroblock need to be read from frame memory. As shown in Fig. 1, the memory request patterns for ME are typically four macroblocks located to the right of or below the current search range. On the other hand, the memory request patterns for the MC modules are always the pixel data of the current macroblock and one encoded macroblock located to the left of or above the current macroblock. These pixel data need to be read out from SDRAM with no response delay.

A typical SDRAM access can be described as a sequence of commands. First, the bank and the line address are generated, followed by the low address (the word position within the line). Using the burst mode of SDRAM, multiple words at the same line address can be accessed in consecutive cycles. However, when the required data are stored in different lines, the line address has to be updated, and this update induces an access delay of several cycles. To avoid this memory response delay, all the required data have to be mapped to the same line address. If the required data are mapped to different banks, the line addresses for the other banks have to be updated in advance to hide the response delay [5].

3 Proposed method

In this work, a memory mapping method based on the memory access patterns is proposed to reduce the memory access delay. A typical SDRAM with 4 banks, a data width of 32 bits, and 256 words per line is assumed. As discussed in the previous section, the pixel data of four adjacent macroblocks need to be read out from frame memory for the ME modules. The pixel data of the current and upper macroblocks also need to be read out for the MC modules. To realize delay-free access to continuous pixel data, we propose the frame memory mapping method shown in Fig. 2.

Fig. 2. Proposed memory mapping method

As shown in Fig. 2, four consecutive macroblocks in each line are collected as a group. Consecutive groups in one line are mapped to different banks, and groups in adjacent lines are also mapped to different banks. This mapping realizes delay-free access. A0, B0, C0, and D0 in Fig. 2 denote four macroblock groups, each containing four macroblocks. The four macroblock groups are stored in different banks (Bank A, B, C, D). In this case, any group can be read out by the ME modules without access delay, because all four macroblocks in a group are located at the same line address. In the case of the B1, D1, B2, D2 pattern, the four macroblocks are mapped to two different banks, so no access delay occurs. In the case of the A4, B4 pattern, the reference data can be read out for ME without access delay, because the pixel data of the first two macroblocks (A4) are mapped to the same line and the second two macroblocks (B4) are mapped to a different bank.

On the other hand, for the MC access pattern, a typical sample such as the B3, C3, D3 pattern does not induce any access delay, because B3 and D3 are mapped to different banks and C3 and D3 are also mapped to different banks. When C3 and D3 are mapped to the same bank, no access delay is induced either, because they are mapped to the same line. The proposed memory mapping method is suitable for almost all hardware-oriented algorithms, except those with random search patterns.

4 Simulation Results

The proposed method is evaluated in terms of memory access cycle reduction. The DMA has to respond to the MC and ME modules respectively. Because the data length requested by the ME module is random, the memory bandwidth reduction is difficult to evaluate directly. In this work, the average reduction in the number of SDRAM access requests is used to evaluate the memory bandwidth. A dummy ME module that emulates the frame memory access patterns of motion estimation, a dummy MC module that emulates the frame memory access patterns of INTRA mode decision and the deblocking filter, and a DMA module are described in Verilog-HDL. The access timing and the number of requests may differ depending on the motion estimation algorithm. Therefore, the dummy ME module requests a random number of macroblocks at random intervals. The method in which all pixel data are stored in the same bank is defined as the previous mapping method. The comparison between the previous method and the proposed method is shown in Table I.

Table I. Access requests reduction of the proposed method

                     QCIF    CIF     VGA      HDTV720p  HDTV1080i
Previous mapping     1,282   5,334   16,422   49,672    113,146
Proposed mapping     610     2,410   7,258    21,688    49,094
Reduction rate (%)   52.4    54.8    55.8     56.3      56.6

As shown in Table I, the proposed mapping method cuts access requests by over 50% compared with the previous method. Furthermore, the proposed method achieves a stable cycle reduction rate at any bitrate.

5 Conclusion

In this paper, an efficient memory mapping method and an embedded SRAM method are proposed to realize efficient bandwidth compression for an H.264/AVC encoder. The proposed method analyzes the memory access patterns of each coding function module of H.264/AVC and classifies these function modules into two groups, namely two access patterns. We then proposed an efficient memory mapping method that achieves delay-free memory access. Using this method, the proposed architecture saves over 50% of the access cycles compared with the previous method. The proposed method has been verified to be an efficient memory access method for typical H.264/AVC-dedicated hardware encoder implementations. Cutting down the memory access frequency directly helps to realize efficient memory bandwidth compression. However, how much these two proposed methods can reduce the total bandwidth of H.264/AVC depends on the implementation of the motion estimation, INTRA mode selection, and deblocking filter modules.

IEICE 2008
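The grouping rule of Section 3 can be illustrated with a small Python sketch. It is not the authors' Verilog-HDL model and it does not reproduce Table I. Several details are assumptions made only for illustration: the function names, the bank rotation formula `(group_x + 2 * mb_y) % 4` (one of several rotations that put horizontally and vertically adjacent groups in different banks), and the stall rule, which counts a delay only when two consecutive accesses hit different lines of the same bank. Under the further assumption of 8-bit luma samples, one 16x16 macroblock occupies 64 words of 32 bits, so a group of four macroblocks fills exactly one 256-word line.

```python
from typing import Callable, Iterable, Tuple

MBS_PER_GROUP = 4  # from the text: four consecutive macroblocks form a group
NUM_BANKS = 4      # from the text: SDRAM with 4 banks

Address = Tuple[int, int]  # (bank, line)
Mapping = Callable[[int, int, int], Address]


def _line_id(mb_x: int, mb_y: int, mbs_per_row: int) -> int:
    """Unique identifier of the line holding the group of macroblock (mb_x, mb_y)."""
    groups_per_row = -(-mbs_per_row // MBS_PER_GROUP)  # ceiling division
    return mb_y * groups_per_row + mb_x // MBS_PER_GROUP


def proposed_mapping(mb_x: int, mb_y: int, mbs_per_row: int) -> Address:
    """Hypothetical bank rotation: neighbouring groups never share a bank."""
    group_x = mb_x // MBS_PER_GROUP
    bank = (group_x + 2 * mb_y) % NUM_BANKS  # assumed formula, not from the paper
    return bank, _line_id(mb_x, mb_y, mbs_per_row)


def single_bank_mapping(mb_x: int, mb_y: int, mbs_per_row: int) -> Address:
    """Baseline described in Section 4: all pixel data stored in the same bank."""
    return 0, _line_id(mb_x, mb_y, mbs_per_row)


def count_line_stalls(
    macroblocks: Iterable[Tuple[int, int]], mapping: Mapping, mbs_per_row: int
) -> int:
    """Count line-address updates that cannot be hidden (simplified model).

    A stall is counted when consecutive accesses go to different lines of the
    same bank. A change of bank is assumed to be hidden by opening the line
    of the other bank in advance.
    """
    stalls = 0
    previous = None
    for mb_x, mb_y in macroblocks:
        current = mapping(mb_x, mb_y, mbs_per_row)
        if previous is not None:
            if current[0] == previous[0] and current[1] != previous[1]:
                stalls += 1
        previous = current
    return stalls


CIF_MBS_PER_ROW = 352 // 16  # 22 macroblocks per row in a CIF frame

# A 2x2 macroblock request that spans two macroblock rows.
block_2x2 = [(4, 3), (5, 3), (4, 4), (5, 4)]
print(count_line_stalls(block_2x2, proposed_mapping, CIF_MBS_PER_ROW))
print(count_line_stalls(block_2x2, single_bank_mapping, CIF_MBS_PER_ROW))
```

In this model a request that stays inside one group costs no stall under either mapping, while a request that crosses a group or row boundary stalls only under the single-bank baseline. The actual gain depends on real SDRAM timing and on the request statistics of the ME and MC modules, which this sketch does not model.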