BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen


DSP/IC Design Lab., Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
Email: {ttchen, peterchuang, yjchen, lgchen}@video.ee.ntu.edu.tw

ABSTRACT

Scalable video coding (SVC) is the state-of-the-art video coding standard for future streaming applications over heterogeneous networks. Although SVC is based on H.264/AVC, several new coding tools complicate encoder hardware design, especially with respect to memory. In this paper, the hierarchical B-frame structure and fine granularity scalability are introduced, and their memory bandwidth overheads are analyzed. Two methods, adaptive spatial-temporal hierarchical ME and the scan bucket algorithm, are proposed to reduce the bandwidth overhead by 55% and 88%, respectively, with almost no quality degradation. The concept of the proposed encoder framework can be further applied to high-definition SVC encoder designs.

1. INTRODUCTION

In past years, coding efficiency has always been the main target of traditional video coding standards such as MPEG-2 and H.264/AVC. To meet the requirements of prevalent streaming multimedia applications over heterogeneous networks, ISO/IEC MPEG and ITU-T VCEG of the Joint Video Team (JVT) issued a call for proposals on scalable video coding (SVC). These proposals were merged into the Joint Scalable Video Model (JSVM) as the scalable extension of H.264/AVC [1][2]. The scalable baseline profile was finalized in July 2007, and the other profiles will be finalized soon.
SVC provides three main types of scalability, temporal, spatial, and SNR scalability, so that it can serve multi-resolution, frame-rate-adaptive, or bandwidth-fluctuating transmission applications. Various network streaming devices, such as mobile phones, personal computers, and high-definition TVs (HDTV), can thus display the same video at different specifications from one scalable-encoded bit-stream. Figure 1 illustrates the structure of an SVC encoder with three spatial layers. In SVC, temporal scalability is achieved by the hierarchical B-frame coding structure [3] instead of the well-known IBBP prediction scheme of MPEG-2/4; spatial scalability is based on a pyramid coding scheme, in which adaptive inter-layer prediction techniques are adopted to exploit the correlations between layers; SNR scalability is realized by adding enhancement-layer information with embedded bit-plane coding techniques. SVC also allows on-the-fly bit-stream adaptation in these three scalability dimensions according to the receiver application and network conditions.

Fig. 1. SVC encoder structure with three spatial layers.

In our previous work [4, 5], the VLSI architecture of the multi-level 5/3 and 1/3 MCTF coding scheme was analyzed, and memory was identified as an important design challenge in SVC hardware. In this paper, we consider the whole SVC encoder, which also includes fine granularity scalability (FGS), and propose a bandwidth-efficient SVC encoder framework. The hardware differences between H.264/AVC and SVC are discussed and the new design challenges are addressed. In the temporal prediction engine, an adaptive spatial-temporal hierarchical motion estimation is adopted, utilizing information across spatial layers and the hierarchical B-frame scheme. In FGS, a scan bucket algorithm is proposed to eliminate the huge external memory bandwidth caused by frame-level data access. A case study is then given to evaluate the performance of the proposed bandwidth-efficient SVC encoder framework.

This paper is organized as follows. In Section 2, the three main scalabilities of SVC and their corresponding techniques are analyzed. The design challenges of an SVC encoder and the proposed algorithms are given in Sections 3 and 4, respectively. A case study is presented in Section 5, and Section 6 concludes this paper.

2. SCALABLE VIDEO ENCODER STRUCTURE

In this section, we introduce the newly adopted coding techniques for the three main scalabilities in SVC: hierarchical B-frames, inter-layer prediction, and FGS.

2.1. Hierarchical B-frame in Temporal Scalability

Temporal scalability allows a single bit-stream to be decoded and displayed at multiple frame rates. Previous coding standards support only very few frame-rate choices. In SVC, multi-level hierarchical B-frames [3] are utilized to support temporal scalability, so that more decoded frame rates can be provided. As shown in Fig. 2, a 3-level hierarchical B-frame coding structure is composed of 8 consecutive frames. Since hierarchical B-frames can exploit more temporal correlation within one group of pictures (GOP), coding efficiency can be improved with an efficient frame-level bit allocation strategy [3].

However, in the hierarchical B-frame structure, the long temporal distance between frames at lower temporal levels makes temporal prediction difficult. For example, in Fig.
2, the movement of an object between the two key frames can be eight times that between two neighboring B-frames. Hence, for key-frame motion estimation (ME), the searching range must be enlarged to keep temporal prediction efficient. As described in [6], in video encoder hardware design, the searching range of ME directly determines the external memory bandwidth and internal memory size requirements if full-search ME is applied.

Fig. 2. Closed-loop hierarchical B-frame coding scheme in SVC. Display order 0-8: I0/P0 (key), B3, B2, B3, B1, B3, B2, B3, I0/P0 (key); coding order: 0, 5, 3, 6, 2, 7, 4, 8, 1.

2.2. Inter-layer Prediction in Spatial Scalability

Spatial scalability in SVC is realized by decimating the original video sequence into a set of pyramid videos, as shown in Fig. 1. In each spatial layer, the original or decimated frames are predicted and encoded in the same way as in H.264/AVC. To improve the coding efficiency of the spatial scalability mode, redundant information among spatial layers is removed by the adaptive inter-layer prediction techniques: intra texture, motion, and residual prediction. After the smaller frames are encoded, these data are upsampled from the base layer (decimated frames) according to the scaling ratio between layers.

2.3. FGS in SNR Scalability

In SVC, SNR scalability is provided via two main strategies: coarse grain scalability (CGS) and fine grain scalability (FGS). CGS can only provide several pre-defined quality points. FGS, on the other hand, aims at arbitrary truncation of the bit-stream, providing more flexibility while maintaining good coding performance throughout the whole frame. Figure 3 shows the block diagram of FGS encoding in the case of three enhancement layers. First, four SNR layers are generated from four cascaded reconstruction loops with different QPs. The QP step size is six between adjacent layers.
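The cascaded reconstruction loops can be illustrated with a simplified numerical model. The sketch below (Python, purely illustrative; real H.264/AVC quantization uses integer arithmetic and per-QP scaling tables) shows how each layer quantizes the difference between a coefficient and the accumulated reconstruction of the preceding layers, with the quantization step halved (QP reduced by 6) per layer:

```python
def qstep(qp):
    # H.264/AVC quantization step size doubles every 6 QP
    return 0.625 * 2 ** (qp / 6)

def fgs_layers(coeff, base_qp, n_enh=3):
    """Cascaded reconstruction loops (simplified model): each layer codes
    the difference between the coefficient and the accumulated
    inverse-quantized value of the preceding layers, at QP_n = QP_{n-1} - 6."""
    recon, layers = 0.0, []
    for n in range(n_enh + 1):            # base layer + enhancement layers
        q = qstep(base_qp - 6 * n)        # step size halves per layer
        level = round((coeff - recon) / q)
        layers.append(level)
        recon += level * q                # accumulated inverse-quantized value
    return layers, recon

layers, recon = fgs_layers(100.0, base_qp=36)
```

Each successive layer refines the reconstruction with a finer step, which is what lets a decoder stop after any layer and still obtain a progressively better approximation.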
In each enhancement layer, only the differences between the transformed coefficients and the accumulated inverse-quantized coefficients of the preceding layers are coded. According to the coefficients in the preceding layers, the coefficients in an enhancement layer are classified into two categories: new coefficients (NCs) and refinement coefficients (RCs). An NC is a coefficient that has never been significant in any preceding layer, and its whole value must be coded. An RC is a coefficient that has been significant in some preceding layer; its value is limited to {-1, 0, +1}. In Fig. 3, the quantized coefficients in the enhancement layers are classified into NCs and RCs, and the values of RCs are truncated to within -1 to 1.

To achieve progressive improvement over the entire frame, the FGS coding order applies multiple scans through every macroblock (MB) in one frame, instead of the traditional MB-by-MB order. As shown in Fig. 3, the coefficients of the entire frame are stored in frame memory, and the FGS scan and entropy coding of enhancement layers are frame-level operations. Fig. 4 shows the FGS coding order in JSVM 7 with, for simplicity, only four blocks in one frame and eight coefficients per block. In every scan, all blocks in the whole frame are examined in

turn. In the k-th scan, the k-th coefficient of each block in zigzag order is considered first, and there are two cases. If the k-th coefficient is an NC, all NCs up to the next significant one are coded, while RCs on the path are skipped. If the k-th coefficient is an RC, only this coefficient is coded. As a result, the coding order in Fig. 4(b) is {0,0,0,1}, B, 1, {0,0,1}, {0,1}, 1, E, D, ..., G, A, NC_end.

Fig. 3. Block diagram of FGS encoding in the H.264/AVC scalable extension (QP_n = QP_{n-1} - 6).

Fig. 4. Example of the FGS scan: (a) coefficients, (b) coding order.

3. EXTERNAL MEMORY BANDWIDTH OVERHEAD ANALYSIS

The main external bandwidth (BW) overhead of an SVC encoder compared to a previous H.264/AVC encoder falls into two categories. The first comes from the larger searching-range data requirement of ME; the second from the frame-level data access of FGS.

In previous video encoder designs [7], the Level C data reuse scheme is applied to save the external memory bandwidth of searching-range data. If the searching range (SR) is [-SR_H, SR_H) horizontally and [-SR_V, SR_V) vertically, the current block (CB) of size N x N and the corresponding search region are as shown in Fig. 5. The required SR memory size and external BW for one B-frame can be roughly modeled as

    Mem. size = (2N + 2SR_H) x (2SR_V + N) x 2,                      (1)
    BW = N x (2SR_V + N) x 2 x (# of MBs) x (frame rate),            (2)

where the factor of 2 accounts for the two reference frames of a B-frame. For a 4CIF 30 Hz video sequence with searching range SR_H = SR_V = 64, about 46 KBytes of memory and 219 MBytes/sec of bandwidth are required. However, if the temporal distance of key frames is taken into consideration, for a 4-level hierarchical B-frame structure composed of 16 frames, SR_H and SR_V must be enlarged according to the temporal distance, inducing even higher memory requirements.

For FGS, in the case of 4CIF 30 Hz with three enhancement layers, direct implementation of the three frame memories in Fig. 3 takes 3 x 704 x 576 x (13 bits / 8 bits) x 1.5 (luma and chroma) = 2.97 MBytes, where the coefficient length is assumed to be 13 bits. This is far too large to implement as on-chip SRAM, so storing these enhancement-layer coefficients in external DRAM is a must. However, the frame-level operation of the FGS scan then needs to access external memory at up to 178 MBytes/sec (2.97 MBytes x 30 frames per second x read/write).
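The figures above follow directly from Eqs. (1) and (2). A small sketch (Python, purely for checking the arithmetic) reproduces the 46 KByte / 219 MByte/sec ME numbers and the 2.97 MByte / 178 MByte/sec FGS numbers:

```python
def sr_mem_bytes(N, sr_h, sr_v, refs=2):
    # Eq. (1): Level C search-range stripe, doubled for the two
    # reference frames of a B-frame
    return (2 * N + 2 * sr_h) * (2 * sr_v + N) * refs

def sr_bw_bytes(N, sr_v, n_mb, fps, refs=2):
    # Eq. (2): each MB loads one new N-pixel-wide column of the stripe
    # per reference frame
    return N * (2 * sr_v + N) * refs * n_mb * fps

# 4CIF (704x576) at 30 Hz, SR_H = SR_V = 64, 16x16 MBs
n_mb = (704 // 16) * (576 // 16)              # 1584 MBs per frame
print(sr_mem_bytes(16, 64, 64))               # 46080 bytes, i.e. about 46 KBytes
print(sr_bw_bytes(16, 64, n_mb, 30) / 1e6)    # about 219 MBytes/sec

# FGS: three enhancement-layer frame memories, 13-bit coefficients,
# 1.5x for luma plus chroma; read/write doubles the scan bandwidth
fgs_mem = 3 * 704 * 576 * 1.5 * 13 / 8        # about 2.97 MBytes
print(fgs_mem / 1e6, fgs_mem * 30 * 2 / 1e6)  # about 2.97 MB, 178 MBytes/sec
```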

Fig. 5. The current block (CB) and search region for the block-matching motion estimation algorithm. SR_H and SR_V denote the horizontal and vertical searching range, respectively.

Note that these BW numbers become even higher when more than one spatial layer is considered. It is therefore necessary to reduce the large external memory bandwidth, both to relieve system bus congestion and to avoid the huge power consumption of external memory access.

Fig. 6. The concept of the adaptive spatial-temporal hierarchical ME scheme. The blue lines depict the whole frames (QCIF up to 720p HD); the yellow regions are the searching range regions loaded for ME, including the centric moving row buffer and the ±4 refinement candidates; the red lines represent the derived motion vectors.

4. PROPOSED HARDWARE-ORIENTED BANDWIDTH-SAVING ALGORITHMS

4.1. Adaptive Spatial-Temporal Hierarchical ME

To alleviate the large external BW and internal SR memory size caused by hierarchical B-frames, we adopt the concept of hierarchical ME in the SVC encoding structure. As shown in Fig. 6, in the smaller frames the full-search block-matching algorithm (FSBMA) is applied, and a sufficient SR is assigned to ensure the best coding efficiency. For larger frames such as 4CIF or HD resolution, since SVC utilizes a pyramid structure to achieve spatial scalability and the smaller frames must be encoded first, the base-layer motion vectors are upsampled to predict the likely locations of the current motion vectors. First, the upsampled motion vectors of the MBs in the same row are gathered to find the most probable vertical range, and Level C data reuse is performed in this region as a centric moving row buffer (CMRB). Note that the vertical extent of the CMRB is much narrower than the maximal vertical motion vector range, so the external BW can be reduced further.
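The paper does not spell out how the "most probable vertical range" of a row is derived from the upsampled base-layer motion vectors. A minimal sketch, assuming the buffer is centered on the median vertical MV of the row and that any MB whose predictor falls outside the window triggers a separate refinement load (both are assumptions for illustration):

```python
def plan_row_buffer(mv_y_row, half_height=16):
    """Centric moving row buffer (illustrative sketch, not the hardware).

    mv_y_row: upsampled base-layer vertical MVs, one per MB of the row.
    Returns the buffer center and the MB indices needing a refinement load.
    """
    center = sorted(mv_y_row)[len(mv_y_row) // 2]   # dominant motion (assumed: median)
    lo, hi = center - half_height, center + half_height
    # MBs whose predictor lies outside the buffer fall back to a small
    # refinement region (e.g. 32x32) loaded separately
    outliers = [i for i, mv in enumerate(mv_y_row) if not lo <= mv <= hi]
    return center, outliers

center, refine = plan_row_buffer([2, 3, 1, 40, 2])
```

In this toy row, one MB moves very differently from its neighbors, so only that MB pays for an extra refinement load while the rest share the narrow row buffer.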
To keep coding efficiency for sequences with large or diverse motion, once an upsampled motion vector predictor falls outside the CMRB, the small yellow region in Fig. 6 is loaded into SR memory for motion refinement. In Fig. 6, the refinement region is 32 x 32 pixels, owing to the ±4 refinement range and the 6-tap interpolation filter of fractional ME. Besides, to cover the larger SR requirement of key frames without much memory-size overhead, we combine the two SR memories of a B-frame into one larger SR memory for the uni-directional ME between key frames. As shown in Fig. 7, the SR of a key frame can thus be enlarged 1.5 times in both the horizontal and vertical directions compared to B-frames (key-frame ME SR: H ±96, V ±48, 25088 bytes, versus two B-frame SRs of H ±64, V ±32, 12800 bytes each). With these two techniques, the proposed adaptive spatial-temporal hierarchical ME efficiently reduces the external memory BW while limiting the internal SR memory size.

Fig. 7. The concept of reconfiguring the two B-frame SR memories to enlarge the key frame's SR.

4.2. Scan Bucket Algorithm for FGS

The main concept of the proposed scan bucket algorithm is to move the FGS scan in Fig. 3 from the frame level to the MB level. Although the coding order is composed of many scans through all blocks in the whole frame, it can be reorganized in a more convenient way. First, we decide which coefficients are coded in each scan of one block. The partially processed data are stored in internal memory, and the external memory is then exploited as a transpose memory to restore the correct coding order. Figure 8 shows the concept of this method, following the example in Fig. 4. When block 0 is processed at the MB level, its coefficients are analyzed according to the scan rule: {0,0,0,1} is put into bucket 0, NC_end into bucket 4, and A into bucket 5. Blocks 1 to 3 are then processed in turn. Once a bucket is full, its data are transferred to external memory. After all blocks are processed, the external memory is read in scan-by-scan coding order, as shown in Fig. 8. Since the data are well arranged at the MB level, external data access becomes regular and simple.

Fig. 8. Scan bucket algorithm corresponding to Fig. 4 (#: NC_end).

To further improve the external BW reduction, we propose early context modeling, as shown in Fig. 9. The binarizer and context modeller of CABAC are moved from the frame level to the MB level. In the entropy coding of FGS enhancement layers, context modeling of residual headers requires information from the adjacent (above and left) MBs, which is difficult to retrieve at the frame level. This cumbersome problem is solved by early context modeling at the MB level, where the relative positions between MBs are explicit. Note that early context modeling maintains the same entropy coding contexts and causes no quality degradation.

Fig. 9. Concept of early context modeling: the scan bucket algorithm, binarizer, and context modeller run at the MB level, while the binary arithmetic encoder of CABAC remains a frame-level operation.

5. CASE STUDY

In this section, for simplicity, we discuss the performance of the adaptive spatial-temporal hierarchical ME and the scan bucket algorithm separately. First, we take an SVC structure with 3 spatial layers as an example to evaluate the BW reduction of the adaptive spatial-temporal hierarchical ME. Table 1 shows the searching-range specification of the SVC encoder in this case study.

Table 1. Specification of the SVC encoder with 3 spatial layers, GOP = 16 frames

    Format              QCIF   CIF   4CIF
    Level 0     SR_H      48    96    192
                SR_V*     24    48     96
    Level 1     SR_H      32    64    128
                SR_V*     16    32     64
    Levels 2-4  SR_H      16    32     64
                SR_V*     16    32     64

*SR_V represents the maximal motion vector range. With the centric moving row buffer, the SR_V of the proposed method is only ±16.
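The MB-level bucketing of Section 4.2 can be sketched as follows. This is an illustrative Python model, not the hardware design, and it uses a small hypothetical two-block example rather than the Fig. 4 data: coefficients are tagged ('NC', value) or ('RC', value) in zigzag order, each block is analyzed exactly once, its coded units are dropped into per-scan buckets, and reading the buckets back in order reproduces the frame-level scan-by-scan coding order.

```python
def bucketize(block):
    """Analyze one block at the MB level: return (scan_index, unit) pairs.
    block: list of ('NC', v) or ('RC', v) tuples in zigzag order."""
    units, coded = [], [False] * len(block)
    for k, (kind, val) in enumerate(block):
        if coded[k]:
            continue
        if kind == 'RC':                      # refinement coeff: coded alone
            units.append((k, ('RC', val)))
            coded[k] = True
        else:                                 # NC: code the run of NCs up to
            run = []                          # the next significant one
            for pos in range(k, len(block)):
                kd, v = block[pos]
                if kd == 'RC':
                    continue                  # RCs on the path are skipped
                run.append(v)
                coded[pos] = True
                if v != 0:
                    break
            else:
                run.append('NC_end')          # no significant NC left
            units.append((k, run))
    return units

def scan_bucket_order(blocks):
    """One MB-level pass per block; buckets emulate the transpose memory."""
    n = max(len(b) for b in blocks)
    buckets = [[] for _ in range(n)]
    for block in blocks:
        for k, unit in bucketize(block):
            buckets[k].append(unit)
    # Reading bucket 0, 1, 2, ... gives the frame-level coding order
    return [u for bucket in buckets for u in bucket]

# Hypothetical two-block example (not the Fig. 4 data)
b0 = [('NC', 0), ('NC', 0), ('NC', 1), ('RC', 1), ('NC', 0)]
b1 = [('RC', -1), ('NC', 0), ('NC', 1), ('NC', 0), ('NC', 0)]
order = scan_bucket_order([b0, b1])
# order: [[0, 0, 1], ('RC', -1), [0, 1], ('RC', 1), [0, 0, 'NC_end'], [0, 'NC_end']]
```

Because blocks are visited in the same block order within every bucket, the flattened bucket contents match what the frame-level multi-scan would emit, while each block's coefficients are touched only once at the MB level.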
The baseline for comparison is the Level C data reuse scheme for FSBMA described in Section 3. Table 2 compares the internal SR memory size requirement and the external memory BW of Level C [6] and of the proposed adaptive spatial-temporal hierarchical ME with the centric moving row buffer; the numbers can be easily derived from the equations in Section 3.

Table 2. Comparison of memory requirements

                                   Level C   Proposed*
    Max. SR mem. size   QCIF          9.2        9.2
    (KBytes)            CIF          25.6       25.6
                        4CIF         86.5       29.7
    External mem. BW    QCIF         4.47       4.47
    (MBytes/sec)        CIF         29.84      29.84
                        4CIF       215.17      78.55
                        Total      249.48     112.86

*The 32 x 32 refinement SR regions are included. The number of loaded refinement blocks is derived from simulation.

The maximal required memory size and external memory BW of the two methods are the same in the QCIF and CIF formats. In the 4CIF format, however, the centric moving row buffer limits the vertical SR and applies 32 x 32 refinement blocks to maintain coding efficiency. As shown in Table 2, the required memory size of the proposed method for 4CIF is only 34% of that of the Level C scheme, quite similar to the requirement of the CIF format. The external memory BW is likewise largely reduced: by 63% in 4CIF and by 55% over the whole 3 spatial layers. Moreover, as shown in Fig. 10, the proposed method has almost the same quality as full-search ME.

Figure 11 shows the reduction of external memory bandwidth, simulated on 4CIF (704 x 576) 30 Hz sequences with GOP = 16 frames and only one spatial layer. The bandwidth of the direct implementation in Fig. 11 is derived as in Section 3. The results of the proposed method are simulated by Verilog-XL and averaged over many standard sequences; the number beside each bar is the percentage reduction relative to the bar above it. In total, up to 88% bandwidth reduction is attained when the scan bucket algorithm and early context modeling are used together.

6. CONCLUSION

A bandwidth-efficient encoder VLSI architecture framework for the H.264/AVC scalable extension is presented in this paper.

The main differences of SVC from non-scalable H.264/AVC in encoder hardware design are analyzed. To relieve the memory issues induced by hierarchical B-frames and FGS, two hardware-oriented algorithms, adaptive spatial-temporal hierarchical ME and the scan bucket algorithm, are introduced. The former utilizes the inter-layer information of the SVC pyramid structure and the proposed centric moving row buffer to limit the internal SR memory size and reduce the external BW by about 55%. The scan bucket algorithm replaces the irregular, frame-level memory access of FGS with regular MB-level access, reducing the external memory BW by about 88% compared to a direct implementation. The concept of the proposed bandwidth-efficient encoder framework can be further extended to other high-definition SVC encoder designs in the future.

Fig. 10. PSNR comparison of the proposed adaptive spatial-temporal hierarchical ME (refinement range ±4) and the full-search algorithm of JSVM 7.13 on (a) Soccer and (b) Crew, 4CIF 30 fps, GOP = 16, QP = 48/40 for QCIF/CIF.

Fig. 11. External memory bandwidth reduction of the proposed FGS scan bucket algorithm for a 4CIF 30 Hz sequence: direct implementation, 178 MBytes/sec; scan bucket algorithm, 36 MBytes/sec (80% reduction); plus early context modeling with context reduction, 21 MBytes/sec (a further 41% reduction).

7. REFERENCES

[1] ISO/IEC JTC1, "Joint Draft 8 of SVC Amendment," ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Doc. JVT-U201, Oct. 2006.

[2] ISO/IEC JTC1, "Joint Scalable Video Model 8.0," ISO/IEC JTC1/SC29/WG11 and ITU-T SG16 Q.6, Doc. JVT-U202, Oct. 2006.
[3] H. Schwarz, D. Marpe, and T. Wiegand, "Analysis of hierarchical B pictures and MCTF," in Proc. IEEE International Conference on Multimedia and Expo, 2006, pp. 1929-1932.

[4] C.-T. Huang, C.-Y. Chen, Y.-H. Chen, and L.-G. Chen, "Memory analysis of VLSI architecture for 5/3 and 1/3 motion-compensated temporal filtering," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[5] C.-Y. Chen, C.-T. Huang, Y.-H. Chen, C.-J. Lian, and L.-G. Chen, "System analysis of VLSI architecture for motion-compensated temporal filtering," in Proc. IEEE International Conference on Image Processing, 2005.

[6] J.-C. Tuan, T.-S. Chang, and C.-W. Jen, "On the data reuse and memory bandwidth analysis for full-search block-matching VLSI architecture," IEEE Trans. CSVT, vol. 12, no. 1, pp. 61-72, Jan. 2002.

[7] Y.-W. Huang et al., "A 1.3 TOPS H.264/AVC single-chip encoder for HDTV applications," in Proc. IEEE ISSCC, 2005, pp. 128-588.