ALGORITHM AND ARCHITECTURE DESIGN FOR INTRA PREDICTION IN H.264/AVC HIGH PROFILE


Tzu-Der Chuang, Yi-Hau Chen, Chen-Han Tsai, Yu-Jen Chen, and Liang-Gee Chen
DSP/IC Design Lab., Graduate Institute of Electronics Engineering, National Taiwan University, Taipei, Taiwan
Email: {peterchuang, ttchen, phenom, yjchen, lgchen}@video.ee.ntu.edu.tw

ABSTRACT

In this paper, we propose a novel two-stage intra prediction algorithm and hardware architecture that support the H.264/AVC high profile at 1080p HD size. The proposed DCT-based open-loop intra prediction algorithm predicts the sub-blocks in parallel with nearly no quality drop and skips unnecessary reconstruction loops to meet the real-time encoding constraint. With the proposed reconfigurable 8-pixel-parallel processing elements (PEs), the architecture performs intra prediction and reconstruction with almost 100% hardware utilization. The architecture was implemented in 90 nm technology with a gate count of 100k at 223 MHz. It is the first hardware architecture that can encode 1080p HD intra-frame sequences in real time with the H.264/AVC high profile.

1. INTRODUCTION

H.264/AVC is the newest video coding standard developed by the ITU-T and ISO/IEC Joint Video Team (JVT). The standard includes variable-block-size motion compensation, a simplified integer transform, multiple reference frames, and context-adaptive entropy coding. Moreover, it introduces a new intra prediction that uses neighboring pixels to predict the current coding block. Intra prediction significantly improves coding performance in intra frames, and even when motion estimation fails to find a good match, intra prediction has a good chance of further reducing the residues. To improve coding performance in high-end applications, a new amendment called the Fidelity Range Extensions (FRExt, Amendment I) was added to the H.264 standard in July 2004. The FRExt project introduced a suite of new profiles collectively called the High profiles [1].
The high profiles support all features of the prior Main profile and additionally support two main coding tools: the 8x8 integer transform and 8x8 intra prediction. For inter prediction, macroblocks (MBs) may be encoded with the 8x8 integer transform when the block size of the current sub-MB is 8x8 or larger. For intra prediction, the high profile introduces 8x8 spatial luma prediction by extending the concept of Intra 4x4 prediction, as shown in Fig. 1. Similar to Intra 4x4, Intra 8x8 has eight directional prediction modes and one DC prediction mode. The luma value of each sample in a given 8x8 block is predicted from neighboring reconstructed reference pixels according to the prediction mode. In addition, a distinguishing element of Intra 8x8 prediction is a second-order binomial pre-filtering process applied to the boundary reference pixels before the prediction step. Note that Intra 8x8 prediction must use the new 8x8 integer DCT introduced by the high profile. This new intra coding tool improves I-frame coding efficiency significantly [2].

Fig. 1. Illustration of the nine 8x8 luma prediction modes: 0 (vertical), 1 (horizontal), 2 (DC, the mean of the neighboring pixels), 3 (diagonal down-left), 4 (diagonal down-right), 5 (vertical-right), 6 (horizontal-down), 7 (vertical-left), and 8 (horizontal-up). The current pixels are predicted from neighboring pixels of already reconstructed blocks.

Several studies have been published on H.264 intra hardware architectures [3, 4, 5, 6]. These architectures focus on main profile intra prediction, which contains only the Intra 4x4 and Intra 16x16 modes, and none of them supports the Intra 8x8 mode. To support Intra 8x8 in hardware, several problems must be solved, such as critical data dependency, insufficient operating cycles, larger memory requirements, and unbalanced computation load. To solve these problems, we present an efficient two-stage intra prediction hardware architecture that supports the H.264 high profile at the 1080p HD, 30 fps specification.

Fig. 2. Zig-zag scan order for 4x4 blocks and 8x8 blocks. Row by row, the sixteen 4x4 blocks of an MB are numbered 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15.

2. DESIGN CHALLENGE OF H.264/AVC HIGH PROFILE INTRA PREDICTION

In previous H.264/AVC designs [3, 4, 5, 6], intra prediction for the baseline and main profiles is well developed for the D1 and HD 720p specifications. However, two main design challenges lower the efficiency of these designs once Intra 8x8 of the high profile is taken into consideration.

The first issue comes from the data dependency of intra prediction between sub-blocks. According to the H.264/AVC standard, the sub-blocks of the Intra 4x4 and Intra 8x8 modes are processed in the zig-zag scan order shown in Fig. 2, and 13 or 25 reconstructed pixels are required for prediction, as shown in Fig. 1. Since the reconstructed pixels become available only after the neighboring blocks have been predicted and reconstructed, the sub-blocks must be processed sequentially. The designs of Suh [5] and Ku [4] take about 1000 cycles to process the intra prediction of one MB to meet the HD 720p specification without Intra 8x8. These designs would require a much higher operating frequency if Intra 8x8 of the high profile were taken into consideration.

The other issue is throughput and hardware utilization. For the Intra 4x4 and Intra 16x16 modes, four-pixel parallelism is usually adopted [3, 4, 6]. For the Intra 8x8 mode, however, eight-pixel parallelism should be applied because of the 8x8 transform. The mismatched throughput of the different intra modes should be unified across the intra predictor generator, transform, and reconstruction to improve hardware utilization and processing capability.

3. PROPOSED HARDWARE ORIENTED INTRA PREDICTION ALGORITHM

3.1. DCT-based SATD Cost Function for Mode Decision

In the H.264/AVC reference software, the Hadamard transform is used to calculate the SATD for mode decision, because it has low computational complexity and no scaling effect. However, in the H.264/AVC encoding process, the DCT is adopted in the reconstruction loop. To estimate the bitrate more accurately, we use the 4x4 and 8x8 integer DCT as the cost function for Intra 4x4, Intra 16x16, and Intra 8x8 prediction mode decision, approximating the effect of transform and quantization in the H.264 encoding process. As mentioned in [4], the scaling effect after the DCT should be taken into account. For the Intra 4x4 mode, we choose the simplified scaling matrix in (1) to simplify the hardware architecture, where D is the 4x4 integer DCT matrix, X is the 4x4 residual block, and \otimes denotes element-wise multiplication. For the Intra 8x8 mode, since all scaling factors are very similar, we do not apply any scaling matrix.

\mathrm{SATD}(X) = \sum_{i,j} \left| \left( D X D^{T} \otimes \frac{1}{2} \begin{bmatrix} 2 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \\ 2 & 1 & 2 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix} \right)_{i,j} \right| \qquad (1)

Fig. 3 and Fig. 4 compare the JM 9.5 Hadamard-based result with the DCT-based result with Intra 16x16. Our proposed DCT-based SATD cost function for the H.264/AVC high profile improves quality by up to 0.3 dB compared with JM 9.5. The algorithms behind the results labeled "DCT-based without Intra 16x16" and "Proposed Two-Stage" are explained in Sec. 3.2 and Sec. 3.3.

Fig. 3. Performance comparison of different algorithms on the 720p sequences Optis and Raven (1280x720, 30 fps), 300 I-frames, CABAC encoding. Each panel plots PSNR (dB) against bitrate (Kbits); the curves include DCT-based with Intra_16x16, DCT-based without Intra_16x16, and Proposed Two-Stage.

Fig. 4. Performance comparison of different algorithms on the 1080p sequences Rush Hour and Vintage Car (1920x1080, 30 fps), 300 I-frames, CABAC encoding. Each panel plots PSNR (dB) against bitrate (Mbits); the curves include DCT-based with Intra_16x16, DCT-based without Intra_16x16, and Proposed Two-Stage.

Fig. 5. Intra prediction mode distribution of the high profile over 300 I-frames on the HD sequences Optis (1280x720) and Rush Hour (1920x1080). Each panel shows the percentage of Intra_4x4, Intra_8x8, and Intra_16x16 MBs for QP from 12 to 44.

Fig. 6. Illustration of open-loop prediction boundary pixels for blocks 0 and 1, marking which boundaries use original pixels and which use reconstructed pixels.

3.2. High Profile Intra Prediction Mode Distribution

In the high profile, the Intra 8x8 mode provides a trade-off between Intra 4x4 and Intra 16x16. Intra 8x8 has a larger prediction size and fewer sub-block headers than Intra 4x4, and more prediction modes than Intra 16x16. We therefore analyze the intra prediction mode distribution of the high profile based on the DCT-based cost estimation proposed in Sec. 3.1. In Fig. 5, Intra 16x16 is largely replaced by Intra 8x8 at low and medium bitrates and is rarely selected at any QP. According to Fig. 3 and Fig. 4, removing Intra 16x16 causes almost no quality drop. Hence, we remove the Intra 16x16 mode from our hardware design to simplify the hardware architecture and schedule with nearly no quality drop.

3.3. Hardware Oriented Open-loop Intra Prediction

To improve the processing parallelism limited by the data dependency described in Sec. 2, we propose an open-loop intra prediction scheme that uses original pixels instead of reconstructed pixels as boundary pixels for the intra predictors. The open-loop concept relies on the observation that original pixels are close to reconstructed pixels in our target high-definition applications, where the PSNR is often greater than [...] dB. For MB boundary pixels, the reconstructed pixels are still used because they are already available. Take Intra 4x4 as an example: when block 1 in Fig. 6 is processed, it uses the four original pixels (yellow pixels) as its left boundary pixels and the nine reconstructed pixels from the upper-row MB as its upper boundary pixels. As shown in Fig. 3 and Fig. 4, the proposed open-loop scheme causes only slight quality degradation compared with closed-loop DCT-based intra prediction and is still better than JM 9.5. With the proposed open-loop scheme, the sub-blocks can be predicted in parallel without waiting for the reconstruction loop of the neighboring blocks.

Fig. 7. Reconfigurable luma predictor for two 4x4 blocks, producing predicted values for the left and right 4x4 sub-blocks.

Fig. 8. Proposed high profile intra hardware architecture. Blocks legible in the diagram include the current MB buffer (48x64), MC data buffer (48x64), upper row buffer, assigner, chroma unit, DC buffer, multi-transform, mode decision, best mode register, entropy coder, coefficient buffer (48x64x2), and reconstructed pixel buffer (48x64).

Fig. 9. Proposed two-stage schedule of open-loop prediction and closed-loop reconstruction. The open-loop prediction stage takes 628 cycles: data loading (cycles 0 to 4), Intra_4x4 prediction (288 cycles, 9x4 = 36 cycles per pair of 4x4 blocks), Intra_8x8 prediction (288 cycles, 9x8 = 72 cycles per 8x8 block), and a further 48-cycle prediction step. Mode decision occupies cycles 628 to 634, and the closed-loop reconstruction stage (Intra_4x4 mode, or Intra_8x8 mode or inter mode) takes at most 272 cycles, ending at cycle 906.

4. PROPOSED HARDWARE ARCHITECTURE AND SCHEDULE

4.1. Reconfigurable 8-pixel parallelism PE design

To match the throughput of the 8x8 DCT in Intra 8x8 prediction, the parallelism of our architecture is set to 8 pixels. Since Intra 8x8 prediction is similar to Intra 4x4 prediction, we propose a reconfigurable intra luma predictor generator that produces either eight predictors for the Intra 8x8 mode or eight predictors for two 4x4 sub-blocks in the Intra 4x4 mode, as shown in Fig. 7. In addition, the modified multi-transform from our previous work [7] is adopted.
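The reconfigurable predictor generator of Sec. 4.1 can be pictured with a small behavioral sketch. The Python below is illustrative only: the function names, the configuration labels "8x8" and "2x4x4", and the restriction to the vertical, horizontal, and DC modes are assumptions made for this example, and the Intra 8x8 reference pre-filtering is omitted. It shows a single call producing eight predictors, either one row of an 8x8 block or one row from each of two side-by-side 4x4 sub-blocks.

```python
def dc_value(top, left):
    """Rounded mean of the top and left reference pixels (DC mode)."""
    refs = list(top) + list(left)
    return (sum(refs) + len(refs) // 2) // len(refs)


def predict_row(mode, top, left, row):
    """Predictors for one row of an NxN block, where N = len(top)."""
    n = len(top)
    if mode == "vertical":      # copy the pixel above each column
        return list(top)
    if mode == "horizontal":    # repeat the left neighbor of this row
        return [left[row]] * n
    if mode == "dc":            # flat block at the reference mean
        return [dc_value(top, left)] * n
    raise ValueError(f"unsupported mode: {mode}")


def eight_predictors(config, mode, refs, row):
    """Return 8 predictors per call (hypothetical model of an 8-pixel PE).

    config "8x8":   refs = (top8, left8), one row of an 8x8 block.
    config "2x4x4": refs = ((top4, left4), (top4, left4)), the same row
                    of the left and right 4x4 sub-blocks.
    """
    if config == "8x8":
        top, left = refs
        return predict_row(mode, top, left, row)
    if config == "2x4x4":
        (top_l, left_l), (top_r, left_r) = refs
        return (predict_row(mode, top_l, left_l, row)
                + predict_row(mode, top_r, left_r, row))
    raise ValueError(f"unsupported config: {config}")


# Example: two 4x4 sub-blocks predicted in the same call.
left_blk = ([10, 10, 10, 10], [20, 20, 20, 20])
right_blk = ([30, 31, 32, 33], [5, 6, 7, 8])
print(eight_predictors("2x4x4", "horizontal", (left_blk, right_blk), 2))
```

In an open-loop setting the left reference of the right sub-block would come from original pixels of the left sub-block, so both halves of the output can be computed in the same step.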
This multi-transform can be configured as two 4x4 DCT/IDCT units or one 8x8 DCT/IDCT unit for cost estimation and reconstruction. By using these reconfigurable 8-pixel-parallel PEs, the proposed hardware architecture, shown in Fig. 8, unifies the throughput and improves processing capability with excellent area efficiency.

4.2. Proposed two stage schedule

Unlike the previous prediction-reconstruction interleaved schemes [3, 4, 5], the schedule of the proposed architecture is divided into two stages, open-loop prediction and closed-loop reconstruction, as shown in Fig. 9. In the prediction stage, our architecture processes two 4x4 sub-blocks in parallel in the Intra 4x4 mode and proceeds to the next two sub-blocks immediately, without reconstruction, because of open-loop prediction. In this stage, only the best mode for each sub-block and the total MB mode cost are stored. In the reconstruction stage, only the selected mode is reconstructed. If the Intra 4x4 mode is selected, a luma 4x4 block and a chroma 4x4 block are reconstructed in parallel for higher hardware utilization, as shown in Fig. 9. This is because, in the H.264 decoding process, each 4x4 luma sub-block must be reconstructed in the zig-zag scan order of Fig. 2. If the Intra 8x8 mode or inter mode is chosen, only one 8x8 luma sub-block is processed at a time, and chroma reconstruction is executed after the four 8x8 luma blocks are done. This schedule lets the 8-pixel-parallel PEs achieve almost 100% hardware utilization and saves the operating cycles otherwise spent on unnecessary reconstruction. It takes only 906 cycles to process one MB.

Table 1. Gate count distribution of the proposed intra prediction architecture

Module              Gate count
Assigner            9.8k
Predictor           17.3k
Chroma              1.3k
Multi-transform     27.5k
Mode decision       6.9k
Quantization        19.3k
Inverse Quant.      9.2k
F and buffers       9.1k
Total               100.4k

4.3. Hardware Implementation Result and Comparison

The proposed hardware architecture was designed in Verilog HDL and synthesized in 90 nm technology. The total gate count is 100.4k, and the gate count of each component is listed in Table 1. A comparison with previous designs is given in Table 2. Compared with previous works, our design is the first hardware architecture that supports H.264 high profile intra prediction at the 1080p HD specification, and it can process inter-mode reconstruction without extra hardware cost or operating cycles.

Table 2. Comparison of previous works and the proposed architecture

Design                 [3]        [4]         [5]         Proposed
Technology             0.25 um    0.18 um     [...] um    0.09 um
Max. operation freq.   55 MHz     125 MHz     108 MHz     223 MHz
Max. target size       720x480    1280x720    1280x720    1920x1080
Gate count             74.6k      76.8k       192.4k      100.4k
On-chip memory         146 bit    9728 bit    18176 bit   12288 bit
Cycles/MB              <1300      <1080       <927        <906
Pixel parallelism      4          4           4           8
Decision method        DCT        DCT         Hadamard    DCT
Support Intra 8x8      No         No          No          Yes
Support 8x8 DCT        No         No          No          Yes
Support Inter Rec.     No         No          No          Yes
Freq. for 720p HD      N/A        117 MHz     108 MHz     98 MHz
Freq. for D1           54 MHz     [...] MHz   38.5 MHz    [...] MHz

5. CONCLUSION

In this paper, a hardware-oriented two-stage intra prediction algorithm and hardware architecture are proposed. The proposed algorithm and architecture break the intra prediction data dependency and improve hardware utilization with nearly no quality loss. Compared with previous works, our design supports more features at a lower operating frequency. The proposed architecture is the first hardware architecture that supports H.264/AVC high profile intra prediction at the 1080p HD, 30 fps specification.

6. REFERENCES

[1] G. Sullivan, P. Topiwala, and A. Luthra, "The H.264 advanced video coding standard: Overview and introduction to the fidelity range extensions," in SPIE Conference on Applications of Digital Image Processing XXVII, 2004.
[2] D. Marpe et al., "H.264/MPEG4-AVC fidelity range extensions: Tools, profiles, performance, and application areas," in Proc. IEEE ICIP, Sept. 2005, vol. 1, pp. 593-596.
[3] Y.-W. Huang et al., "Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder," IEEE Trans. on CSVT, vol. 15, no. 3, pp. 378-401, Mar. 2005.
[4] C.-W. Ku et al., "A high-definition H.264/AVC intra-frame codec IP for digital video and still camera applications," IEEE Trans. on CSVT, vol. 16, no. 8, pp. 917-928, Aug. 2006.
[5] K. Suh, S. Park, and H. Cho, "An efficient hardware architecture of intra prediction and TQ/IQIT module for H.264 encoder," ETRI Journal, vol. 27, 2005.
[6] Y.-W. Huang et al., "A 1.3TOPS H.264/AVC single-chip encoder for HD applications," in Proc. of IEEE ISSCC, 2005, pp. 128-588.
[7] T.-D. Chuang, Y.-H. Chen, C.-H. Tsai, and L.-G. Chen, "Analysis and architecture design for multi-transform for H.264/AVC high profile," in International SoC Design Conference, 2006.
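
As a cross-check of the operating frequencies reported for the proposed design, the sketch below derives the minimum clock frequency from the 906 cycles per MB stated in Sec. 4.2. It is a back-of-the-envelope calculation, not part of the paper: it assumes 16x16 MBs, frame dimensions rounded up to a whole number of MBs, and 30 frames per second, and the function name is hypothetical.

```python
import math


def required_mhz(width, height, fps, cycles_per_mb):
    """Minimum clock (MHz) to process every MB of every frame in real time."""
    # Assumption: partial MBs at the frame edge are padded to full 16x16 MBs.
    mbs_per_frame = math.ceil(width / 16) * math.ceil(height / 16)
    return mbs_per_frame * fps * cycles_per_mb / 1e6


CYCLES_PER_MB = 906  # cycles per MB reported for the proposed schedule

for name, (w, h) in {"720p": (1280, 720), "1080p": (1920, 1080)}.items():
    print(f"{name}: {required_mhz(w, h, 30, CYCLES_PER_MB):.1f} MHz")
```

Under these assumptions the 1080p requirement comes to about 221.8 MHz, which fits within the 223 MHz maximum operating frequency, and the 720p requirement of about 97.8 MHz is consistent with the 98 MHz listed in Table 2.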