A Dynamic Quality-Adjustable H.264 Video Encoder for Power-Aware Video Applications


TCSVT

Hsiu-Cheng Chang, Jia-Wei Chen, Bing-Tsung Wu, Ching-Lung Su, Jinn-Shyan Wang, and Jiun-In Guo

Abstract: This paper proposes a dynamic quality-adjustable H.264 Baseline Profile (BP) video encoder that comprises 470K gates and 13.3 Kbytes of SRAM in a core size of 4.3 x 4.3 mm² using TSMC 0.13 µm 1P8M CMOS technology. By exploiting parameterized algorithms for motion estimation and intra prediction, the proposed design can dynamically configure its encoding modes to trade off power consumption against video quality for various video encoding applications. In addition, the proposed Basic Unit (BU) based rate control hardware maintains a constant and stable bit rate for network video transmission. The design achieves real-time H.264 video encoding of CIF, D1, and HD720 at 30 fps with 7 mW to 25 mW, 27 mW to 162 mW, and 122 mW to 183 mW power dissipation, respectively, in different quality modes.

Index Terms: Quality-adjustable, H.264, baseline profile, video encoder, HD720

I. INTRODUCTION

ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) jointly developed the H.264/AVC video standard [1] for next-generation multimedia coding applications. The H.264 video encoder combines several efficient coding techniques, including variable block size motion estimation and motion compensation with up to quarter-pixel precision, intra prediction with multiple block sizes (16x16/4x4), in-loop de-blocking filtering, and context-adaptive entropy coding. These techniques achieve high coding efficiency by providing more accurate estimation results at the cost of much higher computational complexity [2]. As a result, the computational complexity of H.264 video coding is much higher than that of the previous MPEG standards, so real-time H.264 video coding calls for dedicated hardware designs.
Manuscript received December 15, 2008; revised March 12, 2009, and May 22, This work was supported by the National Science Council of Taiwan under Grant NSC E This paper was recommended by Associate Editor Justin Ridge. H.-C. Chang, B.-T. Wu, and J.-I. Guo are with the Department of Computer Science and Information Engineering, National Chung-Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: changhsc@cs.ccu.edu.tw; wupt@cs.ccu.edu.tw; jiguo@cs.ccu.edu.tw). J.-W. Chen and J.-S. Wang are with the Department of Electronics Engineering, National Chung-Cheng University, Chia-Yi, 621 Taiwan, R.O.C. (e-mail: 92jiawei@vlsi.ee.ccu.edu.tw; ieegsw@ccu.edu.tw). C.-L. Su is with the Department of Electronics Engineering, National Yunlin University of Science and Technology, Yun-lin, Taiwan, R.O.C. (e-mail: kevinsu@yuntech.edu.tw). Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to pubs-permissions@ieee.org.

In addition, improving the hardware efficiency of video coding LSIs such as MPEG-4/H.264 encoders is a recent design trend in multimedia systems, aiming at high-throughput designs for high-definition (HD) video [3, 8, 9] and low-power designs for portable video [4, 5]. Each of these designs targets one specific application: either HD video with high resolution or portable video with lower resolution. A dual-mode video codec design [10] supporting H.264/MPEG-4 not only supports HD720 (1280x720) video encoding but also achieves low power consumption. However, it is still designed for middle or high video resolutions with good picture quality. All of the dedicated hardware architectures for H.264 video encoders [3, 4, 7, 8, 9, 10] lack the flexibility of programmable multimedia processors, which can adjust video quality by selecting different video coding algorithms at run time.
This flexibility benefits power-aware video applications that need to trade off video coding quality against power consumption. Although the design in [5] proposed an adaptive power-aware fast motion estimation algorithm that trades off video coding quality and power consumption by selecting different coding parameters, it supports only low and middle video resolutions, up to H.264 SDTV (720x480) video encoding, in order to reduce power consumption. Moreover, state-of-the-art multimedia processors are also limited to H.264 D1 video encoding [6]. Therefore, achieving both real-time encoding of high-resolution video and dynamic quality adjustability for power-aware video applications requires overcoming the challenge of making dedicated H.264 encoder hardware architectures configurable.

To achieve high throughput, low power consumption, and dynamic quality adjustability, we propose a dynamic quality-adjustable H.264 BP video encoder that supports video resolutions from QCIF to HD720 and multiple quality levels for the same video (e.g., D1 video) when operating at different clock frequencies. The proposed design exploits parameterized motion estimation and intra coding algorithms that can be dynamically configured to operate in different quality modes with different computational complexity. This allows the design to run at different clock rates, and thus at different power levels, for a given video resolution. In addition, we propose several design techniques to reduce the computational complexity of the key processing modules in H.264 video encoding, including a two-stage fast integer motion estimation algorithm, a fractional motion estimation algorithm with block-size trend prediction, fast luminance intra 4x4 search algorithms (i.e., the Context Correlation Search Algorithm and the Probability Context Correlation Search Algorithm), a fast chrominance intra search algorithm (i.e., the Quarter Macro Block Search Algorithm), and a high-throughput scanning scheme for entropy coding.

Furthermore, compressed multimedia content is now often transmitted through heterogeneous networks, in which maintaining a constant and stable bit rate is of great importance for good video quality. The rate control algorithm in the H.264 reference software JM [12] consists of three levels: Group of Picture (GOP) level, frame level, and BU level. Among them, BU-level rate control performs better than frame-level rate control in allocating bits at a constant rate for video streaming. None of the existing MPEG-4/H.264 video encoder designs [3, 4, 5, 7, 8, 9, 10] realizes a BU-based rate control algorithm in hardware; they use frame-level rate control instead. The reason is the strong data dependency of BU-based rate control, which makes it difficult to realize in a pipelined H.264 video encoder without adding latency from the sequential rate control processing. To solve this problem, we propose a BU-based rate control algorithm for the proposed H.264 video encoder that eliminates the data dependency of the original H.264 rate control. The proposed rate control algorithm provides better video quality than JM frame-level rate control and facilitates hardware realization in H.264 video encoders.
Compared to state-of-the-art H.264 BP video encoder designs, the proposed H.264 video encoder, with 470K gates and 13.3 Kbytes of SRAM, not only achieves a lower gate count and smaller internal memory, but also supports the unique feature of quality adjustability to trade off video coding quality against power consumption. Moreover, it achieves adjustable video encoding at 7mW-to-25mW for CIF@30fps, 27mW-to-163mW for D1@30fps, and 122mW-to-183mW for HD720@30fps when operated in four different quality modes, i.e., QS0, QS1, QS2, and QS3. The QS0 mode provides the best quality with the highest operating frequency among the four quality modes. The QS3 mode, in contrast, gives the lowest quality but the lowest power consumption. At its maximum performance, the proposed design encodes HD1080 video at 20 fps when operated at 108 MHz in QS3 mode.

The rest of this paper is organized as follows. Section II presents the proposed H.264 video encoder. Section III discusses the implementation and verification of the proposed design. Section IV evaluates the performance of the proposed design against existing ones. Finally, Section V concludes the paper.

Fig. 1. Block diagram of the proposed H.264 BP video encoder.

II. PROPOSED H.264 VIDEO ENCODER

Fig. 1 shows the architecture of the proposed dynamic quality-adjustable H.264 video encoder design. Five key functional modules are optimized in the proposed design: Integer Motion Estimation (IME), Fractional Motion Estimation (FME), Intra Coding, In-Loop Filter (ILF), and Entropy Coding (EC). To simplify the encoding flow and efficiently eliminate data dependency, we adopt a MacroBlock (MB) level pipelining schedule. The proposed design has four pipeline stages in the order IME, FME, Intra Coding, and EC/ILF. The EC and ILF modules are placed in the same pipeline stage to improve performance.
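The MB-level pipeline and the quoted operating point (HD1080 at 20 fps with a 108 MHz clock) imply a cycle budget per macroblock. The following is a minimal sketch of that arithmetic, assuming that in steady state every pipeline stage must finish one MB within the same budget and that frame dimensions are rounded up to whole 16x16 macroblocks. The function names are illustrative and do not come from the paper.

```python
def macroblocks_per_frame(width, height, mb_size=16):
    """Number of 16x16 macroblocks in a frame, rounding partial blocks up."""
    mbs_x = -(-width // mb_size)   # ceiling division
    mbs_y = -(-height // mb_size)
    return mbs_x * mbs_y


def cycles_per_mb(clock_hz, width, height, fps):
    """Cycles available per macroblock for real-time encoding.

    Assumption: with MB-level pipelining, each stage gets this many
    cycles for one MB in steady state.
    """
    return clock_hz / (macroblocks_per_frame(width, height) * fps)


if __name__ == "__main__":
    # Operating point quoted in the text: HD1080 at 20 fps, 108 MHz clock.
    print(macroblocks_per_frame(1920, 1080))
    print(cycles_per_mb(108e6, 1920, 1080, 20))
```

Under these assumptions an HD1080 frame holds 8160 macroblocks, which leaves roughly 660 cycles per macroblock per stage at 108 MHz.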
To support H.264 video encoding with adjustable quality, the System Controller provides configurations for both the ME and Intra Coding algorithms. The System Controller is implemented as a complex FSM. To improve video quality, a pipelined BU-based rate control scheme is used to allocate bits efficiently. To reduce internal memory size, a Predictive Data Store Buffer (PDSB) controller is adopted to efficiently access the intermediate data for video encoding through AHB-based SDR memory. The following subsections describe the major design techniques that give the proposed design its high-throughput and dynamic quality-adjustable features.

A. Integer Motion Estimation

To support dynamic quality-adjustable coding, the first problem is to develop flexible algorithms that can be implemented in hardware. Among existing IME designs, some [13-19] are based on the full-search block matching algorithm. Although full search incurs no quality loss, it is difficult to achieve quality adjustability with a full-search block matching IME architecture. Other IME designs [20-24] use fast-search block matching algorithms, such as Three Step Search (TSS), pixel sub-sampling, data-adaptive, and MVFAST and diamond search algorithms.

Fig. 2. Proposed HDLFS-IME algorithm.

Fig. 3. Illustration of the proposed HDLFS-IME algorithm with three configurable parameters.

Fig. 4. The complexity ratio of the proposed IME algorithm in different configurations over the FSBM algorithm. The plotted configurations range from (2,1,1) to (5,4,4).

These IME designs mainly focus on reducing computational complexity at the cost of video quality in terms of PSNR. However, they lack the flexibility to dynamically adjust the number of search points in their ME algorithms. The design in [5] proposed an H.264 video encoder architecture with a Four-Step Search (FSS) IME algorithm that trades off video coding quality and power consumption by selecting different iteration numbers of initial points in the variable block size FSS algorithm. However, it may suffer a larger quality drop because it searches only 25 points per reference frame during IME. Therefore, to effectively reduce the hardware cost and dynamically adjust the video quality, we propose a low-complexity, highly flexible integer motion estimation algorithm called the Half-word Down-sample Local Full Search algorithm (denoted as the HDLFS-IME algorithm), which provides both accurate motion estimation and flexible configuration for switching between quality modes dynamically. The data flow of the proposed HDLFS-IME algorithm is shown in Fig. 2. The HDLFS-IME operations with their three parameters are illustrated in Fig. 3. The proposed HDLFS-IME algorithm consists of two stages.
Stage 1 of the proposed HDLFS-IME algorithm performs block matching on 2-D down-sampled search points in the search window, with the sampling rate set by the Down Sample Rate, and selects a variable number of good candidates (set by the Candidate Number) for Stage 2. At the beginning of Stage 1, the current and reference pixels are truncated to half-word data (i.e., 4 bits instead of 8 bits) to reduce both memory bandwidth and hardware cost. The search points are down-sampled along both the horizontal and vertical directions, starting from the center of the search range. In this stage, only the 16x16 MB partition size is processed. The first stage thus sieves out several candidates that serve as centers for the local full search operations in the second stage.

Stage 2 of the proposed HDLFS-IME algorithm performs local full-search block matching, with the range set by the Local Full Search Range, on the search points around the candidates selected in Stage 1 to determine the best motion vectors for all 41 block sizes in IME.

We next analyze the complexity of the proposed HDLFS-IME algorithm and of the Full Search Block Matching IME algorithm (denoted as the FSBM-IME algorithm). The total number of search points for one MB using the FSBM-IME algorithm (denoted as $SP_{FSBM}$) is given in equation (1), where the search range is [-n, n]. Using the proposed HDLFS-IME algorithm, the number of search points for one MB in the first stage (denoted as $SP_{HDLFS\_st1}$) is given in equation (2), and the number of search points in the second stage (denoted as $SP_{HDLFS\_st2}$) is given in equation (3).
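The two stages described above can be sketched in software as follows. This is a simplified illustration and not the authors' implementation: it evaluates only the 16x16 SAD in both stages (the paper derives motion vectors for all 41 block sizes in Stage 2), it assumes the down-sampled grid consists of offsets that are multiples of z counted from the search center, and it clamps the local search to [-n, n]. The function and parameter names are hypothetical.

```python
import random


def sad(cur, ref, rx, ry, size=16, shift=0):
    """Sum of absolute differences between `cur` and the block of `ref` at (rx, ry).

    shift=4 keeps only the 4 most significant bits of each 8-bit pixel,
    mimicking the half-word truncation used in Stage 1.
    """
    total = 0
    for j in range(size):
        for i in range(size):
            total += abs((cur[j][i] >> shift) - (ref[ry + j][rx + i] >> shift))
    return total


def hdlfs_ime(cur, ref, x0, y0, n, z, y, m):
    """Two-stage search around block position (x0, y0).

    n: search range [-n, n]; z: down-sample ratio;
    y: number of candidates kept; m: local full search range [-m, m].
    Returns ((dx, dy), sad_value) for the 16x16 block only.
    """
    # Stage 1: truncated-pixel SAD on the down-sampled grid of search points.
    grid = [d for d in range(-n, n + 1) if d % z == 0]
    stage1 = sorted(
        (sad(cur, ref, x0 + dx, y0 + dy, shift=4), dx, dy)
        for dy in grid for dx in grid
    )
    candidates = [(dx, dy) for _, dx, dy in stage1[:y]]

    # Stage 2: full-precision local full search around each candidate.
    best = None
    for cx, cy in candidates:
        for dy in range(max(-n, cy - m), min(n, cy + m) + 1):
            for dx in range(max(-n, cx - m), min(n, cx + m) + 1):
                cost = sad(cur, ref, x0 + dx, y0 + dy)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return (best[1], best[2]), best[0]


if __name__ == "__main__":
    random.seed(1)
    ref = [[random.randrange(256) for _ in range(64)] for _ in range(64)]
    # Current block cut from the reference frame at displacement (4, -2).
    cur = [row[28:44] for row in ref[22:38]]
    print(hdlfs_ime(cur, ref, 24, 24, n=8, z=2, y=2, m=1))
```

In the demo, the current block is copied from the reference frame at a displacement that lies on the Stage 1 grid, so the search should recover that displacement with zero SAD. On natural video, the quality of the result depends on how well the truncated, down-sampled Stage 1 search approximates the true SAD surface.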
$$SP_{FSBM} = (2n+1)^2 \tag{1}$$

$$SP_{HDLFS\_st1} = \left(\frac{n}{z}\times 2 + 1\right)^2 \tag{2}$$

$$SP_{HDLFS\_st2} = y\,(2m+1)^2 \tag{3}$$

$$\frac{SP_{HDLFS}}{SP_{FSBM}} = \frac{\left(\frac{n}{z}\times 2 + 1\right)^2 + y\,(2m+1)^2}{(2n+1)^2} \tag{4}$$

The indexes z, y, and m denote the down-sampling ratio, the number of candidates, and the local search range [-m, m], respectively. The complexity ratio of the proposed HDLFS-IME algorithm over the FSBM-IME algorithm is given in equation (4). Fig. 4 plots the complexity ratio of the proposed IME algorithm over the FSBM-IME algorithm for different (z, y, m) configurations, where n is equal to 16. For example, the complexity ratio is 9.6% when (n, z, y, m) equals (16, 5, 2, 2). The proposed HDLFS-IME algorithm thus reduces the computational complexity by about 90.4% compared to the FSBM-IME algorithm. The major configurable parameters of the proposed HDLFS-IME algorithm are listed in Table I.
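As a quick check of equations (1) to (4), the search-point counts can be evaluated directly. The sketch below reads the Stage 1 count as (n/z x 2 + 1)^2 without rounding n/z; this reading is an assumption, chosen because it reproduces the 9.6% figure quoted for (n, z, y, m) = (16, 5, 2, 2). The function names are illustrative.

```python
def sp_fsbm(n):
    """Search points per MB for full search over [-n, n], equation (1)."""
    return (2 * n + 1) ** 2


def sp_hdlfs(n, z, y, m):
    """Search points per MB for HDLFS-IME, equations (2) and (3) summed.

    Assumption: n / z is not rounded.
    """
    stage1 = (n / z * 2 + 1) ** 2
    stage2 = y * (2 * m + 1) ** 2
    return stage1 + stage2


def complexity_ratio(n, z, y, m):
    """Ratio of HDLFS-IME to FSBM-IME search points, equation (4)."""
    return sp_hdlfs(n, z, y, m) / sp_fsbm(n)


if __name__ == "__main__":
    print(f"{complexity_ratio(16, 5, 2, 2):.3%}")
```

For (16, 5, 2, 2) this gives 54.76 + 50 = 104.76 search points against 1089 for full search, i.e., about 9.6%.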

Fig. 5. Comparison of PSNR performance of the proposed algorithm in different configurations with the JM full search algorithm for QCIF sequences at 256/512 kbit/s.

Fig. 6. Comparison of PSNR performance of the proposed algorithm in different configurations with the JM full search algorithm for CIF sequences at 768/1024 kbit/s.

Fig. 7. Comparison of PSNR performance of the proposed algorithm in different configurations with the JM full search algorithm for SDTV sequences at 1536/2048 kbit/s.

Fig. 8. Comparison of PSNR performance of the proposed algorithm in different configurations with the JM full search algorithm for HD720p sequences at 3072/7168 kbit/s.

TABLE I. THE CONFIGURABLE IME PARAMETERS IN THE PROPOSED DESIGN

Fig. 9. Block diagram of the proposed IME architecture based on the HDLFS-IME algorithm.

With these parameters, different configurations can meet the requirements of different applications. For example, portable devices usually need low power and acceptable quality. In that case, a larger down-sample ratio, fewer candidates, and a smaller local full search range reduce the power consumption. For high-definition television applications, fine image quality is the major design consideration. In that case, a smaller down-sample ratio, more candidates, and a larger local full search range generate more accurate motion vectors.

Fig. 5 to Fig. 8 show the PSNR performance under different HDLFS-IME parameters and different bit rates compared to FSBM-IME in JM9.3 with the following settings: Intra Period: 1I29P, Search Range: 32x32, Reference frame number: 1, Rate Distortion Optimization (RDO): Off, Rate Control: Frame-based rate control. The test sequences cover the QCIF (176x144), CIF (352x288), SDTV (720x480), and HD720 (1280x720) video formats. From the figures, we draw the following conclusions. First, the larger the Down Sample Ratio, the larger the PSNR drop.
Second, the larger the Candidate Number, the smaller the PSNR drop. Third, the larger the Local Full Search Range, the smaller the PSNR drop.

Fig. 10. (a) The architecture of the proposed processing element (PE); (b) The architecture of the proposed absolute difference (AD) module.

Fig. 9 shows the proposed IME architecture based on the HDLFS-IME algorithm. Benefiting from the low complexity of the proposed HDLFS-IME algorithm, the proposed IME architecture has low hardware cost with good video quality. It also provides dynamic quality adjustability through the address generator and stage controllers, which supply the configurable parameters that configure the hardware architecture. In addition, to reduce hardware cost, the same hardware is shared between the IME operations of both stages of the HDLFS-IME algorithm. The IME operations of the first and second stages are similar, except for the bit truncation and pixel down-sampling used in the first stage. In the first stage, two processing elements (PEs) in the SAD Calculator process two 4x4 blocks per cycle, so eight cycles are required to calculate the SAD of one MB, as shown in Fig. 10 (a). The dedicated PE architecture contains sixteen absolute difference (AD) modules. The input pixels of the current data and reference data are each split into two 4-bit halves to realize pixel bit truncation. Thus, the AD module in the PEs can process two truncated pixels at once, such as MSB1 and MSB2 of the reference data in Fig. 10 (b), which doubles the throughput. In the second stage, the reference pixels and current pixels are not truncated, to ensure the accuracy of the SAD. After sixteen SADs for the 4x4 blocks have been generated, the Mode Generator produces the SADs for all the other modes, such as 16x16, 16x8, 8x16, 8x8, 8x4, and 4x8. Using TSMC 0.13 µm CMOS technology with a frequency constraint of 150 MHz, the proposed design costs 59.8K gates and requires a 4-Kbit current pixel buffer and a 22-Kbit reference pixel buffer to support a search range of [-16, 16].

B. Fractional Motion Estimation

Fig. 11. Sketch map of the 2-step block matching search of FME in JM9.3 (legend: integer pixels, half pixels, quarter pixels).

In the H.264 reference software JM9.3, the FME algorithm adopts 2-step block matching to examine every candidate block in the search range, as shown in Fig. 11. The center integer pixel is the best-matched point selected by IME for each MB partition. First, the 8 half pixels around the center pixel are processed to select the best-matched half-precision pixel. Then, the 8 quarter pixels around the best-matched half-precision pixel are searched to obtain the best-matched quarter-precision pixel of each partition. After the best-matched quarter-precision pixels of each MB partition are selected, every MB partition combination is examined to find the best-matched combination. Most existing FME designs [26-28] adopt this 2-step block matching algorithm for hardware realization, either using array processing for high throughput or exploiting data reuse in the partial sum of absolute differences (SAD) to reduce hardware cost. Another FME design [29] is based on an algorithm called A Single Iteration Fractional Motion Estimation Algorithm (SIFME), which searches six candidates in total: two square points and four triangle points. Whatever FME algorithm these designs use, the main idea is to predict the direction of the fractional motion vector in the first stage and then search several points around the best first-stage candidate. They use [-0.75, +0.75] as the FME search range along both the X- and Y-directions. As described previously, we propose the HDLFS-IME algorithm to calculate the integer motion vectors for an MB. In Stage 2 of the HDLFS-IME algorithm, we search around the integer candidates with a local full search block matching operation.
If we assume the best integer motion vector has been found in the IME stage, the FME stage only needs to take [-0.5, +0.5] as the search range around this best integer candidate. It is not necessary to search an area larger than [-0.5, +0.5]. Based on this analysis, to reduce the complexity of FME, we propose a Cluster Selection Fractional Motion Estimation algorithm (denoted as the CS-FME algorithm) with a [-0.5, +0.5] search range. The proposed CS-FME algorithm adopts the integer motion vector (IMV) obtained from the proposed HDLFS-IME algorithm as the search center and takes [-0.5, +0.5] as the search range along both the X- and Y-directions. Using a full-search block matching algorithm, we perform fractional motion estimation over the resulting 25 candidate points in each block mode. Fig. 12 shows Cluster Selection and Block Size Trend Prediction (BSTP) in the proposed CS-FME algorithm. The FME operations on the 41 modes of IME motion vectors are separated into two clusters: Cluster 1 (FME on the 16x16, 16x8, 8x16, and 8x8 block sizes) and Cluster 2 (FME on the 8x4, 4x8, and 4x4 block sizes). Adjustable quality is provided by selecting different numbers of clusters for FME. In the proposed CS-FME algorithm, a BSTP scheme is adopted to skip the unnecessary FME operations on the IME modes of Cluster 2 if the IME cost in mode is lower than that in 8x8 mode. Fig. 13 shows the 25 candidate points in the proposed CS-FME algorithm.

Fig. 12. Cluster Selection and BSTP algorithm in the proposed CS-FME algorithm.

Fig. 13. Processing candidate points in the proposed CS-FME algorithm.

Fig. 14. Comparison of PSNR performance with JM when using Cluster 1 + Cluster 2 in the proposed CS-FME algorithm.

Compared to JM9.3, the PSNR performance of the proposed CS-FME algorithm is shown in Fig. 14 for Cluster 1 + Cluster 2 and in Fig. 15 for Cluster 1 only. As mentioned previously, the FME search algorithm in JM9.3 is a 2-stage search algorithm, and the Lagrangian mode decision is adopted to determine the best MB partition. The test sequences cover the QCIF, CIF, SDTV, and HD720 video resolutions. The other settings are as follows: 1) Search range: 32x32, 2) Intra period: 30 (1I29P), 3) RDO: Off, 4) Reference frame number = 1, 5) RC algorithm: Frame-based, 6) Bit rates of 128 ~ 512 kbit/s for QCIF, 512 ~ 896 kbit/s for CIF, 1024 ~ 1792 kbit/s for SDTV, and 2048 ~ 2816 kbit/s for HD720. Fig. 14 and Fig. 15 show that the proposed CS-FME algorithm achieves better PSNR performance at larger video resolutions.

The architecture of the proposed quality-adjustable CS-FME design is shown in Fig. 16. The MV/Mode SRAM stores the 41 integer motion vectors (IMVs) from IME. The Quality Adjustable Controller controls the order of the FME for the different block-size partitions and performs the mode selection determined by the cluster parameter. The MV Cost Calculator can be divided into three parts. The first part calculates the cost of the reference frame number. This cost is always zero because only one reference frame is supported. The second part calculates the cost of the motion vectors.
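The CS-FME search window and cluster selection described in this subsection can be sketched as follows. This is an illustrative sketch only: it assumes the 25 candidate points are the quarter-pel positions within [-0.5, +0.5] of the integer motion vector, and it treats the BSTP decision as a flag supplied by the caller rather than implementing the cost comparison itself. All names are hypothetical.

```python
# Block sizes handled by each cluster, as listed in the text.
CLUSTER_1 = ["16x16", "16x8", "8x16", "8x8"]
CLUSTER_2 = ["8x4", "4x8", "4x4"]


def cs_fme_candidates(imv_x, imv_y):
    """Return the candidate positions around an integer motion vector.

    Positions are expressed in quarter-pel units, so the integer MV (x, y)
    maps to (4x, 4y) and offsets -2..+2 cover -0.5..+0.5 pixels.
    Assumption: candidates lie on a quarter-pel grid, giving 5 x 5 points.
    """
    return [
        (4 * imv_x + dx, 4 * imv_y + dy)
        for dy in range(-2, 3)
        for dx in range(-2, 3)
    ]


def fme_block_sizes(use_cluster2, bstp_skip=False):
    """Block sizes on which FME is run for the selected quality setting.

    use_cluster2: quality setting that enables the small block sizes.
    bstp_skip: True when block-size trend prediction decides that
    Cluster 2 can be skipped for this MB (decision made by the caller).
    """
    sizes = list(CLUSTER_1)
    if use_cluster2 and not bstp_skip:
        sizes += CLUSTER_2
    return sizes


if __name__ == "__main__":
    print(len(cs_fme_candidates(3, -1)))
    print(fme_block_sizes(use_cluster2=True, bstp_skip=True))
```

Restricting the window to a fixed 5 x 5 grid means the same candidate set is evaluated for every block mode, which suits a full-search hardware datapath.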
The IMVs from the MV/Mode SRAM are transmitted to the MV Cost Calculator to calculate the cost of the motion vectors corresponding to the 25 fractional pixels in the search range. The third part calculates the SATD to obtain the cost of the pixel differences. In this part, we first calculate the half and quarter pixels with the Interpolation Unit. Then, we use the interpolated data to calculate the SATD. To generate the interpolated data efficiently and increase the throughput, the Interpolation Unit can operate in both directions and the SATD architecture processes 8x4 blocks instead of 4x4 blocks. When the Interpolation Unit operates on 8x4 blocks, the horizontal and vertical interpolation filters process 14 integer pixels and 4 integer pixels in one cycle, respectively. This is why the FME Search Window SRAM (SWS) has to be partitioned into 6 banks with 4 pixels per bank. Fig. 17 shows an example of the filtering operations between the Interpolation Unit and the Search Window SRAM (SWS). After the fractional interpolation, the interpolated data are used to perform the SATD calculation. Fig. 18 shows the architecture of the SATD Calculator. It contains 20 Processing Elements (PEs) in two groups: 10 PEs for processing the left 4x4 block and 10 PEs for processing the right 4x4 block. This architecture accelerates the processing of the 16x16, 16x8, 8x16, 8x8, and 8x4 modes. For the 4x8 and 4x4 modes, the 10 PEs on the right side of the SATD Calculator are disabled to reduce power consumption. Finally, the Best Mode Selector decides the best combination for the FME. Using TSMC 0.13 µm CMOS technology with a frequency constraint of 150 MHz, the proposed design costs 180.2K gates and requires 1555 cycles/MB and 557 cycles/MB to realize the FME operations for (Cluster 1 + Cluster 2) and Cluster 1, respectively.

Fig. 15. Comparison of PSNR performance with JM when only using Cluster 1 in the proposed CS-FME algorithm.

Fig. 16. Architecture of the proposed CS-FME design.

Fig. 17. Example of filtering operations between the Interpolation Unit and the FME Search Window SRAM (SWS). The figure shows horizontal pixels mapped to SWS Bank(i) through SWS Bank(i+4).

Fig. 18. Architecture of the SATD Calculator for 8x4 blocks.

Fig. 19. Proposed low complexity search algorithms for quality adjustable intra coding.

C. Intra Coding

In the literature, some H.264 intra encoder designs [30-32] focus on optimizing the mode decision scheduling or eliminating the I16MB/Chroma plane prediction mode to reduce the processing cycles for low-complexity intra coding. However, these designs only aim at reducing the intra coding complexity with acceptable video quality. They do not consider flexible intra coding methods for power-aware video applications that trade off video quality against power consumption. In the proposed design, we not only propose a flexible fast intra coding algorithm that dynamically adjusts the video quality through configurable parameters, giving different power consumption for different applications, but also exploit the common terms among the intra predictions of different modes (including the I16MB/Chroma plane mode) to reduce the hardware cost.
To realize quality adjustability in hardware with little quality loss, we propose two search techniques for luminance intra 4x4 mode decision: the Context Correlation Search Algorithm (CC-SA) and the Probability Context Correlation Search Algorithm (PCC-SA) [33]. The CC-SA technique takes advantage of the spatial correlation of intra texture between the current block and the neighboring blocks. It searches 4.8 modes per block on average. PCC-SA additionally exploits the statistics of intra coding modes in real sequences to search only high-probability modes, further reducing complexity. The PCC-SA technique searches fewer prediction modes than CC-SA (3.7 modes per block on average). Fig. 19 shows the proposed search algorithms with the detailed search modes. If the upper block and the left block are both unavailable, only Mode 2 prediction is performed. Modes 1, 2, and 8 are selected as candidates when the upper block is unavailable. Modes 0, 2, 3, and 7 are selected as candidates when the left block is unavailable. If the upper block and the left block are both available, we search the modes listed in Fig. 19 according to the selected search algorithm and the modes of the upper and left blocks. Compared to the full search algorithm for intra coding in the H.264 reference software JM, adopting CC-SA and PCC-SA reduces the computational complexity by 45% and 57%, respectively. For intra prediction on chrominance pixels, a Quarter MB Search Algorithm (QMB-SA) is proposed based on the observation that human eyes are less sensitive to errors in chrominance pixels than in luminance ones. Hence, we only perform the intra prediction on the left-top block instead of on all four chrominance blocks in an MB. All of the proposed intra coding search algorithms may cause a quality drop.

Fig. 20. Comparison of PSNR performance with JM when using CC-SA, PCC-SA, and QMB-SA.
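The candidate-mode rules stated above for unavailable neighbors can be sketched as follows. The both-available case depends on the lookup tables of Fig. 19, which are not reproduced in the text, so the sketch takes them as a caller-supplied function (`both_available_modes`, a hypothetical name) and otherwise falls back to searching all nine Intra 4x4 modes. That fallback is an assumption for illustration, not the paper's behavior.

```python
ALL_INTRA4X4_MODES = list(range(9))  # H.264 defines nine Intra 4x4 modes, 0..8


def intra4x4_candidates(upper_available, left_available,
                        upper_mode=None, left_mode=None,
                        both_available_modes=None):
    """Luma Intra 4x4 prediction modes to evaluate for one block."""
    if not upper_available and not left_available:
        return [2]                # only Mode 2 when both neighbors are missing
    if not upper_available:
        return [1, 2, 8]          # candidates when the upper block is missing
    if not left_available:
        return [0, 2, 3, 7]       # candidates when the left block is missing
    if both_available_modes is None:
        # Assumption: without the Fig. 19 tables, search every mode.
        return list(ALL_INTRA4X4_MODES)
    return both_available_modes(upper_mode, left_mode)


def chroma_blocks_to_search(qmb_sa_enabled):
    """Indices of the chroma blocks used for chroma intra mode decision.

    With QMB-SA only the left-top block (index 0) is examined; otherwise
    all four chroma blocks are used.
    """
    return [0] if qmb_sa_enabled else [0, 1, 2, 3]


if __name__ == "__main__":
    print(intra4x4_candidates(False, True))
    print(chroma_blocks_to_search(True))
```

A CC-SA or PCC-SA table would be plugged in through `both_available_modes`, which keeps the boundary handling identical across quality modes.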
Fig. 20 therefore compares the PSNR results of the proposed intra coding algorithms at different QP values. The simulations use three CIF videos, foreman, mobile, and stefan (300 I-frames), with Hadamard-based SATD mode decision. However, adopting (CC-SA + QMB-SA) and (PCC-SA + QMB-SA) increases the bit rate by 3.71% and 6.63% on average, respectively, compared to the full search in JM.

Fig. 21. Architecture of the proposed quality adjustable intra coding design.

Fig. 22. Proposed Intra Pixel Generator (IPG) architecture.

Fig. 23. Example of the data dependence in a pipelined H.264 encoder design.

Fig. 24. (a) Proposed MAD prediction pattern; (b) Proposed real bit prediction (with i = 1, 5, 9 and j = 1, 2, 3).

The proposed quality-adjustable intra coding architecture consists of a Mode Decision Core (MDC) and a Texture Coding Core (TTC), as shown in Fig. 21. In the MDC, the Intra Pixel Generator (IPG) generates the intra predictors. Then, the SATD and Mode Decision unit computes the residual data to decide the best prediction mode. To support quality adjustability, we realize the CC-SA and PCC-SA tables and the QMB-SA algorithm in the mode decision controller. To reduce the mode decision time, we use a 2-D Hadamard transform in the SATD calculation, which saves 32 cycles compared to a design using a 1-D Hadamard transform. To reduce the hardware cost, we integrate the DCT, IDCT, and Hadamard transform into a multi-transform unit and optimize the IPG with hardware sharing mechanisms, including the Shared Item Mechanism of Intra Pixel Generator (SIMIPG) and the Plane Mode Sharing Mechanism (PMSM) [34]. In addition, although the plane mode in H.264 intra coding is complex, we realize it by sharing the IPG hardware to improve the video quality. Fig. 22 shows the proposed Intra Pixel Generator (IPG) architecture.

D. Hardware Rate Control

The BU-based rate control in the H.264 reference software JM has a strong data dependency in encoding each MB, which causes the required real bit sizes and Mean Absolute Difference (MAD) values to be unavailable when the quantization parameters are generated in MB-pipelined H.264 encoders. In the literature, many RC algorithms [35-38] have been proposed to improve the quality of H.264 JM rate control.
However, all of these RC algorithms are implemented in software, and they need a large amount of prediction data or a complex RC model to achieve accurate estimation. This makes them difficult to realize in a pipelined H.264 video encoder design without increasing the latency induced by the sequential RC process. For example, as illustrated in Fig. 23, when MB4 is coded in the IME stage, it needs the real bit sizes and MAD values of the previous MBs (i.e., MB0 to MB3) to generate its QP. However, MB1, MB2, and MB3 are still being coded in the Entropy, Intra, and FME stages, respectively, so their real bit sizes and MAD values are not yet available when the rate control algorithm is realized in a pipelined H.264 encoder design.

To solve this problem, we propose a real bit size prediction and a MAD prediction. The MAD prediction for the current BU is obtained according to the BU prediction pattern from the previous frame shown in Fig. 24(a). The real bit size is predicted with the proposed Lagrange model, according to the formula listed in Fig. 24(b), while the current BU is still being encoded. The minimum cost (d) is the Inter or Intra cost of each previous MB, as shown in Fig. 25. For example, when MB4(n+1) is coded in the IME stage, the FME cost and Intra cost of MB(4n+3) do not exist yet. Therefore, we propose Eq. (5) and Eq. (6) to predict the FME cost and Intra cost of the previous MBs. The decided minimum cost is then substituted into the real bit size prediction formula to obtain the predicted bit size. The value of Threshold in Eq. (5) and Eq. (6) is a user-defined parameter that differentiates slow motion from fast motion. According to our simulation experience, the ideal values of Threshold are 8, 8, and 16 for QCIF, CIF, and D1 video resolution, respectively. These two predictions release the data dependency between MBs in the proposed H.264 video encoder and achieve a constant and stable bit-rate. This leads to best-effort bit-rate utilization and thus good video quality for video streaming applications.

Table II shows the simulation results. The other settings are as follows: 1) RC algorithm: BU-based, 2) Basic Unit: 1 MB, 3) Intra period: 30 (1I29P), 4) RDO: off, 5) Reference frame number: 1, 6) Search range: 32x32. As shown in Table II, the proposed RC algorithm achieves almost the same video quality as JM9.3.

TABLE II. COMPARISON OF PSNR AND TARGET BIT-RATE WITH JM FOR THE PROPOSED BU-BASED RC
Fig. 25. Data Dependency of Minimum Inter Cost and Minimum Intra Cost.
Fig. 26. Block diagram of the proposed RC architecture.

Fig. 25 also shows the RC pipelining in the proposed H.264 video encoder system. The first four MBs are assigned the initial QP because the four-MB pipeline scheme restricts the available prediction data. When the processed MB is in the Entropy stage, its MAD value is calculated by the proposed MAD prediction. Then, the Rate Distortion (RD) and MAD models are updated for the next QP generation, and the Lagrange model is updated for the real bit size prediction. Finally, the Bit Allocation and QP Generation modules generate suitable QP values at the beginning of the IME stage. Fig. 26 shows the block diagram of the proposed RC architecture. The major block in the proposed architecture is the ALU, which consists of seven adders, two multipliers, one 16-cycle sequential divider, one 4-stage pipelined divider, one square-root unit, and one QP generator.
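The four-MB pipeline constraint on rate control described in this subsection can be sketched in software. The snippet below is a minimal illustration only: the stage order (IME, FME, Intra, Entropy) follows the MB4 example in the text, while the function names and the string labels are hypothetical.

```python
# Pipeline stages in the order an MB passes through them.
STAGES = ["IME", "FME", "Intra", "Entropy"]


def stage_occupancy(n):
    """Return which MB sits in each stage at the moment MB n enters IME.

    A stage maps to None when no MB has reached it yet.
    """
    return {stage: (n - delay if n - delay >= 0 else None)
            for delay, stage in enumerate(STAGES)}


def newest_finished_mb(n):
    """Index of the newest MB whose real bit size and MAD are already known
    when MB n enters IME, or None if no MB has left the pipeline yet."""
    done = n - len(STAGES)
    return done if done >= 0 else None


def qp_source(n):
    """First four MBs get the initial QP; later MBs rely on rate control,
    using predicted values for the MBs that are still in flight."""
    return "initial QP" if newest_finished_mb(n) is None else "rate-control QP"


if __name__ == "__main__":
    print(stage_occupancy(4))     # MB3 in FME, MB2 in Intra, MB1 in Entropy
    print(newest_finished_mb(4))  # only MB0 has real statistics
    print([qp_source(n) for n in range(6)])
```

For MB4 the sketch reports MB1 to MB3 as still in flight, which is exactly the set of MBs whose real bit size and MAD have to be predicted rather than measured.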
We adopt a CPU-like architecture for the proposed RC design so that the arithmetic operators can be shared. Both the QP generation and the RC&MAD model update are realized by ALU operations, which reduces the hardware cost at the expense of more processing cycles. Another major design consideration is to fit the cycle count budget for encoding each MB in the pipelined stages of the H.264 video encoder. The QP generation must be performed before IME for each MB. Therefore, there are 120 cycles to generate the QP and 300 cycles to update the RC&MAD model after Entropy Coding finishes for each MB. It takes only about 100 and 260 cycles to finish the QP generation and the RC&MAD model update, respectively, which fits the requirement of the proposed design mentioned above.

E. ILF & Entropy Coding

The ILF is a 4x4-block based architecture that filters the data in an MB in a horizontal-vertical interleaved raster scan order. With this filtering order, the output order of the filtered 4x4 blocks is regular, which suits both the address generation and the writing of data to the frame buffer in external memory using burst mode accesses. The filter design in the ILF is similar to that used in the H.264 video decoder [11].

Fig. 27 shows the high-throughput entropy coding (i.e., CAVLC) architecture, which consists of an Exp-Golomb Coding unit and a Residual Engine. The Residual Engine is composed of a Scanning Engine and a Coding Engine. The processing bottleneck of the Residual Engine lies in the residual data scanning, which requires 16 cycles for each 4x4 block, as indicated by the Traditional Scanning Scheme in Fig. 27. To remove this bottleneck, we propose a First-One Detecting Scanning Scheme (FDSS), implemented in the Scanning Engine, to quickly detect the value of run_before for each non-zero coefficient, as indicated by FDSS in Fig. 27. Fig. 28 illustrates the FDSS with a simple example. First, the 16 coefficients (i.e., 14-bit values) of one 4x4 block are represented by 16 1-bit signals. If a coefficient is zero, its signal is set to 0; otherwise, its signal is set to 1. FDSS then scans only these 16 1-bit signals to check for non-zero values, which not only improves the scanning throughput but also avoids lengthening the critical path. Adopting the FDSS technique yields about 6 times the throughput of the traditional scheme.

TABLE III. CYCLE COUNT FOR THE PROPOSED HDLFS-IME UNDER DIFFERENT COMBINATIONS OF DSR, CN AND LFSR
Fig. 27. Proposed high-throughput entropy coding architecture and FDSS scanning scheme.
Fig. 28. The example of FDSS scanning scheme.

F. Exploiting quality modes in H.264 video encoder

The major goal of the proposed design is to achieve quality adjustability with dynamic parameter configuration.
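The first-one detecting idea behind the FDSS of Section E can be illustrated in software. The sketch below is an assumption-laden analogy rather than the authors' circuit: it packs the 16 scan-ordered coefficients into 16 one-bit flags and then jumps from one set bit to the next, so the loop runs once per non-zero coefficient instead of once per position. The function names are hypothetical, and run_before is taken here as the number of zeros immediately preceding each non-zero coefficient in scan order.

```python
def nonzero_mask(coeffs):
    """Pack 16 scan-ordered coefficients into 16 one-bit flags.

    Bit i of the result is 1 when coeffs[i] is non-zero.
    """
    mask = 0
    for i, c in enumerate(coeffs):
        if c != 0:
            mask |= 1 << i
    return mask


def run_before_list(coeffs):
    """Return (position, level, run_before) for each non-zero coefficient,
    in scan order, by repeatedly detecting the first set bit of the mask."""
    mask = nonzero_mask(coeffs)
    result = []
    prev = -1
    while mask:
        lowest = mask & -mask            # isolate the first (lowest) set bit
        pos = lowest.bit_length() - 1    # index of that bit
        result.append((pos, coeffs[pos], pos - prev - 1))
        prev = pos
        mask ^= lowest                   # clear it and look for the next one
    return result


if __name__ == "__main__":
    block = [7, 0, 0, -2, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    for entry in run_before_list(block):
        print(entry)   # four entries for four non-zero coefficients
```

In this example the loop body executes four times for a block with four non-zero coefficients, whereas a position-by-position scan would always take 16 steps.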
With the proposed flexible quality adjustable algorithms for the IME, FME, and Intra Coding modules, namely the HDLFS-IME algorithm, the CS-FME algorithm, and the CC-SA, PCC-SA, and QMB-SA algorithms in Intra Coding, we can explore how the configuration parameters map to the different quality modes. First, we analyze the processing cycles of each encoding module in the proposed design. The most complex case is the proposed HDLFS-IME algorithm, since it has three configuration parameters (i.e., Down-Sample Ratio, denoted as DSR; Candidate Number, denoted as CN; and Local Full Search Range, denoted as LFSR), which yield many combinations. Among these combinations, we have to decide which ones are selected as the parameters of the quality modes of the proposed design. Table III shows the cycle analysis per MB under the different HDLFS-IME configuration parameters. In Table III, columns marked with the same texture have very similar processing cycles but different PSNR drops. From Table III and Fig. 5 to 8, we observe that a larger DSR for first estimating the motion vector trend, combined with a larger LFSR for determining the final motion vectors, gives better video quality than the other combinations. Therefore, for the same timing budget, we adopt the parameters with the smaller PSNR drop.

For the proposed CS-FME design, adopting Cluster 1 only takes 557 cycles to finish the FME operations on an MB, whereas adopting both Cluster 1 and Cluster 2 takes 1555 cycles. Intra coding of an MB takes 1112 cycles, 760 cycles, and 626 cycles with the full-search algorithm, the CC-SA/QMB-SA search algorithm, and the PCC-SA/QMB-SA search algorithm, respectively. Finally, both the Entropy Coding and the ILF take about 300 cycles per MB on average.

According to the above analysis, we define four quality modes (i.e., QS0, QS1, QS2, and QS3) for dynamically configuring the proposed video encoder on a frame-by-frame basis. Table IV shows the mapping of the quality modes to the parameters of the proposed algorithms. By setting DSR, CN, LFSR, and the number of Clusters, the design provides four quality modes for the IME and FME operations. By exploiting the proposed CC-SA, PCC-SA, and QMB-SA algorithms to perform different numbers of intra coding modes, we provide the quality adjustability of QS0/QS1, QS2, and QS3 in intra coding. Compared to JM [12], the proposed encoder design incurs an average PSNR loss of 0.15dB, 0.16dB, 0.4dB, and 0.6dB when operating at QS0, QS1, QS2, and QS3, respectively.

TABLE IV. THE DEFINED QUALITY MODES IN THE PROPOSED DESIGN
TABLE V. SCALABILITY IN TRADING OFF DIFFERENT LOCAL MEMORY SIZES AND MEMORY BANDWIDTH OVERHEAD

G. Prediction Data Store Buffer (PDSB)

In H.264 video coding, there are many data correlations between the current decoding MB and its neighboring decoded MBs. For example, entropy coding needs the Coded Block Pattern (CBP), the Motion Vector Difference (MVD), and the upper row of 4x4 blocks of the neighboring MBs. In addition, intra coding needs the reconstructed pixels of the upper MBs, and the ILF needs the pixels of the upper MBs whose filtering is not yet finished. If all the correlated data were stored in internal memory, about 19K/13K bytes of internal memory would be required to support HD1080/HD720 video encoding. To reduce this local memory requirement, we adopt a DMA-like PDSB design [11] that collects the correlated data of an MB and stores them in external memory if they are not used immediately.
In the proposed H.264 encoder, to avoid the overhead of accessing prediction data through the AMBA AHB interface, we always store the entropy coding prediction data, such as the coded block pattern, the motion vector difference, and the number of non-zero coefficients, in local memory. Other prediction data (i.e., Intra and ILF prediction data) are accessed from external memory through the PDSB scheme and the AMBA interface. The proposed PDSB scheme involves a trade-off between the size of the required internal memory and the increase in external memory bandwidth, as shown in Table V. This trade-off gives system designers the flexibility to decide which configuration suits the implementation, depending on the chosen fabrication technology.

Fig. 29. (a) FPGA prototyping of the proposed design; (b) The environment for chip testing.

III. DESIGN IMPLEMENTATION

In developing the proposed H.264 video encoder, we adopt the Concurrent Versions System (CVS) tool for file version control to improve the communication efficiency during the design process. The detailed implementation of the proposed design is described in the following sub-sections.

A. Design flow and methodology

The proposed design is implemented in Verilog Hardware Description Language (HDL) with SpringSoft nLint checking, Synopsys logic synthesis, and NanoSim pre-layout and post-layout simulation, according to the TSMC 1P8M 0.13µm CMOS technology. In RTL coding and verification, a code coverage tool, i.e., VN-Navigator, is used to analyze the code coverage of the provided test-benches. To speed up the integration of the sub-modules of the proposed design, we have built a verification platform containing the proposed design and the bus functional models (BFMs) of the RISC processor and the external SDR memory for co-simulation.
To quickly fix the bugs encountered in system integration, the verification platform automatically compares the intermediate results of each module in the proposed H.264 video encoder with the associated test patterns dumped from the H.264 reference software JM. This verification platform can verify the proposed design in both RTL and gate-level form. In addition, assertion-based verification is adopted to facilitate the debug process. The Open Verification Library (OVL) [39] is used as the assertion tool, so errors can be quickly detected and then fixed.

B. Prototype of the proposed design

For system-level verification, we perform board-level testing of the proposed design on the ARM-based FPGA platform (FIE8100) from Faraday Inc. To verify the functional correctness of the proposed design, we have used over 100 test sequences in different quality modes and parameters in FPGA prototyping, as shown in Fig. 29(a). The system clock (for AMBA and the proposed H.264 encoder) and the CPU clock of the ARM-based platform are limited to 40MHz and 200MHz, respectively.

First, we load the test sequences (YUV files) from the SD card into SDRAM. Then, the CPU activates the proposed encoder to encode frame by frame, and it receives and clears the interrupt raised by the encoder when each frame is finished. The reconstructed reference data from the encoder are then written to the display memory for the 320x240 LCD. Finally, the proposed encoder writes the bit-stream to the SD card through the AMBA interface.

C. Chip testing

To measure the performance of the proposed H.264 video encoder chip, we use the chip testing environment shown in Fig. 29(b) to measure the power consumption and the maximum operating clock frequency of the proposed design. In this environment, we provide the input data to the AMBA interface through a pattern generator and obtain the encoded bitstream from the external memory interface to check the chip correctness. According to the measurements, the proposed design operates at 10MHz to support CIF encoding with 7mW at 0.7V in QS3 mode. The power consumption is 183mW at 1.2V when it operates at 108MHz to support HD720 in QS2 mode.

IV. PERFORMANCE ANALYSIS

Fig. 30 shows the performance of the proposed design with four quality modes. It can encode D1 and HD720 videos at clock rates of 30/40/60/96MHz and 72/108MHz, respectively, at different quality levels with less than 0.6dB of PSNR loss on average. The adjustable quality allows the design to trade encoding quality against power consumption. It thus achieves different video recording times with a finite battery charge when used in power-adaptive coding applications such as portable digital video recorders. Fig. 31 summarizes the chip implementation. The core size is 4.3x4.3mm² and includes 470Kgates and 13.3Kbytes of internal memory. The power consumption is 7mW-to-25mW, 27mW-to-163mW, and 122mW-to-183mW for encoding CIF@30fps, D1@30fps, and HD720@30fps video with different quality modes, respectively.
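The clock rates quoted in this section translate into a per-MB cycle budget, which is what the per-stage cycle counts of the quality mode analysis have to fit into. The sketch below works through that arithmetic. It assumes 16x16 MBs, a D1 frame of 720x480 (the text does not state the D1 resolution), and the simplification that every pipeline stage must finish within the per-MB budget; memory stalls and control overhead are ignored, so the results are rough upper bounds and not figures reported by the authors. The function names are hypothetical, and the intra coding cycle counts are the ones given in the quality mode analysis.

```python
MB_SIZE = 16  # macroblock width and height in pixels

# Intra coding cycles per MB quoted in the quality mode analysis.
INTRA_CYCLES = {
    "full search": 1112,
    "CC-SA/QMB-SA": 760,
    "PCC-SA/QMB-SA": 626,
}


def mbs_per_frame(width, height):
    """Number of MBs in a frame, rounding each dimension up to whole MBs."""
    return -(-width // MB_SIZE) * -(-height // MB_SIZE)


def cycles_per_mb(clock_hz, width, height, fps):
    """Clock cycles available per MB for real-time encoding."""
    return clock_hz / (mbs_per_frame(width, height) * fps)


def intra_options_within(budget):
    """Intra search options whose cycle count fits the given budget."""
    return [name for name, cycles in INTRA_CYCLES.items() if cycles <= budget]


if __name__ == "__main__":
    cases = [
        ("CIF@30fps, 10MHz", 10_000_000, 352, 288, 30),
        ("D1@30fps, 30MHz (720x480 assumed)", 30_000_000, 720, 480, 30),
        ("HD720@30fps, 72MHz", 72_000_000, 1280, 720, 30),
        ("HD720@30fps, 108MHz", 108_000_000, 1280, 720, 30),
        ("HD1080@20fps, 108MHz", 108_000_000, 1920, 1080, 20),
    ]
    for label, clock, w, h, fps in cases:
        budget = cycles_per_mb(clock, w, h, fps)
        print(f"{label}: {budget:.0f} cycles/MB, intra options: "
              f"{intra_options_within(budget)}")
```

Under these assumptions, HD720@30fps at 108MHz leaves 1000 cycles per MB and HD1080@20fps at 108MHz leaves about 662, which is consistent with the faster search options being the ones that remain usable at the higher resolutions.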
The maximum performance of the proposed design is HD1080@20fps video encoding when it operates at 108MHz in QS3 mode. Fig. 31 also compares the proposed design with the existing H.264 video encoders [3, 5, 7, 8]. The designs [3, 5, 7] support the H.264 baseline profile video coding tools. The design [8] supports both the H.264 baseline profile and part of the high profile coding tools, such as Context Adaptive Binary Arithmetic Coding (CABAC), 8x8 blocks, and B-frame coding, and is targeted at HDTV applications. By adopting simplified algorithms for IME and FME, the design [8] supports HD1080 video encoding with moderate hardware cost and acceptable video quality. The chip micrograph is shown in Fig. 32. In addition to dynamic quality adjustability, the proposed design features low hardware cost, with about 6%~49% reduction in gate count and 21%~61% reduction in internal memory. When operating at 10MHz for CIF video encoding, its power consumption is only 7mW, which is comparable to the 5mW reported for the state-of-the-art MPEG-4 encoder [4] for CIF video. Moreover, with its quality-adjustable flexibility, this design can be applied to power-adaptive video coding applications by trading off video quality and power consumption dynamically.

Fig. 30. Quality modes in the proposed design.
Fig. 31. Comparison with the state-of-the-art H.264 video encoders.
Fig. 32. Chip micrograph and specification.

V. CONCLUSION AND FUTURE WORKS

In this paper, we have presented a dynamic quality-adjustable H.264 video encoder for both high definition and portable video applications. Exploiting the proposed parameterized algorithms for motion estimation and intra coding, the proposed design can dynamically configure the encoding modes to trade off power consumption against video quality for various video encoding applications.
Using the TSMC 0.13µm 1P8M CMOS technology, the proposed design costs 470Kgates/13.3Kbytes SRAM and achieves H.264 encoding on and HD720 with different quality modes. The proposed design is much more flexible than the existing H.264 video encoders due to the provided dynamic quality adjustability. The proposed design can be implemented in a more advanced technology, such as 90nm CMOS, to achieve the real-time encoding on video.

ACKNOWLEDGEMENT

The authors express their immense gratitude to the National Science Council (NSC) and the Chip Implementation Center (CIC) of Taiwan for the financial support under grant: NSC E and the chip fabrication support, respectively.

REFERENCES

[1] Advanced Video Coding, ISO/IEC and ITU-T Rec. H.264,
[2] L. E. G. Richardson, H.264 and MPEG-4 Video Compression - Video Coding for Next-generation Multimedia, John Wiley & Sons Inc.,
[3] Y. W. Huang, T. C. Chen, C. H. Tsai, C. Y. Chen, T. W. Chen, C. S. Chen, C. Fu. Shen, S. Y. Ma, T. C. Wang, B. Yu. Hsieh, H. C. Fang, L. G. Chen, "A 1.3TOPS H.264/AVC single-chip encoder for HDTV applications," Proc. IEEE International Solid-State Circuits Conference, pp , February
[4] C. P. Lin, P. C. Tseng, Y. T. Chiu, S. S. Lin, C. C. Cheng, H. C. Fang, W. M. Chao, L. G. Chen, "A 5mW MPEG4 SP encoder with 2D bandwidth-sharing motion estimation for mobile applications," Proc. IEEE International Solid-State Circuits Conference, pp , February
[5] T. C. Chen, Y. H. Chen, C. Y. Tsai, S. F. Tsai, S. Y. Chien, L. G. Chen, "2.8 to 67.2mW Low-Power and Power-Aware H.264 Encoder for Mobile Applications," Proc. IEEE International Symposium on VLSI Circuits, pp , June
[6] "H.264 software IP suite for DSP C64xx" from ATEME,
[7] MM5010: "A Fully Hardwired H264 Encoder IP" from MMChips, product_mm5010.html.
[8] Y. K. Lin, D. W. Li, C. C. Lin, T. Y. Kuo, S. J. Wu, W. C. Tai, W. C. Chang, and T. S. Chang, "A 242mW 10mm p H.264/AVC High-Profile Encoder Chip," Proc. IEEE International Solid-State Circuits Conference, pp , February
[9] Z. Liu, Y. Song, M. Shao, S. Li, L. Li, S. Ishiwata, M.
Nakagawa, S. Goto, T. Ikenaga, "A 1.41W H.264/AVC Real-Time Encoder SOC for HDTV1080P," Proc. IEEE International Symposium on VLSI Circuits, pp , June
[10] S. Mochizuki, T. Shibayama, M. Hase, F. Izuhara, K. Akie, M. Nobori, R. Imaoka, H. Ueda, K. Ishikawa, and H. Watanabe, "A 64 mW High Picture Quality H.264/MPEG-4 Video Codec IP for HD Mobile Applications in 90 nm CMOS," IEEE Journal of Solid-State Circuits, vol. 43, no. 11, pp , November
[11] C. C. Lin, J. I. Guo, H. C. Chang, Y. C. Yang, J. W. Chen, M. C. Tsai, J. S. Wang, and J. I. Guo, "A 160kGate 4.5kB SRAM H.264 Video Decoder for HDTV Applications," Proc. IEEE International Solid-State Circuits Conference, pp , February,
[12] Joint Video Team Reference Software JM9.3, suehring/tml/doc/, June
[13] S. Y. Yap and J. V. McCanny, "A VLSI architecture for variable block size video motion estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 51, no. 7, pp , July 2004
[14] S. Y. Yap and J. V. McCanny, "A VLSI architecture for advanced video coding motion estimation," Proc. IEEE International Application-specific Systems, Architectures, and Processors Conference, pp , June
[15] S. Lopez, F. Tobajas, A. Villar, V. de Armas, J. F. Lopez, and R. Sarmiento, "Low cost efficient architecture for H.264 motion estimation," Proc. IEEE International Symposium on Circuits and Systems, vol. 1, pp , May
[16] M. Kim, I. Hwang, and S. I. Chae, "A fast VLSI Architecture for Full-Search Variable Block Size Motion Estimation in MPEG-4/H.264," Proc. Asia and South Pacific Design Automation Conference, vol. 1, pp , January
[17] Y. W. Huang, T. C. Wang, B. Y. Hsieh, and L. G. Chen, "Hardware architecture design for variable block size motion estimation in MPEG-4 AVC/JVT/ITU-T H.264," Proc. IEEE International Symposium on Circuits and Systems, vol. 2, pp , May
[18] T. Komarek and P. Pirsch, "Array architectures for block matching algorithms," IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no.
10, pp , October
[19] L. de Vos and M. Stegherr, "Parameterizable VLSI architectures for the full-search block-matching algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 36, no. 10, pp , October
[20] H. M. Jong, L. G. Chen, and T. D. Chiueh, "Parallel Architectures for 3-Step Hierarchical Search Block-Matching Algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 4, pp , August
[21] S. Hamalainen, L. Koskinen, and K. Halonen, "A hardware-based predictive motion estimation algorithm," Proc. IEEE International Symposium on Circuits and Systems, vol. 6, pp , July
[22] K. B. Lee, H. Y. Chin, H. C. Hsu, and C. W. Jen, "QME: an efficient subsampling-based block matching algorithm for motion estimation," Proc. IEEE International Symposium on Circuits and Systems, vol. 2, pp. II-305-8, May
[23] H. Y. Chin, C. C. Cheng, Y. K. Lin, and T. S. Chang, "A bandwidth efficient subsampling-based block matching architecture for motion estimation," Proc. Asia and South Pacific Design Automation Conference, vol. 2, pp. D/7-D/8, January
[24] L. Fanucci, S. Saponara, and L. Bertini, "A parametric VLSI architecture for video motion estimation," Integration, the VLSI Journal, vol. 31, no. 1, pp (22), November
[25] C. L. Su, Y. C. Yang, C. W. Chen, W. S. Yang, Y. L. Chen, S. Y. Tseng, and J. I. Guo, "A Low Complexity High Quality Integer Motion Estimation Architecture Design for H.264/AVC," Proc. IEEE Asia-Pacific Conference on Circuits and Systems, pp , December
[26] T. C. Chen, Y. W. Huang, L. G. Chen, "Fully Utilized and Reusable Architecture for Fractional Motion Estimation of H.264/AVC," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp , May
[27] C. Yang and S. Goto, "High Performance VLSI Architecture of Fractional Motion Estimation in H.264 for HDTV," Proc. IEEE International Symposium on Circuits and Systems, pp , May
[28] Y. Y. Wang and C. J.
Tsai, "An Efficient Dual-interpolator Architecture for Sub-pixel Motion Estimation," Proc. IEEE International Symposium on Circuits and Systems, pp , May
[29] T. Y. Kuo, Y. K. Lin, and T. S. Chang, "SIFME: Single Iteration Fractional-pel Motion Estimation Algorithm and Architecture for HDTV Sized H.264 Video Coding," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp , April
[30] C. C. Cheng, C. W. Ku, and T. S. Chang, "A 1280x720 Pixels 30 Frames/s H.264/MPEG-4 AVC Intra Encoder," Proc. IEEE International Symposium on Circuits and Systems, May 2006
[31] Y. W. Huang, B. Y. Hsieh, T. C. Chen, and L. G. Chen, "Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder," IEEE Transactions on Circuits and Systems for Video Technology, pp , March
[32] D. W. Li, C. W. Ku, C. C. Cheng, Y. K. Lin, and T. S. Chang, "A 61MHz 72K Gates 1280x720 30fps H.264 Intra Encoder," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, pp. II-801-II-804, April
[33] J. W. Chen, C. H. Chang, C. C. Lin, Y. H. Ou Yang, J. I. Guo, and J. S. Wang, "A Condition-based Intra Prediction Algorithm for H.264/AVC," Proc. IEEE International Conference on Multimedia & Expo, pp , July
[34] C. H. Chang, J. W. Chen, H. C. Chang, Y. C. Yang, J. S. Wang, J. I. Guo, "A Quality Scalable H.264/AVC Baseline Intra Encoder for High Definition Video Applications," Proc. IEEE Workshop on Signal Processing Systems, pp , October
[35] X. Yi and N. Ling, "Rate control using enhanced frame complexity measure for H.264 video," Proc. IEEE Workshop on Signal Processing Systems, pp , October 2004.

[36] M. Jiang, X. Yi, N. Ling, "Improved frame-layer rate control for H.264 using MAD ratio," Proc. IEEE International Symposium on Circuits and Systems, vol. 3, pp. III-813-6, May
[37] H. Yu, Z. Lin, and F. Pan, "An improved rate control algorithm for H.264," Proc. IEEE International Symposium on Circuits and Systems, vol. 1, pp , May
[38] S. Su, S. Yu, and J. Zhou, "An improved Basic-Unit Layer Rate-Control Scheme on H.264," Proc. Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies, pp , December
[39] Open Verification Library (OVL) in

Bing-Tsung Wu was born in Penghu, Taiwan, R.O.C., in He received the B.S. degree from the Department of Information Management and the M.S. degree from the Graduate School of Business and Operations Management, Chang Jung Christian University, Tainan, Taiwan, in 2003 and 2005, respectively. He is currently working toward the Ph.D. degree at the Graduate Institute of Computer Science and Information Engineering, National Chung Cheng University. His research interests include video processing, VLSI architectures, digital IP design, and video rate control.

Hsiu-Cheng Chang was born in Tainan, Taiwan, R.O.C., in He received the B.S. and M.S. degrees from the Department of Computer Science and Information Engineering, National Chung Cheng University, Chia Yi, Taiwan, in 2003 and 2005, respectively. He is currently working toward the Ph.D. degree at the Graduate Institute of Computer Science and Information Engineering, National Chung Cheng University. His research interests include video processing, VLSI architectures, digital IP design, and multimedia SOC design.

Jia-Wei Chen received the B.S. degree in electronics engineering from National Lien Ho Institute of Technology, Miao-Li, Taiwan, in 2003, and the M.S. degree in electrical engineering from National Chung Cheng University, Chia-Yi, Taiwan, in He is currently working toward the Ph.D.
degree in electrical engineering at National Chung Cheng University Chia-Yi, Taiwan. His research interests include video processing, very large scale integration architecture design, digital IP design, and silicon-on-chip design. Ching-Lung Su was born in Taipei, Taiwan, R. O. C., in He received the B.S. degree from the Department of Electrical Engineering, Chinese Culture University, Taipei, Taiwan, the M.S. degree from the Graduate Institute of Electronics and Computer Science Engineering, National Yunlin University of Science & Technology, Yunlin, Taiwan, and the Ph.D degree from the Graduate Institute of Electronics Engineering, National Chiao Tung University, Hsinchu, Taiwan, in 1994, 1996, and 2003, respectively. In 2004, he joined the Department of Electronics Engineering, National Yunlin University of Science & Technology, as an assistant professor. During 2007 to 2008, he is on leave from National Yunlin University of Science & Technology and serves as the Technical Deputy Director of the Processor and Application Division, SoC Technology Center (STC), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan. Since 2008, he is also a consultant of STC/ITRI. His research interests include embedded software for SoC, video signal processing, digital IC architecture design, multi-core embedded system and multimedia digital signal processor design. Jinn-Shyan Wang (S 85-M 88) was born in Taiwan, R.O.C., in He received the B.S. degree in electrical engineering from the National Cheng-Kung University, Tainan, Taiwan, in 1982 and the M.S. and Ph.D. degrees from the Institute of Electronics, National Chiao-Tung University, Hsinchu, Taiwan, in 1984 and 1988, respectively. He was with Industrial Technology Research Institute (ITRI) from , engaged in ASIC circuit and system design, and became the Manager of the Department of VLSI Design. 
He joined the Department of Electrical Engineering, National Chung-Cheng University, Chia-Yi, Taiwan, in 1995, where he is currently a full Professor. His research interests are in low-power and high-speed digital integrated circuits and systems, analog integrated circuits, IP and SOC design, and CMOS image sensors. He has published over 20 journal papers and 40 conference papers and holds over 20 patents on VLSI circuits and architectures. Jiun-In Guo was born in Kaohsiung, Taiwan, R.O.C. in He received the B.S. degree and Ph.D. degree in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 1989 and 1993, respectively. He is currently a Professor of the Department of Computer Science and Information Engineering, National Chung-Cheng University, Chiayi, Taiwan. He is now the research distinguished professor of National Chung-Cheng University from 2008 to He joined the System-on-Chip Research Center since March 2003 to start involving in several Grand Research Projects on low-power, high-performance processor design and multimedia IP/SOC design. He was the director of SOC Research Center, National Chung-Cheng University from 2005 to He was an Associate Professor of the Department of Computer Science and Information Engineering, National Chung-Cheng University from 2001 to 2003 and an Associate Professor of the Department of Electronics Engineering, National Lien-Ho Institute of Technology from 1994 to And he was the director of the Department of Electronics Engineering, National Lien-Ho Institute of Technology from 1996 to Dr. Guo was the recipient of the National Science Council (NSC) Research Award in 1996 and He was the recipient of the 2003 MXIC Young Professor Award for his contributions to the course of low-power Multimedia/DSP Silicon IP Design. 
He was also the recipient of the 2004 Chinese Institute of Electrical Engineering (CIEE) Outstanding Youth Electrical Engineer Award and the recipient of the 2008 Chinese Institute of Electrical Engineering (CIEE) Tai-Chung Section Outstanding Engineering Professor Award to recognize his excellent contributions to R&D and service of electrical engineering. He was also the recipient of the 2006 Outstanding Research Award of National Chung Cheng University. He has published over 120 technical papers on the research areas of low-power and low cost algorithm and architecture design for DSP/Multimedia signal processing applications. His research team has won over 25 IC related student design contest awards from 2003 to His research interests include image, multimedia, and digital signal processing, VLSI algorithm/architecture design, digital SIP design, and SOC design.


More information

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46

LIST OF TABLES. Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46. Table 5.2 Macroblock types 46 LIST OF TABLES TABLE Table 5.1 Specification of mapping of idx to cij for zig-zag scan 46 Table 5.2 Macroblock types 46 Table 5.3 Inverse Scaling Matrix values 48 Table 5.4 Specification of QPC as function

More information

Multimedia Decoder Using the Nios II Processor

Multimedia Decoder Using the Nios II Processor Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra

More information

IN RECENT years, multimedia application has become more

IN RECENT years, multimedia application has become more 578 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 17, NO. 5, MAY 2007 A Fast Algorithm and Its VLSI Architecture for Fractional Motion Estimation for H.264/MPEG-4 AVC Video Coding

More information

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Journal of Automation and Control Engineering Vol. 3, No. 1, February 20 A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Dam. Minh Tung and Tran. Le Thang Dong Center of Electrical

More information

Fast frame memory access method for H.264/AVC

Fast frame memory access method for H.264/AVC Fast frame memory access method for H.264/AVC Tian Song 1a), Tomoyuki Kishida 2, and Takashi Shimamoto 1 1 Computer Systems Engineering, Department of Institute of Technology and Science, Graduate School

More information

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Efficient MPEG- to H.64/AVC Transcoding in Transform-domain Yeping Su, Jun Xin, Anthony Vetro, Huifang Sun TR005-039 May 005 Abstract In this

More information

A LOW-COMPLEXITY AND LOSSLESS REFERENCE FRAME ENCODER ALGORITHM FOR VIDEO CODING

A LOW-COMPLEXITY AND LOSSLESS REFERENCE FRAME ENCODER ALGORITHM FOR VIDEO CODING 2014 IEEE International Conference on Acoustic, Speech and Signal Processing (ICASSP) A LOW-COMPLEXITY AND LOSSLESS REFERENCE FRAME ENCODER ALGORITHM FOR VIDEO CODING Dieison Silveira, Guilherme Povala,

More information

Fast Mode Decision for H.264/AVC Using Mode Prediction

Fast Mode Decision for H.264/AVC Using Mode Prediction Fast Mode Decision for H.264/AVC Using Mode Prediction Song-Hak Ri and Joern Ostermann Institut fuer Informationsverarbeitung, Appelstr 9A, D-30167 Hannover, Germany ri@tnt.uni-hannover.de ostermann@tnt.uni-hannover.de

More information

Digital Video Processing

Digital Video Processing Video signal is basically any sequence of time varying images. In a digital video, the picture information is digitized both spatially and temporally and the resultant pixel intensities are quantized.

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Complexity Reduced Mode Selection of H.264/AVC Intra Coding Complexity Reduced Mode Selection of H.264/AVC Intra Coding Mohammed Golam Sarwer 1,2, Lai-Man Po 1, Jonathan Wu 2 1 Department of Electronic Engineering City University of Hong Kong Kowloon, Hong Kong

More information

A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder

A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder A 4-way parallel CAVLC design for H.264/AVC 4 Kx2 K 60 fps encoder Huibo Zhong, Sha Shen, Yibo Fan a), and Xiaoyang Zeng State Key Lab of ASIC and System, Fudan University 825 Zhangheng Road, Shanghai,

More information

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC Randa Atta, Rehab F. Abdel-Kader, and Amera Abd-AlRahem Electrical Engineering Department, Faculty of Engineering, Port

More information

BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen

BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION. Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen BANDWIDTH-EFFICIENT ENCODER FRAMEWORK FOR H.264/AVC SCALABLE EXTENSION Yi-Hau Chen, Tzu-Der Chuang, Yu-Jen Chen, and Liang-Gee Chen DSP/IC Design Lab., Graduate Institute of Electronics Engineering, National

More information

Advanced Video Coding: The new H.264 video compression standard

Advanced Video Coding: The new H.264 video compression standard Advanced Video Coding: The new H.264 video compression standard August 2003 1. Introduction Video compression ( video coding ), the process of compressing moving images to save storage space and transmission

More information

A Dedicated Hardware Solution for the HEVC Interpolation Unit

A Dedicated Hardware Solution for the HEVC Interpolation Unit XXVII SIM - South Symposium on Microelectronics 1 A Dedicated Hardware Solution for the HEVC Interpolation Unit 1 Vladimir Afonso, 1 Marcel Moscarelli Corrêa, 1 Luciano Volcan Agostini, 2 Denis Teixeira

More information

High Efficiency Data Access System Architecture for Deblocking Filter Supporting Multiple Video Coding Standards

High Efficiency Data Access System Architecture for Deblocking Filter Supporting Multiple Video Coding Standards 670 IEEE Transactions on Consumer Electronics, Vol. 58, No. 2, May 2012 High Efficiency Data Access System Architecture for Deblocking Filter Supporting Multiple Video Coding Standards Cheng-An Chien,

More information

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

Reducing/eliminating visual artifacts in HEVC by the deblocking filter. 1 Reducing/eliminating visual artifacts in HEVC by the deblocking filter. EE5359 Multimedia Processing Project Proposal Spring 2014 The University of Texas at Arlington Department of Electrical Engineering

More information

Reduced 4x4 Block Intra Prediction Modes using Directional Similarity in H.264/AVC

Reduced 4x4 Block Intra Prediction Modes using Directional Similarity in H.264/AVC Proceedings of the 7th WSEAS International Conference on Multimedia, Internet & Video Technologies, Beijing, China, September 15-17, 2007 198 Reduced 4x4 Block Intra Prediction Modes using Directional

More information

An Efficient Intra Prediction Algorithm for H.264/AVC High Profile

An Efficient Intra Prediction Algorithm for H.264/AVC High Profile An Efficient Intra Prediction Algorithm for H.264/AVC High Profile Bo Shen 1 Kuo-Hsiang Cheng 2 Yun Liu 1 Ying-Hong Wang 2* 1 School of Electronic and Information Engineering, Beijing Jiaotong University

More information

FAST SPATIAL LAYER MODE DECISION BASED ON TEMPORAL LEVELS IN H.264/AVC SCALABLE EXTENSION

FAST SPATIAL LAYER MODE DECISION BASED ON TEMPORAL LEVELS IN H.264/AVC SCALABLE EXTENSION FAST SPATIAL LAYER MODE DECISION BASED ON TEMPORAL LEVELS IN H.264/AVC SCALABLE EXTENSION Yen-Chieh Wang( 王彥傑 ), Zong-Yi Chen( 陳宗毅 ), Pao-Chi Chang( 張寶基 ) Dept. of Communication Engineering, National Central

More information

Aiyar, Mani Laxman. Keywords: MPEG4, H.264, HEVC, HDTV, DVB, FIR.

Aiyar, Mani Laxman. Keywords: MPEG4, H.264, HEVC, HDTV, DVB, FIR. 2015; 2(2): 201-209 IJMRD 2015; 2(2): 201-209 www.allsubjectjournal.com Received: 07-01-2015 Accepted: 10-02-2015 E-ISSN: 2349-4182 P-ISSN: 2349-5979 Impact factor: 3.762 Aiyar, Mani Laxman Dept. Of ECE,

More information

MultiFrame Fast Search Motion Estimation and VLSI Architecture

MultiFrame Fast Search Motion Estimation and VLSI Architecture International Journal of Scientific and Research Publications, Volume 2, Issue 7, July 2012 1 MultiFrame Fast Search Motion Estimation and VLSI Architecture Dr.D.Jackuline Moni ¹ K.Priyadarshini ² 1 Karunya

More information

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN BANDWIDTH REDUCTION SCHEMES FOR MPEG- TO H. TRANSCODER DESIGN Xianghui Wei, Wenqi You, Guifen Tian, Yan Zhuang, Takeshi Ikenaga, Satoshi Goto Graduate School of Information, Production and Systems, Waseda

More information

CAMED: Complexity Adaptive Motion Estimation & Mode Decision for H.264 Video

CAMED: Complexity Adaptive Motion Estimation & Mode Decision for H.264 Video ICASSP 6 CAMED: Complexity Adaptive Motion Estimation & Mode Decision for H.264 Video Yong Wang Prof. Shih-Fu Chang Digital Video and Multimedia (DVMM) Lab, Columbia University Outline Complexity aware

More information

TRADITIONALLY, architectural design focuses more on

TRADITIONALLY, architectural design focuses more on 8 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 9, NO. 8, AUGUST 2009 Algorithm and Architecture Design of Power-Oriented H.264/AVC Baseline Profile Encoder for Portable Devices

More information

Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration

Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration , pp.517-521 http://dx.doi.org/10.14257/astl.2015.1 Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration Jooheung Lee 1 and Jungwon Cho 2, * 1 Dept. of

More information

Zonal MPEG-2. Cheng-Hsiung Hsieh *, Chen-Wei Fu and Wei-Lung Hung

Zonal MPEG-2. Cheng-Hsiung Hsieh *, Chen-Wei Fu and Wei-Lung Hung International Journal of Applied Science and Engineering 2007. 5, 2: 151-158 Zonal MPEG-2 Cheng-Hsiung Hsieh *, Chen-Wei Fu and Wei-Lung Hung Department of Computer Science and Information Engineering

More information

An Efficient Mode Selection Algorithm for H.264

An Efficient Mode Selection Algorithm for H.264 An Efficient Mode Selection Algorithm for H.64 Lu Lu 1, Wenhan Wu, and Zhou Wei 3 1 South China University of Technology, Institute of Computer Science, Guangzhou 510640, China lul@scut.edu.cn South China

More information

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames Ki-Kit Lai, Yui-Lam Chan, and Wan-Chi Siu Centre for Signal Processing Department of Electronic and Information Engineering

More information

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF.

Multicore SoC is coming. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems. Source: 2007 ISSCC and IDF. Scalable and Reconfigurable Stream Processor for Mobile Multimedia Systems Liang-Gee Chen Distinguished Professor General Director, SOC Center National Taiwan University DSP/IC Design Lab, GIEE, NTU 1

More information

H.264 to MPEG-4 Transcoding Using Block Type Information

H.264 to MPEG-4 Transcoding Using Block Type Information 1568963561 1 H.264 to MPEG-4 Transcoding Using Block Type Information Jae-Ho Hur and Yung-Lyul Lee Abstract In this paper, we propose a heterogeneous transcoding method of converting an H.264 video bitstream

More information

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD

OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD OVERVIEW OF IEEE 1857 VIDEO CODING STANDARD Siwei Ma, Shiqi Wang, Wen Gao {swma,sqwang, wgao}@pku.edu.cn Institute of Digital Media, Peking University ABSTRACT IEEE 1857 is a multi-part standard for multimedia

More information

Fast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec

Fast Wavelet-based Macro-block Selection Algorithm for H.264 Video Codec Proceedings of the International MultiConference of Engineers and Computer Scientists 8 Vol I IMECS 8, 19-1 March, 8, Hong Kong Fast Wavelet-based Macro-block Selection Algorithm for H.64 Video Codec Shi-Huang

More information

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments

A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264. Massachusetts Institute of Technology Texas Instruments 2013 IEEE Workshop on Signal Processing Systems A COMPARISON OF CABAC THROUGHPUT FOR HEVC/H.265 VS. AVC/H.264 Vivienne Sze, Madhukar Budagavi Massachusetts Institute of Technology Texas Instruments ABSTRACT

More information

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform

Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block Transform Circuits and Systems, 2010, 1, 12-17 doi:10.4236/cs.2010.11003 Published Online July 2010 (http://www.scirp.org/journal/cs) Fast Implementation of VC-1 with Modified Motion Estimation and Adaptive Block

More information

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications 46 IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.3, March 2008 Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

More information

VIDEO COMPRESSION STANDARDS

VIDEO COMPRESSION STANDARDS VIDEO COMPRESSION STANDARDS Family of standards: the evolution of the coding model state of the art (and implementation technology support): H.261: videoconference x64 (1988) MPEG-1: CD storage (up to

More information

Chapter 10. Basic Video Compression Techniques Introduction to Video Compression 10.2 Video Compression with Motion Compensation

Chapter 10. Basic Video Compression Techniques Introduction to Video Compression 10.2 Video Compression with Motion Compensation Chapter 10 Basic Video Compression Techniques 10.1 Introduction to Video Compression 10.2 Video Compression with Motion Compensation 10.3 Search for Motion Vectors 10.4 H.261 10.5 H.263 10.6 Further Exploration

More information

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform

An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform An H.264/AVC Main Profile Video Decoder Accelerator in a Multimedia SOC Platform Youn-Long Lin Department of Computer Science National Tsing Hua University Hsin-Chu, TAIWAN 300 ylin@cs.nthu.edu.tw 2006/08/16

More information

An Improved H.26L Coder Using Lagrangian Coder Control. Summary

An Improved H.26L Coder Using Lagrangian Coder Control. Summary UIT - Secteur de la normalisation des télécommunications ITU - Telecommunication Standardization Sector UIT - Sector de Normalización de las Telecomunicaciones Study Period 2001-2004 Commission d' études

More information

IBM Research Report. Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms

IBM Research Report. Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms RC24748 (W0902-063) February 12, 2009 Electrical Engineering IBM Research Report Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms Yuri Vatis Institut für Informationsverarbeitung

More information

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding Jung-Ah Choi and Yo-Sung Ho Gwangju Institute of Science and Technology (GIST) 261 Cheomdan-gwagiro, Buk-gu, Gwangju, 500-712, Korea

More information

Introduction to Video Compression

Introduction to Video Compression Insight, Analysis, and Advice on Signal Processing Technology Introduction to Video Compression Jeff Bier Berkeley Design Technology, Inc. info@bdti.com http://www.bdti.com Outline Motivation and scope

More information

Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation

Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation LETTER IEICE Electronics Express, Vol.11, No.5, 1 6 Real-time and smooth scalable video streaming system with bitstream extractor intellectual property implementation Liang-Hung Wang 1a), Yi-Mao Hsiao

More information

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H. EE 5359 MULTIMEDIA PROCESSING SPRING 2011 Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.264 Under guidance of DR K R RAO DEPARTMENT OF ELECTRICAL ENGINEERING UNIVERSITY

More information

Video Compression An Introduction

Video Compression An Introduction Video Compression An Introduction The increasing demand to incorporate video data into telecommunications services, the corporate environment, the entertainment industry, and even at home has made digital

More information

POWER CONSUMPTION AND MEMORY AWARE VLSI ARCHITECTURE FOR MOTION ESTIMATION

POWER CONSUMPTION AND MEMORY AWARE VLSI ARCHITECTURE FOR MOTION ESTIMATION POWER CONSUMPTION AND MEMORY AWARE VLSI ARCHITECTURE FOR MOTION ESTIMATION K.Priyadarshini, Research Scholar, Department Of ECE, Trichy Engineering College ; D.Jackuline Moni,Professor,Department Of ECE,Karunya

More information

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING Journal of the Chinese Institute of Engineers, Vol. 29, No. 7, pp. 1203-1214 (2006) 1203 STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING Hsiang-Chun Huang and Tihao Chiang* ABSTRACT A novel scalable

More information

Motion Vector Coding Algorithm Based on Adaptive Template Matching

Motion Vector Coding Algorithm Based on Adaptive Template Matching Motion Vector Coding Algorithm Based on Adaptive Template Matching Wen Yang #1, Oscar C. Au #2, Jingjing Dai #3, Feng Zou #4, Chao Pang #5,Yu Liu 6 # Electronic and Computer Engineering, The Hong Kong

More information

Research on Transcoding of MPEG-2/H.264 Video Compression

Research on Transcoding of MPEG-2/H.264 Video Compression Research on Transcoding of MPEG-2/H.264 Video Compression WEI, Xianghui Graduate School of Information, Production and Systems Waseda University February 2009 Abstract Video transcoding performs one or

More information

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm *

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 29, 595-605 (2013) High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm * JONGWOO BAE 1 AND JINSOO CHO 2,+ 1

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

Implementation and analysis of Directional DCT in H.264

Implementation and analysis of Directional DCT in H.264 Implementation and analysis of Directional DCT in H.264 EE 5359 Multimedia Processing Guidance: Dr K R Rao Priyadarshini Anjanappa UTA ID: 1000730236 priyadarshini.anjanappa@mavs.uta.edu Introduction A

More information

STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC)

STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC) STUDY AND IMPLEMENTATION OF VIDEO COMPRESSION STANDARDS (H.264/AVC, DIRAC) EE 5359-Multimedia Processing Spring 2012 Dr. K.R Rao By: Sumedha Phatak(1000731131) OBJECTIVE A study, implementation and comparison

More information

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation

A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation 2009 Third International Conference on Multimedia and Ubiquitous Engineering A Novel Deblocking Filter Algorithm In H.264 for Real Time Implementation Yuan Li, Ning Han, Chen Chen Department of Automation,

More information

10.2 Video Compression with Motion Compensation 10.4 H H.263

10.2 Video Compression with Motion Compensation 10.4 H H.263 Chapter 10 Basic Video Compression Techniques 10.11 Introduction to Video Compression 10.2 Video Compression with Motion Compensation 10.3 Search for Motion Vectors 10.4 H.261 10.5 H.263 10.6 Further Exploration

More information

EE Low Complexity H.264 encoder for mobile applications

EE Low Complexity H.264 encoder for mobile applications EE 5359 Low Complexity H.264 encoder for mobile applications Thejaswini Purushotham Student I.D.: 1000-616 811 Date: February 18,2010 Objective The objective of the project is to implement a low-complexity

More information

ABSTRACT. KEYWORD: Low complexity H.264, Machine learning, Data mining, Inter prediction. 1 INTRODUCTION

ABSTRACT. KEYWORD: Low complexity H.264, Machine learning, Data mining, Inter prediction. 1 INTRODUCTION Low Complexity H.264 Video Encoding Paula Carrillo, Hari Kalva, and Tao Pin. Dept. of Computer Science and Technology,Tsinghua University, Beijing, China Dept. of Computer Science and Engineering, Florida

More information

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

International Journal of Emerging Technology and Advanced Engineering Website:   (ISSN , Volume 2, Issue 4, April 2012) A Technical Analysis Towards Digital Video Compression Rutika Joshi 1, Rajesh Rai 2, Rajesh Nema 3 1 Student, Electronics and Communication Department, NIIST College, Bhopal, 2,3 Prof., Electronics and

More information

Video Coding Using Spatially Varying Transform

Video Coding Using Spatially Varying Transform Video Coding Using Spatially Varying Transform Cixun Zhang 1, Kemal Ugur 2, Jani Lainema 2, and Moncef Gabbouj 1 1 Tampere University of Technology, Tampere, Finland {cixun.zhang,moncef.gabbouj}@tut.fi

More information

Fast Motion Estimation for Shape Coding in MPEG-4

Fast Motion Estimation for Shape Coding in MPEG-4 358 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 13, NO. 4, APRIL 2003 Fast Motion Estimation for Shape Coding in MPEG-4 Donghoon Yu, Sung Kyu Jang, and Jong Beom Ra Abstract Effective

More information

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation th International Conference on Advanced Computing and Communications Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation Avishek Saha Department of Computer Science and Engineering,

More information

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER

EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER EFFICIENT PU MODE DECISION AND MOTION ESTIMATION FOR H.264/AVC TO HEVC TRANSCODER Zong-Yi Chen, Jiunn-Tsair Fang 2, Tsai-Ling Liao, and Pao-Chi Chang Department of Communication Engineering, National Central

More information

H.264/AVC Baseline Profile to MPEG-4 Visual Simple Profile Transcoding to Reduce the Spatial Resolution

H.264/AVC Baseline Profile to MPEG-4 Visual Simple Profile Transcoding to Reduce the Spatial Resolution H.264/AVC Baseline Profile to MPEG-4 Visual Simple Profile Transcoding to Reduce the Spatial Resolution Jae-Ho Hur, Hyouk-Kyun Kwon, Yung-Lyul Lee Department of Internet Engineering, Sejong University,

More information

Low-Power Video Codec Design

Low-Power Video Codec Design International Journal of Engineering Research and Development e-issn: 2278-067X, p-issn : 2278-800X, www.ijerd.com Volume 5, Issue 8 (January 2013), PP. 81-85 Low-Power Video Codec Design R.Kamalakkannan

More information

Design of a High Speed CAVLC Encoder and Decoder with Parallel Data Path

Design of a High Speed CAVLC Encoder and Decoder with Parallel Data Path Design of a High Speed CAVLC Encoder and Decoder with Parallel Data Path G Abhilash M.Tech Student, CVSR College of Engineering, Department of Electronics and Communication Engineering, Hyderabad, Andhra

More information

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased

Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased Optimized architectures of CABAC codec for IA-32-, DSP- and FPGAbased platforms Damian Karwowski, Marek Domański Poznan University of Technology, Chair of Multimedia Telecommunications and Microelectronics

More information

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 3, SEPTEMBER

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 3, SEPTEMBER IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 7, NO. 3, SEPTEMBER 1999 345 Cost-Effective VLSI Architectures and Buffer Size Optimization for Full-Search Block Matching Algorithms

More information

PAPER Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip

PAPER Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip IEICE TRANS. INF. & SYST., VOL.E96 D, NO.3 MARCH 2013 601 PAPER Hardware Software Co-design of H.264 Baseline Encoder on Coarse-Grained Dynamically Reconfigurable Computing System-on-Chip Hung K. NGUYEN

More information

An Efficient VLSI Architecture for Full-Search Block Matching Algorithms

An Efficient VLSI Architecture for Full-Search Block Matching Algorithms Journal of VLSI Signal Processing 15, 275 282 (1997) c 1997 Kluwer Academic Publishers. Manufactured in The Netherlands. An Efficient VLSI Architecture for Full-Search Block Matching Algorithms CHEN-YI

More information

High Efficiency Video Decoding on Multicore Processor

High Efficiency Video Decoding on Multicore Processor High Efficiency Video Decoding on Multicore Processor Hyeonggeon Lee 1, Jong Kang Park 2, and Jong Tae Kim 1,2 Department of IT Convergence 1 Sungkyunkwan University Suwon, Korea Department of Electrical

More information

Title Adaptive Lagrange Multiplier for Low Bit Rates in H.264.

Title Adaptive Lagrange Multiplier for Low Bit Rates in H.264. Provided by the author(s) and University College Dublin Library in accordance with publisher policies. Please cite the published version when available. Title Adaptive Lagrange Multiplier for Low Bit Rates

More information

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications:

Chapter 11.3 MPEG-2. MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Chapter 11.3 MPEG-2 MPEG-2: For higher quality video at a bit-rate of more than 4 Mbps Defined seven profiles aimed at different applications: Simple, Main, SNR scalable, Spatially scalable, High, 4:2:2,

More information

Architecture of High-throughput Context Adaptive Variable Length Coding Decoder in AVC/H.264

Architecture of High-throughput Context Adaptive Variable Length Coding Decoder in AVC/H.264 Architecture of High-throughput Context Adaptive Variable Length Coding Decoder in AVC/H.264 Gwo Giun (Chris) Lee, Shu-Ming Xu, Chun-Fu Chen, Ching-Jui Hsiao Department of Electrical Engineering, National

More information

4G WIRELESS VIDEO COMMUNICATIONS

4G WIRELESS VIDEO COMMUNICATIONS 4G WIRELESS VIDEO COMMUNICATIONS Haohong Wang Marvell Semiconductors, USA Lisimachos P. Kondi University of Ioannina, Greece Ajay Luthra Motorola, USA Song Ci University of Nebraska-Lincoln, USA WILEY

More information

ERROR-ROBUST INTER/INTRA MACROBLOCK MODE SELECTION USING ISOLATED REGIONS

ERROR-ROBUST INTER/INTRA MACROBLOCK MODE SELECTION USING ISOLATED REGIONS ERROR-ROBUST INTER/INTRA MACROBLOCK MODE SELECTION USING ISOLATED REGIONS Ye-Kui Wang 1, Miska M. Hannuksela 2 and Moncef Gabbouj 3 1 Tampere International Center for Signal Processing (TICSP), Tampere,

More information

Hardware Architecture Design of Video Compression for Multimedia Communication Systems

Hardware Architecture Design of Video Compression for Multimedia Communication Systems TOPICS IN CIRCUITS FOR COMMUNICATIONS Hardware Architecture Design of Video Compression for Multimedia Communication Systems Shao-Yi Chien, Yu-Wen Huang, Ching-Yeh Chen, Homer H. Chen, and Liang-Gee Chen

More information

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter Y. Vatis, B. Edler, I. Wassermann, D. T. Nguyen and J. Ostermann ABSTRACT Standard video compression techniques

More information
