Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration

Similar documents
A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION

BANDWIDTH REDUCTION SCHEMES FOR MPEG-2 TO H.264 TRANSCODER DESIGN

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye

FAST SPATIAL LAYER MODE DECISION BASED ON TEMPORAL LEVELS IN H.264/AVC SCALABLE EXTENSION

Fast Mode Decision for H.264/AVC Using Mode Prediction

Implementation of A Optimized Systolic Array Architecture for FSBMA using FPGA for Real-time Applications

Analysis and Architecture Design of Variable Block Size Motion Estimation for H.264/AVC

Xuena Bao, Dajiang Zhou, Peilin Liu, and Satoshi Goto, Fellow, IEEE

POWER CONSUMPTION AND MEMORY AWARE VLSI ARCHITECTURE FOR MOTION ESTIMATION

2014 Summer School on MPEG/VCEG Video. Video Coding Concept

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation

Research on Transcoding of MPEG-2/H.264 Video Compression

Scalable Extension of HEVC 한종기

International Journal of Emerging Technology and Advanced Engineering Website: (ISSN , Volume 2, Issue 4, April 2012)

Reconfigurable Variable Block Size Motion Estimation Architecture for Search Range Reduction Algorithm

An Optimized Template Matching Approach to Intra Coding in Video/Image Compression

Complexity Reduced Mode Selection of H.264/AVC Intra Coding

Zonal MPEG-2. Cheng-Hsiung Hsieh *, Chen-Wei Fu and Wei-Lung Hung

Co-synthesis and Accelerator based Embedded System Design

Star Diamond-Diamond Search Block Matching Motion Estimation Algorithm for H.264/AVC Video Codec

VIDEO COMPRESSION STANDARDS

A hardware design of optimized ORB algorithm with reduced hardware cost

Digital Video Processing

MultiFrame Fast Search Motion Estimation and VLSI Architecture

High Performance VLSI Architecture of Fractional Motion Estimation for H.264/AVC

A HIGHLY PARALLEL CODING UNIT SIZE SELECTION FOR HEVC. Liron Anavi, Avi Giterman, Maya Fainshtein, Vladi Solomon, and Yair Moshe

Implementation and analysis of Directional DCT in H.264

Upcoming Video Standards. Madhukar Budagavi, Ph.D. DSPS R&D Center, Dallas Texas Instruments Inc.

The S6000 Family of Processors

implementation using GPU architecture is implemented only from the viewpoint of frame level parallel encoding [6]. However, it is obvious that the mot

EE 5359 MULTIMEDIA PROCESSING SPRING Final Report IMPLEMENTATION AND ANALYSIS OF DIRECTIONAL DISCRETE COSINE TRANSFORM IN H.

10.2 Video Compression with Motion Compensation 10.4 H H.263

Professor Laurence S. Dooley. School of Computing and Communications Milton Keynes, UK

A NOVEL SCANNING SCHEME FOR DIRECTIONAL SPATIAL PREDICTION OF AVS INTRA CODING

Fast frame memory access method for H.264/AVC

Video encoders have always been one of the resource

Introduction to Video Compression

A Quantized Transform-Domain Motion Estimation Technique for H.264 Secondary SP-frames

Reducing/eliminating visual artifacts in HEVC by the deblocking filter.

SINGLE PASS DEPENDENT BIT ALLOCATION FOR SPATIAL SCALABILITY CODING OF H.264/SVC

Compressed-Domain Video Processing and Transcoding

Efficient MPEG-2 to H.264/AVC Intra Transcoding in Transform-domain

Multimedia Systems Video II (Video Coding) Mahdi Amiri April 2012 Sharif University of Technology

High Performance Hardware Architectures for A Hexagon-Based Motion Estimation Algorithm

JPEG 2000 vs. JPEG in MPEG Encoding

Decoding-Assisted Inter Prediction for HEVC

SAD implementation and optimization for H.264/AVC encoder on TMS320C64 DSP

STACK ROBUST FINE GRANULARITY SCALABLE VIDEO CODING

IN RECENT years, multimedia application has become more

Toward Optimal Pixel Decimation Patterns for Block Matching in Motion Estimation

Reduced 4x4 Block Intra Prediction Modes using Directional Similarity in H.264/AVC

ABSTRACT. KEYWORD: Low complexity H.264, Machine learning, Data mining, Inter prediction. 1 INTRODUCTION

CONTENT ADAPTIVE COMPLEXITY REDUCTION SCHEME FOR QUALITY/FIDELITY SCALABLE HEVC

Dynamic Load Balancing for Real-Time Video Encoding on Heterogeneous CPU+GPU Systems

High Efficient Intra Coding Algorithm for H.265/HVC

Advanced Video Coding: The new H.264 video compression standard

Area Efficient SAD Architecture for Block Based Video Compression Standards

Coding of Coefficients of two-dimensional non-separable Adaptive Wiener Interpolation Filter

H.264/AVC Baseline Profile to MPEG-4 Visual Simple Profile Transcoding to Reduce the Spatial Resolution

Digital system (SoC) design for lowcomplexity. Hyun Kim

ISSCC 2006 / SESSION 22 / LOW POWER MULTIMEDIA / 22.1

Motion Vector Coding Algorithm Based on Adaptive Template Matching

Research Article A High-Throughput Hardware Architecture for the H.264/AVC Half-Pixel Motion Estimation Targeting High-Definition Videos

Review and Implementation of DWT based Scalable Video Coding with Scalable Motion Coding.

Week 14. Video Compression. Ref: Fundamentals of Multimedia

Express Letters. A Simple and Efficient Search Algorithm for Block-Matching Motion Estimation. Jianhua Lu and Ming L. Liou

Spline-Based Motion Vector Encoding Scheme

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Chapter 10 ZHU Yongxin, Winson

A Novel Design Framework for the Design of Reconfigurable Systems based on NoCs

FAST MOTION ESTIMATION DISCARDING LOW-IMPACT FRACTIONAL BLOCKS. Saverio G. Blasi, Ivan Zupancic and Ebroul Izquierdo

Context based optimal shape coding

Realtime H.264 Encoding System using Fast Motion Estimation and Mode Decision

Video Coding Using Spatially Varying Transform

Aiyar, Mani Laxman. Keywords: MPEG4, H.264, HEVC, HDTV, DVB, FIR.

A GPU-Based DVC to H.264/AVC Transcoder

Rate Distortion Optimization in Video Compression

A Reconfigurable Crossbar Switch with Adaptive Bandwidth Control for Networks-on

IBM Research Report. Inter Mode Selection for H.264/AVC Using Time-Efficient Learning-Theoretic Algorithms

Testing HEVC model HM on objective and subjective way

Reconfigurable architectures and processors for real-time video motion estimation

Multimedia Decoder Using the Nios II Processor

Detecting and Correcting the Multiple Errors in Video Coding System

High-Performance VLSI Architecture of H.264/AVC CAVLD by Parallel Run_before Estimation Algorithm *

A Survey on Early Determination of Zero Quantized Coefficients in Video Coding

IJSER. Real Time Object Visual Inspection Based On Template Matching Using FPGA

Efficient Method for Half-Pixel Block Motion Estimation Using Block Differentials

Detecting and Correcting the Multiple Errors in Video Coding System

POLITECNICO DI TORINO Repository ISTITUZIONALE

Semi-Hierarchical Based Motion Estimation Algorithm for the Dirac Video Encoder

EE Low Complexity H.264 encoder for mobile applications

Enhanced Hexagon with Early Termination Algorithm for Motion estimation

Compression of Stereo Images using a Huffman-Zip Scheme

Massively Parallel Computing on Silicon: SIMD Implementations. V.M.. Brea Univ. of Santiago de Compostela Spain

EE 5359 Low Complexity H.264 encoder for mobile applications. Thejaswini Purushotham Student I.D.: Date: February 18,2010

DIGITAL TELEVISION 1. DIGITAL VIDEO FUNDAMENTALS

Deblocking Filter Algorithm with Low Complexity for H.264 Video Coding

An Efficient Table Prediction Scheme for CAVLC

Prediction-based Directional Search for Fast Block-Matching Motion Estimation

Transcription:

, pp.517-521 http://dx.doi.org/10.14257/astl.2015.1 Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration Jooheung Lee 1 and Jungwon Cho 2, * 1 Dept. of Electronic and Electrical Engineering, Hongik University, Republic of Korea joolee@hongik.ac.kr 2 Dept. of Computer Education, Jeju National University, Republic of Korea jwcho@jejunu.ac.kr Abstract. In this paper, energy-efficient architecture for Variable Block Size Motion Estimation (VBSME) is proposed to fully utilize dynamic partial reconfiguration capability of programmable hardware fabric in the heterogeneous computing environment. Dynamic Partial Reconfiguration is a unique feature of FPGAs that makes best use of hardware resources and power by allowing adaptive algorithm to be implemented on the reconfigurable hardware fabric during runtime. We exploit this feature of FPGA to support run-time reconfiguration of the proposed modular hardware architecture for motion estimation (ME). The implemented scalable SAD array can provide different resolutions and frame rates for real time applications with multiple reconfigurable regions at the operating frequency of 90MHz. Keywords: Variable Block Size Motion Estimation, Partial Reconfiguration. 1 Introduction Due to the proliferation of heterogeneous embedded systems in the Internet of Things era, low complexity encoding for video sensing applications becomes essential especially in power-constrained environments. Current video encoding algorithms require high computing capability to achieve high compression ratios, but their computational complexities increase power consumption proportionally. Motion estimation techniques play a key role in inter-video coding by removing temporal redundancies among video frames obtained from the conventional CCD/CMOS sensors [1-3]. As in H.264/AVC and High Efficiency Video Coding (HEVC) standards, motion estimation engine demands large amount of complex computational capabilities due to its support for wide range of different block sizes in order to increase accuracy in finding best matching block. Recently, System-on-Chip (SoC) used for embedded video-sensing systems integrates various functional elements, such as general purpose processors, DSPs, customized hardware accelerators, memories, and programmable hardware fabrics (FPGAs). In this paper, we propose an approach to gain algorithmic and architectural * Corresponding Author ISSN: 2287-1233 ASTL Copyright 2015 SERSC

benefits through a) an adaptive reduction of search range to reduce number of search candidates during run-time and b) a reconfigurable modular architecture to support search range adaptation by fully utilizing partial reconfiguration capability of FPGAs. The rest of the paper is organized as follows. In Section 2, we present the hardware platform to implement the proposed reconfigurable motion estimation engine. The experimental results of the proposed reconfigurable approach are shown in Section 3. Finally, concluding remarks are given in Section 4. 2 Proposed Modular Motion Estimation Architecture The hardware platform to support proposed modular architecture for variable block size motion estimation is shown in Fig. 1. The static region which remains unchanged after initial configuration of the FPGA device includes current frame buffer, controller, comparator & block mode selector, and SAD buffer. The reconfigurable region consists of 4 Partial Reconfigurable Regions (PRRs). Each PRR is used to map the hardware bitstream of PE array which is responsible for computation of SADs. N 1 N 1 SAD h, v ) c x, y ) s _ w x 1 x h, y 1 y v ) (1) Controller x 0 y 0 HyperTerminal UART we data Frame Buffers data SADs, Mode CPU SAD Buffer we Comparator and Block Mode Selector SADs MVs P R I/ F Compact Flash System ACE PLB BUS Partially Reconfigurable Region Used to Implement Search Range Adaptive ME through Deeply Pipelined Datapath of Reconfigurable Processing Elements PRR1 PRR2 PRR3 H/W Library Module Mapped on Each Partially Reconfigurable Region During Run-Time User Control increasing search range Bitstream Mapping PRR4 Fig. 1. Hardware platform to implement partially reconfigurable ME Each PRR is responsible for SADs of ±2 search range for four blocks and by increasing the PRRs from 1, 2, 3, and 4, the motion estimation design can support ±2, ±4, ±6, and ±8 search range respectively. Similarly, the PRR structure can be designed to support increasing/decreasing SR (Search Range) in steps of 4 and 8. For example, if PRRs can be configured for 4-step SR, then the set of SRs [±4, ±8, ±12, and ±16] is supported. This can be useful for video with very high motion activities and high resolution which require optimum SR to be greater than ±8. 518 Copyright 2015 SERSC

To perform block-based motion estimation, the matching criterion such as SADs in (1) among the current image block and all candidate blocks in the search range of reference frame is computed. Let the block size be N N and location of each block in the current frame (C) is represented by (i, j), and a block within the search window (h, v) in the reference frame (R) is matched. Here, c(x,y) is a pixel value in the current block and s_w(x,y) is a search candidate block. The center (x1, y1) of the search range is predicted, depending on the behavior of motion vectors of neighboring blocks. 3 Experimental Results Performing exhaustive search is computationally very expensive and adds great complexity to motion estimation. A full search algorithm for each block candidate requires (2R+1) 2 SAD computations where [-R,+R] is the search window size. We reduce this search range (R) adaptively based on the correlation of the neighboring blocks motion vectors. The adaptive search range reduction for dynamic partial reconfiguration of modular motion estimation HW architecture is described below. a) Initially set the search range for all the reference frames (for the case of multiple or single referenced frame algorithm) to maximum search range size (i.e., 32). b) Perform prediction algorithm as used in the JM software and obtain Motion Vector Prediction (MVP). Then, set the search range center in the reference frame accordingly. For simplicity in hardware implementation, all the block types for current macroblock share the same MVP (Fast Full Search in JM). c) Perform Fast Full Search motion estimation and simultaneously store the number of MVs falling into different category of absolute displacements. (i.e., number of MVs with displacement 0, 1, 2, and so on). d) Set the search range for the next frame such that the search range covers at least Th (Threshold) % of total MVs. Th can vary from 90-100% depending on the user requirement: higher performance/psnr, less computational complexity, or hardware resources utilized, and so on. In the proposed approach, search range is updated at frame level due to the overhead of bitstream loading from the external memory for real-time reconfiguration on the FPGA device. Hence, it is assumed that the search range is fixed for all macroblocks in a particular frame. e) Continue from step (b) onwards. The performance comparison of Search Range Reduction Algorithm with the JM 12.4 Reference Software is given in Fig. 2. Threshold values from 95% to 99.5% are used for simulations on various video sequences (QCIF/30fps), and their corresponding results are shown in terms of PSNR and hardware utilization. For comparisons, Fast Full Search algorithm using the same search range of 32 for all blocks of the given macroblock is used for encoding 60 frames. Parameters in JM software are set to baseline profile (no B slices), one reference frame, and fixed search range. As shown in Fig. 2, the proposed adaptive search range approach using partial reconfiguration is very useful in reducing the computational complexity and hardware requirement significantly at the cost of little degradation in video quality, i.e., small increase in bitrate and decrease in PSNR. Copyright 2015 SERSC 519

PSNR(dB) 37.5 37 36.5 36 35.5 95% 97% 99% 99.50% 100% (a) P SNR vs. Threshold Threshold Carphone Football Suzie Hardware Utilization % 80 60 Carphone 40 Football 20 Suzie 0 Threshold 95% 97% 99% 99.50% (b) Hardware Utilization vs. Threshold Fig. 2. Performance comparison of adaptive SR with partial reconfiguration 4 Conclusion In this paper, we presented the design of motion estimation processing engine which is capable of executing for different search range values. The proposed motion estimation architecture is suitable for real-time applications where hardware resource is a critical criterion. First, a search range reduction is introduced at frame level, in order to intelligently adapt towards various video sequences and at the same time the computational complexity can be reduced further wherever possible with only small degradation in PSNR. Second, modular hardware architecture is presented for ME engine to process with different SR values during run time. References 1. Dias, T., Momcilovic, S., Roma, N., and Sousa, L. In: Adaptive motion estimation processor for autonomous video devices. EURASIP Journal on Embedded Systems, pp. 1--10. (2007) 2. Song, Y., Liu, Z., Goto, S., and Ikenaga, T. In: Scalable VLSI Architecture for Variable Block Size Integer Motion Estimation in H.264/AVC. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, pp. 979--988. (2006) 3. Liu, Z., Song, Y., Ikenaga, T., and Goto, S. In: A fine-grain scalable and low memory cost variable block size motion estimation architecture for H.264/AVC. IEICE Transactions on Electronics, pp. 1928--1936. (2006) 520 Copyright 2015 SERSC

4. Gupta, S., and Lee, J. In: A Scalable H.264/AVC Variable Block Size Motion Estimation Engine Using Partial Reconfiguration. Proceedings of International Conference on Engineering of Reconfigurable Systems and Algorithms, (2009) Copyright 2015 SERSC 521