Keywords: Processing Element, Motion Estimation, BIST, Error Detection, Error Correction, Residue-Quotient(RQ) Code.

Similar documents
Detecting and Correcting the Multiple Errors in Video Coding System

Detecting and Correcting the Multiple Errors in Video Coding System

Architecture to Detect and Correct Error in Motion Estimation of Video System Based on RQ Code

IJCSIET--International Journal of Computer Science information and Engg., Technologies ISSN

FPGA Based Low Area Motion Estimation with BISCD Architecture

IMPLEMENTATION OF ROBUST ARCHITECTURE FOR ERROR DETECTION AND DATA RECOVERY IN MOTION ESTIMATION ON FPGA

DESIGN FOR TESTABILITY TECHNIQUES FOR VIDEO CODING SYSTEMS

VLSI Architecture to Detect/Correct Errors in Motion Estimation Using Biresidue Codes

Built In Self Test For Multi Error Detection In Motion Estimation Computing Arrays

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

MultiFrame Fast Search Motion Estimation and VLSI Architecture

POWER CONSUMPTION AND MEMORY AWARE VLSI ARCHITECTURE FOR MOTION ESTIMATION

ISSN Vol.08,Issue.12, September-2016, Pages:

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

ARITHMETIC operations based on residue number systems

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

A High Sensitive and Fast Motion Estimation for One Bit Transformation Using SSD

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

High Performance Hardware Architectures for A Hexagon-Based Motion Estimation Algorithm

Enhanced Hexagon with Early Termination Algorithm for Motion estimation

DESIGN OF PARAMETER EXTRACTOR IN LOW POWER PRECOMPUTATION BASED CONTENT ADDRESSABLE MEMORY

A High-Speed FPGA Implementation of an RSD-Based ECC Processor

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Aiyar, Mani Laxman. Keywords: MPEG4, H.264, HEVC, HDTV, DVB, FIR.

AUTONOMOUS RECONFIGURATION OF IP CORE UNITS USING BLRB ALGORITHM

ISSN Vol.05,Issue.09, September-2017, Pages:

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

DYNAMIC CIRCUIT TECHNIQUE FOR LOW- POWER MICROPROCESSORS Kuruva Hanumantha Rao 1 (M.tech)

Area and Power efficient MST core supported video codec using CSDA

Design of a High Speed CAVLC Encoder and Decoder with Parallel Data Path

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

DESIGN AND SIMULATION OF 1 BIT ARITHMETIC LOGIC UNIT DESIGN USING PASS-TRANSISTOR LOGIC FAMILIES

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume 9 /Issue 3 / OCT 2017

WORD LEVEL FINITE FIELD MULTIPLIERS USING NORMAL BASIS

TESTING OF FAULTS IN VLSI CIRCUITS USING ONLINE BIST TECHNIQUE BASED ON WINDOW OF VECTORS

DUE to the high computational complexity and real-time

VLSI IMPLEMENTATION OF ERROR TOLERANCE ANALYSIS

Improving Energy Efficiency of Block-Matching Motion Estimation Using Dynamic Partial Reconfiguration

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

Implementation of Pipelined Architecture Based on the DCT and Quantization For JPEG Image Compression

Available online at ScienceDirect. Procedia Technology 25 (2016 )

@ 2014 SEMAR GROUPS TECHNICAL SOCIETY.

High Performance VLSI Architecture of Fractional Motion Estimation for H.264/AVC

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

Soft-Core Embedded Processor-Based Built-In Self- Test of FPGAs: A Case Study

Area And Power Optimized One-Dimensional Median Filter

Speed Optimised CORDIC Based Fast Algorithm for DCT

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

A Comparison of Two Algorithms Involving Montgomery Modular Multiplication

Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field

MODULO 2 n + 1 MAC UNIT

Using Error Detection Codes to detect fault attacks on Symmetric Key Ciphers

HIGH-THROUGHPUT FINITE FIELD MULTIPLIERS USING REDUNDANT BASIS FOR FPGA AND ASIC IMPLEMENTATIONS

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Fast Decision of Block size, Prediction Mode and Intra Block for H.264 Intra Prediction EE Gaurav Hansda

Design and Implementation of 3-D DWT for Video Processing Applications

High Speed Multiplication Using BCD Codes For DSP Applications

Design and Implementation of Online BIST for Different Word Sizes of Memories MUNEERA JAMAL 1, K. PADMAJA DEVI 2

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

A Low Power Asynchronous FPGA with Autonomous Fine Grain Power Gating and LEDR Encoding

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume VI /Issue 3 / JUNE 2016

Implementation of a Fast Sign Detection Algoritm for the RNS Moduli Set {2 N+1-1, 2 N -1, 2 N }, N = 16, 64

A Structure-Independent Approach for Fault Detection Hardware Implementations of the Advanced Encryption Standard

A COST-EFFICIENT RESIDUAL PREDICTION VLSI ARCHITECTURE FOR H.264/AVC SCALABLE EXTENSION

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL

EFFICIENT RECURSIVE IMPLEMENTATION OF A QUADRATIC PERMUTATION POLYNOMIAL INTERLEAVER FOR LONG TERM EVOLUTION SYSTEMS

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Implementation of Galois Field Arithmetic Unit on FPGA

FPGA IMPLEMENTATION OF BIT PLANE ENTROPY ENCODER FOR 3 D DWT BASED VIDEO COMPRESSION

A Survey on Early Determination of Zero Quantized Coefficients in Video Coding

THE orthogonal frequency-division multiplex (OFDM)

Paper ID # IC In the last decade many research have been carried

Fault Tolerant Parallel Filters Based on ECC Codes

FPGA Implementation of ALU Based Address Generation for Memory

A High Quality/Low Computational Cost Technique for Block Matching Motion Estimation

FPGA Implementation of Low Complexity Video Encoder using Optimized 3D-DCT

Parallel Processing Deblocking Filter Hardware for High Efficiency Video Coding

Area Efficient SAD Architecture for Block Based Video Compression Standards

AN EFFICIENT VIDEO WATERMARKING USING COLOR HISTOGRAM ANALYSIS AND BITPLANE IMAGE ARRAYS

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

DESIGN AND IMPLEMENTATION OF FAST DECIMAL MULTIPLIER USING SMSD ENCODING TECHNIQUE

VERY large scale integration (VLSI) design for power

ISSN Vol.03,Issue.05, July-2015, Pages:

TEST DATA COMPRESSION BASED ON GOLOMB CODING AND TWO-VALUE GOLOMB CODING

Anisha Rani et al., International Journal of Computer Engineering In Research Trends Volume 2, Issue 11, November-2015, pp.

A Two Dimensional Median Filter Design Using Low Power Filter Architecture

VLSI DESIGN FOR CONVOLUTIVE BLIND SOURCE SEPARATION

VLSI Computational Architectures for the Arithmetic Cosine Transform

Design and Implementation of Effective Architecture for DCT with Reduced Multipliers

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

Resource Efficient Multi Ported Sram Based Ternary Content Addressable Memory

A Dedicated Hardware Solution for the HEVC Interpolation Unit

Volume 5, Issue 5 OCT 2016

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Transcription:

ISSN 2319-8885 Vol.03,Issue.31 October-2014, Pages:6116-6120 www.ijsetr.com FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems ZARA NILOUFER 1, SK.AHMEDUDDIN ZAKIR 2 1 PG Scholar, Dept of ECE, Shadan College of Engineering & Technology, Hyderabad, India, E-mail: Zara.nilofer@gmail.com. 2 Asst Prof, Dept of ECE, Shadan College of Engineering & Technology, Hyderabad, India, E-mail: syezakir@gmail.com. Abstract: Motion Estimation (ME) is the critical part of any video coding system as the video quality will be affected if an error has occurred in the ME. In order to test Motion Estimation in a video coding system an Error Detection and Correction Architecture (EDCA) is designed based on the Residue-and-Quotient (RQ) code. Multiple bit errors in the processing element (PE) i.e., key component of a ME, can be detected and the original data can be recovered effectively by using EDCA design. BIST technique iss utilised for testing. Experimental results indicate that the proposed design can detect multiple bit errors within lesser clock cycles. Keywords: Processing Element, Motion Estimation, BIST, Error Detection, Error Correction, Residue-Quotient(RQ) Code. I. INTRODUCTION Design for Testability (DFT) techniques are required in order to improve the quality and reduce the test cost of the digital circuit, while at the same time simplifying the test, debug and diagnose tasks. Logic built-in self-test (BIST) is a design for testability technique in which a portion of a circuit on a chip, board, or system is used to test the digital logic circuit itself. BIST technique for testing logic circuits can be online or offline. Online BIST is performed when the functional circuitry is in normal operational mode. In concurrent online BIST, testing is conducted simultaneously during normal functional operation. The functional circuitry is usually implemented with coding techniques. Video compression is achieved on two separate fronts by eliminating spatial redundancies and temporal redundancies from video signals. Removing spatial redundancies involves the task of removing video information that is consistently repeated within certain areas of a single frame. For example a frame shot of a blue sky will have a consistent shade of blue across the entire frame. This information can be compressed through the use of various discreet cosine transformations that map a given image in terms of its light or colour intensities. This paves the way for spatial compression by only capturing the distinct intensities, instead of the spread of intensities over the entire frame. Since compression through removing spatial redundancies does not involve the use of motion estimation. Compression through the removal of temporal redundancies involves compressing information that is repeated over a given sequence of frames. Consequently, the motion estimation process is the process of deriving a suitable Motion Vector (MV) that best describes the spatial movement of objects from one frame to the next. Motion compensation is a key process in temporal video compression, which eliminates temporal redundancies found in video. Temporal redundancy occurs when an identical set of pixels exist across multiple video frames. Generally, Motion Estimation Computing Array (MECA) performs up to 50% of computations in the entire video coding system. In VCS, Video data needs to be compressed before storage and transmission, complex algorithms are required to eliminate the redundancy, extracting the redundant information. Motion Estimation (ME) is the process of creating motion vectors to track the motion of objects within video footage. It is an essential part of many compression standards and is a crucial component of the H.264 video compression standard. In particular ME can consist of over 40% of the total computation. Motion Estimation is the technique of finding a suitable Motion Vector (MV) that best describes the movement of a set of pixels from its original position within one frame to its new positions in the subsequent frame. Encoding just the motion vector for the set of pixels requires significantly less bits than what is required to encode the entire set of pixels, while still retaining enough information to reproduce the original video sequence. II. RELATED WORK The Existing EDCA scheme consists of two major blocks, i.e. error detection circuit (EDC) and data recovery circuit (DRC), to detect the errors and recover the corresponding data in a specific CUT is as shown in Fig.1. The test code generator (TCG) in the architecture utilizes the concepts of RQ code to generate the corresponding test codes for error detection and data recovery. The output from the Processing element (PE) is compared with the test code values in the EDC. The output of EDC indicates the occurrence of error. DRC is in charge of recovering data from TCG. Additionally, Copyright @ 2014 IJSETR. All rights reserved.

a selector is enabled to select the error free data or datarecovery results. A ME consists of many PE s incorporated in a 1-D or 2-D array for video encoding applications. Fig.1. shows the proposed EDCA circuit design for a specific Pei of a ME. This architecture consists of blocks that generate the residue and quotient values that are used to detect the errors. ZARA NILOUFER, SK.AHMEDUDDIN ZAKIR pixels as shown in Fig.3. Generally, a PE is made up of two adders (an 8-bit adder and a 12-bit adder) and accumulator. Fig.3. Processing Element Module Schematic Diagram. Fig.1. Existing error detection and correction architecture. III. IMPLEMENTATION OF PROPOSED EDCA DESIGN The proposed EDCA circuit design consists of two Processing elements (PEs) and two TCGs. 8bit current pixel and 8 bit reference pixel are given to each of PE1, PE2 and TCG1, TCG2.PE block generates SAD value which is input to RQCG circuit.the RQCG circuit generates Residue and Quotient (RQ) codes which are taken by Error Detection Circuit(EDC). The EDC circuit compares test codes from Test Code Generator (TCG) circuit and RQCQ circuit as shown in Fig.2. If the output of EDC circuit is generated as 0, the circuit is assumed As error free and MUX generates next PE. If EDC generates output as 1 then MUX selector takes corrected value from Data Recovery Circuit (DRC). Fig.2. Proposed EDCA Circuit Design. A. Processing Element Processing element calculates the sum of absolute differences (SAD) between current pixels and reference B. Fault Model A more practical approach is to select specific test patterns based on circuit structural information and a set of fault models. This approach is called structural testing. Structural testing saves time and improves test efficiency. A stuck-at fault affects the state of logic signals on lines in a logic circuit, including primary inputs (PIs), primary outputs (POs), internal gate inputs and outputs, fan-out stems (sources), and fan-out branches. A stuck-at fault transforms the correct value on the faulty signal line to appear to be stuck at a constant logic value, either logic 0 or logic 1, referred to as stuck-at-0 (SA0) or stuck-at-1 (SA1), respectively. The stuck at fault model can also be applied to sequential circuits; however, high fault coverage test generation for sequential circuits is much more difficult than for combinational circuits. The stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault in a ME architecture can incur errors in computing SAD values. A distorted computational error and the magnitude of e are assumed here to be equal to - SAD SAD where SAD denotes the computed SAD value with SA faults. C. Residue and Quotient Generation Earlier codes like Parity Codes, Berger Codes were used for detecting a single bit error. Next evolved the residue code which is generally a separable arithmetic code that estimates the residue of the given data and appends it to the data. This code is capable of detecting a single bit error. Error detection logic using residue codes are simple and it can be easily implemented. For instance, assume that N denotes an integer, N1and N2 represent data words, and m refers to the modulus. A separate residue code is one in which N is coded as a pair (N, N m ).Notably; N m is the residue of N modulo m. However, only a bit error can be detected based on the residue code. Additionally, error correction is not possible by using the residue codes. Therefore, this work presents a quotient code to assist the residue code in detecting multiple bit errors and recovering errors. The mathematical model of RQ code is simply described as follows. Assume that binary data X is expressed as (1)

FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems The RQ code for X modulo m is expressed as R= X M and area and the Ref_pixel of the current macro block. The SAD Q= X/m respectively. Notably, i denotes the largest value for the macro block with size of NxN can be evaluated: integer not exceeding I. According to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to simplify the complexity of circuit design, the implementation of the module is generally dependent on the addition operation. Additionally, based on (7) the concept of residue code, the following definitions shown Where r xij, q xij and r yij, q yij denote the corresponding RQ code can be applied to generate the RQ code for circuit design. of X ij and Y ij modulo m. Importantly, X ij and Y ij represents the luminance pixel value of Cur_pixel and Ref_pixel, Definition 1: respectively. Based on the definitions shown in (2) and (3) the generation of the RQ code (R T and Q T ) from the TCG (2) block is obtained. The circuit design of TCG is achieved by Definition 2: equations (8) and (9), Let, then (3) To accelerate the circuit design of RQCG, the binary data shown in (1) can generally be divided into two parts: (4) Significantly, the value of k is equal to n/2 and the data formation of Y 0 and Y 1 are a decimal system. If the modulus m= (2 power k) -1, then the residue code of X modulo m is given by, (8) (5) Where (6) Since the value of Y 0 and Y 1 is generally greater than that of modulus m, the equations in (5) and (6) must be simplified further to replace the complex module operation with a simple addition operation by using the parameters Z 0, Z 1, α and β. Based on (5) and (6), the corresponding circuit design of the RQCG is easily realized by using the simple adders (ADDs). Namely, the RQ code can be generated with a low complexity and little hardware cost. D. Test Code Generation (TCG) TCG is the main block of the proposed EDCA design. Test Code Generation (TCG) design is based on the ability of the RQCG circuit to generate corresponding test codes in order to detect errors and recovers data. The specific PEi estimates the absolute difference between the Cur_pixel of the search E. Error Detection Process Error Detection Circuit (EDC) is used to perform the operation of error detection in the specific Pei as shown in Fig.2. This block is used to compare the output from the TCG i.e., (R T and Q T ) with output from RQCG1 i.e., (R PEi and Q PEi ), in order to detect the occurrence of an error. If the value of R PEi R T and Q PEi Q T, then the error in the PEi can be detected. The EDC output is then generated as a 0/1 signal to indicate that the tested PEi is error free/ with error. F. DRC Block The original data is recovered in the Data Recovery Circuit (DRC) during error correction process, by separating (9)

the RQ code from the TCG. The data recovery is possible by implementing the mathematical model as, (10) The proposed EDCA design executes the error detection and data recovery operations simultaneously. Additionally, error-free data from the tested PE i or the data recovery that results from DRC is selected by a multiplexer (MUX) to pass to the next specific PE i +1for subsequent testing. ZARA NILOUFER, SK.AHMEDUDDIN ZAKIR codes for each pixel value in the macroblock. Though this architecture can detect multiple bit errors, it took 16 clock pluses to get the output. To overcome this disadvantage, the proposed architecture is used which has 2 PEs and TCGs for computing the test codes for the pixel values. Verification of the proposed design is performed by using VHDL and then synthesized by using Mentor Graphics with the target device as 2V1500bg575 from the family Virtex II. The performance of the proposed EDCA design is estimated in terms of its speed. In this architecture, the pixel values in the 4x4 macroblock are taken out for the code generation based on the clocking signal. At each triggering edge of the clock the 8-bit pixel value from the ref_pixel and cur_pixel is taken and given to the TCG and PE blocks for code generation. The generated code from test code generation block and the RQCG output is compared in the EDC block, output of which indicates the error is avoided. TABLE I: Comparison Of Speeds Fig.4. Example of pixel values. G. Sample Calculation A sample of the 16 pixels for a 4x4 macro block is taken. Fig.4 presents an example of pixel values of the Cur_pixel and Ref_pixel. Based on (7), the SAD value of the 4x4 macro block is (11) Table 1 shows that the speed and high throughput are achieved by using 2 PEs and 2 TCGS and giving 8 pixels to each PE and TCG.The overall strategy is based on RQ code generation which can detect multiple bit errors within and also correct it. V. CONCLUSION The proposed architecture which increases the speedup as well as the throughput. The memory can keep the data path fully utilized in video processing function implementations which ensures high-speed operation and full utilization of the processing resources. The modulo is assumed here to be m=2 6-1=63. Thus, based on (8) and (9), the RQ code of the SAD value shown in (11) are R PEi = R T = 2124 63 =45 and Q PEi = Q T = 2124/63 =33. Since the value of R PEi (Q PEi ) is equal to R T (Q T ), EDC is enabled and a signal 0 is generated to describe a situation in which the specific is error-free. Conversely, if SA1 and SA0 errors occur in bits 1 and 12 of a specific PEi, i.e., the pixel values of PEi, 2124= (100001001100)2is turned into 77= (000001001101)2, resulting in the transformation of the RQ code of R PEi and Q PEi into 77 =14and 77/63 =1. Thus, an error signal 1 is generated from EDC and sent to the MUX in order to select the recovery results from DRC. IV. RESULTS AND DISCUSSIONS Residue-and-quotient code is used in the existing error detection architecture that has single Processing Element and Test Code Generation (TCG) block for computing the test VI. ACKNOWLEDGMENT I would like to thank my HOD Mr. Mohammed Abdul Mubeen and to my beloved guide S.K.Ahmeduddin Zakir for their continuous support and guidance. VII. REFERENCES [1] Y.W.Huang, B.Y.Hsieh,S.Y. Chien, S.Y. Ma, and L.G. Chen, Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC, IEEE Trans. Circuits Syst.Video Technol., Vol.16,no.4,pp.507-522,Apr.2006. [2] S.Surin and Y.H. Hu, Frame-level pipeline motion estimation array processor, IEEE Trans. Circuits Syst. Video Technol., vol. 11, no.2, pp.248-251, Feb.2001. [3] M.Y.Dong, S.H.Yang, and S.K. Lu, Design-fortestability techniques for motion estimation computing arrays, in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp.1188-1191. [4] J.F. Lin, J.C. Yeh, R.F. Hung, and C.W. Wu, A built-in self repair design for RAMs with 2-D redundancy, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.13, no. 6, pp.742-745, Jun. 2005. [5] C.L. Hsu, C.H. Cheng, and Y. Liu, Built-in self detection/correction architecture for motion estimation

FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems computing arrays, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no.2, pp. 319-324, Feb.2010. [6] S.Bayat-Sarmadi and M.A. Hasan, On concurrent detection of errors in polynomial basis multiplication, IEEE Trans. Very Large Scale Integr. (VLSI) Systs., vol. 15, no.4, pp. 413-426, Apr.2007. [7] D. K. Park, H.M.Cho, S.B. Cho, and J.H. Lee, A fast motion estimation algorithm for SAD optimization in subpixel, Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 528-531. [8] C.W. Chiou, C.C. Chang, C.Y. Lee, T.W. Hou, and J.M. Lin, Concurrent error detection and correction in Gaussian normal basis multiplier over GF (2^m), IEEE Trans. Comput., vol.58, no.6, pp.851-857, Jun.2009. [9] Chang- Hsin Cheng, Yu Liu, and Chun-Lung Hsu, Member, IEEE Design of an Error Detection and Data recovery Architecture for Motion Estimation Testing Applications, IEEE transactions on Very Large scale integration (VLSI) systems, vol.20, no.4, April.2012. [10] L.Breveglieri, P. Maistri, and I. Koren, Anote on error detection in an RSA architecture by means of residue codes, in Proc. IEEE Int. Symp. On-Line Testing, Jul. 2006, 176 177.