ISSN 2319-8885 Vol.03,Issue.31 October-2014, Pages:6116-6120 www.ijsetr.com FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems ZARA NILOUFER 1, SK.AHMEDUDDIN ZAKIR 2 1 PG Scholar, Dept of ECE, Shadan College of Engineering & Technology, Hyderabad, India, E-mail: Zara.nilofer@gmail.com. 2 Asst Prof, Dept of ECE, Shadan College of Engineering & Technology, Hyderabad, India, E-mail: syezakir@gmail.com. Abstract: Motion Estimation (ME) is the critical part of any video coding system as the video quality will be affected if an error has occurred in the ME. In order to test Motion Estimation in a video coding system an Error Detection and Correction Architecture (EDCA) is designed based on the Residue-and-Quotient (RQ) code. Multiple bit errors in the processing element (PE) i.e., key component of a ME, can be detected and the original data can be recovered effectively by using EDCA design. BIST technique iss utilised for testing. Experimental results indicate that the proposed design can detect multiple bit errors within lesser clock cycles. Keywords: Processing Element, Motion Estimation, BIST, Error Detection, Error Correction, Residue-Quotient(RQ) Code. I. INTRODUCTION Design for Testability (DFT) techniques are required in order to improve the quality and reduce the test cost of the digital circuit, while at the same time simplifying the test, debug and diagnose tasks. Logic built-in self-test (BIST) is a design for testability technique in which a portion of a circuit on a chip, board, or system is used to test the digital logic circuit itself. BIST technique for testing logic circuits can be online or offline. Online BIST is performed when the functional circuitry is in normal operational mode. In concurrent online BIST, testing is conducted simultaneously during normal functional operation. The functional circuitry is usually implemented with coding techniques. Video compression is achieved on two separate fronts by eliminating spatial redundancies and temporal redundancies from video signals. Removing spatial redundancies involves the task of removing video information that is consistently repeated within certain areas of a single frame. For example a frame shot of a blue sky will have a consistent shade of blue across the entire frame. This information can be compressed through the use of various discreet cosine transformations that map a given image in terms of its light or colour intensities. This paves the way for spatial compression by only capturing the distinct intensities, instead of the spread of intensities over the entire frame. Since compression through removing spatial redundancies does not involve the use of motion estimation. Compression through the removal of temporal redundancies involves compressing information that is repeated over a given sequence of frames. Consequently, the motion estimation process is the process of deriving a suitable Motion Vector (MV) that best describes the spatial movement of objects from one frame to the next. Motion compensation is a key process in temporal video compression, which eliminates temporal redundancies found in video. Temporal redundancy occurs when an identical set of pixels exist across multiple video frames. Generally, Motion Estimation Computing Array (MECA) performs up to 50% of computations in the entire video coding system. In VCS, Video data needs to be compressed before storage and transmission, complex algorithms are required to eliminate the redundancy, extracting the redundant information. Motion Estimation (ME) is the process of creating motion vectors to track the motion of objects within video footage. It is an essential part of many compression standards and is a crucial component of the H.264 video compression standard. In particular ME can consist of over 40% of the total computation. Motion Estimation is the technique of finding a suitable Motion Vector (MV) that best describes the movement of a set of pixels from its original position within one frame to its new positions in the subsequent frame. Encoding just the motion vector for the set of pixels requires significantly less bits than what is required to encode the entire set of pixels, while still retaining enough information to reproduce the original video sequence. II. RELATED WORK The Existing EDCA scheme consists of two major blocks, i.e. error detection circuit (EDC) and data recovery circuit (DRC), to detect the errors and recover the corresponding data in a specific CUT is as shown in Fig.1. The test code generator (TCG) in the architecture utilizes the concepts of RQ code to generate the corresponding test codes for error detection and data recovery. The output from the Processing element (PE) is compared with the test code values in the EDC. The output of EDC indicates the occurrence of error. DRC is in charge of recovering data from TCG. Additionally, Copyright @ 2014 IJSETR. All rights reserved.
a selector is enabled to select the error free data or datarecovery results. A ME consists of many PE s incorporated in a 1-D or 2-D array for video encoding applications. Fig.1. shows the proposed EDCA circuit design for a specific Pei of a ME. This architecture consists of blocks that generate the residue and quotient values that are used to detect the errors. ZARA NILOUFER, SK.AHMEDUDDIN ZAKIR pixels as shown in Fig.3. Generally, a PE is made up of two adders (an 8-bit adder and a 12-bit adder) and accumulator. Fig.3. Processing Element Module Schematic Diagram. Fig.1. Existing error detection and correction architecture. III. IMPLEMENTATION OF PROPOSED EDCA DESIGN The proposed EDCA circuit design consists of two Processing elements (PEs) and two TCGs. 8bit current pixel and 8 bit reference pixel are given to each of PE1, PE2 and TCG1, TCG2.PE block generates SAD value which is input to RQCG circuit.the RQCG circuit generates Residue and Quotient (RQ) codes which are taken by Error Detection Circuit(EDC). The EDC circuit compares test codes from Test Code Generator (TCG) circuit and RQCQ circuit as shown in Fig.2. If the output of EDC circuit is generated as 0, the circuit is assumed As error free and MUX generates next PE. If EDC generates output as 1 then MUX selector takes corrected value from Data Recovery Circuit (DRC). Fig.2. Proposed EDCA Circuit Design. A. Processing Element Processing element calculates the sum of absolute differences (SAD) between current pixels and reference B. Fault Model A more practical approach is to select specific test patterns based on circuit structural information and a set of fault models. This approach is called structural testing. Structural testing saves time and improves test efficiency. A stuck-at fault affects the state of logic signals on lines in a logic circuit, including primary inputs (PIs), primary outputs (POs), internal gate inputs and outputs, fan-out stems (sources), and fan-out branches. A stuck-at fault transforms the correct value on the faulty signal line to appear to be stuck at a constant logic value, either logic 0 or logic 1, referred to as stuck-at-0 (SA0) or stuck-at-1 (SA1), respectively. The stuck at fault model can also be applied to sequential circuits; however, high fault coverage test generation for sequential circuits is much more difficult than for combinational circuits. The stuck-at (SA) model, must be adopted to cover actual failures in the interconnect data bus between PEs. The SA fault in a ME architecture can incur errors in computing SAD values. A distorted computational error and the magnitude of e are assumed here to be equal to - SAD SAD where SAD denotes the computed SAD value with SA faults. C. Residue and Quotient Generation Earlier codes like Parity Codes, Berger Codes were used for detecting a single bit error. Next evolved the residue code which is generally a separable arithmetic code that estimates the residue of the given data and appends it to the data. This code is capable of detecting a single bit error. Error detection logic using residue codes are simple and it can be easily implemented. For instance, assume that N denotes an integer, N1and N2 represent data words, and m refers to the modulus. A separate residue code is one in which N is coded as a pair (N, N m ).Notably; N m is the residue of N modulo m. However, only a bit error can be detected based on the residue code. Additionally, error correction is not possible by using the residue codes. Therefore, this work presents a quotient code to assist the residue code in detecting multiple bit errors and recovering errors. The mathematical model of RQ code is simply described as follows. Assume that binary data X is expressed as (1)
FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems The RQ code for X modulo m is expressed as R= X M and area and the Ref_pixel of the current macro block. The SAD Q= X/m respectively. Notably, i denotes the largest value for the macro block with size of NxN can be evaluated: integer not exceeding I. According to the above RQ code expression, the corresponding circuit design of the RQCG can be realized. In order to simplify the complexity of circuit design, the implementation of the module is generally dependent on the addition operation. Additionally, based on (7) the concept of residue code, the following definitions shown Where r xij, q xij and r yij, q yij denote the corresponding RQ code can be applied to generate the RQ code for circuit design. of X ij and Y ij modulo m. Importantly, X ij and Y ij represents the luminance pixel value of Cur_pixel and Ref_pixel, Definition 1: respectively. Based on the definitions shown in (2) and (3) the generation of the RQ code (R T and Q T ) from the TCG (2) block is obtained. The circuit design of TCG is achieved by Definition 2: equations (8) and (9), Let, then (3) To accelerate the circuit design of RQCG, the binary data shown in (1) can generally be divided into two parts: (4) Significantly, the value of k is equal to n/2 and the data formation of Y 0 and Y 1 are a decimal system. If the modulus m= (2 power k) -1, then the residue code of X modulo m is given by, (8) (5) Where (6) Since the value of Y 0 and Y 1 is generally greater than that of modulus m, the equations in (5) and (6) must be simplified further to replace the complex module operation with a simple addition operation by using the parameters Z 0, Z 1, α and β. Based on (5) and (6), the corresponding circuit design of the RQCG is easily realized by using the simple adders (ADDs). Namely, the RQ code can be generated with a low complexity and little hardware cost. D. Test Code Generation (TCG) TCG is the main block of the proposed EDCA design. Test Code Generation (TCG) design is based on the ability of the RQCG circuit to generate corresponding test codes in order to detect errors and recovers data. The specific PEi estimates the absolute difference between the Cur_pixel of the search E. Error Detection Process Error Detection Circuit (EDC) is used to perform the operation of error detection in the specific Pei as shown in Fig.2. This block is used to compare the output from the TCG i.e., (R T and Q T ) with output from RQCG1 i.e., (R PEi and Q PEi ), in order to detect the occurrence of an error. If the value of R PEi R T and Q PEi Q T, then the error in the PEi can be detected. The EDC output is then generated as a 0/1 signal to indicate that the tested PEi is error free/ with error. F. DRC Block The original data is recovered in the Data Recovery Circuit (DRC) during error correction process, by separating (9)
the RQ code from the TCG. The data recovery is possible by implementing the mathematical model as, (10) The proposed EDCA design executes the error detection and data recovery operations simultaneously. Additionally, error-free data from the tested PE i or the data recovery that results from DRC is selected by a multiplexer (MUX) to pass to the next specific PE i +1for subsequent testing. ZARA NILOUFER, SK.AHMEDUDDIN ZAKIR codes for each pixel value in the macroblock. Though this architecture can detect multiple bit errors, it took 16 clock pluses to get the output. To overcome this disadvantage, the proposed architecture is used which has 2 PEs and TCGs for computing the test codes for the pixel values. Verification of the proposed design is performed by using VHDL and then synthesized by using Mentor Graphics with the target device as 2V1500bg575 from the family Virtex II. The performance of the proposed EDCA design is estimated in terms of its speed. In this architecture, the pixel values in the 4x4 macroblock are taken out for the code generation based on the clocking signal. At each triggering edge of the clock the 8-bit pixel value from the ref_pixel and cur_pixel is taken and given to the TCG and PE blocks for code generation. The generated code from test code generation block and the RQCG output is compared in the EDC block, output of which indicates the error is avoided. TABLE I: Comparison Of Speeds Fig.4. Example of pixel values. G. Sample Calculation A sample of the 16 pixels for a 4x4 macro block is taken. Fig.4 presents an example of pixel values of the Cur_pixel and Ref_pixel. Based on (7), the SAD value of the 4x4 macro block is (11) Table 1 shows that the speed and high throughput are achieved by using 2 PEs and 2 TCGS and giving 8 pixels to each PE and TCG.The overall strategy is based on RQ code generation which can detect multiple bit errors within and also correct it. V. CONCLUSION The proposed architecture which increases the speedup as well as the throughput. The memory can keep the data path fully utilized in video processing function implementations which ensures high-speed operation and full utilization of the processing resources. The modulo is assumed here to be m=2 6-1=63. Thus, based on (8) and (9), the RQ code of the SAD value shown in (11) are R PEi = R T = 2124 63 =45 and Q PEi = Q T = 2124/63 =33. Since the value of R PEi (Q PEi ) is equal to R T (Q T ), EDC is enabled and a signal 0 is generated to describe a situation in which the specific is error-free. Conversely, if SA1 and SA0 errors occur in bits 1 and 12 of a specific PEi, i.e., the pixel values of PEi, 2124= (100001001100)2is turned into 77= (000001001101)2, resulting in the transformation of the RQ code of R PEi and Q PEi into 77 =14and 77/63 =1. Thus, an error signal 1 is generated from EDC and sent to the MUX in order to select the recovery results from DRC. IV. RESULTS AND DISCUSSIONS Residue-and-quotient code is used in the existing error detection architecture that has single Processing Element and Test Code Generation (TCG) block for computing the test VI. ACKNOWLEDGMENT I would like to thank my HOD Mr. Mohammed Abdul Mubeen and to my beloved guide S.K.Ahmeduddin Zakir for their continuous support and guidance. VII. REFERENCES [1] Y.W.Huang, B.Y.Hsieh,S.Y. Chien, S.Y. Ma, and L.G. Chen, Analysis and complexity reduction of multiple reference frames motion estimation in H.264/AVC, IEEE Trans. Circuits Syst.Video Technol., Vol.16,no.4,pp.507-522,Apr.2006. [2] S.Surin and Y.H. Hu, Frame-level pipeline motion estimation array processor, IEEE Trans. Circuits Syst. Video Technol., vol. 11, no.2, pp.248-251, Feb.2001. [3] M.Y.Dong, S.H.Yang, and S.K. Lu, Design-fortestability techniques for motion estimation computing arrays, in Proc. Int. Conf. Commun., Circuits Syst., May 2008, pp.1188-1191. [4] J.F. Lin, J.C. Yeh, R.F. Hung, and C.W. Wu, A built-in self repair design for RAMs with 2-D redundancy, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.13, no. 6, pp.742-745, Jun. 2005. [5] C.L. Hsu, C.H. Cheng, and Y. Liu, Built-in self detection/correction architecture for motion estimation
FPGA Implementation of Error Detection and Correction Architecture for Motion Estimation in Video Coding Systems computing arrays, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 18, no.2, pp. 319-324, Feb.2010. [6] S.Bayat-Sarmadi and M.A. Hasan, On concurrent detection of errors in polynomial basis multiplication, IEEE Trans. Very Large Scale Integr. (VLSI) Systs., vol. 15, no.4, pp. 413-426, Apr.2007. [7] D. K. Park, H.M.Cho, S.B. Cho, and J.H. Lee, A fast motion estimation algorithm for SAD optimization in subpixel, Proc. Int. Symp. Integr. Circuits, Sep. 2007, pp. 528-531. [8] C.W. Chiou, C.C. Chang, C.Y. Lee, T.W. Hou, and J.M. Lin, Concurrent error detection and correction in Gaussian normal basis multiplier over GF (2^m), IEEE Trans. Comput., vol.58, no.6, pp.851-857, Jun.2009. [9] Chang- Hsin Cheng, Yu Liu, and Chun-Lung Hsu, Member, IEEE Design of an Error Detection and Data recovery Architecture for Motion Estimation Testing Applications, IEEE transactions on Very Large scale integration (VLSI) systems, vol.20, no.4, April.2012. [10] L.Breveglieri, P. Maistri, and I. Koren, Anote on error detection in an RSA architecture by means of residue codes, in Proc. IEEE Int. Symp. On-Line Testing, Jul. 2006, 176 177.