PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS

American Journal of Applied Sciences 11 (4): 558-563, 2014. ISSN: 1546-9239. 2014 Science Publication. doi:10.3844/ajassp.2014.558.563. Published Online 11 (4) 2014 (http://www.thescipub.com/ajas.toc)

Revathy, M. (1) and R. Saravanan (2)
(1) Department of Electronics and Communication Engineering, (2) Department of Computer Science and Engineering, PSNA College of Engineering and Technology, Dindigul, India

Received 2013-11-13; Revised 2013-12-20; Accepted 2014-01-30

Corresponding Author: Revathy, M., Department of Electronics and Communication Engineering, PSNA College of Engineering and Technology, Dindigul, India

ABSTRACT

In this study, we propose a low-power, high-efficiency Low Density Parity-Check (LDPC) code decoder architecture for error detection and correction applications. LDPC codes have been adopted in recent wireless standards, such as those for satellite and mobile communications, because of their superior error-detecting and error-correcting capabilities. As technology scales, memory devices become larger and more powerful, and error correction codes with low power consumption are needed. This study discusses the design and analysis of the check node unit and the variable node unit in an LDPC decoder. The architecture is synthesized with Xilinx 9.2i and simulated using Modelsim, targeting a 90 nm device. The synthesis report shows that the proposed architecture reduces hardware utilization and power consumption compared with the conventional architecture.

Keywords: LDPC Decoder, Node Architecture, Variable Node, Check Node

1. INTRODUCTION

Low Density Parity-Check (LDPC) codes are known to offer excellent performance for high speed data transmission at low complexity. However, moderate-length and short-length binary LDPC codes have been shown to have an early error floor and degraded decoding performance. These codes have been implemented in various standards such as WiMax (IEEE 802.16) and in other high speed applications, where parallel implementations of iterative message-passing algorithms are ideally used for LDPC decoding. Reducing the complexity of the algorithm reduces the chip size and power consumption while increasing the throughput. Hence, the Min-Sum (MS) algorithm was used to address this issue. LDPC codes are suitable for iterative decoding, i.e., an iterative decoder can perform consecutive decoding of both rows and columns. LDPC decoding exploits parallelism, thereby achieving high decoding throughput.

Several decoding algorithms are conventionally used. Of these, the Belief Propagation (BP) algorithm attains excellent decoding performance. The standard BP algorithm, however, needs numerous multiplicative and logarithmic computations to evaluate each check node. The Min-Sum (MS) algorithm replaces the product term with the minimum value. Although decoding performance is reduced, the hardware complexity of the BP algorithm is significantly lowered, because the complex check node computations are replaced with simple summation and comparison operations. The Min-Sum algorithm is also less sensitive to finite word-length implementations and does not require channel information. Because of these advantages, the MS algorithm is widely used.

This study presents a novel design for an LDPC decoder that operates at low power and occupies a smaller silicon area. The remaining sections are organized as follows. In Section 2, the literature review is presented. In Section 3, the design of the variable and check node architectures is discussed. In Section 4, the simulation results are given and analyzed. Section 5 summarizes the method and gives the conclusions.

2. RELATED WORKS

Boutillon et al. (2013) proposed a serial GF(64)-LDPC decoder based on the Extended Min-Sum algorithm. Their decoder performs within 0.7 dB of the Belief Propagation algorithm for different code rates and lengths. Their design adapts easily to decoding over very high Galois Field orders, such as GF(4096) or higher. The experimental results showed that the decoder occupies less than 20% of a Virtex 4 FPGA for a decoding throughput of 2.95 Mbps.

The method proposed by Reviriego et al. (2013) relies on serial majority logic decoding with simple hardware. The method detects whether a word has errors in the first iterations of majority logic decoding and, when the word is error-free, ends the decoding without completing the remaining iterations. Since most words in a memory are error-free, the average decoding time is greatly reduced. The method has been applied to Euclidean Geometry Low Density Parity Check (EG-LDPC) codes, which are one-step majority logic decodable. The experimental results show that the method accurately estimates the probability of error detection for different code sizes and numbers of errors and is effective for EG-LDPC codes.

Sangmin and Sobelman (2013) analyzed finite precision effects in non-binary mixed-domain LDPC decoders. They showed how improved decoding performance can be achieved by using an offset-based method and proper scaling techniques. In addition, they proposed a novel Fast Fourier Transform (FFT)-based Belief Propagation (BP) decoder architecture that balances the computational load between processing units. The results showed that the method requires 47% fewer field programmable gate array slices than a standard FFT-based BP architecture.
Seong-In and Lee (2013) investigated various methods for constructing Block-Circulant (BC) Reed-Solomon-Based Low-Density Parity-Check (RS-LDPC) codes. The proposed technique derives a BC form of the parity-check matrix from a random parity-check matrix for RS-LDPC codes. A decoder architecture and switch network for the BC-RS-LDPC code were developed based on the new BC parity-check matrix. A high-performance BC-RS-LDPC decoder architecture was designed to demonstrate the efficiency of the technique. The results show that the designed decoder required 1.3 M gates at an operating frequency of 450 MHz to achieve a data throughput of 41 Gbps with 8 iterations.

Swapna and Anbuselvi (2012) designed an LDPC decoder using the wave pipelining concept, which reduces the latency of the decoding process. Their study provided the detailed internal architecture and its design complexity in terms of hardware utilization factors.

Yao et al. (2011) presented the design of a memory-efficient LDPC decoder that reduces the quantization word length of the decoding information, thereby minimizing the hardware complexity. Two quantization schemes were proposed to reduce the number of memory bits required by the decoder by using a short word length with a guaranteed Bit-Error-Rate (BER). Their results showed that the two quantization schemes simplify the hardware with very little loss of decoding performance.

3. PROPOSED DECODER ARCHITECTURE

The decoder designed using our method consists of two processing elements, the Check Node Unit (CNU) and the Variable Node Unit (VNU), as shown in Fig. 1. The CNUs and VNUs are connected through the routing network. The input and output edges of the check node unit are labeled according to the connectivity given by the parity-check matrix. The final outputs are taken from the VNU after the required number of iterations has been completed.

Fig. 1. LDPC decoder design overview
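The CNU/VNU iteration described above can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes a flooding-schedule Min-Sum decoder (the algorithm named in the Introduction), floating-point log-likelihood ratios (LLRs) with the convention that a positive LLR means bit 0, and a parity-check matrix `H` in which every row has weight of at least two. The function name `min_sum_decode` and the parameter `max_iters` are hypothetical.

```python
# Minimal flooding-schedule Min-Sum decoder sketch (illustrative only).
# Assumptions: positive LLR means bit 0; every row of H has weight >= 2.

def min_sum_decode(H, llr, max_iters=10):
    m, n = len(H), len(H[0])
    # Routing network: one edge per nonzero entry of the parity-check matrix.
    rows = [[j for j in range(n) if H[i][j]] for i in range(m)]
    # Variable-to-check messages start out as the channel LLRs.
    v2c = {(i, j): llr[j] for i in range(m) for j in rows[i]}
    hard = [0 if x >= 0 else 1 for x in llr]
    for _ in range(max_iters):
        # CNU: product of signs and minimum magnitude over the other edges.
        c2v = {}
        for i in range(m):
            for j in rows[i]:
                others = [v2c[(i, k)] for k in rows[i] if k != j]
                negatives = sum(1 for x in others if x < 0)
                sign = -1 if negatives % 2 else 1
                c2v[(i, j)] = sign * min(abs(x) for x in others)
        # VNU: channel LLR plus all incoming check messages.
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        hard = [0 if t >= 0 else 1 for t in total]
        # Stop early once every parity check is satisfied.
        if all(sum(hard[j] for j in rows[i]) % 2 == 0 for i in range(m)):
            break
        # Extrinsic update: exclude the message that arrived on the same edge.
        for (i, j), msg in c2v.items():
            v2c[(i, j)] = total[j] - msg
    return hard


# Example with a (7,4) Hamming parity-check matrix (chosen for illustration).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
# All-zero codeword with one unreliable, wrongly signed bit at position 0.
decoded = min_sum_decode(H, [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0])
```

The check node step uses only comparisons and sign logic, which is the property that makes Min-Sum attractive for low-power hardware.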

3.1. Check Node Architecture

The architecture of the proposed check node module of the LDPC decoder is shown in Fig. 2. It is used to determine the strength of the received signal against noise in the channel. Four directional difference vectors are calculated with twelve SUB, four ADD and four sorter and concatenator modules. The smallest difference is then selected by the sorter and concatenator units shown in Fig. 3. In the proposed design, only adder and subtractor units are required for the check node. Because the gate count (hardware complexity) of one multiplier or divider is much higher than that of one adder or subtractor, all multipliers and dividers are replaced by adder or subtractor units, thereby reducing the hardware cost. After the additions and subtractions have been performed on the received sequences, sorting and concatenation are performed on these sequences to compute the response of the check node unit. Each check node block produces a 6-bit sequence, and the combined output is therefore a 24-bit sequence (consistent with four 6-bit outputs).

3.2. Variable Node Architecture

The variable node unit architecture shown in Fig. 4 computes the hard decision vector x. This vector is routed through the routing network to the Check Node Block (CNB). The routing between two nodes carries a single-bit value, so the routing network is smaller than that used in the sum-product algorithm. The variable node processor unit comprises a flip-detection circuit, a line buffer, a multiplexed adder and a concatenator. Initially, the received signal y from the check node unit is fed to the Variable Node Block (VNB) through a register-feedback assembly.

Fig. 2. Check node architecture design
Fig. 3. Internal block of sorter and concatenator
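The sort-then-concatenate step of Section 3.1 can be modeled in a few lines. This is a sketch under stated assumptions rather than the authors' circuit: it assumes four unsigned 6-bit difference values, ascending sort order, and packing with the smallest value in the most significant bits. The paper does not specify the ordering or the packing direction, and the names `sort_and_concat` and `split_word` are hypothetical.

```python
# Illustrative model of a sorter-and-concatenator stage (not the authors' RTL).
WIDTH = 6  # bits per check node block output, as stated in the text


def sort_and_concat(values, width=WIDTH):
    """Sort the inputs (smallest first) and pack them into a single word.

    Returns (smallest_value, packed_word). Packing order is an assumption.
    """
    mask = (1 << width) - 1
    if any(v < 0 or v > mask for v in values):
        raise ValueError("each value must fit in %d unsigned bits" % width)
    ordered = sorted(values)
    word = 0
    for v in ordered:
        word = (word << width) | v
    return ordered[0], word


def split_word(word, count=4, width=WIDTH):
    """Undo the packing: recover `count` fields, most significant first."""
    mask = (1 << width) - 1
    return [(word >> (width * (count - 1 - i))) & mask for i in range(count)]


smallest, packed = sort_and_concat([17, 3, 42, 9])
fields = split_word(packed)
```

Four 6-bit fields occupy exactly 24 bits, which matches the output width quoted for the check node.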

Fig. 4. Variable node architecture
Fig. 5. Generation of control signal x_k

The received values are 6-bit Signed-Magnitude (SM) values. Let [s_n : m_n] be the n-th 4-bit value provided to the n-th VNB, where s_n denotes the hard decision value and m_n denotes its magnitude. The SM representation makes the correlation calculation simple to implement. The correlator circuit consists of an inverter followed by a 1-bit multiplexer. The proposed variable node architecture (VNA) unit consists of basic multiplexer, line buffer, summer and concatenator modules. The data blocks received from the CNU are divided into four sub-blocks. The first three sub-blocks are processed by the adder module and the last sub-block is processed directly by the multiplexer unit. The control signal acts as the select line of the multiplexer; its generation is illustrated in Fig. 5. The sign bit input is applied to the shift operator. The multiplexers and line buffers help produce the control signal x_k, which is fed to the VNU. The use of simple computations, together with low row and column weights, reduces the number of operations and thereby results in significantly lower power consumption. The subtraction unit receives six bits from the variable node architecture and the line buffer unit and produces a single sign bit, which satisfies Equations (1) and (2):

$$x_k = 1 \quad \text{if } \left(y_{k-1} - \text{output of line buffer}\right) \geq 0 \tag{1}$$

$$x_k = 0 \quad \text{if } \left(y_{k-1} - \text{output of line buffer}\right) < 0 \tag{2}$$

4. RESULTS

The proposed LDPC-coded CDMA architecture is tested on a Spartan-3E family device, the XC3S500E, using Modelsim 5.8 and Xilinx 9.2i. The proposed scheme uses 100 LUTs and 56 slices at a maximum frequency of 200 MHz. The results show that the system incorporating LDPC achieves lower power consumption, reflected in the number of slices, look-up tables and flip-flops used. The various devices in the Spartan-3 family are compared in terms of power consumption and quiescent supply current, tabulated in Tables 1 and 2, respectively. The power utilization of the Spartan family is illustrated graphically in Fig. 6. The parameters considered for investigation include the number of Slices (S), number of Look-Up Tables (LUTs), Slice Latches (SL), Gate Counts (GA), Power Consumption (PC) and Area Utilization (AU). These parameters are tabulated in Table 3 and plotted in Fig. 7. The performance in terms of decoding latency and LUTs has also been assessed and is tabulated in Table 4. The other performance evaluation parameters used in our analysis are listed below. The Area Utilization (AU) is defined in Equation (3):

$$AU = \frac{\text{Area of active modules}}{\text{Area of total modules}} \times 100\% \tag{3}$$

In this study, the area utilization is 1.2. The value of AU does not establish which design is superior to the others; it only provides an evaluative measure for reference.

5. DISCUSSION

The proposed LDPC design has been evaluated on various Spartan-3E devices to compare power utilization. The main objective of the proposed decoder is to operate at a reduced power level. The power utilization of the various Spartan devices is compared in Table 1.

Fig. 6. Graphical illustration of power utilization
Fig. 7. Graphical plot for hardware utilization

Table 1. Comparison of power consumption of the Spartan-3 family

FPGA family    Device      Power consumption
Spartan-3E     XC3S100E    33.59 mW
Spartan-3E     XC3S250E    52.29 mW
Spartan-3E     XC3S500E    81.37 mW

Table 2. Comparison of quiescent supply currents for the Spartan-3 family

Supply rail                   XC3S500E    XC3S250E    XC3S100E
Quiescent VCCINT (1.20 V)     26 mA       15 mA       8 mA
Quiescent VCCAUX (2.50 V)     18 mA       12 mA       8 mA
Quiescent VCCO25 (2.50 V)     2 mA        2 mA        2 mA

Table 3. Performance analysis of hardware utilization

Performance parameter    Proposed architecture
Slices                   56
LUTs                     100
Gate counts              2304
IOBs                     48

Table 4. Performance comparison

Methodology                    Decoding latency (ns)    LUTs
Proposed                       28.00                    100
Sangmin and Sobelman (2013)    56.36                    6422
Spagnol et al. (2009)          50.05                    12924
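To make the comparison in Table 4 easier to read, the figures can be expressed as ratios relative to the proposed design. The short sketch below only restates the table's numbers; the ratio metrics and the function name `ratios` are illustrative choices, not part of the original analysis, and the ratios do not show whether the three designs are directly comparable.

```python
# Figures copied from Table 4: (decoding latency in ns, number of LUTs).
table4 = {
    "Proposed": (28.00, 100),
    "Sangmin and Sobelman (2013)": (56.36, 6422),
    "Spagnol et al. (2009)": (50.05, 12924),
}


def ratios(table, baseline="Proposed"):
    """Return (latency ratio, LUT ratio) of each design versus the baseline."""
    base_latency, base_luts = table[baseline]
    return {
        name: (latency / base_latency, luts / base_luts)
        for name, (latency, luts) in table.items()
        if name != baseline
    }


for name, (latency_ratio, lut_ratio) in ratios(table4).items():
    print(f"{name}: {latency_ratio:.2f}x latency, {lut_ratio:.1f}x LUTs")
```

On these figures, the reference designs have roughly 1.8 to 2 times the decoding latency and between about 64 and 129 times the LUT count of the proposed decoder.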

A number of performance evaluation and resource utilization parameters are determined for the proposed LDPC decoder. The present research focuses on the design and development of a pipelined LDPC decoder for low power applications. The proposed LDPC decoding architecture is synthesized using the Xilinx Project Navigator version 9.2i tool. The proposed architecture provides a decoding latency of 28 ns for both the check node and variable node implementations. Sangmin and Sobelman (2013) achieved a decoding latency of about 56.36 ns, while Spagnol et al. (2009) achieved a decoding latency of about 50.05 ns. In this respect, the proposed LDPC architecture performs better. The proposed architecture also uses about 100 4-input LUTs, whereas Sangmin and Sobelman (2013) used 6422 LUTs and Spagnol et al. (2009) used 12924 LUTs with their respective design methodologies. The comparison is given in Table 4.

6. CONCLUSION

For LDPC codes used in low power wireless networks and wireless sensor devices, power consumption and hardware utilization are essential considerations. This study presents and analyzes a low power LDPC decoding architecture incorporating a check node and a variable node. The proposed LDPC architecture is designed and tested on various FPGA hardware platforms, and its hardware utilization and power consumption are discussed. The proposed LDPC decoder is well suited to low power mobile applications and consumes 81.37 mW of power on the XC3S500E device (Table 1). Even though it provides low power consumption, the hardware complexity and area utilization are high. This is the limitation of this research. To overcome this limitation, a hybrid node architecture that combines the features of the check node and the variable node may be used in future.

7. REFERENCES

Boutillon, E., L. Conde-Canencia and A. Al Ghouwayel, 2013. Design of a GF(64)-LDPC decoder based on the EMS algorithm. IEEE Trans. Circ. Syst., 60: 2644-2656. DOI: 10.1109/TCSI.2013.2279186

Reviriego, P., J.A. Maestro and M.F. Flanagan, 2013. Error detection in majority logic decoding of Euclidean Geometry Low Density Parity Check (EG-LDPC) codes. IEEE Trans. Very Large Scale Integrat. Syst., 21: 156-159. DOI: 10.1109/TVLSI.2011.2179681

Sangmin, K. and G.E. Sobelman, 2013. Scaling, offset and balancing techniques in FFT-based BP nonbinary LDPC decoders. IEEE Trans. Circ. Syst., 60: 277-281. DOI: 10.1109/TCSII.2013.2251959

Seong-In, H. and H. Lee, 2013. Block-circulant RS-LDPC code: Code construction and efficient decoder design. IEEE Trans. Very Large Scale Integrat. Syst., 21: 1337-1341. DOI: 10.1109/TVLSI.2012.2210452

Spagnol, C., E. Popovici and W. Marnane, 2009. Hardware implementation of GF(2^m) LDPC decoders. IEEE Trans. Circ. Syst., 56: 2609-2620. DOI: 10.1109/TCSI.2009.2016621

Swapna, S. and M. Anbuselvi, 2012. Design and analysis of iterative decoder using wave pipelining. Proceedings of the International Conference on Computing and Control Engineering, Apr. 12-13, pp: 1-4.

Yao, Y., W. Liang, F. Ye and J. Ren, 2011. Memory efficient LDPC decoder design. Proceedings of the Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), Oct. 6-7, IEEE Xplore Press, Macau, pp: 127-130. DOI: 10.1109/PrimeAsia.2011.6075087