Analysis of Min Sum Iterative Decoder using Buffer Insertion


Saravanan Swapna
M.E II year, Dept of ECE, SSN College of Engineering

M. Anbuselvi
Assistant Professor, Dept of ECE, SSN College of Engineering

S. Salivahanan
Principal, SSN College of Engineering

ABSTRACT
This paper presents the analysis of an iterative decoder in terms of clock frequency/speed. Iterative decoding is a powerful technique for error correction in communication systems. Low Density Parity Check (LDPC) codes, owing to their near-Shannon-limit performance under iterative decoding, have received significant attention in real-life communication applications. In the literature, various iterative decoding algorithms have been proposed that trade off computational complexity against decoding performance. The Min-Sum (MS) algorithm, which has reduced computational complexity, is considered here. The architecture of the MS decoder is designed at the transistor level, targeted to 45 nm technology. The designed architecture is optimized using wave pipelining, specifically buffer insertion. Timing optimization is done with the proper placement of buffers at various paths of the architecture. Wave pipelining is a method of high-performance circuit design which implements pipelining in logic without the use of intermediate latches or registers. The maximum and minimum delay paths are analyzed in the architecture. Performance metrics such as clock frequency, power and delay are analyzed. The optimized architecture operates at a better speed with a marginal increase in power.

Keywords
VLSI, Buffer insertion, Wave pipelining, clock frequency, LDPC codes, Min-Sum algorithm.

1. INTRODUCTION
Low-Density Parity Check (LDPC) codes were first proposed by Gallager in 1962 [1] and [2]. They attracted great interest because of their high performance, high degree of parallelism and relatively low complexity. LDPC codes find applications in wideband wireless multimedia communications and magnetic storage systems. LDPC decoders are a class of iterative decoders with inherent parallelism in the decoding process, which can lead to a high decoding throughput.
In high-speed applications, parallel implementations of iterative message-passing algorithms for the decoding of LDPC codes are preferred. To reduce the complexity of the algorithm, which translates to reducing the area and power consumption as well as increasing the throughput, researchers have used the MS algorithm. The iterative decoder performs successive decoding of both rows and columns. Among the decoding algorithms in use, the well-known Belief Propagation (BP) or Sum-Product (SP) algorithm achieves good decoding performance. For the standard BP algorithm in the Log-Likelihood Ratio (LLR) domain, many logarithmic and multiplicative computations are required for the check node computation. The Min-Sum (MS) algorithm replaces the product term by a minimum. It can thereby significantly reduce the hardware complexity of the BP algorithm at the cost of some performance degradation, since the complex computations at the check nodes can be implemented with simple comparison and summation operations. The advantages of the MS algorithm are that it does not require channel information such as the noise variance for the Additive White Gaussian Noise (AWGN) channel [3], and that its decoding performance is less sensitive to finite word-length implementation than that of the BP algorithm [4].

Higher operating frequencies may be obtained in digital systems by the process of buffer insertion, which permits clock frequencies higher than that dictated by the largest propagation delay between input and output. Even though this technique improves the throughput of a logic circuit, it has a number of disadvantages, such as increased latency, increased area and greater clock distribution complexity. Wave pipelining is one of the alternatives to pipelining. It provides a method for significantly reducing clock loads and the associated area and latency while retaining the external functionality and timing of a digital circuit. Buffer insertion (also called repeater insertion) is a common and effective technique that spends active device area in exchange for a reduction in interconnect delay.
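The complexity argument made above for the check node (BP needs transcendental functions and products, MS needs only sign handling and comparisons) can be illustrated with a small numerical sketch. The function names and the example LLR values are assumptions chosen for illustration; the BP update uses the standard tanh rule and is not taken from the paper's hardware.

```python
import math

def check_node_bp(llrs):
    """Sum-product (BP) check node update in the LLR domain (tanh rule).

    Each outgoing message excludes the incoming message on its own edge.
    Very large inputs would saturate tanh to +/-1 and break atanh; the
    small example values used below avoid that case.
    """
    out = []
    for i in range(len(llrs)):
        prod = 1.0
        for k, v in enumerate(llrs):
            if k != i:
                prod *= math.tanh(v / 2.0)
        out.append(2.0 * math.atanh(prod))
    return out

def check_node_ms(llrs):
    """Min-sum check node update: product of signs times minimum magnitude."""
    out = []
    for i in range(len(llrs)):
        others = [v for k, v in enumerate(llrs) if k != i]
        sign = 1.0
        for v in others:
            if v < 0:
                sign = -sign
        out.append(sign * min(abs(v) for v in others))
    return out

# Hypothetical incoming LLRs on a degree-4 check node.
incoming = [1.5, -0.8, 2.3, -3.1]
bp = check_node_bp(incoming)
ms = check_node_ms(incoming)
for b, m in zip(bp, ms):
    print(f"BP = {b:+.4f}   MS = {m:+.4f}")
```

In this sketch the two updates always agree in sign, and the MS magnitude is never smaller than the BP magnitude, which is the source of the small performance loss the text mentions.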
The Elmore delay of a long wire grows quadratically with the wire length, so buffer insertion can reduce interconnect delay significantly. The conference version of this paper appears in [5].

The paper is organized as follows. In Section 2, LDPC codes and the decoding algorithm are described. In Section 3, the min-sum decoding algorithm is discussed. In Section 4, the wave pipelining technique is defined. In Section 5, the buffer insertion technique is elaborated. In Section 6, the architectures are analyzed and the results are presented. In Section 7, the conclusions are summarized.

2. LDPC CODES AND DECODING ALGORITHM
2.1.1 LDPC codes
LDPC codes are a class of linear block codes defined by a sparse Parity Check Matrix (PCM) H that has a low density of 1's. The code is the null space of this matrix, so any valid codeword c satisfies the equation $cH^{T} = 0$. The PCM can also be represented graphically using a Tanner graph. These graphs belong to the general class of bipartite graphs, which consist of two classes of nodes, the variable nodes and the check nodes. The variable nodes represent the code bits and correspond to the columns of the PCM, and the check nodes represent the parity check equations, which correspond to the rows of the PCM. The Tanner graph has an edge between variable node i and check node j if the corresponding bit $h_{ij}$ in the PCM is 1, as shown in the example of Fig. 1.

Fig. 1 Example of parity check matrix and its corresponding Tanner graph.

Gallager introduced the idea of iterative, message-passing decoding of LDPC codes. The idea is to iteratively share the results of local node decoding by passing them along the edges of the Tanner graph. The variable nodes and the check nodes, in parallel, iteratively pass messages along their adjacent edges. The values of the code bits are updated accordingly. Based on the domain of analysis, the decoding algorithms are classified as probability-domain sum-product algorithm (SPA), log-domain SPA and LLR-domain SPA [6]. The log-domain SPA has lower complexity and is more numerically stable than the probability-domain SPA. MS is a modified log-domain SPA in which the product term is replaced by a minimum. The major advantage of MS is that knowledge of the noise power is not needed for the decoding process.

3. MIN-SUM ALGORITHM
The MS decoding algorithm [7] is an approximation of the iterative Sum-Product (SP) algorithm. Although the performance of MS is generally a few tenths of a dB lower than that of SP decoding, it is more robust to quantization errors when implemented with fixed-point operations [8] and [9]. In MS the hardware for the check node function is simple compared to the SP algorithm.

In MS decoding, as in the SP algorithm, the extrinsic messages are passed between check and variable nodes in the form of log-likelihood ratios (LLRs). The LLR domain is more advantageous than probability-domain decoding because message multiplications are no longer needed. The normalization process used in the probability domain requires additional computations. With the use of LLRs, these additional computations are eliminated.

3.1.1 ALGORITHM
In the LLR domain, we use the notation $L(q_{ij})$ for the message passed from variable node i to check node j, and $L(r_{ji})$ for the message from check node j to variable node i. The MS algorithm is described by the following steps in each iteration:

Step 1: The initial messages at the variable nodes are set to:
$$L(q_{ij}) = L(c_i) = y_i \qquad (1)$$

Step 2: Check node update:
$$L(r_{ji}) = \Big( \prod_{i' \in V_{j \setminus i}} \alpha_{i'j} \Big) \cdot \min_{i' \in V_{j \setminus i}} \beta_{i'j} \qquad (2)$$
$$\alpha_{ij} = \mathrm{sign}\big(L(q_{ij})\big) \qquad (3)$$
$$\beta_{ij} = \big|L(q_{ij})\big| \qquad (4)$$
where $V_{j \setminus i}$ is the set of variable nodes connected to check node j, excluding variable node i.

Step 3: Variable node update:
$$L(q_{ij}) = L(c_i) + \sum_{j' \in C_{i \setminus j}} L(r_{j'i}) \qquad (5)$$

Step 4: Decision at the variable nodes:
$$L(Q_i) = L(c_i) + \sum_{j \in C_i} L(r_{ji}) \qquad (6)$$
where $C_i$ is the set of check nodes connected to variable node i, $C_{i \setminus j}$ is the same set excluding check node j, and $\hat{c}_i$ is the estimate of code bit i, obtained from the sign of $L(Q_i)$. The algorithm stops if $\hat{c} H^{T} = 0$ with $\hat{c} = (\hat{c}_1, \ldots, \hat{c}_n)$, or if the maximum number of iterations is reached.

Step 5: If the conditions above are not satisfied, return to Step 2.

4. WAVE PIPELINING
Wave pipelining is a process that can increase the clock frequency of digital systems [10]. It is also known as maximum-rate pipelining. Unlike ordinary pipelining, wave pipelining does not require internal clock elements to increase throughput. The rate at which logic can propagate through the circuit depends not on the longest path delay but on the difference between the longest and shortest path delays. In a pipelined system, a logic network is partitioned into pipeline stages, each of which operates upon data computed in the previous cycle by the previous pipeline stage. When a logic network is pipelined, synchronizing elements, either latches or registers, are inserted to partition the network into stages. Pipelining a circuit into N stages can increase throughput by up to a factor of N. The inserted synchronizing elements increase the area and power consumption of the logic. They also add latency and cycle time overhead. Wave pipelining is an alternative synchronous circuit clocking technique that allows overlapped execution of multiple operations without using synchronizing elements within the logic. Rather, knowledge of the signal propagation delay characteristics of the logic network is used at design time to manage the signal delays so as to ensure that operations do not interfere with their predecessor or successor computations. Fig. 2 shows the wave pipelined circuit.

$$T_{CK} \ge (D_{MAX} - D_{MIN}) + T_S + T_H + 2\Delta_{CK} \qquad (7)$$
where $D_{MAX} - D_{MIN}$ is the difference between the maximum delay $D_{MAX}$ (critical path) and the minimum delay $D_{MIN}$ (non-critical path), and $\Delta_{CK}$ is the clock skew.

Fig. 2 Wave pipelined circuit

In the above equation, $T_S$ and $T_H$ are the setup and hold times, which are the same for the circuits. Only the difference in delay between the critical and the non-critical paths can be changed, so this is the quantity that is modified here. This technique provides a method for significantly reducing clock loads and the associated area, power and latency while retaining the external functionality and timing of a synchronous circuit [11]. It is of particular interest today because it involves design and analysis across a variety of levels (process, layout, circuit, logic, timing, and architecture) which characterize VLSI design. Wave pipelining can improve the throughput of a logic circuit while avoiding some of the overheads of traditional pipelining. The area and power overheads of a traditional pipeline are avoided in the wave pipeline since there are no internal synchronizers. In order to apply the wave pipelining technique, the architecture is designed and analyzed at transistor level to find the critical and non-critical paths. Buffer insertion in the non-critical path is used to realize the wave pipelined architecture.

5. BUFFER INSERTION
There are a number of delay-reducing methods. Some of them include Wire Length Minimization, Device Sizing, Buffer Insertion, Wire Size Optimization, and Simultaneous Device and Interconnect Optimization. Buffer insertion is the method used here for the reduction of the delay [12]. The maximum and minimum delay paths are analyzed in the designed architecture. The delay along the maximum and minimum delay paths is varied by buffer insertion. There is a trade-off between power consumption and the delay incurred in the architecture: the speed of the designed circuit is improved at the expense of power consumption.

6. ARCHITECTURE OF MS DECODER
In this paper, timing analysis is done for each path, and $D_{MAX}$ and $D_{MIN}$ are calculated. Buffers are inserted proportionally into the identified non-critical paths. The delay difference $D_{MAX} - D_{MIN}$ and the clock frequency are then evaluated. To implement the variable nodes with degree 3, we use the same basic modules of the architecture designed in [13] and [14]. In our design, we calculate the maximum number of bits needed inside the adder module by assuming the maximum values for the inputs. Considering 6-bit quantization (1 sign bit and 5 magnitude bits), we have 4 inputs with a maximum absolute value of 31. So the absolute value of the maximum total sum would be 124, which can be represented by an 8-bit signed number. Messages are thus converted from 6-bit sign-magnitude to 8-bit 2's complement and passed to the full adder. The main advantage of the 2's complement conversion is that it reduces the number of bits involved in the computation, which reduces the decoding complexity.

Fig. 3 The architecture of variable node of degree 3 for MS

Also, messages are clipped to $\pm(2^{q-1} - 1)$, where q is the message width in bits, when they are converted back from the 8-bit 2's complement domain to the 6-bit sign-magnitude domain before being passed to the check nodes. The architecture is analyzed at transistor level using T-Spice, and the 45 nm process technology is used. The check node architecture consists of two components, one for the sign bit and the other for the magnitude bits.

Fig. 5 Architecture of magnitude update circuit for check nodes of degree 6
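The variable-node arithmetic described above (sign-magnitude to 2's complement, a four-input sum, then clipping back to 6 bits) can be sketched in software as follows. This is only a behavioural model under stated assumptions: the helper names are hypothetical, the sign bit is assumed to be the most significant bit of each 6-bit word, and forming each outgoing message as "total minus own input" is one common way to realize equation (5), not necessarily the exact structure of the circuit in Fig. 3.

```python
Q = 6                         # message width: 1 sign bit + 5 magnitude bits
MAX_MAG = 2 ** (Q - 1) - 1    # largest magnitude, 31

def sm_to_int(word):
    """Decode a Q-bit sign-magnitude word (MSB = sign, assumed) to an int."""
    sign = (word >> (Q - 1)) & 1
    mag = word & MAX_MAG
    return -mag if sign else mag

def int_to_sm(value):
    """Clip to +/-MAX_MAG and encode as a Q-bit sign-magnitude word."""
    mag = min(abs(value), MAX_MAG)
    sign = 1 if value < 0 else 0
    return (sign << (Q - 1)) | mag

def twos_complement_8bit(value):
    """Bit pattern of value as an 8-bit 2's complement number."""
    assert -128 <= value <= 127
    return value & 0xFF

def variable_node_degree3(channel, msgs):
    """Degree-3 variable node: one channel word plus three check messages.

    All inputs and outputs are Q-bit sign-magnitude words. The total of
    four inputs is bounded by 4 * 31 = 124, so it fits in 8 signed bits.
    """
    vals = [sm_to_int(m) for m in msgs]
    total = sm_to_int(channel) + sum(vals)
    twos_complement_8bit(total)          # range check only
    # Each outgoing message leaves out the message received on its own edge.
    return [int_to_sm(total - v) for v in vals]

# Example: channel = +3, incoming messages = +7, -12, +20.
out = variable_node_degree3(3, [7, (1 << 5) | 12, 20])
print([sm_to_int(w) for w in out])       # decoded outgoing messages

# Worst case: every input at +31, so the outputs saturate at 31.
saturated = variable_node_degree3(31, [31, 31, 31])
```

The worst-case call shows why clipping is needed: the internal sum reaches 124, while each outgoing message must fit back into 5 magnitude bits.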

Fig. 4 The schematic of the magnitude update circuit

The messages from the variable nodes have 1 bit for the sign and 5 bits which represent the magnitude. The sign bits of the incoming messages to a check node are XOR-ed together, and then the sign of the outgoing message on each edge is obtained as the XOR of the sign of the incoming variable message on that edge and the XOR of the signs of all the incoming messages. To calculate the magnitude of the messages in check nodes, minimum functions are used. This architecture is shown in Fig. 5. The sign update circuit is shown in Fig. 6.

Fig. 6 The sign update circuit of check node of degree 6

Along with the improvement in the CNU (check node update circuit), the buffer insertion technique is also applied to the VNU (variable node update circuit). The effect of buffer insertion is more pronounced in the CNU than in the VNU. The schematic of the magnitude update circuit in Fig. 4 shows the various maximum and minimum delay paths and the way the buffers are inserted to reduce the difference in delay, $D_{MAX} - D_{MIN}$. A similar analysis is done in the variable node update circuit.

The results of the wave pipelining analysis before and after buffer insertion are given in Tables 1 and 2. Results for buffer insertion in the maximum delay path are given in Table 3. Performance metrics such as clock frequency, power and delay are analyzed. The optimized architecture operates at a better speed with a marginal increase in power. Tables 1 and 2 summarize the results of the MS and the wave pipelined MS architectures of the check node and variable node of degree 6 and 5-bit quantization.

Table 1 CNU analysis before and after wave pipelining

                        Before wave pipelining   After wave pipelining
D_MIN                   13.643                   81.327
D_MAX                   201.53                   201.53
D_MAX - D_MIN           201.393                  200.7167
Clk-Frequency (MHz)     4.965                    4.9821
Power (mW)              0.525                    0.968
No. of gates            708                      968

Table 1 shows that the speed of the circuit is increased by 17100 Hz after wave pipelining, while the power consumption rises from 0.525 mW to 0.968 mW.
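The clock frequencies in Table 1 can be checked against the wave pipelining timing constraint of Section 4. The sketch below assumes that the delay rows are in nanoseconds and that the setup time, hold time and clock skew are neglected; these assumptions are not stated in the paper, but they are the ones under which the tabulated MHz values are reproduced. The function name is hypothetical.

```python
def wave_clock_mhz(delay_spread_ns, t_setup_ns=0.0, t_hold_ns=0.0, skew_ns=0.0):
    """Maximum clock frequency (MHz) allowed by
    T_CK >= (D_MAX - D_MIN) + T_S + T_H + 2 * skew, with all times in ns.
    """
    min_period_ns = delay_spread_ns + t_setup_ns + t_hold_ns + 2.0 * skew_ns
    return 1000.0 / min_period_ns   # 1 / ns = GHz, so 1000 / ns = MHz

# D_MAX - D_MIN row of Table 1 (CNU), assumed to be in nanoseconds.
before_mhz = wave_clock_mhz(201.393)
after_mhz = wave_clock_mhz(200.7167)
gain_hz = (after_mhz - before_mhz) * 1e6

print(f"before: {before_mhz:.4f} MHz")
print(f"after:  {after_mhz:.4f} MHz")
print(f"gain:   {gain_hz:.0f} Hz")
```

Under these assumptions the computed frequencies match the Clk-Frequency row to the printed precision, and the gain comes out at roughly 17 kHz, in line with the figure quoted for Table 1. Non-zero setup, hold or skew terms would lower both frequencies.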

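Table 2 VNU analysis before and after wave pipelining

                        Before wave pipelining   After wave pipelining
D_MIN                   61.491                   61.857
D_MAX                   141.54                   141.55
D_MAX - D_MIN           80.049                   79.693
Clk-Frequency (MHz)     12.4923                  12.5481
Power (mW)              0.2669                   0.4502
No. of gates            128                      278

Table 2 shows that the speed of the circuit is increased by about 55,800 Hz (from 12.4923 MHz to 12.5481 MHz) after wave pipelining, while the power consumption rises from 0.2669 mW to 0.4502 mW.

Table 3 CNU analysis before and after buffer insertion in the critical path

                        Before buffer insertion  After buffer insertion
D_MIN                   61.325                   61.325
D_MAX                   161.39                   81.39
D_MAX - D_MIN           100.065                  20.067
Clk-Frequency (MHz)     9.9935                   49.23
Power (mW)              0.525                    0.5451
No. of gates            708                      960

Table 3 shows that the speed of the circuit is increased from 9.9935 MHz to 49.23 MHz after buffer insertion, with a slight increase in power consumption from 0.525 mW to 0.5451 mW. It can be seen from the above analysis that buffer insertion in the critical path gives a greater improvement in the speed of the circuit, with lower power and fewer gates, than buffer insertion in the non-critical path.

7. CONCLUSION
The min-sum decoder architecture is designed at the transistor level, targeted to 45 nm technology. The power and delay parameters are analyzed, and the effect of buffer insertion at the critical and non-critical paths of the designed MS iterative decoder is studied. It is evident that the proposed architecture with buffer insertion at the critical path improves the clock frequency/speed of operation with a marginal increase in power. An efficient hardware architecture is thereby realized with the same decoding performance. The other classes of wave pipelining techniques, namely node collapsing and logic restructuring, are not examined in this paper.

REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[2] Keshab K. Parhi, VLSI Digital Signal Processing Systems, Chapter 16, pp. 591-642.
[3] Todd K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, Chapter 15, pp. 634-674.
[4] William E. Ryan, An Introduction to LDPC Codes, 2003.
[5] Saravanan Swapna, M. Anbuselvi and S. Salivahanan, Design and analysis of iterative decoder using wave pipelining, Conference proceedings, ICCCE 2012.
[6] Papaharalabos et al., Modified sum-product algorithms for decoding low-density parity-check codes, IET Communications, vol. 1, no. 3, 2007.
[7] J. Zhao, F. Zarkeshvari and A. H. Banihashemi, On implementation of min-sum algorithm and its modifications for decoding LDPC codes, IEEE Trans. Comm., vol. 53, no. 4, pp. 549-554, April 2005.
[8] Sina Tolouei and Amir H. Banihashemi, FPGA Implementation of Variants of Min-Sum Algorithm, Dept. of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada, 2008.
[9] Daesun Oh and Keshab K. Parhi, Min-Sum Decoder Architectures With Reduced Word Length for LDPC Codes, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 1, January 2010.
[10] V. Vireen, G. Seetharaman, and B. Venkataramani, Synthesis Techniques for Implementation of Wave-Pipelined Circuits in ASICs, International Conference on Electronic Design, 2008.
[11] Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, Wave-Pipelining: A Tutorial and Research Survey, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, September 1998.
[12] Jason Cong, Zhigang Pan, Lei He, Cheng-Kok Koh and Kei-Yong Khoo, Interconnect Design for Deep Submicron ICs, Computer Science Department, University of California, Los Angeles, CA 90095.
[13] A. J. Blanksby and C. J. Howland, A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder, IEEE J. Solid-State Circuits, vol. 37, pp. 404-412, March 2002.
[14] Kai He, Jin Sha, Li Li and Zhongfeng Wang, Low Power Decoder Design for QC-LDPC Codes, IEEE 2010.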
[6] Papaharalabos et al, Modfed sum-product algorthms for decodng low-densty party-check codes, Communcatons, vol.1, no.3, 2007. [7] J. Zhao, F. Zarkeshvar and A. H. Banhashem, On mplementaton of -sum algorthm and ts modfcatons for decodng LPC codes, IEEE rans. Comm., vol. 53, no. 4, pp. 549-554, Aprl 2005. [8] Sna oloue and Amr H. Banhashem, Fpga Implementaton Of Varants Of Mn-Sum Algorthm, ept. of sys.and compt. Engg,caleton unversty,ottawa,on,canada,2008. [9] aesun Oh and Keshab K. Parh, Mn-Sum ecoder Archtectures Wth Reduced Word Length for LPC Codes,.IEEE ransactons On Crcuts And Systems I: Regular Papers, vol. 57 IE,, no. 1, January 2010. [10] V. Vreen, G. Seetharaman, and B. Venkataraman, Synthess echnques for Implementaton of Wave-Ppelned Crcuts n ASICs, Internatonal Conference on Electronc esgn, 2008. [11] SurveyWayne P. Burleson, Macej Ceselsk, Faban Klass, and Wenta Lu, Wave-Ppelnng: A utoral and Research IEEE ransactons On Very Large Scale Integraton (VLSI) Systems, vol. 6, no. 3, September 1998. [12] Interconnect esgn for eep Submcron ICs Jason Cong, Zhgang Pan, Le He, Cheng-Kok Koh and Ke- Yong khoo Computer Scence epartment Unversty of Calforna, Los Angeles, CA 90095 [13] A.J. Blanksby and C. J. Howland, A 690-mW 1-Gb/s 1024-b, rate-1/2 low-densty party-check code decoder, IEEE J. Sold-State Crcuts, vol. 37, pp. 404-412, March 2002. [14] Ka He, Jn Sha and L LZhongfeng Wang, Low Power ecoder esgn for QC-LPC Codes, IEEE 2010. 17