Analysis of Min Sum Iterative Decoder using Buffer Insertion

Saravanan Swapna
M.E. II year, Dept. of ECE, SSN College of Engineering

M. Anbuselvi
Assistant Professor, Dept. of ECE, SSN College of Engineering

S. Salivahanan
Principal, SSN College of Engineering

ABSTRACT
This paper presents the analysis of an iterative decoder in terms of clock frequency/speed. Iterative decoding is a powerful technique for error correction in communication systems. Low Density Parity Check (LDPC) codes, due to their near Shannon limit performance under iterative decoding, have received significant attention in real-life communication applications. In the literature, various iterative decoding algorithms have been addressed with a trade-off between computational complexity and decoding performance. The Min-Sum (MS) algorithm, with its reduced computational complexity, is taken into consideration. The architecture of the MS decoder is designed at the transistor level, targeted to 45 nm technology. The designed architecture is optimized using wave pipelining, specifically buffer insertion. Timing optimization is done with the proper placement of buffers at the various paths of the architecture. Wave pipelining is a method of high performance circuit design which implements pipelining in logic without the use of intermediate latches or registers. The minimum and maximum delay paths are analyzed in the architecture. The performance metrics such as the clock frequency, power and delay are analyzed. The optimized architecture operates at a better speed with a marginal increase in power.

Keywords
VLSI, Buffer insertion, Wave pipelining, clock frequency, LDPC codes, Min-Sum algorithm.

1. INTRODUCTION
Low-Density Parity Check (LDPC) codes were first proposed by Gallager in 1962 [1] and [2]. They attracted great interest because of their high performance, high degree of parallelism and relatively low complexity. LDPC codes find applications in wideband wireless multimedia communications and magnetic storage systems. LDPC decoding is iterative and inherently parallel, which can lead to a high decoding throughput.
In high-speed applications, parallel implementations of iterative message-passing algorithms for the decoding of LDPC codes are preferred. To reduce the complexity of the algorithm, which translates to reducing the area and power consumption as well as increasing the throughput, researchers have used the MS algorithm. An iterative decoder performs successive decoding of both rows and columns. Among the decoding algorithms in use, the well-known Belief Propagation (BP) or Sum Product (SP) algorithm achieves good decoding performance. For the standard BP algorithm in the Log-Likelihood Ratio (LLR) domain, many logarithmic and multiplicative computations are required for the check node computation. The min-sum (MS) algorithm replaces the product term by a minimum. It can thereby significantly reduce the hardware complexity of the BP algorithm at the cost of some performance degradation, since the complex computations at the check nodes can be implemented with simple comparison and summation operations. The advantages of the MS algorithm are that it does not require channel information such as the noise variance for the Additive White Gaussian Noise (AWGN) channel [3], and that its decoding performance is less sensitive to finite word-length implementation than that of the BP algorithm [4].

Higher operating frequencies may be obtained in digital systems by the process of buffer insertion, which permits clock frequencies higher than that dictated by the largest propagation delay between input and output. Even though this technique improves the throughput of a logic circuit, it has a number of disadvantages such as increases in latency, area and clock distribution complexity. Wave pipelining is one of the alternatives to pipelining. It provides a method for significantly reducing clock loads and the associated area and latency while retaining the external functionality and timing of a digital circuit. Buffer insertion (also called repeater insertion) is a common and effective technique that uses active device area to trade for a reduction of interconnect delay.
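The area-for-delay trade made by buffer insertion can be illustrated with a small numerical sketch. It assumes the Elmore model of a uniform distributed RC wire (delay 0.5 * r * c * L^2) and idealized buffers that add only a fixed delay; the function names and all electrical values are hypothetical and are not taken from the decoder design in this paper.

```python
def wire_delay(r_per_um, c_per_um, length_um):
    """Elmore delay (seconds) of an unbuffered uniform RC wire: 0.5 * r * c * L^2."""
    return 0.5 * r_per_um * c_per_um * length_um ** 2


def buffered_wire_delay(r_per_um, c_per_um, length_um, n_segments, buffer_delay_s):
    """Delay when (n_segments - 1) buffers split the wire into equal segments.

    Assumption: each buffer is ideal apart from a fixed intrinsic delay,
    so every segment behaves as an independent RC wire.
    """
    segment = length_um / n_segments
    return (n_segments * wire_delay(r_per_um, c_per_um, segment)
            + (n_segments - 1) * buffer_delay_s)


# Hypothetical values, chosen only for illustration.
R = 0.1          # wire resistance, ohm per um
C = 0.2e-15      # wire capacitance, farad per um
LENGTH = 10_000  # wire length, um
T_BUF = 20e-12   # intrinsic delay of one buffer, seconds

unbuffered = wire_delay(R, C, LENGTH)
buffered = buffered_wire_delay(R, C, LENGTH, 4, T_BUF)
print(f"unbuffered: {unbuffered * 1e12:.0f} ps, 4 segments: {buffered * 1e12:.0f} ps")
```

With these assumed numbers, splitting the wire into four segments shortens the delay even after paying for three buffers, at the cost of the extra devices.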
The Elmore delay of a long wire grows quadratically with the wire length, so buffer insertion can reduce interconnect delay significantly. A conference version of this paper appeared in [5].

The paper is organized as follows. Section 2 describes LDPC codes and the decoding algorithm. Section 3 discusses the min-sum decoding algorithm. Section 4 defines the wave pipelining technique. Section 5 elaborates the buffer insertion technique. Section 6 analyzes the architectures and presents the results. Section 7 summarizes the conclusions.

2. LDPC CODES AND DECODING ALGORITHM

2.1.1 LDPC codes
LDPC codes are a class of linear block codes defined by a sparse Parity Check Matrix (PCM) H that has a low density of 1s. The code is the null space of this matrix, so that any valid code word c satisfies the equation $cH^T = 0$. The PCM can also be represented graphically using a Tanner graph. These graphs belong to the general class of bipartite graphs, which consist of two classes of nodes, the variable nodes and the check nodes. The variable nodes represent the code word bits and correspond to the columns of the PCM, and the check nodes represent the parity check equations, which are the rows of the PCM. The Tanner graph has a connection between variable node i and check node j if the corresponding bit $h_{ji}$ in the PCM is 1, as shown in the example of Fig. 1.

Fig. 1 Example of parity check matrix and its corresponding Tanner graph.

Gallager introduced the idea of iterative, message passing decoding of LDPC codes. The idea is to iteratively share the results of the local node decoding by passing them along the edges of the Tanner graph. The variable nodes and the check nodes, in parallel, iteratively pass messages along their adjacent edges, and the values of the code bits are updated accordingly. Based on the domain of analysis, the decoding algorithms are classified as the probability-based sum product algorithm (SPA), the log-domain based SPA and the LLR-domain based SPA [6]. The log-domain SPA has lower complexity and is more numerically stable than the probability-domain SPA. MS is a modification of the log-domain SPA in which the product term is replaced by a minimum. The major advantage of MS is that knowledge of the noise power is not needed for the decoding process.

3. MIN-SUM ALGORITHM
The MS decoding algorithm [7] is an approximation of the iterative Sum-Product (SP) algorithm. Although the performance of MS is generally a few tenths of a dB lower than that of SP decoding, it is more robust to quantization errors when implemented with fixed-point operations [8] and [9]. In MS, the hardware for the check node function is simple compared to the SP algorithm.

In MS decoding, as in the SP algorithm, the extrinsic messages are passed between check and variable nodes in the form of log likelihood ratios (LLRs). The LLR domain is more advantageous than probability domain decoding because message multiplications are no longer needed. The normalization process used in the probability domain requires additional computations. With the use of LLRs, these additional computations are eliminated.

3.1.1 ALGORITHM
In the LLR domain, we use the notation $L(q_{ij})$ for the message passed from variable node i to check node j, and $L(r_{ji})$ for the message from check node j to variable node i. The MS algorithm is described by the following steps in each iteration:

Step 1: The initial messages at the variable nodes are set to:

$$L(q_{ij}) = L(c_i) = y_i \tag{1}$$

Step 2: Check node update:

$$L(r_{ji}) = \Big( \prod_{i' \in V_j \setminus i} \alpha_{i'j} \Big) \cdot \min_{i' \in V_j \setminus i} \beta_{i'j} \tag{2}$$

$$\alpha_{ij} = \mathrm{sign}(L(q_{ij})) \tag{3}$$

$$\beta_{ij} = |L(q_{ij})| \tag{4}$$

where $V_j \setminus i$ is the set of variable nodes connected to check node j excluding variable node i.

Step 3: Variable node update:

$$L(q_{ij}) = L(c_i) + \sum_{j' \in C_i \setminus j} L(r_{j'i}) \tag{5}$$

Step 4: Decision at variable nodes:

$$L(Q_i) = L(c_i) + \sum_{j \in C_i} L(r_{ji}) \tag{6}$$

where $C_i$ is the set of check nodes connected to variable node i and $\hat{c}_i$ is the estimate of code bit i decided from $L(Q_i)$. The algorithm stops if $(\hat{c}_1, \ldots, \hat{c}_n) H^T = 0$, or if the maximum number of iterations is reached.

Step 5: If the conditions above are not satisfied, return to Step 2 of the algorithm.

4. WAVE PIPELINING
Wave pipelining is a process that can increase the clock frequency of digital systems [10]. It is also known as maximum rate pipelining. Unlike ordinary pipelining, wave pipelining does not require internal clock elements to increase throughput. The rate at which logic can propagate through the circuit depends not on the longest path delay but on the difference between the longest and shortest path delays.

In a pipelined system, a logic network is partitioned into pipeline stages, each of which operates upon data computed in the previous cycle by the previous pipeline stage. When a logic network is pipelined, synchronizing elements, either latches or registers, are inserted to partition the network into stages. Pipelining a circuit into N stages can speed up the throughput by up to a factor of N. The inserted synchronizing elements increase the area and power consumption of the logic, and they add latency and cycle time overhead.

Wave pipelining is an alternative synchronous circuit clocking technique that allows overlapped execution of multiple operations without using synchronizing elements within the logic. Rather, knowledge of the signal propagation delay characteristics of the logic network is used at design time to manage the signal delays so as to ensure that operations interfere neither with their predecessor nor with their successor computations. Fig. 2 shows the wave pipelined circuit.

Fig. 2 Wave pipelined circuit

The clock period must satisfy

$$T_{CK} > (D_{max} - D_{min}) + T_S + T_H + 2\Delta_{CK} \tag{7}$$

where $D_{max} - D_{min}$ is the difference between the delay of the critical path, $D_{max}$, and the delay of the non-critical path, $D_{min}$. In the above equation, $T_S$ and $T_H$ are the setup and hold times, which are the same for the circuits, and $\Delta_{CK}$ is the clock skew. Only the difference in delay between the critical and the non-critical path can be changed, so this is the quantity that is modified here.

This technique provides a method for significantly reducing clock loads and the associated area, power and latency while retaining the external functionality and timing of a synchronous circuit [11]. It is of particular interest today because it involves design and analysis across a variety of levels (process, layout, circuit, logic, timing, and architecture) which characterize VLSI design. Wave pipelining can improve the throughput of a logic circuit while avoiding some of the overheads of traditional pipelining. The area and power overheads of a traditional pipeline are avoided in the wave pipeline since there are no internal synchronizers. In order to apply the wave pipelining technique, the architecture is designed and analyzed at the transistor level to find the critical and non-critical paths. Buffer insertion in the non-critical path is used to realize the wave pipelined architecture.

5. BUFFER INSERTION
There are a number of delay reducing methods. Some of them are Wire Length Minimization, Device Sizing, Buffer Insertion, Wire Size Optimization, and Simultaneous Device and Interconnect Optimization. Buffer insertion is the method used here for the reduction of the delay [12]. The minimum and the maximum delay paths are analyzed in the designed architecture, and the delay along the minimum and maximum delay paths is varied by buffer insertion. There is a trade-off between the power consumption and the delay incurred in the architecture: the speed of the designed circuit is improved at the cost of power consumption.

6. ARCHITECTURE OF MS DECODER
In this paper, timing analysis has been done for each path, and $D_{min}$ and $D_{max}$ are calculated. Buffers are inserted proportionally in the identified non-critical paths. $D_{max} - D_{min}$ and the clock frequency are then evaluated.

To implement the variable nodes with degree 3, we use the same basic modules of the architecture designed in [13] and [14]. In our design, we calculate the minimum number of bits needed inside the adder module by assuming the maximum values for the inputs. Considering 6-bit quantization, we have 4 inputs with a maximum absolute value of 31. So the absolute value of the maximum total sum would be 124, which can be represented by an 8-bit signed number. Messages are thus converted from 6-bit sign-magnitude to 8-bit 2's complement and passed to the full adder. The main advantage of the 2's complement conversion is that it leads to a reduction in the number of bits in the computation, which reduces the decoding complexity.

Fig. 3 The architecture of variable node of degree 3 for MS

Also, messages are clipped to $\pm(2^{q-1} - 1)$ when they are converted back from the 8-bit 2's complement domain to the 6-bit sign-magnitude domain before being passed to the check nodes. The architecture is analyzed at the transistor level using T-Spice, and a 45 nm process technology is used. The check node architecture consists of two components, one for the sign bit and the other for the magnitude bits.

Fig. 5 Architecture of magnitude update circuit for check nodes of degree 6
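The node datapaths described in this section can be sketched in software as follows. This is a minimal behavioural model under stated assumptions, not the transistor-level design: a sign bit of 1 is taken to mean a negative message, each outgoing variable node message is formed by subtracting that edge's own input from the total sum (equivalent to the variable node update of Section 3), and the check node applies the min-sum rule with signs combined by XOR. All function names are hypothetical.

```python
from functools import reduce
from operator import xor

Q_BITS = 6                        # message word length: 1 sign bit + 5 magnitude bits
MAX_MAG = 2 ** (Q_BITS - 1) - 1   # clipping level 2^(q-1) - 1 = 31


def to_int(sign, mag):
    """Sign-magnitude pair to signed integer (assumed: sign bit 1 = negative)."""
    return -mag if sign else mag


def to_sign_mag(value):
    """Clip a signed integer to +/-MAX_MAG and return it as (sign, magnitude)."""
    clipped = max(-MAX_MAG, min(MAX_MAG, value))
    return (1 if clipped < 0 else 0, abs(clipped))


def variable_node_update(channel, check_msgs):
    """Degree-3 variable node: add the channel value and the check messages,
    then form each outgoing message by removing that edge's own input."""
    total = to_int(*channel) + sum(to_int(*m) for m in check_msgs)
    if not -128 <= total <= 127:
        raise OverflowError("sum does not fit in 8-bit two's complement")
    return [to_sign_mag(total - to_int(*m)) for m in check_msgs]


def check_node_update(var_msgs):
    """Check node: outgoing sign is the XOR of the other edges' sign bits,
    outgoing magnitude is the minimum of the other edges' magnitudes."""
    all_signs = reduce(xor, (s for s, _ in var_msgs), 0)
    out = []
    for k, (sign_k, _) in enumerate(var_msgs):
        others = [m for i, (_, m) in enumerate(var_msgs) if i != k]
        # XOR-ing the total with the edge's own sign leaves the XOR of the others.
        out.append((all_signs ^ sign_k, min(others)))
    return out


# Worst case for the adder: four inputs of +31 sum to 124 and are clipped back to 31.
print(variable_node_update((0, 31), [(0, 31)] * 3))
print(check_node_update([(0, 5), (1, 3), (0, 7), (1, 2), (0, 9), (0, 4)]))
```

The explicit loop over edges is written for clarity only; a hardware magnitude unit would not necessarily recompute the minimum separately for every edge.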

Fig. 4 The schematic of the magnitude update circuit

The messages from the variable nodes have 1 bit for the sign and 5 bits which represent the magnitude. The sign bits of the incoming messages to a check node are XOR-ed together, and the sign of the outgoing message on each edge is then obtained as the XOR of the sign of the incoming variable message on that edge and the XOR of the signs of all the incoming messages. To calculate the magnitude of the messages in the check nodes, minimum functions are used. This architecture is shown in Fig. 5. The sign update circuit is shown in Fig. 6.

Fig. 6 The sign update circuit of check node of degree 6

The schematic of the magnitude update circuit in Fig. 4 shows the various minimum and maximum delay paths and the way the buffers are inserted to reduce the difference in delay, D_max - D_min. A similar analysis is done for the variable node update circuit. Along with the improvement in the CNU (check node update circuit), the buffer insertion technique is also applied to the VNU (variable node update circuit). The effect of buffer insertion is more prominent in the CNU than in the VNU.

The results for wave pipelining before and after buffer insertion are given in Tables 1 and 2. The results for buffer insertion in the maximum delay path are given in Table 3. The performance metrics such as the clock frequency, power and delay are analyzed. The optimized architecture operates at a better speed with a marginal increase in power. Tables 1 and 2 summarize the results of the MS and the wave pipelined MS architecture for the check node and variable node architectures of degree 6 and 5-bit quantization.

Table 1 CNU analysis before and after wave pipelining

Parameter               Before wave pipelining   After wave pipelining
D_min (ns)              13.643                   81.327
D_max (ns)              201.53                   201.53
D_max - D_min (ns)      201.393                  200.7167
Clock frequency (MHz)   4.965                    4.9821
Power (mW)              0.525                    0.968
No. of gates            708                      968

Table 1 shows that after wave pipelining the clock frequency of the circuit increases by 17100 Hz (from 4.965 MHz to 4.9821 MHz), while the power consumption rises from 0.525 mW to 0.968 mW.
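The relation between path delays and clock frequency used in these results can be checked with a short sketch. It assumes the wave pipelining timing constraint of Section 4 and treats the third delay row of Table 1 as the clock period in nanoseconds, an assumption that is consistent with the reported frequencies. The function names and the values in the first example are hypothetical.

```python
def min_clock_period_ns(d_max, d_min, t_setup=0.0, t_hold=0.0, skew=0.0):
    """Lower bound on the wave pipelined clock period (all values in ns):
    (D_max - D_min) + T_S + T_H + 2 * skew."""
    return (d_max - d_min) + t_setup + t_hold + 2.0 * skew


def freq_mhz(period_ns):
    """Clock frequency in MHz for a clock period given in ns."""
    return 1000.0 / period_ns


# Hypothetical example: padding the short path from 4 ns to 8 ns with buffers
# narrows D_max - D_min and lowers the bound on the clock period.
before = min_clock_period_ns(10.0, 4.0, t_setup=0.5, t_hold=0.5, skew=0.25)
after = min_clock_period_ns(10.0, 8.0, t_setup=0.5, t_hold=0.5, skew=0.25)
print(before, after)

# Cross-check of Table 1 (assumed: third delay row is the clock period in ns).
f_before = freq_mhz(201.393)
f_after = freq_mhz(200.7167)
gain_hz = (4.9821 - 4.965) * 1e6  # difference of the reported frequencies
print(round(f_before, 3), round(f_after, 4), round(gain_hz))
```

The hypothetical example shows why buffers are added to the short paths: only the difference between the longest and shortest delays enters the bound, so raising the minimum delay shortens the achievable clock period.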

Table 2 VNU analysis before and after wave pipelining

Parameter               Before wave pipelining   After wave pipelining
D_min (ns)              61.491                   61.857
D_max (ns)              141.54                   141.55
D_max - D_min (ns)      80.0049                  79.693
Clock frequency (MHz)   12.4923                  12.5481
Power (mW)              0.2669                   0.4502
No. of gates            128                      278

Table 2 shows that after wave pipelining the clock frequency of the circuit increases by about 55.8 kHz (from 12.4923 MHz to 12.5481 MHz), while the power consumption rises from 0.2669 mW to 0.4502 mW.

Table 3 CNU analysis before and after buffer insertion in the critical path

Parameter               Before buffer insertion   After buffer insertion
D_min (ns)              61.325                    61.325
D_max (ns)              161.39                    81.39
D_max - D_min (ns)      100.065                   20.067
Clock frequency (MHz)   9.9935                    49.23
Power (mW)              0.525                     0.5451
No. of gates            708                       960

Table 3 shows that after buffer insertion the clock frequency of the circuit increases from 9.9935 MHz to 49.23 MHz, with a slight increase in the power consumption (from 0.525 mW to 0.5451 mW).

It can be seen from the above analysis that buffer insertion in the critical path gives a greater improvement in the speed of the circuit, with lower power and a smaller number of gates, than buffer insertion in the non-critical path.

7. CONCLUSION
The min-sum decoder architecture is designed at the transistor level, targeted to 45 nm technology. The power and delay parameters are analyzed, and the effect of buffer insertion at the critical and non-critical paths of the designed MS iterative decoder is studied. It is evident that the proposed architecture with buffer insertion at the critical path improves the clock frequency/speed of operation with a marginal increase in power. An efficient hardware architecture is thereby realized with the same decoding performance. The other classes of wave pipelining techniques are node collapsing and logic restructuring.

REFERENCES
[1] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, MA: MIT Press, 1963.
[2] Keshab K. Parhi, VLSI Digital Signal Processing Systems, Chapter 16, pp. 591-642.
[3] Todd K. Moon, Error Correction Coding: Mathematical Methods and Algorithms, Chapter 15, pp. 634-674.
[4] William E. Ryan, An Introduction to LDPC Codes, 2003.
[5] Saravanan Swapna, M. Anbuselvi and S. Salivahanan, Design and analysis of iterative decoder using wave pipelining, Conference proceedings, ICCCE 2012.
[6] Papaharalabos et al., Modified sum-product algorithms for decoding low-density parity-check codes, IET Communications, vol. 1, no. 3, 2007.
[7] J. Zhao, F. Zarkeshvari and A. H. Banihashemi, On implementation of min-sum algorithm and its modifications for decoding LDPC codes, IEEE Trans. Comm., vol. 53, no. 4, pp. 549-554, April 2005.
[8] Sina Tolouei and Amir H. Banihashemi, FPGA Implementation of Variants of Min-Sum Algorithm, Dept. of Sys. and Comp. Engg., Carleton University, Ottawa, ON, Canada, 2008.
[9] Daesun Oh and Keshab K. Parhi, Min-Sum Decoder Architectures With Reduced Word Length for LDPC Codes, IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 1, January 2010.
[10] V. Vreen, G. Seetharaman, and B. Venkataramani, Synthesis Techniques for Implementation of Wave-Pipelined Circuits in ASICs, International Conference on Electronic Design, 2008.
[11] Wayne P. Burleson, Maciej Ciesielski, Fabian Klass, and Wentai Liu, Wave-Pipelining: A Tutorial and Research Survey, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, September 1998.
[12] Jason Cong, Zhigang Pan, Lei He, Cheng-Kok Koh and Kei-Yong Khoo, Interconnect Design for Deep Submicron ICs, Computer Science Department, University of California, Los Angeles, CA 90095.
[13] A. J. Blanksby and C. J. Howland, A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder, IEEE J. Solid-State Circuits, vol. 37, pp. 404-412, March 2002.
[14] Kai He, Jin Sha, Li Li and Zhongfeng Wang, Low Power Decoder Design for QC-LDPC Codes, IEEE, 2010.