MULTI-RATE HIGH-THROUGHPUT LDPC DECODER: TRADEOFF ANALYSIS BETWEEN DECODING THROUGHPUT AND AREA

Predrag Radosavljevic, Alexandre de Baynast, Marjan Karkooti, and Joseph R. Cavallaro
Department of Electrical and Computer Engineering, Rice University
{rpredrag, debaynas, marjan,

ABSTRACT

To achieve high decoding throughput (hundreds of MBits/sec and above) for multiple code rates and moderate codeword lengths, several LDPC decoder solutions with different levels of processing parallelism are possible. The selection among these solutions is based on a threefold criterion: hardware complexity, decoding throughput, and error-correcting performance. In this work, we determine the multi-rate LDPC decoder architecture with the best tradeoff in terms of area cost, error-correcting performance, and decoding throughput. The prototype architecture of this decoder is implemented on an FPGA.

I. INTRODUCTION

Recent LDPC decoder designs are mostly based on block-structured Parity Check Matrices (PCMs), which allow structured architectures with different levels of processing parallelism (data throughput). The tradeoff between throughput and area for partially parallel LDPC decoders that support block-structured regular PCMs has been investigated in [7, 9]. However, it remains a challenge to construct improved PCMs that preserve the architecture modularity while approaching the performance of irregular random codes. The scalable partially parallel decoder from [10], which supports three code rates, is based on optimized structured regular and irregular PCMs but achieves only moderate decoding throughput. On the other hand, fully parallel decoders such as the one in [5] can be very fast, but their lack of flexibility and large area are major disadvantages in wireless systems that require support for multiple code rates and codeword sizes.
The authors of [6] propose a novel optimization of LDPC codes to reduce the area of a fully parallel decoder; however, those codes are not generally suitable for flexible decoder architectures. In this paper, we investigate the complex tradeoff between the level of architecture parallelism, area cost, and performance, and we target the design of flexible decoders. This analysis also requires special optimizations of block-structured PCMs. The specific target is a single decoder architecture that supports multiple code rates (1/2, 2/3, 3/4 and 5/6) and moderate codeword lengths (between 648 and 2592) with high data throughput (around 1 GBits/sec), and that is compatible with the IEEE 802.11n standard specifications [3]. A wide range of decoder solutions with different levels of processing parallelism is analyzed. The decoder solution with the best tradeoff between throughput, area and performance is chosen for hardware implementation. A prototype architecture is implemented on a Xilinx FPGA.

This work was supported in part by Nokia Corporation and by NSF under grants CNS055169, CNS and EIA

II. LOW DENSITY PARITY-CHECK CODES

An LDPC code is a linear block code specified by a very sparse PCM, as shown in Fig. 1, with random placement of the nonzero entries. Each coded bit is represented by a PCM column (variable node), while each row of the PCM represents a parity-check equation (check node). Log-likelihood ratios (LLRs) are used to represent reliability in order to simplify the arithmetic operations: the message $R_{mj}$ denotes the check node LLR sent from check node $m$ to variable node $j$, the message $L(q_{mj})$ represents the variable node LLR sent from variable node $j$ to check node $m$, and $L(q_j)$ ($j = 1, \ldots, n$) represent the a posteriori probability ratios (APP messages) for all coded bits. The APP messages are initialized with the a priori (channel) reliability value of the coded bit $j$ ($j = 1, \ldots, n$).
The iterative layered belief propagation (LBP) algorithm is a variation of the standard belief propagation (SBP) algorithm [7]. Its convergence rate is doubled because the scheduling of the reliability messages is optimized [8]. As shown in Fig. 1, the typical block-structured PCM is composed of concatenated horizontal layers (component codes) built from shifted identity sub-matrices. The belief propagation algorithm is repeated for each component code, while the updated APP messages are passed between the sub-codes.

For each variable node $j$ inside the current horizontal layer, the messages $L(q_{mj})$ that correspond to all neighboring check nodes $m$ are computed according to:

$$L(q_{mj}) = L(q_j) - R_{mj}. \qquad (1)$$

For each check node $m$, the messages $R_{mj}$ for all variable nodes $j$ that participate in the parity-check equation $m$ are computed according to:

$$R_{mj} = \prod_{j' \in N(m) \setminus \{j\}} \operatorname{sign}\big(L(q_{mj'})\big) \; \Psi\Big[ \sum_{j' \in N(m) \setminus \{j\}} \Psi\big(L(q_{mj'})\big) \Big], \qquad (2)$$

where $N(m)$ is the set of all variable nodes from the parity-check equation $m$, and $\Psi(x) = -\log\left[\tanh\left(|x|/2\right)\right]$.

The a posteriori reliability messages in the current horizontal layer are updated according to:

$$L(q_j) = L(q_{mj}) + R_{mj}. \qquad (3)$$

© 2006 IEEE
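The layer update of Eqs. (1) to (3) can be sketched in a few lines of Python. This is only an illustrative floating-point model, not the hardware data path: the function names, the dictionary-based message storage and the clipping constants inside `psi` are assumptions made here for readability.

```python
import math

def psi(x):
    """Psi(x) = -log(tanh(|x|/2)); the clipping bounds are an assumption
    added here to avoid log(0) for very small or very large inputs."""
    x = min(max(abs(x), 1e-9), 30.0)
    return -math.log(math.tanh(x / 2.0))

def update_layer(app, rows, r):
    """One layered-BP pass over the rows of a single horizontal layer.

    app  : list of APP messages L(q_j), updated in place
    rows : dict {check node m: list of variable node indices N(m)}
    r    : dict {(m, j): R_mj} of check node messages, updated in place
    """
    for m, neighbours in rows.items():
        # Eq. (1): variable-to-check messages L(q_mj) = L(q_j) - R_mj
        q = {j: app[j] - r.get((m, j), 0.0) for j in neighbours}
        for j in neighbours:
            others = [q[k] for k in neighbours if k != j]
            # Eq. (2): product of signs and Psi of the summed Psi values
            sign = 1.0
            for v in others:
                if v < 0:
                    sign = -sign
            r[(m, j)] = sign * psi(sum(psi(v) for v in others))
            # Eq. (3): updated APP message L(q_j) = L(q_mj) + R_mj
            app[j] = q[j] + r[(m, j)]
    return app
```

A full decoder would call `update_layer` once per horizontal layer and repeat over all layers until the parity checks are satisfied or the iteration limit is reached.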

If all parity-check equations are satisfied or the predetermined maximum number of iterations is reached, the decoding algorithm stops. Otherwise, the algorithm repeats from (1) for the next horizontal layer.

III. LDPC DECODERS WITH DIFFERENT PROCESSING PARALLELISM

The processing parallelism consists of two parts: the parallelism in decoding the concatenated codes (horizontal layers of the PCM), and the parallelism in reading/writing the reliability messages from/to the memory modules. Several decoder solutions with different levels of processing parallelism are considered, all targeting data throughput of hundreds of MBits/sec:

1. L1RW1: Decoder with full decoding parallelism per one horizontal layer (L1), reading/writing of APP messages from one nonzero sub-matrix in each clock cycle (RW1).
2. L1RW2: Decoder with full decoding parallelism per one horizontal layer, reading/writing of APP messages from two consecutive nonzero sub-matrices in each clock cycle (RW2).
3. L3RW1: Decoder with pipelining of three consecutive horizontal layers (L3), reading/writing of APP messages from one nonzero sub-matrix in each clock cycle.
4. L3RW2: Decoder with pipelining of three consecutive horizontal layers, reading/writing of APP messages from two consecutive nonzero sub-matrices in each clock cycle.
5. L6RW2: Decoder with double pipelining (pipelining of six consecutive horizontal layers, L6), reading/writing of APP messages from two consecutive nonzero sub-matrices in each clock cycle.
6. FULL: Fully parallel decoder with simultaneous execution of all layers and simultaneous reading/writing of all reliability messages.

Block-structured irregular codes have been proposed for the IEEE 802.11n standard [3] since they can approach the error-correcting performance of excellent random codes. Inspired by these codes, we propose a systematic construction of irregular block-structured PCMs that allows two times higher memory access throughput while preserving the error-correcting performance. An example of these novel block-structured PCMs is shown in Fig. 1.
The main constraint in the proposed PCM design is that any two consecutive nonzero sub-matrices from the same horizontal layer belong to an odd and an even block-column of the PCM (see Fig. 1). This property is essential for achieving more parallel memory access, since the APP messages from two sub-matrices can be simultaneously read/written from/to two independent memory modules. The PCMs with 24 block-columns provide better performance than the PCMs with 18, 48, 7 or 96 block-columns. This is due to the near-optimal profile and the small number of short cycles, together with a parallelism level that is sufficiently high for high-throughput decoder implementations. Therefore, all analyzed decoder architectures, including the fully parallel one, support our novel PCMs with 24 block-columns.

Figure 1: An example of the novel block-structured irregular parity-check matrix with 24 block-columns and 8 horizontal layers (axes: rows, columns). The codeword size is 1296, the rate is 2/3, and the size of each sub-matrix is 54 × 54.

Figure 2: Organization of the APP memory into a single dual-port RAM module (RW1), and into two independent dual-port RAM modules with APP messages from odd and even block-columns of the parity-check matrix (RW2).

Figure 2 (part labelled RW1) shows the organization of the APP memory used in solutions 1 and 3. A single dual-port RAM block is utilized, where each memory location contains the APP messages from one block-column. Therefore, one full sub-matrix of the PCM can be accessed in each clock cycle. The width of the valid memory content depends on the codeword length (size of the sub-matrix), and it is up to 108 messages for the largest supported codeword length of 2592. Meanwhile, each check node memory location also contains the check node messages from one full nonzero sub-matrix.
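The dimensions quoted above follow from simple arithmetic, sketched below in Python. The function name is hypothetical, and the layer count assumes a full-rank PCM in which every block-row forms one horizontal layer, which is consistent with the 8 layers at rate 2/3 shown in Fig. 1.

```python
from fractions import Fraction

BLOCK_COLUMNS = 24  # block-columns of the proposed PCMs

def pcm_dimensions(codeword_length, rate):
    """Return (sub-matrix size, number of horizontal layers).

    Assumption: each block-row is one horizontal layer, so a rate-R code
    with 24 block-columns has 24 * (1 - R) layers.
    """
    submatrix_size = codeword_length // BLOCK_COLUMNS
    layers = int(BLOCK_COLUMNS * (1 - rate))
    return submatrix_size, layers

# Sub-matrix size and layer count for the largest code at each rate
for rate in (Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(5, 6)):
    print(rate, pcm_dimensions(2592, rate))
```

Using `Fraction` keeps the layer count exact; a floating-point rate such as `2/3` could truncate to the wrong integer.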
The architecture-oriented constraint of equally distributed odd and even nonzero block-columns in every horizontal layer allows two sub-matrices to be read or written per clock cycle (used in solutions 2, 4 and 5). As shown in Fig. 2 (part labelled RW2), the APP memory is partitioned into two independent modules. Each location in one memory module contains the APP messages that correspond to one (out of twelve) odd block-columns. The other memory module contains the APP messages from the even block-columns. Meanwhile, each check node memory location contains the check node messages that correspond to two consecutive nonzero sub-matrices from the same horizontal layer. The entire content of a check node memory location is loaded/stored in a single clock cycle. Note that the dual-port RAMs can be replaced with single-port RAMs if there is no pipelining of the horizontal layers. However, the memory organization does not depend on the number of memory ports.
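The odd/even constraint and the two-module partitioning described above can be illustrated with a short Python sketch. The 1-based block-column numbering, the address formula and all function names are assumptions made for this illustration; the source does not specify the actual address mapping.

```python
def alternates_odd_even(layer_block_columns):
    """True if consecutive nonzero block-columns (1-based) of one layer
    alternate between odd and even indices."""
    cols = layer_block_columns
    return all((a % 2) != (b % 2) for a, b in zip(cols, cols[1:]))

def memory_bank(block_column):
    """Hypothetical mapping of a block-column to (module, address);
    with 24 block-columns each module holds twelve locations."""
    module = "odd" if block_column % 2 else "even"
    address = (block_column - 1) // 2
    return module, address

def access_schedule(layer_block_columns):
    """Group the nonzero block-columns of a layer into per-cycle accesses:
    two sub-matrices per clock cycle, one in the last cycle if the count is odd."""
    cols = layer_block_columns
    return [tuple(cols[i:i + 2]) for i in range(0, len(cols), 2)]

# A made-up layer with five nonzero sub-matrices
layer = [1, 4, 7, 10, 13]
if alternates_odd_even(layer):
    for cycle, group in enumerate(access_schedule(layer)):
        print(cycle, [memory_bank(c) for c in group])
```

Because each pair contains one odd and one even block-column, the two reads in a cycle always address different modules and never collide.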

Figure 3: Belief propagation based on pipelining of three decoding stages for three consecutive horizontal layers of the parity-check matrix (layers shown against time).

Figure 4: Belief propagation based on pipelining of three decoding stages for six consecutive horizontal layers of the parity-check matrix (layers shown against time).

Figure 5: The average decoding throughput, hardware complexity and tradeoff ratio for the LDPC decoder solutions with different levels of processing parallelism (L1RW1, L1RW2, L3RW1, L3RW2, L6RW2, FULL). Plotted quantities: throughput/area [MBits/sec/mm²], average throughput [MBits/sec], arithmetic area [K gates], memory size [KBits], total area [mm²/100].

By construction, all rows inside a single horizontal layer are independent and can be processed in parallel without any performance loss. Every horizontal layer is decoded in three stages: a memory reading stage, a processing stage, and a memory writing stage, corresponding to Eqs. (1), (2) and (3), respectively. The decoding parallelism is increased if three consecutive horizontal layers are pipelined (label L3), as visualized in Fig. 3. The three-stage pipelining introduces a small performance loss due to the overlap between APP messages from the same block-columns but from different horizontal layers. The decoding parallelism is doubled if pipelining of six consecutive layers is employed (label L6), as shown in Fig. 4. To avoid reading/writing collisions (simultaneous reading and writing of APP messages from identical block-columns but from two different horizontal layers), it is necessary to postpone the start of every second layer by one clock cycle (see Fig. 4).

IV. THROUGHPUT-AREA-PERFORMANCE TRADEOFF ANALYSIS

The proposed decoder solutions with different processing parallelism levels, presented in the previous section, are compared.
It is assumed that each solution (except the L6RW2 architecture) supports multiple code rates (from 1/2 to 5/6) and codeword lengths (648, 1296, 1944, and 2592) with the novel block-structured PCMs. The L6RW2 solution only supports code rates up to 3/4, since the PCM with 24 block-columns has only four horizontal layers for the code rate of 5/6, which is insufficient to fill the pipeline (so L6RW2 is at a disadvantage). Figure 5 shows the decoding throughput and area complexity for all analyzed decoders: decoding parallelism and memory access parallelism increase from left to right (from L1RW1 to FULL, as in Section III). The tradeoff is defined as the ratio between decoding throughput and decoder core area, and it is therefore expressed in MBits/sec/mm². The decoding throughput is based on the average number of decoding iterations required to achieve a frame-error rate (FER) of $10^{-4}$ for the largest supported code rate and codeword length. We assume $10^{6}$ simulated codeword transmissions, a maximum of 15 iterations, and a clock frequency of 200 MHz.

The hardware complexity is measured by the area in mm², assuming a 0.18 µm 6-metal CMOS process. The total area is computed as the sum of the arithmetic area and the memory area. The arithmetic part of each decoder is represented as a number of standard CMOS logic gates (also shown in Fig. 5), where it is assumed that the typical TSMC design density for 0.18 µm technology is 100K logic gates per mm² [1]. The memory size is given as the number of bits required for the storage of APP and check messages (SRAMs) and of the supported PCMs (ROMs). The corresponding memory area is computed by assuming a one-ported SRAM cell size of 4.65 µm², which is a typical size for 0.18 µm technology [1]. Dual-port SRAM blocks are required for the implementation of the L3RW1, L3RW2 and L6RW2 decoders because the memory reading and writing stages are pipelined; the memory cell area is typically doubled for the same design rule [2].
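The area model described above reduces to a short calculation, sketched here in Python. The function names are hypothetical, and the sketch assumes that every memory bit (including ROM bits) is charged at the SRAM cell size, which the text does not state explicitly; the numbers in the usage lines are made up for illustration and are not results from Fig. 5.

```python
GATES_PER_MM2 = 100_000   # assumed TSMC 0.18 um design density [1]
SRAM_CELL_UM2 = 4.65      # one-ported SRAM cell size in um^2 [1]

def decoder_area_mm2(logic_gates, memory_bits, dual_port=False):
    """Total area = arithmetic area + memory area, both in mm^2.

    Dual-port cells are modelled as twice the one-ported cell area [2].
    """
    cell_um2 = SRAM_CELL_UM2 * (2 if dual_port else 1)
    arithmetic_area = logic_gates / GATES_PER_MM2
    memory_area = memory_bits * cell_um2 * 1e-6   # um^2 -> mm^2
    return arithmetic_area + memory_area

def tradeoff_ratio(throughput_mbps, area_mm2):
    """Throughput per area in MBits/sec/mm^2."""
    return throughput_mbps / area_mm2

# Hypothetical inputs, for illustration only
area = decoder_area_mm2(logic_gates=200_000, memory_bits=100_000, dual_port=True)
print(area, tradeoff_ratio(1000.0, area))
```

The model makes the effect of pipelining visible: switching to dual-port memories doubles only the memory term, so the throughput gain must exceed that area increase for the ratio to improve.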
The fully parallel solution does not have any inherent flexibility: the arithmetic logic for all 16 rate-size combinations is necessary, which significantly increases the arithmetic area. However, the same set of latches can be utilized for the storage of reliability messages for all rate-size combinations. If pipelining of only three layers is employed, then the decoding throughput is increased by approximately three times while the arithmetic area is increased only marginally. On the other hand, the memory area is doubled, since both mirror APP and mirror check node memories are then required. Overall, three-stage pipelining significantly improves the throughput/area ratio (see Fig. 5, L1RW1 vs. L3RW1 architecture, and L1RW2 vs. L3RW2 architecture).

Figure 6: FER for the PCM with 24 block-columns (code rate of 2/3, code size of 1296): non-pipelined LBP, pipelined LBP, double-pipelined LBP, SBP. Axes: frame error rate vs. Eb/No [dB].

If the memory access parallelism is doubled, then the decoding throughput is directly increased by more than 50%, while the arithmetic area is doubled and the memory size remains the same (see Fig. 2). Overall, if only the memory access parallelism is increased, then the throughput/area ratio is only slightly improved (see Fig. 5, L1RW1 vs. L1RW2 architecture, and L3RW1 vs. L3RW2 architecture). A further increase of decoding parallelism (L6RW2 and FULL solutions) does not improve the tradeoff ratio, since the throughput improvements are smaller than the necessary increase in area. A similar effect is expected if the memory access parallelism is further increased: if four sub-matrices per clock cycle were accessed (not shown here, since it requires a special redesign of the PCMs), then the arithmetic area would increase two times. However, the decoding throughput would improve by only about 5%, since the processing latency would start to dominate the memory access latency.

It can be observed from Fig. 5 that the best throughput per area ratio is achieved for the three-stage pipelining approach with memory access that allows reading/writing of two sub-matrices per clock cycle (the L3RW2 solution). At the same time, the performance loss due to the pipelining is acceptable (about 0.1 dB for an FER around $10^{-4}$, see Fig. 6; $10^{6}$ codeword transmissions are simulated).
To avoid this performance loss, the double pipelining and the SBP approaches would require additional decoding iterations, which would further decrease the decoding throughput.

V. THREE-STAGE PIPELINED LDPC DECODER WITH DOUBLE-PARALLEL MEMORY ACCESS

An LDPC decoder with three-stage pipelining and a memory organization that supports access to two sub-matrices during a single clock cycle (L3RW2) is chosen for hardware implementation because it achieves the best tradeoff between decoding throughput and area, as shown in Section IV. The high-level block diagram of the proposed decoder architecture is shown in Fig. 7. The architecture supports codeword sizes of 648, 1296, 1944, and 2592, and code rates of 1/2, 2/3, 3/4, and 5/6.

Figure 7: The block diagram of the decoder architecture for block-structured irregular parity-check matrices with 24 block-columns. Blocks shown: address generators, check memories, APP memory modules 1 and 2 with their mirrors, a bank of 108 decoding FUs, the controller, and ROM modules 1 and 2 with their mirrors.

Four identical permuters are required for the block-shifting of APP messages after they are loaded from the memory modules (original and mirror). The scalable permuter (permutation of blocks of sizes 27, 54, 81, and 108) is composed of 7-bit input 3:1 multiplexers organized in multiple pipelined stages. The modified min-sum approximation with correcting offset [4] is employed as the central part of the decoding function unit (DFU). A single DFU is responsible for processing one row of the PCM through three pipeline stages according to Eqs. (1), (2), and (3), as shown in Fig. 8. The support for different code rates and codeword lengths implies the usage of multiple PCMs. Information about each supported PCM is stored in the corresponding ROM module, where a single memory location contains the block-column position of one nonzero sub-matrix and the associated shift value.
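The check node processing inside a DFU, described above as a modified min-sum approximation with a correcting offset, can be sketched in Python as follows. This is the generic offset min-sum rule and only an assumption about the variant actually used; the offset value `beta`, the function name and the list-based interface are hypothetical, and the row is assumed to have at least two entries.

```python
def offset_min_sum_row(q, beta=0.5):
    """Offset min-sum update for one parity-check row.

    q    : variable-to-check messages L(q_mj) of the row (length >= 2)
    beta : correcting offset (hypothetical value, not taken from the paper)
    Returns the check-to-variable messages R_mj in the same order.
    """
    mags = [abs(v) for v in q]
    # Track the smallest magnitude, its position, and the second smallest
    idx_min = min(range(len(q)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for i, m in enumerate(mags) if i != idx_min)
    # Sign product over the whole row
    parity = 1
    for v in q:
        if v < 0:
            parity = -parity
    out = []
    for i, v in enumerate(q):
        # Exclude the own message: use min2 at the position of the minimum
        mag = min2 if i == idx_min else min1
        mag = max(mag - beta, 0.0)             # apply the correcting offset
        own_sign = -1 if v < 0 else 1
        out.append(parity * own_sign * mag)    # sign product of the other entries
    return out

print(offset_min_sum_row([2.0, -1.5, 3.0]))  # → [-1.0, 1.5, -1.0]
```

Only two minima and one sign parity are needed per row, which is why this approximation suits block-serial processing: the inputs can arrive one sub-matrix at a time while the running minima are updated.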
The block-column position represents the reading/writing address of the appropriate APP memory module. To avoid the permutation of APP messages during the writing stage, the relative shift values (the difference between two consecutive shift values of the same block-column) are stored instead of the original shift values.

The support of multiple code rates is ensured by the control unit. The number of nonzero sub-matrices per horizontal layer varies with the code rate, which affects the latencies of the reading/writing stages as well as the processing latency within the DFUs. The control logic provides the synchronization between the pipelined stages. It also handles the exceptions that occur if the number of nonzero sub-matrices in the horizontal layer is odd. In that case, only one block-column per clock cycle is read/written from/to the odd or the even APP memory module. The full content of the corresponding check node memory location (width of two sub-matrices) is loaded even though the second half of it is not valid. The control unit then disables the appropriate part of the arithmetic logic inside the DFUs.

Figure 8: The block diagram of the decoding function unit (DFU) with three pipeline stages (reading, processing, writing) and block-serial processing. The interface to the APP and check node memories is also included.

Figure 9 shows the FER performance of the implemented flexible decoder for the case of a rate-2/3 code and a codeword length of 1296. An arithmetic precision of seven bits is chosen for the representation of reliability messages (two's complement with one bit for the fractional part) as well as for all arithmetic operations. A prototype architecture has been implemented using Xilinx System Generator and targeted to a Xilinx Virtex-4 XC4VFX60 FPGA. Table 1 shows the utilization statistics. Based on the XST synthesis tool report and full place and route statistics, a maximum clock frequency of 130 MHz can be achieved, which determines the maximum average throughput of approximately 1.1 GBits/sec.

Figure 9: The FER for the implemented decoder (code rate of 2/3, code length of 1296, maximum number of iterations is 15); pipelined LBP (floating vs. fixed-point). Curves: LBP pipelined with 6-bit, 7-bit and 8-bit fixed point, and LBP pipelined with floating point. Axes: frame error rate vs. Eb/No [dB].

Table 1: Xilinx Virtex-4 XC4VFX60 FPGA utilization.

Resource      Used      Utilization rate
Slices        19,595    77%
LUTs          36,45     7%
Block RAMs              %

VI. CONCLUSION

In this paper, the multi-rate decoder architecture that represents the best tradeoff in terms of throughput, area, and performance is found and implemented on an FPGA. An identical tradeoff analysis can be applied for a wide range of code rates and codeword lengths. We believe that the pipelining of multiple horizontal layers combined with a sufficiently parallel memory organization is a general tradeoff solution that can be applied to other block-structured LDPC codes.

REFERENCES

[1] cell.html.
[2]
[3] IEEE Wireless LANs WWiSE Proposal: High Throughput extension to the Standard. IEEE n.
[4] J. Chen, A. Dholakia, E. Eleftheriou, M.P.C. Fossorier, and X.-Y. Hu. Reduced-complexity decoding of LDPC codes. IEEE Transactions on Communications, 53(8), Aug.
[5] A. Darabiha, A.C. Carusone, and F.R. Kschischang. Multi-Gbit/sec low density parity check decoders with reduced interconnect complexity. In IEEE International Symposium on Circuits and Systems (ISCAS), May 2005.
[6] E. Kim and G.S. Choi. Diagonal low-density parity-check code for simplified routing in decoder. In IEEE Workshop on Signal Processing Systems Design and Implementation, Nov.
[7] M.M. Mansour and N.R. Shanbhag. High-throughput LDPC decoders. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 11(6), Dec.
[8] P. Radosavljevic, A. de Baynast, and J.R. Cavallaro. Optimized message passing schedules for LDPC decoding. In 39th Asilomar Conference on Signals, Systems and Computers, Nov. 2005.
[9] S.H. Kang and I.C. Park. Loosely coupled memory-based decoding architecture for low density parity check codes. IEEE Transactions on Circuits and Systems, 53(5), May 2006.
[10] L. Yang, H. Liu, and C.J.R. Shi. Code construction and FPGA implementation of a low-error-floor multi-rate low-density parity-check code decoder. IEEE Transactions on Circuits and Systems, 53(4):892-904, April 2006.


More information

ASIP LDPC DESIGN FOR AD AND AC

ASIP LDPC DESIGN FOR AD AND AC ASIP LDPC DESIGN FOR 802.11AD AND 802.11AC MENG LI CSI DEPARTMENT 3/NOV/2014 GDR-ISIS @ TELECOM BRETAGNE BREST FRANCE OUTLINES 1. Introduction of IMEC and CSI department 2. ASIP design flow 3. Template

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011

98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011 98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011 Memory System Optimization for FPGA- Based Implementation of Quasi-Cyclic LDPC Codes Decoders Xiaoheng Chen,

More information

Area and Energy Efficient VLSI Architectures for Low-Density Parity-Check Decoders using an On-the-fly Computation

Area and Energy Efficient VLSI Architectures for Low-Density Parity-Check Decoders using an On-the-fly Computation Area and Energy Efficient VLSI Architectures for Low-Density Parity-Check Decoders using an On-the-fly Computation Kiran Gunnam Texas A&M University October 11 2006 Outline Introduction of LDPC Problem

More information

Informed Dynamic Scheduling for Belief-Propagation Decoding of LDPC Codes

Informed Dynamic Scheduling for Belief-Propagation Decoding of LDPC Codes Informed Dynamic Scheduling for Belief-Propagation Decoding of LDPC Codes Andres I. Vila Casado, Miguel Griot and Richard D. Wesel Department of Electrical Engineering, University of California, Los Angeles,

More information

A Massively Parallel Implementation of QC-LDPC Decoder on GPU

A Massively Parallel Implementation of QC-LDPC Decoder on GPU A Massively Parallel Implementation of QC-LDPC Decoder on GPU Guohui Wang, Michael Wu, Yang Sun, and Joseph R Cavallaro Department of Electrical and Computer Engineering, Rice University, Houston, Texas

More information

Maximizing the Throughput-Area Efficiency of Fully-Parallel Low-Density Parity-Check Decoding with C-Slow Retiming and Asynchronous Deep Pipelining

Maximizing the Throughput-Area Efficiency of Fully-Parallel Low-Density Parity-Check Decoding with C-Slow Retiming and Asynchronous Deep Pipelining Maximizing the Throughput-Area Efficiency of Fully-Parallel Low-Density Parity-Check Decoding with C-Slow Retiming and Asynchronous Deep Pipelining Ming Su, Lili Zhou, Student Member, IEEE, and C.-J. Richard

More information

Multi-Rate Reconfigurable LDPC Decoder Architectures for QC-LDPC codes in High Throughput Applications

Multi-Rate Reconfigurable LDPC Decoder Architectures for QC-LDPC codes in High Throughput Applications Multi-Rate Reconfigurable LDPC Decoder Architectures for QC-LDPC codes in High Throughput Applications A thesis submitted in partial fulfillment of the requirements for the degree of Bachelor of Technology

More information

Low Complexity Opportunistic Decoder for Network Coding

Low Complexity Opportunistic Decoder for Network Coding Low Complexity Opportunistic Decoder for Network Coding Bei Yin, Michael Wu, Guohui Wang, and Joseph R. Cavallaro ECE Department, Rice University, 6100 Main St., Houston, TX 77005 Email: {by2, mbw2, wgh,

More information

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision

LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision > REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLIC HERE TO EDIT < LLR-based Successive-Cancellation List Decoder for Polar Codes with Multi-bit Decision Bo Yuan and eshab. Parhi, Fellow,

More information

THE DESIGN OF STRUCTURED REGULAR LDPC CODES WITH LARGE GIRTH. Haotian Zhang and José M. F. Moura

THE DESIGN OF STRUCTURED REGULAR LDPC CODES WITH LARGE GIRTH. Haotian Zhang and José M. F. Moura THE DESIGN OF STRUCTURED REGULAR LDPC CODES WITH LARGE GIRTH Haotian Zhang and José M. F. Moura Department of Electrical and Computer Engineering Carnegie Mellon University, Pittsburgh, PA 523 {haotian,

More information

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications Fabien Demangel, Nicolas Fau, Nicolas Drabik, François Charot, Christophe Wolinski To cite this version: Fabien

More information

Hardware Implementation

Hardware Implementation Low Density Parity Check decoder Hardware Implementation Ruchi Rani (2008EEE2225) Under guidance of Prof. Jayadeva Dr.Shankar Prakriya 1 Indian Institute of Technology LDPC code Linear block code which

More information

A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S2 LDPC DECODER

A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S2 LDPC DECODER 18th European Signal Processing Conference (EUSIPCO-010) Aalborg, Denmark, August -7, 010 A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S LDPC DECODER Wen Ji, Makoto Hamaminato,

More information

Low Error Rate LDPC Decoders

Low Error Rate LDPC Decoders Low Error Rate LDPC Decoders Zhengya Zhang, Lara Dolecek, Pamela Lee, Venkat Anantharam, Martin J. Wainwright, Brian Richards and Borivoje Nikolić Department of Electrical Engineering and Computer Science,

More information

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR)

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) P. Vincy Priscilla 1, R. Padmavathi 2, S. Tamilselvan 3, Dr.S. Kanthamani 4 1,4 Department

More information

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes 1 U.Rahila Begum, 2 V. Padmajothi 1 PG Student, 2 Assistant Professor 1 Department Of

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm A New MIMO etector Architecture Based on A Forward-Backward Trellis Algorithm Yang Sun and Joseph R Cavallaro epartment of Electrical and Computer Engineering Rice University, Houston, TX 775 Email: {ysun,

More information

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 Sunghwan Kim* O, Min-Ho Jang*, Jong-Seon No*, Song-Nam Hong, and Dong-Joon Shin *School of Electrical Engineering and

More information

A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders

A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders 2246 IEICE TRANS. FUNDAMENTALS, VOL.E94 A, NO.11 NOVEMBER 2011 PAPER Special Section on Smart Multimedia & Communication Systems A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN 255 CORRECTIONS TO FAULT SECURE OF MAJORITY LOGIC DECODER AND DETECTOR FOR MEMORY APPLICATIONS Viji.D PG Scholar Embedded Systems Prist University, Thanjuvr - India Mr.T.Sathees Kumar AP/ECE Prist University,

More information

A new two-stage decoding scheme with unreliable path search to lower the error-floor for low-density parity-check codes

A new two-stage decoding scheme with unreliable path search to lower the error-floor for low-density parity-check codes IET Communications Research Article A new two-stage decoding scheme with unreliable path search to lower the error-floor for low-density parity-check codes Pilwoong Yang 1, Bohwan Jun 1, Jong-Seon No 1,

More information

Strategies for High-Throughput FPGA-based QC-LDPC Decoder Architecture

Strategies for High-Throughput FPGA-based QC-LDPC Decoder Architecture Strategies for High-Throughput FPGA-based QC-LDPC Decoder Architecture Swapnil Mhaske, Hojin Kee, Tai Ly, Ahsan Aziz, Predrag Spasojevic Wireless Information Network Laboratory, Rutgers University, New

More information

Are Polar Codes Practical?

Are Polar Codes Practical? Are Polar Codes Practical? Prof. Warren J. Gross McGill University, Montreal, QC. Coding: from Practice to Theory Berkeley, CA. Feb. 10 th, 2015. Achieving the Channel Capacity Successive cancellation

More information

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes.

! Program logic functions, interconnect using SRAM. ! Advantages: ! Re-programmable; ! dynamically reconfigurable; ! uses standard processes. Topics! SRAM-based FPGA fabrics:! Xilinx.! Altera. SRAM-based FPGAs! Program logic functions, using SRAM.! Advantages:! Re-programmable;! dynamically reconfigurable;! uses standard processes.! isadvantages:!

More information

Reduced Complexity of Decoding Algorithm for Irregular LDPC Codes Using a Split Row Method

Reduced Complexity of Decoding Algorithm for Irregular LDPC Codes Using a Split Row Method Journal of Wireless Networking and Communications 2012, 2(4): 29-34 DOI: 10.5923/j.jwnc.20120204.01 Reduced Complexity of Decoding Algorithm for Irregular Rachid El Alami *, Mostafa Mrabti, Cheikh Bamba

More information

DUE to the high computational complexity and real-time

DUE to the high computational complexity and real-time IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 15, NO. 3, MARCH 2005 445 A Memory-Efficient Realization of Cyclic Convolution and Its Application to Discrete Cosine Transform Hun-Chen

More information

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye

A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS. Theepan Moorthy and Andy Ye A SCALABLE COMPUTING AND MEMORY ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION ON FIELD-PROGRAMMABLE GATE ARRAYS Theepan Moorthy and Andy Ye Department of Electrical and Computer Engineering Ryerson

More information

RECENTLY, researches on gigabit wireless personal area

RECENTLY, researches on gigabit wireless personal area 146 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 55, NO. 2, FEBRUARY 2008 An Indexed-Scaling Pipelined FFT Processor for OFDM-Based WPAN Applications Yuan Chen, Student Member, IEEE,

More information

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function.

FPGA. Logic Block. Plessey FPGA: basic building block here is 2-input NAND gate which is connected to each other to implement desired function. FPGA Logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of transistor or as complex as that of a microprocessor. It can used to implement different

More information

LOW-density parity-check (LDPC) codes have attracted

LOW-density parity-check (LDPC) codes have attracted 2966 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 12, DECEMBER 2004 LDPC Block and Convolutional Codes Based on Circulant Matrices R. Michael Tanner, Fellow, IEEE, Deepak Sridhara, Arvind Sridharan,

More information

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes 1 Prakash K M, 2 Dr. G S Sunitha 1 Assistant Professor, Dept. of E&C, Bapuji Institute of Engineering and Technology,

More information

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7

ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 ISSCC 2003 / SESSION 8 / COMMUNICATIONS SIGNAL PROCESSING / PAPER 8.7 8.7 A Programmable Turbo Decoder for Multiple 3G Wireless Standards Myoung-Cheol Shin, In-Cheol Park KAIST, Daejeon, Republic of Korea

More information

INTRODUCTION TO FPGA ARCHITECTURE

INTRODUCTION TO FPGA ARCHITECTURE 3/3/25 INTRODUCTION TO FPGA ARCHITECTURE DIGITAL LOGIC DESIGN (BASIC TECHNIQUES) a b a y 2input Black Box y b Functional Schematic a b y a b y a b y 2 Truth Table (AND) Truth Table (OR) Truth Table (XOR)

More information

Disclosing the LDPC Code Decoder Design Space

Disclosing the LDPC Code Decoder Design Space Disclosing the LDPC Code Decoder Design Space Torben Brack, Frank Kienle, Norbert Wehn Microelectronic System Design Research Group University of Kaiserslautern Erwin-Schrödinger-Straße 67663 Kaiserslautern,

More information

High Speed Special Function Unit for Graphics Processing Unit

High Speed Special Function Unit for Graphics Processing Unit High Speed Special Function Unit for Graphics Processing Unit Abd-Elrahman G. Qoutb 1, Abdullah M. El-Gunidy 1, Mohammed F. Tolba 1, and Magdy A. El-Moursy 2 1 Electrical Engineering Department, Fayoum

More information

FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm

FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm Marjan Karkooti, Joseph R. Cavallaro Center for imedia Communication, Department of Electrical and Computer Engineering MS-366, Rice University,

More information

EE414 Embedded Systems Ch 5. Memory Part 2/2

EE414 Embedded Systems Ch 5. Memory Part 2/2 EE414 Embedded Systems Ch 5. Memory Part 2/2 Byung Kook Kim School of Electrical Engineering Korea Advanced Institute of Science and Technology Overview 6.1 introduction 6.2 Memory Write Ability and Storage

More information

Optimized Min-Sum Decoding Algorithm for Low Density PC Codes

Optimized Min-Sum Decoding Algorithm for Low Density PC Codes Optimized Min-Sum Decoding Algorithm for Low Density PC Codes Dewan Siam Shafiullah, Mohammad Rakibul Islam, Mohammad Mostafa Amir Faisal, Imran Rahman, Dept. of Electrical and Electronic Engineering,

More information

Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory

Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory Wenzhe Zhao, Hongbin Sun, Minjie Lv, Guiqiang Dong, Nanning Zheng, and Tong Zhang Institute

More information

Reduced Latency Majority Logic Decoding for Error Detection and Correction

Reduced Latency Majority Logic Decoding for Error Detection and Correction Reduced Latency Majority Logic Decoding for Error Detection and Correction D.K.Monisa 1, N.Sathiya 2 1 Department of Electronics and Communication Engineering, Mahendra Engineering College, Namakkal, Tamilnadu,

More information

Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes P. Kalai Mani, V. Vishnu Prasath PG Student, Department of Applied Electronics, Sri Subramanya College of Engineering

More information

Investigation of Error Floors of Structured Low- Density Parity-Check Codes by Hardware Emulation

Investigation of Error Floors of Structured Low- Density Parity-Check Codes by Hardware Emulation Investigation of Error Floors of Structured Low- Density Parity-Check Codes by Hardware Emulation Zhengya Zhang, Lara Dolecek, Borivoje Nikolic, Venkat Anantharam, and Martin Wainwright Department of Electrical

More information

Design and Implementation of Low Density Parity Check Codes

Design and Implementation of Low Density Parity Check Codes IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 09 (September. 2014), V2 PP 21-25 www.iosrjen.org Design and Implementation of Low Density Parity Check Codes

More information

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline

CPE/EE 422/522. Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices. Dr. Rhonda Kay Gaede UAH. Outline CPE/EE 422/522 Introduction to Xilinx Virtex Field-Programmable Gate Arrays Devices Dr. Rhonda Kay Gaede UAH Outline Introduction Field-Programmable Gate Arrays Virtex Virtex-E, Virtex-II, and Virtex-II

More information

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP

CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 133 CHAPTER 6 FPGA IMPLEMENTATION OF ARBITERS ALGORITHM FOR NETWORK-ON-CHIP 6.1 INTRODUCTION As the era of a billion transistors on a one chip approaches, a lot of Processing Elements (PEs) could be located

More information

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes.

HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes. HiPANQ Overview of NVIDIA GPU Architecture and Introduction to CUDA/OpenCL Programming, and Parallelization of LDPC codes Ian Glendinning Outline NVIDIA GPU cards CUDA & OpenCL Parallel Implementation

More information

Capacity-approaching Codes for Solid State Storages

Capacity-approaching Codes for Solid State Storages Capacity-approaching Codes for Solid State Storages Jeongseok Ha, Department of Electrical Engineering Korea Advanced Institute of Science and Technology (KAIST) Contents Capacity-Approach Codes Turbo

More information

Fault Tolerant Parallel Filters Based On Bch Codes

Fault Tolerant Parallel Filters Based On Bch Codes RESEARCH ARTICLE OPEN ACCESS Fault Tolerant Parallel Filters Based On Bch Codes K.Mohana Krishna 1, Mrs.A.Maria Jossy 2 1 Student, M-TECH(VLSI Design) SRM UniversityChennai, India 2 Assistant Professor

More information

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES S. SRINIVAS KUMAR *, R.BASAVARAJU ** * PG Scholar, Electronics and Communication Engineering, CRIT

More information

HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm

HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm I J C T A, 9(20), 2016, pp. 75-80 International Science Press HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm Sandeep Kakde* and Atish Khobragade** ABSTRACT

More information

Lowering the Error Floors of Irregular High-Rate LDPC Codes by Graph Conditioning

Lowering the Error Floors of Irregular High-Rate LDPC Codes by Graph Conditioning Lowering the Error Floors of Irregular High- LDPC Codes by Graph Conditioning Wen-Yen Weng, Aditya Ramamoorthy and Richard D. Wesel Electrical Engineering Department, UCLA, Los Angeles, CA, 90095-594.

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

EECS150 - Digital Design Lecture 16 - Memory

EECS150 - Digital Design Lecture 16 - Memory EECS150 - Digital Design Lecture 16 - Memory October 17, 2002 John Wawrzynek Fall 2002 EECS150 - Lec16-mem1 Page 1 Memory Basics Uses: data & program storage general purpose registers buffering table lookups

More information

CHAPTER 4 BLOOM FILTER

CHAPTER 4 BLOOM FILTER 54 CHAPTER 4 BLOOM FILTER 4.1 INTRODUCTION Bloom filter was formulated by Bloom (1970) and is used widely today for different purposes including web caching, intrusion detection, content based routing,

More information

Parallel graph traversal for FPGA

Parallel graph traversal for FPGA LETTER IEICE Electronics Express, Vol.11, No.7, 1 6 Parallel graph traversal for FPGA Shice Ni a), Yong Dou, Dan Zou, Rongchun Li, and Qiang Wang National Laboratory for Parallel and Distributed Processing,

More information

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing

EE878 Special Topics in VLSI. Computer Arithmetic for Digital Signal Processing EE878 Special Topics in VLSI Computer Arithmetic for Digital Signal Processing Part 6c High-Speed Multiplication - III Spring 2017 Koren Part.6c.1 Array Multipliers The two basic operations - generation

More information

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies VLSI IMPLEMENTATION OF HIGH PERFORMANCE DISTRIBUTED ARITHMETIC (DA) BASED ADAPTIVE FILTER WITH FAST CONVERGENCE FACTOR G. PARTHIBAN 1, P.SATHIYA 2 PG Student, VLSI Design, Department of ECE, Surya Group

More information

It is well understood that the minimum number of check bits required for single bit error correction is specified by the relationship: D + P P

It is well understood that the minimum number of check bits required for single bit error correction is specified by the relationship: D + P P October 2012 Reference Design RD1025 Introduction This reference design implements an Error Correction Code (ECC) module for the LatticeEC and LatticeSC FPGA families that can be applied to increase memory

More information

A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder

A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder Proceedings of the 5th WSEAS Int. Conf. on Instrumentation, Measurement, Circuits and Systems, angzhou, China, April 6-8, 26 (pp28-223) A Memory Efficient FPGA Implementation of Quasi-Cyclic DPC Decoder

More information

A High-Throughput FPGA Implementation of Quasi-Cyclic LDPC Decoder

A High-Throughput FPGA Implementation of Quasi-Cyclic LDPC Decoder 40 IJCSNS International Journal of Computer Science and Network Security, VOL.7 No.3, March 07 A High-Throughput FPGA Implementation of Quasi-Cyclic LDPC Decoder Hossein Gharaee, Mahdie Kiaee and Naser

More information

FAULT TOLERANT SYSTEMS

FAULT TOLERANT SYSTEMS FAULT TOLERANT SYSTEMS http://www.ecs.umass.edu/ece/koren/faulttolerantsystems Part 6 Coding I Chapter 3 Information Redundancy Part.6.1 Information Redundancy - Coding A data word with d bits is encoded

More information

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES

PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES PROGRAMMABLE MODULES SPECIFICATION OF PROGRAMMABLE COMBINATIONAL AND SEQUENTIAL MODULES. psa. rom. fpga THE WAY THE MODULES ARE PROGRAMMED NETWORKS OF PROGRAMMABLE MODULES EXAMPLES OF USES Programmable

More information

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers

IEEE-754 compliant Algorithms for Fast Multiplication of Double Precision Floating Point Numbers International Journal of Research in Computer Science ISSN 2249-8257 Volume 1 Issue 1 (2011) pp. 1-7 White Globe Publications www.ijorcs.org IEEE-754 compliant Algorithms for Fast Multiplication of Double

More information

IMPLEMENTATION OF DISTRIBUTED CANNY EDGE DETECTOR ON FPGA

IMPLEMENTATION OF DISTRIBUTED CANNY EDGE DETECTOR ON FPGA IMPLEMENTATION OF DISTRIBUTED CANNY EDGE DETECTOR ON FPGA T. Rupalatha 1, Mr.C.Leelamohan 2, Mrs.M.Sreelakshmi 3 P.G. Student, Department of ECE, C R Engineering College, Tirupati, India 1 Associate Professor,

More information

Chapter 6. CMOS Functional Cells

Chapter 6. CMOS Functional Cells Chapter 6 CMOS Functional Cells In the previous chapter we discussed methods of designing layout of logic gates and building blocks like transmission gates, multiplexers and tri-state inverters. In this

More information

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices

Basic FPGA Architectures. Actel FPGAs. PLD Technologies: Antifuse. 3 Digital Systems Implementation Programmable Logic Devices 3 Digital Systems Implementation Programmable Logic Devices Basic FPGA Architectures Why Programmable Logic Devices (PLDs)? Low cost, low risk way of implementing digital circuits as application specific

More information

FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes

FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes FPGA Implementation of Double Error Correction Orthogonal Latin Squares Codes E. Jebamalar Leavline Assistant Professor, Department of ECE, Anna University, BIT Campus, Tiruchirappalli, India Email: jebilee@gmail.com

More information