Maximizing the Throughput-Area Efficiency of Fully-Parallel Low-Density Parity-Check Decoding with C-Slow Retiming and Asynchronous Deep Pipelining


Ming Su, Lili Zhou, Student Member, IEEE, and C.-J. Richard Shi, Fellow, IEEE
Department of Electrical Engineering, University of Washington
{mingsu,

Abstract

In this paper, we apply C-slow retiming and asynchronous deep pipelining to maximize the throughput-area efficiency of fully parallel low-density parity-check (LDPC) decoding. Pipelined decoders are implemented in a 0.18 µm FDSOI CMOS process. Experimental results show that our pipelining technique is an efficient approach to maximizing LDPC decoding throughput while minimizing area. First, pipelined decoders can achieve extremely high throughput that non-pipelined designs cannot reach. Second, for the same throughput, pipelined decoders use less area than non-pipelined designs. Our approach can improve the throughput of a published implementation by 4 times with only about 8% area overhead. Because they use no clocks, the proposed asynchronous pipelined decoders are more scalable in design complexity and more robust to process-voltage-temperature variations than existing clock-based LDPC decoders.

1. Introduction

Since their rediscovery, LDPC codes have attracted growing attention due to their near-Shannon-limit error-correction capability [1], [2]. LDPC decoding performance has facilitated the advancement of a variety of applications such as next-generation mobile communication, digital video broadcasting and long-haul optical communication systems [3] [4] [5]. The rapid development of very-large-scale integrated circuit (VLSI) technology has made high-throughput hardware decoding of LDPC codes possible. Several hardware implementations using fully parallel, partially parallel, and serial architectures have been published for a wide range of applications [6] [7] [8] [9] [10].
As an important performance measure of a decoder, throughput is defined as the amount of information decoded per second. The maximal throughput is achieved by fully parallel implementations. However, even state-of-the-art fully parallel architectures cannot meet the throughput requirements of high-speed applications such as next-generation communication [4] [5] and military and space systems [11] [12]. Furthermore, fully parallel implementations that target maximal throughput present several well-described design challenges, including large silicon area, congested interconnect, and hard-to-control clock skews. These challenges grow rapidly with the LDPC code length, and thus jeopardize the efficiency and performance of fully parallel decoders in power, area and design effort [6].

In this paper, we propose to exploit C-slow retiming and asynchronous deep pipelining to achieve the maximal possible throughput of LDPC decoding. Deep pipelining is a well-known technique for increasing throughput. It is implemented in this paper using C-slow retiming, which exploits the iterative nature of LDPC decoding. Because it uses no clock, our design is more robust and less sensitive to process-voltage-temperature variation. To the best of our knowledge, this is the first reported asynchronous implementation of LDPC decoders.

This paper is organized as follows. Section 2 reviews LDPC codes and the fully parallel decoder architecture. Section 3 introduces C-slow retiming. The asynchronous pipelined decoder design is described in Section 4. Section 5 presents the implementation results. Section 6 concludes the paper.

2. LDPC code and fully-parallel Architecture

2.1 LDPC Codes and Decoding Algorithm

An LDPC code is represented by a sparse parity-check matrix H, whose elements are binary numbers. Let c denote a binary vector. If c satisfies

$$H \cdot c = 0 \qquad (1)$$

then c is said to be a valid codeword of H. Here · denotes matrix-vector multiplication in which multiplication and addition are the AND and XOR operations, respectively. Each row of H defines a parity check, and the row elements indicate which elements of c are included in that parity check. Fig. 1(a) shows the H matrix of an LDPC code and the parity equations defined by each row. Fig. 1(b) depicts the corresponding, widely used Tanner graph representation of this LDPC code [13]. A Tanner graph is a bipartite graph and consists of two sets of nodes. One set, the variable nodes, maps to the columns of the H matrix (and to the elements of vector c), while the other set, the check nodes, maps to the rows of the H matrix. An edge between a variable node and a check node maps to a 1 element in the H matrix. These mappings are illustrated by dashed lines. For example, an element $H_{i,j}$ being one and the edge between check node $C_i$ and variable node $V_j$ both indicate that $V_j$ is involved in the parity check defined by $C_i$. In the Tanner graph, variable nodes and check nodes represent hardware units that store intermediate node information and perform algorithmic computations, such as the parity check in check nodes.

Fig. 1. An example LDPC code. (a) H matrix and (b) Tanner graph.

Fig. 2. Message communication with LDPC codes.

LDPC codes are widely used in applications where information has to be transmitted through noisy communication channels, which add noise to the message, e.g., flip certain bits of the message. The message is encoded before the channel and decoded after it. In the decoding process, the channel-distorted codeword is corrected. Fig. 2 shows such a process.

One classic and most popular algorithm for decoding LDPC codes is the iterative soft-decision message-passing algorithm, also known as the Belief Propagation (BP) algorithm [14].
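As a minimal sketch of the parity-check condition in Eq. (1), the following Python fragment evaluates H·c with AND as multiplication and XOR as addition. The 3×6 matrix `H_EXAMPLE` and the helper names are hypothetical illustrations and are not the code of Fig. 1.

```python
from typing import List

# Hypothetical 3x6 parity-check matrix, used only for illustration.
H_EXAMPLE = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def syndrome(H: List[List[int]], c: List[int]) -> List[int]:
    """Return H.c over GF(2): AND for multiplication, XOR for addition."""
    result = []
    for row in H:
        parity = 0
        for h, bit in zip(row, c):
            parity ^= h & bit
        result.append(parity)
    return result


def is_codeword(H: List[List[int]], c: List[int]) -> bool:
    """c is a valid codeword when every parity check evaluates to zero."""
    return not any(syndrome(H, c))


def tanner_edges(H: List[List[int]]) -> List[tuple]:
    """List (check node, variable node) pairs, one per 1 element of H."""
    return [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]
```

Each pair returned by `tanner_edges` corresponds to one edge of the Tanner graph, so a fully parallel decoder would need one wire bundle per pair.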
The algorithm can be described using the Tanner graph. The vector to be decoded (often called the input message) is the log-likelihood ratio (LLR) representation of the message received from the communication channel. The LLR representation of a received bit is defined as

$$\lambda = \ln\left[\frac{P(x = 0 \mid y)}{P(x = 1 \mid y)}\right] \qquad (2)$$

where λ is usually quantized to an n-bit binary number whose first bit represents the sign and whose remaining bits represent the magnitude, x is the message bit transmitted through the channel, and y is the bit received by the decoder, possibly distorted. Intermediate messages are also in LLR form and are passed either from a variable node to a check node (variable message) or vice versa (check message). Input messages are initially stored in the variable nodes and are replaced by subsequent intermediate messages. The main steps of the algorithm are as follows [6]:

1) Initialize each variable node and its outgoing variable message in LLR form.

2) Pass the variable messages from the variable nodes to the check nodes along the edges of the graph.

3) At the check nodes, update all the LLRs. First perform a parity check on the sign bits of the incoming variable messages to form the row parity check. Then form the sign of each outgoing check message as the XOR of the sign of the incoming variable message corresponding to each edge and the row parity check result. Update the LLR magnitude by computing the intermediate row parity reliability, defined as

$$\lambda_i = \prod_{j:\, h_{i,j}=1} \tanh(\lambda_{i,j}/2), \quad i \in I,\ j \in J \qquad (3)$$

where I and J are the index sets of check nodes and variable nodes, respectively, and $h_{i,j} = 1$ indicates that the variable messages involved come from the variable nodes j incident with check node i. This computation can be simplified in the logarithmic domain, where multiplication and division are replaced by addition and subtraction. Eq. (3) becomes

$$\ln \lambda_i = \sum_{j:\, h_{i,j}=1} \ln\left[\tanh(\lambda_{i,j}/2)\right] \qquad (4)$$

Based on the row parity reliability, all outgoing check message reliabilities are computed as

$$\lambda^{*}_{i,j} = 2\,\mathrm{atanh}\Big(\prod_{j':\, h_{i,j'}=1,\ j' \neq j} \tanh(\lambda_{i,j'}/2)\Big) = 2\,\mathrm{atanh}\Big(\exp\big\{\ln(\lambda_i) - \ln\tanh(\lambda_{i,j}/2)\big\}\Big) \qquad (5)$$

where $\lambda^{*}_{i,j}$ is the reliability of the check message from check node i to variable node j.

4) Pass the check messages (the LLRs updated in Step 3) from the check nodes back to the variable nodes along the edges of the graph.

5) At the variable nodes, update the LLRs. Sum the input message (the LLR of the received bit) and all of the incoming check messages (updated LLRs). The decoded bit is taken to be the sign of this sum. Each outgoing variable message for the next decoding iteration is then formed by summing all the incoming check messages except the one from the destination check node of this outgoing variable message.

6) Repeat Steps 2 through 5 until a termination condition is met, for example when the current messages passed to the check nodes satisfy all of the parity checks.

2.2 Fully Parallel Decoder

The BP algorithm, described using the Tanner graph, maps naturally to a fully parallel decoder architecture, which is implemented for a 1024-bit, rate-1/2 LDPC code in [6] and [7]. The corresponding graph has 1024 variable nodes and 512 check nodes. All variable nodes have the same functionality, so the variable node unit (VNU) is designed only once and reused for every instance of a variable node. The same holds for the check node unit (CNU). The input message is in 4-bit binary sign-magnitude notation and is converted to two's complement, in which arithmetic operations can be performed more efficiently.
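Steps 1 through 6 of the decoding algorithm can be sketched in floating-point Python as follows. This is an illustration under stated assumptions, not the fixed-point, table-lookup hardware described in the text: it uses the product form of Eq. (5) directly, it clips the product so that `atanh` stays finite, and it follows the common form of the variable update in which the channel LLR is included in each outgoing variable message. The matrix `H_SMALL` and the function name `bp_decode` are hypothetical.

```python
import math
from typing import List

# Hypothetical 3x6 parity-check matrix, used only for illustration.
H_SMALL = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def bp_decode(H: List[List[int]], llr_in: List[float], max_iter: int = 20) -> List[int]:
    """Sum-product decoding; a positive LLR means bit 0, as in Eq. (2)."""
    m, n = len(H), len(H[0])
    edges = [(i, j) for i in range(m) for j in range(n) if H[i][j]]
    # Step 1: every outgoing variable message starts as the channel LLR.
    var_msg = {edge: llr_in[edge[1]] for edge in edges}
    decoded = [0 if value >= 0 else 1 for value in llr_in]
    for _ in range(max_iter):
        # Steps 2-3: check node update, product form of Eq. (5).
        chk_msg = {}
        for (i, j) in edges:
            prod = 1.0
            for k in range(n):
                if H[i][k] and k != j:
                    prod *= math.tanh(var_msg[(i, k)] / 2.0)
            prod = max(min(prod, 0.999999), -0.999999)  # keep atanh finite
            chk_msg[(i, j)] = 2.0 * math.atanh(prod)
        # Steps 4-5: variable node update and hard decision on the sign.
        total = [
            llr_in[j] + sum(chk_msg[(i, j)] for i in range(m) if H[i][j])
            for j in range(n)
        ]
        decoded = [0 if t >= 0 else 1 for t in total]
        for (i, j) in edges:
            # Exclude the message that came from the destination check node.
            var_msg[(i, j)] = total[j] - chk_msg[(i, j)]
        # Step 6: stop as soon as every parity check is satisfied.
        if all(sum(H[i][j] & decoded[j] for j in range(n)) % 2 == 0 for i in range(m)):
            break
    return decoded
```

For example, with the codeword 1 0 1 1 1 0 sent and the first LLR received weakly with the wrong sign, `bp_decode(H_SMALL, [0.5, 2.0, -2.0, -2.0, -2.0, 2.0])` recovers the transmitted bits after one iteration. The hardware decoder in the text instead uses a fixed iteration count, which is what makes the C-slow schedule of Section 3 possible.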
In addition, all the logarithm and hyperbolic functions are implemented using a table-lookup method. The input number is used to index the table, and the approximated function value is retrieved.

Fig. 3. Three-stage pipelined fully parallel LDPC decoder.

The clock period of the circuit is determined by the critical path delay given in Eq. (6), which is the sum of the variable node delay $T_{VN}$, the flip-flop setup time $T_{SU}$, the clock-to-q time $T_{CKQ}$, the check node delay $T_{CN}$ and the clock skew $T_{skew}$. $T_{VN}$ and $T_{CN}$ are the dominant terms because both the variable node and the check node contain many levels of logic.

$$T = T_{VN} + T_{SU} + T_{CKQ} + T_{CN} + T_{skew} \qquad (6)$$

All instances of variable nodes and check nodes are placed and routed according to the edges in the graph. The 1024 variable nodes are partitioned into 16 groups of 64 nodes, which work in parallel. Each group is three-stage fully pipelined as shown in Fig. 3. The first stage takes 64 cycles to shift in 64 input messages. The second stage takes 64 iterations to decode the message. The last stage also takes 64 cycles to shift out 64 decoded bits. Since each of the three stages takes 64 cycles, they can operate in parallel. The throughput is defined as the number of messages decoded per second and can be expressed analytically as

$$\text{Throughput} = \frac{\#group}{\#iteration \times T} \qquad (7)$$

where #group denotes the number of variable node groups, #iteration denotes the number of iterations and T is the clock period.

3. C-slow retiming

A fully parallel decoder can be highly pipelined. However, the clock cycle will not decrease because of the feedback loops in the LDPC decoder. Proposed by Leiserson et al. [15], C-slow retiming is an approach to accelerating computations that include feedback loops. Fig. 4 illustrates how conventional pipelining and C-slow retiming are applied to a circuit containing a feedback loop [16]. The circuit is modeled by a directed graph with nodes representing logic units with delay and edges representing pipeline registers.

Fig. 4. An example of C-slow retiming.

After pipelining, the clock cycle of the circuit in Fig. 4(a) is reduced from 4 to 2 (Fig. 4(b)). However, if a feedback path is added as shown in Fig. 4(c), the feedback loop becomes the critical path. Correct functionality requires that every input meet its immediately preceding input at the first logic unit in the loop. An input can therefore only be scheduled after its immediately preceding input has propagated through the feedback path. Because of this requirement, simply inserting registers into the loop does not alter the critical path. In Fig. 4(d), we apply C-slow retiming, where each loop and I/O register is replaced by 2 consecutive registers. The pipeline can perform two independent computations by taking its input from two independent data streams alternately every clock cycle. This 2-slowed pipeline operates correctly because the input and the intermediate results contained in the first registers of the pairs belong to the same computation task, so a new input always meets the feedback of the same stream. Also, the input register need not wait for feedback before latching a new input, because the pipeline can be fed with input from another independent task. Further retiming can reduce the clock cycle to 2, as shown in Fig. 4(e).

BP decoding is well suited to C-slow retiming. First, input messages are decoded independently of each other, so each message can be viewed as an independent task. Second, a fixed iteration count can be used.

Assume that the iteration count is M. We pipeline each VNU and CNU into C/2 stages. Instead of using a great number of registers to buffer the input, we use only one register and schedule the inputs appropriately. If M is divisible by C, we schedule an input every M/C iterations, which is equivalent to M clock cycles. Initially the C-stage pipeline is empty. Assume that a time $T_{load}$ is needed to load the first C sets of input messages. The interval between successive sets of messages being loaded into the pipeline is $T_{interval}$. After the first C sets of messages are loaded, the pipeline is fully occupied. Therefore, the (C+1)th set of messages can enter the pipeline only when the first message finishes decoding and exits the pipeline. Assume that the completion time of the first message (which is also the time at which the (C+1)th message enters the pipeline) is $T_{1st}$. Then, in clock cycles, we have

$$T_{load} = (C - 1) \cdot M \qquad (8)$$

$$T_{1st} = T_{load} + M \qquad (9)$$

$$T_{int}(N, N+1) = \frac{M}{C} \cdot C = M \qquad (10)$$

The throughput becomes one message per M/C iterations, while the un-pipelined counterpart uses M iterations to decode one message. Fig. 5 shows how this scheduling works on a 4-slowed simplified decoder. The circuit is pipelined into 4 stages and the iteration count is 4. Each iteration takes 4 clock cycles, so an input is scheduled every iteration, which is equivalent to 4 clock cycles.

After pipelining, the clock cycle becomes

$$T' = T_{logic} + T_{SU} + T_{CKQ} + T_{skew} \qquad (11)$$

where $T_{logic}$ denotes the logic delay between two pipeline registers, which is approximately the original combinational path delay divided by the pipeline depth. $T_{SU}$, $T_{CKQ}$ and $T_{skew}$ are constant factors, the same as before pipelining. They are considered pipeline overhead because they do not scale with pipeline depth. The optimal pipeline depth is always determined by a trade-off among speed, area and power consumption [17]. We present further analysis of this trade-off, and of how it guides our design, in Section 5.

4. Pipelining LDPCs

4.1 Asynchronous Micropipeline

A pipeline can be implemented synchronously or asynchronously. In synchronous pipelines, combinational logic is placed between clocked registers, and data are sequenced by one or more globally distributed clocks. Outputs from the combinational logic are latched into registers at the same clock edge. As an example, Fig. 6(a) depicts a synchronous pipeline, where R denotes a register and CL denotes combinational logic.
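As a numerical check of the C-slow input schedule of Section 3 (Eqs. (8) to (10)) and of the pipelined clock period of Eq. (11), the following Python sketch simply evaluates those relations. It takes the equations at face value and does not simulate stage occupancy. The function names and the delay values used in the usage note are hypothetical.

```python
def cslow_timing(C: int, M: int) -> dict:
    """Schedule figures in clock cycles of the pipelined design.

    C is the number of pipeline stages in the loop (the C-slow factor) and
    M is the fixed iteration count. The sketch assumes M is divisible by C.
    """
    if C <= 0 or M <= 0 or M % C != 0:
        raise ValueError("this sketch assumes positive C and M with M divisible by C")
    t_interval = (M // C) * C       # Eq. (10): M/C iterations of C cycles each
    t_load = (C - 1) * t_interval   # Eq. (8): time to load the first C inputs
    t_first = t_load + t_interval   # Eq. (9): first message done, (C+1)th enters
    return {"t_interval": t_interval, "t_load": t_load, "t_first": t_first}


def pipelined_period(t_comb: float, depth: int,
                     t_su: float, t_ckq: float, t_skew: float) -> float:
    """Eq. (11), with T_logic approximated as t_comb / depth."""
    if depth <= 0:
        raise ValueError("pipeline depth must be positive")
    return t_comb / depth + t_su + t_ckq + t_skew
```

For the 4-stage, 4-iteration example of Fig. 5, `cslow_timing(4, 4)` gives an input every 4 cycles and a first completion after 16 cycles, which equals M iterations of C cycles each. With hypothetical delays, `pipelined_period(8.0, 4, 0.25, 0.25, 0.5)` returns 3.0, showing that the constant overhead terms keep the period above the ideal 8.0 / 4 = 2.0.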
Asynchronous pipelines have a similar structure; however, instead of being synchronized at the same global clock edge, data transfer is localized at each pipeline stage in a handshake manner [18], [19]. The micropipeline shown in Fig. 6(b) is a widely used asynchronous pipeline style [20]. There are two control signals, namely requests and acknowledgements. Request signals travel forward in the pipeline, indicating whether the data in the current stage is ready to be latched by the subsequent stage. Acknowledgement signals travel backward, indicating whether data have been consumed by the subsequent stage. Stage outputs are transferred in bundles with the request signal, which usually passes through matched delay elements (the oval labeled delay). The matched delay must be greater than the critical path delay of the corresponding combinational logic to ensure that correct computation results are latched by the registers.

The clock input for each pipeline register is generated by synchronizing the request and acknowledgement signals using a C-element. The transistor-level circuit and truth table of a 2-input C-element are shown in Fig. 7. When both request and acknowledgement are high, indicating that a new input is ready to be latched by the register, the clock signal goes high, enabling the latching of the input. The clock signal remains high until both signals become low. Before both signals rise, the clock remains low.

Fig. 6(c) shows the timing diagram of the transition signaling protocol. In transition signaling, each transition of the C-element output, i.e. the clock input of the register, can trigger the register to latch the input data. First, the request Rin goes high to indicate that new data is available at the stage's input. Assume that Aout is low (the stage is empty), so the input data can be registered. Then the stage raises Ain, acknowledging to the previous stage that it no longer needs the input. After some delay, Rout goes high, indicating to the subsequent stage that new input data is available. Some time after Rout rises, the subsequent stage raises Aout to indicate that it has consumed (i.e., registered) the output data. The previous stage can then lower Rin, indicating that the next data is available, and another cycle starts.

Fig. 5. Input schedule for a 4-stage 4-iteration 4-slowed LDPC decoder. (a), (b) Before and after retiming; (c) Box represents a pipeline stage, the number represents the index of the input message.

Fig. 6. (a) Synchronous pipeline; (b) transition signaling micropipeline; and (c) its timing diagram.

4.2 Microarchitecture Design

Fig. 8 depicts the architecture of the asynchronously pipelined fully parallel decoder with C-slow retiming. The main part consists of the pipelined variable nodes unit (all VNUs), the pipelined check nodes unit (all CNUs), and the forward and feedback paths between them. In Step 5 of the decoding algorithm, the original input messages are required in the variable node computation. In the original implementation, the input message is stored in a register in the variable node. After C-slow retiming, we also need C copies of this register in order to accommodate input messages from multiple input streams. These registers are shown separately from the variable nodes, as message extension registers, in Fig. 8, but in the actual design they are placed locally in the variable nodes. The scheduling control unit is responsible for generating the control signals according to the scheduling scheme.

Fig. 9 shows the logic of a VNU-CNU path with request and acknowledgement signals. REG stores the variable message. Its input is multiplexed between the input message (decoding algorithm Step 1) and the subsequently updated variable node computation result (decoding algorithm Step 5). The input MUX/DEMUX takes input data and the request from one of two sources and steers the acknowledgement back to that source. The two input sources of a variable node are the input message and the temporary check variable. The counter counts the number of transitions of the request signal to generate the select signal according to the C-slow retiming input schedule.

Fig. 7. C-element. (a) Transistor-level circuit. (b) Truth table.

Fig. 8. Top-level decoder architecture.

Fig. 9. VNU-CNU logic with handshake signals.

5. Implementation results

The proposed techniques require only minor modifications to the first and third stages of the entire decoding system. We have implemented the decoding stage in a 0.18 µm 1.8 V FDSOI CMOS process. The un-pipelined decoder is first implemented in Verilog HDL and then synthesized under various clock cycle constraints. On each synthesized un-pipelined design, C-slow retiming with different pipeline depths is employed. The resulting synchronous pipelines are then transformed into asynchronous ones. Retiming is performed using Synopsys Design Compiler's pipeline_retime command, which requires the original un-pipelined design, the pipeline depth and the target clock period. Since the target clock period is required beforehand, we use the initial un-pipelined design's clock cycle divided by the pipeline depth as the clock period constraint of the pipelined design. We omit the pipeline overhead in Eq. (11). This causes Design Compiler to insert buffers into the logic to achieve the target clock.

The performance comparison of pipelined and non-pipelined designs boils down to comparing clock periods. Note also that the area comparison does not include the area overhead of asynchronous pipelining, i.e. the area of the C-elements and matched delay elements. For each CNU and VNU, only one C-element and one matched delay element are shared by the entire pipeline stage, making this area overhead negligible (only about 3.8% of CNU and VNU).

Fig. 10 compares the pre-layout areas of both pipelined and non-pipelined designs. The x-axis is the clock period constraint and the y-axis is the area. First, we can see that only pipelined designs (indicated by colored markers) can achieve a clock period of 3 ns. Second, at longer periods (4 and 5 ns), pipelined designs consume less area than the non-pipelined design. The synchronous design in [7] is implemented in the same process and achieves 2 Gbit/s throughput, which is equivalent to a clock period of 8 ns. Our approach can improve its throughput by 4 times with only 8% area overhead (indicated by the circle marker at 2 ns).

To find the most efficient design in terms of speed and area, we define an absolute cost function (ACF) as

$$cost_{abs} = area \times clock\_period \qquad (12)$$

Fig. 11 plots this cost function with respect to the initial clock period before pipelining and the pipeline depth. From Fig. 11, we make the following observations. First, the minimal cost is attained when the initial clock period is 8 and the pipeline depth is 6. Second, designs with the same period before pipelining have similar cost values. Third, a design with a longer period has a smaller area. However, the overall cost of a design with a long period is even higher, because the increase in period outweighs the decrease in area.
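The equivalence quoted above between a 2 Gbit/s throughput and an 8 ns clock period, and the 4 times improvement at 2 ns, can be checked with Eq. (7). The sketch below assumes the organization described in Section 2.2 (16 groups of 64 variable nodes and 64 iterations per message) and multiplies Eq. (7) by the 64 bits carried per group to express the result in bits per second. That conversion factor and the function name are assumptions made for this illustration.

```python
def throughput_bits_per_s(num_groups: int, bits_per_group: int,
                          iterations: int, clock_period_s: float) -> float:
    """Eq. (7) scaled by the bits carried per group (an assumed conversion)."""
    if iterations <= 0 or clock_period_s <= 0:
        raise ValueError("iterations and clock period must be positive")
    return num_groups * bits_per_group / (iterations * clock_period_s)


# Organization taken from Section 2.2: 16 groups x 64 nodes, 64 iterations.
baseline = throughput_bits_per_s(16, 64, 64, 8e-9)   # 8 ns clock period
pipelined = throughput_bits_per_s(16, 64, 64, 2e-9)  # 2 ns clock period
speedup = pipelined / baseline
```

Under these assumptions `baseline` evaluates to about 2e9 bit/s and `speedup` to about 4, which matches the figures stated in the text.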
Now we consider how to derive a pipelined design, including buffer insertion, from a given non-pipelined design so as to achieve the maximal performance improvement with the least area overhead. For this purpose, we define a relative cost function (RCF) as

$$cost_{relative} = \frac{area\_overhead}{delay\_improvement\_factor} \qquad (13)$$

where

$$area\_overhead = \frac{\text{area of pipelined design}}{\text{area of starting unpipelined design}} \qquad (14)$$

$$delay\_improvement\_factor = \frac{\text{clock cycle of unpipelined design}}{\text{clock cycle of pipelined design}} = \text{pipeline depth} \qquad (15)$$

Fig. 10. Area comparison (non-pipelined vs pipelined).

Fig. 11. 3D surface plot of absolute cost function.

Fig. 12. 3D surface plot of relative cost function.

This cost function is plotted in Fig. 12. The shape of the RCF is completely different from that of the ACF. When the clock period constraint of the non-pipelined design becomes tighter, the area overhead outweighs the speed improvement. This is because, when the pipeline overhead becomes large compared to $T_{logic}$, the synthesis tool inserts a great number of buffers to reduce the logic delay and compensate for the overhead.
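The two cost functions of Eqs. (12) to (15) can be sketched in Python as follows. The `Design` record and all numeric values in the usage note are hypothetical and are not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class Design:
    area: float          # any consistent area unit
    clock_period: float  # any consistent time unit, e.g. ns


def absolute_cost(design: Design) -> float:
    """Eq. (12): area times clock period; lower is better."""
    return design.area * design.clock_period


def relative_cost(pipelined: Design, unpipelined: Design) -> float:
    """Eq. (13): area overhead divided by the delay improvement factor."""
    area_overhead = pipelined.area / unpipelined.area                      # Eq. (14)
    delay_improvement = unpipelined.clock_period / pipelined.clock_period  # Eq. (15)
    return area_overhead / delay_improvement
```

With a hypothetical starting design `Design(100.0, 8.0)` and a pipelined design `Design(150.0, 2.0)`, the absolute cost falls from 800.0 to 300.0 and the relative cost is 1.5 / 4.0 = 0.375. A relative cost below 1 means the speed gain exceeds the area growth.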

6. Conclusion

In this paper, we applied pipelining techniques to maximize the throughput of LDPC decoding. C-slow retiming is used to efficiently pipeline the feedback loops of the iterative decoding. Experimental results show that our pipelining technique is an efficient approach to maximizing LDPC decoding throughput while minimizing area. First, pipelined decoders can achieve extremely high throughput that non-pipelined designs cannot reach. Second, for the same throughput, pipelined decoders consume less area than non-pipelined designs. Third, our approach can improve the throughput of a published implementation by 4 times with only about 8% area overhead. In addition, with the use of asynchronous pipelines, we can mitigate several well-known design challenges, including large silicon area, congested interconnect, and hard-to-control clock skews. This is especially attractive for implementing high-throughput digital functionality in three-dimensional integrated circuits, where the process variations across tiers of devices are so high that clocking across tiers is difficult.

7. Acknowledgement

The authors thank Professor Carl Ebeling of the Department of Computer Science and Engineering at the University of Washington for his suggestions on C-slow retiming. This research was supported by the US Defense Advanced Research Projects Agency (DARPA) under Grant Number N , monitored by Navy SPAWAR Systems Center, San Diego, USA.

8. References

[1] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," IEE Electronics Letters, vol. 33, no. 6, March 1997.
[2] M. Chiani, A. Conti, and A. Ventura, "Evaluation of low-density parity-check codes over block fading channels," Proceedings of IEEE International Conference on Communications, June 2000, vol. 3.
[3] IEEE Draft P802.3an/D, IEEE Standard for Information technology -- Telecommunications and information exchange between systems -- Local and metropolitan area networks -- Specific requirements.
[4] I. B. Djordjevic and B. Vasic, "100-Gb/s transmission using orthogonal frequency-division multiplexing," IEEE Photonics Technology Letters, vol. 18, no. 15, Aug. 2006.
[5] I. B. Djordjevic, O. Milenkovic, and B. Vasic, "Generalized low-density parity-check codes for optical communication systems," Journal of Lightwave Technology, vol. 23, no. 5, May 2005.
[6] A. J. Blanksby and C. J. Howland, "A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder," IEEE Journal of Solid-State Circuits, vol. 37, no. 3, March 2002.
[7] L. Zhou, C. Wakayama, N. Jankrajarng, B. Hu and C.-J. R. Shi, "A high-throughput low-power fully parallel 1024-bit 1/2-rate low density parity check code decoder in 3D integrated circuits," in Proc. Asia and South Pacific Design Automation Conf., Jan. 2006.
[8] E. Yeo, P. Pakzad, B. Nikolić, and V. Anantharam, "High throughput low-density parity-check decoder architectures," Proceedings of IEEE Global Telecommunications Conference, Nov. 2001, vol. 5.
[9] M. M. Mansour and N. R. Shanbhag, "A 640-Mb/s 2048-bit programmable LDPC decoder chip," IEEE Journal of Solid-State Circuits, vol. 41, no. 3, March 2006.
[10] L. H. Miles, J. W. Gambles, G. K. Maki, W. E. Ryan and S. R. Whitaker, "An 860-Mb/s (8158, 7136) low-density parity-check encoder," IEEE Journal of Solid-State Circuits, vol. 41, no. 8, Aug. 2006.
[11] [12] enhrchannelcodingschemes%2.htm
[13] R. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory, vol. 27, no. 9, Sep. 1981.
[14] R. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. 7, pp. 21-28, Jan. 1962.
[15] C. Leiserson, F. Rose, and J. Saxe, "Optimizing synchronous circuitry by retiming," Proceedings of the 3rd Caltech Conference on VLSI, pp. 87-116, March 1983.
[16] N. Weaver, Y. Markovskiy, Y. Patel and J. Wawrzynek, "Post placement C-slow retiming for the Xilinx Virtex FPGA," Proceedings of the 11th ACM Symposium on Field Programmable Gate Arrays, Feb. 2003.
[17] M. S. Hrishikesh et al., "The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays," Proc. of the 29th Annual International Symposium on Computer Architecture, pp. 14-24, May 2002.
[18] V. Berkel, M. B. Josephs and S. M. Nowick, "Applications of asynchronous circuits," Proceedings of the IEEE, vol. 87, no. 2, Feb. 1999.
[19] M. Singh and S. M. Nowick, "High-throughput asynchronous pipelines for fine-grain dynamic datapaths," Proceedings of the 6th International Symposium on Advanced Research in Asynchronous Circuits and Systems, April 2000.
[20] I. E. Sutherland, "Micropipelines," Communications of the ACM, vol. 32, no. 6, June 1989.


Introduction to Asynchronous Circuits and Systems RCIM Presentation Introduction to Asynchronous Circuits and Systems Kristofer Perta April 02 / 2004 University of Windsor Computer and Electrical Engineering Dept. Presentation Outline Section - Introduction

More information

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset

A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset A Novel Pseudo 4 Phase Dual Rail Asynchronous Protocol with Self Reset Logic & Multiple Reset M.Santhi, Arun Kumar S, G S Praveen Kalish, Siddharth Sarangan, G Lakshminarayanan Dept of ECE, National Institute

More information

Partially-Parallel LDPC Decoder Achieving High-Efficiency Message-Passing Schedule

Partially-Parallel LDPC Decoder Achieving High-Efficiency Message-Passing Schedule IEICE TRANS. FUNDAMENTALS, VOL.E89 A, NO.4 APRIL 2006 969 PAPER Special Section on Selected Papers from the 18th Workshop on Circuits and Systems in Karuizawa Partially-Parallel LDPC Decoder Achieving

More information

HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm

HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm I J C T A, 9(20), 2016, pp. 75-80 International Science Press HDL Implementation of an Efficient Partial Parallel LDPC Decoder Using Soft Bit Flip Algorithm Sandeep Kakde* and Atish Khobragade** ABSTRACT

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

RECENTLY, low-density parity-check (LDPC) codes have

RECENTLY, low-density parity-check (LDPC) codes have 892 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 53, NO. 4, APRIL 2006 Code Construction and FPGA Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code Decoder

More information

Capacity-approaching Codes for Solid State Storages

Capacity-approaching Codes for Solid State Storages Capacity-approaching Codes for Solid State Storages Jeongseok Ha, Department of Electrical Engineering Korea Advanced Institute of Science and Technology (KAIST) Contents Capacity-Approach Codes Turbo

More information
