Maximizing the Throughput-Area Efficiency of Fully-Parallel Low-Density Parity-Check Decoding with C-Slow Retiming and Asynchronous Deep Pipelining
Ming Su, Lili Zhou, Student Member, IEEE, and C.-J. Richard Shi, Fellow, IEEE
Department of Electrical Engineering, University of Washington
{mingsu,

Abstract

In this paper, we apply C-slow retiming and asynchronous deep pipelining to maximize the throughput-area efficiency of fully parallel low-density parity-check (LDPC) decoding. Pipelined decoders are implemented in a .8 µm FDSOI CMOS process. Experimental results show that our pipelining technique is an efficient approach to maximizing LDPC decoding throughput while minimizing area. First, pipelined decoders can achieve throughputs that non-pipelined designs cannot reach. Second, for the same throughput, pipelined decoders use less area than non-pipelined designs. Our approach can improve the throughput of a published implementation by 4 times with only about 8% area overhead. Because they use no clocks, the proposed asynchronous pipelined decoders scale better in design complexity and are more robust to process-voltage-temperature variations than existing clock-based LDPC decoders.

1. Introduction

Since their rediscovery, LDPC codes have attracted growing attention due to their near Shannon-limit error correction capability [1], [2]. LDPC decoding performance has facilitated the advancement of a variety of applications such as next-generation mobile communication, digital video broadcasting and long-haul optical communication systems [3] [4] [5]. The rapid development of very large-scale integrated circuit (VLSI) technology has made high-throughput hardware decoding of LDPC codes possible. Several hardware implementations using fully parallel, partially parallel, and serial architectures have been published for a wide range of applications [6] [7] [8] [9] [10].
As an important performance measure of a decoder, throughput is defined as the amount of information decoded per second. The maximal throughput is achieved by fully parallel implementations. However, even state-of-the-art fully parallel architectures cannot meet the throughput requirements of high-speed applications such as next-generation communication [4] [5] and military and space systems [11] [12]. Furthermore, fully parallel decoder implementations targeting maximal throughput present several well-described design challenges, including large silicon area, congested interconnect, and hard-to-control clock skews. These challenges grow rapidly with the LDPC code length, and thus jeopardize the efficiency and performance of fully parallel decoders in power, area consumption and design effort [6].

In this paper, we propose to exploit C-slow retiming and asynchronous deep pipelining to achieve the maximal possible throughput of LDPC decoding. Deep pipelining is a well-known technique for increasing throughput. It is implemented in this paper using C-slow retiming, exploiting the iterative nature of LDPC decoding. Because it uses no clock, our design is more robust and less sensitive to process-voltage-temperature variation. To the best of our knowledge, this is the first reported asynchronous implementation of LDPC decoders.

This paper is organized as follows. Section 2 reviews LDPC codes and the fully parallel decoder architecture. Section 3 introduces C-slow retiming. The asynchronous pipelined decoder design is described in Section 4. Section 5 presents the implementation results. Section 6 concludes the paper.

/7/$ IEEE

2. LDPC code and fully-parallel Architecture

2.1. LDPC Codes and Decoding Algorithm
An LDPC code is represented by a sparse parity-check matrix H whose elements are binary numbers. Let c denote a binary vector. If c satisfies

$$ H \cdot c = 0 \qquad (1) $$

then c is said to be a valid codeword of H. Here $\cdot$ denotes matrix-vector multiplication in which multiplication and addition are the AND and XOR operations, respectively. Each row of H defines a parity check, and the row elements indicate which elements of c are included in that parity check. Fig. 1(a) shows the H matrix of an LDPC code and the parity equations defined by each row. Fig. 1(b) depicts the corresponding, widely used Tanner graph representation of this LDPC code [13]. A Tanner graph is a bipartite graph consisting of two sets of nodes. One set is the variable nodes, which map to the columns of the H matrix (and to the elements of vector c), while the other set is the check nodes, which map to the rows of the H matrix. An edge between a variable node and a check node maps to a 1 element in the H matrix. These mappings are illustrated by dashed lines. For example, an element of H being one, together with the edge between the corresponding check node C and variable node V, indicates that V is involved in the parity check defined by C. In the Tanner graph, variable nodes and check nodes represent hardware units that store intermediate node information and perform algorithmic computations, such as the parity check in check nodes.

Fig. 1. An example LDPC code. (a) H matrix and (b) Tanner graph.

Fig. 2. Message communication LDPC codes.

LDPC codes are widely used in applications where information has to be transmitted through noisy communication channels, which add noise to the message, e.g., flip certain bits of the message. The message is encoded before the channel and decoded after it. In the decoding process, the channel-distorted codeword is corrected. Fig. 2 shows such a process.

One classic and very popular algorithm for decoding LDPC codes is the iterative soft-decision message-passing algorithm, also known as the Belief Propagation (BP) algorithm [14].
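The parity-check condition of Eq. (1) and the mapping from H to Tanner-graph edges can be illustrated with a minimal Python sketch. The 3x6 matrix below is a hypothetical example chosen for illustration; it is not the matrix of Fig. 1, and the function names are assumptions introduced here.

```python
# Hypothetical 3x6 parity-check matrix (NOT the matrix of Fig. 1).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def syndrome(H, c):
    """Compute H*c over GF(2): AND acts as multiplication, XOR as addition."""
    result = []
    for row in H:
        parity = 0
        for h, bit in zip(row, c):
            parity ^= (h & bit)
        result.append(parity)
    return result


def is_codeword(H, c):
    """c is a valid codeword when every parity check evaluates to 0."""
    return not any(syndrome(H, c))


def tanner_edges(H):
    """Return one (check_node, variable_node) pair per 1 element of H."""
    return [(i, j) for i, row in enumerate(H)
            for j, h in enumerate(row) if h == 1]


valid = [1, 1, 0, 0, 1, 1]       # satisfies all three checks
corrupted = [0, 1, 0, 0, 1, 1]   # same vector with bit 0 flipped

print(is_codeword(H, valid))
print(syndrome(H, corrupted))
print(tanner_edges(H))
```

Flipping a single bit violates exactly the checks whose rows contain a 1 in that column, which is the information the decoder exploits.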
The BP algorithm can be described using the Tanner graph. The vector to be decoded (often called the input message) is the log-likelihood ratio (LLR) representation of the message received from the communication channel. The LLR representation of a received bit is defined as

$$ \lambda = \ln\left[ \frac{P(x = 0 \mid y)}{P(x = 1 \mid y)} \right] \qquad (2) $$

where $\lambda$ is usually quantized to an n-bit binary number in which the first bit represents the sign and the remaining bits represent the magnitude, x is the message bit transmitted through the channel, and y is the possibly distorted bit received by the decoder. Intermediate messages are also in LLR form and are passed either from a variable node to a check node (variable message) or vice versa (check message). Input messages are initially stored in the variable nodes and replaced by subsequent intermediate messages. The main steps of the algorithm are as follows [6]:

1) Initialize each variable node and its outgoing variable message in LLR form.

2) Pass the variable messages from the variable nodes to the check nodes along the edges of the graph.

3) At the check nodes, update all the LLRs. First perform a parity check on the sign bits of the incoming variable messages to form the row parity check. Then form the sign of each outgoing check message as the XOR of the sign of the incoming variable message on that edge and the row parity check result. Update the LLR magnitude by computing the intermediate row parity reliability, defined as

$$ \lambda_i = \prod_{j:\, h_{i,j}=1} \tanh(\lambda_{i,j}/2), \quad i \in I,\ j \in J \qquad (3) $$

where I and J are the index sets of check nodes and variable nodes, respectively, and $h_{i,j} = 1$ indicates that the variable messages involved come from the variable nodes j incident with check node i. This computation can be simplified in the logarithmic domain, where multiplication and division are replaced by addition and subtraction. Eq. (3) becomes

$$ \ln \lambda_i = \sum_{j:\, h_{i,j}=1} \ln\left[\tanh(\lambda_{i,j}/2)\right] \qquad (4) $$

Based on the row parity reliability, all outgoing check message reliabilities are computed as

$$ \lambda^{*}_{i,j} = 2\,\mathrm{atanh}\left( \prod_{j':\, h_{i,j'}=1,\ j' \neq j} \tanh(\lambda_{i,j'}/2) \right) = 2\,\mathrm{atanh}\left( \exp\left\{ \ln(\lambda_i) - \ln\tanh(\lambda_{i,j}/2) \right\} \right) \qquad (5) $$

where $\lambda^{*}_{i,j}$ is the reliability of the check message from check node i to variable node j.

4) Pass the check messages (the LLRs updated in Step 3) from the check nodes back to the variable nodes along the edges of the graph.

5) At the variable nodes, update the LLRs. Sum the input message (the LLR of the received bit) and all of the incoming check messages (updated LLRs). The decoded bit is taken to be the sign of this sum. Each outgoing variable message for the next decoding iteration is then formed by summing all the incoming check messages except the one from the destination check node of that outgoing variable message.

6) Repeat Steps 2 through 5 until a termination condition is met, for example when the current messages passed to the check nodes satisfy all of the parity checks.

2.2. Fully Parallel Decoder

The BP algorithm, described using the Tanner graph, maps naturally to a fully parallel decoder architecture, which is implemented for a 1024-bit, rate-½ LDPC code in [6] and [7]. The corresponding graph has 1024 variable nodes and 512 check nodes. All variable nodes have the same functionality, so the variable node unit (VNU) is designed only once and reused for every variable node instance. The same holds for the check node unit (CNU). The input message is in 4-bit binary sign-magnitude notation and is converted to 2's complement because arithmetic operations can then be performed more efficiently.
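Steps 1) through 6) of the message-passing algorithm can be sketched in floating-point Python. This is a minimal illustration under stated assumptions, not the hardware datapath: it uses unquantized LLRs instead of 4-bit values and table lookups, folds the sign and magnitude processing of Step 3 into a single signed tanh product, assumes the convention that a non-negative LLR decodes to bit 0, and includes the channel LLR in each outgoing variable message as in the common sum-product formulation. The matrix H and the name `bp_decode` are hypothetical.

```python
import math

# Hypothetical 3x6 parity-check matrix used only for illustration.
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def bp_decode(H, channel_llr, max_iter=20):
    """Sum-product decoding; returns the hard-decision bit vector."""
    m, n = len(H), len(H[0])
    edges = [(i, j) for i in range(m) for j in range(n) if H[i][j]]
    # Step 1: every outgoing variable message starts as the channel LLR.
    var_msg = {(i, j): channel_llr[j] for (i, j) in edges}
    bits = [0 if llr >= 0 else 1 for llr in channel_llr]
    for _ in range(max_iter):
        # Steps 2-3: check-node update, Eq. (5) with signed messages.
        chk_msg = {}
        for i in range(m):
            cols = [j for j in range(n) if H[i][j]]
            for j in cols:
                prod = 1.0
                for k in cols:
                    if k != j:
                        prod *= math.tanh(var_msg[(i, k)] / 2.0)
                # Clip so that atanh stays finite for very reliable inputs.
                prod = max(min(prod, 0.999999), -0.999999)
                chk_msg[(i, j)] = 2.0 * math.atanh(prod)
        # Steps 4-5: variable-node update and hard decision.
        totals = list(channel_llr)
        for (i, j) in edges:
            totals[j] += chk_msg[(i, j)]
        bits = [0 if t >= 0 else 1 for t in totals]
        for (i, j) in edges:
            # Exclude the message that came from the destination check node.
            var_msg[(i, j)] = totals[j] - chk_msg[(i, j)]
        # Step 6: stop once every parity check is satisfied.
        if all(sum(H[i][j] & bits[j] for j in range(n)) % 2 == 0
               for i in range(m)):
            break
    return bits


# Codeword [1, 1, 0, 0, 1, 1] with bit 0 received weakly on the wrong side.
received = [0.5, -2.0, 2.0, 2.0, -2.0, -2.0]
print(bp_decode(H, received))
```

A fixed iteration count, as used later for C-slow scheduling, corresponds to dropping the early exit in Step 6 and always running `max_iter` iterations.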
In addition, all the logarithm and hyperbolic functions are implemented using a table-lookup method. The input number is used to index the table and the approximated function value is retrieved.

Fig. 3. Three-stage pipelined fully parallel LDPC decoder.

The clock period of the circuit is determined by the critical path delay given in Eq. (6), which is the sum of the variable node delay $T_{VN}$, the flip-flop setup time $T_{SU}$, the clock-to-q time $T_{CKQ}$, the check node delay $T_{CN}$ and the clock skew $T_{skew}$. $T_{VN}$ and $T_{CN}$ are the dominant terms because both the variable node and the check node contain many levels of logic.

$$ T = T_{VN} + T_{SU} + T_{CKQ} + T_{CN} + T_{skew} \qquad (6) $$

All instances of variable nodes and check nodes are placed and routed according to the edges in the graph. The 1024 variable nodes are partitioned into 16 64-node groups, which work in parallel. Each group is three-stage fully pipelined as shown in Fig. 3. The first stage takes 64 cycles to shift in 64 input messages. In the second stage, it takes 64 iterations to decode the message. The last stage also takes 64 cycles to shift out 64 decoded bits. Each of the three stages takes 64 cycles, so they can operate in parallel. The throughput is defined as the number of messages decoded per second and can be expressed analytically as

$$ \text{Throughput} = \frac{\#\text{group}}{\#\text{iteration} \times T} \qquad (7) $$

where #group denotes the number of variable node groups, #iteration denotes the number of iterations and T is the clock period.

3. C-slow retiming

A fully parallel decoder can be highly pipelined. However, the clock cycle will not decrease because of the feedback loops in the LDPC decoder. Proposed by Leiserson et al. [15], C-slow retiming is an approach to accelerating computations that include feedback loops. Fig. 4 illustrates how conventional pipelining and C-slow retiming are applied to a circuit containing a feedback loop [16]. The circuit is modeled by a directed graph with nodes representing logic units with delay and edges representing pipeline registers. After pipelining, the clock cycle of the circuit in Fig. 4(a) is reduced from 4 to 2 (Fig. 4(b)). However, if a feedback path is added as shown in Fig. 4(c), the feedback loop becomes the critical path. Proper functionality requires that every input meet its immediately preceding input at the first logic unit in the loop. So an input can only be scheduled after its immediately preceding input has propagated through the feedback path. Simply inserting registers into the loop will not alter the critical path because of this requirement.

Fig. 4. An example of C-slow retiming.

In Fig. 4(d), we apply C-slow retiming, where each loop and I/O register is replaced by 2 consecutive registers. The pipeline can perform two independent computations by taking its input from two independent data streams alternately every clock cycle. This 2-slowed pipeline operates correctly because the input and the intermediate results contained in the first registers of the pairs belong to the same computation task, so that a new input always meets the feedback of the same stream. Also, the input register need not wait for feedback before latching a new input because the pipeline can be fed with input from another independent task. Further retiming can reduce the clock cycle to 2 as shown in Fig. 4(e).

BP decoding is well suited to C-slow retiming. First, input messages are decoded independently of each other, so each message can be viewed as an independent task. Second, a fixed iteration count can be used. Assume that the iteration count is M. We pipeline each VNU and CNU into C/2 stages. Instead of using a great number of registers to buffer the input, we use only one register and schedule the inputs appropriately. If M is divisible by C, we schedule an input every M/C iterations, which is equivalent to M clock cycles. Initially the C-stage pipeline is empty. Assume that a time $T_{load}$ is needed to load the first C sets of input messages. The interval between successive sets of messages loading into the pipeline is $T_{interval}$. After the first C sets of messages are loaded into the pipeline, the pipeline is fully occupied. Therefore, the (C+1)th set of messages enters the pipeline only when the first message finishes decoding and exits the pipeline. Assume that the completion time of the first message (which is also the time at which the (C+1)th message enters the pipeline) is $T_{1st}$. Then, in clock cycles, we have

$$ T_{load} = (C - 1) \cdot M \qquad (8) $$

$$ T_{1st} = T_{load} + M \qquad (9) $$

$$ T_{int}(N, N+1) = \frac{M}{C} \cdot C = M \qquad (10) $$

The throughput becomes one message per M/C iterations, while the un-pipelined counterpart uses M iterations to decode one message. Fig. 5 shows how the above scheduling works on a 4-slowed simplified decoder. The circuit is pipelined into 4 stages and the iteration count is 4. Each iteration takes 4 clock cycles, so an input is scheduled every iteration, which is equivalent to 4 clock cycles. After pipelining, the clock cycle becomes

$$ T' = T_{logic} + T_{SU} + T_{CKQ} + T_{skew} \qquad (11) $$

where $T_{logic}$ denotes the logic delay between two pipeline registers, which is approximately the original combinational path delay divided by the pipeline depth. $T_{SU}$, $T_{CKQ}$ and $T_{skew}$ are constant factors, the same as in (6). They are considered pipeline overhead because they do not scale with pipeline depth. The optimal pipeline depth is always determined by a trade-off among speed, area and power consumption [17]. We present further analysis of this and how it guides our design in Section 5.

4. Pipelining LDPCs

4.1. Asynchronous Micropipeline

A pipeline can be implemented synchronously or asynchronously. In synchronous pipelines, combinational logic is placed between clocked registers and data are sequenced by one or more globally distributed clocks. Outputs from the combinational logic are latched into registers at the same clock edge. As an example, Fig. 6(a) depicts a synchronous pipeline, where R denotes a register and CL denotes combinational logic.
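For a clocked pipeline of this kind, the input schedule of Eqs. (8) to (10) and the clock-period model of Eq. (11) can be worked through numerically. The sketch below is an illustration only: the function names are assumptions, all times in the schedule are counted in clock cycles, and the delay values passed to the clock-period model are hypothetical picosecond figures rather than measured data.

```python
def cslow_schedule(C, M):
    """Schedule of a C-slowed decoder with M iterations, in clock cycles.

    One iteration takes C cycles, so M/C iterations equal M cycles.
    """
    if M % C != 0:
        raise ValueError("this sketch assumes M is divisible by C")
    t_interval = (M // C) * C            # Eq. (10): equals M
    t_load = (C - 1) * t_interval        # Eq. (8)
    t_first = t_load + t_interval        # Eq. (9)
    return {"t_interval": t_interval, "t_load": t_load, "t_first": t_first}


def entry_cycles(C, M, count):
    """Clock cycle at which each of the first `count` messages is loaded."""
    step = cslow_schedule(C, M)["t_interval"]
    return [n * step for n in range(count)]


def pipelined_clock_period(t_comb, depth, t_su, t_ckq, t_skew):
    """Eq. (11), approximating T_logic as t_comb / depth."""
    return t_comb / depth + t_su + t_ckq + t_skew


# The 4-stage, 4-iteration, 4-slowed example discussed in Section 3.
sched = cslow_schedule(C=4, M=4)
print(sched)
print(entry_cycles(C=4, M=4, count=6))

# Hypothetical delays in picoseconds, for illustration only.
print(pipelined_clock_period(8000, 4, 100, 100, 100))
```

With C = M = 4, the fifth message enters at cycle 16, which is exactly when the first message has completed its M iterations of C cycles each. The clock-period model also shows why the speedup is always below the pipeline depth: the setup, clock-to-q and skew terms do not shrink as the depth grows.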
Asynchronous pipelines have a similar structure; however, instead of being synchronized at the same global clock edge, data transfer is localized at each pipeline stage in a handshake manner [18], [19]. The micropipeline shown in Fig. 6(b) is a widely used asynchronous pipeline style [20]. There are two control signals, namely requests and acknowledgements. Request signals travel forward in the pipeline, indicating whether the data in the current stage is ready to be latched by the subsequent stage. Acknowledgement signals travel backward, indicating whether data have been consumed by the subsequent stage. Stage outputs are transferred in bundles with the request signal, which usually passes through matched delay elements (the oval labeled as delay). The matched delay must be greater than the critical path delay of the corresponding combinational logic to ensure that correct computation results are latched by the registers.

Fig. 5. Input schedule for a 4-stage 4-iteration 4-slowed LDPC decoder. (a), (b) Before and after retiming; (c) Box represents a pipeline stage, the number represents the index of the input message.

Fig. 6. (a) Synchronous pipeline; (b) transition signaling micropipeline; and (c) its timing diagram.

The clock input of each pipeline register is generated by synchronizing the request and acknowledgement signals using a C-element. The transistor-level circuit and truth table of a 2-input C-element are shown in Fig. 7. When both request and acknowledgement are high, indicating that the new input is ready to be latched by the register, the clock signal goes high, enabling the latching of the input. The clock signal remains high until both signals become low. Before both signals rise, the clock remains low.

Fig. 6(c) shows the timing diagram of the transition signaling protocol. In transition signaling, each transition of the C-element output, i.e., the clock input of the register, can trigger the register to latch the input data. First, the request Rin goes high to indicate that new data is available at the stage's input. Assume that Aout is low (the stage is empty), so the input data can be registered. Then the stage raises Ain, acknowledging to the previous stage that it no longer needs the input. After some delay, Rout goes high, indicating to the subsequent stage that new input data is available. Some time after Rout rises, the subsequent stage raises Aout to indicate that it has consumed (i.e., registered) the output data. The previous stage can lower Rin to indicate that subsequent data is available, and another cycle starts over.

Fig. 7. C-element. (a) Transistor-level circuit. (b) Truth table.

4.2. Microarchitecture Design

Fig. 8 depicts the architecture of the asynchronously pipelined fully parallel decoder with C-slow retiming. The main part consists of the pipelined variable nodes unit (all VNUs), the pipelined check nodes unit (all CNUs), and the forward and feedback paths between them. In Step 5 of the decoding algorithm, the original input messages are required in the variable node computation. In the original implementation, the input message is stored in a register in the variable node. After C-slow retiming, we also need C copies of this register in order to accommodate input messages from multiple input streams. These registers are shown separately from the variable nodes as message extension registers in Fig. 8, but in the actual design they are placed locally in the variable nodes. The scheduling control unit is responsible for generating the control signal according to the scheduling scheme.

Fig. 8. Top-level decoder architecture.

Fig. 9 shows the logic of a VNU-CNU path with request and acknowledgement signals. REG stores the variable message. Its input is multiplexed from the input message (decoding algorithm Step 1) or the subsequently updated variable node computation result (decoding algorithm Step 5). The input MUX/DEMUX takes input data and request from, and steers the acknowledgement to, one of the two sources. Two input sources exist for the variable node. One is the input message; the other is the temporary check variable. The counter counts the number of transitions of the request signal to generate the select signal according to the C-slow retiming input schedule.

Fig. 9. VNU-CNU logic with handshake signals.

5. Implementation results

The proposed techniques require only very small modifications to the first and third stages of the entire decoding system. We have implemented the decoding stage in a .8 µm .8 V FDSOI CMOS process. The un-pipelined decoder is first implemented in Verilog HDL and then synthesized using various clock cycle constraints. On each synthesized un-pipelined design, C-slow retiming with different pipeline depths is employed. The resulting synchronous pipelines are transformed into asynchronous ones. Retiming is performed using Synopsys Design Compiler's pipeline_retime command, which requires the original un-pipelined design, the pipeline depth and the target clock period. Since the target clock period is required beforehand, we use the initial un-pipelined design's clock cycle divided by the pipeline depth as the clock period constraint of the pipelined design. We omit the pipeline overhead in Eq. (11). This causes Design Compiler to insert buffers into the logic to achieve the target clock.

The performance comparison of pipelined and non-pipelined designs reduces to a comparison of clock periods. Also note that in the area comparison, the area overhead of asynchronous pipelining, including the area of the C-elements and matched delay elements, is not included. For each CNU and VNU, only one C-element and one matched delay element are shared by the entire pipeline stage, making this area overhead negligible (only about 3.8% of CNU and VNU).

Fig. 10 compares the pre-layout areas of both pipelined and non-pipelined designs. The x-axis is the clock period constraint and the y-axis is the area. First, we can see that only pipelined designs (indicated by colored markers) can achieve a clock period of 3 ns or less. Second, at longer periods (4 and 5 ns), pipelined designs consume less area than the non-pipelined design. The synchronous design in [7] is implemented in the same process and achieves 2 Gbit/s throughput, which is equivalent to a clock period of 8 ns. Our approach can improve its throughput by 4 times with only 8% area overhead (indicated by the circle marker at 2 ns).

To find the most efficient design in terms of speed and area, we define an absolute cost function (ACF) as

$$ \text{cost}_{abs} = \text{area} \times \text{clock\_period} \qquad (12) $$

Fig. 11 plots this cost function with respect to the initial clock period before pipelining and the pipeline depth. From Fig. 11, we make the following observations. First, the minimal cost is attained when the initial clock period is 8 and the pipeline depth is 6. Second, designs with the same period before pipelining have similar cost values. Third, a design with a longer period has a smaller area. However, the overall cost of a design with a long period becomes even higher, because the increase in period outweighs the decrease in area.
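The relationship between clock period and throughput quoted above, and the absolute cost function of Eq. (12), can be checked with a short Python sketch. The group count, group size and iteration count are those of Section 2.2; converting Eq. (7) to a bit rate by multiplying by the 64 bits each group delivers is an interpretation made here, and the area values in the design list are hypothetical placeholders, not measured results.

```python
def throughput_eq7(groups, iterations, clock_period_s):
    """Eq. (7): group-level messages decoded per second."""
    return groups / (iterations * clock_period_s)


def bit_rate(groups, bits_per_group, iterations, clock_period_s):
    """Decoded bits per second, assuming each group yields bits_per_group bits."""
    return bits_per_group * throughput_eq7(groups, iterations, clock_period_s)


def absolute_cost(area, clock_period):
    """Eq. (12): lower is better for both area and clock period."""
    return area * clock_period


# 16 groups of 64 nodes, 64 iterations, 8 ns and 2 ns clock periods.
rate_8ns = bit_rate(16, 64, 64, 8e-9)
rate_2ns = bit_rate(16, 64, 64, 2e-9)
print(rate_8ns, rate_2ns, rate_2ns / rate_8ns)

# Hypothetical (label, normalized area, clock period in ns) design points.
designs = [
    ("design A", 1.00, 8.0),
    ("design B", 1.20, 4.0),
    ("design C", 1.50, 2.0),
]
best = min(designs, key=lambda d: absolute_cost(d[1], d[2]))
print(best[0], absolute_cost(best[1], best[2]))
```

Under these assumptions an 8 ns clock period corresponds to 2 Gbit/s, and shortening the period to 2 ns quadruples the rate, which agrees with the figures stated in the text.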
Now we consider the problem of how to derive a pipelined design, including buffer insertion, from a given non-pipelined design so as to accomplish the maximal performance improvement with the least area overhead. For this purpose, we define a relative cost function (RCF) as

$$ \text{cost}_{relative} = \frac{\text{area\_overhead}}{\text{delay\_improvement\_factor}} \qquad (13) $$

where

$$ \text{area\_overhead} = \frac{\text{area of pipelined design}}{\text{area of starting unpipelined design}} \qquad (14) $$

$$ \text{delay\_improvement\_factor} = \frac{\text{clock cycle of unpipelined design}}{\text{clock cycle of pipelined design}} = \text{pipeline depth} \qquad (15) $$

Fig. 10. Area comparison (non-pipelined vs pipelined).

Fig. 11. 3D surface plot of absolute cost function.

Fig. 12. 3D surface plot of relative cost function.

This cost function is plotted in Fig. 12. The shape of the RCF is completely different from that of the ACF. When the clock period constraint of the non-pipelined design becomes tighter, the area overhead outweighs the speed improvement. This is because, when the pipeline overhead becomes much larger than $T_{logic}$, the synthesis tool inserts a great number of buffers to reduce the logic delay and compensate for the overhead.
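Eqs. (13) to (15) translate directly into a few lines of Python. The sketch below uses hypothetical normalized areas and clock periods purely to show how the relative cost behaves; none of the numbers are results from the implementation, and the function names are assumptions.

```python
def area_overhead(area_pipelined, area_unpipelined):
    """Eq. (14): ratio of pipelined area to starting un-pipelined area."""
    return area_pipelined / area_unpipelined


def delay_improvement_factor(period_unpipelined, period_pipelined):
    """Eq. (15): ratio of clock cycles before and after pipelining."""
    return period_unpipelined / period_pipelined


def relative_cost(area_pipelined, area_unpipelined,
                  period_unpipelined, period_pipelined):
    """Eq. (13): area overhead paid per unit of delay improvement."""
    return (area_overhead(area_pipelined, area_unpipelined)
            / delay_improvement_factor(period_unpipelined, period_pipelined))


# Hypothetical case 1: relaxed starting design, modest area growth.
relaxed = relative_cost(1.2, 1.0, 8.0, 2.0)

# Hypothetical case 2: tight starting design, heavy buffer insertion.
tight = relative_cost(3.0, 1.0, 3.0, 1.0)

print(relaxed, tight)
```

In this made-up comparison the tighter starting point has the higher relative cost even though it reaches the shorter clock period, which mirrors the qualitative trend described for Fig. 12.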
6. Conclusion

In this paper, we applied pipelining techniques to maximize the throughput of LDPC decoding. C-slow retiming is used to efficiently pipeline the feedback loops of the iterative decoding. Experimental results show that our pipelining technique is an efficient approach to maximizing LDPC decoding throughput while minimizing area. First, pipelined decoders can achieve throughputs that non-pipelined designs cannot reach. Second, for the same throughput, pipelined decoders consume less area than non-pipelined designs. Third, our approach can improve the throughput of a published implementation by 4 times with only about 8% area overhead. In addition, with the use of asynchronous pipelines, we can mitigate several well-known design challenges, including large silicon area, congested interconnect, and hard-to-control clock skews. This is especially attractive for implementing high-throughput digital functionality in three-dimensional integrated circuits, where the process variations across tiers of devices are so high that clocking across tiers is difficult.

7. Acknowledgement

The authors thank Professor Carl Ebeling of the Department of Computer Science and Engineering at the University of Washington for his suggestions on C-slow retiming. This research was supported by the US Defense Advanced Research Projects Agency (DARPA) under Grant Number N , monitored by Navy SPAWAR Systems Center, San Diego, USA.

8. References

[1] D. J. C. MacKay and R. M. Neal, "Near Shannon limit performance of low density parity check codes," IEE Electronics Letters, vol. 33, no. 6, March 1997.
[2] M. Chiani, A. Conti, and A. Ventura, "Evaluation of low-density parity-check codes over block fading channels," Proceedings of IEEE International Conference on Communications, June 2000, vol. 3.
[3] IEEE Draft P802.3an/D, IEEE Standard for Information technology -- Telecommunications and information exchange between systems -- Local and metropolitan area networks -- Specific requirements.
[4] I. B. Djordjevic and B. Vasic, "100-Gb/s transmission using orthogonal frequency-division multiplexing," IEEE Photonics Technology Letters, vol. 18, no. 15, Aug. 2006.
[5] I. B. Djordjevic, O. Milenkovic, and B. Vasic, "Generalized low-density parity-check codes for optical communication systems," Journal of Lightwave Technology, vol. 23, no. 5, May 2005.
[6] A. J. Blanksby and C. J. Howland, "A 690-mW 1-Gb/s 1024-b, rate-1/2 low-density parity-check code decoder," IEEE Journal of Solid-State Circuits, vol. 37, no. 3, March 2002.
[7] L. Zhou, C. Wakayama, N. Jankrajarng, B. Hu and C.-J. R. Shi, "A high-throughput low-power fully parallel 1024-bit 1/2-rate low density parity check code decoder in 3D integrated circuits," in Proc. Asia and South Pacific Design Automation Conf., Jan. 2006.
[8] E. Yeo, P. Pakzad, B. Nikolić, and V. Anantharam, "High throughput low-density parity-check decoder architectures," Proceedings of IEEE Global Telecommunications Conference, Nov. 2001, vol. 5.
[9] M. M. Mansour and N. R. Shanbhag, "A 640-Mb/s 2048-bit programmable LDPC decoder chip," IEEE Journal of Solid-State Circuits, vol. 41, no. 3, March 2006.
[10] L. H. Miles, J. W. Gambles, G. K. Maki, W. E. Ryan and S. R. Whitaker, "An 860-Mb/s (8158, 7136) low-density parity-check encoder," IEEE Journal of Solid-State Circuits, vol. 41, no. 8, Aug. 2006.
[11]
[12] enhrchannelcodingschemes%2.htm
[13] R. Tanner, "A recursive approach to low complexity codes," IEEE Transactions on Information Theory, vol. 27, no. 9, Sep. 1981.
[14] R. Gallager, "Low-density parity-check codes," IRE Transactions on Information Theory, vol. 7, pp. 21-28, Jan. 1962.
[15] C. Leiserson, F. Rose, and J. Saxe, "Optimizing synchronous circuitry by retiming," Proceedings of the 3rd Caltech Conference on VLSI, pp. 87-116, March 1983.
[16] N. Weaver, Y. Markovskiy, Y. Patel and J. Wawrzynek, "Post placement C-slow retiming for the Xilinx Virtex FPGA," Proceedings of the 11th ACM Symposium on Field Programmable Gate Arrays, Feb. 2003.
[17] M. S. Hrishikesh et al., "The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays," Proc. of the 29th Annual International Symposium on Computer Architecture, pp. 14-24, May 2002.
[18] V. Berkel, M. B. Josephs and S. M. Nowick, "Applications of asynchronous circuits," Proceedings of the IEEE, vol. 87, no. 2, Feb. 1999.
[19] M. Singh and S. M. Nowick, "High-throughput asynchronous pipelines for fine-grain dynamic datapaths," Proceedings of the 6th International Symposium on Advanced Research in Asynchronous Circuits and Systems, April 2000.
[20] I. E. Sutherland, "Micropipelines," Communications of the ACM, vol. 32, no. 6, June 1989.
More informationRECENTLY, low-density parity-check (LDPC) codes have
892 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 53, NO. 4, APRIL 2006 Code Construction and FPGA Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code Decoder
More informationCapacity-approaching Codes for Solid State Storages
Capacity-approaching Codes for Solid State Storages Jeongseok Ha, Department of Electrical Engineering Korea Advanced Institute of Science and Technology (KAIST) Contents Capacity-Approach Codes Turbo
More informationEfficient VLSI Huffman encoder implementation and its application in high rate serial data encoding
LETTER IEICE Electronics Express, Vol.14, No.21, 1 11 Efficient VLSI Huffman encoder implementation and its application in high rate serial data encoding Rongshan Wei a) and Xingang Zhang College of Physics
More informationEfficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes
Efficient Majority Logic Fault Detector/Corrector Using Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes 1 U.Rahila Begum, 2 V. Padmajothi 1 PG Student, 2 Assistant Professor 1 Department Of
More informationA Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM
A Synthesizable RTL Design of Asynchronous FIFO Interfaced with SRAM Mansi Jhamb, Sugam Kapoor USIT, GGSIPU Sector 16-C, Dwarka, New Delhi-110078, India Abstract This paper demonstrates an asynchronous
More informationREVIEW ON CONSTRUCTION OF PARITY CHECK MATRIX FOR LDPC CODE
REVIEW ON CONSTRUCTION OF PARITY CHECK MATRIX FOR LDPC CODE Seema S. Gumbade 1, Anirudhha S. Wagh 2, Dr.D.P.Rathod 3 1,2 M. Tech Scholar, Veermata Jijabai Technological Institute (VJTI), Electrical Engineering
More informationA Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique
A Low-Power Field Programmable VLSI Based on Autonomous Fine-Grain Power Gating Technique P. Durga Prasad, M. Tech Scholar, C. Ravi Shankar Reddy, Lecturer, V. Sumalatha, Associate Professor Department
More informationHigh Performance Interconnect and NoC Router Design
High Performance Interconnect and NoC Router Design Brinda M M.E Student, Dept. of ECE (VLSI Design) K.Ramakrishnan College of Technology Samayapuram, Trichy 621 112 brinda18th@gmail.com Devipoonguzhali
More informationDesign and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology
Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,
More informationNew Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1
New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 Sunghwan Kim* O, Min-Ho Jang*, Jong-Seon No*, Song-Nam Hong, and Dong-Joon Shin *School of Electrical Engineering and
More informationA Parallel Decoding Algorithm of LDPC Codes using CUDA
A Parallel Decoding Algorithm of LDPC Codes using CUDA Shuang Wang and Samuel Cheng School of Electrical and Computer Engineering University of Oklahoma-Tulsa Tulsa, OK 735 {shuangwang, samuel.cheng}@ou.edu
More informationFPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS
FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS 1 RONNIE O. SERFA JUAN, 2 CHAN SU PARK, 3 HI SEOK KIM, 4 HYEONG WOO CHA 1,2,3,4 CheongJu University E-maul: 1 engr_serfs@yahoo.com,
More informationThe design of a simple asynchronous processor
The design of a simple asynchronous processor SUN-YEN TAN 1, WEN-TZENG HUANG 2 1 Department of Electronic Engineering National Taipei University of Technology No. 1, Sec. 3, Chung-hsiao E. Rd., Taipei,10608,
More informationPOWER consumption has become one of the most important
704 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 4, APRIL 2004 Brief Papers High-Throughput Asynchronous Datapath With Software-Controlled Voltage Scaling Yee William Li, Student Member, IEEE, George
More informationVerilog for High Performance
Verilog for High Performance Course Description This course provides all necessary theoretical and practical know-how to write synthesizable HDL code through Verilog standard language. The course goes
More informationLow Complexity Quasi-Cyclic LDPC Decoder Architecture for IEEE n
Low Complexity Quasi-Cyclic LDPC Decoder Architecture for IEEE 802.11n Sherif Abou Zied 1, Ahmed Tarek Sayed 1, and Rafik Guindi 2 1 Varkon Semiconductors, Cairo, Egypt 2 Nile University, Giza, Egypt Abstract
More informationTradeoff Analysis and Architecture Design of High Throughput Irregular LDPC Decoders
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 1, NO. 1, NOVEMBER 2006 1 Tradeoff Analysis and Architecture Design of High Throughput Irregular LDPC Decoders Predrag Radosavljevic, Student
More informationBER Evaluation of LDPC Decoder with BPSK Scheme in AWGN Fading Channel
I J C T A, 9(40), 2016, pp. 397-404 International Science Press ISSN: 0974-5572 BER Evaluation of LDPC Decoder with BPSK Scheme in AWGN Fading Channel Neha Mahankal*, Sandeep Kakde* and Atish Khobragade**