496 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 26, NO. 3, MARCH 2018


Basic-Set Trellis Min-Max Decoder Architecture for Nonbinary LDPC Codes With High-Order Galois Fields

Huyen Pham Thi, Member, IEEE, and Hanho Lee, Senior Member, IEEE

Abstract: Nonbinary low-density parity-check (NB-LDPC) codes outperform their binary counterparts in terms of error-correction performance. However, the drawback of NB-LDPC decoders is high complexity, especially for the check node unit (CNU), and the complexity increases considerably when increasing the Galois-field (GF) order. In this paper, a novel basic-set trellis min-max algorithm is proposed to greatly reduce not only the CNU complexity but also the number of messages exchanged between the check node and the variable node compared with previous studies, which is highly efficient for higher order GFs. In addition, the proposed CNU is designed to compute the messages in a parallel way. Layered decoder architectures based on the proposed algorithm were implemented for the (837, 726) NB-LDPC code over GF(32) and the (1512, 1323) code over GF(64) using 90-nm CMOS technology, and obtained a reduction in the complexity by 30% and 37% for the CNU, and 40% and 37.4% for the whole decoder, respectively. Moreover, the proposed decoder achieves a higher throughput at 1.67 Gbit/s and 1.4 Gbit/s compared with the other state-of-the-art high-rate NB-LDPC decoders with high-order GFs.

Index Terms: Basic set (BS), check node processing, high order, layered decoding, nonbinary low-density parity-check (LDPC), trellis min-max (TMM), VLSI design.

I. INTRODUCTION

Nonbinary low-density parity-check (NB-LDPC) codes defined over Galois fields (GFs) $GF(q)$ with $q > 2$ outperform their binary counterparts in error-correcting performance, including in the error-floor region, when the code length is moderate [1].
In addition, these codes have a good burst-error-correction capability, especially for high-order GFs. Research results in [2] and [3] demonstrate that NB-LDPC codes provide superior performance compared with the best optimized binary LDPC code over fading channels, and the combination of NB-LDPC codes with high-order modulations improves both the bandwidth efficiency and the error-correction capability. Moreover, the elimination of the error floor is critical for flash memory applications, and the NB-LDPC codes show much promise for multilevel flash memory applications [4].

Manuscript received June 26, 2017; revised September 17, 2017; accepted November 4, Date of publication December 8, 2017; date of current version February 22, This work was supported by the Basic Science Research Program through the NRF funded by the Ministry of Science, ICT and Future Planning under Grant 2016R1A2B (Corresponding author: Hanho Lee.) The authors are with the Department of Information and Communication Engineering, Inha University, Incheon 22212, South Korea (e-mail: phamhuyenmta87@gmail.com; hhlee@inha.ac.kr). Color versions of one or more of the figures in this paper are available online at Digital Object Identifier /TVLSI

However, the main disadvantage of NB-LDPC codes is their highly complex decoding algorithms; it is difficult to achieve maximum throughput and minimum area for their architectures. In practical implementations, the NB-LDPC decoders have several drawbacks, such as a highly complex check node unit (CNU), a large area spent on storage elements, and routing congestion.

First, the belief propagation (BP) algorithm used for binary LDPC decoding was introduced for NB-LDPC decoding [1]. Then, a fast Fourier transform BP (FFT-BP) algorithm [5] in the probability domain was proposed to reduce the computational complexity in check node processing by replacing the convolutional operations with multiplications in the frequency domain.
Although the probability-domain algorithm provides optimal error-correcting performance, the large number of additions and multiplications causes an exponential increase in hardware complexity. In [6], the FFT-BP algorithm in the logarithm domain used log-likelihood ratio (LLR) values instead of probability values to decode the channel messages, so that the multiplications are replaced with additions. For practical NB-LDPC decoder implementations, suboptimal algorithms such as the extended min-sum (EMS) algorithm [7] and the min-max algorithm [8] have been proposed to reduce the complexity of the CNU, which is the main bottleneck of the NB-LDPC decoder. The min-max algorithm [8] is interesting because it uses comparisons instead of the additions of [7] in the check node processing, which not only reduces the hardware complexity but also prevents the numerical growth of the decoder. In addition, in [8], a forward-backward scheme was utilized to derive the check node output messages. This scheme includes sequential computations, which cause a throughput problem for the decoder architectures. Moreover, additional memories are required to store the intermediate messages, such as the forward and backward messages.

Recently, the path construction algorithms [9], [10] and the relaxed min-max (RMM) algorithm [11] introduced the trellis representation for check node processing to eliminate the computation of the forward-backward messages, and thus reduce the memory requirement for the intermediate messages. The RMM algorithm [11], which uses the minimum basis to generate the check node output messages, was proposed for NB-LDPC decoders and further reduces the check node complexity.

IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See for more information.

THI AND LEE: BS-TMM DECODER ARCHITECTURE FOR NONBINARY LDPC CODES WITH HIGH-ORDER GFS

However, the sequential check node processing requires a large number of clock cycles, which limits the maximum throughput of the decoder. In [12] and [13], the trellis EMS algorithm was proposed to improve the throughput of the NB-LDPC decoders, where the check node output messages are generated in parallel by means of an extra column inserted into the original trellis. A disadvantage of the decoders in [12] and [13] is their large area, which reduces the overall decoder efficiency. To take advantage of the idea in [12], the simplified trellis min-max (STMM) algorithm [14] was proposed to improve the throughput of the min-max decoders with less complexity. In [15], the one-minimum-only TMM algorithm was introduced on the basis of the STMM algorithm to reduce the CNU complexity by obtaining only one minimum and estimating the second one.

In [12]–[15], $q \times d_c$ check node output messages are exchanged between the check node and the variable nodes. For high-order GFs or high-rate NB-LDPC codes, there are two main drawbacks in [12]–[15]. First, the amount of exchanged messages increases, which causes wiring congestion and thus limits the maximum throughput of the decoders. Second, the check node output messages are stored in memory for the next decoding iteration in the layered decoders. Therefore, the memory requirement becomes large, which leads to a significant growth in the decoder area for NB-LDPC codes.

To overcome the drawbacks of [12]–[15], Lacruz et al. [16] originally introduced a compression technique that reduces the messages exchanged between one check node and the variable nodes to four sets, namely the intrinsic and extrinsic information, the path coordinates, and the hard-decision symbols, with a total size of $5 \times (q-1) + d_c$ messages and without any error-correcting performance loss.
For further improvement, the research in [17] and [18] proposed to simplify the CNU architecture and reduce the exchanged messages to $4 \times (q-1) + d_c$ messages with an error-correcting performance similar to that in [16]. The approximated TMM algorithms in [19] and [20] were introduced to reduce the amount of intrinsic information from $(q-1)$ elements [16] to only two elements and $L$ elements (with $L$ smaller than $q$), respectively, at the cost of some error-correcting performance loss. The remaining elements are calculated from approximation functions.

In this paper, a novel basic-set TMM (BS-TMM) algorithm is proposed for NB-LDPC codes based on the theory of the Galois field $GF(q = 2^p)$, where each field element is uniquely represented by a linear combination of $p$ independent field elements. In the proposed BS-TMM algorithm, the basic set, which includes the intrinsic information of only $p = \log_2 q$ independent field elements in the extra column, is stored, and the other elements are constructed on the basis of this basic set. Moreover, a novel algorithm is introduced for finding, in parallel, the $p$ independent field elements with the most reliable messages of the basic set. The BS-TMM algorithm allows the reduction of exchanged messages between one check node and the variable nodes from $4 \times (q-1) + d_c$ [16] to $(q-1) + 3 \times p + d_c$ messages with a negligible performance loss of 0.1 dB. The proposed method provides a great area reduction and throughput improvement for the NB-LDPC decoders with good error-correction performance. Therefore, it is extremely efficient for the design of high-rate and high-order NB-LDPC decoders. Two NB-LDPC decoders, for the (837, 726) code over GF(32) and the (1512, 1323) code over GF(64), were implemented on the basis of the BS-TMM algorithm.

[Algorithm 1: Layered Min-Max Decoding Algorithm [17]]

The rest of this paper is organized as follows. Section II reviews the decoding algorithms for the NB-LDPC codes. Section III presents the proposed BS-TMM decoding algorithm for the NB-LDPC codes.
In Section IV, the CNU architecture and the overall decoder architecture based on the BS-TMM algorithm are proposed. The implementation results and comparison with previous works are discussed in Section V. Finally, conclusions are drawn in Section VI.

II. TRELLIS MIN-MAX DECODING ALGORITHM

A. Review of the Layered Min-Max Algorithm

A sparse parity-check matrix $H$ with $M$ rows and $N$ columns defines an NB-LDPC linear block code, where each nonzero element $h_{mn}$ belongs to the Galois field $GF(q)$. Moreover, a Tanner graph corresponding to $H$ is used to represent the NB-LDPC codes in a graphical way, where variable nodes represent the $N$ columns of $H$ and check nodes represent the $M$ rows of $H$. Let $d_c$ and $d_v$ be the check node degree (row weight) and the variable node degree (column weight) of $H$, respectively. Accordingly, $N(m)$ denotes the set of $d_c$ variable nodes connected to check node $m$, and $M(n)$ denotes the set of $d_v$ check nodes connected to variable node $n$. Let $Q_{mn}(a)$ and $R_{mn}(a)$ be the messages exchanged from variable node $n$ to check node $m$ (V2C) and from check node $m$ to variable node $n$ (C2V), respectively, for each symbol $a \in GF(q)$. A regular NB-LDPC code with fixed values of $d_c$ and $d_v$ is considered in this paper.

A horizontal layered decoding algorithm is applied in this paper because it converges faster than the flooding decoding algorithm with similar performance. The layered decoding algorithm for the NB-LDPC codes is presented in Algorithm 1. Let $c_n$ be the $n$th reference symbol of a received codeword and $z_n$ be the $n$th hard-decision symbol with the highest reliability. The decoding process is initialized by obtaining the LLR vectors of size $q$ of the channel information by means of $L_n(a) = \ln\big(\Pr(c_n = z_n \mid \text{channel}) / \Pr(c_n = a \mid \text{channel})\big)$. At the first layer of the first iteration, $Q_n(a)$ as the a posteriori information

for the variable node $n$ is equal to $L_n(a)$, and $R_{mn}(a)$ is equal to zero. Let $k$ and $l$ be the loop indexes for the $k$th iteration and the $l$th layer, respectively. The $Q_n(a)$ messages are permuted following the nonzero element $h_{mn}$ of matrix $H$ to obtain $Q_n(h_{mn} a)$. Then, the V2C messages $Q_{mn}(a)$ are derived in step 3, and the normalization of these messages is implemented by steps 4 and 5 to ensure that the LLR value for the most reliable symbol in each vector is equal to zero. Step 6 computes the check node output messages $R_{mn}(a)$ using a function that depends on the algorithm applied for the check node processing. The updated messages $Q_n(a)$ in step 7 must undergo the reverse permutation before a new layer starts. The decoding process is repeated until the maximum number of iterations $I_{max}$ is reached. Finally, the output codeword symbol $c_n$ is the most reliable symbol corresponding to the $Q_n(a)$ messages.

B. Trellis Min-Max Algorithm With Compressed Messages

In [14], the STMM algorithm was proposed for check node processing to generate the check node output messages in a parallel way. The STMM algorithm, which is presented in Algorithm 2, provides a good tradeoff between the error-correcting performance and the decoding complexity, compared with the previous works [11], [21].

[Algorithm 2: TMM Algorithm [14]]

The first step transforms the input messages from the normal domain $Q_{mn}(a)$ to the delta domain $\Delta Q_{mn}(a)$. This transformation ensures that the most reliable symbols are always in the first index, corresponding to the GF symbol 0, and the rest of the indexes are in the order $\{\alpha^0, \alpha^1, \ldots, \alpha^{q-2}\}$. Step 2 computes the syndrome $\beta$ using the most reliable symbols $z_n$ from the V2C messages.
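The normalization of the V2C vector and the normal-to-delta transformation described above can be illustrated with a short Python sketch. It is only a behavioral model under stated assumptions: field elements are indexed by their polynomial-basis integer representation (so addition in $GF(2^p)$ is a bitwise XOR) rather than in the power order used in the paper, the delta-domain vector is taken as $\Delta Q(\eta) = Q(\eta + z_n)$, and the function names `normalize`, `to_delta`, and `syndrome` are hypothetical.

```python
from functools import reduce
from operator import xor


def normalize(q_vec):
    """Subtract the minimum so the most reliable symbol has LLR 0."""
    smallest = min(q_vec)
    return [v - smallest for v in q_vec]


def to_delta(q_vec, z):
    """Normal-to-delta transform (assumed form): dQ[eta] = Q[eta + z].

    Indices are polynomial-basis integers, so GF(2^p) addition is XOR.
    After the transform, the most reliable symbol z sits at index 0.
    """
    return [q_vec[eta ^ z] for eta in range(len(q_vec))]


def syndrome(z_list):
    """Syndrome beta: GF(2^p) sum (XOR) of the hard-decision symbols."""
    return reduce(xor, z_list, 0)


# Toy GF(4) vector; the values are illustrative only.
q_vec = normalize([5, 2, 7, 4])
z = q_vec.index(0)            # hard decision: index of the zero LLR
delta_vec = to_delta(q_vec, z)
```

In this toy case the hard-decision symbol is moved to index 0 of the delta-domain vector, which is the property the first step of Algorithm 2 relies on.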
In step 3, the first minimum value $m1(a)$ and its column index $m1_{col}(a)$, as well as the second minimum value $m2(a)$, are calculated for each trellis row. Step 4 constructs an extra column $\Delta Q(a)$ based on the most reliable path of the configuration set $conf(n_r, n_c)$ for each symbol $a$. Let path 0 be the optimal path for symbol 0, which includes all nodes in the first row of the delta trellis. The configuration set $conf(n_r, n_c)$ [13], which includes the possible paths, is constructed from the $n_r$ most reliable messages and a maximum of $n_c$ deviations from path 0. The configuration set $conf(1, 2)$ is considered in [14], where $n_r = 1$ and $n_c = 2$. Thus, only the most reliable message $m1(a)$ and a maximum of two deviations $[\eta_1(a), \eta_2(a)]$ are considered for each symbol $a$. In the case of one deviation, $\eta_1(a)$ is equal to $\eta_2(a)$. Otherwise, they are different.

The check node output messages in the delta domain $\Delta R_{mn}(a)$ are simultaneously generated from step 5 to step 13 depending on the deviation information $\eta_1(a)$ and $\eta_2(a)$. For each trellis row, if column $j$ is not one of the deviations of the most reliable path, then the output message $\Delta R_{mn_j}(a)$ is assigned the extra column value $\Delta Q(a)$. If the most reliable path has one deviation at column $j$, then the second most reliable message $m2(a)$ is assigned to the output message. If the most reliable path is formed by two deviations, $m1(a)$ is assigned to the output message. Finally, step 14 transforms the output messages from the delta domain to the normal domain as the C2V messages $R_{mn}(a)$. A suitable scaling factor $\lambda$ is used to improve the performance of the decoder, which does not affect the hardware complexity.

A disadvantage of the STMM algorithm [14] is the large memory required to store the $q \times d_c$ output messages $R_{mn}(a)$ for each check node, which causes a large decoder area.
In [18], a compression technique is applied to reduce the check node output messages from $q \times d_c$ values to four elementary sets, $I(a)$, $E(a)$, $P(a)$, and $z_n^*$, comprising $4 \times (q-1) + d_c$ values without any error-correcting performance loss, as follows:

$$\text{Output: } I(a),\; E(a),\; P(a),\; z_n^* = z_n + \beta. \tag{1}$$

Thus, the memory requirement is significantly reduced. The set $I(a)$ is generated in a similar way to the extra column $\Delta Q(a)$. The set $E(a)$ includes the complement values, which are either $m1(a)$ or $m2(a)$ depending on the deviation information, as shown in (2). The set $P(a)$ contains the path information for updating the output messages, as shown in (3)

$$E(a) = \begin{cases} m2(a) & \text{if } \eta_1(a) = \eta_2(a) \\ m1(a) & \text{otherwise.} \end{cases} \tag{2}$$

$$P(a) = \{ m1_{col}(\eta_1(a)),\; m1_{col}(\eta_2(a)) \}. \tag{3}$$

Finally, the C2V messages are updated by decompressing the check node output messages in the variable node processing as follows:

$$R_{mn_j}\big(a + z_{n_j}^*\big) = \begin{cases} I(a) & \text{if } P(a) \neq j \\ E(a) & \text{otherwise.} \end{cases} \tag{4}$$

III. BASIC-SET TRELLIS MIN-MAX DECODING ALGORITHM

A. Basic-Set Trellis Min-Max Algorithm

In this section, the novel BS-TMM algorithm is proposed to greatly reduce the complexity and the memory requirement

for check node processing as well as the messages exchanged between check nodes and variable nodes with a negligible error-correcting performance loss. The BS-TMM algorithm is highly efficient for designing decoders with high-order GFs. Without loss of generality, the Galois field $GF(q)$ with $q = 2^p$, which includes the $q$ elements $\{0, \alpha^0, \alpha^1, \ldots, \alpha^{q-2}\}$, is considered in our work. For each $GF(2^p)$, any field element is uniquely represented by the linear addition of $p$ independent field elements. To take advantage of this, in our work, a set of only $p = \log_2 q$ independent field elements with the smallest LLRs, called the basic set $B$, is generated in the check node processing instead of the $(q-1)$ nonzero field elements in the extra column $\Delta Q(a)$ [16], [19]. Then, the construction of $\Delta Q(a)$ is implemented in the variable node processing based on the basic set $B$.

[Algorithm 3: BS-TMM Algorithm]

The BS-TMM algorithm is presented in Algorithm 3. Steps 1–3 are similar to steps 1–3 in Algorithm 2. Step 4 computes the basic set $B = \{m1_l, I_l, a_l\}_{1 \le l \le p}$, which includes $3 \times p$ values ($p$ LLR values, $p$ column indexes, and $p$ field elements), based on the minimum values $m1(a)$ and their column indexes $I_{col}(a)$ $(1 \le a < q)$. Finding the basic set $B$ is performed by the function in Algorithm 4. Step 5 calculates the complement values in set $E(a)$. The complement values for the $p$ field elements that belong to the basic set $B$ are assigned the second minimum values $m2(a)$. For the remaining field elements, the complement values are assigned the minimum values $m1(a)$. Finally, the output of the check node processing includes the three sets $B$, $E(a)$, and $z_n^*$ with a size of $3 \times p + (q-1) + d_c$ values, which are used for generating the C2V messages in the variable node processing.
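Step 5 and the resulting output size can be sketched in a few lines of Python. This is a minimal illustration, assuming field elements are represented as integers and the per-row minima are available as dictionaries; the names `complement_set` and `output_message_count`, and all numeric values in the GF(4) toy example, are hypothetical and not taken from the paper.

```python
def complement_set(m1, m2, basic_elements):
    """E(a) = m2(a) for elements of the basic set, m1(a) otherwise.

    m1, m2: dicts mapping each nonzero field element to its first and
    second row minimum. basic_elements: set of the p basic elements.
    """
    return {a: (m2[a] if a in basic_elements else m1[a]) for a in m1}


def output_message_count(q, dc):
    """Number of values sent by the check node: 3*p + (q - 1) + dc."""
    p = q.bit_length() - 1      # p = log2(q) for q a power of two
    return 3 * p + (q - 1) + dc


# Toy GF(4) example with made-up minima; elements 1 and 3 form the basic set.
m1 = {1: 2, 2: 5, 3: 4}
m2 = {1: 6, 2: 7, 3: 9}
E = complement_set(m1, m2, basic_elements={1, 3})
```

For the GF(32) code with $d_c = 27$ considered in the paper, `output_message_count(32, 27)` evaluates $3 \times 5 + 31 + 27$.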
Table I shows the number of bits exchanged between check node and variable node in the proposed algorithm and previous works for the general $GF(q = 2^p)$ and $w$ quantization bits for the LLR values. In addition, the number of exchanged bits for high-order GFs such as GF(32), GF(64), and GF(128) is also computed with $d_c = 27$ and $w = 6$ quantization bits, and illustrated in Fig. 1. It is clear that the proposed algorithm greatly reduces the exchanged bits, compared with previous works. In [14], all C2V messages generated in the check node processing are exchanged, which causes an extremely high number of check node output bits. It can be seen that the exchanged bits are reduced by factors of almost 13, 16, and for GF(32), GF(64), and GF(128), respectively.

[Fig. 1. Number of exchanged bits between the check node and variable node for different GFs.]

[TABLE I: Comparison of exchanged messages between check node and variable node with $d_c = 27$ and $w = 6$]

In [16]–[19], a small number of fixed sets, in which the size of each set is proportional to either $q$ or $d_c$, are exchanged. Compared with the original compression technique [16], the proposed work reduces the number of exchanged bits by factors of almost 2.5 and 3.48 for GF(32) and GF(128), respectively. In comparison with the latest work [19], the reduction of the exchanged bits is 38.59% and 52.07% for GF(32) and GF(128), respectively. The BS-TMM algorithm achieves a large reduction of the exchanged bits for two reasons. First, the BS-TMM algorithm reduces the number of fixed sets, where the basic set $B$, including $3 \times p$ values, is exchanged instead of the $3 \times (q-1)$ values of the two sets $I(a)$ and $P(a)$, as shown in (1). Second, the size of the basic set $B$ is proportional to $p = \log_2 q$, whereas the size of the sets $I(a)$ and $P(a)$ is proportional to $q$. Thus, the BS-TMM algorithm is extremely efficient for high-order GFs.
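The bit counts behind this comparison can be reproduced with a small calculation. The sketch below uses the expression for the proposed decoder that the paper gives in its CNU architecture section, $d_c \times p + (q-1) \times w + p \times (w + \log_2(d_c) + p)$, and makes two assumptions that are not stated explicitly in the text: the column index width is $\lceil \log_2 d_c \rceil$ bits, and the uncompressed scheme of [14] sends $q \times d_c$ messages of $w$ bits each. The function names are hypothetical.

```python
def bs_tmm_bits(q, dc, w):
    """Bits sent from check node to variable node in the BS-TMM scheme.

    dc * p        : hard-decision symbols z*_n (p bits each)
    (q - 1) * w   : complement set E(a)
    p * (w + c + p): basic set B (LLR, column index, field element)
    """
    p = q.bit_length() - 1            # p = log2(q), q a power of two
    col_bits = (dc - 1).bit_length()  # assumed: ceil(log2(dc)) bits
    return dc * p + (q - 1) * w + p * (w + col_bits + p)


def full_c2v_bits(q, dc, w):
    """Assumed cost when all q * dc C2V messages are sent, as in [14]."""
    return q * dc * w


for q in (32, 64, 128):
    proposed = bs_tmm_bits(q, dc=27, w=6)
    uncompressed = full_c2v_bits(q, dc=27, w=6)
    print(q, proposed, uncompressed, round(uncompressed / proposed, 2))
```

Under these assumptions the ratios for GF(32) and GF(64) come out close to the factors of almost 13 and 16 quoted above, which suggests the model matches the accounting used in Table I, although the table itself is not available here to confirm it.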
The function in Algorithm 4 finds the basic set $B$ based on the minimum values $m1(a)$ and their column indexes $I_{col}(a)$. Let $M$ be a set including the minimum values $m1(a)$ and their column indexes $I_{col}(a)$. In step 1, set $M$ is rearranged in ascending order of the $m1(a)$ values to generate a new set $M'$. The first two field elements from set $M'$ are selected for the basic set because they are independent field elements with the smallest LLRs, as shown in steps 2–4. The remaining elements of the basic set are found by the loop from steps 5 to 11. The goal of the loop is to find

the next independent field elements with the smallest LLRs, excluding both the selected elements and the elements that are generated by the possible combinations of the selected elements in steps 7 and 8. Finally, the generated basic set $B$ includes $p$ independent field elements with the most reliable LLR values.

[Algorithm 4: Function: Finding Basic Set $B$]

[Algorithm 5: Construct Extra Column $\Delta Q(a)$ and $R_{mn}(a)$]

In the variable node processing, the extra column $\Delta Q(a)$ is recovered and the C2V messages $R_{mn}(a)$ are generated on the basis of the output sets of the check node processing, namely $B$, $E(a)$, and $z_n^*$, as shown in Algorithm 5. First, the extra column $\Delta Q(a)$ and the path information $d(a)$ are calculated in steps 1–7. For the $p$ field elements that belong to the basic set $B$, the $\Delta Q(a)$ value is the most reliable LLR $m1_l$, and the path information $d(a)$ has one deviation at the column index $I_l$ with $1 \le l \le p$. The remaining field elements are computed on the basis of all possible combinations of the field elements in the basic set $B$. Their $\Delta Q(a)$ values are the maximum of the LLR values corresponding to the combined field elements, and their path information $d(a)$ has more than one deviation and a maximum of $p$ deviations. Updating the C2V messages is implemented in the subsequent steps. For each row, if the column index $j$ does not belong to the path information $d(a)$, the C2V message $\Delta R_{mj}(a)$ is assigned the extra column value $\Delta Q(a)$. Otherwise, the C2V message $\Delta R_{mj}(a)$ is assigned the complement value $E(a)$. Finally, the C2V messages in the delta domain are converted to the normal domain in step 15.

In Fig. 2, an example of the delta trellis for GF(8) with $d_c = 4$ is presented, where the minimum values in each row are marked with a dashed square. The extra column $\Delta Q(a)$ in the rightmost column is constructed on the basis of the basic set $B$, as shown in Algorithm 5.
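The basic-set search and the extra-column reconstruction can be sketched in Python using the minimum set $M$ of the GF(8) example of Fig. 2. This is a sequential software model, not the parallel hardware proposed in the paper, and it rests on two assumptions: GF(8) is generated by the primitive polynomial $x^3 + x + 1$ (which is consistent with the relation $\alpha^3 + \alpha^0 = \alpha^1$ used in the example), and independence is tested by tracking the span of the selected elements. The names `ALPHA`, `find_basic_set`, and `build_extra_column` are hypothetical.

```python
from itertools import combinations

# alpha^k as 3-bit integers, assuming GF(8) built from x^3 + x + 1.
ALPHA = {0: 0b001, 1: 0b010, 2: 0b100, 3: 0b011,
         4: 0b110, 5: 0b111, 6: 0b101}
EXP_OF = {vec: k for k, vec in ALPHA.items()}


def find_basic_set(M, p):
    """Greedy search for p independent elements with the smallest LLRs.

    M: list of (m1, column index, exponent k of alpha^k).
    """
    basic, span = [], {0}
    for llr, col, k in sorted(M, key=lambda t: t[0]):
        vec = ALPHA[k]
        if vec in span:            # combination of already selected elements
            continue
        basic.append((llr, col, k))
        span |= {s ^ vec for s in span}
        if len(basic) == p:
            break
    return basic


def build_extra_column(basic):
    """Rebuild extra-column LLRs and path information d(a) from B."""
    Q, d = {}, {}
    for r in range(1, len(basic) + 1):
        for combo in combinations(basic, r):
            vec = 0
            for _, _, k in combo:
                vec ^= ALPHA[k]    # GF(2^p) addition is XOR
            a = EXP_OF[vec]
            Q[a] = max(llr for llr, _, _ in combo)
            d[a] = [col for _, col, _ in combo]
    return Q, d


# Set M of the Fig. 2 example: (m1, column index, exponent of alpha).
M = [(2, 1, 0), (10, 2, 1), (26, 3, 2), (1, 1, 3),
     (3, 4, 4), (30, 1, 5), (4, 1, 6)]
B = find_basic_set(M, p=3)
Q, d = build_extra_column(B)
```

With this input the sketch selects $\alpha^3$, $\alpha^0$, and $\alpha^4$ and skips nothing before them, and the reconstructed values for $\alpha^1$, $\alpha^5$, $\alpha^6$, and $\alpha^2$ follow the max rule of Algorithm 5.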
This example demonstrates the method of building the basic set $B$ and the extra column $\Delta Q(a)$. From the delta trellis, set $M$, which includes the minimum values and their column indexes, is generated as $M = \{(2, 1, \alpha^0), (10, 2, \alpha^1), (26, 3, \alpha^2), (1, 1, \alpha^3), (3, 4, \alpha^4), (30, 1, \alpha^5), (4, 1, \alpha^6)\}$. After rearranging set $M$ in ascending order of the minimum values, set $M' = \{(1, 1, \alpha^3), (2, 1, \alpha^0), (3, 4, \alpha^4), (4, 1, \alpha^6), (10, 2, \alpha^1), (26, 3, \alpha^2), (30, 1, \alpha^5)\}$ is obtained.

[Fig. 2. Example of the trellis based on GF(8) with $d_c = 4$.]

The first two field elements from $M'$ are selected for the basic set as $B = \{(1, 1, \alpha^3), (2, 1, \alpha^0)\}$. The third field element selected is the field element with the smallest LLR value among the remaining field elements of set $M'$, excluding the field element $\alpha^1 = \alpha^3 + \alpha^0$, that is, $(10, 2, \alpha^1)$, which is a combination of the two field elements in $B$. Hence, $(3, 4, \alpha^4)$ is selected, and the basic set $B = \{(1, 1, \alpha^3), (2, 1, \alpha^0), (3, 4, \alpha^4)\}$ includes $p = 3$ independent field elements with the most reliable messages. Then, the extra column $\Delta Q(a)$ is constructed. For the $p$ field elements in the extra column that belong to the basic set $B$, namely $\{\alpha^3, \alpha^0, \alpha^4\}$, the LLR values $\Delta Q(a)$ and the path information $d(a)$ are the same as the LLR values and column indexes in the basic set $B$. For other field elements, all combinations of the field elements in $B$ are considered as follows: $\Delta Q(\alpha^3 + \alpha^0 = \alpha^1) = \max(1, 2) = 2$

and $d(\alpha^3 + \alpha^0 = \alpha^1) = \{1, 1\}$; $\Delta Q(\alpha^0 + \alpha^4 = \alpha^5) = \max(2, 3) = 3$ and $d(\alpha^0 + \alpha^4 = \alpha^5) = \{1, 4\}$; $\Delta Q(\alpha^3 + \alpha^4 = \alpha^6) = \max(1, 3) = 3$ and $d(\alpha^3 + \alpha^4 = \alpha^6) = \{1, 4\}$; and $\Delta Q(\alpha^3 + \alpha^0 + \alpha^4 = \alpha^2) = \max(1, 2, 3) = 3$ and $d(\alpha^3 + \alpha^0 + \alpha^4 = \alpha^2) = \{1, 1, 4\}$.

B. Performance Analysis

To demonstrate the error-correcting performance of the proposed BS-TMM decoding algorithm, we performed simulations for two GFs: GF(32) and GF(64). Fig. 3 illustrates the frame error rate (FER) performance for the (837, 726) NB-LDPC code over GF(32) with $d_v = 4$ and $d_c = 27$ under the additive white Gaussian noise (AWGN) channel and binary phase shift keying modulation.

[Fig. 3. FERs of the (837, 726) NB-LDPC code over GF(32) under the AWGN channel.]

[Fig. 4. FERs of the (1512, 1323) NB-LDPC code over GF(64) under the AWGN channel.]

As shown in Fig. 3, the floating-point simulation result of the BS-TMM algorithm with 15 iterations shows a minor performance loss of almost 0.1 dB, compared with the STMM algorithm [14] and the two-extra-column TMM algorithm [17]. However, the proposed BS-TMM algorithm provides low computation complexity, a large area reduction, and a significant improvement in throughput. This is explained by the fact that the $(q-1)$ messages in the extra column $\Delta Q(a)$ in [14] and [17] are constructed directly from all reliable messages of the configuration set $conf(1, 2)$ using $(q-1)$ processors, whereas in our work they are constructed on the basis of only the $p$ reliable messages in the basic set. Compared with the R-TMM algorithm [11], in which C2V messages are generated on the basis of minimum basic sets, the FER performance of the BS-TMM algorithm is almost the same.
It is noted that $d_c$ minimum basic sets are required in the R-TMM algorithm [11] to generate the C2V messages, whereas the proposed BS-TMM algorithm requires only one basic set to construct the extra column. Moreover, the sequential design implemented in [11] causes a throughput problem, whereas the proposed BS-TMM algorithm-based design performs all calculations in one clock cycle. For the purpose of hardware implementation, various fixed-point simulations were performed using different quantization schemes. A scheme with 5-bit quantization and eight iterations was chosen, which shows a performance loss of almost 0.1 dB, compared with the floating-point result at 15 iterations.

Fig. 4 presents the FER performance of the (1512, 1323) NB-LDPC code over GF(64). It can be seen that the BS-TMM algorithm has a minor performance loss of 0.07 dB and 0.14 dB in comparison with the modified trellis min-max (mT-MM) algorithm [19] and the STMM algorithm [14], respectively. This result demonstrates that the proposed BS-TMM algorithm provides good FER performance and significantly reduced computation complexity for high-order GFs.

IV. BS-TMM DECODER ARCHITECTURE

In this section, the proposed quasi-cyclic NB-LDPC decoder architectures and design technologies for the BS-TMM algorithm are described. The quasi-cyclic NB-LDPC codes over $GF(q)$ are constructed by the algebraic construction method based on array dispersions of matrices in [22], where a $(q-1) \times (q-1)$ submatrix is generated first. Then, a submatrix with size $(d_v, d_c)$ is selected from the $(q-1) \times (q-1)$ submatrix. Each field element from the $(d_v, d_c)$ submatrix is dispersed into either a zero matrix or a circulant permutation matrix (CPM) of size $(q-1) \times (q-1)$. As a result, the $H$ matrix generated from the $(d_v, d_c)$ submatrix has $M = (q-1) \times d_v$ rows and $N = (q-1) \times d_c$ columns.

A. CNU Architecture

The top-level CNU architecture for the BS-TMM algorithm is shown in Fig.
5, where each module corresponds to a step in Algorithm 3. The transformation module converts V2C messages from the normal domain to the delta domain using the control signals $z_j$. This module is constructed by means of $d_c$ reordering networks, as shown in [23], where each reordering network requires $q \times \log_2 q$ $w$-bit multiplexers. The

check node syndrome $\beta$ is generated by a tree adder structure. The delta-to-normal domain transformation is derived later using $d_c$ reordering networks with the control signals $z_j^* = z_j + \beta$. The minimum-finding function is responsible for finding the first two minimum values and the index of the first minimum value from $d_c$ inputs using the 2-min finder. The 2-min finder adopts the technique in [24], which provides a good tradeoff between area and latency. Because the $(q-1)$ rows in the delta trellis other than the first row must perform this function, a total of $(q-1)$ 2-min finders are required. Fig. 6 shows an example of the 2-min finder architecture with eight inputs.

[Fig. 5. Top-level CNU architecture for BS-TMM algorithm.]

[Fig. 6. Two-min finder architecture with eight inputs [24].]

In [14]–[19], the values in the extra column $\Delta Q(a)$ and the path information are generated by means of the first minimum values $m1(a)$ and their column indexes $I_{col}(a)$. A total of $(q-1)$ processors are required to generate the LLR values and the path information for the $(q-1)$ nonzero elements in $\Delta Q(a)$. Each processor is responsible for constructing $q/2$ possible paths to find the LLR value and the path information of one nonzero field element, in the case of using the configuration set $conf(1, 2)$. For higher order GFs (increasing values of $q$), constructing the extra column $\Delta Q(a)$ becomes more complex and costly in terms of area. In our work, a basic set $B$ including the LLR values and the column indexes of only $p = \log_2 q$ independent field elements needs to be constructed, which provides a large reduction of not only the area but also the messages exchanged between check node and variable node. It is noted that, in our work, multiple nodes can come from the same column stage. This causes a negligible performance loss, as shown in [11].

[Fig. 7. (a) Third element in the basic set. (b) Fourth element in the basic set.]
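The behavior expected from a 2-min finder can be modeled with a pairwise tree reduction, as in the sketch below. This is only a functional illustration: the tree shape, the tie-breaking rule, and the names `merge` and `two_min_tree` are assumptions, and the structure is not necessarily that of the architecture in [24]. Column indexes are zero-based here.

```python
from math import inf


def merge(a, b):
    """Combine two partial results (m1, index of m1, m2)."""
    a1, ai, a2 = a
    b1, bi, b2 = b
    if a1 <= b1:
        # a holds the overall minimum; the runner-up is a2 or b's minimum.
        return a1, ai, min(a2, b1)
    return b1, bi, min(b2, a1)


def two_min_tree(values):
    """Return (first minimum, its column index, second minimum).

    values must be non-empty; leaves start with an infinite second minimum.
    """
    nodes = [(v, j, inf) for j, v in enumerate(values)]
    while len(nodes) > 1:
        nxt = [merge(nodes[i], nodes[i + 1])
               for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:          # odd count: carry the last node up
            nxt.append(nodes[-1])
        nodes = nxt
    return nodes[0]


# One trellis row with eight inputs (illustrative values only).
row_result = two_min_tree([7, 3, 9, 1, 4, 8, 2, 6])
```

In the CNU, one such finder would be applied to each of the $(q-1)$ nonzero rows of the delta trellis, each row having $d_c$ inputs.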
For example, the trellis in Fig. 2 shows that two independent field elements in the basic set, $(1, 1, \alpha^3)$ and $(2, 1, \alpha^0)$, come from the same column stage.

The architecture of the basic set constructor corresponds to the steps in Algorithm 4. A parallel sorting approach in [20] is applied in this paper to simultaneously generate the rearranged minimum values $m1'(a)$ and their indexes $I'_{col}(a)$ in ascending order of the $m1(a)$ values in one clock cycle. Then, the first two field elements in the basic set are selected from the first two field elements in the rearranged values, that is, $(m1_1, I_1, a_1) = (m1'(a'_1), I'_{col}(a'_1), a'_1)$ and $(m1_2, I_2, a_2) = (m1'(a'_2), I'_{col}(a'_2), a'_2)$. In this paper, we propose an architecture to obtain the $(p-2)$ remaining independent field elements in a parallel way. Thus, the $p$ independent field elements in the basic set $B$ are calculated in one clock cycle. Fig. 7 shows the proposed architectures for the next two elements in the basic set. In Fig. 7(a), the architecture is designed to generate the third element, where the combination of the first two elements, $a_1 + a_2$, is removed from the remaining rearranged field elements $\{a'_3, a'_4, \ldots, a'_{q-1}\}$ by assigning the maximum

quantization bits instead of the LLR value $m1'(a'_j)$. The min1-finder architecture is responsible for finding the smallest value and its index from $(q-3)$ inputs. Then, the smallest value is the LLR value of the third element $m1_3$, and its index is used to obtain the field element $a_3$ and the column index $I_3$. The signals $c1_3, c1_4, \ldots, c1_{q-1}$ are used to eliminate the combination of the previous field elements, $a_1 + a_2$, in finding the current element and the next element. Immediately afterward, the fourth element is generated, as shown in Fig. 7(b), which is independent of the previous elements. To ensure the independence, two eliminations are made. First, the input field elements in this stage are either the rearranged field elements $\{a'_3, a'_4, \ldots, a'_{q-1}\}$ or the zero element with $p$ bits, depending on the control signals $c1_3, c1_4, \ldots, c1_{q-1}$. Second, the combinations of the previous elements with the third element, $\{a_1 + a_3, a_2 + a_3, a_1 + a_2 + a_3\}$, and the third element $a_3$ itself are eliminated. The control signals $c2_3, c2_4, \ldots, c2_{q-1}$ are responsible for both eliminations in finding the fourth element, and further in finding the next field element. Finding the LLR value $m1_4$, field element $a_4$, and column index $I_4$ of the fourth element is similar to that of the third element. This procedure is the same for the remaining field elements in the basic set. Finally, the $p$ independent field elements in the basic set $B$ are generated simultaneously.

[TABLE II: Synthesis results for the proposed CNU architecture]

[Fig. 8. $E(a)$ complement generator for GF(8).]

The architecture of the $E(a)$ complement generator for GF(8) is designed to generate $(q-1)$ LLR values in parallel, as shown in Fig. 8. The one-hot function generates a group of $q$ bits, where only the bit at location $a_j$ is equal to 1, and all the other bits are equal to 0. Therefore, the control signal $e[0:7]$ has $p$ high bits.
The high bit locations correspond to the field elements generated from a one-deviation path, and for these elements the complement values $E(a)$ are assigned to $m2(a)$. Otherwise, the field elements are generated from two or more deviations, and the complement values $E(a)$ are assigned to $m1(a)$.

The outputs of the proposed CNU architecture, including $z_n$, $E(a)$, and $B = \{m1_l, I_l, a_l\}_{1 \le l \le p}$, are used to generate the C2V messages $R_{mn}(a)$ according to Algorithm 5 in the variable node processing. Thus, the total number of bits exchanged from the check node to the variable node is $d_c \times p + (q-1) \times w + p \times (w + \log_2(d_c) + p)$. The three terms account for the $d_c$ symbols $z_n$, the $(q-1)$ values of $E(a)$, and the $p$ triples of the basic set, respectively.

The synthesis results of the proposed CNU architecture for the (837, 726) NB-LDPC code over GF(32) and the (1512, 1323) NB-LDPC code over GF(64), obtained using the Synopsys design tools and a TSMC 90-nm CMOS standard cell library, are presented in Table II. Compared with the works in [14] and [15], the proposed CNU reduces the gate count by 51.22% and 34.64% for GF(32), respectively, because the $(q-1)$ processors for finding the extra column [14], [15] are removed and the compression technique [16] is applied. Compared with the original TMM algorithm with the compressed messages in [16], the area saving is almost 30% for GF(32) and 37% for GF(64). This is due to the complexity reduction obtained by finding the basic set with a size of $p = \log_2 q$ instead of finding the sets corresponding to the extra column with a size of $q$ [16].

In [20], $L = 4$ is chosen for designing the CNU architecture in both GF(32) and GF(64), while $\log_2(32) = 5$ and $\log_2(64) = 6$ values are kept in the proposed CNU architecture for GF(32) and GF(64), respectively. The number of bits exchanged between the check node and the variable node in [20] is similar to that in the proposed work. However, the proposed CNU architecture reduces the area by 18.7% for GF(32) and 14.2% for GF(64).
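To give a feel for the size of the compressed C2V message, the expression for the exchanged bits can be evaluated directly. The sketch below does so for the GF(64) code with $d_c = 24$. Two assumptions are made: the quantization width $w = 6$ is a hypothetical value, and the column-index field is rounded up to $\lceil \log_2 d_c \rceil$ bits because $d_c$ is not a power of two. The uncompressed figure $q \times d_c \times w$ is the per-check-node share of the conventional CNMEM size quoted for [14].

```python
import math


def c2v_bits_compressed(q, d_c, w):
    """Bits per check node for z_n, E(a), and the basic set B."""
    p = q.bit_length() - 1                  # p = log2(q) for q a power of two
    idx_bits = math.ceil(math.log2(d_c))    # assumed rounding of log2(d_c)
    return d_c * p + (q - 1) * w + p * (w + idx_bits + p)


def c2v_bits_uncompressed(q, d_c, w):
    """Bits per check node when all q * d_c LLR values are exchanged."""
    return q * d_c * w


compressed = c2v_bits_compressed(q=64, d_c=24, w=6)      # w = 6 is hypothetical
uncompressed = c2v_bits_uncompressed(q=64, d_c=24, w=6)
print(compressed, uncompressed)  # 624 9216
```

Under these assumptions the compressed message is roughly one fifteenth of the uncompressed one. The exact ratio depends on the quantization width actually used.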
This area reduction is achieved because the work in [20] requires $(q-1)$ processors to calculate the $(q-1)$ elements of the complement set $E(a)$, while the proposed work uses only one module to calculate them. In addition, this improvement would be considerably larger if the $L$ values in [20] were chosen to match the proposed CNU design. Compared with [17], since the proposed CNU needs to find only $p = \log_2 q$ elements instead of the $q$ elements of two extra columns, the CNU complexity is reduced by 11.6%.

B. Decoder Architecture

In this section, a complete decoder architecture based on the BS-TMM algorithm is designed for NB-LDPC codes. The proposed decoder architecture achieves a large reduction in area because of the large area reduction in the CNU architecture. In addition, the throughput improves, since reducing the number of wires between the check node and variable node processors mitigates the routing congestion. The layered min-max algorithm for the proposed decoder is presented in Algorithm 6, where the BS-TMM in Algorithm 3 is implemented in the check node processor. In addition, the decompression network (DN) corresponding to Algorithm 5 is implemented in the variable node processor to generate the C2V messages $R_{mn}(a)$ from the outputs of the CNU architecture. The DN has three parts: 1) generating the LLR values of the extra column $Q(a)$ and the path information $d(a)$, with a maximum of $p$ deviations, on the basis of the basic set $B = \{m1_l, I_l, a_l\}_{1 \le l \le p}$; 2) generating the C2V messages in the delta domain, $R_{mn}(a)$, on the basis of $Q(a)$, $E(a)$, and $d(a)$; and 3) converting the C2V messages from the delta domain to the normal domain. Note that two DNs are required in the variable node processor. Even so, the area of the proposed decoder is much lower than that of the conventional decoders [14], [15].

Algorithm 6. Proposed Layered Decoding Algorithm.

Fig. 9. Proposed extra column and path information generator for GF(8). (a) Extra column generator for the jth element. (b) Control signal generator. (c) Path information generator for the jth element.

First, Fig. 9 shows the proposed extra column and path information generator for GF(8). The LLR value of each element in the extra column, $Q(a_j)$, is selected from one of the $p$ LLR values $m1_l$ $(1 \le l \le p)$ in the basic set $B$, depending on the $p$ control signals $s_l[j]$ $(1 \le l \le p)$, as shown in Fig. 9(a). A total of $(q-1)$ instances of the architecture in Fig. 9(a) are required to compute the $(q-1)$ messages in $Q(a)$ simultaneously. The $p$ control signals $s_l[j]$ $(1 \le l \le p)$ are generated using the architecture in Fig. 9(b). To compute $Q(a_j)$, only one of the control signals $s_l[j]$ is equal to 1, and the others are equal to 0. Thus, only one of the $p$ LLR values is selected for the output $Q(a_j)$. In order to calculate the $p$ control signals $s_l[j]$, the $2^p - 1 = q - 1$ combinations of the $p$ field elements in the basic set $B$, excluding the zero element, are divided into $p$ groups, as shown in Fig. 9(b). The $p$ control signals $s_l[j]$ correspond to the $p$ outputs of the groups. The $l$th group contains the field element $a_l$ and its combinations with all possible combinations of the previous field elements $a_k$ $(0 < k < l)$. Therefore, the $l$th group $(l > 0)$ includes $2^{l-1}$ combinations of the field elements.
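The grouping behind the control signals $s_l[j]$ can be illustrated in a few lines of Python. The sketch enumerates all nonempty combinations of the basic-set elements, places each resulting field element in the group of its highest-indexed constituent, and assigns $Q(a)$ the LLR $m1_l$ of that group. It assumes that field elements are $p$-bit integers with XOR as addition and that the basic set is listed in ascending order of $m1_l$. The function name and the sample basic set are hypothetical; this is a software illustration, not the circuit of Fig. 9.

```python
def extra_column(basis):
    """Map every nonzero field element to (Q value, group index l).

    basis: list of (m1_l, I_l, a_l) triples, ascending in m1_l.
    """
    p = len(basis)
    q_col = {}
    group = {}
    for mask in range(1, 1 << p):          # every nonempty subset of the basic set
        a = 0
        for l in range(p):
            if (mask >> l) & 1:
                a ^= basis[l][2]           # field addition of the selected elements
        top = mask.bit_length() - 1        # 0-based index of the last selected element
        q_col[a] = basis[top][0]           # exactly one m1_l is selected (one-hot s_l)
        group[a] = top + 1                 # 1-based group index l
    return q_col, group


# Hypothetical GF(8) basic set: (m1_l, I_l, a_l)
basis = [(1, 1, 2), (2, 1, 4), (4, 2, 7)]
q_col, group = extra_column(basis)
sizes = [sum(1 for g in group.values() if g == l) for l in (1, 2, 3)]
print(q_col)  # {2: 1, 4: 2, 6: 2, 7: 4, 5: 4, 3: 4, 1: 4}
print(sizes)  # [1, 2, 4]
```

The group sizes 1, 2, 4 match the $2^{l-1}$ count stated above, and together they cover all $q - 1 = 7$ nonzero elements of GF(8).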
In addition, the $(q-1)$ path information vectors corresponding to the $(q-1)$ field elements are constructed, where each path information $d[j]$ has $p$ column indexes $d_l[j]$ $(1 \le l \le p)$. A total of $(q-1)$ instances of the architecture in Fig. 9(c) are required to compute the $(q-1)$ path information vectors. For the $p$ field elements $a_l$ $(1 \le l \le p)$ in the basic set, the paths have one deviation, and thus the $p$ values of the path information $\{d_l[j]\}_{1 \le l \le p}$ are all equal to the column index $I_l$. The path of the field element generated by the combination of all field elements in the basic set has the maximum of $p$ deviations; thus, the $p$ values of the path information $d_l[j]$ $(1 \le l \le p)$ correspond to the $p$ column indexes $I_l$ $(1 \le l \le p)$ in the basic set. For the other field elements, generated by the remaining combinations, the number of deviations is $k$ $(1 < k < p)$; then $d_l[j]$ is assigned the column index $I_l$ for $1 \le l \le k$, and $d_l[j]$ is assigned the column index $I_k$ for $k < l \le p$.

Fig. 10. Proposed C2V message generator for GF(8) with $d_c = 4$.

Fig. 10 presents the proposed C2V message generator for GF(8) with $d_c = 4$. The C2V messages $R_{mj}(a)$ $(1 \le j \le d_c)$ are generated simultaneously, each taking either $Q(a)$ or $E(a)$ from the output of a multiplexer. The control signals for the multiplexers depend on the column indexes and the $p$ deviations of the path information. If the column index $j$ $(1 \le j \le d_c)$ is equal to at least one of the $p$ deviations $d_l(a)$ $(1 \le l \le p)$, then the output of the multiplexer is the complement value $E(a)$. Otherwise, the output of the multiplexer is $Q(a)$.

Fig. 11. Top-level decoder architecture based on the BS-TMM algorithm.

Fig. 11 shows the top-level decoder architecture for the proposed layered decoding algorithm, where one row of H, corresponding to one layer, is processed in one clock cycle. The decoder architecture is divided into a variable node processor and a check node processor. To start the decoding process, the LLR messages from the channel information $L_n(a)$ are loaded into the variable node memory (VNMEM). From the next layer and the next iteration onward, the output messages of the variable node processor, $Q^{k,l}_n(a)$, are stored in the VNMEM. The VNMEM includes $d_c$ memories with a depth of $(q-1)$, the size of the CPM [22], and a width of $q \times w$ bits. At each decoding step, one address is read from and one address is written to each memory. The permutation and depermutation of the variable messages in steps 4 and 9 of Algorithm 6 are implemented by modules $P$ and $P^{-1}$, respectively. Each module requires $d_c \times (q-1) \times \log_2 q$ multiplexers of $w$ bits to permute or depermute $d_c$ vectors of $(q-1)$ messages, and the control signals are based on the nonzero values $h_{mn}$ of H. The normalization module N is responsible for finding the most reliable messages and their locations $z_n$, and for generating the $Q^{k,l}_{mn}(a)$ messages for the inputs of the check node processor. In addition, normalization ensures that the smallest value in each LLR vector $Q^{k,l}_{mn}(a)$ is always equal to zero. At the last decoding iteration, the $z_n$ values are the hard-decision symbols $c_n$ stored in the output memory, and the $P$ module and subtractor are inactive during this process.

Since a layered decoding scheme is used, the outputs of the check node processor in one iteration must be stored in the check node memory (CNMEM) for the next iteration. Thus, the CNMEM in the proposed decoder has a depth of $M$ and a width of $p \times (w + \log_2(d_c) + p) + (q-1) \times w + d_c \times p$ bits, corresponding to the output bits of the check node processor. A total of $M \times [p \times (w + \log_2(d_c) + p) + (q-1) \times w + d_c \times p]$ bits are stored in one iteration. Compared with the $M \times q \times d_c \times w$ bits stored in the CNMEM in the conventional approach [14], the memory requirement for the CNMEM in the proposed decoder is greatly reduced, which leads to a large reduction in decoder area.

V. IMPLEMENTATION RESULTS AND COMPARISON

To illustrate the efficiency of our proposal for NB-LDPC codes, especially for high-order GFs, the complete decoder architectures were implemented for two codes: the (837, 726) NB-LDPC code over GF(32) and the (1512, 1323) NB-LDPC code over GF(64). Verilog HDL was used to model the architectures, and Synopsys design tools with the TSMC 90-nm CMOS standard cell library were used to implement the proposed decoder architectures. The throughput $Tp$ of the decoders is given by (5), where $seg$ is the number of pipeline stages used in the decoder architecture to improve the timing. In the proposed decoder architectures, $seg = 9$ was chosen to obtain a balance between throughput and area.

$$Tp = \frac{f_{clk}\,[\text{MHz}] \times (q-1) \times d_c \times p}{I_{max} \times (M + d_v \times seg) + (q-1)}\ [\text{Mbps}]. \quad (5)$$

Table III shows the implementation results of the proposed decoder in comparison with the other state-of-the-art works for the (837, 726) NB-LDPC code over GF(32). The proposed decoder outperforms the other approaches in both area and throughput. Compared with the STMM algorithm with uncompressed messages [14], our work has almost 8.3 times higher efficiency and reduces the gate count by a factor of 4.3. This improvement is achieved by the reduction in both the storage bits in the CNMEM and the CNU complexity, as explained previously.
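As a quick numerical check, expression (5) can be evaluated for the GF(64) code, which uses $(d_v, d_c) = (3, 24)$ and $seg = 9$. The sketch below is only illustrative: the clock frequency of 300 MHz and the iteration count of 10 are placeholder values rather than figures from the implementation results, $I_{max}$ is read as the maximum number of iterations, and $M = d_v \times (q-1) = 189$ is an assumption based on one CPM row block per unit of $d_v$.

```python
def throughput_mbps(f_clk_mhz, q, d_c, d_v, m_rows, i_max, seg):
    """Evaluate Tp from (5): codeword bits divided by decoding cycles, times f_clk."""
    p = q.bit_length() - 1                                # p = log2(q)
    codeword_bits = (q - 1) * d_c * p                     # numerator of (5) without f_clk
    cycles = i_max * (m_rows + d_v * seg) + (q - 1)       # denominator of (5)
    return f_clk_mhz * codeword_bits / cycles


# Placeholder operating point: f_clk and i_max are hypothetical values.
tp = throughput_mbps(f_clk_mhz=300.0, q=64, d_c=24, d_v=3,
                     m_rows=189, i_max=10, seg=9)
print(round(tp, 2))  # 1224.29
```

The pipeline term $d_v \times seg$ adds 27 cycles per iteration to the 189 rows in this setting, which shows how the choice of $seg$ trades clock frequency against cycle count.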
Compared with [11], our proposal not only reduces the gate count but also increases the throughput because of its reduced complexity and parallel processing in the CNU. Thus, the proposed decoder achieves almost 9.4 times higher efficiency. In [20], a reduced-complexity NB-LDPC decoder was proposed on the basis of reducing the size of the intrinsic information and the path coordinates to $L < q$ values, and the decoder performance depends on the selected $L$ value, whereas our approach reduces the size of these sets to $p = \log_2 q$ values for any GF. Because the complexity of the proposed CNU is reduced, the efficiency of the proposed decoder with $p = 5$ is almost 1.7 times higher than that in [20] implemented with $L = 4$. Compared with the decoders in [16], [19], and [17], the proposed decoder reduces the gate count by 40%, 35.4%, and 5.5%, and achieves 53%, 44.7%, and 4.5% higher efficiency, respectively. Moreover, the proposed decoder is almost 14.4 times more efficient than that of [25].

In our work, the (1512, 1323) NB-LDPC code over GF(64) is constructed from the submatrix $(d_v, d_c) = (3, 24)$ and a CPM of size $(q-1) \times (q-1)$ [22], which gives the same code rate as the (1536, 1344) NB-LDPC code in previous works, as shown in Table IV. Note that the size of the CPM in previous works is $q \times q$ instead of $(q-1) \times (q-1)$. The synthesis results of the proposed decoder for the (1512, 1323) NB-LDPC code and the comparison with previous works are presented in Table IV. For a fair comparison with previous works in terms of throughput, the clock frequency after placing and routing the design was reduced following the work in [20]. The proposed decoder reduces the gate count by 57% and achieves almost 3.8 times higher efficiency, compared with the work in [14]. Compared with the works with compressed messages [16], [19], the proposed decoder improves not only the gate count but also the throughput, because of the large reduction in the CNU complexity and in the messages exchanged between the check node and the variable node, which helps to mitigate the routing congestion. As a result, the proposed decoder reduces the gate count by 37.4% and 48.4%, and achieves 53.2% and 61.4% higher efficiency, compared with [16] and [19], respectively. Moreover, the proposed decoder exhibits almost 38.6% higher efficiency, compared with the work in [20] with $L = 5$ for codes in GF(64).

TABLE III. Comparison of the proposed decoder with other works for the (837, 726) NB-LDPC code over GF(32).

TABLE IV. Implementation results of the proposed decoder for the (1512, 1323) NB-LDPC code over GF(64) in a 90-nm CMOS process.

VI. CONCLUSION

In this paper, we propose a novel basic-set trellis min-max algorithm for decoding NB-LDPC codes that reduces the complexity of the CNU architecture, the messages exchanged between the check node and the variable node, and the storage bits in the CNMEM, compared with previous works. The implementation results show that the decoder architecture based on the proposed algorithm provides a large area reduction and a throughput improvement, compared with the other state-of-the-art works. In addition, the results for the NB-LDPC code over GF(64) demonstrate that the proposed algorithm is especially efficient for high-rate NB-LDPC codes with high-order GFs.

REFERENCES

[1] M. C. Davey and D. MacKay, "Low-density parity check codes over GF(q)," IEEE Commun. Lett., vol. 2, no. 6, Jun.
[2] R. Peng and R.-R. Chen, "WLC45-2: Application of nonbinary LDPC codes for communication over fading channels using higher order modulations," in Proc. IEEE Global Telecommun. Conf. (GLOBECOM), Nov./Dec. 2006.
[3] M. Arabaci, I. B. Djordjevic, L. Xu, and T. Wang, "Nonbinary LDPC-coded modulation for high-speed optical fiber communication without bandwidth expansion," IEEE Photon. J., vol. 4, no. 3, Jun.
[4] C. A. Aslam, Y. L. Guan, and K. Cai, "Non-binary LDPC code with multiple memory reads for multi-level-cell (MLC) flash," in Proc. Asia Pacific Signal Inf. Process. Assoc., Annu. Summit Conf. (APSIPA), 2014.
[5] L. Barnault and D. Declercq, "Fast decoding algorithm for LDPC over GF(2^q)," in Proc. IEEE Inf. Theory Workshop, Mar./Apr. 2003.
[6] H. Wymeersch, H. Steendam, and M. Moeneclaey, "Log-domain decoding of LDPC codes over GF(q)," in Proc. IEEE Int. Conf. Commun., vol. 2, Jun. 2004.
[7] D. Declercq and M. Fossorier, "Decoding algorithms for nonbinary LDPC codes over GF(q)," IEEE Trans. Commun., vol. 55, no. 4, Apr.
[8] V. Savin, "Min-max decoding for non binary LDPC codes," in Proc. IEEE Int. Symp. Inf. Theory, Jul. 2008.
[9] X. Zhang and F. Cai, "Reduced-complexity decoder architecture for nonbinary LDPC codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 19, no. 7, Jul.
[10] K. He, J. Sha, and Z. Wang, "Nonbinary LDPC code decoder architecture with efficient check node processing," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 59, no. 6, Jun.

[11] F. Cai and X. Zhang, "Relaxed min-max decoder architectures for nonbinary low-density parity-check codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 11, Nov.
[12] E. Li, D. Declercq, and K. Gunnam, "Trellis-based extended min-sum algorithm for non-binary LDPC codes and its hardware structure," IEEE Trans. Commun., vol. 61, no. 7, Jul.
[13] E. Li, F. García-Herrero, D. Declercq, K. Gunnam, J. O. Lacruz, and J. Valls, "Low latency T-EMS decoder for non-binary LDPC codes," in Conf. Rec. 47th Asilomar Conf. Signals, Syst. Comput. (ASILOMAR), Nov. 2013.
[14] J. O. Lacruz, F. García-Herrero, D. Declercq, and J. Valls, "Simplified trellis min-max decoder architecture for nonbinary low-density parity-check codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 9, Sep.
[15] J. O. Lacruz, F. García-Herrero, J. Valls, and D. Declercq, "One minimum only trellis decoder for non-binary low-density parity-check codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 62, no. 1, Jan.
[16] J. O. Lacruz, F. García-Herrero, and J. Valls, "Reduction of complexity for nonbinary LDPC decoders with compressed messages," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 11, Nov.
[17] H. P. Thi and H. Lee, "Two-extra-column trellis min-max decoder architecture for nonbinary LDPC codes," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 5, May.
[18] J. O. Lacruz, F. García-Herrero, M. J. Canet, J. Valls, and A. Pérez-Pascual, "A 630 Mbps non-binary LDPC decoder for FPGA," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2015.
[19] J. O. Lacruz, F. García-Herrero, M. J. Canet, and J. Valls, "High-performance NB-LDPC decoder with reduction of message exchange," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 5, May.
[20] J. O. Lacruz, F. García-Herrero, M. J. Canet, and J. Valls, "Reduced-complexity nonbinary LDPC decoder for high-order Galois fields based on trellis min-max algorithm," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 24, no. 8, Aug.
[21] Y.-L. Ueng, K.-H. Liao, H.-C. Chou, and C.-J. Yang, "A high-throughput trellis-based layered decoding architecture for non-binary LDPC codes using max-log-QSPA," IEEE Trans. Signal Process., vol. 61, no. 11, Jun.
[22] B. Zhou, J. Kang, S. Song, S. Lin, K. Abdel-Ghaffar, and M. Xu, "Construction of non-binary quasi-cyclic LDPC codes by arrays and array dispersions," IEEE Trans. Commun., vol. 57, no. 6, Jun.
[23] J. Lin, J. Sha, Z. Wang, and L. Li, "Efficient decoder design for nonbinary quasicyclic LDPC codes," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 5, May.
[24] C.-L. Wey, M.-D. Shieh, and S.-Y. Lin, "Algorithms of finding the first two minimum values and their hardware implementation," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 55, no. 11, Dec.
[25] X. Chen and C.-L. Wang, "High-throughput efficient non-binary LDPC decoder based on the simplified min-sum algorithm," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 11, Nov.

Huyen Pham Thi (M'14) received the B.S. degree from the Department of Information and Communication Engineering, Military Technical Academy, Ha Noi, Vietnam. She is currently working toward the integrated M.S. and Ph.D. degree with the Department of Information and Communication Engineering, Inha University, Incheon, South Korea. Her current research interests include algorithms and VLSI architecture design for digital signal processing, forward error correction architectures, and communication systems.

Hanho Lee (M'98–SM'13) received the Ph.D. and M.S. degrees in electrical and computer engineering from the University of Minnesota, Minneapolis, MN, USA, in 2000 and 1996, respectively. From 2000 to 2002, he was a Member of Technical Staff with Lucent Technologies (Bell Labs Innovations), Allentown, PA, USA. From 2002 to 2004, he was an Assistant Professor with the Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT, USA. Since 2004, he has been with the Department of Information and Communication Engineering, Inha University, Incheon, Korea, where he is currently a Professor. From 2010 to 2011, he was a Visiting Scholar with Bell Labs, Alcatel-Lucent, Murray Hill, NJ, USA. His current research interests include VLSI architecture design for forward error correction coding, cryptography, VLSI signal processing, and digital communications.


Performance Analysis of Gray Code based Structured Regular Column-Weight Two LDPC Codes IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-issn: 2278-2834,p- ISSN: 2278-8735.Volume 11, Issue 4, Ver. III (Jul.-Aug.2016), PP 06-10 www.iosrjournals.org Performance Analysis

More information

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes

Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes Non-recursive complexity reduction encoding scheme for performance enhancement of polar codes 1 Prakash K M, 2 Dr. G S Sunitha 1 Assistant Professor, Dept. of E&C, Bapuji Institute of Engineering and Technology,

More information

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1

New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 New Message-Passing Decoding Algorithm of LDPC Codes by Partitioning Check Nodes 1 Sunghwan Kim* O, Min-Ho Jang*, Jong-Seon No*, Song-Nam Hong, and Dong-Joon Shin *School of Electrical Engineering and

More information

A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S2 LDPC DECODER

A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S2 LDPC DECODER 18th European Signal Processing Conference (EUSIPCO-010) Aalborg, Denmark, August -7, 010 A NOVEL HARDWARE-FRIENDLY SELF-ADJUSTABLE OFFSET MIN-SUM ALGORITHM FOR ISDB-S LDPC DECODER Wen Ji, Makoto Hamaminato,

More information

On the construction of Tanner graphs

On the construction of Tanner graphs On the construction of Tanner graphs Jesús Martínez Mateo Universidad Politécnica de Madrid Outline Introduction Low-density parity-check (LDPC) codes LDPC decoding Belief propagation based algorithms

More information

Reduced Complexity of Decoding Algorithm for Irregular LDPC Codes Using a Split Row Method

Reduced Complexity of Decoding Algorithm for Irregular LDPC Codes Using a Split Row Method Journal of Wireless Networking and Communications 2012, 2(4): 29-34 DOI: 10.5923/j.jwnc.20120204.01 Reduced Complexity of Decoding Algorithm for Irregular Rachid El Alami *, Mostafa Mrabti, Cheikh Bamba

More information

A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder

A Memory Efficient FPGA Implementation of Quasi-Cyclic LDPC Decoder Proceedings of the 5th WSEAS Int. Conf. on Instrumentation, Measurement, Circuits and Systems, angzhou, China, April 6-8, 26 (pp28-223) A Memory Efficient FPGA Implementation of Quasi-Cyclic DPC Decoder

More information

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation

Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation Overlapped Scheduling for Folded LDPC Decoding Based on Matrix Permutation In-Cheol Park and Se-Hyeon Kang Department of Electrical Engineering and Computer Science, KAIST {icpark, shkang}@ics.kaist.ac.kr

More information

Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) Codes for Deep Space and High Data Rate Applications

Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) Codes for Deep Space and High Data Rate Applications Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) Codes for Deep Space and High Data Rate Applications Nikoleta Andreadou, Fotini-Niovi Pavlidou Dept. of Electrical & Computer Engineering Aristotle University

More information

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm

A New MIMO Detector Architecture Based on A Forward-Backward Trellis Algorithm A New MIMO etector Architecture Based on A Forward-Backward Trellis Algorithm Yang Sun and Joseph R Cavallaro epartment of Electrical and Computer Engineering Rice University, Houston, TX 775 Email: {ysun,

More information

LOW-density parity-check (LDPC) codes have attracted

LOW-density parity-check (LDPC) codes have attracted 2966 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 12, DECEMBER 2004 LDPC Block and Convolutional Codes Based on Circulant Matrices R. Michael Tanner, Fellow, IEEE, Deepak Sridhara, Arvind Sridharan,

More information

98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011

98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011 98 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 58, NO. 1, JANUARY 2011 Memory System Optimization for FPGA- Based Implementation of Quasi-Cyclic LDPC Codes Decoders Xiaoheng Chen,

More information

Capacity-approaching Codes for Solid State Storages

Capacity-approaching Codes for Solid State Storages Capacity-approaching Codes for Solid State Storages Jeongseok Ha, Department of Electrical Engineering Korea Advanced Institute of Science and Technology (KAIST) Contents Capacity-Approach Codes Turbo

More information

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors

A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors A Software LDPC Decoder Implemented on a Many-Core Array of Programmable Processors Brent Bohnenstiehl and Bevan Baas Department of Electrical and Computer Engineering University of California, Davis {bvbohnen,

More information

Optimized Graph-Based Codes For Modern Flash Memories

Optimized Graph-Based Codes For Modern Flash Memories Optimized Graph-Based Codes For Modern Flash Memories Homa Esfahanizadeh Joint work with Ahmed Hareedy and Lara Dolecek LORIS Lab Electrical Engineering Department, UCLA 10/08/2016 Presentation Outline

More information

Delay efficient mux approach for finding the first two minimum values

Delay efficient mux approach for finding the first two minimum values Delay efficient mux approach for finding the first two minimum values Nakka Sivaraju¹ S Suman² ¹PG scholar, ECE Department, CEC, AP, India ²Assistant Professor, ECE Department, CEC, AP, India Abstract

More information

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation

A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Journal of Automation and Control Engineering Vol. 3, No. 1, February 20 A VLSI Architecture for H.264/AVC Variable Block Size Motion Estimation Dam. Minh Tung and Tran. Le Thang Dong Center of Electrical

More information

Design of Cages with a Randomized Progressive Edge-Growth Algorithm

Design of Cages with a Randomized Progressive Edge-Growth Algorithm 1 Design of Cages with a Randomized Progressive Edge-Growth Algorithm Auguste Venkiah, David Declercq and Charly Poulliat ETIS - CNRS UMR 8051 - ENSEA - University of Cergy-Pontoise Abstract The progressive

More information

TURBO codes, [1], [2], have attracted much interest due

TURBO codes, [1], [2], have attracted much interest due 800 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 2, FEBRUARY 2001 Zigzag Codes and Concatenated Zigzag Codes Li Ping, Member, IEEE, Xiaoling Huang, and Nam Phamdo, Senior Member, IEEE Abstract

More information

Low Complexity Quasi-Cyclic LDPC Decoder Architecture for IEEE n

Low Complexity Quasi-Cyclic LDPC Decoder Architecture for IEEE n Low Complexity Quasi-Cyclic LDPC Decoder Architecture for IEEE 802.11n Sherif Abou Zied 1, Ahmed Tarek Sayed 1, and Rafik Guindi 2 1 Varkon Semiconductors, Cairo, Egypt 2 Nile University, Giza, Egypt Abstract

More information

Low complexity FEC Systems for Satellite Communication

Low complexity FEC Systems for Satellite Communication Low complexity FEC Systems for Satellite Communication Ashwani Singh Navtel Systems 2 Rue Muette, 27000,Houville La Branche, France Tel: +33 237 25 71 86 E-mail: ashwani.singh@navtelsystems.com Henry Chandran

More information

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver

Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver Gated-Demultiplexer Tree Buffer for Low Power Using Clock Tree Based Gated Driver E.Kanniga 1, N. Imocha Singh 2,K.Selva Rama Rathnam 3 Professor Department of Electronics and Telecommunication, Bharath

More information

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation

Efficient Implementation of Single Error Correction and Double Error Detection Code with Check Bit Precomputation http://dx.doi.org/10.5573/jsts.2012.12.4.418 JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.12, NO.4, DECEMBER, 2012 Efficient Implementation of Single Error Correction and Double Error Detection

More information

MANY image and video compression standards such as

MANY image and video compression standards such as 696 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL 9, NO 5, AUGUST 1999 An Efficient Method for DCT-Domain Image Resizing with Mixed Field/Frame-Mode Macroblocks Changhoon Yim and

More information

Check-hybrid GLDPC Codes Without Small Trapping Sets

Check-hybrid GLDPC Codes Without Small Trapping Sets Check-hybrid GLDPC Codes Without Small Trapping Sets Vida Ravanmehr Department of Electrical and Computer Engineering University of Arizona Tucson, AZ, 8572 Email: vravanmehr@ece.arizona.edu David Declercq

More information

ISSN (Print) Research Article. *Corresponding author Akilambigai P

ISSN (Print) Research Article. *Corresponding author Akilambigai P Scholars Journal of Engineering and Technology (SJET) Sch. J. Eng. Tech., 2016; 4(5):223-227 Scholars Academic and Scientific Publisher (An International Publisher for Academic and Scientific Resources)

More information

Design of a Low Density Parity Check Iterative Decoder

Design of a Low Density Parity Check Iterative Decoder 1 Design of a Low Density Parity Check Iterative Decoder Jean Nguyen, Computer Engineer, University of Wisconsin Madison Dr. Borivoje Nikolic, Faculty Advisor, Electrical Engineer, University of California,

More information

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications

A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications A Generic Architecture of CCSDS Low Density Parity Check Decoder for Near-Earth Applications Fabien Demangel, Nicolas Fau, Nicolas Drabik, François Charot, Christophe Wolinski To cite this version: Fabien

More information

A GRAPHICAL MODEL AND SEARCH ALGORITHM BASED QUASI-CYCLIC LOW-DENSITY PARITY-CHECK CODES SCHEME. Received December 2011; revised July 2012

A GRAPHICAL MODEL AND SEARCH ALGORITHM BASED QUASI-CYCLIC LOW-DENSITY PARITY-CHECK CODES SCHEME. Received December 2011; revised July 2012 International Journal of Innovative Computing, Information and Control ICIC International c 2013 ISSN 1349-4198 Volume 9, Number 4, April 2013 pp. 1617 1625 A GRAPHICAL MODEL AND SEARCH ALGORITHM BASED

More information

Optimized Min-Sum Decoding Algorithm for Low Density PC Codes

Optimized Min-Sum Decoding Algorithm for Low Density PC Codes Optimized Min-Sum Decoding Algorithm for Low Density PC Codes Dewan Siam Shafiullah, Mohammad Rakibul Islam, Mohammad Mostafa Amir Faisal, Imran Rahman, Dept. of Electrical and Electronic Engineering,

More information

RECENTLY, low-density parity-check (LDPC) codes have

RECENTLY, low-density parity-check (LDPC) codes have 892 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS, VOL. 53, NO. 4, APRIL 2006 Code Construction and FPGA Implementation of a Low-Error-Floor Multi-Rate Low-Density Parity-Check Code Decoder

More information

Adaptive Linear Programming Decoding of Polar Codes

Adaptive Linear Programming Decoding of Polar Codes Adaptive Linear Programming Decoding of Polar Codes Veeresh Taranalli and Paul H. Siegel University of California, San Diego, La Jolla, CA 92093, USA Email: {vtaranalli, psiegel}@ucsd.edu Abstract Polar

More information

Capacity-Approaching Low-Density Parity- Check Codes: Recent Developments and Applications

Capacity-Approaching Low-Density Parity- Check Codes: Recent Developments and Applications Capacity-Approaching Low-Density Parity- Check Codes: Recent Developments and Applications Shu Lin Department of Electrical and Computer Engineering University of California, Davis Davis, CA 95616, U.S.A.

More information

Piecewise Linear Approximation Based on Taylor Series of LDPC Codes Decoding Algorithm and Implemented in FPGA

Piecewise Linear Approximation Based on Taylor Series of LDPC Codes Decoding Algorithm and Implemented in FPGA Journal of Information Hiding and Multimedia Signal Processing c 2018 ISSN 2073-4212 Ubiquitous International Volume 9, Number 3, May 2018 Piecewise Linear Approximation Based on Taylor Series of LDPC

More information

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems.

Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. Design and Implementation of Signed, Rounded and Truncated Multipliers using Modified Booth Algorithm for Dsp Systems. K. Ram Prakash 1, A.V.Sanju 2 1 Professor, 2 PG scholar, Department of Electronics

More information

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR)

Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) Optimized ARM-Based Implementation of Low Density Parity Check Code (LDPC) Decoder in China Digital Radio (CDR) P. Vincy Priscilla 1, R. Padmavathi 2, S. Tamilselvan 3, Dr.S. Kanthamani 4 1,4 Department

More information

Minimum-Polytope-Based Linear Programming Decoder for LDPC Codes via ADMM Approach

Minimum-Polytope-Based Linear Programming Decoder for LDPC Codes via ADMM Approach Minimum-Polytope-Based Linear Programg Decoder for LDPC Codes via ADMM Approach Jing Bai, Yongchao Wang, Member, IEEE, Francis C. M. Lau, Senior Member, IEEE arxiv:90.07806v [cs.it] 23 Jan 209 Abstract

More information

Memory Efficient Decoder Architectures for Quasi-Cyclic LDPC Codes

Memory Efficient Decoder Architectures for Quasi-Cyclic LDPC Codes Memory Efficient Decoder Architectures for Quasi-Cyclic LDPC Codes Yongmei Dai, Ning Chen and Zhiyuan Yan Department of Electrical and Computer Engineering Lehigh University, PA 805, USA E-mails: {yod30,

More information

Anbuselvi et al., International Journal of Advanced Engineering Technology E-ISSN

Anbuselvi et al., International Journal of Advanced Engineering Technology E-ISSN Research Paper ANALYSIS OF A REDUED OMPLEXITY FFT-SPA BASED NON BINARY LDP DEODER WITH DIFFERENT ODE ONSTRUTIONS Anbuselvi M, Saravanan P and Arulmozhi M Address for orrespondence, SSN ollege of Engineering

More information

Fault Tolerant Parallel Filters Based On Bch Codes

Fault Tolerant Parallel Filters Based On Bch Codes RESEARCH ARTICLE OPEN ACCESS Fault Tolerant Parallel Filters Based On Bch Codes K.Mohana Krishna 1, Mrs.A.Maria Jossy 2 1 Student, M-TECH(VLSI Design) SRM UniversityChennai, India 2 Assistant Professor

More information

Hybrid Iteration Control on LDPC Decoders

Hybrid Iteration Control on LDPC Decoders Hybrid Iteration Control on LDPC Decoders Erick Amador and Raymond Knopp EURECOM 694 Sophia Antipolis, France name.surname@eurecom.fr Vincent Rezard Infineon Technologies France 656 Sophia Antipolis, France

More information

120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014

120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014 120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 61, NO. 2, FEBRUARY 2014 VL-ECC: Variable Data-Length Error Correction Code for Embedded Memory in DSP Applications Jangwon Park,

More information

Finding Small Stopping Sets in the Tanner Graphs of LDPC Codes

Finding Small Stopping Sets in the Tanner Graphs of LDPC Codes Finding Small Stopping Sets in the Tanner Graphs of LDPC Codes Gerd Richter University of Ulm, Department of TAIT Albert-Einstein-Allee 43, D-89081 Ulm, Germany gerd.richter@uni-ulm.de Abstract The performance

More information

Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression

Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression Memory-Reduced Turbo Decoding Architecture Using NII Metric Compression Syed kareem saheb, Research scholar, Dept. of ECE, ANU, GUNTUR,A.P, INDIA. E-mail:sd_kareem@yahoo.com A. Srihitha PG student dept.

More information

Randomized Progressive Edge-Growth (RandPEG)

Randomized Progressive Edge-Growth (RandPEG) Randomized Progressive Edge-Growth (Rand) Auguste Venkiah, David Declercq, Charly Poulliat ETIS, CNRS, ENSEA, Univ Cergy-Pontoise F-95000 Cergy-Pontoise email:{venkiah,declercq,poulliat}@ensea.fr Abstract

More information

ERROR correcting codes are used to increase the bandwidth

ERROR correcting codes are used to increase the bandwidth 404 IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 37, NO. 3, MARCH 2002 A 690-mW 1-Gb/s 1024-b, Rate-1/2 Low-Density Parity-Check Code Decoder Andrew J. Blanksby and Chris J. Howland Abstract A 1024-b, rate-1/2,

More information

A Reduced Routing Network Architecture for Partial Parallel LDPC decoders

A Reduced Routing Network Architecture for Partial Parallel LDPC decoders A Reduced Routing Network Architecture for Partial Parallel LDPC decoders By HOUSHMAND SHIRANI MEHR B.S. (Sharif University of Technology) July, 2009 THESIS Submitted in partial satisfaction of the requirements

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes

Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes Majority Logic Decoding Of Euclidean Geometry Low Density Parity Check (EG-LDPC) Codes P. Kalai Mani, V. Vishnu Prasath PG Student, Department of Applied Electronics, Sri Subramanya College of Engineering

More information

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology Senthil Ganesh R & R. Kalaimathi 1 Assistant Professor, Electronics and Communication Engineering, Info Institute of Engineering,

More information

Realization of Fixed Angle Rotation for Co-Ordinate Rotation Digital Computer

Realization of Fixed Angle Rotation for Co-Ordinate Rotation Digital Computer International Journal of Innovative Research in Electronics and Communications (IJIREC) Volume 2, Issue 1, January 2015, PP 1-7 ISSN 2349-4042 (Print) & ISSN 2349-4050 (Online) www.arcjournals.org Realization

More information

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator

Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator Implementation of Efficient Modified Booth Recoder for Fused Sum-Product Operator A.Sindhu 1, K.PriyaMeenakshi 2 PG Student [VLSI], Dept. of ECE, Muthayammal Engineering College, Rasipuram, Tamil Nadu,

More information

A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders

A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC Decoders 2246 IEICE TRANS. FUNDAMENTALS, VOL.E94 A, NO.11 NOVEMBER 2011 PAPER Special Section on Smart Multimedia & Communication Systems A Fast Systematic Optimized Comparison Algorithm for CNU Design of LDPC

More information

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER Bhuvaneswaran.M 1, Elamathi.K 2 Assistant Professor, Muthayammal Engineering college, Rasipuram, Tamil Nadu, India 1 Assistant Professor, Muthayammal

More information

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code COPY RIGHT 2018IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material

More information

Efficient Markov Chain Monte Carlo Algorithms For MIMO and ISI channels

Efficient Markov Chain Monte Carlo Algorithms For MIMO and ISI channels Efficient Markov Chain Monte Carlo Algorithms For MIMO and ISI channels Rong-Hui Peng Department of Electrical and Computer Engineering University of Utah /7/6 Summary of PhD work Efficient MCMC algorithms

More information

THE orthogonal frequency-division multiplex (OFDM)

THE orthogonal frequency-division multiplex (OFDM) 26 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 1, JANUARY 2010 A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors Chen-Fong Hsiao, Yuan Chen, Member, IEEE,

More information

DESIGN OF FAULT SECURE ENCODER FOR MEMORY APPLICATIONS IN SOC TECHNOLOGY

DESIGN OF FAULT SECURE ENCODER FOR MEMORY APPLICATIONS IN SOC TECHNOLOGY DESIGN OF FAULT SECURE ENCODER FOR MEMORY APPLICATIONS IN SOC TECHNOLOGY K.Maheshwari M.Tech VLSI, Aurora scientific technological and research academy, Bandlaguda, Hyderabad. k.sandeep kumar Asst.prof,

More information

Design of Cages with a Randomized Progressive Edge-Growth Algorithm

Design of Cages with a Randomized Progressive Edge-Growth Algorithm 1 Design of Cages with a Randomized Progressive Edge-Growth Algorithm Auguste Venkiah, David Declercq and Charly Poulliat ETIS - CNRS UMR 8051 - ENSEA - University of Cergy-Pontoise Abstract The Progressive

More information

Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory

Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory Improving Min-sum LDPC Decoding Throughput by Exploiting Intra-cell Bit Error Characteristic in MLC NAND Flash Memory Wenzhe Zhao, Hongbin Sun, Minjie Lv, Guiqiang Dong, Nanning Zheng, and Tong Zhang Institute

More information

BECAUSE of their superior performance capabilities on

BECAUSE of their superior performance capabilities on 1340 IEEE TRANSACTIONS ON MAGNETICS, VOL. 41, NO. 4, APRIL 2005 Packet-LDPC Codes for Tape Drives Yang Han and William E. Ryan, Senior Member, IEEE Electrical and Computer Engineering Department, University

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN

International Journal of Scientific & Engineering Research, Volume 4, Issue 5, May-2013 ISSN 255 CORRECTIONS TO FAULT SECURE OF MAJORITY LOGIC DECODER AND DETECTOR FOR MEMORY APPLICATIONS Viji.D PG Scholar Embedded Systems Prist University, Thanjuvr - India Mr.T.Sathees Kumar AP/ECE Prist University,

More information