A Parallel Pruned Bit-Reversal Interleaver

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 8, AUGUST 2009

Mohammad M. Mansour

Abstract: A parallel algorithm and architecture for pruned bit-reversal interleaving (PBRI) are proposed. For a pruned interleaver of size $N$ with mother interleaver size $M = 2^n$, the proposed algorithm interleaves any number $x \in [0, N-1]$ in at most $n - 1$ steps, as opposed to $x$ steps using existing PBRI algorithms. A parallel architecture of the proposed algorithm, employing simple logic gates and having a short critical path delay, is presented. The proposed architecture is valuable in reducing (de-)interleaving latency in emerging wireless standards that employ PBRI channel (de-)interleaving in their PHY layer, such as the 3GPP2 Ultra Mobile Broadband standard.

Index Terms: Bit-reversal maps, channel interleavers, pruned interleavers.

I. INTRODUCTION

Channel interleaving is employed in most modern wireless communications systems to protect against burst errors [1].
A channel interleaver reshuffles encoded symbols so that consecutive symbols are spread as far apart as possible, which breaks the temporal correlation between successive symbols involved in a burst of errors. The reverse de-interleaving operation is performed at the receiver before the symbols are fed to the channel decoder. Typically, these interleavers employ some form of bit-reversal operation in generating the interleaved addresses, and have a programmable size to accommodate various encoded packet lengths. For example, the emerging Ultra Mobile Broadband (UMB) standard within the 3rd Generation Partnership Project 2 (3GPP2) [2] employs a pruned bit-reversal channel interleaver in its PHY layer to interleave any packet whose length is a multiple of 8. In pruned bit-reversal interleaving, a packet of size $N$ is interleaved by mapping $n$-bit linear addresses into $n$-bit bit-reversed addresses, where $n$ is the smallest integer such that $N \le 2^n$. Linear addresses that map to addresses outside $[0, N-1]$ are invalid addresses and get pruned out (see [3] and [4] for other pruning techniques).

The emphasis in the literature on interleavers and their architectures has been largely in the context of interleavers employed in turbo codes. Not much work has been done on architectures for PBRI channel interleavers. Bit-reversal mapping has been mainly applied to reduce row conflicts and improve hit rates in SDRAM applications [5], and to improve the shuffle permutation stages of the FFT algorithm [6], [7].

Manuscript received November 05, 2007; revised April 08. First published June 16, 2009; current version published July 22. The author was with Qualcomm Flarion Technologies, Bridgewater, NJ USA. He is currently with the Electrical and Computer Engineering Department, the American University of Beirut, Beirut, Lebanon (e-mail: mmansour@aub.edu.lb). Digital Object Identifier /TVLSI
In turbo interleavers, the emphasis has been on reducing interleaving latency by avoiding memory collisions of read/write operations by the constituent MAP decoders [8]-[11]. Software programmable turbo interleavers for multiple 3G wireless standards have been addressed in [12].

A major disadvantage of a PBRI interleaver is that, despite its simplicity, interleaved addresses must be generated sequentially. That is, in order to generate the interleaved address of a linear address $x$, the interleaved addresses of all linear addresses less than $x$ must first be generated. This follows from the fact that the number of pruned addresses that have occurred before $x$ must be known in order to know where $x$ gets mapped to. This requirement introduces a latency bottleneck, especially when (de-)interleaving long packets (e.g., 16 K in UMB [2]). In this paper, we present an algorithm that eliminates this dependency and determines any interleaved address in at most $n - 1$ steps. Moreover, the algorithm has a very simple architecture that can be constructed using basic logic gates and has a short critical path delay.

II. SEQUENTIAL PBRI ALGORITHM

A bit-reversal interleaver (BRI) maps an $n$-bit number $x$ into another $n$-bit number $y$ according to a simple bit-reversal rule such that the bits of $y$ appear in the reverse order with respect to $x$. We designate the BRI mapping on $n$ bits by the function $y = \pi_n(x)$. The values taken by $x$ and $y$ range from $0$ to $2^n - 1$, where $M \triangleq 2^n$ is the size of the interleaver. A pruned BRI maps an $n$-bit number $x$ less than $N$, where $N \le M$, into another $n$-bit number $y$ less than $N$ according to the bit-reversal rule. The size of the pruned interleaver is $N$, while the size of the mother interleaver is $M$. Note that the numbers from $N$ to $M - 1$ are pruned out of the interleaver mappings and are not considered valid mappings. We designate the PBRI mapping on $n$ bits with parameter $N$ by the function $y = \pi_{n,N}(x)$.
The mapping $\pi_{n,N}(x)$ for a given $x$ is computed sequentially by starting from $y = 0$ and maintaining the number of invalid mappings $\phi(x)$ skipped along the way. If $y + \phi(x)$ maps to a valid number (i.e., $\pi_n(y + \phi(x)) < N$), then $y$ is incremented by 1. If $y + \phi(x)$ maps to an invalid number, $\phi(x)$ is incremented by 1. These operations are repeated until $y$ reaches $x$ and $\pi_{n,N}(x)$ is valid. Algorithm 1 shows the pseudo-code of the sequential PBRI algorithm.

Algorithm 1 Sequential PBRI algorithm.
procedure PBRI-Seq($n$, $N$, $x$)
    $y \leftarrow 0$
    $\phi(x) \leftarrow 0$
    while $y \le x$ do
        if $\pi_n(y + \phi(x)) \ge N$ then
            $\phi(x) \leftarrow \phi(x) + 1$
        else
            $\pi_{n,N}(y) \leftarrow \pi_n(y + \phi(x))$
            $y \leftarrow y + 1$
        end if
    end while
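Algorithm 1 can be sketched in Python as follows. This is a minimal illustration, not code from the paper: the function names `bit_reverse` and `pbri_sequential` are hypothetical, and the example call uses the Table I parameters $n = 4$, $N = 12$.

```python
def bit_reverse(x: int, n: int) -> int:
    """Return pi_n(x): the n-bit bit-reversal of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)  # shift the current LSB of x into y
        x >>= 1
    return y


def pbri_sequential(n: int, N: int, x: int) -> list:
    """Algorithm 1 (sketch): return [pi_{n,N}(0), ..., pi_{n,N}(x)], 0 <= x < N."""
    mapping = []
    y = 0    # next linear address to map
    phi = 0  # number of invalid addresses skipped so far
    while y <= x:
        candidate = bit_reverse(y + phi, n)
        if candidate >= N:
            phi += 1  # invalid mapping: prune it and try the next integer
        else:
            mapping.append(candidate)
            y += 1
    return mapping


print(pbri_sequential(4, 12, 11))
# prints [0, 8, 4, 2, 10, 6, 1, 9, 5, 3, 11, 7]
```

The loop body runs once per integer traversed, valid or pruned, which is why the running time depends on the mother interleaver size rather than on $N$ alone.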

TABLE I. BRI and PBRI mappings for $n = 4$, $N = 12$.

TABLE II. Recursive computation of $\phi(9)$ using (2).

1) Example 1: Table I shows the mappings computed using Algorithm 1 assuming $n = 4$ and $N = 12$. Note that in this case $M = 2^4 = 16$, so 4 mappings are pruned by $\pi_{4,12}$.

We assume in the following that the size of the pruned interleaver ($N$) is more than half the size of the mother interleaver ($M$), i.e., $N > M/2$; otherwise, the problem can be reformulated such that $M$ is the smallest power of 2 greater than or equal to $N$. In addition, if $N = M$, then $\phi(x) = 0$ for all $x$ and $\pi_{n,N}(x) = \pi_n(x)$. There are no pruned integers in this case since all integers have valid mappings, and hence this case is degenerate. Hereafter, we assume that

$$\frac{M}{2} < N < M. \tag{1}$$

Note that from the definition of the bit-reversal operation and condition (1), it follows that if $\pi_n(x) \ge N$, then $\pi_n(x + 1) < N$, i.e., two consecutive numbers cannot both have invalid mappings. We can use this fact to give a recursive definition of $\phi(x)$ for $0 \le x < N$:

$$\phi(x) = \begin{cases} 0, & \text{if } x = 0; \\ \phi(x-1), & \text{if } \pi_n(x + \phi(x-1)) < N; \\ \phi(x-1) + 1, & \text{otherwise.} \end{cases} \tag{2}$$

In addition, note that if $y > x$, then $\phi(y) \ge \phi(x)$, and hence $\phi$ is a non-decreasing function.

2) Example 2: Let $N = 96$, $n = 7$, $x = 9$. $\phi(9)$ is determined recursively using (2) as shown in Table II, giving $\phi(9) = 3$.

Next, we show that Algorithm 1 always performs $M - 1$ iterations in mapping the integers in $[0, N-1]$, for any $N$ satisfying (1). That is, the algorithm traverses all the integers in $[0, 2^n - 2]$ when mapping the integers $[0, N-1]$ independent of $N$, always pruning $M - N - 1$ integers along the way.(1)

(1) Note that $M - 1$ is a palindrome, i.e., $\pi_n(M-1) = M - 1 \ge N$, so $M - 1$ maps to an invalid number. Hence the algorithm terminates before $M - 1$, which leaves only $M - N - 1$ invalid integers to be pruned.

Theorem 1: Algorithm 1 with $x = N - 1$ maps the set of integers $[0, N-1]$ into $[0, N-1]$ in exactly $M - 1$ iterations for $M/2 < N < M$. In addition, the algorithm prunes exactly $M - N - 1$ integers before it terminates. Hence, the time complexity of a PBRI is determined by the size of its mother interleaver $M$ and not $N$.

Proof: We first prove that the map $\pi_{n,N}$ is a bijection whose range is $[0, N-1]$. $\pi_{n,N}$ is the composition of two maps: $\pi_n$, and the map $f : x \mapsto x + \phi(x)$. $\pi_n$ is obviously bijective. In addition, since $\phi$ is a non-decreasing function, $f$ is an increasing function. Therefore it is an injection whose codomain is its range, and hence $f$ is bijective. Therefore, $\pi_{n,N}$ is a bijection whose range is the domain of $f$.

The number of iterations run by Algorithm 1 for $x = N - 1$ until it terminates (i.e., when $y = x$) is $y + \phi(x) + 1 = N - 1 + \phi(N-1) + 1 = N + \phi(N-1)$. Next we determine $\phi(N-1)$ by proving that the last integer in $[0, N-1]$, namely $N - 1$, always gets mapped to $M/2 - 1$, i.e., $\pi_{n,N}(N-1) = M/2 - 1$. Consider the integer $M/2 - 1$ in the range of $\pi_{n,N}$. Since $\pi_n$ is bijective, $\exists\, y_0$ such that $\pi_n(y_0) = M/2 - 1$, or $y_0 = \pi_n^{-1}(M/2 - 1) = M - 2$. Since $f$ is bijective, $\exists\, x_0 \le N - 1$ such that $f(x_0) = x_0 + \phi(x_0) = M - 2$. Assume that $x_0 \ne N - 1$, but is some number less than $N - 1$. Consider $x_1 = x_0 + 1 \le N - 1$. Since $f$ is an increasing function and $f(x_0) = M - 2$, $f(x_1)$ can only be $M - 1$. But $\pi_n(f(x_1)) = \pi_{n,N}(x_1) = M - 1 > N - 1$, which is a contradiction. Therefore, $x_0 = N - 1 = f^{-1}(M - 2) = f^{-1}(\pi_n^{-1}(M/2 - 1)) = \pi_{n,N}^{-1}(M/2 - 1)$. It follows that $f(N-1) = N - 1 + \phi(N-1) = M - 2$, and hence $\phi(N-1) = M - N - 1$. Hence, the algorithm terminates in $N + \phi(N-1) = M - 1$ iterations.

III. DETERMINING THE INVALID MAPPINGS $\phi(x)$

The time complexity of Algorithm 1 is $O(M)$, which follows directly from the fact that the number of invalid mappings $\phi(x)$ that have occurred in mapping all integers less than $x$ must first be computed in order to determine what value $x$ maps to. In the following, we present an algorithm to determine $\phi(x)$ with complexity $O(\log_2 M)$ by analyzing the bit structure of the invalid mappings.
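The recursion (2), Example 2, and the iteration count of Theorem 1 can be checked numerically with a short Python sketch. The helper names `phi_recursive` and `algorithm_1_iterations` are hypothetical, and the sketch assumes condition (1) holds, so that an invalid integer is always followed by a valid one.

```python
def bit_reverse(x: int, n: int) -> int:
    """Return pi_n(x): the n-bit bit-reversal of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def phi_recursive(n: int, N: int, x: int) -> int:
    """phi(x) from recursion (2), unrolled from phi(0) = 0 up to x."""
    phi = 0
    for v in range(1, x + 1):
        # phi(v) = phi(v-1) if pi_n(v + phi(v-1)) is valid, else phi(v-1) + 1
        if bit_reverse(v + phi, n) >= N:
            phi += 1
    return phi


def algorithm_1_iterations(n: int, N: int) -> int:
    """Iterations of Algorithm 1 for x = N - 1, i.e. N + phi(N - 1)."""
    return N + phi_recursive(n, N, N - 1)


print(phi_recursive(7, 96, 9))        # Example 2: prints 3
print(algorithm_1_iterations(4, 12))  # Theorem 1 with M = 16: prints 15
```

For the parameters of Example 1 the count is $M - 1 = 15$ even though only 12 addresses are produced.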
We first examine the quantity $\phi(x)$ in more detail. Note that $\phi(x)$ represents the minimum number of integers that must be skipped such that all integers from $0$ to $x$ have valid mappings. Equivalently, $\phi(x)$ represents the minimum number that needs to be added to $x$ such that there are exactly $x + 1$ integers in the range $[0, x + \phi(x)]$ that have valid mappings. This quantity is not necessarily equal to the number of integers up to and including $x$ that have invalid mappings, which we denote by $\sigma(x)$. In fact, $\phi(x) \ge \sigma(x)$ (see Fig. 1). This follows from the fact that for the $\sigma(x)$ integers in the range $[0, x]$ with invalid mappings, at least $\phi^{(1)}(x) \triangleq \sigma(x)$ more integers greater than $x$ must be tried to check if they have valid mappings. But the numbers from $x + 1$ to $x + \phi^{(1)}(x)$ can in turn have invalid mappings that must be taken into account. So $\phi(x)$ is at least equal to the number of invalid mappings in the range $[0, x + \phi^{(1)}(x)]$, which is given by $\phi^{(2)}(x) \triangleq \sigma(x + \phi^{(1)}(x))$. Similarly, the numbers from $x + \phi^{(1)}(x) + 1$ to $x + \phi^{(2)}(x)$ can in turn have invalid mappings that must be taken into account. So $\phi(x)$ is at least equal to the number of invalid mappings in the range $[0, x + \phi^{(2)}(x)]$, which is given by $\phi^{(3)}(x) \triangleq \sigma(x + \phi^{(2)}(x))$. The process is repeated for $k$ steps until the interval $[0, x + \phi^{(k)}(x)]$ contains exactly $x + 1$ valid mappings,

$$x + \phi^{(k)}(x) + 1 - \sigma\big(x + \phi^{(k)}(x)\big) = x + 1,$$

or equivalently until $\phi^{(k)}(x) = \sigma(x + \phi^{(k)}(x)) \triangleq \phi^{(k+1)}(x)$. Then, $\phi(x) = \phi^{(k)}(x)$. Algorithm 2 shows the pseudo-code of the $\phi$-algorithm that computes $\phi(x)$ iteratively using $\sigma(x)$.
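The fixed-point iteration described above can be sketched in Python. To keep the sketch independent of the bit-level method of Section IV, $\sigma$ is computed here by brute-force counting; that shortcut, and the names `sigma_bruteforce` and `phi_fixed_point`, are assumptions made for illustration only.

```python
def bit_reverse(x: int, n: int) -> int:
    """Return pi_n(x): the n-bit bit-reversal of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def sigma_bruteforce(n: int, N: int, x: int) -> int:
    """Number of integers in [0, x] whose bit-reversal is invalid (>= N)."""
    return sum(1 for y in range(x + 1) if bit_reverse(y, n) >= N)


def phi_fixed_point(n: int, N: int, x: int):
    """Iterate phi <- sigma(x + phi) from phi = 0 until it stops changing.

    Returns (phi(x), list of successive iterates).
    """
    phi = 0
    trace = [phi]
    while True:
        nxt = sigma_bruteforce(n, N, x + phi)
        if nxt == phi:
            return phi, trace
        phi = nxt
        trace.append(phi)


phi, trace = phi_fixed_point(7, 96, 9)
print(phi, trace)               # prints 3 [0, 2, 3]
print(bit_reverse(9 + phi, 7))  # pi_{7,96}(9): prints 24
```

For the parameters of Example 2 the iterates are 0, 2, 3, which agrees with $\phi(9) = 3$ obtained from recursion (2).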

Fig. 1. Invalid integers counted by $\phi(x)$ and $\sigma(x)$.

Algorithm 2 $\phi$-algorithm
procedure $\phi$-ALGORITHM($n$, $N$, $x$)
    $k \leftarrow 0$
    $\phi^{(0)}(x) \leftarrow 0$
    $\phi^{(1)}(x) \leftarrow \sigma(x)$
    while $\phi^{(k+1)}(x) \ne \phi^{(k)}(x)$ do
        $k \leftarrow k + 1$
        $\phi^{(k+1)}(x) \leftarrow \sigma(x + \phi^{(k)}(x))$
    end while
    $\phi(x) \leftarrow \phi^{(k)}(x)$
    $\pi_{n,N}(x) \leftarrow \pi_n(x + \phi(x))$

Theorem 2: The $\phi$-algorithm converges to $\phi(x)$ in at most $n - 1$ iterations.

Proof: The convergence time to compute $\phi(x)$ is upper bounded by the time to compute $\phi(N-1)$. Since two consecutive numbers cannot both have invalid mappings, at most $\lfloor \sigma(x)/2^k \rfloor$ new invalid integers can be added at step $k$. The algorithm terminates because the number of new invalid integers to be added decays exponentially with the number of iterations. We next show that the scenario that corresponds to adding the maximum number of invalid integers at each iteration requires the maximum number of iterations to converge. Consider the sum of the number of invalid integers added up to step $k$ under such a scenario: $S(x, k) = \sum_{j=0}^{k-1} \lfloor \sigma(x)/2^j \rfloor$. Assume that $S$ converges to $\phi(x)$ in $k$ iterations, i.e., $\lfloor \sigma(x)/2^j \rfloor = 0$ for $j \ge k$. Let there be another sum $S'(x, k')$ that converges to $\phi(x)$ in $k'$ steps such that at least at one step $i < \min(k, k')$, fewer than the maximum of $\lfloor \sigma(x)/2^i \rfloor$ new invalid integers are added: $S'(x, k') = \sum_{j=0}^{k'-1} \lfloor \sigma'(x)/2^j \rfloor$. Since $\lfloor \sigma(x)/2^j \rfloor = 0$ for $j \ge k$ and $\lfloor \sigma'(x)/2^j \rfloor \le \lfloor \sigma(x)/2^j \rfloor$ for all $j < k$, it follows that $\lfloor \sigma'(x)/2^j \rfloor = 0$ for $j \ge k$, and hence $S'$ converges in at most $k$ steps. So $S'(x, k') = S'(x, k)$. Moreover, since there exists at least one $i < k$ such that $\lfloor \sigma'(x)/2^i \rfloor < \lfloor \sigma(x)/2^i \rfloor$, then $S'(x, k) < S(x, k)$. Hence, $S'$ converges to a number less than $\phi(x)$, which is a contradiction. Hence, $S(x, k)$, if it exists, is the unique sequence that converges to $\phi(x)$ in $k$ steps.

Next we show that such a sequence exists for $x = N - 1$, i.e., $S(N-1, k)$ converges to $\phi(N-1) = M - N - 1$ in $k = n - 1$ steps when $N = M/2 + 1$, such that at each step $j$, $\lfloor \sigma(N-1)/2^j \rfloor$ new invalid integers are added. Note that from Theorem 1, $\phi(N-1) = \phi(M/2)$ is at most $M/2 - 1$, which is represented in binary as $(n - 1)$ ones. In addition, $\sigma(M/2)$ is at most $M/4$, which is represented in binary as a one followed by $(n - 2)$ zeros. Hence, $(n - 1)$ shift-and-add operations on $M/2$ are needed to produce $\lfloor (M/2)/2 \rfloor + \lfloor (M/2)/4 \rfloor + \cdots + \lfloor (M/2)/2^{n-1} \rfloor = M/2 - 1$.

IV. DETERMINING $\sigma(x)$

The problem of determining $\phi(x)$ reduces to that of determining $\sigma(x)$. We next present an algorithm to determine $\sigma(x)$ by studying the bit representation of the invalid numbers from $N$ to $M - 1$. We denote the binary representation of $x < 2^n$ by

$$x = x_{n-1} x_{n-2} \cdots x_1 x_0, \quad x_i = 0 \text{ or } 1,$$

where $x_{n-1}$ is the most significant bit (MSB) and $x_0$ is the least significant bit (LSB). We use the notation $x[i:j]$ to represent the set of consecutive bits $x_i, x_{i-1}, \ldots, x_j$, ordered from MSB to LSB. The concatenation of two bit strings $x[i_1:j_1]$ and $x[i_2:j_2]$ is represented as $y = x[i_1:j_1] \mid x[i_2:j_2]$.

Consider the bit representation of the numbers between $N$ and $M - 1$. These numbers can be classified by their most significant bits according to the bit representation of $N - 1$ as follows. Let $z$ denote the number of zero bits in the bit representation of $N - 1$, and $I'$ be the index set of those zeros ordered from most significant to least significant bit. For example, if $N - 1 = 1010100$, then $z = 4$ and $I' = \{5, 3, 1, 0\}$. Then the numbers from $N$ to $M - 1$ can be classified into 4 classes as follows (x represents don't care):

$C'_1$: 11xxxxx (32 numbers);
$C'_2$: 1011xxx (8 numbers);
$C'_3$: 101011x (2 numbers);
$C'_4$: 1010101 (1 number).

The MSBs that define these classes are determined by scanning the bits of $N - 1$ from left to right, searching for the zero bits. The MSBs of the first class correspond to the MSBs of $N - 1$ up to the first zero, and then flipping the first zero to one.
The MSBs of the second class correspond to the MSBs of $N - 1$ up to the second zero, and then flipping the second zero to one. The MSBs of the remaining classes are similarly obtained. Mathematically, the smallest number in each of the $z$ classes can be expressed as

$$\gamma'_i = \left( \left\lfloor \frac{N-1}{2^{I'(i)}} \right\rfloor + 1 \right) 2^{I'(i)}, \quad i = 1, 2, \ldots, z.$$

We designate each class by its smallest number $\gamma'_i$. Table III shows the three classes for the case $N = 10011$: class of $\gamma'_1 = 11000$, class of $\gamma'_2 = 10100$, and class of $\gamma'_3 = 10011$.

We are interested in the set of integers which, when bit-reversed, become invalid. These integers belong to the classes defined above, but in bit-reversed order. Define $\gamma_i = \pi_n(\gamma'_i)$, $i = 1, 2, \ldots, z$, and let $C_i$ be the corresponding classes. The $\gamma_i$'s represent the classes of invalid numbers in bit-reversed order. Also, let $I$ denote the index set of the zero bits of $\pi_n(N-1)$ ordered from LSB to MSB. Hence, if $x \in C_i$, then $\pi_n(x) \ge N$ and $x[I(i):0] = \gamma_i[I(i):0]$. The third column in Table III lists the classes of $\gamma_1 = 00011$, $\gamma_2 = 00101$, and $\gamma_3 = 11001$ for the case $N = 10011$.

1) Example 3: Let $N - 1 = 1010100$. Then $\pi_n(N-1) = 0010101$, $z = 4$, and $I = \{1, 3, 5, 6\}$. The classes of invalid numbers in bit-reversed order are $C_1$: xxxxx11, $C_2$: xxx1101, $C_3$: x110101, and $C_4$: 1010101, with $\gamma_1 = 0000011$, $\gamma_2 = 0001101$, $\gamma_3 = 0110101$, and $\gamma_4 = 1010101$.

The number of invalid mappings $\sigma(x)$ up to and including $x$ can be determined by counting the number of invalid mappings belonging to each class $C_i$, $i = 1, 2, \ldots, z$. Denote the number of invalid mappings belonging to class $C_i$ by $\sigma_i(x)$. Then, $\sigma_i(x)$ can be determined from: 1) $\gamma_i$; 2) the MSBs of $x$ to the left of the $i$th zero, $x[n-1 : I(i)+1]$; and 3) the remaining LSBs of $x$ to the right of and including the $i$th zero, $x[I(i):0]$. The most significant $(n - I(i) - 1)$ bits $x[n-1 : I(i)+1]$ represent the number of integers belonging to $C_i$ that have appeared before $x$, i.e., those integers that have the same $(I(i)+1)$ LSBs as $\gamma_i$ but are less than $x[n-1 : I(i)+1] \mid \gamma_i[I(i):0]$.
The least significant $(I(i)+1)$ bits $x[I(i):0]$ are used to check if $x \ge x[n-1 : I(i)+1] \mid \gamma_i[I(i):0]$, or equivalently, if $x[I(i):0] \ge \gamma_i[I(i):0]$. This checks if $x$ itself maps to an invalid integer in $C_i$, or if $x$ maps to an integer greater than the last invalid integer in $C_i$. In either case, $\sigma_i(x)$ is incremented by 1. Mathematically, $\sigma_i(x)$ can be expressed as

$$\sigma_i(x) = \begin{cases} x[n-1 : I(i)+1], & \text{if } x[I(i):0] < \gamma_i[I(i):0]; \\ x[n-1 : I(i)+1] + 1, & \text{otherwise,} \end{cases} \tag{3}$$

and $\sigma(x)$ is the sum of all $\sigma_i(x)$ corresponding to all $z$ classes:

$$\sigma(x) = \sum_{i=1}^{z} \sigma_i(x). \tag{4}$$

TABLE III. Invalid classes $C'$ and $C$ for $M = 32$ and $N = 19$.

Fig. 2. Steps involved in computing $\sigma_i(x)$, $i = 1, \ldots, 4$, for $N - 1 = 1010100$.

2) Example 4: Let $N$ be as defined in Example 3, and let $x$ be the input shown in Fig. 2. Fig. 2 illustrates the steps involved in computing the $\sigma_i(x)$'s. The pseudo-code listed in Algorithm 3 summarizes the procedure for computing $\sigma(x)$ using (3) and (4).

Algorithm 3 $\sigma$-algorithm
procedure $\sigma$-ALGORITHM($n$, $N$, $x$)
    $z \leftarrow$ number of 0's in $(N-1)_2$
    $I \leftarrow$ index set of 0's in $\pi_n(N-1)$, from LSB to MSB
    for $i \leftarrow 1, z$ do
        $\sigma_i(x) \leftarrow x[n-1 : I(i)+1]$
        if $x[I(i):0] \ge \gamma_i[I(i):0]$ then
            $\sigma_i(x) \leftarrow \sigma_i(x) + 1$
        end if
    end for
    $\sigma(x) \leftarrow \sum_{i=1}^{z} \sigma_i(x)$

V. HARDWARE ARCHITECTURE

Fig. 3. Architecture of the $\gamma$-circuit, $i = 1, 2, \ldots, n-1$. If $en_i = 1$, then $\gamma_i$ is valid, otherwise $\gamma_i$ is invalid.

Fig. 4. Architecture of the $\phi$-algorithm for $n = 8$.

Fig. 3 shows the logic circuit for computing the $\gamma_i$'s. The maximum number of zero bits in $N - 1$ is $n - 1$, so the circuit generates $(n - 1)$ outputs $\gamma_i$, $i = 1, \ldots, n-1$. For each output, an enable signal $en_i$ is also generated to indicate whether the output $\gamma_i$ is valid or not. If the $i$th least significant bit of $\pi_n(N-1)$ is 1, then $\gamma_i$ is not defined.

Fig. 4 shows the architecture of the $\phi$-algorithm for $n = 8$ using 1-bit full adder cells, comparators, and a $\gamma$-circuit. The adder rows implement the $\sigma$-algorithm by accumulating right-shifted copies of $y$ (i.e., the $\sigma_i(y)$'s) depending on the control signals from the $\gamma$-circuit, where $y = x + \phi^{(k)}(x)$ is the input to the $\sigma$-algorithm at iteration $k$. If $\gamma_i$ is valid, then $\sigma_i(y)$ is accumulated.
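As a software cross-check of (3), (4) and of the feedback loop that the adder rows implement, the following Python sketch evaluates $\sigma(x)$ from the zero bits of $\pi_n(N-1)$ and then iterates $\phi \leftarrow \sigma(x + \phi)$. It is a behavioural model only and says nothing about the gate-level circuit; the names `sigma_bits` and `pbri` are hypothetical, and bit positions are indexed from 0 at the LSB.

```python
def bit_reverse(x: int, n: int) -> int:
    """Return pi_n(x): the n-bit bit-reversal of x."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def sigma_bits(n: int, N: int, x: int) -> int:
    """sigma(x) via (3) and (4): one term per zero bit of pi_n(N - 1)."""
    r = bit_reverse(N - 1, n)
    total = 0
    for i in range(n):
        if (r >> i) & 1:
            continue  # bit i is 1: no class gamma_i at this position
        # gamma_i[i:0]: bits of pi_n(N - 1) below position i, with bit i set
        gamma_low = (r & ((1 << i) - 1)) | (1 << i)
        count = x >> (i + 1)                  # x[n-1 : i+1]
        if (x & ((1 << (i + 1)) - 1)) >= gamma_low:
            count += 1                        # x[i:0] >= gamma_i[i:0]
        total += count
    return total


def pbri(n: int, N: int, x: int) -> int:
    """pi_{n,N}(x) = pi_n(x + phi(x)), with phi(x) found by fixed-point iteration."""
    phi = 0
    while True:
        nxt = sigma_bits(n, N, x + phi)
        if nxt == phi:
            return bit_reverse(x + phi, n)
        phi = nxt


print([pbri(4, 12, x) for x in range(12)])
# prints [0, 8, 4, 2, 10, 6, 1, 9, 5, 3, 11, 7]
```

Unlike the sequential procedure, each call to `pbri` is independent of the others, so any address can be evaluated without first generating the addresses before it.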
The last row of adder cells adds $x$ to the accumulated sum of $\sigma_i$'s, and the total sum is fed back after the pipeline registers to compute the next $\phi$ at iteration $k + 1$. The adder cells take two input bits to add and an enable signal from the top, a carry-in input from the right, and generate a sum bit from the bottom and a carry-out bit from the left. If the enable bit is de-asserted, the $y_i$ input bits are zeroed out. The comparators generate a one if $y[i:0] \ge \gamma_i[i:0]$, which is fed as an input carry to the first adder cell in each row. This effectively adds 1 to $\sigma_i(y)$ if $y[i:0] \ge \gamma_i[i:0]$. Finally, the adder cells in the last row generate an extra output (right) that corresponds to adding a 1 to $\pi_n(x)$ if $\pi_n(x) \ge N - 1$. But since the maximum value of $\pi_n(x)$ is $N - 1$, an equality comparator is sufficient. The left output from the adder cells in the last row after the pipeline registers, corresponding to $x + \phi^{(k)}(x)$, is fed back as a new input to the next iteration.

The critical path delay of the architecture is $2n - 2$ stages, which easily meets timing requirements in application-specific integrated circuit (ASIC) applications in current process technologies for values of $n$ up to 16. The unsigned comparators introduce negligible delay because they can be implemented using XOR trees. The output of the $\phi$-architecture is sampled every $n - 1$ clock cycles to read out $\phi(x)$. A comparator can be added (not shown here) to compare $\phi^{(k)}(y)$ and $\phi^{(k+1)}(y)$ for early termination. The sum $x + \phi(x)$ is also generated, which, when read in reverse order, is equivalent to $\pi_{n,N}(x)$.

Fig. 5. Parallel lookahead PBRI architecture using the $\phi$-algorithm.

Fig. 5 shows a parallel lookahead PBRI architecture using the $\phi$-algorithm. A packet of length $N$ is divided into $P$ sub-packets of length $L$, where each sub-packet is interleaved independently. The $\phi$-block for sub-packet $i$ computes the number of invalid integers skipped in the interval $[0, i \cdot L - 1]$. Then $\phi(i \cdot L - 1)$ is used to initialize the $i$th component PBRI interleaver to interleave sub-packet $i$. The sequential PBRI algorithm (Algorithm 1) can be used, or, for small values of $L$ (up to 16), a parallel component PBRI can be implemented to interleave $L$ integers in parallel by using $2L$ adders, $2L$ comparators, a multiplexer, and control logic. This scheme is based on the fact that for a sub-packet of length $L$, there can be at most $L$ invalid integers in an interval spanning $2L$ integers. Hence the $i$th component PBRI maps the $L$ integers $i \cdot L, i \cdot L + 1, \ldots, (i+1) \cdot L - 1$ into the first $L$ valid interleaved addresses in the interval $[i \cdot L + \phi(i \cdot L - 1),\; i \cdot L + \phi(i \cdot L - 1) + 2L - 1]$. It computes the $2L$ sums $\{i \cdot L + \phi(i \cdot L - 1)\}$, $\{i \cdot L + \phi(i \cdot L - 1)\} + 1$, $\{i \cdot L + \phi(i \cdot L - 1)\} + 2$, $\ldots$, $\{i \cdot L + \phi(i \cdot L - 1)\} + 2L - 1$, and compares their bit-reversed values with $N - 1$ to check if they are valid; if so, the valid addresses get passed by the control logic through the multiplexer. However, the complexity of this scheme rapidly increases for larger values of $L$. The architecture of the lookahead interleaver in Fig. 5 attains a speedup by a factor of $P$ using a serial component PBRI, and by a factor of $N$ using a parallel component PBRI, over a fully sequential PBRI architecture.

VI. CONCLUSION

A parallel lookahead pruned bit-reversal interleaver algorithm and architecture have been proposed. The algorithm interleaves a packet of length $N$ in at most $\lceil \log_2 N \rceil - 1$ steps, compared to $N$ steps using existing sequential algorithms, and has a simple architecture amenable to high-speed applications. By reducing interleaving latency on the transmitter side and de-interleaving latency on the receiver side, the proposed algorithm is valuable for emerging wireless standards such as 3GPP2/UMB [2] that employ PBRI channel (de-)interleavers on long packets.

REFERENCES

[1] S. Lin and D. J. Costello, Error Control Coding, 2nd ed. Upper Saddle River, NJ: Prentice-Hall.
[2] Physical layer standard for ultra mobile broadband (UMB) air interface specification. [Online].
[3] F. Daneshgaran and P. Mulassano, "Interleaver pruning for construction of variable-length turbo codes," IEEE Trans. Inf. Theory, vol. 50, no. 3, Mar.
[4] M. Eroz and A. R. Hammons, Jr., "On the design of prunable interleavers for turbo codes," in Proc. IEEE Veh. Technol. Conf., May 1999, vol. 2.
[5] J. Shao and B. T. Davis, "The bit-reversal SDRAM address mapping," in Proc. Workshop Software Compilers Embedded Systems, ACM Int. Conf. Proc. Series, 2005, vol. 136.
[6] J. Prado, "A new fast bit-reversal permutation algorithm based on a symmetry," IEEE Signal Process. Lett., vol. 11, no. 12, Dec.
[7] J. S. Walker, "A new bit reversal algorithm," IEEE Trans. Acoust., Speech Signal Process., vol. 38, no. 8, Aug.
[8] A. Giulietti, L. Van Der Perre, and M. Strum, "Parallel turbo coding interleavers: avoiding collisions in accesses to storage elements," IEEE Commun. Lett., vol. 38, no. 5, Feb.
[9] R. Dobkin, M. Peleg, and R. Ginosar, "Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, Apr.
[10] M. Thul, F. Gilbert, and N. Wehn, "Optimized concurrent interleaving architecture for high-throughput turbo-decoding," in Proc. Int. Conf. Electron., Circuits Syst., 2002, vol. 3.
[11] A. Tarable, S. Benedetto, and G. Montorsi, "Mapping interleaving laws to parallel turbo and LDPC decoder architectures," IEEE Trans. Inf. Theory, vol. 50, no. 9, Sep.
[12] M.-C. Shin and I.-C. Park, "Processor-based turbo interleaver for multiple third-generation wireless standards," IEEE Commun. Lett., vol. 7, no. 5, May.

THE orthogonal frequency-division multiplex (OFDM)

THE orthogonal frequency-division multiplex (OFDM) 26 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 57, NO. 1, JANUARY 2010 A Generalized Mixed-Radix Algorithm for Memory-Based FFT Processors Chen-Fong Hsiao, Yuan Chen, Member, IEEE,

More information

/$ IEEE

/$ IEEE IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 56, NO. 1, JANUARY 2009 81 Bit-Level Extrinsic Information Exchange Method for Double-Binary Turbo Codes Ji-Hoon Kim, Student Member,

More information

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

Parallel-computing approach for FFT implementation on digital signal processor (DSP) Parallel-computing approach for FFT implementation on digital signal processor (DSP) Yi-Pin Hsu and Shin-Yu Lin Abstract An efficient parallel form in digital signal processor can improve the algorithm

More information

ARITHMETIC operations based on residue number systems

ARITHMETIC operations based on residue number systems IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: EXPRESS BRIEFS, VOL. 53, NO. 2, FEBRUARY 2006 133 Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues A. B. Premkumar, Senior Member,

More information

Design of Convolutional Codes for varying Constraint Lengths

Design of Convolutional Codes for varying Constraint Lengths Design of Convolutional Codes for varying Constraint Lengths S VikramaNarasimhaReddy 1, Charan Kumar K 2, Neelima Koppala 3 1,2 MTech(VLSI) Student, 3 Assistant Professor, ECE Department, SreeVidyanikethan

More information

K.V.GANESH*,D.SRI HARI**,M.HEMA*** *(Department of ECE,JNTUK,KAKINADA) **(Department of ECE,JNTUA,Anantapur) * **(Department of ECE,JNTUA,Anantapur)

K.V.GANESH*,D.SRI HARI**,M.HEMA*** *(Department of ECE,JNTUK,KAKINADA) **(Department of ECE,JNTUA,Anantapur) * **(Department of ECE,JNTUA,Anantapur) Applications (IJERA) ISSN: 2248-9622 www.ijera.com Design and Synthesis of a Field Programmable CRC Circuit Architecture K.V.GANESH*,D.SRI HARI**,M.HEMA*** *(Department of ECE,JNTUK,KAKINADA) **(Department

More information

Semi-Random Interleaver Design Criteria

Semi-Random Interleaver Design Criteria Semi-Random Interleaver Design Criteria Abstract christin@ee. ucla. edu, A spread interleaver [l] of length N is a semi-random interleaver based on the random selection without replacement of N integers

More information

EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS

EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS In Proceedings of the European Signal Processing Conference, pages 831-83, Poznan, Poland, September 27. EFFICIENT PARALLEL MEMORY ORGANIZATION FOR TURBO DECODERS Perttu Salmela, Ruirui Gu*, Shuvra S.

More information

Twiddle Factor Transformation for Pipelined FFT Processing

Twiddle Factor Transformation for Pipelined FFT Processing Twiddle Factor Transformation for Pipelined FFT Processing In-Cheol Park, WonHee Son, and Ji-Hoon Kim School of EECS, Korea Advanced Institute of Science and Technology, Daejeon, Korea icpark@ee.kaist.ac.kr,

More information

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES

AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES AN EFFICIENT DESIGN OF VLSI ARCHITECTURE FOR FAULT DETECTION USING ORTHOGONAL LATIN SQUARES (OLS) CODES S. SRINIVAS KUMAR *, R.BASAVARAJU ** * PG Scholar, Electronics and Communication Engineering, CRIT

More information

Fault Tolerant Parallel Filters Based on ECC Codes

Fault Tolerant Parallel Filters Based on ECC Codes Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 11, Number 7 (2018) pp. 597-605 Research India Publications http://www.ripublication.com Fault Tolerant Parallel Filters Based on

More information

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter A.S. Sneka Priyaa PG Scholar Government College of Technology Coimbatore ABSTRACT The Least Mean Square Adaptive Filter is frequently

More information

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code COPY RIGHT 2018IJIEMR.Personal use of this material is permitted. Permission from IJIEMR must be obtained for all other uses, in any current or future media, including reprinting/republishing this material

More information

DUE to the high computational complexity and real-time
