A Parallel Pruned Bit-Reversal Interleaver

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 17, NO. 8, AUGUST 2009

Mohammad M. Mansour

Abstract: A parallel algorithm and architecture for pruned bit-reversal interleaving (PBRI) are proposed. For a pruned interleaver of size $N$ with mother interleaver size $M = 2^n$, the proposed algorithm interleaves any number $x \in [0, N-1]$ in at most $n - 1$ steps, as opposed to $x$ steps using existing PBRI algorithms. A parallel architecture of the proposed algorithm employing simple logic gates and having a short critical path delay is presented. The proposed architecture is valuable in reducing (de-)interleaving latency in emerging wireless standards that employ PBRI channel (de-)interleaving in their PHY layer, such as the 3GPP2 Ultra Mobile Broadband standard.

Index Terms: Bit-reversal maps, channel interleavers, pruned interleavers.

I. INTRODUCTION

Channel interleaving is employed in most modern wireless communications systems to protect against burst errors [1].
A channel interleaver reshuffles encoded symbols so that consecutive symbols are spread as far apart as possible, which breaks the temporal correlation between successive symbols involved in a burst of errors. The reverse de-interleaving operation is performed at the receiver side before feeding the symbols to the channel decoder. Typically, these interleavers employ some form of bit-reversal operation in generating the interleaved addresses, and have a programmable size to accommodate various encoded packet lengths. For example, the emerging Ultra Mobile Broadband (UMB) standard within the 3rd Generation Partnership Project 2 (3GPP2) [2] employs a pruned bit-reversal channel interleaver in its PHY layer to interleave any packet whose length is a multiple of 8. In pruned bit-reversal interleaving, a packet of size $N$ is interleaved by mapping $n$-bit linear addresses into $n$-bit bit-reversed addresses, where $n$ is the smallest integer such that $N \le 2^n$. Linear addresses that map to addresses outside $[0, N-1]$ are invalid addresses and get pruned out (see [3] and [4] for other pruning techniques).

Manuscript received November 05, 2007; revised April 08. First published June 16, 2009; current version published July 22. The author was with Qualcomm Flarion Technologies, Bridgewater, NJ USA. He is currently with the Electrical and Computer Engineering Department, the American University of Beirut, Beirut, Lebanon (e-mail: mmansour@aub.edu.lb). Digital Object Identifier /TVLSI

The emphasis in the literature on interleavers and their architectures has been largely in the context of interleavers employed in turbo codes. Not much work has been done on architectures for PBRI channel interleavers. Bit-reversal mapping has been mainly applied to reduce row conflicts and improve hit rates in SDRAM applications [5], and to improve the shuffle permutation stages of the FFT algorithm [6], [7].
In turbo interleavers, the emphasis has been on reducing interleaving latency by avoiding memory collisions of read/write operations by the constituent MAP decoders [8]-[11]. Software programmable turbo interleavers for multiple 3G wireless standards have been addressed in [12].

A major disadvantage of a PBRI interleaver is that, despite its simplicity, interleaved addresses must be generated sequentially. That is, in order to generate the interleaved address of a linear address $x$, the interleaved addresses of all linear addresses less than $x$ must first be generated. This follows from the fact that the number of pruned addresses that have occurred before $x$ must be known in order to know where $x$ gets mapped to. This requirement introduces a latency bottleneck, especially when (de-)interleaving long packets (e.g., 16 K in UMB [2]). In this paper, we present an algorithm that eliminates this dependency and determines any interleaved address in at most $n - 1$ steps. Moreover, the algorithm has a very simple architecture that can be constructed using basic logic gates and has a short critical path delay.

II. SEQUENTIAL PBRI ALGORITHM

A bit-reversal interleaver (BRI) maps an $n$-bit number $x$ into another $n$-bit number $y$ according to a simple bit-reversal rule such that the bits of $y$ appear in the reverse order with respect to $x$. We designate the BRI mapping on $n$ bits by the function $y = \pi_n(x)$. The values taken by $x$ and $y$ range from $0$ to $2^n - 1$, where $M \triangleq 2^n$ is the size of the interleaver. A pruned BRI maps an $n$-bit number $x$ less than $N$, where $N \le M$, into another $n$-bit number $y$ less than $N$ according to the bit-reversal rule. The size of the pruned interleaver is $N$, while the size of the mother interleaver is $M$. Note that the numbers from $N$ to $M - 1$ are pruned out of the interleaver mappings and are not considered valid mappings. We designate the PBRI mapping on $n$ bits with parameter $N$ by the function $y = \pi_{n,N}(x)$.
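The bit-reversal rule defined above can be illustrated with a minimal Python sketch. The function name `bit_reverse` is an illustrative choice, not something taken from the paper, and the loop is one straightforward way to reverse bits rather than an optimized one.

```python
def bit_reverse(x: int, n: int) -> int:
    """Return the n-bit reversal of x, i.e. the BRI map pi_n(x)."""
    if not 0 <= x < (1 << n):
        raise ValueError("x must be an n-bit non-negative integer")
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)  # append the current LSB of x to y
        x >>= 1
    return y


if __name__ == "__main__":
    n = 4
    # Mother interleaver of size M = 2**n = 16
    print([bit_reverse(x, n) for x in range(1 << n)])
    # [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15]
```

Since reversing twice restores the original bit order, the map is its own inverse, which is why the same address generator can serve both interleaving and de-interleaving in the unpruned case.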
The mapping $\pi_{n,N}(x)$ for a given $x$ is computed sequentially by starting from $y = 0$ and maintaining the number of invalid mappings $\phi(x)$ skipped along the way. If $y + \phi(x)$ maps to a valid number (i.e., $\pi_n(y + \phi(x)) < N$), then $y$ is incremented by 1. If $y + \phi(x)$ maps to an invalid number, $\phi(x)$ is incremented by 1. These operations are repeated until $y$ reaches $x$ and $\pi_n(x + \phi(x))$ is valid. Algorithm 1 shows the pseudo-code of the sequential PBRI algorithm.

Algorithm 1 Sequential PBRI algorithm.
  procedure PBRI-Seq($n$, $N$, $x$)
    $y \leftarrow 0$
    $\phi(x) \leftarrow 0$
    while $y \le x$ do
      if $\pi_n(y + \phi(x)) \ge N$ then
        $\phi(x) \leftarrow \phi(x) + 1$
      else
        $\pi_{n,N}(y) \leftarrow \pi_n(y + \phi(x))$
        $y \leftarrow y + 1$
      end if
    end while
  end procedure
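As a rough software rendering of Algorithm 1, the following Python sketch builds the whole pruned mapping for $x = N - 1$. The names `bit_reverse` and `pbri_sequential` are hypothetical helpers introduced here for illustration; the loop structure follows the pseudo-code above.

```python
def bit_reverse(x: int, n: int) -> int:
    """n-bit reversal of x (the BRI map pi_n)."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def pbri_sequential(n: int, N: int) -> list:
    """Return [pi_{n,N}(0), ..., pi_{n,N}(N-1)] using the sequential algorithm."""
    if not 0 < N <= (1 << n):
        raise ValueError("N must satisfy 0 < N <= 2**n")
    table = []
    phi = 0  # number of invalid (pruned) addresses skipped so far
    y = 0    # next linear address to be mapped
    while y < N:
        candidate = bit_reverse(y + phi, n)
        if candidate >= N:
            phi += 1          # invalid mapping: skip it
        else:
            table.append(candidate)
            y += 1
    return table


if __name__ == "__main__":
    print(pbri_sequential(4, 12))
    # [0, 8, 4, 2, 10, 6, 1, 9, 5, 3, 11, 7]
```

The output for $n = 4$, $N = 12$ is a permutation of $0, \ldots, 11$, and every entry depends on all earlier ones through the running count `phi`, which is the sequential bottleneck the paper sets out to remove.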

1) Example 1: Table I shows the mappings computed using Algorithm 1 assuming $n = 4$ and $N = 12$. Note that in this case, $M = 2^4 = 16$, so 4 mappings are pruned by $\pi_{4,12}$.

TABLE I: BRI AND PBRI MAPPINGS FOR $n = 4$, $N = 12$

We assume in the following that the size of the pruned interleaver ($N$) is more than half the size of the mother interleaver ($M$), i.e., $N > M/2$; otherwise, the problem can be reformulated such that $M$ is the smallest power of 2 greater than or equal to $N$. In addition, if $N = M$, then $\phi(x) = 0$ for all $x$ and $\pi_{n,N}(x) = \pi_n(x)$. There are no pruned integers in this case since all integers have valid mappings, and hence this case is degenerate. Hereafter, we assume that

$$\frac{M}{2} < N < M. \tag{1}$$

Note that from the definition of the bit-reversal operation and condition (1), it follows that if $\pi_n(x) \ge N$, then $\pi_n(x + 1) < N$, i.e., two consecutive numbers cannot both have invalid mappings. We can use this fact to give a recursive definition of $\phi(x)$ for $0 \le x < N$:

$$\phi(x) = \begin{cases} 0, & \text{if } x = 0; \\ \phi(x-1), & \text{if } \pi_n(x + \phi(x-1)) < N; \\ \phi(x-1) + 1, & \text{otherwise.} \end{cases} \tag{2}$$

In addition, note that if $y > x$, then $\phi(y) \ge \phi(x)$, and hence $\phi$ is a non-decreasing function.

2) Example 2: Let $N = 96$, $n = 7$, $x = 9$. $\phi(9)$ is determined recursively using (2) as shown in Table II, which gives $\phi(9) = 3$.

TABLE II: RECURSIVE COMPUTATION OF $\phi(9)$ USING (2)

Next, we show that Algorithm 1 always performs $M - 1$ iterations in mapping the integers in $[0, N-1]$, for any $N$ satisfying (1). That is, the algorithm traverses all the integers in $[0, 2^n - 2]$ when mapping the integers $[0, N-1]$ independent of $N$, always pruning $M - N - 1$ integers along the way.

Footnote 1: Note that $M - 1$ is a palindrome, i.e., $\pi_n(M - 1) = M - 1 \ge N$, so $M - 1$ maps to an invalid number. Hence the algorithm terminates before $M - 1$, which leaves only $M - N - 1$ invalid integers to be pruned.

Theorem 1: Algorithm 1 with $x = N - 1$ maps the set of integers $[0, N-1]$ into $[0, N-1]$ in exactly $M - 1$ iterations for $M/2 < N < M$. In addition, the algorithm prunes exactly $M - N - 1$ integers before it terminates. Hence, the time complexity of a PBRI is determined by the size of its mother interleaver $M$ and not $N$.

Proof: We first prove that the map $\pi_{n,N}$ is a bijection whose range is $[0, N-1]$. $\pi_{n,N}$ is the composition of two maps, $\pi_n$ and the map $f : x \mapsto x + \phi(x)$. $\pi_n$ is obviously bijective. In addition, since $\phi$ is a non-decreasing function, $f$ is an increasing function. Therefore it is an injection whose codomain is its range, and hence $f$ is bijective. Therefore, $\pi_{n,N}$ is a bijection whose range is the domain of $f$.

The number of iterations run by Algorithm 1 for $x = N - 1$ until it terminates (i.e., when $y = x$) is $y + \phi(x) + 1 = N - 1 + \phi(N-1) + 1 = N + \phi(N-1)$. Next we determine $\phi(N-1)$ by proving that the last integer in $[0, N-1]$, namely $N - 1$, always gets mapped to $M/2 - 1$, i.e., $\pi_{n,N}(N-1) = M/2 - 1$. Consider the integer $M/2 - 1$ in the range of $\pi_{n,N}$. Since $\pi_{n,N}$ is bijective, $\exists\, y_0$ such that $\pi_n(y_0) = M/2 - 1$, or $y_0 = \pi_n^{-1}(M/2 - 1) = M - 2$. Since $f$ is bijective, $\exists\, x_0 \le N - 1$ such that $f(x_0) = x_0 + \phi(x_0) = M - 2$. Assume that $x_0 \ne N - 1$, but some number less than $N - 1$. Consider $x_1 = x_0 + 1 \le N - 1$. Since $f$ is an increasing function and $f(x_0) = M - 2$, $f(x_1)$ can only be $M - 1$. But $\pi_n(f(x_1)) = \pi_{n,N}(x_1) = M - 1 > N - 1$, which is a contradiction. Therefore, $x_0 = N - 1 = f^{-1}(M - 2) = f^{-1}(\pi_n^{-1}(M/2 - 1)) = \pi_{n,N}^{-1}(M/2 - 1)$. It follows that $f(N-1) = N - 1 + \phi(N-1) = M - 2$, and hence $\phi(N-1) = M - N - 1$. Hence, the algorithm terminates in $N + \phi(N-1) = M - 1$ iterations.

III. DETERMINING THE INVALID MAPPINGS $\phi(x)$

The time complexity of Algorithm 1 is $O(M)$, which follows directly from the fact that the number of invalid mappings $\phi(x)$ that have occurred in mapping all integers less than $x$ must first be computed in order to determine what value $x$ maps to. In the following, we present an algorithm to determine $\phi(x)$ with complexity $O(\log_2 M)$ by analyzing the bit-structure of the invalid mappings.
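The recursion (2), Example 2, and the statement of Theorem 1 can be checked numerically with a short Python sketch. The function names below (`bit_reverse`, `phi_recursive`) are illustrative assumptions; the loop simply unrolls (2) from $x = 0$ upward and relies on condition (1).

```python
def bit_reverse(x: int, n: int) -> int:
    """n-bit reversal of x (the BRI map pi_n)."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def phi_recursive(n: int, N: int, x: int) -> int:
    """phi(x) obtained by unrolling recursion (2); requires M/2 < N < M."""
    M = 1 << n
    if not (M // 2 < N < M and 0 <= x < N):
        raise ValueError("need M/2 < N < M and 0 <= x < N")
    phi = 0  # phi(0) = 0
    for t in range(1, x + 1):
        # phi(t) = phi(t-1) if t + phi(t-1) is valid, else phi(t-1) + 1
        if bit_reverse(t + phi, n) >= N:
            phi += 1
    return phi


if __name__ == "__main__":
    print(phi_recursive(7, 96, 9))  # 3, matching Example 2

    # Spot-check Theorem 1 for n = 4: phi(N-1) = M - N - 1 and
    # the last address N-1 maps to M/2 - 1.
    n, M = 4, 16
    for N in range(M // 2 + 1, M):
        last = phi_recursive(n, N, N - 1)
        assert last == M - N - 1
        assert bit_reverse(N - 1 + last, n) == M // 2 - 1
```

The loop is still linear in $x$; it only confirms the counting argument, not the speed-up developed in the following sections.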
We first examine the quantity $\phi(x)$ in more detail. Note that $\phi(x)$ represents the minimum number of integers that must be skipped such that all integers from $0$ to $x$ have valid mappings. Equivalently, $\phi(x)$ represents the minimum number that needs to be added to $x$ such that there are exactly $x + 1$ integers in the range $[0, x + \phi(x)]$ that have valid mappings. This quantity is not necessarily equal to the number of integers less than or equal to $x$ that have invalid mappings, which we denote by $\sigma(x)$. In fact, $\phi(x) \ge \sigma(x)$ (see Fig. 1).

This follows from the fact that for the $\sigma(x)$ integers in the range $[0, x]$ with invalid mappings, at least $\phi^{(1)}(x) \triangleq \sigma(x)$ more integers greater than $x$ must be tried to check if they have valid mappings. But the numbers from $x + 1$ to $x + \phi^{(1)}(x)$ can in turn have invalid mappings that must be taken into account. So $\phi(x)$ is at least equal to the number of invalid mappings in the range $[0, x + \phi^{(1)}(x)]$, which is given by $\phi^{(2)}(x) \triangleq \sigma(x + \phi^{(1)}(x))$. Similarly, the numbers from $x + \phi^{(1)}(x) + 1$ to $x + \phi^{(2)}(x)$ can in turn have invalid mappings that must be taken into account. So $\phi(x)$ is at least equal to the number of invalid mappings in the range $[0, x + \phi^{(2)}(x)]$, which is given by $\phi^{(3)}(x) \triangleq \sigma(x + \phi^{(2)}(x))$. The process is repeated for $k$ steps until the interval $[0, x + \phi^{(k)}(x)]$ contains exactly $x + 1$ valid mappings:

$$\left(x + \phi^{(k)}(x) + 1\right) - \sigma\left(x + \phi^{(k)}(x)\right) = x + 1,$$

or equivalently until $\phi^{(k)}(x) = \sigma(x + \phi^{(k)}(x)) \triangleq \phi^{(k+1)}(x)$. Then, $\phi(x) = \phi^{(k)}(x)$. Algorithm 2 shows the pseudo-code of the $\phi$-algorithm that computes $\phi(x)$ iteratively using $\sigma(x)$.
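The fixed-point iteration described above can be sketched in Python as follows. To keep the example focused on the iteration itself, $\sigma$ is computed here by brute-force counting, which is an assumption made only for illustration; the names `sigma_bruteforce` and `phi_iterative` are hypothetical.

```python
def bit_reverse(x: int, n: int) -> int:
    """n-bit reversal of x (the BRI map pi_n)."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def sigma_bruteforce(n: int, N: int, x: int) -> int:
    """Number of integers t in [0, x] whose bit-reversal is >= N (invalid)."""
    return sum(1 for t in range(x + 1) if bit_reverse(t, n) >= N)


def phi_iterative(n: int, N: int, x: int):
    """Iterate phi <- sigma(x + phi) from phi = 0 until it stops changing.

    Returns (phi(x), number of updates performed before convergence).
    """
    phi, updates = 0, 0
    while True:
        nxt = sigma_bruteforce(n, N, x + phi)
        if nxt == phi:
            return phi, updates
        phi, updates = nxt, updates + 1


if __name__ == "__main__":
    # Example 2 of the text: N = 96, n = 7, x = 9.
    # The iterates are 0 -> 2 -> 3 -> 3, so phi(9) = 3 after two updates.
    print(phi_iterative(7, 96, 9))  # (3, 2)
```

Under condition (1) the gap to the final value at least halves with every update, because an interval ending at a valid integer cannot contain two consecutive invalid ones; this is the intuition behind the logarithmic bound on the number of iterations.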


Fig. 1. Invalid integers counted by $\phi(x)$ and $\sigma(x)$.

Algorithm 2 $\phi$-algorithm
  procedure $\phi$-ALGORITHM($n$, $N$, $x$)
    $k \leftarrow 0$
    $\phi^{(0)}(x) \leftarrow 0$
    repeat
      $\phi^{(k+1)}(x) \leftarrow \sigma(x + \phi^{(k)}(x))$
    until $\phi^{(k+1)}(x) = \phi^{(k)}(x)$
    $\phi(x) \leftarrow \phi^{(k)}(x)$
    $\pi_{n,N}(x) \leftarrow \pi_n(x + \phi(x))$
  end procedure

Theorem 2: The $\phi$-algorithm converges to $\phi(x)$ in at most $n - 1$ iterations.

Proof: The convergence time to compute $\phi(x)$ is upper bounded by the time to compute $\phi(N-1)$. Since two consecutive numbers cannot both have invalid mappings, at most $\lfloor \sigma(x)/2^k \rfloor$ new invalid integers can be added at step $k$. The algorithm terminates because the number of new invalid integers to be added decays exponentially with the number of iterations. We next show that the scenario that corresponds to adding the maximum number of invalid integers at each iteration requires the maximum number of iterations to converge. Consider the sum of the number of invalid integers added up to step $k$ under such a scenario:

$$S(x, k) = \sum_{j=0}^{k-1} \left\lfloor \sigma(x)/2^j \right\rfloor.$$

Assume that $S$ converges to $\phi(x)$ in $k$ iterations, i.e., $\lfloor \sigma(x)/2^j \rfloor = 0$ for $j \ge k$. Let there be another sum $S'(x, k')$ that converges to $\phi(x)$ in $k'$ steps such that at least at one step $i < \min(k, k')$, less than the maximum of $\lfloor \sigma(x)/2^i \rfloor$ new invalid integers are added:

$$S'(x, k') = \sum_{j=0}^{k'-1} \left\lfloor \sigma'(x)/2^j \right\rfloor.$$

Since $\lfloor \sigma(x)/2^j \rfloor = 0$ for $j \ge k$ and $\lfloor \sigma'(x)/2^j \rfloor \le \lfloor \sigma(x)/2^j \rfloor$ for all $j < k$, it follows that $\lfloor \sigma'(x)/2^j \rfloor = 0$ for $j \ge k$, and hence $S'$ converges in at most $k$ steps. So $S'(x, k') = S'(x, k)$. Moreover, since there exists at least one $i < k$ such that $\lfloor \sigma'(x)/2^i \rfloor < \lfloor \sigma(x)/2^i \rfloor$, then $S'(x, k) < S(x, k)$. Hence, $S'$ converges to a number less than $\phi(x)$, which is a contradiction. Hence, $S(x, k)$, if it exists, is the unique sequence that converges to $\phi(x)$ in $k$ steps.

Next we show that such a sequence exists for $x = N - 1$, i.e., $S(N-1, k)$ converges to $\phi(N-1) = M - N - 1$ in $k = n - 1$ steps when $N = M/2 + 1$, such that at each step $j$, $\lfloor \sigma(N-1)/2^j \rfloor$ new invalid integers are added. Note that from Theorem 1, $\phi(N-1) = \phi(M/2) = M/2 - 1$, which is represented in binary as $(n-1)$ ones. In addition, $\max(\sigma(M/2)) = M/4$ is represented in binary as one 1 and $(n-2)$ zeros. Hence, $(n-1)$ shift-and-add operations on $M/2$ are needed to produce $(M/2)/2 + (M/2)/4 + \cdots + (M/2)/2^{n-1} = M/2 - 1$.

IV. DETERMINING $\sigma(x)$

The problem of determining $\phi(x)$ reduces to that of determining $\sigma(x)$. We next present an algorithm to determine $\sigma(x)$ by studying the bit-representation of the invalid numbers from $N$ to $M - 1$. We denote the binary representation of $x < 2^n$ by

$$x = x_{n-1} x_{n-2} \cdots x_1 x_0, \qquad x_i = 0 \text{ or } 1,$$

where $x_{n-1}$ is the most significant bit (MSB) and $x_0$ is the least significant bit (LSB). We use the notation $x[i : j]$ to represent the set of consecutive bits $x_i, x_{i-1}, \ldots, x_j$, ordered from MSB to LSB. The concatenation of two bit strings $x[i_1 : j_1]$ and $x[i_2 : j_2]$ is represented as $y = x[i_1 : j_1] \,|\, x[i_2 : j_2]$.

Consider the bit-representation of the numbers between $N$ and $M - 1$. These numbers can be classified by their most-significant bits according to the bit-representation of $N - 1$ as follows. Let $z$ denote the number of zero bits in the bit-representation of $N - 1$, and $I'$ be the index set of those zeros ordered from most-significant to least-significant bit. For example, if $N - 1 = 1010100$, then $z = 4$ and $I' = \{5, 3, 1, 0\}$. Then the numbers from $N$ to $M - 1$ can be classified into 4 classes as follows (the symbol x denotes a don't-care bit):

  $C'_1$: 11xxxxx (32 numbers);
  $C'_2$: 1011xxx (8 numbers);
  $C'_3$: 101011x (2 numbers);
  $C'_4$: 1010101 (1 number).

The MSBs that define these classes are determined by scanning the bits of $N - 1$ from left to right, searching for the zero bits. The MSBs of the first class correspond to the MSBs of $N - 1$ up to the first zero, and then flipping the first zero to one.
The MSBs of the second class correspond to the MSBs of $N - 1$ up to the second zero, and then flipping the second zero to one. The MSBs of the remaining classes are similarly obtained. Mathematically, the smallest number in each of the $z$ classes can be expressed as

$$\delta'_i = \left\lceil \frac{N}{2^{I'(i)}} \right\rceil \cdot 2^{I'(i)}, \qquad i = 1, 2, \ldots, z.$$

We designate each class by its smallest number $\delta'_i$. Table III shows the three classes for the case $N = 10011$: class of $\delta'_1 = 11000$, class of $\delta'_2 = 10100$, and class of $\delta'_3 = 10011$.

TABLE III: INVALID CLASSES $C'$ AND $C$ FOR $M = 32$ AND $N = 19$

We are interested in the set of integers which, when bit-reversed, become invalid. These integers belong to the classes defined above, but in bit-reversed order. Define $\delta_i = \pi_n(\delta'_i)$, $i = 1, 2, \ldots, z$, and let $C_i$ be the corresponding classes. The $\delta_i$'s represent the classes of invalid numbers in bit-reversed order. Also, let $I$ denote the index set of the zero bits of $\pi_n(N - 1)$ ordered from LSB to MSB. Hence, if $x \in C_i$, then $\pi_n(x) \ge N$ and $x[I(i) : 0] = \delta_i[I(i) : 0]$. The third column in Table III lists the classes of $\delta_1 = 00011$, $\delta_2 = 00101$, and $\delta_3 = 11001$ for the case $N = 10011$.

1) Example 3: Let $N - 1 = 1010100$. Then $\pi_n(N - 1) = 0010101$, $z = 4$, and $I = \{1, 3, 5, 6\}$. The classes of invalid numbers in bit-reversed order are $C_1$: xxxxx11, $C_2$: xxx1101, $C_3$: x110101, and $C_4$: 1010101, with $\delta_1 = 0000011$, $\delta_2 = 0001101$, $\delta_3 = 0110101$, and $\delta_4 = 1010101$.

The number of invalid mappings $\sigma(x)$ up to and including $x$ can be determined by counting the number of invalid mappings belonging to each class $C_i$, $i = 1, 2, \ldots, z$. Denote the number of invalid mappings belonging to class $C_i$ by $\sigma_i(x)$. Then, $\sigma_i(x)$ can be determined from: 1) $\delta_i$; 2) the MSBs of $x$ to the left of the $i$th zero, $x[n-1 : I(i)+1]$; and 3) the remaining LSBs of $x$ to the right of and including the $i$th zero, $x[I(i) : 0]$. The most significant $(n - I(i) - 1)$ bits $x[n-1 : I(i)+1]$ represent the number of integers belonging to $C_i$ that have appeared before $x$, i.e., those integers that have the same $(I(i)+1)$ LSBs as $\delta_i$ but are less than $x[n-1 : I(i)+1] \,|\, \delta_i[I(i) : 0]$. The least significant $(I(i)+1)$ bits $x[I(i) : 0]$ are used to check if $x \ge x[n-1 : I(i)+1] \,|\, \delta_i[I(i) : 0]$, or equivalently, if $x[I(i) : 0] \ge \delta_i[I(i) : 0]$. This checks if $x$ itself maps to an invalid integer in $C_i$, or if $x$ maps to an integer greater than the last invalid integer in $C_i$. In either case, $\sigma_i(x)$ is incremented by 1. Mathematically, $\sigma_i(x)$ can be expressed as

$$\sigma_i(x) = \begin{cases} x[n-1 : I(i)+1], & \text{if } x[I(i) : 0] < \delta_i[I(i) : 0]; \\ x[n-1 : I(i)+1] + 1, & \text{otherwise,} \end{cases} \tag{3}$$

and $\sigma(x)$ is the sum of all $\sigma_i(x)$ corresponding to all $z$ classes:

$$\sigma(x) = \sum_{i=1}^{z} \sigma_i(x). \tag{4}$$

2) Example 4: Let $N$ be as defined in Example 3. Fig. 2 illustrates the steps involved in computing the $\sigma_i(x)$'s for a given $x$. The pseudo-code listed in Algorithm 3 summarizes the procedure for computing $\sigma(x)$ using (3) and (4).

Fig. 2. Steps involved in computing $\sigma_i(x)$, $i = 1, \ldots, 4$, for $N - 1 = 1010100$ and the $x$ of Example 4.

Algorithm 3 $\sigma$-algorithm
  procedure $\sigma$-ALGORITHM($n$, $N$, $x$)
    $z \leftarrow$ number of 0's in the binary representation of $N - 1$
    $I \leftarrow$ index set of 0's in $\pi_n(N - 1)$ from LSB to MSB
    for $i \leftarrow 1, z$ do
      $\sigma_i(x) \leftarrow x[n-1 : I(i)+1]$
      if $x[I(i) : 0] \ge \delta_i[I(i) : 0]$ then
        $\sigma_i(x) \leftarrow \sigma_i(x) + 1$
      end if
    end for
    $\sigma(x) \leftarrow \sum_{i=1}^{z} \sigma_i(x)$
  end procedure

V. HARDWARE ARCHITECTURE

Fig. 3 shows the logic circuit for computing the $\delta_i$'s. The maximum number of zero bits in $N - 1$ is $n - 1$, so the circuit generates $(n-1)$ $\delta_i$ outputs, $i = 1, \ldots, n-1$. For each output, an enable signal $en_i$ is also generated to indicate whether the output $\delta_i$ is valid or not. If the $i$th least significant bit of $\pi_n(N - 1)$ is 1, then $\delta_i$ is not defined.

Fig. 3. Architecture of the $\delta$-circuit, $i = 1, 2, \ldots, n-1$. If $en_i = 1$, then $\delta_i$ is valid, otherwise $\delta_i$ is invalid.

Fig. 4 shows the architecture of the $\phi$-algorithm for $n = 8$ using 1-bit full adder cells, comparators, and a $\delta$-circuit. The adder rows implement the $\sigma$-algorithm by accumulating right-shifted copies of $y$ (i.e., the $\sigma_i(y)$'s) depending on the control signals from the $\delta$-circuit, where $y = x + \phi^{(k)}(x)$ is the input to the $\sigma$-algorithm at iteration $k$. If $\delta_i$ is valid, then $\sigma_i(y)$ is accumulated.

Fig. 4. Architecture of the $\phi$-algorithm for $n = 8$.
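A software analogue of the $\sigma$-algorithm and of the shift-and-compare accumulation performed by the adder rows can be sketched in Python as below. This is only an illustrative model under stated assumptions: the names `sigma` and `pbri` are hypothetical, the bit position `k` plays the role of $I(i)$, and the low bits of $\delta_i$ are derived directly from $\pi_n(N-1)$ (its bits below position $k$, with bit $k$ set to 1).

```python
def bit_reverse(x: int, n: int) -> int:
    """n-bit reversal of x (the BRI map pi_n)."""
    y = 0
    for _ in range(n):
        y = (y << 1) | (x & 1)
        x >>= 1
    return y


def sigma(n: int, N: int, x: int) -> int:
    """Count integers t in [0, x] with bit_reverse(t, n) >= N, following (3)-(4)."""
    rev = bit_reverse(N - 1, n)                 # pi_n(N - 1)
    total = 0
    for k in range(n):                          # k stands for I(i)
        if (rev >> k) & 1:
            continue                            # bit is 1: no class at this position
        delta_low = (rev & ((1 << k) - 1)) | (1 << k)  # delta_i[I(i):0]
        total += x >> (k + 1)                   # x[n-1 : I(i)+1]
        if (x & ((1 << (k + 1)) - 1)) >= delta_low:    # x[I(i):0] >= delta_i[I(i):0]
            total += 1
    return total


def pbri(n: int, N: int, x: int) -> int:
    """pi_{n,N}(x) using the phi fixed-point iteration on top of sigma."""
    phi = 0
    while True:
        nxt = sigma(n, N, x + phi)
        if nxt == phi:
            return bit_reverse(x + phi, n)
        phi = nxt


if __name__ == "__main__":
    # N - 1 = 1010100 as in Example 3; x = 50 is an arbitrary illustrative input.
    print(sigma(7, 85, 50))                     # 15
    print([pbri(4, 12, x) for x in range(12)])
    # [0, 8, 4, 2, 10, 6, 1, 9, 5, 3, 11, 7]
```

Each call to `sigma` costs at most $n$ shift, compare, and add steps regardless of $x$, so any single address can be interleaved without generating the addresses before it.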
The last row of adder cells adds $x$ to the accumulated sum of $\sigma_i$'s, and the total sum is fed back after the pipeline registers to compute the next $\phi$ at iteration $k + 1$. The adder cells take two input bits to add and an enable signal from the top, a carry-in input from the right, and generate a sum bit from the bottom and a carry-out bit from the left. If the enable bit is de-asserted, the $y_i$ input bits are zeroed out. The comparators generate a one if $y[i : 0] \ge \delta_i[i : 0]$, which is fed as an input carry to the first adder cell in each row. This effectively adds 1 to $\sigma_i(y)$ if $y[i : 0] \ge \delta_i[i : 0]$. Finally, the adder cells in the last row generate an extra output (right) that corresponds to adding a 1 to $\pi_n(x)$ if $\pi_n(x) \ge N - 1$. But since the maximum value of $\pi_n(x)$ is $N - 1$, an equality comparator is sufficient. The left output from the adder cells in the last row after the pipeline registers, corresponding to $x + \phi^{(k)}(x)$, is fed back as a new input to the next iteration.

The critical path delay of the architecture is $2n - 2$ stages, which easily meets timing requirements in application-specific integrated circuit (ASIC) applications in current process technologies for values of $n$ up to 16. The unsigned comparators introduce negligible delay because they can be implemented using XOR trees. The output of the $\phi$-architecture is sampled every $n - 1$ clock cycles to read out $\phi(x)$. A comparator can be added (not shown here) to compare $\phi^{(k)}(y)$ and $\phi^{(k+1)}(y)$ for early termination. The sum $x + \phi(x)$ is also generated, which when read in reverse order is equivalent to $\pi_{n,N}(x)$.

Fig. 5 shows a parallel lookahead PBRI architecture using the $\phi$-algorithm. A packet of length $N$ is divided into $P$ sub-packets of length $L$, where each sub-packet is interleaved independently. The $\phi$-block for sub-packet $i$ computes the number of invalid integers skipped in the interval $[0, i \cdot L - 1]$. Then $\phi(i \cdot L - 1)$ is used to initialize the $i$th component PBRI interleaver to interleave sub-packet $i$. The sequential PBRI algorithm (Algorithm 1) can be used, or for small values of $L$ (up to 16), a parallel component PBRI can be implemented to interleave $L$ integers in parallel by using $2L$ adders, $2L$ comparators, a multiplexer, and control logic. This scheme is based on the fact that for a sub-packet of length $L$, there can be at most $L$ invalid integers in an interval spanning $2L$ integers. Hence the $i$th component PBRI maps the $L$ integers $i \cdot L, i \cdot L + 1, \ldots, (i+1) \cdot L - 1$ into the first $L$ valid interleaved addresses in the interval $[i \cdot L + \phi(i \cdot L - 1),\; i \cdot L + \phi(i \cdot L - 1) + 2L - 1]$. It computes the $2L$ sums $\{i \cdot L + \phi(i \cdot L - 1)\}, \{i \cdot L + \phi(i \cdot L - 1)\} + 1, \{i \cdot L + \phi(i \cdot L - 1)\} + 2, \ldots, \{i \cdot L + \phi(i \cdot L - 1)\} + 2L - 1$, and compares their bit-reversed values with $N - 1$ to check if they are valid; if so, the valid addresses get passed by the control logic through the multiplexer. However, the complexity of this scheme rapidly increases for larger values of $L$. The architecture of the lookahead interleaver in Fig. 5 attains a speedup by a factor of $P$ using a serial component PBRI, and by a factor of $N$ using a parallel component PBRI, over a fully sequential PBRI architecture.

Fig. 5. Parallel lookahead PBRI architecture using the $\phi$-algorithm.

VI. CONCLUSION

A parallel lookahead pruned bit-reversal interleaver algorithm and architecture have been proposed. The algorithm interleaves a packet of length $N$ in at most $\lceil \log_2 N \rceil - 1$ steps compared to $N$ steps using existing sequential algorithms, and has a simple architecture amenable to high-speed applications. By reducing interleaving latency on the transmitter side and de-interleaving latency on the receiver side, the proposed algorithm is valuable for emerging wireless standards such as 3GPP2/UMB [2] that employ PBRI channel (de-)interleavers on long packets.

REFERENCES

[1] S. Lin and D. J. Costello, Error Control Coding, 2nd ed. Upper Saddle River, NJ: Prentice-Hall.
[2] Physical layer standard for ultra mobile broadband (UMB) air interface specification. [Online].
[3] F. Daneshgaran and P. Mulassano, "Interleaver pruning for construction of variable-length turbo codes," IEEE Trans. Inf. Theory, vol. 50, no. 3, Mar.
[4] M. Eroz and A. R. Hammons, Jr., "On the design of prunable interleavers for turbo codes," in Proc. IEEE Veh. Technol. Conf., May 1999, vol. 2.
[5] J. Shao and B. T. Davis, "The bit-reversal SDRAM address mapping," in Proc. Workshop Software Compilers Embedded Systems, ACM Int. Conf. Proc. Series, 2005, vol. 136.
[6] J. Prado, "A new fast bit-reversal permutation algorithm based on a symmetry," IEEE Signal Process. Lett., vol. 11, no. 12, Dec.
[7] J. S. Walker, "A new bit reversal algorithm," IEEE Trans. Acoust., Speech Signal Process., vol. 38, no. 8, Aug.
[8] A. Giulietti, L. Van Der Perre, and M. Strum, "Parallel turbo coding interleavers: avoiding collisions in accesses to storage elements," IEEE Commun. Lett., vol. 38, no. 5, Feb.
[9] R. Dobkin, M. Peleg, and R. Ginosar, "Parallel interleaver design and VLSI architecture for low-latency MAP turbo decoders," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 4, Apr.
[10] M. Thul, F. Gilbert, and N. Wehn, "Optimized concurrent interleaving architecture for high-throughput turbo-decoding," in Proc. Int. Conf. Electron., Circuits Syst., 2002, vol. 3.
[11] A. Tarable, S. Benedetto, and G. Montorsi, "Mapping interleaving laws to parallel turbo and LDPC decoder architectures," IEEE Trans. Inf. Theory, vol. 50, no. 9, Sep.
[12] M.-C. Shin and I.-C. Park, "Processor-based turbo interleaver for multiple third-generation wireless standards," IEEE Commun. Lett., vol. 7, no. 5, May.
