VLSI IMPLEMENTATION AND PERFORMANCE ANALYSIS OF EFFICIENT MIXED-RADIX 8-2 FFT ALGORITHM WITH BIT REVERSAL FOR THE OUTPUT SEQUENCES.

Similar documents
Abstract. Literature Survey. Introduction. A.Radix-2/8 FFT algorithm for length qx2 m DFTs

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope

FAST FOURIER TRANSFORM (FFT) and inverse fast

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

DESIGN METHODOLOGY. 5.1 General

MULTIPLIERLESS HIGH PERFORMANCE FFT COMPUTATION

AN FFT PROCESSOR BASED ON 16-POINT MODULE

IMPLEMENTATION OF OPTIMIZED 128-POINT PIPELINE FFT PROCESSOR USING MIXED RADIX 4-2 FOR OFDM APPLICATIONS

THE orthogonal frequency-division multiplex (OFDM)

DESIGN OF PARALLEL PIPELINED FEED FORWARD ARCHITECTURE FOR ZERO FREQUENCY & MINIMUM COMPUTATION (ZMC) ALGORITHM OF FFT

Modified Welch Power Spectral Density Computation with Fast Fourier Transform

FPGA Based Design and Simulation of 32- Point FFT Through Radix-2 DIT Algorith

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

Research Article Design of A Novel 8-point Modified R2MDC with Pipelined Technique for High Speed OFDM Applications

Twiddle Factor Transformation for Pipelined FFT Processing

FAST Fourier transform (FFT) is an important signal processing

High Throughput Energy Efficient Parallel FFT Architecture on FPGAs

Design of Delay Efficient Distributed Arithmetic Based Split Radix FFT

Linköping University Post Print. Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-6, Issue-8) Abstract:

IMPLEMENTATION OF FAST FOURIER TRANSFORM USING VERILOG HDL

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

RECENTLY, researches on gigabit wireless personal area

A Novel Architecture of Parallel Multiplier Using Modified Booth s Recoding Unit and Adder for Signed and Unsigned Numbers

An Area Efficient Mixed Decimation MDF Architecture for Radix. Parallel FFT

The Serial Commutator FFT

Efficient Methods for FFT calculations Using Memory Reduction Techniques.

Fast Fourier Transform Architectures: A Survey and State of the Art

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO

Design And Simulation Of Pipelined Radix-2 k Feed-Forward FFT Architectures

AREA-DELAY EFFICIENT FFT ARCHITECTURE USING PARALLEL PROCESSING AND NEW MEMORY SHARING TECHNIQUE

LOW-POWER SPLIT-RADIX FFT PROCESSORS

An Efficient High Speed VLSI Architecture Based 16-Point Adaptive Split Radix-2 FFT Architecture

Novel design of multiplier-less FFT processors

High Performance Pipelined Design for FFT Processor based on FPGA

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices

International Journal of Innovative and Emerging Research in Engineering. e-issn: p-issn:

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

VHDL IMPLEMENTATION OF A FLEXIBLE AND SYNTHESIZABLE FFT PROCESSOR

Efficient Radix-4 and Radix-8 Butterfly Elements

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

Design and Implementation of 3-D DWT for Video Processing Applications

Fast Block LMS Adaptive Filter Using DA Technique for High Performance in FGPA

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

FFT. There are many ways to decompose an FFT [Rabiner and Gold] The simplest ones are radix-2 Computation made up of radix-2 butterflies X = A + BW

International Journal for Research in Applied Science & Engineering Technology (IJRASET) IIR filter design using CSA for DSP applications

Low Power Complex Multiplier based FFT Processor

Radix-4 FFT Algorithms *

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Implementation of Two Level DWT VLSI Architecture

I. Introduction. India; 2 Assistant Professor, Department of Electronics & Communication Engineering, SRIT, Jabalpur (M.P.

FPGA Based Antilog Computation Unit with Novel Shifter

High Speed Pipelined Architecture for Adaptive Median Filter

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

AREA EFFIECIENT ALGORITHM FOR THE IMPLEMENTATION OF CONFIGURABLE FFT/IFFT IN FPGA

Reconfigurable FFT Processor A Broader Perspective Survey

ISSN Vol.02, Issue.11, December-2014, Pages:

DESIGN & SIMULATION PARALLEL PIPELINED RADIX -2^2 FFT ARCHITECTURE FOR REAL VALUED SIGNALS

Area Delay Power Efficient Carry-Select Adder

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment

Design and Performance Analysis of 32 and 64 Point FFT using Multiple Radix Algorithms

OPTIMIZING THE POWER USING FUSED ADD MULTIPLIER

A Novel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products

A Novel Design of 32 Bit Unsigned Multiplier Using Modified CSLA

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

IMPLEMENTATION OF DOUBLE PRECISION FLOATING POINT RADIX-2 FFT USING VHDL

Design and Simulation of 32 bit Floating Point FFT Processor Using VHDL

ISSN Vol.08,Issue.12, September-2016, Pages:

AREA MINIMIZED FFT ARCHITECTURE USING SPLIT RADIX ALGORITHM FOR LENGTH L=RADIX-3 AND RADIX-2BX3C

Design of FPGA Based Radix 4 FFT Processor using CORDIC

Design and Optimized Implementation of Six-Operand Single- Precision Floating-Point Addition

High Performance and Area Efficient DSP Architecture using Dadda Multiplier

TOPICS PIPELINE IMPLEMENTATIONS OF THE FAST FOURIER TRANSFORM (FFT) DISCRETE FOURIER TRANSFORM (DFT) INVERSE DFT (IDFT) Consulted work:

An FFT/IFFT design versus Altera and Xilinx cores

FFT REPRESENTATION USING FLOATING- POINT BUTTERFLY ARCHITECTURE

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Power Spectral Density Computation using Modified Welch Method

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

Low-Power Adaptive Viterbi Decoder for TCM Using T-Algorithm

A Novel OFDM using Radix 22 and FFT Algorithm

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

MODIFIED IMDCT-DECODER BASED MP3 MULTICHANNEL AUDIO DECODING SYSTEM Shanmuga Raju.S 1, Karthik.R 2, Sai Pradeep.K.P 3, Varadharajan.

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

A Modified Radix2, Radix4 Algorithms and Modified Adder for Parallel Multiplication

ON CONFIGURATION OF RESIDUE SCALING PROCESS IN PIPELINED RADIX-4 MQRNS FFT PROCESSOR

Speed Optimised CORDIC Based Fast Algorithm for DCT

Design and Implementation of VLSI Architecture for. Mixed Radix FFT

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL

Implementation of Area Efficient Multiplexer Based Cordic

FPGA Matrix Multiplier

2 Assoc Prof, Dept of ECE, RGM College of Engineering & Technology, Nandyal, AP-India,

CORDIC Based DFT on FPGA for DSP Applications

A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithmetic with Decomposed LUT

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays

Transcription:

VLSI IMPLEMENTATION AND PERFORMANCE ANALYSIS OF EFFICIENT MIXED-RADIX 8-2 ALGORITHM WITH BIT REVERSAL FOR THE OUTPUT SEQUENCES. M. MOHAMED ISMAIL Dr. M.J.S RANGACHAR Dr.Ch. D. V. PARADESI RAO (Research scholar- JNTU) Professor/ ECE Dept Prof. (Retd), Associate Prof./ECE Dept, B.S.Abdur Rahman University ECE Dep B.S.Abdur Rahman University Formerely,B.S.A. Crescent Formerely,B.S.A. Crescent JNTU, Engg.College Engg.College Hyderabad-72 Chennai-48. India Chennai-48. India E-mail: m_md_ismail@yahoo.com ABSTRACT In recent years, as a result of advancing VLSI technology, OFDM has received a great deal of attention and been adopted in many new generation wideband data communication systems such as IEEE 802.11a, HiPerLAN/2,digital audio/video broadcasting (DAB/DVB-T) and asymmetric digital subscriber line (ADSL), very high speed digital subscriber line (VDSL) in wireless and wired communications, respectively. In these communication systems the different applications mentioned above have different demands in operation speed and length of /I. The modified Mixed 8-2 Butterfly with bit reversal for the output sequence derived by index decomposition technique is our proposed VLSI system architecture to design the prototype /I processor for OFDM systems. In this paper the analysis of several algorithms such as radix-2, radix-4, split radix and mixed radix 4-2 and proposed mixed radix8-2 were designed using VHDL and outputs are analysed. Mainly the results show that the proposed processor architecture can save the area approximately 5% and power more than 40% when compared to basic radix 2 system, which may be attractive for many real-time systems. Reduction in area and power can further leads to less implementation cost. Also the proposed algorithm makes an offer the simple bit reversal mechanism which is only supported by a fixed radix algorithm. Key words: OFDM, /I, VLSI, VHDL, Mixed with bit reversal. 1. INTRODUCTION Fast Fourier transform () are widely used in different areas of applications such as communications, radars, imaging, etc. One of the major concerns for researchers is the enhancement of area reduction, power reduction and processing speed. Several methods of for computing (and I) are discussed in [Shousheng He and Mats Torkelson April. 1996, pp.776-780 and Shousheng. He and Mats Torkelson May. 1998, pp. 131-134]. These are basic algorithms for implementation of and I blocks. While there has been extensive research on the theoretical efficiency of these algorithms (traditionally algorithms have been compared based on their floating point operation counts), there has been little research to-date comparing algorithms on practical terms. The choice of the best algorithm for a given platform is still not easy because efficiency is intricately related to how an algorithm can be implemented on a given architecture. In this paper, a mixed radix 8-2 butterfly structure with simple bit reversing for output sequences derived by index decomposition technique is presented. The new method to obtain the mixed radix butterfly structure with simple bit reversing for output sequence is established. Therefore the proposed mixed radix 8-2 offers an engineering insight of general mixed radix. 2. FAST FOURIER TRANSFORM The Discrete Fourier Transfer (DFT) plays an important role in many applications of digital 164

signal processing including linear filtering, correlation analysis and spectrum analysis etc. The DFT is defined as: (1) Where is the DFT coefficient.evaluating the Equation (1) directly requires N complex multiplications and (N-1) complex additions for each value of the DFT. To compute all N values therefore requires a total of N^2 complex multiplications and N(N-1) complex additions. Since the amount of computation, and thus the computation time, is approximately proportional to N^2, it will cost a long computation time for large values of N. For this reason, it is very important to reduce the number of multiplications and additions. This algorithm is an efficient algorithm to compute the DFT [C. Sidney Burrus-1977 and Lihong Jia, Yonghong GAO, Jouni Isoaho, and Hannu Tenhunen-1998], which is called Fast Fourier Transform () algorithm or radix-2 algorithm, and it reduce the computational complexity from (N²) to (N log₂(n ). sufficient conditions for the unique and one-toone index mapping are proposed in the paper [C. Sidney Burrus-1977]. By using the Common Factor Algorithm (CFA) [[Shousheng He and Mats Torkelson April. 1996, pp.776-780, Shousheng. He and Mats Torkelson May. 1998, pp. 131-1341 and C. Sidney Burrus-1977 ] onedimensional array can be mapped into three dimensional arrays. The mixed-radix 4/2 [Byung G. Jo and Myung H. Sunwoo -2005] butterfly unit is shown in Figure 1. It uses both the radix-2^2 and the radix-2 algorithms can perform fast computations and can process s that are not power of four. The mixed-radix 4/2, which calculates four butterfly outputs based on X(0)~X(3). The butterfly unit [Gordon L. Demuth -1989 and Kyung L. Heo, Jae H. Baek, Myung H. Sunwoo, Byung G. Jo, and Byung S. Son,-2003] has three complex multipliers and eight complex adders. Four multiplexers represented by the solid box are used to select either the radix-4 calculation or the radix-2 calculation. 3. MIXED RADIX 4-2 A mixed radix algorithm is a combination of different radix-r algorithms. That is, different stages in the computation have different radices. For instance, a 64-point long can be computed in two stages using one stage with radix-8 PEs, followed by a stage of radix-2 PEs. This adds a bit of complexity to the algorithm compared to radix-r, but in return it gives more options in choosing the transform length. The Mixed- algorithm is based on subtransform modules with highly optimized small length which are combined to create large. However, this algorithm does not offer the simple bit reversing for ordering the output sequences. When compared to split [Daisuke Takahashi-2001], mixed radix system provides regular butterfly structure and it is easy for implementation. 4. MIXED-RADIX 4-2 ALGORITHMS WITH BIT REVERSING The Mixed- 4-2 butterfly with simple bit reversing output sequences is induced by transforming a one-dimensional array into threedimensional arrays uniquely. The necessary and Fig 1: The basic butterfly for mixed-radix 4/2 DIF algorithm. 5. MIXED-RADIX 8-2 WITH BIT REVERSING OUTPUT SEQUENCES FOR 64 POINTS The basic butterfly for the proposed mixed radix 8-2 is shown in the Figure 2. The proposed Mixed- 8-2 is composed of one radix-8 butterflies and four radix-2 butterflies [Youngjin Moon, Young-il Kim-2006]. The Mixed- 4-2 butterfly with simple bit reversing output sequences is induced by transforming a one-dimensional array into three-dimensional arrays uniquely. The necessary and sufficient conditions for the unique and one-to-one index 165

mapping are proposed in the paper [C. Sidney Burrus -1977]. By using the Common Factor Algorithm (CFA) [Shousheng He and Mats Torkelson April. 1996, pp.776-780, Shousheng. He and Mats Torkelson May. 1998, pp. 131-1341 and C. Sidney Burrus-1977 ], onedimensional array can be mapped into three dimensional. Journal of Theoretical and Applied Information Technology Fig 2: The basic butterfly for mixed-radix 8-2 algorithm. In order to verify the proposed scheme, 64-points based on the proposed Mixed- 8-2 butterfly with simple bit reversing for ordering the output sequences is exampled. As shown in the Figure 3, the block diagram for 64-points is composed of total six-teen Mixed- 8-2 Butterflies. In the first stage, the 64 point input sequences are divided by the 8 groups which correspond to n3=0, n3=1, n3=2, n3=3, n3=4, n3=5, n3=6, n3=7 respectively. Each group is input sequence for each Mixed- 8-2 Butterfly. After the input sequences pass the first Mixed- 8-2 Butterfly stage, the order of output value is expressed with small number below each butterfly output line as shown in the figure 3. Fig.3: Proposed Mixed- 8-2 Butterfly for 64 point The proposed Mixed- 8-2 is composed of one radix-8 butterflies and four radix-2 butterflies. The figure 4 shows the SFG for 64 point of the proposed mixed radix 8-2 algorithm. In the first stage, the input data of one radix-8 butterflies which are expressed with the equation B4 (o, n3, kj) B4 (i, n3, k1), are grouped with the x (n3), x (N/4±n3), x(n/2±n3), x(3n/4±n3) and x(n/8±n3),x (3N/8±n3),x(5N/8±n3),x(7N/8±n3) respectively. After the each input group data passes the first radix-8 butterflies, the outputted data is multiplied by the special twiddle factors. Then, these outputted sequences are inputted into the second stage which is composed of the radix-2 butterflies. After passing the second radix-2 butterflies, the outputted data are multiplied by the twiddle factors. These twiddle factors WQ (1+k) is the unique multiplier unit in the 166

proposed Mixed- 8-2 Butterfly with simple 6. RESULT Employing the parametric nature of this core, the block is synthesized on one of Xilinx s Virtex-II Pro (2V6000ff1517) FPGAs with different configurations. The results of logic synthesis for 64 point of -2, -4, split, mixed radix 4-2 and Mixed 8-2 are presented in Table 1. The 64-point is chosen to compare the number of CLB slices (responsible to occupy an area in the FPGA). The results shows that the proposed processor architecture can save the area approximately 5% and power more than 40% when compared to basic radix 2 system, which may be attractive for many real-time systems. Reduction in area and power can further leads to less implementation cost. Fig.4: Mixed- 8-2 SFG for 64 Points 64 point - 2-4 Split Mixed 4-2 Mixed 8-2 CLB Slices/7680 Utilization factor Power in mw 851 11.1% 4685.60mW 765 9.96% 3012.51mW 835 10.8% 4492.40mW 750 9.77% 3831.63mW 596 7.76% 2696.49mW bit reversing the output sequences. Finally, we can also show order of the output sequences in Fig.4 above. The order of the output sequence is 0,4,2,6,1,5,3 and 7 which are exactly same at the simple binary bit reversing of the pure radix butterfly structure. Consequently, proposed mixed radix 8-2 butterfly with simple bit reversing output sequence include one radix 8 butterflies, four radix 2 butterflies, one multiplier unit and additional shift unit for special twiddle factors. The proposed Mixed 8-2 butterfly unit [Gordon L. Demuth,-1989 and Kyung L. Heo, Jae H. Baek, Myung H. Sunwoo, Byung G. Jo, and Byung S. Son-2003], has two complex multipliers and eight complex adders. Table 1: Comparison of Algorithm based on CLB Slices, Device Utilization and Power. 7. CONCLUSION: The algorithms considered here include radix-2, radix-4, split-radix 2/4, mixed-radix 4/2 and mixed radix 8-2 which is the subject of this study. The hardware implementation for radix-2 algorithm is the easiest but it is the least efficient. Split-radix 2/4 algorithm is more efficient but its algorithm cannot produce regularity in hardware structure, thus not amenable to implementation. In contrast, mixedradix 8-2 algorithm is capable of producing hardware with structural regularity. It is more efficient than radix-2 algorithm and 167

applicable to all 2n-point systems. This Algorithm makes an offer the simple bit reversing for ordering the output sequences which is only supported by a fixed-radix algorithm. Moreover, the proposed technique makes an offer systematic viewpoint in the Mixed- algorithm. The proposed Mixed 8-2 butterfly unit has only two complex multipliers and eight complex adders. Reduction in area and power can further leads to less implementation cost. It is therefore can be selected for implementation of a OFDM based processor. and Convolution", IEEE Trans. Acoust., Speech, and Signal Processing, Vol. ASSP- 25, pp. 239-242. [9] Young-jin Moon, Young-il Kim Portable Internet system research team Electronics and Telecommunications Research Institute (ETRI) Daejeon, Korea 2006, A Mixed- 4-2 Butterfly with Simple Bit Revering for Ordering the Output Sequences, ICA0T2006,pp1774-1778. REFERENCES: [1] Byung G. Jo and Myung H. Sunwoo,2005, "New Continuous-Flow Mixed- (CFMR) Processor Using Novel In- Place Strategy", IEEE Transactions on Circuits and Systems, Vol. 52, No. 5. [2] Daisuke Takahashi,2001,"An Extended Split- Algorithm", IEEE Signal Processing Letters, Vol. 8, No. 5, pp. 145-147. [3] Gordon L. Demuth,1989 "Algorithms for Defining Mixed Flow Graphs", IEEE Trans Acoust, Speech, and Signal Processing, Vol. 37, No. 9, pp. 1349-1358. [4] Kyung L. Heo, Jae H. Baek, Myung H. Sunwoo, Byung G. Jo, and Byung S. Son,2003 "New In-Place Strategy for a Mixed- processor", IEEE SOC Conference, pp. 8 1-84. [5] Lihong Jia, Yonghong GAO, Jouni Isoaho, and Hannu Tenhunen, 1998 "A New VLSI- Oriented Algorithm and Implementation", IEEE ASIC Conf., pp. 337-341. [6] Shousheng He and Mats Torkelson, 1996 "A New Approach to Pipeline Processor", IEEE Parallel Processing Symposium, pp.776-780. [7] Shousheng. He and Mats Torkelson,1998 "Design and Implementation of a 1024-point Pipeline Processor", IEEE Custom Integrated Circuits Conference, pp. 131-134. [8] C. Sidney Burrus, 1977 "Index Mapping for Multidimensional Formulation of the DFT 168