IMPLEMENTATION OF OPTIMIZED 128-POINT PIPELINE FFT PROCESSOR USING MIXED RADIX 4-2 FOR OFDM APPLICATIONS

K. UMAPATHY, Research Scholar, Department of ECE, Jawaharlal Nehru Technological University, Anantapur, India, umapathykannan@gmail.com
DR. D. RAJAVEERAPPA, Professor, Department of ECE, Loyola Institute of Technology, Chennai, India, draja_2001@rediffmail.com

Abstract
This paper proposes a 128-point FFT processor for Orthogonal Frequency Division Multiplexing (OFDM) systems that processes real-time, high-speed data. The processor is based on a cached-memory architecture (CMA) and uses the Mixed Radix 4-2 algorithm in the Multipath Delay Commutator (MDC) style. The FFT processor was designed and implemented with this technique to reduce size and power. With this algorithm the chip measures 2.8 × 2.8 mm² in 0.35 μm technology. In the best case the power consumption is 72 mW at an operating frequency of 127-133 MHz, less than half that of the latest reported 128-point FFT design in 0.18 μm technology. Pipeline architectures (MDC, SDF and SDC) using the same algorithm are compared for the 128-point FFT processor with respect to memory size, area and power.

Keywords: OFDM, CMA, Mixed Radix 4-2, FFT, R42MDC

1. Introduction
The fast Fourier transform (FFT) is one of the most frequently used digital signal processing (DSP) algorithms in Orthogonal Frequency Division Multiplexing (OFDM) applications. The FFT architectures used in OFDM systems fall into three types: the parallel architecture, the pipeline architecture and the memory architecture. The parallel and pipeline architectures employ more butterfly processing units to achieve high performance, but consume a larger area than the memory architecture. The shared-memory architecture employs only one butterfly processing unit and therefore has the advantage of area efficiency.
The block diagram of the FFT processor is shown in Figure 1. This paper focuses on the memory architecture, for its area efficiency and hardware simplicity, in order to construct a small OFDM system. We propose a 128-point FFT processor in CMOS technology that consumes low power, occupies a small chip area and increases processing speed, based on the cached-memory architecture (CMA) and the R42MDC (Mixed Radix 4/2 MDC) style.

2. Mixed Radix 4/2 Algorithm
The computation of the FFT is represented by Eq. (1). There are two types of mixed-radix FFT algorithms. The first type arises naturally when a radix-q algorithm, where $q = 2^m > 2$, is applied to an input series consisting of $N = 2^k \times q^s$ equally spaced points, where $1 \le k < m$. In this situation, k steps of the radix-2 algorithm are applied either at the beginning or at the end of the transformation. The second type refers to algorithms specialized for a composite $N = N_0 \times N_1 \times N_2 \times \dots \times N_k$. Different algorithms may be employed depending on whether the factors satisfy certain restrictions. Only the $2 \times 4^m$ case of the first type is considered here, using the MDC style; for example, $128 = 2 \times 4^3$. The mixed-radix 4/2 algorithm calculates four butterfly outputs.

(1)

ISSN : 0975-5462 Vol. 4 No.12 December 2012 4745
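The first type of mixed-radix algorithm described above can be sketched in software for lengths of the form 2 × 4^m, such as 128. This is an illustrative model only, not the authors' hardware design. The function names `dft` and `fft_mixed_radix_4_2` are hypothetical, the recursion is decimation-in-time, and the single radix-2 step is placed at the innermost level, so it is computed first. The conventional DFT definition X(k) = Σ x(n)·exp(-j2πnk/N) is assumed as the reference.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def fft_mixed_radix_4_2(x):
    """Recursive DIT FFT for lengths 4^m or 2 * 4^m (e.g. 128 = 2 * 4^3).

    Radix-4 steps are used while the length is divisible by 4; a single
    radix-2 butterfly handles a remaining length of 2.
    """
    n = len(x)
    if n == 1:
        return list(x)
    if n % 4 == 0:
        # Split into four interleaved sub-sequences and transform each.
        subs = [fft_mixed_radix_4_2(x[r::4]) for r in range(4)]
        q = n // 4
        out = [0j] * n
        for k in range(q):
            w = cmath.exp(-2j * cmath.pi * k / n)  # twiddle factor W_N^k
            a = subs[0][k]
            b = w * subs[1][k]
            c = w ** 2 * subs[2][k]
            d = w ** 3 * subs[3][k]
            # Radix-4 butterfly: four outputs per group of four inputs.
            out[k] = a + b + c + d
            out[k + q] = a - 1j * b - c + 1j * d
            out[k + 2 * q] = a - b + c - d
            out[k + 3 * q] = a + 1j * b - c - 1j * d
        return out
    if n == 2:
        # Radix-2 butterfly.
        return [x[0] + x[1], x[0] - x[1]]
    raise ValueError("length must be 4^m or 2 * 4^m")


if __name__ == "__main__":
    samples = [complex(i % 7, -(i % 3)) for i in range(128)]
    err = max(abs(p - r) for p, r in zip(fft_mixed_radix_4_2(samples), dft(samples)))
    print("max deviation from direct DFT:", err)
```

For a 128-point input the recursion performs three levels of radix-4 butterflies on top of one level of radix-2 butterflies, which matches the 2 × 4^3 factorization.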

Fig 1. Block diagram of FFT processor

The input data are divided into two parallel data streams, which enter the butterfly processing element after the proper delay time using the R42MDC algorithm. As with CMA, the Mixed Radix 4/2 FFT algorithm is used as an example to introduce the R42MDC-based FFT processor architecture. The MDC structure and the signal flow graph for the Mixed Radix 4/2 algorithm are shown in Figure 2 and Figure 3 respectively.

Fig 2. Radix 4-2 MDC Structure

3. Cached-Memory Architecture
The cached-memory architecture is similar to the single-memory architecture, except that a small cache memory resides between the processor and main memory, as shown in Figure 4. Spiffee employs the cached-memory architecture because a hierarchical memory system is required to realize the benefits of the cached-FFT algorithm. The performance of the memory system can be improved by adding a second cache set. In this configuration, the processor operates out of one cache set while the other set is being flushed and then loaded from memory. If, in the R42MDC style, the flush time plus the load time is less than the time required to process the data in the cache, the processor need not wait for the cache between groups. The second cache set therefore increases processor utilization and overall performance at the expense of some additional area and complexity.
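The condition on flush, load and processing time can be illustrated with a small steady-state timing model. This is a sketch under stated assumptions, not a description of the actual design. The function names and all cycle counts are hypothetical. With one cache set, flushing and loading are assumed to serialize with processing. With two sets they are assumed to overlap completely with it.

```python
def group_time(process, flush, load, cache_sets):
    """Steady-state cycles spent per group of data held in the cache.

    process, flush, load: cycle counts per group (hypothetical values).
    cache_sets: 1 (no overlap) or 2 (flush/load overlaps processing).
    """
    if cache_sets == 1:
        # The processor waits while the single cache set is flushed and reloaded.
        return process + flush + load
    if cache_sets == 2:
        # The processor works out of one set while the other is flushed and
        # loaded, so the slower of the two activities sets the pace.
        return max(process, flush + load)
    raise ValueError("this model covers one or two cache sets only")


def utilization(process, flush, load, cache_sets):
    """Fraction of time the processor spends computing."""
    return process / group_time(process, flush, load, cache_sets)


if __name__ == "__main__":
    for sets in (1, 2):
        print(sets, "cache set(s):", utilization(64, 16, 16, sets))
```

With the hypothetical figures above (64 processing cycles, 16 flush cycles, 16 load cycles), flush plus load is shorter than processing, so the two-set configuration never stalls. The single-set configuration spends one third of its time waiting.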

Fig 3. Signal Flow Graph of 128-point FFT using Mixed Radix 4-2 algorithm.

Fig 4. The Proposed Cached-Memory Architecture

Table 1. Area and Power Consumption of 128-point FFT using MDC Style.

Parameter/Type            128-Point FFT,         128-Point FFT,                   Frequency (MHz)
                          Conventional Design    Proposed Design (Reduction %)
Memory size (words)       128                    91 (26.7)                        127-133
Area (gate count)         51,000                 41990.5 (23.5)                   127-133
Power consumption (mW)    137                    72                               127-133

Table 2. Comparison of MDC Style with Other Architectures - SDF and SDC

Type of Architecture    Memory size (words)          Power (mW)    Area (mm²)
                        (Reduction/Increase %)
MDC                     91 (26.7)                    72            7.84 (127-133 MHz)
SDF                     267 (60.5)                   137           22.50 (154 MHz)
SDC                     245 (50)                     90.6          21.25 (133 MHz)

4. Design and Simulation
Algorithmic-level simulation and verification were carried out with ModelSim and the C programming language because of their high execution speed. In total, about ten simulations at various levels of abstraction were written. Next, the details of the architecture were worked out using the Verilog hardware description language and the ModelSim simulator. Approximately twenty modules in total were written for the processor and its sub-blocks. Table 1 shows the area and power consumption of the proposed FFT processor in comparison with the conventional FFT design. A comparison of the MDC style with the other pipeline architectures, Single Delay Feedback (SDF) and Single Delay Commutator (SDC), with respect to area and power is shown in Table 2. Figure 5 shows the comparison graph for these pipeline architectures. Figure 6 and Figure 7 show the simulation results for the power and chip size of the proposed 128-point FFT processor respectively.

Fig 5. Comparison of Power, Area & Memory size for Pipeline Architectures MDC (Blue), SDF (Brown) & SDC (Green).

Fig 6. Simulation Results for Power - 128 point FFT Processor.
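As a quick arithmetic cross-check of the power and area figures in Table 2, the sketch below recomputes the die area from the 2.8 mm side length quoted in the abstract and expresses the MDC values as fractions of the SDF and SDC values. The dictionary layout and the helper name `mdc_fraction_of` are hypothetical. The numbers are copied from Table 2, and no new measurements are implied.

```python
# Power (mW) and area (mm^2) per architecture, as listed in Table 2.
TABLE_2 = {
    "MDC": {"power_mw": 72.0, "area_mm2": 7.84},
    "SDF": {"power_mw": 137.0, "area_mm2": 22.50},
    "SDC": {"power_mw": 90.6, "area_mm2": 21.25},
}

# Die side length from the abstract (2.8 x 2.8 mm^2).
DIE_SIDE_MM = 2.8
die_area_mm2 = round(DIE_SIDE_MM ** 2, 2)


def mdc_fraction_of(other, key):
    """MDC value divided by the value of another architecture for one metric."""
    return TABLE_2["MDC"][key] / TABLE_2[other][key]


if __name__ == "__main__":
    print("die area (mm^2):", die_area_mm2)
    for arch in ("SDF", "SDC"):
        print(
            arch,
            "power fraction:", round(mdc_fraction_of(arch, "power_mw"), 3),
            "area fraction:", round(mdc_fraction_of(arch, "area_mm2"), 3),
        )
```

The recomputed die area agrees with the 7.84 mm² listed for MDC. On these figures the MDC design occupies roughly a third of the area of either alternative.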

Fig 7. Chip Size of 128-point FFT Processor.

5. Conclusion
The 128-point FFT processor was designed using the cached-memory architecture with the Mixed Radix 4-2 algorithm in MDC style (R42MDC). The design exploits a hierarchical memory structure and achieves increased performance with (i) reduced power dissipation, (ii) small area and (iii) minimum operating clock frequencies. A comparison was also made with other pipeline architectures, and the MDC style was chosen for the design. In the best case the power consumption is 72 mW at an operating frequency of 127-133 MHz, less than half that of the latest reported 128-point FFT design in 0.18 μm technology. This implementation can be used in low-power applications for OFDM system data transfer and wireless communication systems. In this study, an FFT processor based on the proposed algorithm was implemented using Verilog HDL and ModelSim for circuit design and simulation.