A New Memory Reduced Radix-4 CORDIC Processor For FFT Operation


IOSR Journal of VLSI and Signal Processing (IOSR-JVSP), Volume 2, Issue 5 (May. - Jun. 2013), PP 09-16, e-ISSN: 2319-4200, p-ISSN No.: 2319-4197, www.iosrjournals.org

Yasodai.A 1, Ramprasad.A.V 2
1 Department of ECE, Vickram College of Engineering, Enathi, India-630561
2 Professor, ECE, K.L.N College of Engineering, Pottapalayam, India-630611

Abstract: A complex number can be interpreted as a vector in the complex plane. A vector rotation in the x/y plane can be realized by rotating the vector through a series of elementary angles. These elementary angles are chosen such that the rotation through each of them can be approximated easily with a simple shift and add operation, and their algebraic sum approaches the required rotation angle. This can be carried out by the CORDIC (CO-ordinate Rotation DIgital Computer) algorithm in rotation mode. In this paper, we propose a pipelined architecture for a new memory-less, z-path eliminated CORDIC algorithm for FFT computation. The pipelined architecture, built on pre-computation of the direction of micro rotation, radix-4 number representation and an angle generator, is evaluated in terms of hardware complexity, iteration delay and memory reduction. A comparison of the proposed architecture with the conventional radix-4 architectures is presented. The proposed algorithm also uses an addressing scheme and the associated angle generator logic in order to eliminate the ROM used for storing the twiddle factors. It incorporates parallelism and pipeline processing. The latency of the system is n/2 clock cycles. The throughput rate is one valid result per eight clock cycles. Additionally, a VLSI implementation on a Virtex-4 FPGA is done. The implemented design operates at a clock rate of 450.654 MHz with a power consumption of 175.90 mW.

Keywords: CORDIC, FPGA, latency, Radix-4, memory less systems, speed, throughput.

I. Introduction

FFT is at the heart of signal and image processing, and VLSI based signal processing finds application in neural networks [2], bio-medical signal processing [4], wireless communications [7], Software Defined Radio [3], radar image processing and Synthetic Aperture Radar [6]. Most of these applications adopt CORDIC (Co-ordinate Rotation Digital Computer) based signal processing because of its versatility. Since the CORDIC algorithm was developed [8], its applications have been extending to new areas day by day. For FFT processors, the butterfly operation is the most complicated stage. A butterfly unit generally consists of complex adders and multipliers, and the multiplier is usually the speed bottleneck in a pipelined FFT processor [11]. The CORDIC algorithm [8] is an iterative algorithm that realizes the butterfly operation without using any dedicated multiplier hardware. CORDIC is a hardware efficient algorithm for realizing transcendental functions, trigonometric functions and multiplications using only simple add and shift operations. In recent years several CORDIC based FFTs were proposed for various applications [1,2,3,4,5].

The conventional radix-2 CORDIC algorithm was implemented using word serial and digit pipelined architectures [9]. Later, this algorithm was modified as radix-4 to perform each micro rotation by two successive radix-2 micro rotations. Lakshmi et al. in [9] presented a pipelined architecture using signed digit arithmetic for the VLSI efficient implementation of the rotational radix-4 CORDIC algorithm, eliminating the z-path completely. It achieves latency improvement with small hardware overhead. Garrido [14] proposed a new memory-less CORDIC algorithm to realize the FFT operation with reduced memory requirements, but its implementation is somewhat complex. Vachhani L. et al. in [19] presented two area-efficient algorithms and their architectures based on CORDIC. Their first algorithm eliminated the ROM and required only low-width barrel shifters, and their second algorithm eliminated barrel shifters completely. As a consequence, both algorithms consume approximately 50% area in comparison with other CORDIC designs, but the hardware complexity is large.

In [16], a modified CORDIC algorithm for FFT processors was proposed which eliminates the need for storing the twiddle factor angles. The algorithm generates the angles successively by an accumulator. This approach reduces the memory requirements of an FFT processor by more than 20% [11]. Also, the memory reduction improves with increased radix size. Moreover, the angle generation circuit consumes less power than angle memory accesses. Instead of storing actual twiddle factors in a ROM, a CORDIC based FFT processor needs a dedicated memory bank to store the twiddle factor angles for the butterfly operation. In this paper we propose a reduced memory, z-path eliminated radix-4 CORDIC for FFT operation. It leads to reduced power consumption compared to conventional CORDIC.

The organization of the paper is as follows: Section 2 discusses the fundamentals of the CORDIC algorithm and its operation in FFT. Section 3 discusses the proposed methodology of the z-path eliminated, memory efficient CORDIC algorithm. Section 4 discusses the synthesis results.

II. Radix-4-CORDIC-FFT algorithm

The N-point Discrete Fourier Transform is defined as

$$X(k) = \sum_{i=0}^{N-1} x(i)\, W_N^{ik} \tag{1}$$

where $k = 0, 1, \ldots, N-1$ and $W_N^{ik} = e^{-j 2\pi i k / N}$ is the twiddle factor. In an N-point FFT there are $\log_r N$ stages and each stage contains N/r butterfly operations. The FFT butterfly operation can be realized by the radix-r CORDIC algorithm, and a CORDIC-based butterfly can be fast. The CORDIC algorithm was initially proposed by J.E. Volder [8]. The foundation of the CORDIC algorithm is a 2D vector rotation in the xy-plane (see Fig. 1). The vector rotation can be realized by rotating a vector through a series of elementary angles. These elementary angles are chosen such that the vector rotation through each of them can be approximated easily with simple shift and add operations, and their algebraic sum approaches the required rotation angle [8]. The CORDIC algorithm can be used in two different modes, viz. rotation mode and vectoring mode.

Fig. 1. A 2D vector rotation using CORDIC, from $v_i = (X_i, Y_i)$ to $v_{i+1} = (X_{i+1}, Y_{i+1})$.

The rotation mode is used to perform a general rotation by a given angle and to compute elementary operations such as multiplication, exponential, trigonometric and hyperbolic functions, depending on the coordinate system. The vectoring mode is used to determine the angular argument of the original vector and to evaluate the logarithmic and division functions. The standard CORDIC system in the circular coordinate system performs n iterations for n-bit precision using radix-r number representation. For a given n-bit precision, increasing the radix r cuts down the number of micro rotations.
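The idea of replacing the twiddle-factor multiplication of Eq. (1) by a sequence of shift-and-add micro rotations can be sketched in Python as follows. This is a minimal floating-point illustration of conventional radix-2 CORDIC in rotation mode, not the architecture proposed in this paper; the function names `cordic_rotate` and `twiddle_multiply`, the 16-iteration default and the quadrant folding are assumptions made for the example.

```python
import cmath
import math


def cordic_rotate(x, y, theta, n_iter=16):
    """Rotate (x, y) by theta radians (|theta| <= pi/2) with radix-2 CORDIC."""
    z = theta
    for i in range(n_iter):
        sigma = 1.0 if z >= 0.0 else -1.0   # direction of this micro rotation
        shift = 2.0 ** -i                   # a right shift by i bits in hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z -= sigma * math.atan(shift)       # elementary angle tan^-1(2^-i)
    # Each micro rotation stretches the vector; in radix-2 the gain is constant.
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(n_iter))
    return x / gain, y / gain


def twiddle_multiply(sample, exponent, n_points):
    """Compute sample * W_N^exponent as a rotation by -2*pi*exponent/N."""
    angle = -2.0 * math.pi * (exponent % n_points) / n_points   # in (-2*pi, 0]
    x, y = sample.real, sample.imag
    # Fold the angle into [-pi/2, pi/2] so the micro rotations can reach it.
    if angle < -1.5 * math.pi:
        angle += 2.0 * math.pi
    elif angle < -0.5 * math.pi:
        angle += math.pi
        x, y = -x, -y                       # a rotation by -pi is a sign change
    return complex(*cordic_rotate(x, y, angle))


if __name__ == "__main__":
    s = complex(0.6, -0.3)
    print(twiddle_multiply(s, 3, 64), s * cmath.exp(-2j * math.pi * 3 / 64))
```

With 16 iterations the residual angle is bounded by tan^-1(2^-15), so the CORDIC result and the direct complex product agree to roughly four decimal places.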
For example, the radix-4 CORDIC system, which is an extension of the radix-2 CORDIC algorithm, achieves latency reduction [9]. Using radix-4 we need half the number of micro rotations required in radix-2. The iteration equations of the radix-4 CORDIC algorithm at the (i+1)th step [ ] under the circular coordinate system acting in rotation mode are as follows:

$$X_{i+1} = X_i - \sigma_i\, Y_i\, 4^{-i} \tag{2a}$$
$$Y_{i+1} = Y_i + \sigma_i\, X_i\, 4^{-i} \tag{2b}$$
$$Z_{i+1} = Z_i - \tan^{-1}\left(\sigma_i\, 4^{-i}\right) \tag{2c}$$

where the direction of rotation in each iteration is $\sigma_i \in \{-2, -1, 0, 1, 2\}$, and the elementary angle is $\alpha_i = \tan^{-1}(\sigma_i 4^{-i})$. The micro rotations of Eq. (2) increase the length of the vector. To maintain the expected magnitude of the vector, the final coordinates must be multiplied by the scale factor. The scale factor depends on the values of $\sigma_i$ and must be estimated for each rotation angle:

$$K^{-1} = \prod_{i \ge 0} \frac{1}{\sqrt{1 + \sigma_i^2\, 4^{-2i}}} \tag{3}$$

III. Proposed Methodology:

A radix-4 CORDIC for pipelined FFT operation with a σ prediction block is proposed to reduce the iteration delay, eliminate the z-path and cut down the memory required to store the twiddle factor angles. The number of iterations is reduced by using the radix-4 number system, and the iteration delay is reduced by radix-4 signed digit arithmetic and pre-computation of the direction of micro rotation. Radix-4 takes more time to select from among the five rotation direction values, and to select an appropriate angle out of five elementary angles, which increases the iteration delay compared to radix-2. The proposed CORDIC architecture in rotation mode is designed to overcome this issue by pre-computing all the directions of rotation for the generated input angle, utilizing the linear relation between the rotation angle and the binary representation of the directions of micro rotations [9].

The proposed architecture comprises four basic building blocks, viz. the angle generator, the x/y path, the σ-prediction block and the scale factor computation block, shown in Fig. 2. The angle generator block produces the sequence of angles in regular and increasing order with the help of an accumulator and control logic. The σ-prediction block predicts all the σ values in radix-4 representation for the generated target angle θ, before the actual CORDIC rotation starts iteratively in the x/y path. During every clock cycle, the σ values are buffered and sent sequentially to the corresponding stages of the x/y path.

Fig. 2. Block schematic of the proposed system (angle generator with accumulator, σ-prediction block, scale factor block and X/Y path).

1) θ (angle) generator block:

The key operation of FFT is as shown in Eq. (1), $X(k) = \sum_{i=0}^{N-1} x(i) W_N^{ik}$. This is equivalent to rotating x(i) by an angle $(-\frac{2\pi}{N} i k)$, which can be realized very well by the CORDIC algorithm with simple shift and add operations, so a CORDIC-based butterfly can be fast. An FFT processor normally needs to store the twiddle factors in memory. A CORDIC-based FFT does not have twiddle factors, but needs a memory bank to store the rotation angles.

Fig. 3. θ-generator of the proposed system (accumulator of the basic angle 2π/N clocked by CLK, with an output latch gated by a control signal).

Using the addressing scheme presented in [15], the twiddle factor angles generated by a simple accumulator follow a regular and increasing order. For example, for a 64-point radix-4 FFT, it can be seen that the twiddle factor angles are sequentially increasing, and every angle is a multiple of the basic angle $(2\pi/N)$, which is $(\pi/32)$. For the $\log_4 64 = 3$ different FFT stages, the angles always increase one step per clock cycle.
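The accumulator behaviour described above, where every twiddle angle is a running multiple of the basic angle 2π/N, can be sketched as follows. This is only an illustration under stated assumptions: the function name `twiddle_angles` is hypothetical, the accumulator counts in integer units of the basic angle, and the stage-dependent latch control of the proposed design is not modelled.

```python
import math


def twiddle_angles(n_points, count):
    """Return `count` successive twiddle angles as multiples of 2*pi/N."""
    basic_angle = 2.0 * math.pi / n_points
    angles = []
    accumulator = 0                       # integer multiple of the basic angle
    for _ in range(count):
        angles.append(accumulator * basic_angle)
        accumulator = (accumulator + 1) % n_points   # one step per clock cycle
    return angles


if __name__ == "__main__":
    # For a 64-point FFT the basic angle is 2*pi/64 = pi/32.
    for step, angle in enumerate(twiddle_angles(64, 4)):
        print(step, angle, step * math.pi / 32)
```

Keeping the accumulator as an integer count mirrors a fixed-point hardware register and avoids accumulating floating-point rounding error in the sketch.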
This function can be realized by the angle generator circuit, which is composed of an accumulator and an output latch, as shown in Fig. 3. The accumulator output depends on the control signals that enable or disable the latch. This is simple to realize, as it is based on the present FFT butterfly stage and the RAM address bits $b_{n-2} \ldots b_2 b_1 b_0$ (see Table 1). Usually, for an $N = 2^n$ point FFT, an (n-1)-bit butterfly counter $B(b_{n-2} b_{n-3} \ldots b_1 b_0)$ will satisfy the address sequences and the control logic of the angle generator. At each stage s, the RAM address bits are the butterfly counter rotated right by s bits, $B(b_{s-1} b_{s-2} \ldots b_1 b_0 b_{n-2} b_{n-3} \ldots b_s)$, and the control logic of the latch is driven by the pattern $b_{n-2} b_{n-3} \ldots b_s 0 \ldots 0$ (with s trailing 0s).

Table 1. Address generation table of the proposed design for a radix-2 8-point FFT

Butterfly      Stage 0                  Stage 1                  Stage 2
counter B      RAM address   Twiddle    RAM address   Twiddle    RAM address   Twiddle
(b1 b0)        (b1 b0)       angle      (b0 b1)       angle      (b1 b0)       angle
00             00            0          00            0          00            0
01             01            π/4        10            0          01            0
10             10            2π/4       01            2π/4       10            0
11             11            3π/4       11            2π/4       11            0

2) X/Y path of the proposed architecture:

The basic structure of the proposed design for the radix-4 CORDIC processor is shown in Fig. 4. The proposed architecture requires n/2 = 8 micro rotation stages to implement the x/y coordinates. The final x/y coordinates (see Fig. 4) are scaled using the scale factor computed in parallel with the micro rotations, which requires an adder/subtractor, a 2:1 multiplexer and an AND gate (Fig. 4b). Then the RBC converts the signed digit representation of the radix-4 result into binary form. Each micro rotation stage is designed by realizing the iteration equations

$$X_{i+1} = X_i - \sigma_i\, Y_i\, 4^{-i}$$
$$Y_{i+1} = Y_i + \sigma_i\, X_i\, 4^{-i}$$

The signed digit adder/subtractor is controlled by the sign of the σ value (see Fig. 4b). The σ values are produced by the prediction block (see Fig. 5).

X/Y hardware realization: The x/y block of the proposed system for each stage is composed of a pair of signed binary adders, a pair of registers at the input and output, and a pair of right-shift registers (see Fig. 2). For the generated angle θ, the input vector $v_i = (X_i, Y_i)$ is rotated to an output $v_{i+1} = (X_{i+1}, Y_{i+1})$, with $\sigma_i$ a predetermined direction value that sets the direction of rotation.
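The x/y iteration equations of a radix-4 micro rotation stage, together with the σ-dependent scale factor, can be exercised with the following floating-point sketch. It is not the proposed hardware: the digit σ is chosen here by a greedy search on a residual angle z (an assumption made only so the example is self-contained), whereas the proposed architecture predicts all σ values in advance and has no z-path. The names `radix4_rotate` and `DIGITS` are hypothetical.

```python
import math

DIGITS = (-2, -1, 0, 1, 2)


def radix4_rotate(x, y, theta, n_stages=8):
    """Rotate (x, y) by theta (|theta| <= pi/2) with radix-4 micro rotations.

    Returns the scaled-back coordinates and the residual angle.
    """
    z = theta
    gain = 1.0
    for i in range(n_stages):
        w = 4.0 ** -i                              # right shift by 2*i bits
        # Greedy digit choice (assumption): the elementary angle closest to z.
        sigma = min(DIGITS, key=lambda s: abs(z - math.atan(s * w)))
        x, y = x - sigma * y * w, y + sigma * x * w
        z -= math.atan(sigma * w)
        gain *= math.sqrt(1.0 + sigma * sigma * w * w)   # depends on sigma
    return x / gain, y / gain, z


if __name__ == "__main__":
    theta = 0.3
    xr, yr, residual = radix4_rotate(1.0, 0.5, theta)
    print(xr, math.cos(theta) - 0.5 * math.sin(theta))
    print(yr, math.sin(theta) + 0.5 * math.cos(theta))
    print(residual)
```

Eight stages are used because the text states n/2 = 8 stages for the 16-bit design. Unlike radix-2, the accumulated gain changes with the chosen digits, which is why the scale factor has to be computed per rotation angle.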
Fig. 4. a) X/Y path of the system (stages X/Y(0) to X/Y(7) driven by σ_0 to σ_7, followed by scaling with K^{-1} and the RBC), b) realization of the X/Y path.

3) σ prediction block:

The σ-prediction block requires a pipelined implementation to compute the directions of the micro rotations for the given target angle (generated by the angle generator). A linear relation between the rotation angle and the binary representation of the directions of micro rotations [ ], given by

$$d = 0.5\,\theta + 0.5\,c_1 + \delta \tag{4}$$

where $\delta = \sum_{i=0}^{n/3} d_i\, \epsilon_i$, $c_1 = 2 - \sum_{i=0}^{\infty} \epsilon_i$ and $\epsilon_i = 2^{-i} - \tan^{-1}(2^{-i})$, is utilised to predict the direction of rotation. The directions of the micro rotations in binary form are represented by the variable d, and (d - θ) and θ have a linear relationship with negative slope, except at some θ values called breakpoints [1, 7]. d can therefore be computed from θ using simple addition operations, as shown in Fig. 5. The details of the various modules in the σ-precomputation block are as follows.

For 16-bit precision, a (32 × 48) ROM is selected to store these offset values [ ]. The offset value δ is pre-computed for any input angle in the range (-π/2, π/2), and the angle representing each of these δ values is referred to as θ_ref, which coincides with the summation of the first n/3 micro rotations. Therefore, the target angle has to be greater than θ_ref. Each location of the ROM holds a reference angle and two offset values (δ_k, δ_{k-1}). δ_k corresponds to the reference angle and δ_{k-1} corresponds to the previous reference angle. For the given input angle, the offset value is obtained from the ROM using a 2:1 multiplexer. The output of this multiplexer is controlled by the comparator, which generates the sign signal 0 if θ > θ_ref (selecting δ_k), else 1 if θ < θ_ref (selecting δ_{k-1}). The radix-4 arithmetic representation of 0.5c_1 is added to the 2's complement representation of 0.5θ and δ_rom using two adders to obtain σ_i ∈ {-2, -1, 0, 1, 2}. The delay of each adder is tfa, where tfa is one full adder delay. These σ values are stored in a buffer to transfer them to the corresponding stages of the x/y path. Thus, the directions of rotation are predicted before the x/y path starts the computation of the x/y coordinates, which results in the elimination of the z-path.
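The linear relation between the rotation angle θ and the binary word d of rotation directions can be checked numerically. The sketch below assumes the radix-2 reading of that relation: directions σ_i ∈ {-1, +1} are coded as bits d_i = (σ_i + 1)/2, and ε_i = 2^-i - tan^-1(2^-i). To make the identity exact it uses finite sums over all n iterations for both the constant and the offset, rather than a truncated offset taken from a ROM. The function name `check_linear_relation` is hypothetical.

```python
import math
import random


def check_linear_relation(n=16, seed=0):
    """Return (d, 0.5*theta + 0.5*c1 + delta) for a random direction word."""
    rng = random.Random(seed)
    eps = [2.0 ** -i - math.atan(2.0 ** -i) for i in range(n)]
    d_bits = [rng.randint(0, 1) for _ in range(n)]
    sigma = [2 * b - 1 for b in d_bits]                # bit 1 -> +1, bit 0 -> -1
    theta = sum(s * math.atan(2.0 ** -i) for i, s in enumerate(sigma))
    d = sum(b * 2.0 ** -i for i, b in enumerate(d_bits))
    c = sum(2.0 ** -i for i in range(n))               # tends to 2 as n grows
    c1 = c - sum(eps)
    delta = sum(b * e for b, e in zip(d_bits, eps))    # angle-dependent offset
    return d, 0.5 * theta + 0.5 * c1 + delta


if __name__ == "__main__":
    for seed in range(3):
        print(check_linear_relation(seed=seed))
```

Since ε_i shrinks roughly like 2^(-3i)/3, only the first few terms of the offset matter at a given precision, which is the property that allows the offset to be pre-computed for a small set of reference angles.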
4) Scale factor:

The scale factor of the radix-2 CORDIC rotator is a constant, but in the case of the radix-4 CORDIC rotator it is not. This is observable from the equation below:

$$K^{-1} = \prod_{i=0}^{n/4} \frac{1}{\sqrt{1 + \sigma_i^2\, 4^{-2i}}} \tag{5}$$

where $\sigma_i \in \{-2, -1, 0, 1, 2\}$. Here, we compute $K^{-1}$ only for the first n/4 + 1 = 5 iterations, as $k_i^{-1} = 1/\sqrt{1 + \sigma_i^2 4^{-2i}}$ becomes unity thereafter. For the present 16-bit implementation, it is adequate to approximate the scale factor for the first 5 iterations only. Scale factor approximation is accomplished in parallel with the micro rotations by storing all possible scale factors in memory [1, 4]. From the expression above, $\sigma_i^2$ takes one of the three values {0, 1, 4} in each iteration, so for 16-bit precision it would be required to store $3^5$ possible scale factors in ROM. However, the size of the ROM can be cut down to (27 × 16) using the Taylor series expansion of the scale factor. For i < n/8 + 1 with 16-bit precision, the scale factors corresponding to the first 3 iterations are obtained from the (27 × 16) ROM. The 27 possible values of the scale factor $K^{-1}$ corresponding to the first three iterations are computed as

$$k_i^{-1} = \frac{1}{\sqrt{1 + \sigma_i^2\, 4^{-2i}}} \tag{6}$$

$$K^{-1} = \prod_{i=0}^{n/4} k_i^{-1} \tag{7}$$

where $k_i^{-1}$ is the scale factor for the i-th iteration. Using these values, the scale factors for the remaining two iterations are evaluated using combinational logic as shown in Fig. 7.
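The ROM sizes quoted above follow from enumerating the possible values of σ_i² per iteration. The following sketch enumerates them in floating point; the names `k_inv` and `scale_factor_table` are hypothetical, and indexing the table by the tuple of σ_i² values is an illustrative choice rather than the addressing used in the design.

```python
import itertools
import math

SIGMA_SQUARED = (0, 1, 4)        # squares of the digits -2, -1, 0, 1, 2


def k_inv(sigma_sq, i):
    """Per-iteration scale factor 1 / sqrt(1 + sigma_i^2 * 4^(-2i))."""
    return 1.0 / math.sqrt(1.0 + sigma_sq * 4.0 ** (-2 * i))


def scale_factor_table(n_iter):
    """Map every (sigma_0^2, ..., sigma_{n_iter-1}^2) tuple to its K^-1."""
    table = {}
    for combo in itertools.product(SIGMA_SQUARED, repeat=n_iter):
        table[combo] = math.prod(k_inv(s, i) for i, s in enumerate(combo))
    return table


if __name__ == "__main__":
    print(len(scale_factor_table(5)))    # entries needed for five iterations
    print(len(scale_factor_table(3)))    # entries needed for three iterations
    # Deviation from unity at iterations 4 and 5 versus one 16-bit LSB.
    print(1.0 - k_inv(4, 4), 1.0 - k_inv(4, 5), 2.0 ** -16)
```

The last line illustrates why five iterations suffice at 16-bit precision: for σ_i² = 4 the factor at i = 4 still differs from 1 by more than 2^-16, while at i = 5 the difference is below it.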
The scale factors $k_3^{-1}$ and $k_4^{-1}$ (the remaining two iterations) are derived by executing shift and subtraction operations on the scale factor retrieved from the ROM. This is carried out by realizing the first two terms of their Taylor series expansions. The scale factor is then coded using the canonical signed digit representation to carry out the final multiplication over the x/y coordinates.

IV. Results & Discussion:

The algorithmic verification of the CORDIC algorithm is done with the MATLAB tool (version 7.12, R2011a). The new z-path eliminated memory-less CORDIC architecture has been implemented on the available xc4vfx12-12 sf363 Xilinx FPGA (ISE 13.4). The proposed architecture is described in Verilog. The computation of this new architecture requires n/2 clock cycles with 32-bit output precision. The latency of the system is n/2 clock cycles. The throughput rate is one valid result per eight clock cycles. The total power consumption of the implemented architecture is measured with the XPower software (ISE 13.4). The implemented design operates at a clock rate of 450.654 MHz with a power consumption of 175.90 mW. The resource utilization of the FPGA implementation of the new memory reduced, z-path eliminated radix-4 CORDIC processor is shown below, and its performance comparison is shown in Table 2.
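The shift-and-subtract step for the last two scale factors can be illustrated with the first two terms of the Taylor series, 1/sqrt(1 + u) ≈ 1 - u/2 with u = σ_i² 4^(-2i), so that multiplying by k_i^-1 becomes one right shift and one subtraction. The sketch below is a fixed-point illustration under assumptions: 16 fractional bits, the hypothetical names `apply_taylor_stage` and `exact_stage`, and truncating shifts. It is not the combinational logic of the implemented design.

```python
import math

FRAC_BITS = 16                   # assumed fixed-point format: 16 fractional bits


def apply_taylor_stage(k_fixed, sigma, i):
    """Multiply k_fixed by (1 - sigma^2 * 2^-(4i+1)) with a shift and a subtract."""
    sigma_sq = sigma * sigma     # 0, 1 or 4
    if sigma_sq == 0:
        return k_fixed           # k_i^-1 is exactly 1
    shift = 4 * i + 1            # u/2 = sigma^2 * 2^-(4i+1)
    if sigma_sq == 4:
        shift -= 2               # the factor 4 is folded into the shift amount
    return k_fixed - (k_fixed >> shift)


def exact_stage(k, sigma, i):
    """Floating-point reference: k / sqrt(1 + sigma^2 * 4^(-2i))."""
    return k / math.sqrt(1.0 + sigma * sigma * 4.0 ** (-2 * i))


if __name__ == "__main__":
    k = 0.8
    k_fixed = round(k * (1 << FRAC_BITS))
    for i in (3, 4):
        for sigma in (0, 1, 2):
            approx = apply_taylor_stage(k_fixed, sigma, i) / (1 << FRAC_BITS)
            print(i, sigma, approx, exact_stage(k, sigma, i))
```

For i = 3 and i = 4 the neglected third Taylor term is far below one 16-bit LSB, so the two-term form loses essentially nothing at this word length.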

Fig. 5. Realization of direction of rotation for an angle using σ prediction (ROM holding θ_ref, δ_k and δ_{k-1}; comparator of θ against θ_ref; MUX selecting δ; addition of 0.5c_1 and 0.5θ; σ buffer).

Selected Device: xc4vfx12sf363-12
Number of Slices: 50 out of 5472 (9%)
Number of Slice Flip Flops: 936 out of 10944 (8%)
Number of 4 input LUTs: 899 out of 10944 (8%)
    Number used as logic: 494
    Number used as Shift registers: 405
Number of IOs: 65
Number of bonded IOBs: 63 out of 240 (26%)
Number of GCLKs: 1 out of 32 (3%)

Table 2. Performance comparison with [17] and [18]

Parameter        This work         [17]               [18]
Target device    xc4vfx12-sf363    XCV1000-6 BG560    xc3s1500-4fg676
Slices           50                797                984
LUTs             899               1554               1907
Power            175.90 mW         380 mW             Not cited

V. Conclusion:

A new memory-less, z-path eliminated radix-4 CORDIC architecture is presented in this paper. The X and Y paths are pipelined and computed in parallel, and the angle generator eliminates the need for storing the twiddle factor angles in the FFT operation. A 16-bit radix-4 memory-less, z-path eliminated CORDIC architecture is implemented on an FPGA platform (Virtex-4). The latency of the proposed architecture is n/2 clock cycles, with a throughput rate of one valid result per eight clock cycles. The entire proposed architecture operates at a clock rate of 450.654 MHz with a power consumption of 175.90 mW. The designed speed and power optimized 16-bit memory-less, z-path eliminated radix-4 CORDIC is applicable to FFT operation.

References:
[1] Sathish Sharma, "Implementation of ParaCORDIC Algorithm and its Applications in Satellite Communication", 2009 International Conference on Advances in Recent Technologies in Communication and Computing.
[2] Meng Qian, "Applications of CORDIC Algorithm to Neural Networks in VLSI Design", IMACS 2006, Beijing, China.
[3] Javier Valls, "The use of CORDIC in Software Defined Radios: A Tutorial", IEEE Communications Magazine, Sep 2006.
[4] Ayan Banerjee, Swapna Banerjee, "FPGA realization of a CORDIC based FFT processor for biomedical signal processing", Microprocessors and Microsystems, Feb 2011.
[5] Bu-Chin Wang, "Digital Signal Processing Techniques and Applications in Radar Image Processing".
[6] Bum Sik Kim, "Low power pipelined FFT architecture for Synthetic Aperture Radar", IEEE 39th Midwest Symposium on Circuits & Systems, 1996.
[7] Sadat A, "FFT for high speed OFDM wireless multimedia system", IEEE Circuits & Systems 2001, MWSCAS 2001.
[8] Volder, J. (1959). "The CORDIC trigonometric computing technique", IRE Transactions on Electronic Computers.
[9] B. Lakshmi, A.S. Dhar, "VLSI architecture for low latency radix-4 CORDIC", Elsevier Computers and Electrical Engineering, July 2011.
[10] Francisco J. Jaime, "Enhanced scaling-free CORDIC", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 7, July 2010.
[11] Erdal Oruklu, Xin Xiao, Jafar Saniie, "Reduced memory low power architecture for CORDIC based FFT processors", J Sign Process Syst (2012).
[12] Lin, C., & Wu, A. (2005). "Mixed-scaling rotation CORDIC (MSR-CORDIC) algorithm and architecture for high-performance vector rotational DSP applications". IEEE Transactions on Circuits and Systems I, 52(11).
[13] Jiang, R. M. (2007). "An area-efficient FFT architecture for OFDM digital video broadcasting". IEEE Transactions on Consumer Electronics, 53(4), 1322-1326.
[14] Garrido, M., & Grajal, J. (2007). "Efficient memory-less CORDIC for FFT computation". In IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 113-116, Apr.
[15] Xiao, X., Oruklu, E., & Saniie, J. (2009). "Fast memory addressing scheme for radix-4 FFT implementation". In IEEE International Conference on Electro/Information Technology, EIT 2009.
[16] Xiao, X., Oruklu, E., & Saniie, J. (2010). "Reduced Memory Architecture for CORDIC-based FFT". In IEEE International Symposium on Circuits and Systems.
[17] Anindya Sundar Dhar, Swapna Banerjee, "Architectural design and FPGA implementation of radix-4 CORDIC processor", Elsevier, Microprocessors and Microsystems 34 (2010).
[18] M.G. Buddika Sumanasena, "A scale factor correction scheme for the CORDIC algorithm", IEEE Transactions on Computers 57 (8) (2008).
[19] Vachhani, L., Sridharan, K., Meher, P.K., "Efficient CORDIC Algorithms and Architectures for Low Area and High Throughput Implementation", IEEE Transactions on Circuits and Systems II: Express Briefs, Jan. 2009, Volume 56, Issue 1, Pages 61-65.
[20] Kuhlmann M, Parhi KK. "P-CORDIC: A precomputation based rotation CORDIC algorithm". EURASIP J Appl Signal Process 2002;9:936-43.