Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

Similar documents
FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths

Implementation of Area Efficient Multiplexer Based Cordic

High Speed Radix 8 CORDIC Processor

A Modified CORDIC Processor for Specific Angle Rotation based Applications

FPGA Design, Implementation and Analysis of Trigonometric Generators using Radix 4 CORDIC Algorithm

Realization of Fixed Angle Rotation for Co-Ordinate Rotation Digital Computer

REVIEW OF CORDIC ARCHITECTURES

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Three Dimensional CORDIC with reduced iterations

VHDL Implementation of DIT-FFT using CORDIC

FPGA IMPLEMENTATION OF CORDIC ALGORITHM ARCHITECTURE

Sine/Cosine using CORDIC Algorithm

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

Modified CORDIC Architecture for Fixed Angle Rotation

A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications

Implementation of CORDIC Algorithms in FPGA

Implementation of Optimized CORDIC Designs

Efficient Double-Precision Cosine Generation

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

Designing and Characterization of koggestone, Sparse Kogge stone, Spanning tree and Brentkung Adders

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

Design of FPGA Based Radix 4 FFT Processor using CORDIC

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October ISSN

Comparison of Branching CORDIC Implementations

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

Speed Optimised CORDIC Based Fast Algorithm for DCT

Review Article CORDIC Architectures: A Survey

Batchu Jeevanarani and Thota Sreenivas Department of ECE, Sri Vasavi Engg College, Tadepalligudem, West Godavari (DT), Andhra Pradesh, India

FPGA Implementation of CORDIC Based DHT for Image Processing Applications

Introduction to Field Programmable Gate Arrays

Xilinx Based Simulation of Line detection Using Hough Transform

An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

International Journal of Innovative and Emerging Research in Engineering. e-issn: p-issn:

HIGH PERFORMANCE QUATERNARY ARITHMETIC LOGIC UNIT ON PROGRAMMABLE LOGIC DEVICE

Research Article Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO

FPGA Implementation of the CORDIC Algorithm for Fingerprints Recognition Systems

Implementation techniques of CORDIC: a review

Evaluation of High Speed Hardware Multipliers - Fixed Point and Floating point

Power and Area Optimization for Pipelined CORDIC Processor

University, Patiala, Punjab, India 1 2

16 BIT IMPLEMENTATION OF ASYNCHRONOUS TWOS COMPLEMENT ARRAY MULTIPLIER USING MODIFIED BAUGH-WOOLEY ALGORITHM AND ARCHITECTURE.

HIGH-PERFORMANCE RECONFIGURABLE FIR FILTER USING PIPELINE TECHNIQUE

Prachi Sharma 1, Rama Laxmi 2, Arun Kumar Mishra 3 1 Student, 2,3 Assistant Professor, EC Department, Bhabha College of Engineering

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

DUE to the high computational complexity and real-time

DESIGN AND PERFORMANCE ANALYSIS OF CARRY SELECT ADDER

A High-Speed FPGA Implementation of an RSD- Based ECC Processor

FPGA Implementation of Cordic Processor for Square Root Function

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

A CORDIC-BASED METHOD FOR IMPROVING DECIMAL CALCULATIONS

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

Parallel FIR Filters. Chapter 5

IMPLEMENTATION OF SCALING FREE CORDIC AND ITS APPLICATION

Design and Implementation of VLSI 8 Bit Systolic Array Multiplier

Design and Analysis of Kogge-Stone and Han-Carlson Adders in 130nm CMOS Technology

Hardware Design Environments. Dr. Mahdi Abbasi Computer Engineering Department Bu-Ali Sina University

Efficient Implementation of Low Power 2-D DCT Architecture

Baseline V IRAM Trimedia. Cycles ( x 1000 ) N

Fast evaluation of nonlinear functions using FPGAs

Realization of Hardware Architectures for Householder Transformation based QR Decomposition using Xilinx System Generator Block Sets

PERFORMANCE ANALYSIS OF HIGH EFFICIENCY LOW DENSITY PARITY-CHECK CODE DECODER FOR LOW POWER APPLICATIONS

Let s put together a Manual Processor

Implementation of Ripple Carry and Carry Skip Adders with Speed and Area Efficient

An Optimized Montgomery Modular Multiplication Algorithm for Cryptography

ISSN Vol.02, Issue.11, December-2014, Pages:

Keywords: Fast Fourier Transforms (FFT), Multipath Delay Commutator (MDC), Pipelined Architecture, Radix-2 k, VLSI.

DESIGN AND IMPLEMENTATION OF VLSI SYSTOLIC ARRAY MULTIPLIER FOR DSP APPLICATIONS

An Efficient Carry Select Adder with Less Delay and Reduced Area Application

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm

CORDIC Based DFT on FPGA for DSP Applications

Adaptive FIR Filter Using Distributed Airthmetic for Area Efficient Design

OPTIMIZATION OF AREA COMPLEXITY AND DELAY USING PRE-ENCODED NR4SD MULTIPLIER.

Low Latency CORDIC Architecture in FFT

Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs

Computations Of Elementary Functions Based On Table Lookup And Interpolation

Design and Implementation of CVNS Based Low Power 64-Bit Adder

Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units

On the Design of High Speed Parallel CRC Circuits using DSP Algorithams

FPGA IMPLEMENTATION FOR REAL TIME SOBEL EDGE DETECTOR BLOCK USING 3-LINE BUFFERS

VLSI DESIGN OF REDUCED INSTRUCTION SET COMPUTER PROCESSOR CORE USING VHDL

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

32 bit Arithmetic Logical Unit (ALU) using VHDL

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

Reduction of Latency and Resource Usage in Bit-Level Pipelined Data Paths for FPGAs

Design of 2-D DWT VLSI Architecture for Image Processing

A 3-D CPU-FPGA-DRAM Hybrid Architecture for Low-Power Computation

FIR Filter Architecture for Fixed and Reconfigurable Applications

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

FPGA Implementation of ALU Based Address Generation for Memory

Run-Time Reconfigurable multi-precision floating point multiplier design based on pipelining technique using Karatsuba-Urdhva algorithms

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

RUN-TIME RECONFIGURABLE IMPLEMENTATION OF DSP ALGORITHMS USING DISTRIBUTED ARITHMETIC. Zoltan Baruch

INTEGER SEQUENCE WINDOW BASED RECONFIGURABLE FIR FILTERS.

Hardware Implementation of Cryptosystem by AES Algorithm Using FPGA

Implementation of a Unified DSP Coprocessor

Transcription:

International OPEN ACCESS Journal Of Modern Engineering Research (IJMER) Performance Analysis of CORDIC Architectures Targeted by FPGA Devices Guddeti Nagarjuna Reddy 1, R.Jayalakshmi 2, Dr.K.Umapathy 3 M.Tech (EDT), SCSVMV UNIVERSITY, Kanchipuram, India 1 Asst.prof, SCSVMV UNIVERSITY, Kanchipuram, India 2 Assoc.prof, SCSVMV UNIVERSITY, Kanchipuram, India 3 ABSTRACT:- A wide variety of elementary functions are evaluated by CORDIC algorithm. This paper provides the performance analysis of folded and unfolded CORDIC architectures at different parameters. It also provides the latency comparison for the two structures. Same synthesis description is used for all the structures. For word lengths of 16 and 32 bits it gives the maximum operating frequency comparison of folded, unfolded and pipelined structures. Index Terms:- CORDIC algorithm, folded architecture, unfolded architecture. I. INTRODUCTION The CORDIC means coordinate rotation digital computer. It was proposed by Volder in 1959. CORDIC is an efficient iterative algorithm which can be used to compute several elementary functions. Another scientist named Walther extended it to encompass circular, linear and hyperbolic coordinate systems. In a variety of applications the algorithm is commonly used to perform rotations and also to evaluate elementary functions. Long latency problem occurs. Based on the hardware realization of the three iterative equations CORDIC architectures are broadly classified. They are folded and unfolded architectures. In time domain folded architectures are multiplexed, so that in a single processing architecture it provides a means for trading area. By using a word serial design the entire CORDIC core is implemented in folded architectures. II. CORDIC ALGORITHM By using only shift and add operations the CORDIC algorithm provides an iterative method of performing vector rotations by arbitrary angles. From general rotation transform the algorithm is derived. xn = Bn.( x1 cos z1) yn = Bn.(x1 sin z1) In two modes CORDIC rotator is normally operated. In the rotation mode, with the desired rotation angle, the angle accumulator is initialized. The magnitude of the residual angle in the accumulator is made to diminish by rotation decision at each iteration. 2.1 FOLDED WORD SERIAL DESIGN A iterative bit-parallel design, also called folded word design serial design. In hardware by simply duplicating each of the three different equations is obtained. Fig. a folded word serial CORDIC IJMER ISSN: 2249 6645 www.ijmer.com Vol. 5 Iss. 12 December 2015 57

After each iteration being a shift- add algorithm, each individual unit consists of an adder or subtractor unit, the computed values are holding by a register and shifter. To start with, via a multiplexer the initial are fed into each branch. The operation of the adder-subtraction unit is determined by the value of z branch. Through the shifter unit signals in the x and y branch are then added to or subtracted from the unshifted signal in the opposite path. According to the number of iteration, the z branch arithmetically combines the register values from looking up table, table whose address with values are taken is changed. Hence the result of this operation determines the nature of operation for the next iteration. Directly the results are read from the adder or subtraction units after n iterations. By tracking of shifting distances and the ROM addresses a finite state machine is used. In high speed applications, the conventional approach of implementing adder or subtractor units and shifters on time basis at every path are shared. Another disadvantage is with respect to the shift operations. With the number of iteration the shifters have to change the shift distance when implemented in hardware. A high fan in is required for large number of iteration and reduce the maximum speed for the application. Into the FPGA architectures these shifters do not map well and requires several layers of logic if implemented, results a slow design that uses large number of logic cells. In addition where n is the number of iterations the output rate is also limited by the fact that the operation is performed iteratively and therefore the clock state rate equals 1/n times the maximum output rate. 2.2 UNFOLDED PARALLEL DESIGN As per the discussion of above demands that the processor has to perform iterations at n times the data rate, this is the iterative nature of the CORDIC processor. Each of n processing elements always performs the same iteration for unfolded iteration process. To design parallel processing architectures from serial processing architectures, a direct application of the unfolding transformation is used. The means of word-parallel architectures can be designed from word serial architectures at the word level. Fig. Unfolded CORDIC design The CORDIC processor results in two significant changes in the unfolded architecture. First, it has to perform a constant shifting operation in each stage that is the shifter in each unit is of fixed shift. As in the iterative structure the shifter needs not be updated, and makes the FPGAs implementation quite feasible. Second, during iteration from the process the unfolding process eliminates the use of ROM which was required to hold the constant angle values. Instead of requiring storage space those constants can be hardwired, reducing to an array of interconnected adder or subtraction units of entire CORDIC processor. Making the unrolled processor strictly combinational, the need for registers is also eliminated. The processor can be easily pipelining by insertion of registers between the adder-subtraction units is another advantage of the unrolled design. In each logic cell, there are already registers are present in most FPGA architectures, so the addition of the pipelining registers has not addition hardware cost. IJMER ISSN: 2249 6645 www.ijmer.com Vol. 5 Iss. 12 December 2015 58

III. IMPLEMENTATION AND RESULTS Methodology The CORDIC processor is implemented in seven stages and for a word length of 16 and 32 bits. The initial design entry of both architectures is done using VHDL. The design translation is carried out in Xilinx ISE 12.4. The simulator database is then analyzed for both designs at different performance parameters and logical conclusions are drawn. Implementing the core with the following synnthesis: Platform: Field Programmable Gate Array Family: Virtex5 Target device: XC5VLX30 Package: FF324 3.1 ANALYSIS AND RESULTS For different performance parameters folded and unfolded architectures are analyzed, and provides latency comparison. Same synthesis description is implemented for all structures. It gives the maximum operating frequency comparison of the folded, unfolded and pipelined structures for word lengths of 16 and 32 bits. IJMER ISSN: 2249 6645 www.ijmer.com Vol. 5 Iss. 12 December 2015 59

It is observed that when timing response of the CORDIC structures is concerned, the unfolded architecture has less worst-case delay compared to the folded structure. This is due to the unfolding process which eliminates the use of registers and therefore the corresponding setting-up and hold times. The overall latency is thus reduced by a factor proportional to those setting-up and holding times. Note, however that the maximum operating frequency and thus the throughput of the unfolded CORDIC design is determined by the worst case delay of the structure. This is because the structure is pure combinatorially. Contrast to this, the folded structure can be clocked at high frequencies resulting in large operating frequencies. However, pipelining the unfolded CORDIC makes it possible to process multiple inputs simultaneously, there by increasing the maximum operating frequency of the unfolded structures. At an N stage CORDIC core, N stage pipeline can give maximum results. At first output of an N-stage pipelined CORDIC core is obtained after N clock cycles. Therefore, outputs will be generated after each every clock cycle. Further analysis of CORDIC is carried out by comparing the power consumption for 32 bit word length. It gives the power consumption for the three structures. Folded structures have less power dissipation compared to the parallel and pipelined structures. The power consumed by logical components in case of folded structures is quite low. This is due to the fact that the folded structure uses the same components repetitively. Similarly due to the multiple input/output instantiations in unfolded structures the power consumed by the input and output resources is quite high resulting in high dynamic power dissipation in the parallel and pipelined designs. Finally the three designs are analyzed for area consumption in terms of resource utilization and the results are tabulated. As expected, the folded structure is an efficient user of logic since the same logical units are used for every iteration. But since the results need to be fed back after for every iteration a large number of registers are used in the folded word serial implementation. IJMER ISSN: 2249 6645 www.ijmer.com Vol. 5 Iss. 12 December 2015 60

IV. CONCLUSIONS In this paper we proposed a simple operation by using VHDL coding in FPGA platform of Virtex5 family. From this we concluded analyzed the area, power consumption, layout comparison and throughout comparison for folded, unfolded and folded pipelined CORDIC architectures. REFERENCES [1]. J.E VOLDER, The CORDIC Trignometric Computing Technique IRE Tran Electronic Computers vol EC-8 PP 330-334,1959 [2]. J.S. Walther, A Unified Algorithm for Elementary Functions, proc., Spring Joint Computer Conf., Volume 38.,pp., 379-385, 1971. [3]. S. Wang, V. Piuri, and E.E. Swartzlander, Jr, Hybrid CORDIC Algorithms, IEEE Transaction Computers, Volume 46, Number 11., pp., 1202, November 1997. [4]. D.S. Phatak, Double Step Branching CORDIC : A New Algorithm for Fast Sine and Cosine Generation, IEEE Transaction Computers, Volume 47 Number 5., pp., 587-602, May 1998. IJMER ISSN: 2249 6645 www.ijmer.com Vol. 5 Iss. 12 December 2015 61