FPGA Design, Implementation and Analysis of Trigonometric Generators using Radix 4 CORDIC Algorithm

Similar documents
High Speed Radix 8 CORDIC Processor

Three Dimensional CORDIC with reduced iterations

FPGA IMPLEMENTATION OF CORDIC ALGORITHM ARCHITECTURE

Performance Analysis of CORDIC Architectures Targeted by FPGA Devices

FPGA Implementation of a High Speed Multistage Pipelined Adder Based CORDIC Structure for Large Operand Word Lengths

A CORDIC Algorithm with Improved Rotation Strategy for Embedded Applications

Modified CORDIC Architecture for Fixed Angle Rotation

Implementation of CORDIC Algorithms in FPGA

A Modified CORDIC Processor for Specific Angle Rotation based Applications

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

Realization of Hardware Architectures for Householder Transformation based QR Decomposition using Xilinx System Generator Block Sets

Fast evaluation of nonlinear functions using FPGAs

Implementation of Optimized CORDIC Designs

Efficient Double-Precision Cosine Generation

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Volume 3, Special Issue 3, March 2014

Xilinx Based Simulation of Line detection Using Hough Transform

Pipelined Quadratic Equation based Novel Multiplication Method for Cryptographic Applications

ECEn 621 Computer Arithmetic Project #3. CORDIC History

An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor

REVIEW OF CORDIC ARCHITECTURES

Implementation of Area Efficient Multiplexer Based Cordic

Fast Evaluation of the Square Root and Other Nonlinear Functions in FPGA

FPGA Implementation of the CORDIC Algorithm for Fingerprints Recognition Systems

Power and Area Optimization for Pipelined CORDIC Processor

Sine/Cosine using CORDIC Algorithm

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

VHDL Implementation of DIT-FFT using CORDIC

CORDIC Based DFT on FPGA for DSP Applications

Implementation Of Quadratic Rotation Decomposition Based Recursive Least Squares Algorithm

Design of FPGA Based Radix 4 FFT Processor using CORDIC

Realization of Fixed Angle Rotation for Co-Ordinate Rotation Digital Computer

Implementation techniques of CORDIC: a review

A CORDIC-BASED METHOD FOR IMPROVING DECIMAL CALCULATIONS

Speed Optimised CORDIC Based Fast Algorithm for DCT

Review Article CORDIC Architectures: A Survey

International Journal of Innovative and Emerging Research in Engineering. e-issn: p-issn:

Low Latency CORDIC Architecture in FFT

VLSI IMPLEMENTATION OF CORDIC BASED ROBOT NAVIGATION PROCESSOR

Implementation of IEEE754 Floating Point Multiplier

Research Article Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO

FPGA Implementation of CORDIC Based DHT for Image Processing Applications

Keywords: Soft Core Processor, Arithmetic and Logical Unit, Back End Implementation and Front End Implementation.

FPGA Implementation of Cordic Processor for Square Root Function

Vendor Agnostic, High Performance, Double Precision Floating Point Division for FPGAs

Implementation of a Unified DSP Coprocessor

ISSN Vol.02, Issue.11, December-2014, Pages:

VLSI Design and Implementation of High Speed and High Throughput DADDA Multiplier

Analysis of Radix- SDF Pipeline FFT Architecture in VLSI Using Chip Scope

Area-Time Efficient Square Architecture

A fast algorithm for coordinate rotation without using transcendental functions

High Performance Architecture for Reciprocal Function Evaluation on Virtex II FPGA

International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering

COMPARATIVE ANALYSIS OF CORDIC ALGORITHM AND TAYLOR SERIES EXPANSION

32-bit Signed and Unsigned Advanced Modified Booth Multiplication using Radix-4 Encoding Algorithm

Design & Analysis of 16 bit RISC Processor Using low Power Pipelining

Novel Design of Dual Core RISC Architecture Implementation

IMPLEMENTATION OF AN ADAPTIVE FIR FILTER USING HIGH SPEED DISTRIBUTED ARITHMETIC

FPGA Implementation of the Complex Division in Digital Predistortion Linearizer

Optimized Design and Implementation of a 16-bit Iterative Logarithmic Multiplier

A High-Speed FPGA Implementation of an RSD- Based ECC Processor

Efficient Implementation of Low Power 2-D DCT Architecture

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

Design and Implementation of Low-Complexity Redundant Multiplier Architecture for Finite Field

IMPLEMENTATION OF SCALING FREE CORDIC AND ITS APPLICATION

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

Introduction to Field Programmable Gate Arrays

GREENWOOD PUBLIC SCHOOL DISTRICT Algebra III Pacing Guide FIRST NINE WEEKS

Signal Processing Algorithms into Fixed Point FPGA Hardware Dennis Silage ECE Temple University

Computations Of Elementary Functions Based On Table Lookup And Interpolation

Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units

FPGA-Based Implementation of QR Decomposition. Hanguang Yu

Reconfigurable PLL for Digital System

Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays

Design and Implementation of Reconfigurable CORDIC in Rotation and vectoring-modes

ABSTRACT I. INTRODUCTION. 905 P a g e

Design of a Floating-Point Fused Add-Subtract Unit Using Verilog

Design, Analysis and Processing of Efficient RISC Processor

FPGA Implementation of Matrix Inversion Using QRD-RLS Algorithm

Implementation of Full -Parallelism AES Encryption and Decryption

Power Optimized Programmable Truncated Multiplier and Accumulator Using Reversible Adder

Implementation of Pipelined Architecture Based on the DCT and Quantization For JPEG Image Compression

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

International Journal for Research in Applied Science & Engineering Technology (IJRASET) IIR filter design using CSA for DSP applications

Three-D DWT of Efficient Architecture

[Sahu* et al., 5(7): July, 2016] ISSN: IC Value: 3.00 Impact Factor: 4.116

Implementation of Double Precision Floating Point Multiplier in VHDL

Implementation of Lifting-Based Two Dimensional Discrete Wavelet Transform on FPGA Using Pipeline Architecture

THE INTERNATIONAL JOURNAL OF SCIENCE & TECHNOLEDGE

Still Image Compression using Angular Domain: Analysis and FPGA implementation

A 16-BIT CORDIC ROTATOR FOR HIGH-SPEED WIRELESS LAN

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

VLSI Implementation of Parallel CRC Using Pipelining, Unfolding and Retiming

Reducing Computational Time using Radix-4 in 2 s Complement Rectangular Multipliers

Efficacy of Numerically Approximating Pi with an N-sided Polygon

DIGITAL ARITHMETIC. Miloš D. Ercegovac Computer Science Department University of California Los Angeles and

IMPLEMENTATION OF LOW-COMPLEXITY REDUNDANT MULTIPLIER ARCHITECTURE FOR FINITE FIELD

FPGA Implementation of FFT using CORDIC Processor for Fingerprint Application

Journal of Engineering Technology Volume 6, Special Issue on Technology Innovations and Applications Oct. 2017, PP

PROJECT REPORT IMPLEMENTATION OF LOGARITHM COMPUTATION DEVICE AS PART OF VLSI TOOLS COURSE

Transcription:

International Journal of Microcircuits and Electronic. ISSN 0974-2204 Volume 4, Number 1 (2013), pp. 1-9 International Research Publication House http://www.irphouse.com FPGA Design, Implementation and Analysis of Trigonometric Generators using Radix 4 CORDIC Algorithm J M Rudagi* and Dr shaila subbaraman* * js_itti@yahoo.co.in *Dept of Electronics and Communication, KLECET, Belgaum, Karnataka, India. ** Dept of Electronics and communication, Dange college of engineering, Maharashtra. Abstract Today, many of the computations in signal processing and wireless communication applications are linked with complex analysis of several functions. These complex functions are combination of sine and cosine terms that generally spread in the channel. Most of these functions can be split into elementary functions. Today user s desire every gadget must be smaller in size and simple to operate ( Keep it small and simple ). So the researchers muddle through speed area tradeoff. The CORDIC algorithm is one such kind of algorithm. In this paper hardware efficient trigonometric generator is designed and implemented on Spartan 3 FPGA. The results obtained are compared with that obtained using MATLAB. Keywords: CORDIC, FPGA, VLSI, Sine, Cosine, architecture, latency. Radix 4, pipelined 1. Introduction The key concept of CORDIC architecture is based on the simple and the ancient concept of the two dimensional geometry. CORDIC algorithm is originally proposed by Volder [1] in1959 and later generalized by Walther [2] in 1971 for computation of logarithms, exponentials and square root functions along with trigonometric functions like sine, cosine and tangent. It is an iterative algorithm for the calculation of the rotation of two dimensional vectors in linear, circular and hyperbolic coordinate systems. The current applications of CORDIC algorithm are in the field of DSP, image processing, filtering, matrix algebra etc. The algorithm belongs to the class of

2 J M Rudagi and Dr shaila subbaraman linear convergence algorithms and can likewise be implemented using only shift and add operations making it suitable for VLSI implementations. The number of iterations of conventional radix 2 CORDIC architectures limits it to be used in a high speed application. So the development of high radix CORDIC algorithms is essential for reducing the number of iterations Reduction of latency with reasonable increment in hardware complexity. The scale factor overhead causes its improvement to be limited. The minimization of the computational overhead of scale factor compensation been attempted by different researchers. [7-11]. In this paper a folded radix 4 CORDIC architecture is implemented. The total number of iterations is reduced by half (n/2). The scale factor compensation is carried out in parallel to the rotation. The architectures are implemented on available FPGA. The results are compared with MATLAB output. The work is structured as follows. Section 2 defines the radix 4 CORDIC algorithm along with its scale factor compensation. Section 3 describes folded architecture, based on the radix 4 CORDIC algorithm. Section 4, covers implementation details. Section5.Deals with comparative study. Concluding remarks are presented in section 6. 2. Radix 4 CORDIC algorithm: The radix 4 CORDIC equations are [4], X i+1 =X i -σ i 4 -i Y i Y i+1 =Y i +σ i 4 -i X i (1) Z i+1 =Z i -α i [σ i ] Where σ i ε {-2,-1, 0, 1, 2}, α i [σ i ] ε tan -1 (σ i r -i ), where r is the radix of the CORDIC. X i+1, Y i+ 1 are the coordinates the vector resulting from applying i+1 micro rotation and Z i+1 is the angle to be rotated. The coordinates are scaled by, n/2-1 K -1 =Π (1+σ i 2 4-2i ) -1/2 (2) i 0 The scale factor depends on σ i. For radix 4 algorithm this value ranges from K=1.0 to K=2.52[9].This scale factor must be evaluated for each rotation angle and compensated. Let us define a variable w i as, w i =4 i Z i (3) A i [σ i ]=4 i tan-1(σ i * 4 -i ) Iteration Z can be expressed as, W i+1 =4(w i - A i [σ i ]) (4)

FPGA Design, Implementation and Analysis of Trigonometric Generators 3 The Eq. (4) is required to prove that the variable Z is bounded in each of the iterations.the selection interval for different values of iteration of eq. (1) can be obtained [4], For i=0, +2 if 5/8 Ŵ 0 +1 if 3/8 Ŵ 0 <5/8 σ 0 0 if -1/2 Ŵ 0 <3/8 (5) For i>0, -1 if -7/8 Ŵ 0 <-1/2-2 if Ŵ 0 <-7/8 +2 if Ŵ i 3/2 +1 if 1/2 Ŵ i <3/2 σ i 0 if -1/2 Ŵ i <1/2 (6) -1 if -3/2 Ŵ i <-1/2-2 if Ŵ i <-3/2 Here Ŵ i is the estimate of w i with three most significant bits. The selection function is true for both carry save redundant arithmetic and non redundant arithmetic [14]. 2.1 Scale factor The final coordinates obtained from the application of the micro rotations (1) do not match the result of rotation. The final coordinate are scaled by a scale factor (2). This factor is not constant as it depends on the sequence of σ i s and has to be calculated for each rotation angle. To provide compensation we have to calculate scale factor for each angle and perform direct multiplication to obtain the X and Y coordinates of the rotated vector. The scale factor can be calculated for n/4+1 micro rotation as for rest of the micro rotations it can be taken as one. The scale factors can be stored in a table. The size of the table is (3 n/4+1 *n). The Taylor series expansion of (2), K -1 =1-1/2* σ i 2 * 4-2i + 3/8* σ i 4 * 4-4i + (7) The K -1 can be approximated by the first two terms for i [n/8+1] or three terms if i [n/12+1]. Hence we can store only scale factor generated in the first [n/12+1] micro rotations, since the scale factor generated by micro rotations with i [n/12+1] can be calculated by means of addition and shift operations. The size of the table can be reduced to (3 n/12+1 * n).

4 J M Rudagi and Dr shaila subbaraman 3. Folded Architecture: [4] The architecture is designed for 32 precision. 3.1Unscaled CORDIC architecture This architecture consists of a preprocessing unit for carrying out π/2 rotations and three processing paths for X, Y, and W coordinates. It requires sixteen iterations to complete its operation. Every clock cycle produces intermediate values of X and Y. Fig 1: Architecture of un scaled folded 32 bit CORDIC processor 3.2 K -1 Calculation The scale factor is calculated in parallel with X, Y and W coordinates. It requires nine cycles (i.e. n/4+1).

FPGA Design, Implementation and Analysis of Trigonometric Generators 5 Fig 2: K -1 Computation 3.3 Scaled 32bit CORDIC architecture Two 32bit multipliers are used to provide compensated values of X and Y. Fig 3: Architecture of scaled folded 32 bit CORDIC processor 4. Implementation The CORDIC algorithm is verified using MATLAB tool (version7.10.0.499, R2010a ). The architecture is implemented on the available XC3S400-4PQ208 of the XILINX FPGA (ISE 10.1i). The proposed architecture has been modeled in VERILOG. The combinational path delay of folded 32bit architecture is 33.306ns. The latency of the folded 32bit architecture is n/2(i.e. 16) clock cycles. The

6 J M Rudagi and Dr shaila subbaraman throughput rate is one valid result per sixteen clock cycles. The folded 32bit architecture operates at 55.906MHz of the clock rate. The resource utilization of FPGA utilization for the architecture is given in Table 1. Table 1. Resource utilization by the folded 32bit CORDIC architecture on the target device XC3S400-4PQ208. The CORDIC employed uses circular co-ordinate system and is operated in rotating mode. Hence, only phase is given as input and x, y values are given internally in the program or hardware. The code written in verilog is simulated using modelsim XE III 6.3c. Fig.4 Top Level RTL Schematic

FPGA Design, Implementation and Analysis of Trigonometric Generators 7 Fig.5. RTL Schematic of scaled folded 32 bit CORDIC architecture Fig 6. Simulation result Fig 7. FPGA Implementation.

8 J M Rudagi and Dr shaila subbaraman 6 Comparative study The simulation results obtained are compared with that of MATLAB outputs. It is found that the percentage error is within 0.21%. Fig.8. Comparison of MATLAB results with simulated results Fig.9. Angle vs. % Error 6. Conclusion The folded CODIC architecture has been implemented on the available XILINX FPGA. The folded architecture operates at a frequency of 55.906MHz. Latency of the architecture is n/2 (i.e.16) clock cycles. The sine and cosine values generated by the architecture are comparatively accurate. The Trigonometric generators with optimized speed area can be used for real time applications.

FPGA Design, Implementation and Analysis of Trigonometric Generators 9 References [1] J.E. Volder, The cordic trigonometric computing technique, IRE Trans Electronic Computers 8 (1959) 330 334. [2] J.S. Walther, A unified algorithm for elementary functions, in: Proceedings of Spring. Joint Computer Conference, 1971, pp. 379 385. [3] E. Antelo et al., J. Villaba,J.D brugura,e L Zapata High performance rotation architectures based on radix-4 CORDIC algorithm, IEEE Transactions Computers 46 (8) (1997) 855 870. [4] Kaushik Bhattacharyya, Rakesh Biswas, Anindya Sundar Dhar, Swapna Banerjee, Architectural design and FPGA implementation of radix-4 CORDIC processor, Microprocessors and Microsystems 34 (2010) 96 101 [5] K.Hwang, Computer Arithmetic Principles, Architecture and Design, Wiley, NewYork, 1979 [6] Takafumi et al., High Radix CORDIC algorithm for VLSI signal processing, in IEEE Workshop on Signal Processing Systems (SIPS), November 1997, pp. 183 192 [7] M.G. Buddika Sumanasena, A scale factor correction scheme for the CORDIC algorithm, IEEE Transactions on Computers 57 (8) (2008) 1148 1152. [8] Villaba.J Lang et al., CORDIC architectures for the parallel compensation of scale factor, in: Poc, International conference on Application Specific Array Processors, ASAP 95, July 95, pp 258-269. [9] H.X.Lin, H.J.Sips, Online CORDIC algorithms, IEEE Transactions on Computers 39(8) (1990) 258-269.

10 J M Rudagi and Dr shaila subbaraman