An efficient multiplierless approximation of the fast Fourier transform using sum-of-powers-of-two (SOPOT) coefficients

Similar documents
The theory and design of a class of perfect reconstruction modified DFT filter banks with IIR filters

DUE to the high computational complexity and real-time

The Serial Commutator FFT

Abstract. Literature Survey. Introduction. A.Radix-2/8 FFT algorithm for length qx2 m DFTs

Novel design of multiplier-less FFT processors

AN FFT PROCESSOR BASED ON 16-POINT MODULE

RECENTLY, researches on gigabit wireless personal area

THE orthogonal frequency-division multiplex (OFDM)

TOPICS PIPELINE IMPLEMENTATIONS OF THE FAST FOURIER TRANSFORM (FFT) DISCRETE FOURIER TRANSFORM (DFT) INVERSE DFT (IDFT) Consulted work:

Twiddle Factor Transformation for Pipelined FFT Processing

Radix-4 FFT Algorithms *

Linköping University Post Print. Analysis of Twiddle Factor Memory Complexity of Radix-2^i Pipelined FFTs

ENT 315 Medical Signal Processing CHAPTER 3 FAST FOURIER TRANSFORM. Dr. Lim Chee Chin

Parallel-computing approach for FFT implementation on digital signal processor (DSP)

M.N.MURTY Department of Physics, National Institute of Science and Technology, Palur Hills, Berhampur , Odisha (INDIA).

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank

FAST FOURIER TRANSFORM (FFT) and inverse fast

Research Article International Journal of Emerging Research in Management &Technology ISSN: (Volume-6, Issue-8) Abstract:

Convex combination of adaptive filters for a variable tap-length LMS algorithm

Streamlined real-factor FFTs

Efficient Radix-4 and Radix-8 Butterfly Elements

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

MULTIPLIERLESS HIGH PERFORMANCE FFT COMPUTATION

High-Performance 16-Point Complex FFT Features 1 Functional Description 2 Theory of Operation

MANY image and video compression standards such as

LOW-POWER SPLIT-RADIX FFT PROCESSORS

FPGA Implementation of Discrete Fourier Transform Using CORDIC Algorithm

A VLSI Array Architecture for Realization of DFT, DHT, DCT and DST

LOSSLESS AUDIO COMPRESSION USING INTEGER MODIFIED DISCRETE COSINE TRANSFORM. Yoshikazu Yokotani and Soontorn Oraintara

A fuzzy k-modes algorithm for clustering categorical data. Citation IEEE Transactions on Fuzzy Systems, 1999, v. 7 n. 4, p.

Modified Welch Power Spectral Density Computation with Fast Fourier Transform

Fixed Point Streaming Fft Processor For Ofdm

Fatima Michael College of Engineering & Technology

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems

Anu Kalidas Muralidharan Pillai and Håkan Johansson

New feature preserving noise removal algorithm based on thediscrete cosine transform and the a prior knowledge of pixel type.

Research Article Regressive Structures for Computation of DST-II and Its Inverse

HIGH SPEED REALISATION OF DIGITAL FILTERS

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

Reversible Wavelets for Embedded Image Compression. Sri Rama Prasanna Pavani Electrical and Computer Engineering, CU Boulder

FIR Filter Architecture for Fixed and Reconfigurable Applications

Decimation-in-Frequency (DIF) Radix-2 FFT *

Fast Orthogonal Neural Networks

Fused Floating Point Arithmetic Unit for Radix 2 FFT Implementation

A Novel Statistical Distortion Model Based on Mixed Laplacian and Uniform Distribution of Mpeg-4 FGS

FPGA Based Design and Simulation of 32- Point FFT Through Radix-2 DIT Algorith

FFT. There are many ways to decompose an FFT [Rabiner and Gold] The simplest ones are radix-2 Computation made up of radix-2 butterflies X = A + BW

An Ultra Low-Power WOLA Filterbank Implementation in Deep Submicron Technology

A Novel Distributed Arithmetic Multiplierless Approach for Computing Complex Inner Products

REAL TIME DIGITAL SIGNAL PROCESSING

Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

FAST Fourier transform (FFT) is an important signal processing

A SIMULINK-TO-FPGA MULTI-RATE HIERARCHICAL FIR FILTER DESIGN

Image Compression System on an FPGA

HYBRID TRANSFORMATION TECHNIQUE FOR IMAGE COMPRESSION

Speed Optimised CORDIC Based Fast Algorithm for DCT

Area And Power Efficient LMS Adaptive Filter With Low Adaptation Delay

Architectures for Dynamic Data Scaling in 2/4/8K Pipeline FFT Cores

This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

IJSRD - International Journal for Scientific Research & Development Vol. 4, Issue 05, 2016 ISSN (online):

A Ripple Carry Adder based Low Power Architecture of LMS Adaptive Filter

Design Optimization Techniques Evaluation for High Performance Parallel FIR Filters in FPGA

Efficient FFT Algorithm and Programming Tricks

MURDOCH RESEARCH REPOSITORY

A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment

DESIGN OF PARALLEL PIPELINED FEED FORWARD ARCHITECTURE FOR ZERO FREQUENCY & MINIMUM COMPUTATION (ZMC) ALGORITHM OF FFT

An Enhanced Mixed-Scaling-Rotation CORDIC algorithm with Weighted Amplifying Factor

Orthogonal Wavelet Coefficient Precision and Fixed Point Representation

DESIGN METHODOLOGY. 5.1 General

Simultaneous Optimization of Delay and Number of Operations in Multiplierless Implementation of Linear Systems

OVERSAMPLED filter banks have their applications in

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

PERFORMANCE ANALYSIS OF INTEGER DCT OF DIFFERENT BLOCK SIZES USED IN H.264, AVS CHINA AND WMV9.

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

Design of Navel Adaptive TDBLMS-based Wiener Parallel to TDBLMS Algorithm for Image Noise Cancellation

SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS

A unified architecture of MD5 and RIPEMD-160 hash algorithms

Fault Tolerant Parallel Filters Based on ECC Codes

AREA-DELAY EFFICIENT FFT ARCHITECTURE USING PARALLEL PROCESSING AND NEW MEMORY SHARING TECHNIQUE

A Parallel Reconfigurable Architecture for DCT of Lengths N=32/16/8

Packed Integer Wavelet Transform Constructed by Lifting Scheme

A 4096-Point Radix-4 Memory-Based FFT Using DSP Slices

Multiplierless Unity-Gain SDF FFTs

Efficient Methods for FFT calculations Using Memory Reduction Techniques.

Digital Signal Processing Lecture Notes 22 November 2010

Computing the Discrete Fourier Transform on FPGA Based Systolic Arrays

Fixed Point LMS Adaptive Filter with Low Adaptation Delay

Fast Algorithm for the Modulated Complex Lapped Transform

Low Power Complex Multiplier based FFT Processor

AMONG various transform techniques for image compression,

FFT ALGORITHMS FOR MULTIPLY-ADD ARCHITECTURES

Using a Scalable Parallel 2D FFT for Image Enhancement

A Normal I/O Order Radix-2 FFT Architecture to Process Twin Data Streams for MIMO

A STUDY OF ANALYSIS OF D.H.T. USING FORWARD AND INVERSE TRANSFORMS METHODOLOGY

Fast Block LMS Adaptive Filter Using DA Technique for High Performance in FGPA

THE discrete cosine transform (DCT) is the most popular

Keywords - DWT, Lifting Scheme, DWT Processor.

cosα -sinα sinα cosα 1 sinα cosα 1 sinα cosα 1 sinα cosα 1 sinα sinα sinα X 2 Y 2 cosα (a) (b) (c)

Efficient Shift-Adds Design of Digit-Serial Multiple Constant Multiplications

Transcription:

Title An efficient multiplierless approximation of the fast Fourier transm using sum-of-powers-of-two (SOPOT) coefficients Author(s) Chan, SC; Yiu, PM Citation Ieee Signal Processing Letters, 2002, v. 9 n. 10, p. 322-325 Issued Date 2002 URL http://hdl.handle.net/10722/42941 Rights 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material advertising or promotional purposes or creating new collective works resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.; This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

322 IEEE SIGNAL PROCESSING LETTERS, VOL. 9, NO. 10, OCTOBER 2002 An Efficient Multiplierless Approximation of the Fast Fourier Transm Using Sum-of-Powers-of-Two (SOPOT) Coefficients S. C. Chan and P. M. Yiu Abstract This letter proposes a new multiplierless approximation of the discrete Fourier transm (DFT) called the multiplierless fast Fourier transm-like (ML-FFT) transmation. It makes use of a novel factorization to parameterize the twiddle factors in the conventional radix-2 or split-radix FFT algorithms as certain rotation-like matrices and approximates the associated parameters using the sum-of-powers-of-two (SOPOT) or canonical signed digits (CSD) representations. The ML-FFT converges to the DFT when the number of SOPOT terms used increases and has an arithmetic complexity of ( log 2 ) additions, where =2 is the transm length. Design results show that the ML-FFT offers flexible tradeoff between arithmetic complexity and numerical accuracy in approximating the DFT. Index Terms Discrete Fourier transm (DFT), fast Fourier transm (FFT), radix-2 decimation-in-frequency (DIF), sum-of-powers-of-two (SOPOT). I. INTRODUCTION THE DISCRETE Fourier transm (DFT) is an important tool in digital signal processing [1]. A treasure of fast algorithms such as the Cooley Tukey fast Fourier transm (FFT), the prime factor algorithm (PFA) FFT, and the Winogard FFT is available to compute efficiently DFT of different lengths. In this letter, a new multiplierless approximation of the DFT, called the multiplierless FFT-like transmation (ML-FFT), using the radix- FFT algorithm and the sum-of-powers-of-two (SOPOT) or canonical signed digits (CSD) representation, is proposed. Generalization to the split-radix FFT (SR-FFT) algorithm is also possible. The present work is motivated by a recent interest in the efficient realization of fast transm algorithms using either integer [6] [8] or SOPOT [3] [5] representations. The main objective is to provide efficient multiplierless approximation to sinusoidal transms such as the DFT, which can be implemented efficiently either in software (say in microcontrollers with no on-chip multipliers, etc.), hardware, or very large scale integrated circuits (VLSI). Integer FFT [6] and sinusoidal transms [7], [8] are attractive when the transm sizes are small, because the wordlength growth of the intermediate data can still be accommodated in 16 32 bits without rounding. As a result, high accuracy can be achieved with simple integer arithmetic. However, it is very difficult to design such integer matrices when the transm length increases. Up to now, only order-8 integer FFT [8] and order-16 integer DCT [7] have been reported. The proposed multiplierless FFT-like (ML-FFT) transmation is based on the SOPOT representation of the twiddle factors in the conventional FFT algorithms using a special parameterization of the rotation-like matrices [3]. This parameterization allows us to implement both the ward and inverse transms with the same set of SOPOT coefficients. Furthermore, by using the periodic properties of the rotation-like matrix and the twiddle factors, the number of parameters to be optimized is reduced by a factor of four and the dynamic range of the parameters is limited between zero and one, greatly improving the numerical property. Unlike integer-based transmations, multiplierless transmations with length up to 1024 and higher can be obtained without much difficulty, and they converge to their ideal counterparts when the SOPOT terms used increases. The proposed approach differs from the multiplierless DCT of [5] in the parameterization used, which has the advantages of limited dynamic range and reduced number of parameters mentioned earlier. Design results show that the proposed ML-FFT offers a flexible tradeoff between arithmetic complexity and accuracy in approximating the DFT. Another point worth mentioning is that the proposed approach is also applicable to approximate other sinusoidal transms such as the four types of DCTs and DSTs [4] and the discrete Hartley transm (DHT) [9]. This letter is organized as follows, Section II is devoted to te proposed ML-FFT algorithm, where its principle and the parameterization of the rotation-like matrices into SOPOT coefficients are described. Section III gives the design method and several examples demonstrating its usefulness. Finally, conclusions are drawn in Section IV. II. ML-FFT ALGORITHM The DFT of a finite-length sequence of length is (2.1a) Manuscript received August 16, 2001; revised June 10, 2002. The associate editor coordinating the review of this manuscript and approving it publication was Dr. Giovanni Ramponi. The authors are with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong (e-mail: scchan@eee.hku.hk; pmyiu@eee.hku.hk). Digital Object Identifier 10.1109/LSP.2002.803408. where and. Its inverse, called the inverse DFT (IDFT), is given by (2.1b) 1070-9908/02$17.00 2002 IEEE

CHAN AND YIU: AN EFFICIENT MULTIPLIERLESS APPROXIMATION OF THE FAST FOURIER TRANSFORM 323 If is a power-of-two number, the DFT can be computed by the radix or the split-radix FFT. Without loss of generality, let us consider the decimation-in-frequency (DIF) FFT algorithm. From (2.1a), it can be shown that the even-numbered frequency samples can be computed via an -point DFT of the sequence as follows: Fig. 1. Flow graph computing an eight-point DFT using the DIF radix-2 FFT algorithm (W = e ). (2.2) Similarly, the odd-numbered frequency samples can be computed as an -point DFT of the sequence as shown in the following: is the rotation-like matrix that we are interested in and. The main difficulty in constructing a multiplierless transmation using the SOPOT representation of is that the coefficients of the matrix transmation and its inverse cannot, in general, be expressed in terms of SOPOT coefficients. If and in (2.4) are expressed directly in terms of SOPOT coefficients, say and, then the inverse of is As and are SOPOT coefficients, the term cannot, in general, be expressed as SOPOT coefficients. The basic idea of the proposed ML-FFT is to parameterize and its inverse as follows: (2.3) is called the twiddle factor. If is a a power-of-two number, then the DFT can be reduced successively using the above decomposition to eventually two-point DFTs. The arithmetic complexity is complex additions and complex multiplications (including possible trivial multiplications). Fig. 1 shows the flow graph of the eight-point DFT computed using the DIF FFT algorithm. The proposed ML-FFT algorithm approximates the twiddle factor multiplication, which can be implemented as rotations, using the SOPOT representation and a certain parameterization of the rotation-like matrix. More precisely, let be a complex number. The multiplication of with,, can be written as where (2.4) (2.5) Since these factorizations of and involve the same set of coefficients, i.e., and, they can be directly quantized to SOPOT coefficients to m (2.6) where and are, respectively, SOPOT approximations to and having the m, where, ; is the range of the coefficients, and is the number of terms used in each coefficient. In practice, it is usually limited to a small number so that the multiplication can be implemented with limited number of additions and shifts. Since the coefficient multiplications are now replaced by a limited number of additions, the transmation can be realized efficiently without multipliers. Interested readers are referred to [3] [5], [10] more references on efficient realization of digital filters using SOPOT coefficients. Direct application of (2.6) might lead to large dynamic range when is close to. Although the factorization of in (2.5) is unique a given value of, the periodic property

324 IEEE SIGNAL PROCESSING LETTERS, VOL. 9, NO. 10, OCTOBER 2002 of can be utilized to construct slight variation of this factorization so as to reduce the dynamic range of the coefficients, as well as the number of parameters to be optimized. More precisely, it can be shown that the quantities,,,, and can be expressed completely in terms of any one of the following quantities: TABLE I FROBENIUS NORM OF THE ERROR MATRIX IN APPROXIMATING THE DFT BY THE ML-FFT USING DIRECT QUANTIZATION (2.7a ) (2.7b) (2.7c ) TABLE II THE OBJECTIVE FUNCTION, THE FROBENIUS NORM, THE AVERAGE NUMBER OF SOPOT TERMS, AND THE COMPUTATIONAL COMPLEXITY FOR 8-, 16-, 32-, 64-, AND 128-CHANNEL REAL-VALUED FFT AND ML-FFT (MULT: MULTIPLICATIONS; ADD.: ADDITIONS) (2.7d ) Using these relationships, it is possible to limit within the range. Thus, the dynamic range of the parameters to be approximated by the SOPOT representation is always limited to the range, and the number of parameters to be optimized can be reduced by a factor of four. The mer allows us to limit the dynamic range of the parameters even if the transm length is very large, resulting in better numerical property. The approximations,,, and of their respective rotation-like matrices can be generated from by replacing with in (2.7). As all the twiddle factors in the radix-2 DIF FFT algorithm are just a subset of those in the first stage, there are altogether twiddle factors, including. Using the proposed parameterization, which has two parameters per rotation, there are approximately parameters in an -point ML-FFT. The SOPOT twiddle factors the radix-4, radix-8, and split-radix ML-FFTs can be generated similarly. III. DESIGN METHOD AND EXAMPLES The random search algorithm proposed in [4] is used to perm the discrete optimization. More precisely, a random vector with all its elements bounded by 1 is first multiplied by a scaling factor, and is added to the parameter vector containing the real-valued and. It is then quantized to the nearest SOPOT coefficients. The objective function is then evaluated this SOPOT candidate. The one with the best permance at a given number of additions is recorded. The search continues until the maximum allowed number of trials is exceeded. The scale factor controls the size of the neighborhood to be searched. A number of solutions with different tradeoffs between implementation complexity and permance are then obtained. Without loss of generality, the Frobenius norm of the derivation of the ML-FFT from its ideal counterpart is used as the objective function (3.1) where and are the th entries of the ML-FFT matrix and its ideal counterpart, respectively. Table I shows (in decibels) obtained by directly rounding the coefficients to the SOPOT representations without perming any optimization. It can be seen that the precision of the ML-FFT is directly related to the number of SOPOT terms used, which in turn affects its arithmetic complexity. Table II shows, the average SOPOT terms, and the computational complexities the 8-, 16-, 32-, 64-, and 128-channel ML-FFT obtained using the random search algorithm. The numbers of SOPOT terms are limited to five to simplify hardware implementation, though it is not necessary, say, in software implementation. The arithmetic complexities of the radix-4 FFT and ML-FFT are also shown as a comparison. It can be seen that the number of additions required the ML-FFT is approximately two times that of the radix-2 FFT algorithm an averaged number of SOPOT terms per coefficient of five. In general, if is the average number of the SOPOT terms per coefficient, then from (2.6), the arithmetic complexity of the radix- ML-FFT algorithm derived from a radix- FFT algorithm with an arithmetic complexity of complex multiplications and complex additions is approximately real additions. For simplicity, we have ignored those trivial multiplications in the real-valued algorithm. As an example, the radix-2 FFT algorithm has an arithmetic complexity of and. Then, the corresponding complexity of the ML-FFT algorithm is real additions. Further saving can be obtained by using the radix-4 (as illustrated in

CHAN AND YIU: AN EFFICIENT MULTIPLIERLESS APPROXIMATION OF THE FAST FOURIER TRANSFORM 325 TABLE III THE SOPOT COEFFICENTS OF THE 32-CHANNEL ML-FFT TABLE IV THE SOPOT COEFFICENTS OF THE 128-CHANNEL ML-FFT Fig. 2. Frequency response of the 32-channel ML-FFT using radix 2 DIF FFT algorithm. IV. CONCLUSION A new multiplierless approximation of the FFT called the multiplierless FFT-like transmation is proposed. It parameterizes the twiddle factors in the conventional radix- or split-radix FFT algorithms as certain rotation-like matrices and approximates the associated parameters using the SOPOT or canonical signed digits representations. The ML-FFT converges to the FFT when the number of SOPOT terms used increases and has an arithmetic complexity of additions, where is the transm length. Design results show that the ML-FFT offers flexible tradeoff between arithmetic complexity and numerical accuracy in approximating the FFT. Table II) and the SR-FFT algorithms. The details are omitted due to page length limitations. For illustration purposes, the parameters of the proposed 32- and 128-channel radix-2 DIF ML-FFT are also shown in Tables III and IV, respectively. Finally, it is noted from Table II that fewer than five average number of SOPOT terms per coefficient are needed to give a below 40 db. By increasing the number of SOPOT terms, the error between the ML-FFT and the ideal DFT can be reduced to an arbitrary small value. This allows one to choose an appropriate tradeoff between implementation complexity and numerical accuracy required in a given application. For example, in DFT filter bank, only the frequency characteristics of the final filter banks are of major concern. Theree, a lower average SOPOT term of say two to three will suffice a variety of applications [3], [4]. As an illustration, the frequency response of the 32-channel ML-FFT mentioned earlier in Table III is shown in Fig. 2. Its frequency response is almost visually identical to the ideal DFT. On the other hand, if the ML-FFT is used to compute circular convolutions with high accuracy, then a larger number of SOPOT terms is needed. REFERENCES [1] A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989. [2] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, NJ: Prentice-Hall, 1993. [3] S. C. Chan, W. Liu, and K. L. Ho, Perfect reconstruction modulated filter banks with sum-of-powers-of-two coefficients, in Proc. ISCAS, Geneva, Switzerland, May 28 31, 2000. [4] S. C. Chan and P. M. Yiu, Multiplier-less discrete sinusoidal and lapped transms using sum-of-powers-of-two (SOPOT) coefficients, in Proc. ISCAS, Sydney, Australia, May 6 9, 2001. [5] T. D. Tran, A fast multiplier-less block transm image and video compression, in Proc. ICIP, vol. 3, 1999, pp. 822 826. [6] S. C. Pei and J. J. Ding, The integer transms analogous to discrete trigonometric transms, IEEE Trans. Signal Processing, vol. 48, pp. 3345 3364, Dec. 2000. [7] W. K. Cham and Y. T. Chan, An order-16 integer cosine transm, IEEE Trans. Signal Processing, vol. 39, pp. 1205 1208, May 1991. [8] A. Nallanathan, S. C. Chan, and T. S. Ng, Lapped orthogonal transm with integer coefficients, in Proc. 42nd Midwest Symp. Circuits and Systems, vol. 2, 2000, pp. 1065 1068. [9] S. C. Chan, Multiplier-less approximation of DFT and DHT, Dept. Elect. Electron. Eng., The Univ. Hong Kong, Int. Rep., May 2000. [10] Y. C. Lim and S. R. Parker, FIR filter design over a discrete power-of-two coefficient space, IEEE Trans. Acoust. Speech, Signal Processing, vol. ASSP-31, pp. 583 591, June 1983.