A STUDY OF ANALYSIS OF D.H.T. USING FORWARD AND INVERSE TRANSFORMS METHODOLOGY


KAAV INTERNATIONAL JOURNAL OF ARTS, HUMANITIES & SOCIAL SCIENCES
A REFEREED BLIND PEER REVIEW QUARTERLY JOURNAL
KIJAHS / OCT-DEC (2018) / VOL-5 / ISS-4 / A1, PAGE NO. 1-13, ISSN: 2348-4349, WWW.KAAVPUBLICATIONS.ORG

NITIN KUMAR BENJAMIN (1), Research Scholar, Kalinga University, Raipur
Dr. RISHIKANT AGNIHOTRI (2), Supervisor, Professor
Received October 5, 2018; Revised November 19, 2018; Accepted December 10, 2018

ABSTRACT
Digital signal processing (DSP) involves processing data in various domains depending on the application. DSP has wide application in fields such as space, medicine, commerce, industry and science [1]. Each requires processing large amounts of data to extract useful information. A transform is a technique used in DSP for converting data from one form into another, and a family of transforms is available for data processing. Fourier analysis is one of the oldest techniques in this family [2]. It is named after Jean Baptiste Joseph Fourier (1768-1830), a French mathematician and physicist, and was first used for periodic continuous signals. A Fourier series decomposes a time-domain signal into a number of sine and cosine waves in the frequency domain.

1. Introduction
Digital signal processing (DSP) involves processing data in various domains depending on the application. DSP has wide application in fields such as space, medicine, commerce, industry and science [1]. Each requires processing large amounts of data to extract useful information. A transform is a technique used in DSP for converting data from one form into another, and a family of transforms is available for data processing. Fourier analysis is one of the oldest techniques in this family [2]. It is named after Jean Baptiste Joseph Fourier (1768-1830), a French mathematician and physicist, and was first used for periodic continuous signals.
A Fourier series decomposes a time-domain signal into a number of sine and cosine waves in the frequency domain, but it does not apply to non-periodic signals. The Fourier transform, a mathematical tool based on integrals, removes this drawback and can be used for non-periodic continuous signals [3]. However, the Fourier transform is not suitable for non-stationary signals. Since neither transform applies to discrete signals, a new transform was needed for them [4]. The discrete-time Fourier transform (DTFT) is used for discrete signals that extend from negative to positive infinity but are not periodic. Because the DTFT is not used for periodic discrete signals, the discrete Fourier transform (DFT) came into existence. The DFT is the discrete numerical equivalent of the Fourier transform, using summation instead of integrals, and is used for signals that repeat periodically from negative to positive infinity. The fast Fourier transform (FFT) is an improvement on the DFT that makes the computation faster [4]. All of these members of the Fourier family work on complex values, which require large storage space and are computationally complex. The discrete Hartley transform (DHT), a newer member of the transform family, converts real values into real values and therefore needs less storage space and less computation [5].

There has been rapidly growing interest in, and development of, secure communication techniques for military services, banking systems and other systems where secure transmission of speech signals plays a major role. Scrambling is used to keep a speech signal secret from unauthorized listeners. It simply disorders the speech signal so that it is no longer intelligible. The intended receiver recovers the original speech signal through the appropriate descrambling technique. Among speech scramblers, analog speech scramblers are considered here because of their wide applicability. Scrambling techniques can be classified as time-domain or frequency-domain. In time-domain scrambling, the speech signal is divided into small time-interval units and these units are permuted [5]. As these units can be as small as one sample, scrambling results in bandwidth expansion. This can push part of the signal outside the band of the channel and thereby degrade the speech quality. In frequency-domain scrambling, the speech signal is separated into several sub-bands and these sub-bands are then permuted, which keeps the original bandwidth unchanged. The first frequency-domain algorithms were based on the fast Fourier transform (FFT), with the FFT coefficients permuted from frame to frame [6].

The classical split-radix algorithm is difficult to implement in VLSI because of its irregular computational structure and because the butterflies differ significantly from stage to stage.
Thus, it is necessary to derive new algorithms that are suited to a parallel VLSI system. In this brief, a new VLSI DHT algorithm is considered that is well suited to implementation on a highly parallel and modular architecture. It can be used for designing a completely novel VLSI architecture for the DHT. The discrete Hartley transform (DHT) is the discrete form of the transform proposed by R. V. L. Hartley in 1942 [5]. The DHT is analogous to the discrete Fourier transform but produces only real values for real input. The main difference from the DFT is that it transforms real inputs to real outputs with no intrinsic involvement of complex values. The DFT can be used to compute the DHT, and vice versa [6].
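The statement that the DFT and the DHT can each be computed from the other can be illustrated with a short sketch. It assumes the usual DFT kernel exp(-2*pi*i*k*n/N), under which H(k) = Re F(k) - Im F(k), and the even and odd parts of H(k) give Re F(k) and -Im F(k). The function names are illustrative only and do not come from the paper.

```python
import cmath
import math


def dft(x):
    """Direct O(N^2) DFT with kernel exp(-2*pi*i*k*n/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def dht_from_dft(x):
    """DHT of a real sequence obtained from its DFT: H(k) = Re F(k) - Im F(k)."""
    return [f.real - f.imag for f in dft(x)]


def dft_from_dht(h):
    """Recover F(k) from the even and odd parts of a DHT vector h."""
    n = len(h)
    out = []
    for k in range(n):
        mirror = h[(n - k) % n]        # H(N - k), with H(N) taken as H(0)
        even = (h[k] + mirror) / 2     # equals Re F(k)
        odd = (h[k] - mirror) / 2      # equals -Im F(k)
        out.append(complex(even, -odd))
    return out


x = [1.0, 2.0, 3.0, 4.0]
h = dht_from_dft(x)      # approximately [10, -4, -2, 0]
f = dft_from_dht(h)      # matches dft(x) up to rounding
```

Both directions here cost O(N^2) because the DFT is evaluated directly; the point is only the algebraic relation, not speed.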

BACKGROUND
A Vedic multiplier uses fewer gates and gives a processor higher speed; it can be designed using half adders, full adders and a novel compressor. Other multiplier designs include the Booth multiplier, the modified Booth multiplier, high-speed multipliers and designs based on 4:2 and 7:2 compressors. Among these, the 7:2 compressor gives a faster adder and a smaller-area multiplier. The discrete Hartley transform converts real values into real values. Like the FFT, it decomposes the data into stages using butterflies, but the DHT butterfly is quite different in terms of its coefficients or multipliers. As the DHT sequence length increases, the number of coefficients increases as well. We propose a 16-point DHT butterfly in which each data word is 8 bits wide, and we present an implementation of the fast DHT algorithm for length N = 16. Six stages are required to complete the butterfly design of the N = 16 DHT. These stages include summing stages and coefficient-multiplying stages. An N-point one-dimensional DHT X_H of a sequence x(n) is defined as

$$X_H(k) = \sum_{n=0}^{N-1} x(n)\,\mathrm{cas}\!\left(\frac{2\pi k n}{N}\right), \qquad k = 0, 1, \ldots, N-1,$$

where cas(θ) = cos(θ) + sin(θ).

Algorithms
The fast Hartley transform (FHT) is an algorithm for the Hartley transform (HT) analogous to the FFT [9]. It changed the way people looked at the HT and opened the way for many researchers to develop algorithms for computing the DHT. The FHT performs the DHT in a time proportional to N log2 N using decimation-in-time (DIT). The DHT is a substitute for the DFT; however, if the real and imaginary parts of the DFT are explicitly required, they are directly obtainable from the even and odd parts of the DHT. The HT, its relation to the Fourier transform, its theorems, properties, matrix formulation and fast algorithms are discussed in [10]. Over the years, the DHT has become established as a useful tool for signal processing applications [11]-[13]. Several algorithms for its fast computation, and opinions regarding them, have been reported.
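As a rough illustration of how a decimation-in-time FHT reaches N log2 N operations, the sketch below splits the input into even and odd samples and combines the two half-length DHTs using the identity cas(a + b) = cos(b) cas(a) + sin(b) cas(-a). It is a textbook-style recursion written for this note, assuming N is a power of two; it is not the 16-point butterfly proposed in the paper nor the exact flow graph of [9], and the function names are hypothetical.

```python
import math


def cas(theta):
    """Hartley kernel: cas(theta) = cos(theta) + sin(theta)."""
    return math.cos(theta) + math.sin(theta)


def dht_direct(x):
    """Direct O(N^2) DHT, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cas(2 * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fht_radix2_dit(x):
    """Recursive radix-2 DIT fast Hartley transform (N a power of two)."""
    n = len(x)
    if n == 1:
        return [float(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    half = n // 2
    even = fht_radix2_dit(x[0::2])   # DHT of even-indexed samples
    odd = fht_radix2_dit(x[1::2])    # DHT of odd-indexed samples
    out = []
    for k in range(n):
        angle = 2 * math.pi * k / n
        # H(k) = E(k) + cos(angle) * O(k) + sin(angle) * O(-k),
        # with E and O periodic in N/2.
        out.append(even[k % half]
                   + math.cos(angle) * odd[k % half]
                   + math.sin(angle) * odd[(half - k) % half])
    return out


x = [float(v) for v in range(16)]
slow = dht_direct(x)
fast = fht_radix2_dit(x)   # agrees with slow up to rounding
```

The cosine and sine factors depend on the stage length n, which is the stage dependence of the coefficients mentioned in the literature survey.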
Meckelburg and Lipka present a decimation-in-frequency (DIF) FHT algorithm [14], claiming it to be faster than the one in [9]. Sorensen et al. [15] further analyze the FHT having the same decomposition as [9] using the index mapping approach, implement the algorithms for both DIT and DIF, and verify that their operational complexities are the same. Prado [16] presents an in-place version of the FHT along with its operational complexity. Kwong and Shiu [17] restructure the signal flow diagram originally proposed in [9] for clarity and, by applying the transposition theorem, obtain a DIF algorithm having the same operational complexity. The above approaches require computation of the cosine coefficients (CCs) and sine coefficients (SCs), which are stage-dependent. Hou [18] concludes that the FHT algorithm is, in essence, a generalization of the Cooley-Tukey FFT algorithm, but that it requires only real arithmetic operations, as compared to the complex operations in any standard FFT. Malvar [19] presents a new factorization of the DHT which involves the discrete cosine transform (DCT). His algorithms minimize the multiplications at the expense of an increased number of additions. Hao [20] examines both the pre- and post-permutation algorithms in [9] and [14] and suggests improvements to make them faster by using fast rotation to reduce the multiplications and by incorporating in-place or distributed permutation. Rathore [21] reports that the operational complexity is the same for both the DIT in [9] and the DIF in [14]. He further uses the matrix approach, derives properties of the DHT [22], obtains the relations for computational complexity and presents DHT-based DFT and DFT-based DHT algorithms. Rathore [23] presents a composite-radix algorithm based on the matrix approach [22], applicable to any data length. Patwardhan [24] presents a mixed-radix DIT DHT algorithm for an arbitrary data length. Further, Rathore [25] presents a general-radix algorithm for the DHT. Ru et al. [26] generalize the DHT into four classes (odd DHT, inverse odd DHT, odd-squared DHT and inverse odd-squared DHT) and derive fast algorithms for the resulting transforms. Zang [27] points out that these are similar to discrete W transforms. Prabhu and Nagesh [28] present radix-3 and radix-6 DIF FHT algorithms, which are derived by pairing the rotating factors with an appropriate reordering of the input sequence. Pei and Wu [29] present the split-radix algorithm, based on even-term radix-2 decompositions and odd-term radix-4 decompositions applied simultaneously, for fast computation of the DHT. Bracewell [30] points out that the radix-4 transform can also be used as an alternative to split-radix when data lengths are powers of 2. This can be done by splitting the data sequence into two interleaved pairs, applying the radix-4 algorithm to each, and combining the results. Bi and Yan [31]-[32] present split-radix algorithms which combine the flexibility and regularity of various radix algorithms, allow computation of the DHT for various sequence lengths, and require a lower operational count than fixed-radix algorithms. Bouguezel et al. [33] present an algorithm using a mixture of radix-2 and radix-8 index maps to compute the DHT of an arbitrary length N = q × 2^m, where q is an odd integer. The algorithm is expressed in a simple matrix form, which facilitates easy implementation and allows extension to multidimensional cases. Chiper et al. [34] present a systolic algorithm that uses the advantages of the cyclic convolution structure for the VLSI implementation of a prime-length DHT. Meher et al. [35] present a new formulation using cyclic convolutions that leads to modular structures consisting of simple and regular systolic arrays for concurrent pipelined realization of the DHT. Their structures for direct memory-based implementation offer more throughput than their distributed-arithmetic structures, which offer lower memory complexity.

Nevertheless, there is a strong need to compute the transform at high speed to meet the requirements of real-time signal processing. This thesis presents a method to compute the elements of the DHT matrix H_N. It identifies and proves the characteristics of H_N [36]. It develops and implements the position-based method (PBM) [37]. PBM reduces the time required to compute the elements of H_N as compared to the definition-based method (DBM). PBM is extended to compute the DHT using simple matrix multiplication. However, it is found to be slower than the existing radix-2 FHT algorithm by Bracewell [9]. The existing radix-2 FHT algorithms in [9], [14]-[17] are studied with respect to their matrix formulation, signal flow diagram and operational complexity. This thesis presents Modified Radix-2 DIT [38] and DIF [39] algorithms, which have a lower operational complexity than those in [9], [14]-[17]. It presents the Modified Radix-4 Algorithm [40], which has a lower operational complexity than those in [10] and [15]. It presents the signal flow diagram for a DIT split-radix algorithm which modifies the DIF split-radix algorithm in [15]; however, the operational complexity is the same. Finally, it presents a general-radix algorithm to cater to an arbitrary value of N.

Architectures
Various architectures are reported in the literature to compute the DHT. Chakrabarti and Jaja [41] propose a modular bit-level systolic architecture.
Dhar and Banerjee [42] employ a set of linear arrays of Givens rotors, with a suitable implementation of the Givens rotor using add/subtract units and hardwired shifters. Chang and Lee [43] derive two models of linear systolic arrays and suggest the use of CORDIC algorithms to make the systolic arrays more efficient in computation. Hsiao et al. [44] modify this CORDIC processor and obtain a higher-throughput, cost-effective architecture. Kar and Rao [45] propose a unified systolic architecture for sliding-window computation of discrete transforms.

Nayak and Meher [46] implement a bit-level systolic architecture for discrete orthogonal transforms using a serial-parallel vector-matrix multiplication scheme based on the Baugh-Wooley algorithm. Guo [47, 48] presents two architectures, one using parallel adders and the other using a distributed-arithmetic-based array, that utilize identical ROM modules and eliminate the accumulation loop in the processing elements. Amira and Bouridane [49, 50] present architectures to implement the DHT on field-programmable gate arrays. Meher et al. [51] present a design framework for scalable and modular memory-based implementation of the DHT in systolic hardware.

These architectures compute the DHT using digital VLSI techniques. Other architectures compute the DHT using analog blocks. Culhane et al. [52] present an analog circuit which uses a linear-programming neural net to compute the DHT. The architecture is not modular and has a limited range of N. Raut et al. [53] present basic switched-capacitor building blocks in a systolic array architecture to implement the DFT. The architecture is modular but uses a four-phase clocking scheme. Kawahito et al. [54] present a two-dimensional DCT-based image compression structure designed with fully differential switched-capacitor circuits. It uses a variable-quantization-level analog-to-digital converter, so the compression ratio can be changed flexibly according to the desired image type and quality, at the cost of increased complexity. Chen et al. [55] present digitally controlled weighted-summation analog circuits which may be used for computing the DFT, DCT and DWT. They carry out the weighted-sum operations in the analog domain, work in the voltage mode and omit the A/D conversion, reducing the power dissipation. Mal and Dhar [56, 57] present analog sampled-data architectures for the DHT. They use a switched resistor or capacitor block, integrators and a crosspoint switch array with a digital controller.
The architecture is based on the multiply-and-accumulate approach and is designed for sequential data samples. Its accuracy depends on the matching of the resistors and capacitors that set the kernel coefficients. Reconfigurable analog arrays, dubbed field-programmable analog arrays, can speed the transition of systems from digital to analog by providing the ability to rapidly implement advanced, low-power signal processing systems [58]. The drive towards analog integrated circuits has demanded the development of high-performance analog circuits that are reconfigurable and suitable for CAD methodologies. The architectures which compute the DHT using analog blocks [52]-[57] are mixed-mode signal processing architectures. The operation of the analog circuits is controlled by digital signals, which provides a good solution because such circuits are simple, modular and easy to implement in real time [59].

This thesis presents new architectures to implement the modified algorithms in [38]-[40]. It presents basic analog circuits designed to perform both the summing-structure and multiplying-structure operations. Their sensitivities to passive component variations, as defined in [60], are computed. Unlike the neural net approach in [52], the architectures are modular and can be scaled for large values of N. The developed architecture processes the data simultaneously at each stage and is therefore faster than those based on the multiply-and-accumulate approach [56, 57]. The architectures for both the radix-2 DIT and DIF algorithms are tested for the forward and inverse DHT transformations using OrCAD PSpice [61]-[63]. The architecture is further successfully extended to implement the radix-4 and split-radix DHT algorithms [64, 65]. The hardware implementation of the circuits for the architectures is done in the laboratory for small values of N.
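The forward and inverse transformations mentioned above can be mimicked in software with a short sketch. It relies on the known property that the DHT is its own inverse up to a factor of 1/N, so the inverse reuses the forward routine. This is only a numerical illustration under that assumption; it says nothing about the PSpice simulations or the analog circuits, and the function names are illustrative.

```python
import math


def dht(x):
    """Forward DHT by direct summation with the cas kernel."""
    n = len(x)
    return [sum(x[t] * (math.cos(2 * math.pi * k * t / n)
                        + math.sin(2 * math.pi * k * t / n))
                for t in range(n))
            for k in range(n)]


def idht(h):
    """Inverse DHT: the same kernel as the forward transform, scaled by 1/N."""
    n = len(h)
    return [v / n for v in dht(h)]


ramp = [float(v) for v in range(8)]   # simple test pattern
recovered = idht(dht(ramp))           # equals ramp up to rounding
```

Because the same kernel serves both directions, a single hardware or software structure can in principle be shared between the forward and inverse transforms.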
Characteristics of the DHT Matrix
To get a feel for the characteristics, the matrices for various values of N are grouped into four major categories: (i) N odd, (ii) N = 2m with m odd, (iii) N = 4m with m odd and (iv) N = 8m, where m is an integer. The corresponding matrices considered are H_7 for N = 7, H_6 for N = 6, H_12 for N = 12 and H_16 for N = 16, shown in Figs. 2.1 (a)-(d), respectively.


On critically examining H_N and the submatrix S_{N-1} (obtained by deleting the 0th row and 0th column of H_N) for various values of N, the following characteristics are identified and proved analytically. Some of these characteristics apply in general to any value of N and others depend on the category. The general characteristics are as follows.
1. The 0th row and 0th column elements have a value of unity.

S_{N-1} is symmetrical about the forward diagonal. The element h_{ij} has a corresponding element h_{(N-j)(N-i)} about the backward diagonal for all values of i and j except the elements on the backward diagonal. The value of (N - j)(N - i) modulo N is the same as ij modulo N, since (N - j)(N - i) = N^2 - (i + j)N + ij. Thus h_{ij} = h_{(N-j)(N-i)}, making S_{N-1} symmetrical about its backward diagonal. Hence, S_{N-1} is symmetrical about both of its diagonals.

The characteristics of the matrices for the major categories of N are identified as follows.

N = 2m, m odd:
N/2 elements in the 1st row and 1st column having distinct magnitudes repeat 2 times, once with opposite sign. There are elements in the 1st row and column corresponding to 0 and N/2. The element at 0 has a value of 1 and there is a corresponding element at N/2 with the same magnitude but opposite sign. There are 2 intervals, 0 to N/2 and N/2 to N. Within each interval the number of elements is m - 1. In the interval from 0 to N/2, all the elements are distinct and each one has a corresponding element in the interval from N/2 to N with the same magnitude but opposite sign.

N = 4m, m odd:
N/4 elements in the 1st row and 1st column having distinct magnitudes repeat 4 times, twice with opposite sign. There are elements in the 1st row and column corresponding to 0, N/4, N/2 and 3N/4. The element at 0 has a value of 1 and there is a corresponding element at N/4 with the same magnitude and sign. There are corresponding elements at N/2 and 3N/4 with the same magnitudes but opposite signs. There are 4 intervals of equal length N/4 from 0 to N. Within each interval, the number of elements is m - 1. In the interval from 0 to N/4, a pair of elements means two elements having the same magnitude and sign. Each pair of elements in this interval has a corresponding pair of elements in the interval from N/2 to 3N/4 with the same magnitude but opposite sign. In the interval from N/4 to N/2, a pair of elements means two elements having the same magnitude but opposite sign. Each pair of elements in this interval has a corresponding pair of elements in the interval from 3N/4 to N with the same magnitude but opposite sign.

N = 8m:
Elements in the 1st row and 1st column having distinct magnitudes repeat 4 times, twice with opposite sign. One element having magnitude 1.414 (peak value, at position N/8) repeats 2 times, once with opposite sign (valley value of -1.414, at position 5N/8). Two elements have the value 0, at positions 3N/8 and 7N/8. There are elements in the 1st row and 1st column corresponding to 0, N/8, N/4, 3N/8, N/2, 5N/8, 3N/4 and 7N/8. The element at 0 has a value of 1 and there is a corresponding element at N/4 with the same magnitude and sign. There are corresponding elements at N/2 and 3N/4 with the same magnitudes but opposite signs. The element at N/8 has a corresponding element at 5N/8 with the same magnitude but opposite sign. The element at 3N/8 has a corresponding element at 7N/8, both having the value 0. There are 8 intervals of equal length N/8 from 0 to N. Within each interval, the number of elements is m - 1. In the interval from 0 to N/8, all the elements are distinct and each one has a corresponding element in the interval from N/8 to N/4 with the same magnitude and sign. There are corresponding elements in the intervals from N/2 to 5N/8 and 5N/8 to 3N/4 with the same magnitudes but opposite signs. Similarly, in the interval from N/4 to 3N/8, all the elements are distinct and each one has a corresponding element in the interval from 7N/8 to N with the same magnitude and sign. There are corresponding elements in the intervals from 3N/8 to N/2 and 3N/4 to 7N/8 with the same magnitudes but opposite signs.
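Several of the characteristics listed above can be checked numerically. The sketch below builds H_N directly from the cas definition and tests the unity border, the symmetry of S_{N-1} about both diagonals and, for even N, the sign reversal between positions j and j + N/2 of the 1st row. The helper names and the tolerance are assumptions made for this illustration, and only a subset of the listed properties is covered.

```python
import math


def hartley_matrix(n):
    """H_N with elements h_ij = cas(2*pi*i*j/N)."""
    return [[math.cos(2 * math.pi * i * j / n) + math.sin(2 * math.pi * i * j / n)
             for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-9):
    return abs(a - b) < tol


def check_characteristics(n):
    h = hartley_matrix(n)
    idx = range(1, n)
    # 0th row and 0th column are unity.
    border = all(close(h[0][j], 1.0) and close(h[j][0], 1.0) for j in range(n))
    # S_{N-1} is symmetric about the forward diagonal: h_ij = h_ji.
    forward = all(close(h[i][j], h[j][i]) for i in idx for j in idx)
    # S_{N-1} is symmetric about the backward diagonal: h_ij = h_(N-j)(N-i).
    backward = all(close(h[i][j], h[n - j][n - i]) for i in idx for j in idx)
    # For even N, the 1st-row element at j + N/2 is the negative of that at j.
    half_sign = True
    if n % 2 == 0:
        half_sign = all(close(h[1][j + n // 2], -h[1][j]) for j in range(n // 2))
    return border, forward, backward, half_sign


# One representative N per category: 7, 6, 12 and 16.
results = {n: check_characteristics(n) for n in (7, 6, 12, 16)}
```

For N = 16 the same matrix also shows the peak value 1.414 at position N/8 = 2, the valley at 5N/8 = 10 and zeros at 3N/8 = 6 and 7N/8 = 14 of the 1st row.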
Definition-Based Method
In the definition-based method (DBM), the elements of the 0th row and the 0th column of H_N are directly assigned the value of unity, which requires 2N - 1 assignments. The other (N - 1)^2 elements of the matrix are calculated from the definition, which requires cas( ) to be evaluated (N - 1)^2 times. These are the actual operations required to compute H_N by DBM. A flowchart for DBM is given in the corresponding figure.
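The operation counts above can be made concrete with a minimal sketch of the definition-based method. It is written in Python for brevity and is not the C program referred to in the paper; the function names and the evaluation counter are additions for illustration.

```python
import math


def build_hn_dbm(n):
    """Build H_N by the definition-based method and count cas evaluations."""
    h = [[0.0] * n for _ in range(n)]
    for j in range(n):          # 0th row assigned unity
        h[0][j] = 1.0
    for i in range(n):          # 0th column assigned unity
        h[i][0] = 1.0
    evaluations = 0
    for i in range(1, n):       # remaining (N - 1)^2 elements from the definition
        for j in range(1, n):
            angle = 2 * math.pi * i * j / n
            h[i][j] = math.cos(angle) + math.sin(angle)
            evaluations += 1
    return h, evaluations


def dht_by_matrix(h, x):
    """DHT of x as the matrix-vector product H_N x."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in h]


h4, count = build_hn_dbm(4)                          # count is (4 - 1)^2 = 9
spectrum = dht_by_matrix(h4, [1.0, 2.0, 3.0, 4.0])   # approximately [10, -4, -2, 0]
```

The two loops that set the 0th row and 0th column touch h[0][0] twice, which is why the text counts 2N - 1 distinct assignments rather than 2N.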

A simple program has been implemented in C and listed in Appendix B. The program does the initialization and prompts for the value of N. On receiving the value of N from the user, it prompts for the input sequence. As the input sequence is given by the user, it is stored in an array. Once the entire input sequence is obtained, the computation of H_N starts. The elements of the 0th row and the 0th column of H_N are assigned the value of unity. Setting the row index to zero and using a for loop over the column indices from 0 to N - 1, the value of unity is assigned to the 0th row elements. Then, setting the column index to zero and using a for loop over the row indices from 0 to N - 1, the value of unity is assigned to the 0th column elements. The other (N - 1)^2 elements of the matrix are then computed. This is implemented using two for loops, the inner loop for the column indices from 1 to N - 1 and the outer loop for the row indices from 1 to N - 1. Within the inner loop, the element is computed based on its definition, which requires the calculation of cas( ). In this manner the (N - 1)^2 elements of the matrix are computed. This portion of the program is repeated in a loop to get an estimate of the relative computation time. The program further displays the column numbers, the row numbers and the elements of the entire matrix row-wise. Finally, the DHT of the given sequence is computed using simple matrix multiplication and displayed. These are utilized to check the validity of the program, and the results displayed are cross-checked with hand calculations and tables available in the literature. The input sequences applied to the program are not only fixed patterns such as ramp, impulse, step or sinusoidal, whose outputs can be validated easily, but also random patterns, for validation of which the inverse transformation is performed on the output to retrieve the input pattern.

Position-Based Method

In the position-based method (PBM), the elements of the 0th row and the 0th column of H_N are directly assigned the value of unity, which requires assignments to be done 2N - 1 times. Depending on the value of N, H_N has a different number of distinct-magnitude elements in the 1st row. The characteristics of H_N are used to directly assign values to some of these, and the others are computed using the definition. Once they are obtained, the remaining elements of the row are assigned values based on the characteristics identified and relationships proved. When all the elements of the 1st row are obtained, the operation ij modulo N is performed on particular elements within S_{N-1}. Using the result of this operation and the pointers, the value of the required element in the 1st row is assigned to that particular element. Making use of the symmetry properties of S_{N-1}, other element values are assigned based on their positions. These are the various operations required to compute H_N by PBM. Table 2.1 compares the number of operations involved in computing the elements of H_N using DBM and PBM.

A program based on this method has been implemented in C and listed in Appendix B. It defines the macro COMPUTE for computation of an element based on its definition. It directly assigns the element values of the 0th row and column. Depending on the value of N, some elements of the first row are directly assigned values, some are computed using the macro COMPUTE, and the rest are assigned values by substitution. This program has more types of operations than the one using the DBM. A flowchart for PBM is shown in Fig. 2.3.

The 1st operation involves direct assignment of value and requires no computation. The 2nd operation involves only integer multiplication and division followed by assignment of value. This operation can be compared to the 3rd operation, which calls the cosine and sine functions to obtain cos(.) and sin(.), and finally adds them to obtain cas(.). Programs have been written in C separately for each of these operations, and estimates of relative execution times have been calculated by executing the programs to perform the required operations many times in a loop. It has been observed that, if the execution time required for direct assignment of value is x, then the time required for calculation of ij modulo N and assignment of value is approximately 5x, and that for calculation of cas(.) is approximately 20x. The number of times the cas function is to be calculated is reduced in the PBM, which significantly reduces the computation time, as only a few typical elements have to be computed using the definition (cas function). The operation ij modulo N is performed only on particular elements within S_{N-1}. Making use of the symmetry properties of S_{N-1}, other element values are assigned based on their positions. The PBM can be applied for any value of N. Though it uses different operations, on the whole it is faster in computing the elements than the DBM.

2. Results and Discussion

In the methods for computation of the elements of H_N, it has been observed that the overall number of operations involved is the same. However, the relative computation time of an operation depends on the computation involved within the operation. Programs have been written for the methods and, in order to get observable values of the relative computation time required for the computations, similar portions of the programs that perform the computation of the elements are repeated in a loop. The programs have been executed on a Zenith personal computer with a Pentium 4 processor at 2.93 GHz. The relative computation times observed on executing the programs for different sequences of length N are shown in Fig. 2.4.

The programs written for the DBM and PBM are compared, and the relative computation time of PBM is less than that of DBM. These programs, which compute the DHT matrix, are further enhanced to perform simple matrix multiplication to compute the DHT of a given input sequence. They are found to obtain the DHT. A program has also been written in C to compute the DHT using the fast Hartley transform algorithm (FHTA) presented by Bracewell [9] and listed in Appendix B. The programs based on DBM
and PBM are found to be slower than FHTA, which performs a transformation on the DHT matrix by decomposition to obtain the DHT. The algorithm based on PBM needs to incorporate the decomposition of H_N rather than simple matrix multiplication in order to compare with FHTA. The method is extended to obtain modified radix-2 and radix-4 algorithms in the next chapter.

3. References

1. A. V. Oppenheim and R. W. Schafer, Discrete Time Signal Processing, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1999.
2. J. G. Proakis and D. G. Manolakis, Digital Signal Processing, Principles, Algorithms and Applications, 3rd ed., Upper Saddle River, NJ: Prentice Hall, 1996.
3. J. W. Cooley and J. W. Tukey, An algorithm for the machine calculation of complex Fourier series, Math. Comput., vol. 19, pp. 297-301, 1965.
4. B. Razavi, Design of Analog CMOS Integrated Circuits, New York: McGraw Hill, 2001.
5. J. F. Wakerly, Digital Design Principles and Practices, 4th ed., Pearson Prentice Hall, 2006.
6. T. S. Hall, Field-Programmable Analog Arrays: A Floating Gate Approach, Ph.D. dissertation report, School of Electrical and Computer Engineering, Georgia Institute of Technology, USA, 2004.
7. R. V. L. Hartley, A more symmetrical Fourier analysis applied to transmission problems, Proc. IRE, vol. 30, pp. 144-150, Mar. 1942.
8. R. N. Bracewell, Discrete Hartley transform, J. Opt. Soc. Am., vol. 73, no. 12, pp. 1832-1835, Dec. 1983.
9. R. N. Bracewell, The fast Hartley transform, Proc. IEEE, vol. 72, no. 8, pp. 1010-1018, Aug. 1984.
10. R. N. Bracewell, The Hartley transform, New York: Oxford University Press, 1986.
11. C. H. Paik and M. D. Fox, Fast Hartley transform for image processing, IEEE Trans. Medical Imaging, vol. 7, no. 2, pp. 149-153, June 1988.
12. H. J. Meckelburg and D. Lipka, Fast Hartley transform algorithm, Electronics Letters, vol. 21, no. 8, pp. 311-313, Apr. 1985.
13. H. V. Sorensen, D. L. Jones, C. S. Burrus and M. T. Heideman, On computing the discrete Hartley transform, IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 4, pp. 1231-1238, Oct. 1985.
14. J. Prado, Comments on "The fast Hartley transform", Proc. IEEE, vol. 73, no. 12, pp. 1862-1863, Dec. 1985.