An Ultra Low-Power WOLA Filterbank Implementation in Deep Submicron Technology

Similar documents
A framework for automatic generation of audio processing applications on a dual-core system

Convention Paper 6202 Presented at the 117th Convention 2004 October San Francisco, CA, USA

Fatima Michael College of Engineering & Technology

Audio-coding standards

Audio-coding standards

RECENTLY, researches on gigabit wireless personal area

14th European Signal Processing Conference (EUSIPCO 2006), Florence, Italy, September 4-8, 2006, copyright by EURASIP

Mpeg 1 layer 3 (mp3) general overview

An efficient multiplierless approximation of the fast Fourier transform using sum-of-powers-of-two (SOPOT) coefficients

REAL TIME DIGITAL SIGNAL PROCESSING

Module 9 AUDIO CODING. Version 2 ECE IIT, Kharagpur

AN FFT PROCESSOR BASED ON 16-POINT MODULE

REAL-TIME DIGITAL SIGNAL PROCESSING

Appendix 4. Audio coding algorithms

Flexible wireless communication architectures

Design of Feature Extraction Circuit for Speech Recognition Applications

DSP. Presented to the IEEE Central Texas Consultants Network by Sergio Liberman

Processor Applications. The Processor Design Space. World s Cellular Subscribers. Nov. 12, 1997 Bob Brodersen (

D Demonstration of disturbance recording functions for PQ monitoring

General Purpose Signal Processors

Chapter 1 Introduction

Modeling of an MPEG Audio Layer-3 Encoder in Ptolemy

Chapter 14 MPEG Audio Compression

High performance, power-efficient DSPs based on the TI C64x

Lecture 16 Perceptual Audio Coding

DESCRIPTION FEATURES MULTISTREAM PCI SOUND CARDS 26 DECEMBER 2007 ASI6514, ASI6518

FFT/IFFTProcessor IP Core Datasheet

Optical Storage Technology. MPEG Data Compression

VLSI Signal Processing

Efficient Implementation of Transform Based Audio Coders using SIMD Paradigm and Multifunction Computations

Current and Projected Digital Complexity of DMT VDSL

EE482: Digital Signal Processing Applications

Experiment 3. Getting Start with Simulink

REAL TIME DIGITAL SIGNAL PROCESSING

For Mac and iphone. James McCartney Core Audio Engineer. Eric Allamanche Core Audio Engineer

AND8384/D. Using Block Floating-Point in the WOLA Filterbank Coprocessor APPLICATION NOTE

Performance Analysis of Line Echo Cancellation Implementation Using TMS320C6201

LOW COMPLEXITY SUBBAND ANALYSIS USING QUADRATURE MIRROR FILTERS

COPY RIGHT. To Secure Your Paper As Per UGC Guidelines We Are Providing A Electronic Bar Code

VHDL Implementation of Multiplierless, High Performance DWT Filter Bank

Filterbanks and transforms

FPGA Implementation of Multiplierless 2D DWT Architecture for Image Compression

DESIGN AND DEVELOPMENT OF A MULTIRATE FILTERS IN SOFTWARE DEFINED RADIO ENVIRONMENT

Modeling and implementation of dsp fpga solutions

Low-Power SRAM and ROM Memories

ISSN (Online), Volume 1, Special Issue 2(ICITET 15), March 2015 International Journal of Innovative Trends and Emerging Technologies

The theory and design of a class of perfect reconstruction modified DFT filter banks with IIR filters

Pyramid Coding and Subband Coding

Low-Power Split-Radix FFT Processors Using Radix-2 Butterfly Units

Classification of Semiconductor LSI

We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors

ECE 747 Digital Signal Processing Architecture. DSP Implementation Architectures

THE orthogonal frequency-division multiplex (OFDM)

Perceptual Coding. Lossless vs. lossy compression Perceptual models Selecting info to eliminate Quantization and entropy encoding

FAST FOURIER TRANSFORM (FFT) and inverse fast

Xilinx DSP. High Performance Signal Processing. January 1998

Digital Signal Processing with Field Programmable Gate Arrays

DESIGN METHODOLOGY. 5.1 General

Implementation of FFT Processor using Urdhva Tiryakbhyam Sutra of Vedic Mathematics

Image Compression System on an FPGA

FFT/IFFT Block Floating Point Scaling

Evaluating MMX Technology Using DSP and Multimedia Applications

DSP Co-Processing in FPGAs: Embedding High-Performance, Low-Cost DSP Functions

High Performance Mixed-Signal Solutions from Aeroflex

DSP-CIS. Part-IV : Filter Banks & Subband Systems. Chapter-10 : Filter Bank Preliminaries. Marc Moonen

DSP VLSI Design. Instruction Set. Byungin Moon. Yonsei University

Contents. 3 Vector Quantization The VQ Advantage Formulation Optimality Conditions... 48

Pyramid Coding and Subband Coding

REDUCED FLOATING POINT FOR MPEG1/2 LAYER III DECODING. Mikael Olausson, Andreas Ehliar, Johan Eilert and Dake liu

In this article, we present and analyze

Embedded Target for TI C6000 DSP 2.0 Release Notes

FPGA Implementation of 16-Point Radix-4 Complex FFT Core Using NEDA

REAL TIME DIGITAL SIGNAL PROCESSING

Application of Linux Audio in Hearing Aid Research

FPGA Based FIR Filter using Parallel Pipelined Structure

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

VLSI Implementation of Low Power Area Efficient FIR Digital Filter Structures Shaila Khan 1 Uma Sharma 2

The VHDL Based Design of the MIDA MPEG1 Audio Decoder

Parallel FIR Filters. Chapter 5

Indian Silicon Technologies 2013

CompuScope Ultra-fast waveform digitizer card for PCI bus. APPLICATIONS. We offer the widest range of

Design of a Multiplier Architecture Based on LUT and VHBCSE Algorithm For FIR Filter

A Parallel Distributed Arithmetic Implementation of the Discrete Wavelet Transform

Digital Signal Processor 2010/1/4

Design of Adaptive Filters Using Least P th Norm Algorithm

LOW-POWER SPLIT-RADIX FFT PROCESSORS

PLATINUM BY MSB TECHNOLOGY

Key words: B- Spline filters, filter banks, sub band coding, Pre processing, Image Averaging IJSER

Audio Coding and MP3

An adaptive wavelet-based approach for perceptual low bit rate audio coding attending to entropy-type criteria

Implementation of G.729E Speech Coding Algorithm based on TMS320VC5416 YANG Xiaojin 1, a, PAN Jinjin 2,b

The Design of Nonuniform Lapped Transforms

SCALABLE IMPLEMENTATION SCHEME FOR MULTIRATE FIR FILTERS AND ITS APPLICATION IN EFFICIENT DESIGN OF SUBBAND FILTER BANKS

Modified Welch Power Spectral Density Computation with Fast Fourier Transform

Low Power and Memory Efficient FFT Architecture Using Modified CORDIC Algorithm

Introduction to Field Programmable Gate Arrays

DUE to the high computational complexity and real-time

A scalable, fixed-shuffling, parallel FFT butterfly processing architecture for SDR environment

Exercises in DSP Design 2016 & Exam from Exam from

An FPGA-Based Parallel Distributed Arithmetic Implementation of the 1-D Discrete Wavelet Transform

Transcription:

An Ultra ow-power WOA Filterbank Implementation in Deep Submicron Technology R. Brennan, T. Schneider Dspfactory td 611 Kumpf Drive, Unit 2 Waterloo, Ontario, Canada N2V 1K8 Abstract The availability of deep Submicron technology opens the door to advanced algorithms specifically targeted for ultra low-power portable applications like hearing aids. These applications are extremely constrained by small physical size and extremely low power consumption requirements. To achieve these strict requirements, every component must be justified. The Weighted Overlap-Add (WOA) filterbank discussed here is an important component that meets these difficult requirements. P. Balsiger, Ch. Calame, A. Drollinger,. Grisoni, A. Heubi, F. Pellandini, D. Sun, Ch. Waelchli, Institute of Microtechnology University of Neuchâtel Rue A.-. Breguet 2, 2 Neuchâtel, Switzerland Introduction Many signal processing algorithms can be cast into a filtering (frequency-domain) framework. These include dynamic range compression, noise reduction, sub-band coding and directional processing, voice activity detection and echo cancellation. For these types of real-time audio signal processing applications, the filtering requirements are strict: i) low group delay, ii) high degree of adjustability, iii) high fidelity. A frequency domain approach is an efficient method of meeting these constraints while delivering low power and flexibility. This paper describes a WOA filterbank that utilizes a novel architecture and processing scheme. A key figure of merit in these applications is the energy required for a given amount of processing. Typically this is expressed in µj for a Fast Fourier Transform (FFT) operation. The WOA filterbank is tiny. On a.35µ process, memory (2 kwords), occupies 4.4 mm 2 and logic is 2.2 mm 2. ess than 3 kgates are used for the entire coprocessor. Typical power consumption at 1 Volt is 3 µw for a WOA coprocessor in a typical 16-channel configuration. The chip is typically clocked at 1.28 MHz. Basic Concept The WOA design provides a highly flexible time-frequency representation amenable to sub-band adaptive, sub-coding and other similar applications [1], [2], [4]. The coprocessor easily interfaces with any standard DSP processor, AD- and DA-converters. The co-processor has two main sub-blocks (Figure 1): the WOA and the Input/Output processor (IOP). Input samples are stored in a circular input FIFO. Every R (input block size) samples a WOA analysis transformation is performed on samples ( >> R). A general purpose DSP (the control DSP ) is used to analyze the spectrum and to apply, via the shared RAM, gains for each frequency band. Then, the WOA coprocessor performs a WOA synthesis transformation and stores the results in the output FIFO. The IOP is responsible for interpolating (decimating) the outgoing (incoming) samples.

The WOA can also operate in stereo mode. In this mode, the WOA processes two simultaneous data streams. Following analysis, the control DSP performs a final butterfly separation step, applies gains to the separated stereo channels and then mixes both channels. The mixed frequency-domain signals are then returned to the time domain via a WOA synthesis transformation. Stereo supports the implementation of phasedependent algorithms including direction of arrival estimation, echo cancellation and sub-band beamforming. Analog Front-End I/O serial interface pre-/postfilter unit IOP (I/O processing) Coprocessor I/O FIFO controller filter bank processor WOA filter bank (WOA processor) Block-Floating-Point (BFP) arithmetic, excellent performance is obtained. Analyze Window 1 1 Input FIFO Analyzer Gain application Synthesizer real to complex packing complex to real packing 1 N 1 Synthesis Window 1 N 1 N/2 point FFT N/2 point ifft N N R-zero feed 1 1 Input -R RAM access direct control access General purpose DSP I/O FIFO, MemT, Mem1 shared RAM RAM access 1 R Output Figure 2. Simplified block diagram of a WOA filter bank Figure 1. Overview of the co-processor s environment Theoretical aspect of the WOA filterbank Over the last two decades, multi-rate digital signal processing techniques have been considerably developed and widely practiced in various engineering disciplines. The conditions to obtain perfect reconstruction (PR) maximally decimated (or critically sampled) filter banks have been extensively investigated and well-documented [5]. PR systems impose severe constraints that are not suitable in some applications. For applications requiring significant adjustment in the frequency bands, other structures are preferable [2]. The WOA structure can meet these design constraints [3]. Furthermore, as will be shown below, when implemented using WOA structure Figure 2 shows a simplified block diagram of an oversampled WOA filter bank [2], [3]. The input step size (R) is the FFT size (N) divided by the oversampling ratio (OS). The use of oversampling provides two benefits (i) the gain of the filterbank bands can be adjusted over a wide range without the introduction of audible aliasing and (ii) a group delay versus power consumption trade-off can be made. In operation, the input FIFO is shifted and R new samples are stored. The input FIFO is then windowed with a prototype low pass filter of length. The resulting vector is added modulo N (i.e., folded ) and the FFT of the resulting windowed time segment is computed. Because an FFT is used, the outputs from the analysis filterbank provide

both magnitude and phase information (i.e., they are complex). To generate a modified time-domain signal, the channel gains are applied to the N/2 FFT outputs (channel signals) and an inverse FFT is computed. The resulting time-domain slice is then windowed with a synthesis window and accumulated into the output FIFO. This generates R samples that are shifted out of the output FIFO. Finally, R zeros are shifted into the output FIFO and the entire process repeats for the next block of R input samples. Equally spaced bands can be generated by using an Odd FFT and a square-wave modulator on the time-domain input and output signals. For stereo processing, two mono signals are interleaved and processed as a complex signal. After analysis the complex result is separated to get the individual mono signal spectra. Block floating point arithmetic Block-floating-point (BFP) computation units are used to increase the dynamic range and reduce the quantization error in order to improve the SNR of the WOA filterbank. The BFP strategy decreases the quantization error without increasing the computation complexity. This is achieved by dividing data into non-overlapped groups (passes) and formatting the data at each node in data flow path with common exponent. Implementation of the WOA The implementation of the WOA coprocessor was guided by three primary constraints: (a) minimal size (both gate count and silicon area) (b) minimal power consumption (c) flexibility; we required a WOA filterbank design that supported programmable configurations. The Input/Output Processor (IOP) The input and output FIFO's are realized as circular buffers. The input FIFO has two data domains: the WOA-processing domain that is used by the WOA processor and the write-in domain that stores the new samples. The output FIFO has also two data domains: the WOA-processing domain and the read-out domain (which contains the data that are ready to send out). The IOP contains an interface module and pre-/post- processing unit for interpolating and decimation filtering. A pole-zero pair is used as a DC removal filter. To save area and power, the number of arithmetic units is limited to one MAC unit, one adder, one shifter and one rounding unit. These resources are shared between the DC-removal, the decimation and the interpolation filters. The WOA Control System A simple, yet very flexible controller commands the WOA processor. WOA processing is divided into different passes. The characteristics of a pass are i) all operations are the same (radix2, radix4, etc) and ii) every read and write address for the data and coefficients can be generated by a bitreverse addressing unit. Thus, each pass can employ a fixed configuration. Data-path The data-path consists of a multiplier array and an AU bank (Figure 3). Coeff Data1 Data Ctrl. BFP processo First-stage processing (Multiplier array) Datapath controller Second-stage processing (AU bank) BFP processor Result Figure 3. Block diagram of the data-path.

This structure achieves an efficient dataflow. N-point FFT processing is achieved by computing a specific number of radix4 or radix2 transforms. Results The noise floor and THD+N measurements of the coprocessor s data-path units show that the noise floor is approximately 115 db. A highly selective filterbank configuration (16-channels with 14 ms group delay) provides about 65 db THD+N. Note that the realized THD+N is dependent on the selected WOA parameters (FFT size, oversample factor and window length) as well as the window coefficients. A large number of combinations that trade-off fidelity (THD+N) versus power consumption versus group delay is possible. The time and cycles used to perform WOA analysis and synthesis is strongly configuration dependent. Table 1 shows the cycles required for analysis and synthesis and also gives the ratio between the used time and the time that is available if a system clock of 1.28 MHz and a sampling frequency of 16 khz are used. N OF #cycles (analysis + synthesis) t used /t available (analysis + synthesis) 16 128 2 192 38% 32 256 2 258 26% 32 128 2 328 33% 32 128 2 289 29% 128 128 4 788 39% N : FFT size (complex) : prototype filter length (window size) OF : oversampling factor (R=N/OF) Table 1. Computation cycle and time for different configurations A first implementation was done on a.35 µm CMOS five-metal layer technology. The total die size (including pads) is 7.86 mm 2. A second implementation has been done on a.18 µm CMOS four-layer metal technology. The measured power consumption is.4 mw @ 1.7 volt supply voltage and less than 25 µw @ 1 volt for a typical algorithm. Figure 4 compares the co-processor performance (in terms of execution time in microseconds at an equivalent chip clock frequency of 1 MHz) to standard DSP processors. Note that the BDSP9124 and the DSP- 24 are dedicated FFT DSPs, which use significantly more resources than the WOA coprocessor described here does. Execution time [us] (@ 1 MHz) Consumption [uj] 7 6 5 4 3 2 1 827 21 9 323 428 5988 WOA DSP-24 ADSP-2116 BDSP9124 TMS32C541 DSP562 Figure 4. Execution time in µsec for124 complex point FFT sizes @ 1 MHz. ADSP- 2116 (Analog Devices), BDSP9124 (Butterfly DSP), WOA (IMT, Dspfactory), DSP-24 (DSP Architectures) and C6x and C5 (Texas Instruments) 8 7 6 5 4 3 2 1 15.84 65.84 18 323 494 673.65 WOA DSP-24 ADSP-2116 BDSP9124 TMS32C541 DSP562 Figure 5. Effective power consumption for one 124 complex point FFT. Power supply: WOA 1.7V, DSP-24 3.3V, ADSP-2116 5V, BDSP9-124 5V, TMS32C541 5V, DSP562 5V Because the maximal FFT size of the WOA co-processor is limited to 128 complex points (256 real points) its 124 point complex-fft execution time is an interpolated value.

The most important value for lowest power applications is the effective power consumption used to perform an FFT. Figure 5 shows the effective power consumption for different processors. The processors from Analog Devices, Butterfly DSP and DSP Architectures are focussed on high throughput rate and not on low-power implementations. Figure 5 clearly shows that the WOA offers roughly 4.5 times the performance when power consumption is considered. Conclusions This paper presents an efficient design and the results for a novel real-time WOA filterbank. The co-processor has been implemented in two deep sub-micron technologies. The entire co-processor requires approximately 3 kgates. The WOA filterbank is ideal for use in a wide range of frequency domain processing applications as mentioned in the introduction. Other applications include personal listening devices requiring head-related transforms and other similar algorithms, which can be implemented in the frequency domain. The WOA is especially suited to applications that require low delay (< 1 ms). In wireless applications the WOA can be used to perform simultaneous high-fidelity equalization, noise reduction and AGC (dynamic range compression). Using the stereo processing mode, the WOA can be used to implement two-microphone, adaptive frequency domain noise cancellation for use in handsets. The stereo processing mode is also well suited for frequency domain echo cancellation. The large amount of signal separation between the WOA output channels offers a high-degree of orthogonality between all channels. This greatly speeds the convergence of adaptive algorithms and makes for very efficient subband coders and decoders. The low delay is also attractive for many subband-coding applications. Gain (db) Gain (db) 2 4 6 Odd 1 2 3 4 5 6 7 8 Frequency (Hz) Even 2 4 6 1 2 3 4 5 6 7 8 Frequency (Hz) Figure 6. Channel frequency responses for 16- channel filterbank in even and odd stacking with 14-ms group delay The combination of a general-purpose (control) DSP and the WOA provides an ideal flexibility versus power consumption tradeoff. References [1] R. Brennan & T. Schneider, Filterbank Structure and Method for Filtering and Separating an Information Signal into Different Bands, Particularly for Audio Signals in Hearing Aids, PCT Patent Publication WO9847313A21, October 22, 1998. [2] R. Brennan & T. Schneider A Flexible Filterbank Structure for Extensive Signal Manipulations in Digital Hearing Aids, Proc. ISCAS- 98, Monterey, CA. [3] R. E. Crochiere &. R. Rabiner, Multirate Digital Signal Processing, Prentice-Hall, 1983. [4] T. Schneider & R. Brennan A Multichannel Compression Scheme for a Digital Hearing Aid, Proc. ICASSP-97, Munich, Germany. [5] P. P. Vaidyanathan, Multirate Systems And Filter Banks, Prentice-Hall, 1993.