Adaptive Combination of Stochastic Gradient Filters with different Cost Functions

Jerónimo Arenas-García and Aníbal R. Figueiras-Vidal
Department of Signal Theory and Communications, Universidad Carlos III de Madrid
Avda. Universidad 30, 28911 Leganés (Madrid), SPAIN. {jarenas,arfv}@tsc.uc3m.es

ABSTRACT

Stochastic gradient schemes are very useful for adaptive filtering applications due to their simplicity, robustness and low computational requirements. However, the performance of these filters depends on the cost function they employ and on the scenario in which they are applied. In this paper we propose an adaptive convex combination of one LMS and one LMF filter that puts together the best properties of their associated norms. The resulting filter is shown to outperform both of its components in a time-varying plant identification scenario where both Gaussian and sub-Gaussian additive noise are present.

1 INTRODUCTION

Adaptive filters are used to extract information from noisy signals in varying environments and are of paramount importance in a wide variety of practical situations, such as equalization of communication channels, echo cancellation, predictive speech coding, sensor arrays, and radar detection, among many others. Among the different techniques for adaptive filtering, stochastic gradient algorithms are usually employed due to their conceptual simplicity, robustness and low computational requirements. The Least Mean Square (LMS) algorithm (Widrow and Hoff, 1960), by far the most popular stochastic gradient filter, attempts to minimize the expected quadratic error. However, it was shown in (Walach and Widrow, 1984) that the Least Mean Fourth (LMF) filter, which employs the gradient of the fourth power of the error, can achieve a lower quadratic error than LMS itself when sub-Gaussian additive noise is present.

If known, the log-likelihood of the noise term (which also accounts for model inaccuracies) would be the preferred choice of cost function in a stationary scenario. As this is not the general case, the design can be made more robust by using a mixture of quadratic and fourth-order norms, as has already been proposed, for instance, in (Chambers et al., 1994; Pazaitis and Constantinides, 1995; Lim and Harris, 1997; Rusu et al., 1998). However, all these methods share the disadvantage that their associated cost functions (which may even vary over time) are not adjusted to obtain a minimum quadratic error, so there is no guarantee that the resulting filter puts together the best properties of each mixed norm.

In this paper we propose an alternative approach to simultaneously obtaining the advantages of the second and fourth powers of the error: instead of using a mixed norm, we employ a convex combination of one LMS and one LMF filter operating independently of each other. The combination of the two filters is optimized, at each iteration, to minimize the quadratic error of the overall filter. Unlike the mixed-norm approaches, in our algorithm the cost functions minimized by the component filters and by their combination are known and do not vary with time. An additional advantage of the combination approach is that the LMS filter can be used to improve the stability of the LMF filter, thus permitting higher step-sizes for this second filter. Empirical evidence of the advantages of our approach will be given through a sample of our extensive simulation work, considering different kinds of noise in a system identification setup.

2 STOCHASTIC GRADIENT ADAPTIVE FILTERING

Adaptive transversal schemes obtain their output as the weighted sum $y(n) = \mathbf{w}^T(n)\mathbf{u}(n)$, where $\mathbf{w}(n)$ are the $M$ filter weights at time $n$ and $\mathbf{u}(n)$ is the filter input. To measure filter performance, it is customary to use the quadratic error $e^2(n) = (d(n) - y(n))^2$, $d(n)$ usually being referred to as the desired signal. In many applications the desired signal is modeled as $d(n) = \mathbf{w}_0^T\mathbf{u}(n) + e_0(n)$, where $\mathbf{w}_0$ is the optimal solution and $e_0(n)$, uncorrelated with $\mathbf{u}(n)$, accounts for both additive noise and model inaccuracies.

Stochastic gradient filters comprise a great variety of techniques, all of them minimizing an associated cost function (not necessarily the quadratic error) by means of local updates in the direction given by instantaneous approximations of the gradient of the cost function with respect to the filter weights. For the Least Mean Square (LMS) case, this results in the well-known rule:

$$\mathbf{w}(n+1) = \mathbf{w}(n) + 2\mu e(n)\mathbf{u}(n) \qquad (1)$$

where the step-size $\mu$ is a positive constant which controls the magnitude of the adaptation at each iteration. LMS can be considered a member of a more general family of $L_k$ norm adaptive filters, which we review next.

2.1 $L_k$ Norm Filters

In (Walach and Widrow, 1984) the family of stochastic gradient filters whose associated cost functions are even powers of the incurred error was introduced. Later, the convergence properties of a more general family of filters with norm $L_k$ ($k$ being any integer) were studied in detail in (Figueiras-Vidal et al., 1987). The $k$-th member of the $L_k$ norm family employs a gradient descent scheme over the surface of the $k$-th power of the absolute error, using the instantaneous approximation $\hat{\nabla}_{\mathbf{w}} E\{|e(n)|^k\} = \nabla_{\mathbf{w}} |e(n)|^k$, which leads to:

$$\mathbf{w}(n+1) = \mathbf{w}(n) - \mu \nabla_{\mathbf{w}} |e(n)|^k = \mathbf{w}(n) + k\mu e(n)|e(n)|^{k-2}\mathbf{u}(n) \qquad (2)$$
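To make the update rule concrete, here is a minimal numpy sketch (ours, not from the paper; the function name and signature are illustrative) of one iteration of the $L_k$-norm rule (2). Setting $k = 2$ recovers the LMS rule (1), and $k = 4$ gives the LMF update.

```python
import numpy as np

def lk_norm_update(w, u, d, mu, k):
    """One stochastic gradient step on |e(n)|^k, eq. (2), for k >= 2.

    k = 2 gives the LMS rule, eq. (1); k = 4 gives the LMF rule.
    """
    e = d - np.dot(w, u)                             # instantaneous error e(n)
    w_next = w + k * mu * e * abs(e) ** (k - 2) * u  # eq. (2)
    return w_next, e
```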

In (Walach and Widrow, 1984; Figueiras-Vidal et al., 1987) the properties of the above family of filters were analyzed, yielding the following relevant conclusions:

- Stability for all values of the step-size is only guaranteed for the Least Absolute Deviation filter ($k = 1$). When using higher powers of the error it is necessary to impose a limit (more restrictive as $k$ increases) on the maximum value of $\mu$. For $k = 2$ (LMS) this bound depends only on the length of the filter and the power of the input signal. However, for $k > 2$ the properties of $e_0(n)$ and the initial conditions must also be taken into account, so stability in varying scenarios is only guaranteed when using a small step-size, which limits the tracking capabilities of the filter.

- The value of $k$ which minimizes the quadratic error incurred by the filter (for a fixed convergence rate) depends on the probability distribution of $e_0(n)$: for Gaussian distributions it is LMS which performs best; however, if $e_0(n)$ follows a sub-Gaussian distribution [1], better behavior can be obtained with $k > 2$.

[1] A random variable $s$ is said to present a sub-Gaussian distribution when its kurtosis, $\kappa(s) = E\{(s - E\{s\})^4\} / E^2\{(s - E\{s\})^2\}$, is below 3.

2.2 Mixed-norm Adaptive Filters

In the adaptive filtering literature there are several proposals that try to put together the advantages of the different norms by combining them. For instance, the Least Mean Mixed-Norm (LMMN) filter (Chambers et al., 1994) was proposed as a way to exploit the best properties of the LMS and Least Mean Fourth (LMF, $k = 4$) algorithms by using a combination of their cost functions:

$$L(n) = \lambda E\{e^2(n)\} + (1 - \lambda) E\{e^4(n)\} \qquad (3)$$

where $\lambda$ is a mixing parameter between 0 and 1 that must be selected a priori. Similar mixed-norm approaches have also been proposed in (Pazaitis and Constantinides, 1995; Lim and Harris, 1997; Rusu et al., 1998), where different schemes for adapting $\lambda(n)$ are used. Although these algorithms were originally proposed with the aim of improving the convergence of the LMS filter, no agreement seems to exist about how to build and adapt the mixed-norm cost function. For instance, (Pazaitis and Constantinides, 1995; Rusu et al., 1998) argue for using the fourth power of the error whenever $|e(n)| > 1$ and LMS otherwise. On the contrary, for other authors (Lim and Harris, 1997; Sayed, 2003) the high instability of LMF suggests the opposite criterion.
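For reference, a minimal sketch (our own illustration, with hypothetical names) of one LMMN iteration, obtained by descending the instantaneous version of cost (3); the constant factors are those that result from differentiating (3) directly.

```python
import numpy as np

def lmmn_update(w, u, d, mu, lam):
    """One LMMN step: stochastic gradient descent on the instantaneous
    version of eq. (3), lam * e^2 + (1 - lam) * e^4, with fixed lam."""
    e = d - np.dot(w, u)  # instantaneous error e(n)
    # d/dw [lam*e^2 + (1-lam)*e^4] = -(2*lam*e + 4*(1-lam)*e**3) * u
    return w + mu * (2.0 * lam * e + 4.0 * (1.0 - lam) * e ** 3) * u, e
```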

3 ADAPTIVE CONVEX COMBINATION OF LMS AND LMF FILTERS

As we have already discussed, a case where LMF achieves a systematic gain over LMS is that in which $e_0(n)$ presents a sub-Gaussian distribution. In this paper we extend the ideas of (Arenas-García et al., 2003), using an adaptive convex combination of one LMS and one LMF filter as a way to obtain the best possible behavior independently of the distribution of $e_0(n)$. Thus, the output of the combination is given by

$$y(n) = \lambda(n) y_{LMS}(n) + (1 - \lambda(n)) y_{LMF}(n) \qquad (4)$$

where $\lambda(n)$ is a mixing parameter defined as $\lambda(n) = \mathrm{sigm}(a(n)) = \left[1 + e^{-a(n)}\right]^{-1}$, so that it remains between 0 and 1 and the combination retains its convex character. Both component filters are independently adapted using their own rules and errors. The mixing parameter $a(n)$ is itself LMS-adapted in order to minimize the quadratic error of the combination:

$$a(n+1) = a(n) - \mu_a \frac{\partial e^2(n)}{\partial a(n)} = a(n) + 2\mu_a e(n)\left(y_{LMS}(n) - y_{LMF}(n)\right)\lambda(n)(1 - \lambda(n)) \qquad (5)$$

where $e(n)$ is the error of the overall filter, and $\mu_a$ must be set to a high value to allow very fast adaptation of the combination. The sigmoidal activation for $\lambda(n)$ serves to stabilize the behavior of the mixture when $\lambda(n)$ is close to 0 or 1. In any case, a momentum term can be added to the above learning rule to improve its performance:

$$a(n+1) = a(n) - \mu_a \frac{\partial e^2(n)}{\partial a(n)} + \rho\left(a(n) - a(n-1)\right) \qquad (6)$$

In the light of (5), it is also obvious that the adaptation of $\lambda(n)$ could stop whenever $\lambda(n)$ gets too close to 0 or 1. To avoid this, we simply restrict the values of $a(n)$ to lie inside an interval $[-a^+, a^+]$, which keeps $\lambda(n) \in [1 - \lambda^+, \lambda^+]$, where $\lambda^+ = \mathrm{sigm}(a^+)$, so a minimum level of adaptation is always guaranteed.

The functioning of the above scheme can easily be explained intuitively: if the additive noise presents a Gaussian (or super-Gaussian) distribution, it is the LMS filter which attains a lower quadratic error, and application of (5) will drive $\lambda(n)$ towards 1. On the contrary, under sub-Gaussian noise it is the LMF filter which achieves a lower quadratic error, driving $\lambda(n)$ towards 0. In summary, the combination tends to perform, at each iteration, as well as the better component filter. Note that, unlike the mixed-norm filters described in Subsection 2.2, the cost functions minimized by the component filters and their combination are explicitly known and remain fixed across all iterations. Furthermore, since LMS is used to optimize the combination, the overall criterion for the filter is always the quadratic error.

Stability Control Mechanism for the LMF Component

As we have already discussed, the main difficulty with the LMF filter is that its stability can be seriously affected when the scenario in which the filter operates changes abruptly. (Walach and Widrow, 1984) suggested using an LMS filter in parallel and restarting the LMF filter with the LMS weights when it becomes unstable. The application of this procedure is straightforward within the combination approach. To detect LMF instability we simply use the condition $|e_{LMF}(n)| / |e_{LMS}(n)| > 100$; we have checked that the concrete value used for the threshold does not have a very important influence on the behavior of the combination, as long as a high value is used.
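The complete scheme of this section fits in a few dozen lines. The following sketch implements eqs. (4)-(6) together with the stability control just described; the class name, the ordering of the updates within one iteration, and the clipping bound a_max are our own assumptions (the paper fixes none of them).

```python
import numpy as np

def sigm(a):
    """Sigmoidal activation: lambda(n) = 1 / (1 + exp(-a(n)))."""
    return 1.0 / (1.0 + np.exp(-a))

class LMSLMFCombination:
    """Sketch of the adaptive convex LMS-LMF combination of Section 3."""

    def __init__(self, M, mu_lms, mu_lmf, mu_a, rho, a_max=4.0):
        self.w_lms = np.zeros(M)
        self.w_lmf = np.zeros(M)
        self.a, self.a_prev = 0.0, 0.0            # a(0) = 0 -> lambda(0) = 0.5
        self.mu_lms, self.mu_lmf = mu_lms, mu_lmf
        self.mu_a, self.rho, self.a_max = mu_a, rho, a_max  # a_max assumed

    def step(self, u, d):
        lam = sigm(self.a)
        y_lms = self.w_lms @ u
        y_lmf = self.w_lmf @ u
        y = lam * y_lms + (1.0 - lam) * y_lmf     # combined output, eq. (4)
        e, e_lms, e_lmf = d - y, d - y_lms, d - y_lmf

        # Independent component updates: LMS (k = 2) and LMF (k = 4), eq. (2).
        self.w_lms += 2.0 * self.mu_lms * e_lms * u
        self.w_lmf += 4.0 * self.mu_lmf * e_lmf ** 3 * u

        # Stability control: restart LMF from the LMS weights on divergence.
        if abs(e_lmf) > 100.0 * abs(e_lms):
            self.w_lmf = self.w_lms.copy()

        # Mixing-parameter update with momentum and clipping, eqs. (5)-(6).
        grad_a = -2.0 * e * (y_lms - y_lmf) * lam * (1.0 - lam)
        a_new = self.a - self.mu_a * grad_a + self.rho * (self.a - self.a_prev)
        self.a_prev = self.a
        self.a = np.clip(a_new, -self.a_max, self.a_max)  # lambda in [1-l+, l+]
        return y, e
```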

[Plot: $D(n)$ (dB) versus $n$ for LMS, LMF Component, and Combination.]
Figure 1: Plant identification error of a combination of LMS and LMF filters ($\mu_{LMS} = 0.005$, $\mu_{LMF} = 0.0007$, $\mu_a = 5$, and $\rho = 0.5$) in a scenario with Gaussian, uniform and bipolar noise. The performance of LMS and (stabilized) LMF is also displayed.

4 EXPERIMENTAL RESULTS

The performance of the proposed combination scheme is tested in a plant identification setup. The real system is a transversal filter of length $M = 9$ with weights proportional to $[1, 2, 3, 4, 5, 6, 7, 8, 9]$, such that $\|\mathbf{w}_0\|_2^2 = 1$. At $n = 5000$ these weights are circularly shifted to the right by 5 positions and their values multiplied by $\sqrt{10}$, in order to simulate an abrupt change in the plant. The input vector $\mathbf{u}^T(n) = [u(n), \ldots, u(n - M + 1)]$ is taken from a process of independent and identically distributed (i.i.d.) Gaussian samples with zero mean and variance $\sigma_u^2 = 1$. The noise $e_0(n)$ is also i.i.d. with zero mean and variance $\sigma_0^2 = 1$, but its distribution changes with time: during the first 3000 iterations it presents a Gaussian distribution; a sub-Gaussian uniform distribution is used for $3000 < n \le 7000$; and after $n = 7000$, $e_0(n)$ is a bipolar signal (i.e., $e_0(n) \in \{-1, +1\}$). The step-sizes of the component filters have been selected to get similar convergence rates from the LMS and LMF filters ($\mu_{LMS} = 0.005$ and $\mu_{LMF} = 0.0007$, respectively), while $\mu_a = 5$ and $\rho = 0.5$ are used to adapt the mixing parameter. The initial values for the weights and mixing parameter are $\mathbf{w}_1(0) = \mathbf{w}_2(0) = \mathbf{0}$ and $a(0) = 0$ (i.e., $\lambda(0) = 0.5$).

The performance of the adaptive filters is measured by an estimate of their Mean Square Deviation (MSD),

$$D^{(i)}(n) = E\{\|\mathbf{w}^{(i)}(n) - \mathbf{w}_0\|_2^2\} \qquad (7)$$

obtained as the average of $\|\mathbf{w}^{(i)}(n) - \mathbf{w}_0\|_2^2$ over 1000 independent runs.
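Under the stated setup, a single run of the experiment can be sketched as follows (reusing the LMSLMFCombination class and sigm function from the Section 3 sketch; the paper averages 1000 such runs to estimate (7)).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 9, 10000
w0 = np.arange(1.0, 10.0)
w0 /= np.linalg.norm(w0)                              # ||w0||_2^2 = 1

filt = LMSLMFCombination(M, mu_lms=0.005, mu_lmf=0.0007, mu_a=5.0, rho=0.5)
u_buf = np.zeros(M)                                   # [u(n), ..., u(n-M+1)]
msd = np.empty(N)

for n in range(N):
    if n == 5000:                                     # abrupt plant change
        w0 = np.sqrt(10.0) * np.roll(w0, 5)
    u_buf = np.roll(u_buf, 1)
    u_buf[0] = rng.standard_normal()                  # i.i.d. Gaussian input, var 1
    if n < 3000:
        e0 = rng.standard_normal()                    # Gaussian noise, var 1
    elif n <= 7000:
        e0 = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0)) # uniform noise, var 1
    else:
        e0 = rng.choice([-1.0, 1.0])                  # bipolar noise
    d = w0 @ u_buf + e0
    filt.step(u_buf, d)
    lam = sigm(filt.a)
    w_eff = lam * filt.w_lms + (1.0 - lam) * filt.w_lmf
    msd[n] = np.sum((w_eff - w0) ** 2)                # one sample of D(n), eq. (7)
```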

Figure 1 represents the MSDs achieved by the LMS filter, the stabilized LMF (the LMF component of the combined scheme), and their adaptive convex combination. LMS attains the lowest misadjustment during $n \le 3000$. On the contrary, LMF behaves better (both in terms of convergence rate and final misadjustment) after $n = 3000$, with an even clearer advantage during the last 3000 iterations, when $e_0(n)$ has a bipolar distribution (the most sub-Gaussian of all those employed). It is important to note that, if no stability control mechanism were used, the LMF filter would become unstable after $n = 5000$, due to the high error provoked by the abrupt change in the plant. Finally, we can check that the LMS-LMF combination performs as expected: the combination methodology allows us to obtain, at each moment, the behavior of the component which is incurring the lower quadratic error, independently of its associated cost function.

5 CONCLUSIONS

In this paper we have proposed an adaptive convex combination of one LMS and one LMF filter as a way to build adaptive schemes that operate well in situations in which the properties of the additive noise are not known a priori or vary with time, presenting both Gaussian and sub-Gaussian behavior. The proposed scheme is able to switch between its components, achieving at each moment the lowest possible quadratic error. Logical directions for future research include the exploration of schemes combining a larger number of filters employing other cost functions, as well as the exploitation of their benefits in other applications.

References

Arenas-García, J., Martínez-Ramón, M., Gómez-Verdejo, V., and Figueiras-Vidal, A. R. (2003). Multiple Plant Identifier via Adaptive LMS Convex Combination. In Proc. of WISP'03, Budapest, Hungary, 137-142.

Chambers, J., Tanrikulu, O., and Constantinides, A. G. (1994). Least Mean Mixed-Norm Adaptive Filtering. Electronics Letters, 30(19):1574-1575.

Figueiras-Vidal, A. R., Páez-Borrallo, J. M., García-Gómez, R., and Hernández-Gómez, L. A. (1987). Lk norm adaptive transversal filters. Traitement du Signal, 4:447-458.

Lim, S.-J. and Harris, J. G. (1997). Combined LMS/F Algorithm. Electronics Letters, 33(6):467-468.

Pazaitis, D. I. and Constantinides, A. G. (1995). LMS+F Algorithm. Electronics Letters, 31(17):1423-1424.

Rusu, C., Helsingius, M., and Kuosmanen, P. (1998). Joined LMS and LMF Threshold Technique for Data Echo Cancellation. In Proc. COST#254 Intl. Workshop on Intelligent Comms. and Multimedia Terminals, Ljubljana, Slovenia, 127-130.

Sayed, A. H. (2003). Fundamentals of Adaptive Filtering. NY: Wiley.

Walach, E. and Widrow, B. (1984). The Least Mean Fourth (LMF) Algorithm and its Family. IEEE Trans. on Information Theory, 30(2):275-283.

Widrow, B. and Hoff, M. E. (1960). Adaptive Switching Circuits. In Wescon Conv. Record, 96-104.