An Ensemble Learning Algorithm for the Blind Signal Separation Problem


An Ensemble Learning Algorithm for the Blind Signal Separation Problem

Yan Li and Peng Wen
Department of Mathematics and Computing, Faculty of Engineering and Surveying
The University of Southern Queensland, Queensland, Australia, QLD 4350
{liyan, pengwen}@usq.edu.au

Abstract: The framework in Bayesian learning algorithms is based on the assumptions that the quantities of interest are governed by probability distributions, and that optimal decisions can be made by reasoning about these probabilities together with the data. In this paper, a Bayesian ensemble learning approach based on an enhanced least square backpropagation (LSB) neural network training algorithm is proposed for the blind signal separation problem. The method uses a three-layer neural network with an enhanced LSB training algorithm to model the unknown blind mixing system. Ensemble learning is applied to estimate the parametric approximation of the posterior probability density function (pdf). The Kullback-Leibler information divergence is used as the cost function in the paper. The experimental results on both artificial data and real recordings demonstrate that the proposed algorithm can separate blind signals very well.

I. INTRODUCTION

The problem of blind signal separation (BSS) has drawn great attention from many researchers over the past two decades. The aim of BSS is to extract the sources s(t) that have generated the observations x(t):

x(t) = F[s(t)] + n(t)    (1)

where F: R^m -> R^m is the unknown nonlinear mixing function and n(t) is additive noise. The objective is to find a mapping that yields components

y(t) = g(x(t))    (2)

so that y(t) are statistically independent and as close as possible to s(t). This must be done from the observed data in a blind manner, as both the original sources and the mixing process are unknown. Many different approaches to BSS have been attempted by numerous researchers [1].

In this paper, we explore a new blind separation method using a Bayesian estimation technique and an enhanced LSB neural network training algorithm to model the system. Bayesian ensemble learning, also called variational Bayesian learning [2], utilises an approximation which is fitted to the posterior distribution of the parameter(s) to be estimated. The approximating distribution is often chosen to be Gaussian because of its simplicity and computational efficiency. The mean of this Gaussian distribution provides a point estimate for the unknown parameter considered, and its variance gives a measure of the reliability of the point estimate. The approximating posterior distribution is fitted to the posterior distribution estimated from the data using the Kullback-Leibler information divergence. This measures the difference between two probability densities and is sensitive to the mass of the distributions.

One problem with Bayesian estimation methods is that their computational load is high for problems of realistic size, in spite of the efficient Gaussian approximation. Another problem is that the Bayesian ensemble learning procedure may get stuck in a local minimum and requires careful initialisation [3]. These obstacles have prevented their application to real unsupervised or blind learning problems, where the number of unknown parameters to be estimated grows very large. To combat these problems, we use, in this paper, an LSB neural network to model the blind mixing process and apply Bayesian ensemble learning to estimate the original sources. The experimental results are presented in the paper and demonstrate that the technique works very well.

The rest of the paper is organised as follows: the enhanced least square neural network model and its training method are introduced in the next section. The network parameters and the parametric approximation of the posterior pdf are presented in Section 3. Section 4 introduces ensemble learning and the cost function used in this paper. The experimental results are given in Section 5 to demonstrate the performance of the method. Finally, Section 6 concludes the paper.
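To make the problem statement concrete, the following minimal Python sketch (my illustration, not code from the paper; the tanh mixing, dimensions and noise level are all assumed) generates observations according to (1):

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 1000, 4                    # number of samples and channels (assumed)

# Independent sources s(t); a uniform distribution is an illustrative choice
s = rng.uniform(-1.0, 1.0, size=(T, m))

# Unknown nonlinear mixing F[s(t)] plus additive Gaussian noise n(t), eq. (1)
A = rng.normal(size=(m, m))
B = rng.normal(size=(m, m))
x = np.tanh(s @ A) @ B + 0.01 * rng.normal(size=(T, m))

# A separator seeks a mapping g with y(t) = g(x(t)) close to s(t), eq. (2)
print(x.shape)                    # (1000, 4)
```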
II. THE LEAST SQUARE NEURAL MODEL

In 1993, Biegler-König and Bärmann [4] separated neural networks into linear parts and nonlinear parts. The linear parts sum up the weighted inputs to the neurons, and the nonlinear parts pass the signals through the nonlinear activation functions (such as a sigmoidal activation). While solving the linear parts optimally, they used the inverse of the activation function to propagate the remaining error back into the previous layer of the network. The learning error is therefore minimised on each layer separately, from the output layer to the hidden and input layers, using the least square backpropagation (LSB) method.
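This layer-wise least-squares idea can be sketched as follows (my illustration, not the authors' implementation; it assumes an invertible activation, here the inverse hyperbolic sine that the paper uses):

```python
import numpy as np

def lsb_layer_update(X, Y_desired, f_inv):
    """Solve a layer's linear part optimally by least squares.

    X         : (T, n_in) layer inputs
    Y_desired : (T, n_out) desired layer outputs
    f_inv     : inverse of the layer's activation function
    Returns W with X @ W ~ f_inv(Y_desired) in the least-squares sense.
    """
    A_desired = f_inv(Y_desired)          # desired pre-activations
    W, *_ = np.linalg.lstsq(X, A_desired, rcond=None)
    return W

# Example with f = arcsinh (the activation used in this paper), f^-1 = sinh
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
W_true = rng.normal(size=(5, 3))
Y = np.arcsinh(X @ W_true)                # noiseless desired outputs
W_est = lsb_layer_update(X, Y, np.sinh)
print(np.allclose(W_est, W_true))         # True
```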

The convergence of the algorithm is much faster than that of the classical backpropagation (BP) algorithm. However, the drawback of the LSB algorithm is that the training error cannot be reduced further after the first two or three iterations [4]. In fact, the training error has already been reduced significantly by the first and second iterations, which is good enough for most practical applications.

The model structure used in this paper is a three-layer neural network with an enhanced LSB training algorithm [5]. Fig. 1 shows the structure of the network. The LSB training algorithm optimises the network weights through an iterative, layer-by-layer process. The training algorithm first takes the outputs of the nodes in the hidden layer into consideration: it adjusts not only the weights of the network but also the outputs of the hidden layer. The network works like an RNN, but it can reach its steady state very quickly because of its novel training algorithm. Please refer to [5] for the details of this algorithm.

The neurons in the first layer are linear. They pass the input signals through to all the neurons in the hidden layer. The activation function used in the neurons of the hidden layer and the output layer is the inverse hyperbolic sine, sinh^-1, which is a sigmoidal function but does not saturate for large input values.

The original algorithm is a supervised learning algorithm. Inspired by [6], it can be adapted to the BSS problem (with unknown inputs). During the learning process, we generate a set of random source variables to play the role of the inputs. The first data vector is passed through the neural network, and the outputs of the network are produced. The observation data play the role of the outputs. The enhanced LSB algorithm is applied to find the optimal source signals which produce the observed data. The initial weights of the network are set randomly.

Fig. 1: The network structure (inputs; a z^-1 feedback carrying the desired output of the hidden layer; outputs).

Once the optimal source signals are found, the inputs of the network are known and the learning process is the same as supervised learning: the weights are adapted. This moves the best-matching model vector even closer to the true inputs. Then the next input data vector is passed through the network, the source variables that best describe the data are found, the weights are adapted, and so on. Unlike the method used in [6], which applied the traditional BP algorithm, our algorithm does not need to be iterated many times to find the optimal original source signals: one iteration is enough for the enhanced LSB training algorithm to reach an equivalent or even better training error. The training process is therefore expected to be much faster than the BP-based approach, as the convergence of the enhanced LSB algorithm is nearly two orders of magnitude faster than that of classical BP.
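A much-simplified sketch of this alternating procedure follows (an illustration under stated assumptions: plain pseudo-inverse least-squares solves stand in for the enhanced LSB updates, which additionally adapt the hidden-layer reference outputs; all sizes are invented):

```python
import numpy as np

def estimate_sources(W1, W2, x):
    """Invert x ~ arcsinh(W2 @ arcsinh(W1 @ s)) layer by layer, in the
    least-squares sense, to find sources that best explain the data."""
    h = np.linalg.pinv(W2) @ np.sinh(x)    # desired hidden-layer outputs
    return np.linalg.pinv(W1) @ np.sinh(h)

def train_blind(x, n_src, n_hidden, n_iter=2, seed=0):
    """Alternate between estimating the unknown inputs (sources) and
    adapting the weights with those sources treated as known."""
    rng = np.random.default_rng(seed)
    n_obs, _ = x.shape
    # Well-conditioned random initial weights (QR keeps pseudo-inverses tame)
    W1 = np.linalg.qr(rng.normal(size=(n_hidden, n_src)))[0]
    W2 = np.linalg.qr(rng.normal(size=(n_obs, n_hidden)))[0]
    for _ in range(n_iter):
        S = estimate_sources(W1, W2, x)              # step 1: sources
        H_des = np.linalg.pinv(W2) @ np.sinh(x)      # desired hidden outputs
        W1 = np.sinh(H_des) @ np.linalg.pinv(S)      # step 2: first layer
        H = np.arcsinh(W1 @ S)
        W2 = np.sinh(x) @ np.linalg.pinv(H)          # step 3: second layer
    return W1, W2, estimate_sources(W1, W2, x)

# Toy run on synthetic observations (all sizes are illustrative)
rng = np.random.default_rng(1)
S_true = rng.normal(size=(4, 500))
A, B = rng.normal(size=(6, 4)), rng.normal(size=(6, 6))
X = np.arcsinh(B @ np.arcsinh(A @ S_true)) + 0.01 * rng.normal(size=(6, 500))
_, _, S_hat = train_blind(X, n_src=4, n_hidden=6)
print(S_hat.shape)                                   # (4, 500)
```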
III. NETWORK PARAMETERS AND PARAMETRIC APPROXIMATION

A. Network Parameters

Let x(t) denote the observed data vector at time t; s(t) the vector of source variables at time t; and W1(t) and W2(t) the matrices containing the weights of the first and second layers, respectively. All the biases of the network are set to 0.5, and f(.) is the vector of nonlinear activation functions (sinh^-1). As all real signals contain noise, we shall assume that the observations are corrupted by Gaussian noise, denoted n(t). Using this notation, the model for the observations passed through the network is

x(t) = f(W2(t) [f(W1(t) s(t))]) + n(t)    (3)

The sources are assumed to be independent and Gaussian. The Gaussianity assumption is realistic, as the network has nonlinearities which can transform Gaussian distributions into virtually any other regular distribution. The weight matrices W1(t) and W2(t), together with the parameters of the distributions of the noise, the source variables and the column vectors of the weight matrices, are the main parameters of the network. For simplicity, all the parameterised distributions are assumed to be Gaussian.

B. Parametric Approximation of the Posterior pdf

Exact treatment of the posterior pdfs of the models is impossible in practice, so the posterior pdfs need to be approximated. In this paper, we apply a computationally efficient parametric approximation which usually yields satisfactory results. A standard approach to parametric approximation is Laplace's method. MacKay introduced a variational method called the evidence framework: in his neural network approach, one first finds a (local) maximum point of the posterior pdf and then applies a second-order Taylor series approximation to the logarithm of the posterior pdf. This is equivalent to applying a Gaussian approximation to the posterior pdf.
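A one-dimensional sketch of this kind of Gaussian (Laplace-style) approximation, with an assumed toy log-posterior that is not from the paper:

```python
import numpy as np

# Laplace's method in 1-D: fit a Gaussian to a posterior via a second-order
# Taylor expansion of log p around its (local) maximum.
log_p = lambda w: -0.25 * (w - 1.0) ** 4 - 0.5 * w ** 2   # toy log-posterior

# Newton iterations with numerical derivatives to find the MAP point
w, h = 0.0, 1e-4
for _ in range(50):
    g = (log_p(w + h) - log_p(w - h)) / (2 * h)                 # 1st derivative
    H = (log_p(w + h) - 2 * log_p(w) + log_p(w - h)) / h ** 2   # 2nd derivative
    w -= g / H

var = -1.0 / H   # Gaussian approximation: mean w, variance -1 / (d^2 log p)
print(f"Laplace approximation: N(mean={w:.4f}, var={var:.4f})")
```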

C. Ensemble Learning and the Cost Function

Ensemble learning [7], a well-developed method for the parametric approximation of posterior pdfs, is used in this paper. The basic idea is to minimise the difference between the posterior pdf and its parametric approximation. Let P denote the exact posterior pdf and Q its parametric approximation, let θ denote the parameters of the model H, and let X be the set of observed data. It is assumed that we have independent priors for each parameter, thus

P(θ | H) = ∏_i P(θ_i | H)    (4)

The ensemble learning cost function, C_KL, is the misfit measured by the Kullback-Leibler information divergence between P and Q:

C_KL = E_Q{ log [ Q(θ) / ( P(X | θ, H) P(θ | H) ) ] } = E_Q{ log Q(θ) - log P(X | θ, H) - log P(θ | H) }    (5)

If the marginalisation is performed over all the parameters, with the exception of θ_i, we have

C_KL = ∫ Q(θ_i) [ log Q(θ_i) - E_{Q\θ_i}{ log P(X, θ | H) } ] dθ_i + c    (6)

where c is a constant and E_{Q\θ_i} denotes the expectation over Q with respect to all parameters except θ_i. Differentiating the above equation with respect to Q(θ_i), we obtain

∂C_KL / ∂Q(θ_i) = log Q(θ_i) - log P(θ_i | H) - E_{Q\θ_i}{ log P(X | θ, H) } + 1 + λ    (7)

where λ is a Lagrange multiplier introduced to ensure that Q(θ_i) is normalised. The optimal distribution Q(θ_i) is

Q(θ_i) = (1 / Z_i) P(θ_i | H) exp( E_{Q\θ_i}{ log P(X | θ, H) } )    (8)

where Z_i is the partition function:

Z_i = ∫ P(θ_i | H) exp( E_{Q\θ_i}{ log P(X | θ, H) } ) dθ_i    (9)

This procedure leads to an iterative algorithm for the update of each distribution. Simple Gaussian distributions are used to approximate the posterior pdf. Note that the Kullback-Leibler divergence involves an expectation over a distribution and, consequently, is sensitive to probability mass rather than probability density.

The Kullback-Leibler divergence is used as the cost function in this paper. For mathematical and computational simplicity, the approximation Q needs to be simple. The cost function C_KL is a function of the posterior means and variances of the source variables and of the parameters of the network. This is because, instead of finding a point estimate, a whole distribution is estimated for the source variables and the parameters during learning. The end result of the learning is therefore not just an estimate of the unknown variables, but a distribution over the variables.
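Since the approximating distributions are Gaussian, the practical building block of C_KL is the KL divergence between two Gaussians, which has a closed form. A small sketch (my illustration, not code from the paper):

```python
import numpy as np

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL(Q || P) for univariate Gaussians Q = N(mu_q, var_q), P = N(mu_p, var_p):

    KL = 0.5 * [ log(var_p / var_q) + (var_q + (mu_q - mu_p)^2) / var_p - 1 ]
    """
    return 0.5 * (np.log(var_p / var_q)
                  + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))   # 0.0: identical distributions
print(kl_gaussian(1.0, 0.5, 0.0, 1.0))   # > 0: penalises mean and variance misfit
```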
IV. EXPERIMENTAL RESULTS

Two experiments are presented in this section. In the first experiment we use a set of artificial data, while in the second, real speech recordings are used to test the performance of the proposed approach.

A. Experiment 1: Artificial Data

There are eight sources, four super-Gaussian and four sub-Gaussian, generated by Matlab functions. The observation data are generated from these sources through a nonlinear mixing neural network: a randomly initialised three-layer feedforward neural network with 30 hidden neurons and eight output neurons. Gaussian noise with a standard deviation of 0.1 is also added to the data.

The results are shown in Fig. 2, which contains eight scatter plots, each corresponding to one of the eight sources. The original source is on the x-axis and the estimated source on the y-axis of each plot, with each point corresponding to one data vector. An optimal result is a straight line, indicating that the estimated values of the sources are the same as the true values.

The number of hidden neurons in the enhanced LSB neural network was varied to optimise the results. Only two iterations (the data set passing through the neural network twice) were used for the results shown in Fig. 2. Further iterations bring no better results, only more training time, which is consistent with the characteristics of the LSB algorithm. Fig. 3 shows the results after 5 training iterations, which are no perceivably better than those in Fig. 2. The scatter plots present the differences between the sources and the estimated signals.

B. Experiment 2: Real Speech Signal Separation

The observed signals were taken from Dr Te-Won Lee's home page at the Salk Institute, http://www.cnl.salk.edu/~tewon/ [8]. One signal is a recording of the digits from one to ten spoken in English; the second microphone signal is the digits spoken in Spanish at the same time. The proposed algorithm was applied to these signals. Figs. 4 and 5 show the real signals and the separated results (only half of each signal is presented, for clarity). It is hard to compare the results with Lee's results in a quantitative way, owing to the different methodologies, but comparable results can be identified when the signals are listened to.
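To make the setup of Experiment 1 concrete, here is a small sketch (an illustration under assumed distribution choices: Laplacian for the super-Gaussian sources and uniform for the sub-Gaussian ones; it is not the paper's Matlab code) that generates such sources and shows the numeric check behind the scatter-plot comparison:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000

# Four super-Gaussian (heavy-tailed Laplacian) and four sub-Gaussian
# (uniform) unit-variance sources, as in Experiment 1
super_g = rng.laplace(scale=1.0 / np.sqrt(2), size=(4, T))
sub_g = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(4, T))
s = np.vstack([super_g, sub_g])

def excess_kurtosis(v):
    """> 0 for super-Gaussian signals, < 0 for sub-Gaussian signals."""
    v = (v - v.mean()) / v.std()
    return (v ** 4).mean() - 3.0

print([round(excess_kurtosis(row), 2) for row in s])

# A scatter plot of estimated vs. true source (as in Figs. 2-3) is a straight
# line when separation is perfect; correlation is a simple numeric proxy.
y = s + 0.05 * rng.normal(size=s.shape)   # stand-in for estimated sources
print([round(np.corrcoef(si, yi)[0, 1], 3) for si, yi in zip(s, y)])
```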

V. CONCLUSION

In this paper, we develop a new approach based on Bayesian ensemble learning and the LSB neural network training algorithm for the BSS problem. A three-layer neural network with an enhanced LSB training algorithm is used to model the unknown blind mixing system. The network works like an RNN, but it can reach its steady state very quickly because of its enhanced LSB training algorithm. Ensemble learning is applied to estimate the parametric approximation of the posterior pdf. The Kullback-Leibler information divergence is used as the cost function in the paper. It is a measure suited for comparing probability distributions, and it can be computed efficiently in practice if the approximation is chosen to be simple enough. Kullback-Leibler information is sensitive to probability mass, and therefore the search for good models focuses on the models which have large probability mass as opposed to probability density. The drawback is that, in order for ensemble learning to be computationally efficient, the approximation of the posterior needs to have a simple factorial structure. The experiments have been carried out using both artificial data and real recordings. The results show the success of the proposed algorithm.

Fig. 2: The scatter plots, with the original sources on the x-axis of each scatter plot and the sources estimated by the proposed algorithm on the y-axis, after 2 iterations.
Fig. 3: The scatter plots, with the original sources on the x-axis of each scatter plot and the sources estimated by the proposed algorithm on the y-axis, after 5 iterations.
Fig. 4: The real signals.
Fig. 5: The separated signals.

REFERENCES

[1] Li, Yan, Peng Wen and David Powers, "Methods for the Blind Signal Separation Problem," Proceedings of the IEEE International Conference on Neural Networks & Signal Processing (ICNNSP 2003), Nanjing, China, December 14-17, 2003, pp. 1386-1389.
[2] Lappalainen, H., "Ensemble Learning," in Advances in Independent Component Analysis, M. Girolami, Ed. Berlin: Springer-Verlag, 2000, pp. 75-92.
[3] Jutten, C. and J. Karhunen, "Advances in Nonlinear Blind Source Separation," 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), April 2003, Nara, Japan, pp. 245-256.
[4] Biegler-König, F. and F. Bärmann, "A Learning Algorithm for Multilayered Neural Networks Based on Linear Least Squares Problems," Neural Networks, Vol. 6, 1993, pp. 127-131.
[5] Li, Yan, A. B. Rad and Wen Peng, "An Enhanced Training Algorithm for Multilayer Neural Networks Based on Reference Output of Hidden Layer," Neural Computing & Applications, Vol. 8, 1999, pp. 218-225.
[6] Lappalainen, H. and Xavier Giannakopoulos, "Multi-Layer Perceptrons as Nonlinear Generative Models for Unsupervised Learning: a Bayesian Treatment," ICANN'99, 1999, pp. 19-24.
[7] Hinton, Geoffrey E. and Drew van Camp, "Keeping Neural Networks Simple by Minimizing the Description Length of the Weights," in Proceedings of COLT'93, Santa Cruz, California, 1993, pp. 5-13.
[8] The website http://www.cnl.salk.edu/~tewon/.