An Ensemble Learning Algorithm for the Blind Signal Separation Problem

Yan Li and Peng Wen
Department of Mathematics and Computing, Faculty of Engineering and Surveying
The University of Southern Queensland, Queensland, Australia
{lyan, pengwen}@usq.edu.au

Abstract

The framework of Bayesian learning algorithms is based on the assumptions that the quantities of interest are governed by probability distributions, and that optimal decisions can be made by reasoning about these probabilities together with the data. In this paper, a Bayesian ensemble learning approach based on an enhanced least square backpropagation (LSB) neural network training algorithm is proposed for the blind signal separation problem. The method uses a three-layer neural network with an enhanced LSB training algorithm to model the unknown blind mixing system. Ensemble learning is applied to estimate the parametric approximation of the posterior probability density function (pdf). The Kullback-Leibler information divergence is used as the cost function. Experimental results on both artificial data and real recordings demonstrate that the proposed algorithm separates blind signals very well.

I. INTRODUCTION

The problem of blind signal separation (BSS) has drawn great attention from many researchers over the past two decades. BSS is the task of extracting the sources s(t) that have generated the observations x(t):

x(t) = F[s(t)] + n(t)        (1)

where F: R^m -> R^m is the unknown nonlinear mixing function and n(t) is additive noise. The objective is to find a mapping that yields components

y(t) = g(x(t))        (2)

such that y(t) are statistically independent and as close as possible to s(t). This must be done from the observed data in a blind manner, as both the original sources and the mixing process are unknown. Many different approaches to BSS have been attempted by numerous researchers [1]. In this paper, we explore a new blind separation method using a Bayesian estimation technique and an enhanced LSB neural network training algorithm to model the system.
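As a concrete toy instance of the model in (1) and (2), the following Python sketch generates two independent sources and pushes them through an assumed nonlinear mixing map. The source choices and the tanh mixing stage are our own illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of the BSS model x(t) = F[s(t)] + n(t): two independent
# sources pushed through an unknown nonlinear mixing map, plus noise.
t = np.linspace(0.0, 1.0, 500)
s = np.vstack([
    np.sign(np.sin(2 * np.pi * 7 * t)),    # sub-Gaussian (binary) source
    rng.laplace(size=t.size),              # super-Gaussian source
])

A = rng.normal(size=(2, 2))                # unknown linear stage of F
x = np.tanh(A @ s) + 0.01 * rng.normal(size=s.shape)   # nonlinear mix + noise

# The separation task: find g such that y(t) = g(x(t)) has statistically
# independent components that are as close as possible to s(t).
print(x.shape)
```

Only the mixture x is available to a BSS algorithm; both A and the tanh nonlinearity stand in for the unknown F.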
Bayesian ensemble learning, also called variational Bayesian learning [2], utilizes an approximation which is fitted to the posterior distribution of the parameter(s) to be estimated. The approximating distribution is often chosen to be Gaussian because of its simplicity and computational efficiency. The mean of this Gaussian distribution provides a point estimate for the unknown parameter, and its variance gives a measure of the reliability of that point estimate. The approximating distribution is fitted to the posterior distribution estimated from the data using the Kullback-Leibler information divergence, which measures the difference between two probability densities and is sensitive to the mass of the distributions.

One problem with Bayesian estimation methods is that their computational load is high in problems of realistic size, in spite of the efficient Gaussian approximation. Another problem is that the Bayesian ensemble learning procedure may get stuck in a local minimum and requires careful initialization [3]. These obstacles have prevented their application to real unsupervised or blind learning problems, where the number of unknown parameters to be estimated grows very large. To combat these problems, we use an LSB neural network to model the blind mixing process and apply Bayesian ensemble learning to estimate the original sources. The experimental results presented in the paper demonstrate that the technique works very well.

The rest of the paper is organized as follows. The enhanced least square neural network model and its training method are introduced in the next section. The network parameters and the parametric approximation of the posterior pdf are presented in Section 3. Section 4 introduces ensemble learning and the cost function used in this paper. Experimental results are given in Section 5 to demonstrate the performance of the method. Finally, Section 6 concludes the paper.

II. THE LEAST SQUARE NEURAL MODEL

In 1993, Biegler-König and Bärmann [4] separated neural networks into linear parts and nonlinear parts.
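The Gaussian approximation and the Kullback-Leibler misfit can be made concrete with a small sketch. For two univariate Gaussians the divergence KL(q || p) has a closed form; the helper below (our own name, not from the paper) shows how the misfit vanishes when the approximation matches the target and grows as it drifts away.

```python
import numpy as np

def kl_gaussian(mu_q, var_q, mu_p, var_p):
    """KL(q || p) between two univariate Gaussians, in nats."""
    return (0.5 * np.log(var_p / var_q)
            + (var_q + (mu_q - mu_p) ** 2) / (2.0 * var_p)
            - 0.5)

# Identical distributions give zero divergence ...
print(kl_gaussian(0.0, 1.0, 0.0, 1.0))   # 0.0
# ... and the misfit grows as the approximation q drifts from the target p.
print(kl_gaussian(2.0, 1.0, 0.0, 1.0))   # 2.0
```

Note the asymmetry of the measure: minimizing KL(q || p) over q penalizes placing probability mass of q where p has little, which is the "mass-sensitive" behaviour the text refers to.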
Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC 2005), IEEE.

The linear parts sum up the weighted inputs to the neurons, and the nonlinear parts pass the signals through nonlinear activation functions (such as the sigmoid). While solving the linear parts optimally, they used the inverse of the activation function to propagate the remaining error back into the previous layer of the network. Therefore, the learning error is
minimised on each layer separately, from the output layer back to the hidden and input layers, using the least square backpropagation (LSB) method. The convergence of this algorithm is much faster than that of the classical backpropagation (BP) algorithm. However, the drawback of the LSB algorithm is that the training error cannot be reduced further after the first two or three iterations [4]. In fact, the training error is already significantly reduced by the first and second iterations, which is good enough for most practical applications.

The model structure used in this paper is a three-layer neural network with an enhanced LSB training algorithm [5]. Fig. 1 shows the structure of the network. The LSB training algorithm optimises the network weights through an iterative process, layer by layer. The training algorithm first takes the outputs of the nodes in the hidden layer into consideration: it adjusts not only the weights of the network but also the outputs of the hidden layer. The network works like a recurrent neural network (RNN), but it reaches its steady state very quickly because of its novel training algorithm. Please refer to [5] for the details of this algorithm.

The neurons in the first layer are linear; they pass the input signals through to all the neurons in the hidden layer. The activation function used in the hidden and output layers is the inverse hyperbolic sine, sinh^-1, which is a sigmoidal function but does not saturate for large input values.

The original algorithm is a supervised learning algorithm. Inspired by [6], it can be adapted for the BSS problem (with unknown inputs). During the learning process, we generate a set of random source variables to play the role of inputs. The first data vector is passed through the neural network, and the outputs of the network are produced. The observation data play the role of the outputs. The enhanced LSB algorithm is applied to find the optimal source signals which produce the observed data. The initial weights of the network are set randomly.

Fig. 1 The network structure (inputs; delay element z^-1; the desired output of the hidden layer; outputs)
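The core of the LSB idea, inverting the activation so that fitting a layer reduces to a linear least-squares problem, can be sketched as follows. This is a minimal illustration under our own assumptions (sinh^-1 activations, known desired layer outputs, fixed hidden outputs), not the authors' full enhanced algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# One LSB-style layer update: because the activation asinh is invertible,
# fitting a layer reduces to a single linear least-squares solve.
h = rng.normal(size=(20, 100))       # fixed hidden outputs: 20 units, 100 samples
W_true = rng.normal(size=(8, 20))
d = np.arcsinh(W_true @ h)           # desired outputs of this layer

# Invert the activation (sinh undoes asinh) to get a linear target z,
# then solve W h ~= z in one shot instead of many gradient steps.
z = np.sinh(d)
W = np.linalg.lstsq(h.T, z.T, rcond=None)[0].T

err = np.max(np.abs(np.arcsinh(W @ h) - d))
print(err)
```

The residual is essentially zero after a single solve, which is why LSB-style training converges in a handful of iterations where gradient-based BP needs many.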
Once the optimal source signals are found, the inputs of the network are known and the learning process is the same as supervised learning: the weights are adapted. This moves the best-matching model vector even closer to the true inputs. The next input data vector is then passed through the network, the source variables that best describe it are found, the weights are adapted, and so on. Unlike the method used in [6], which applied the traditional BP algorithm, our algorithm does not need to be iterated many times to find the optimal original source signals: one iteration is enough for the enhanced LSB training algorithm to reach an equivalent or even better training error. The training process is therefore expected to be much faster than the BP-based approach, as the convergence of the enhanced LSB algorithm is nearly an order of magnitude faster than that of classical BP.

III. NETWORK PARAMETERS AND PARAMETRIC APPROXIMATION

A. Network Parameters

Let x(t) denote the observed data vector at time t; s(t) the vector of source variables at time t; and W1(t) and W2(t) the matrices containing the weights of the first and second layers, respectively. All the biases of the network are set to 0.5, and f(.) is the vector of nonlinear activation functions (sinh^-1). As all real signals contain noise, we assume that the observations are corrupted by Gaussian noise, denoted n(t). Using this notation, the model for the observations passed through the network is

x(t) = f(W2(t) f(W1(t) s(t))) + n(t)        (3)

The sources are assumed to be independent and Gaussian. The Gaussianity assumption is realistic, as the nonlinearities in the network can transform Gaussian distributions into virtually any other regular distribution. The weight matrices W1(t) and W2(t), together with the parameters of the distributions of the noise, the source variables and the column vectors of the weight matrices, are the main parameters of the network. For simplicity, all the parameterised distributions are assumed to be Gaussian.
B. Parametric Approximation of the Posterior pdf

Exact treatment of the posterior pdfs of the models is impossible in practice, so the posterior pdfs need to be approximated. In this paper, we apply a computationally efficient parametric approximation which usually yields satisfactory results. A standard approach to parametric approximation is Laplace's method. MacKay introduced a variation called the evidence framework: in his neural network approach, one first finds a (local) maximum point of the posterior pdf and then applies a second-order Taylor series approximation to the logarithm of the posterior pdf. This is equivalent to applying a Gaussian approximation to the posterior pdf.

C. Ensemble Learning and the Cost Function
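A one-dimensional sketch of the Laplace / evidence-framework idea described above: find the mode of the log posterior, then match a Gaussian to the curvature (the second-order Taylor term) of the log posterior at that mode. The Gamma-shaped target below is our own toy example.

```python
import numpy as np

# Laplace approximation in 1-D: Gaussian centred at the posterior mode,
# with variance given by the inverse negative curvature of the log posterior.
a, b = 5.0, 2.0
log_post = lambda t: a * np.log(t) - b * t   # toy unnormalised log posterior

mode = a / b                                  # where d(log_post)/dt = a/t - b = 0
eps = 1e-4                                    # step for numerical second derivative
curv = (log_post(mode + eps) - 2 * log_post(mode) + log_post(mode - eps)) / eps ** 2
var = -1.0 / curv                             # Gaussian variance from curvature

print(mode, var)   # analytically: mode = 2.5, var = a / b**2 = 1.25
```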
Ensemble learning [7], a well-developed method for the parametric approximation of posterior pdfs, is used in this paper. The basic idea is to minimize the misfit between the posterior pdf and its parametric approximation. Let P denote the exact posterior pdf and Q its parametric approximation. Assume that θ denotes the parameters of the model H and X is the set of observed data. It is assumed that we have independent priors for each parameter θ_i, thus

P(θ|H) = ∏_i P(θ_i|H)        (4)

The ensemble learning cost function C_KL is the misfit measured by the Kullback-Leibler information divergence between P and Q:

C_KL = E_Q{ log [ Q(θ) / (P(X|θ,H) P(θ|H)) ] }
     = E_Q{ log Q(θ) − log P(X|θ,H) − log P(θ|H) }        (5)

If the marginalization is performed over all the parameters except θ_i, we have

C_KL = ∫ Q(θ_i) [ log Q(θ_i) − log P(θ_i|H) − E_{Q\θ_i}{ log P(X|θ,H) } ] dθ_i + c        (6)

where c is a constant. Differentiating the above equation with respect to Q(θ_i), we obtain

∂C_KL/∂Q(θ_i) = log Q(θ_i) − log P(θ_i|H) − E_{Q\θ_i}{ log P(X|θ,H) } + 1 + λ        (7)

where λ is a Lagrange multiplier introduced to ensure that Q(θ_i) is normalized. Setting (7) to zero, the optimal distribution Q(θ_i) is

Q(θ_i) = (1/Z_i) P(θ_i|H) exp( E_{Q\θ_i}{ log P(X|θ,H) } )        (8)

where Z_i is the partition function:

Z_i = ∫ P(θ_i|H) exp( E_{Q\θ_i}{ log P(X|θ,H) } ) dθ_i        (9)

This procedure leads to an iterative algorithm for updating each distribution. Simple Gaussian distributions are used to approximate the posterior pdf. Note that the Kullback-Leibler divergence involves an expectation over a distribution and, consequently, is sensitive to probability mass rather than probability density.

The Kullback-Leibler divergence is used as the cost function in this paper. For mathematical and computational simplicity, the approximation Q needs to be simple. The cost function C_KL is a function of the posterior means and variances of the source variables and of the parameters of the network. This is because, instead of finding a point estimate, a whole distribution is estimated for the source variables and the parameters during learning. The end result of the learning is therefore not just an estimate of the unknown variables, but a distribution over them.

IV. EXPERIMENTAL RESULTS

Two experiments are presented in this section.
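The cost function C_KL in (5) can be estimated by simple Monte Carlo sampling from Q, which makes its behaviour easy to check on a toy target. The names and the standard-Gaussian target below are our own illustration, not the paper's network model.

```python
import numpy as np

rng = np.random.default_rng(2)

def cost(mu, var, log_p_unnorm, n=200_000):
    """Monte Carlo estimate of C_KL = E_Q{log Q - log P*} (KL up to log Z)."""
    th = rng.normal(mu, np.sqrt(var), size=n)            # samples from Q
    log_q = -0.5 * np.log(2 * np.pi * var) - (th - mu) ** 2 / (2 * var)
    return float(np.mean(log_q - log_p_unnorm(th)))

# Unnormalised target posterior: a standard Gaussian. The cost is smallest
# when the approximation Q matches it, and rises as Q's mean or variance drift.
target = lambda t: -0.5 * t ** 2
c_match = cost(0.0, 1.0, target)
print(c_match < cost(1.0, 1.0, target))   # True
print(c_match < cost(0.0, 2.0, target))   # True
```

Because the expectation is taken under Q, only regions where Q places mass contribute, which is the mass-sensitivity noted in the text.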
In the first experiment we use a set of artificial data; in the second, real speech recordings are used to test the performance of the proposed approach.

A. Experiment 1: Artificial Data

There are eight sources, four super-Gaussian and four sub-Gaussian, generated by Matlab functions. The observation data are generated from these sources through a nonlinear mapping neural network: a randomly initialized three-layer feedforward neural network with 3 hidden neurons and eight output neurons. Gaussian noise with a standard deviation of 0.1 is also added to the data.

The results are shown in Fig. 2, which contains eight scatter plots, each corresponding to one of the eight sources. The original source is on the x-axis and the estimated source on the y-axis of each plot, with each point corresponding to one data vector. An optimal result is a straight line, indicating that the estimated values of the sources equal the true values. The number of hidden neurons in the enhanced LSB neural network was varied to optimize the results, and only two iterations (the data set passing through the neural network twice) were used for the results shown in Fig. 2. Further iterations bring no better results, only more training time, which is consistent with the characteristics of the LSB algorithm. Fig. 3 shows the results after 5 training iterations, which are perceptibly no better than those in Fig. 2. The scatter plots present the differences between the sources and the estimated signals.

B. Experiment 2: Real Speech Signal Separation

The observed signals were taken from Dr Te-Won Lee's home page at the Salk Institute, http://www.cnl.salk.edu/~tewon/ [8]. One signal is a recording of the digits from one to ten spoken in English. The second microphone signal is the same digits spoken in Spanish at the same time. The proposed algorithm is applied to these signals. Figs. 4 and 5 show the real signals and the separated results (only half of each signal is presented here for clarity).
It is hard to compare the results with Lee's in a quantitative way due to the different methodologies, but comparable quality can be identified when the signals are listened to.
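A hypothetical reconstruction of the Experiment 1 data generator is sketched below. The hidden-layer width and the use of sinh^-1 nonlinearities in the mixing network are assumptions on our part, as the text elides these details.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_hidden = 1000, 30   # hidden width is our assumption, not from the text

# Eight sources: four super-Gaussian (heavy-tailed) and four sub-Gaussian.
super_g = rng.laplace(size=(4, n_samples))
sub_g = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(4, n_samples))
s = np.vstack([super_g, sub_g])

# Random three-layer feedforward mixing network with eight outputs,
# plus Gaussian noise of standard deviation 0.1, as in the experiment.
W1 = rng.normal(size=(n_hidden, 8))
W2 = rng.normal(size=(8, n_hidden))
x = np.arcsinh(W2 @ np.arcsinh(W1 @ s)) + 0.1 * rng.normal(size=(8, n_samples))
print(x.shape)
```

Scatter-plotting each recovered component against the corresponding row of s, as in Figs. 2 and 3, then gives a direct visual check: a straight diagonal line means perfect recovery.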
V. CONCLUSION

In this paper, we develop a new approach based on Bayesian ensemble learning and an LSB neural network training algorithm for the BSS problem. A three-layer neural network with an enhanced LSB training algorithm is used to model the unknown blind mixing system. The network works like an RNN, but it reaches its steady state very quickly because of its enhanced LSB training algorithm. Ensemble learning is applied to estimate the parametric approximation of the posterior pdf. The Kullback-Leibler information divergence is used as the cost function. It is a measure well suited to comparing probability distributions, and it can be computed efficiently in practice if the approximation is chosen to be simple enough. The Kullback-Leibler information is sensitive to probability mass, and the search for good models therefore focuses on models which have large probability mass as opposed to large probability density. The drawback is that, for ensemble learning to be computationally efficient, the approximation of the posterior needs to have a simple factorial structure. The experiments have been carried out using both artificial data and real recordings, and the results show the success of the proposed algorithm.

Fig. 2 The scatter plots, with the original sources on the x-axis of each plot and the sources estimated by the proposed algorithm on the y-axis, after 2 iterations.
Fig. 3 The scatter plots, with the original sources on the x-axis of each plot and the sources estimated by the proposed algorithm on the y-axis, after 5 iterations.
Fig. 4 The real signals.
Fig. 5 The separated signals.

REFERENCES

[1] Li, Yan, Peng Wen and David Powers, "Methods for the Blind Signal Separation Problem," Proceedings of the IEEE International Conference on Neural Networks & Signal Processing (ICNNSP 2003), Nanjing, China, December 14-17, 2003, pp. 1386-1389.
[2] Lappalainen, H., "Ensemble Learning," in Advances in Independent Component Analysis, M. Girolami, Ed. Berlin: Springer-Verlag, 2000, pp. 75-92.
[3] Jutten, C. and J. Karhunen, "Advances in Nonlinear Blind Source Separation," 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), Nara, Japan, April 2003, pp. 245-256.
[4] Biegler-König, F. and F. Bärmann, "A Learning Algorithm for Multilayered Neural Networks Based on Linear Least Squares Problems," Neural Networks, Vol. 6, 1993, pp. 127-131.
[5] Li, Yan, A. B. Rad and Wen Peng, "An Enhanced Training Algorithm for Multilayer Neural Networks Based on Reference Output of Hidden Layer," Neural Computing & Applications, Vol. 8, 1999, pp. 218-225.
[6] Lappalainen, H. and Xavier Giannakopoulos, "Multi-Layer Perceptrons as Nonlinear Generative Models for Unsupervised Learning: a Bayesian Treatment," ICANN'99, 1999, pp. 19-24.
[7] Hinton, Geoffrey E. and Drew van Camp, "Keeping Neural Networks Simple by Minimizing the Description Length of the Weights," in Proceedings of COLT'93, Santa Cruz, California, 1993, pp. 5-13.
[8] Te-Won Lee's website: http://www.cnl.salk.edu/~tewon/.