Proceedings of 2012 4th International Conference on Machine Learning and Computing
IPCSIT vol. 25 (2012) © (2012) IACSIT Press, Singapore

Determining the Optimal Bandwidth Based on Multi-criterion Fusion

Hai-Li Liang 1+, Xian-Min Shen 2, Chun-Fang Ding 1 and Hai-Ying Li 1

1 Department of Information Science and Technology, Xingtai University, Xingtai 054001, China
2 Department of Physics, Xingtai University, Xingtai 054001, China

Abstract. Determining the optimal bandwidth is very important for probability density estimation. There are three typical bandwidth selection methods: the bootstrap method, the least-squares cross-validation (LSCV) method and the biased cross-validation (BCV) method. From the perspective of data analysis, producing a stable or robust solution is a desired property of a bandwidth selection method. However, the issue of robustness is often overlooked in real-world applications. In this paper, we propose to improve the robustness of bandwidth selection by using multiple bandwidth evaluation criteria. Based on this idea, a multi-criterion fusion-based bandwidth selection (MCF-BS) method is developed with the goal of improving the estimation performance. We carry out numerical simulations on four univariate artificial datasets: the Uniform, Normal, Exponential and Rayleigh datasets. The comparative results show that the strategy performs well and that MCF-BS obtains better estimation accuracy than the existing bandwidth selection methods.

Keywords: probability density estimation, bandwidth selection, bootstrap, least-squares cross-validation, biased cross-validation, robustness

1. Introduction

Probability density estimation (PDE for short) [1, 2] is a very important and necessary technique in many theoretical studies and practical applications of probability and statistics. PDE aims to recover the underlying probability density function $p(x)$ from the observed dataset $X = \{x_1, x_2, \ldots, x_N\}$, where $N$ is the size of $X$, by using data-interpolation methods, e.g. the Parzen window method [3, 4].
The estimated density function can be expressed according to the Parzen window method as follows:

$$\hat{p}(x) = \frac{1}{\sqrt{2\pi}\,Nh} \sum_{i=1}^{N} \exp\left[-\frac{1}{2}\left(\frac{x - x_i}{h}\right)^2\right], \tag{1}$$

where all our studies are based on univariate data, $N$ is the number of samples in dataset $X$, and $h$ is the bandwidth to be determined. The purpose of PDE is to make the estimated density $\hat{p}(x)$ as close as possible to the true density $p(x)$. That is to say, the error between $\hat{p}(x)$ and $p(x)$ should be minimized. There are many error criteria [5] for evaluating the estimation performance. In this study, the Mean Integrated Squared Error (MISE) [6] and the Integrated Squared Error (ISE) [7] are selected as the criteria under study. Their expressions are given in equations (2) and (3):

$$\mathrm{MISE}(h) = E\left\{\int \left[\hat{p}(x) - p(x)\right]^2 dx\right\}, \tag{2}$$

$$\mathrm{ISE}(h) = \int \left[\hat{p}(x) - p(x)\right]^2 dx. \tag{3}$$

+ Corresponding author. Tel.: +86 159304109. E-mail address: haly.lang@gmal.com.
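As a minimal sketch of the Gaussian Parzen window estimator (1) and the ISE criterion (3), the following Python code evaluates the estimate on a grid and approximates the integral with the trapezoidal rule. The function names, the grid, the sample size and the choice of a standard normal as the "true" density are illustrative assumptions, not part of the original study.

```python
import numpy as np

def parzen_density(x, data, h):
    """Gaussian Parzen-window estimate, eq. (1), evaluated at the points x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h            # (x - x_i) / h for every pair
    return np.exp(-0.5 * u**2).sum(axis=1) / (np.sqrt(2.0 * np.pi) * data.size * h)

def integrated_squared_error(data, h, true_pdf, grid):
    """ISE, eq. (3), approximated by the trapezoidal rule on a uniform grid."""
    diff2 = (parzen_density(grid, data, h) - true_pdf(grid)) ** 2
    dx = grid[1] - grid[0]
    return float(dx * (diff2.sum() - 0.5 * (diff2[0] + diff2[-1])))

def standard_normal_pdf(x):
    """Assumed true density for this illustration only."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

rng = np.random.default_rng(0)
sample = rng.standard_normal(200)                   # hypothetical dataset X
grid = np.linspace(-6.0, 6.0, 2001)

# A too-small, a moderate and a too-large bandwidth.
for h in (0.05, 0.4, 3.0):
    print(h, integrated_squared_error(sample, h, standard_normal_pdf, grid))
```

In this setting the moderate bandwidth should give a clearly smaller ISE than either extreme, which is the trade-off that the bandwidth selectors try to resolve without access to the true density.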
By using different bandwidth selection methods to solve the error criteria (2) or (3), we obtain different optimization expressions for the bandwidth $h$. The most commonly used bandwidth-solving methods are the bootstrap method [8], the least-squares cross-validation (LSCV) method [9] and the biased cross-validation (BCV) method [10]. Whichever method is used to determine the optimal bandwidth, what these methods have in common is that a numerical search is needed to find the optimal parameter, e.g. a brute-force or exhaustive search, or a general quasi-Newton method such as the S-PLUS function nlmin.

Fig. 1: The comparison on the Uniform distribution.

In our previous work [12], five particle swarm optimization (PSO) [11] algorithms were applied to solve for the optimal bandwidth. In this study, we only use the standard PSO to find the optimal bandwidths, in order to limit the computational complexity.

From the perspective of data analysis, producing a stable or robust solution is a desired property of a bandwidth selection method. However, the issue of robustness is often overlooked in real-world applications when a single bandwidth evaluation criterion is used. In this paper, we propose to improve the robustness of bandwidth selection by using multiple bandwidth evaluation criteria. Based on this idea, a multi-criterion fusion-based bandwidth selection (MCF-BS) method is developed with the goal of improving the estimation performance.

In order to validate the feasibility and effectiveness of the proposed strategy, four univariate artificial datasets are generated randomly: the Uniform, Normal, Exponential and Rayleigh datasets. Then, we test the estimation performance of four bandwidth selection methods: bootstrap, LSCV, BCV and MCF-BS. The experimental results show that MCF-BS obtains the best estimation performance.
We attribute this to the multi-criterion fusion mechanism, which leads the bandwidth selection algorithm to a stable and robust bandwidth for probability density estimation.

2. Multi-Criterion Fusion-Based Bandwidth Selection (MCF-BS)

There are many different bandwidth-solving methods that can be used as bandwidth selectors. In this paper, three commonly used bandwidth selectors are introduced: the bootstrap method [8], the least-squares cross-validation (LSCV) method [9] and the biased cross-validation (BCV) method [10].

The bootstrap method proposed by Taylor [8] finds the optimal bandwidth by minimizing the following error criterion:

$$\mathrm{MISE}_{\mathrm{bootstrap}}(h) = E\left\{\int \left[\hat{p}^{*}(x) - \hat{p}(x)\right]^2 dx\right\}, \tag{4}$$

where $\hat{p}(x)$ is the estimated density (1) based on the given dataset $X$, and $\hat{p}^{*}(x)$ is the bootstrap density, which is estimated from a dataset re-sampled from the density $\hat{p}(x)$. Taylor proved that when the bootstrap method uses the Gaussian kernel function to estimate the density, the re-sampling step is not necessary. So, the error criterion (4) can be written in the following closed form:
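$$\mathrm{MISE}_{\mathrm{bootstrap}}(h) = \frac{1}{2\sqrt{\pi}\,Nh} + \frac{1}{2\sqrt{2\pi}\,N^2 h} \sum_{i=1}^{N}\sum_{j=1}^{N} \left\{ \exp\left[-\frac{(x_i - x_j)^2}{8h^2}\right] + \sqrt{2}\,\exp\left[-\frac{(x_i - x_j)^2}{4h^2}\right] - \frac{4}{\sqrt{3}}\,\exp\left[-\frac{(x_i - x_j)^2}{6h^2}\right] \right\}. \tag{5}$$

Fig. 2: The comparison on the Normal distribution.

LSCV uses a more direct method to obtain an optimal bandwidth, by computing the following error criterion:

$$\mathrm{ISE}_{\mathrm{LSCV}}(h) = \int \left[\hat{p}(x) - p(x)\right]^2 dx = \int \left[\hat{p}(x)\right]^2 dx - 2\int \hat{p}(x)\,p(x)\,dx + \int \left[p(x)\right]^2 dx. \tag{6}$$

From equation (6), we can see that the term $\int [p(x)]^2 dx$ does not depend on the bandwidth $h$. So, the optimal bandwidth can be obtained by minimizing the following error criterion (7), in which the cross term $\int \hat{p}(x)\,p(x)\,dx$ is estimated by leave-one-out cross-validation:

$$\mathrm{ISE}_{\mathrm{LSCV}}(h) = \int \left[\hat{p}(x)\right]^2 dx - 2\int \hat{p}(x)\,p(x)\,dx \approx \frac{1}{2\sqrt{\pi}\,Nh} + \frac{1}{2\sqrt{\pi}\,N^2 h} \sum_{i=1}^{N}\sum_{j \neq i} \left\{ \exp\left[-\frac{(x_i - x_j)^2}{4h^2}\right] - 2\sqrt{2}\,\exp\left[-\frac{(x_i - x_j)^2}{2h^2}\right] \right\}. \tag{7}$$

Because $\mathrm{MISE}(h) = \int \mathrm{Bias}^2[\hat{p}(x)]\,dx + \int \mathrm{Var}[\hat{p}(x)]\,dx$, where $\mathrm{Bias}[\hat{p}(x)] = E[\hat{p}(x)] - p(x)$ and $\mathrm{Var}[\hat{p}(x)] = E[\hat{p}^2(x)] - E^2[\hat{p}(x)]$, we can derive the following asymptotic expression (8) by substituting (1) for $\hat{p}(x)$ in Bias and Var:

$$\mathrm{MISE}(h) \approx \frac{1}{Nh}\int K^2(x)\,dx + \frac{1}{4}h^4 \left[\int x^2 K(x)\,dx\right]^2 \int \left[p''(x)\right]^2 dx, \tag{8}$$

where $K(x)$ is the Gaussian kernel function. The term $\int [p''(x)]^2 dx$ cannot be computed because the true density $p(x)$ is unknown. In BCV, the unknown part is estimated by

$$\int \left[p''(x)\right]^2 dx \approx \int \left[\hat{p}''(x)\right]^2 dx - \frac{1}{Nh^5}\int \left[K''(x)\right]^2 dx.$$

So, for the Gaussian kernel and with the pair sum normalized by $N(N-1)$, the error criterion of BCV is as follows:

$$\mathrm{MISE}_{\mathrm{BCV}}(h) = \frac{1}{2\sqrt{\pi}\,Nh} + \frac{1}{32\sqrt{\pi}\,N(N-1)\,h} \sum_{i=1}^{N}\sum_{j \neq i} \left[ \left(\frac{x_i - x_j}{\sqrt{2}\,h}\right)^4 - 6\left(\frac{x_i - x_j}{\sqrt{2}\,h}\right)^2 + 3 \right] \exp\left[-\frac{(x_i - x_j)^2}{4h^2}\right]. \tag{9}$$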
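The three closed-form criteria (5), (7) and (9) can be sketched in Python as below. This is a hedged illustration under the Gaussian-kernel assumption stated above; the function names, the candidate grid and the random sample are assumptions made here, and a plain grid search stands in for the PSO search used in the paper.

```python
import numpy as np

def _pairwise_sq_diff(data):
    """Matrix of (x_i - x_j)^2 for all pairs i, j."""
    data = np.asarray(data, dtype=float)
    return (data[:, None] - data[None, :]) ** 2

def bootstrap_criterion(h, data):
    """Eq. (5): closed-form bootstrap MISE (sum over all i, j)."""
    d2 = _pairwise_sq_diff(data)
    n = d2.shape[0]
    s = (np.exp(-d2 / (8 * h**2))
         + np.sqrt(2.0) * np.exp(-d2 / (4 * h**2))
         - 4.0 / np.sqrt(3.0) * np.exp(-d2 / (6 * h**2))).sum()
    return 1 / (2 * np.sqrt(np.pi) * n * h) + s / (2 * np.sqrt(2 * np.pi) * n**2 * h)

def lscv_criterion(h, data):
    """Eq. (7): least-squares cross-validation score (sum over i != j)."""
    d2 = _pairwise_sq_diff(data)
    n = d2.shape[0]
    off = ~np.eye(n, dtype=bool)                    # mask selecting i != j
    s = (np.exp(-d2[off] / (4 * h**2))
         - 2 * np.sqrt(2.0) * np.exp(-d2[off] / (2 * h**2))).sum()
    return 1 / (2 * np.sqrt(np.pi) * n * h) + s / (2 * np.sqrt(np.pi) * n**2 * h)

def bcv_criterion(h, data):
    """Eq. (9): biased cross-validation score (sum over i != j)."""
    d2 = _pairwise_sq_diff(data)
    n = d2.shape[0]
    off = ~np.eye(n, dtype=bool)
    u2 = d2[off] / (2 * h**2)                       # ((x_i - x_j) / (sqrt(2) h))^2
    s = ((u2**2 - 6 * u2 + 3) * np.exp(-d2[off] / (4 * h**2))).sum()
    return 1 / (2 * np.sqrt(np.pi) * n * h) + s / (32 * np.sqrt(np.pi) * n * (n - 1) * h)

rng = np.random.default_rng(1)
sample = rng.standard_normal(100)                   # hypothetical dataset
candidates = np.linspace(0.05, 1.5, 146)            # assumed search range for h
for name, crit in [("bootstrap", bootstrap_criterion),
                   ("LSCV", lscv_criterion),
                   ("BCV", bcv_criterion)]:
    scores = [crit(h, sample) for h in candidates]
    print(name, round(float(candidates[int(np.argmin(scores))]), 2))
```

The search range is bounded on purpose: every term of (5) carries a factor $1/h$, so that score tends to 0 as $h$ grows, and an unbounded search would drift toward very large bandwidths.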
Fig. 3: The comparison on the Exponential distribution.

In our study, we use score-based multi-criterion fusion [13] to integrate the three bandwidth selection criteria mentioned above. In score-based multi-criterion fusion, each basis criterion first produces a corresponding score; a score combination algorithm is then employed to aggregate the multiple scores into one consensus score. In score aggregation, it is essential to ensure that the scores produced by different basis criteria are comparable. Thus, score normalization should be done before score combination is performed. In this study, the scores produced by each basis criterion are normalized to the range [0, 1]. Let $c_i$ be the score produced by basis criterion $i$; the score normalization is performed as follows:

$$c_i' = \frac{c_i - c_{\min}}{c_{\max} - c_{\min}}, \tag{10}$$

where $c_i \in \{\mathrm{MISE}_{\mathrm{bootstrap}}(h), \mathrm{ISE}_{\mathrm{LSCV}}(h), \mathrm{MISE}_{\mathrm{BCV}}(h)\}$, $c_{\max} = \max\{\mathrm{MISE}_{\mathrm{bootstrap}}(h), \mathrm{ISE}_{\mathrm{LSCV}}(h), \mathrm{MISE}_{\mathrm{BCV}}(h)\}$ and $c_{\min} = \min\{\mathrm{MISE}_{\mathrm{bootstrap}}(h), \mathrm{ISE}_{\mathrm{LSCV}}(h), \mathrm{MISE}_{\mathrm{BCV}}(h)\}$. For all the basis criteria, it is assumed that the larger the score, the better the feature. A simple yet effective score combination method is to take the average of the normalized scores:

$$\mathrm{MISE}_{\mathrm{MCF\text{-}BS}}(h) = \frac{1}{3}\sum_{i=1}^{3} c_i'. \tag{11}$$

3. The Experiments and Results

In the standard PSO algorithm, the initial population contains 100 particles and the maximum number of iterations is 100. Our experiments are arranged as follows. For each bandwidth selector (bootstrap, LSCV, BCV or MCF-BS), we use the standard PSO algorithm to search for the optimal bandwidth on four different types of univariate artificial datasets. Every type of dataset is generated randomly 100 times, and for each distribution the results are averaged over these 100 datasets. The mean squared error (MSE) is used to evaluate the estimation performance. The detailed comparative results are shown in Fig. 1 to Fig. 4. From the experimental results, we can make the following three observations:

(1) As the number of iterations increases, the MSE decreases gradually.
Once the optimal bandwidth has been found, the MSE becomes steady;
(2) The estimation performance of bootstrap is the worst among all the competing bandwidth selection algorithms. In the figures, the curves corresponding to bootstrap lie above the other curves;
(3) MCF-BS obtains the best estimation performance. We attribute this to the multi-criterion fusion mechanism, which leads the bandwidth selection algorithm to a stable and robust bandwidth for probability density estimation.
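The score fusion of (10) and (11) and the PSO search used in the experiments can be sketched as follows. This is only an illustration: the paper fixes the swarm size (100) and the iteration limit (100) but gives no other PSO settings, so the inertia weight, the acceleration coefficients, the bounds and the toy objective below are assumptions, and `normalize_scores`, `fused_score` and `pso_minimize` are hypothetical names.

```python
import numpy as np

def normalize_scores(c):
    """Eq. (10): min-max normalize the basis-criterion scores to [0, 1]."""
    c = np.asarray(c, dtype=float)
    span = c.max() - c.min()
    return np.zeros_like(c) if span == 0 else (c - c.min()) / span

def fused_score(c):
    """Eq. (11): average of the normalized scores."""
    return float(normalize_scores(c).mean())

def pso_minimize(objective, lower, upper, n_particles=100, n_iter=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO with an inertia weight for one scalar parameter."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(lower, upper, n_particles)    # candidate bandwidths
    vel = np.zeros(n_particles)
    best_pos = pos.copy()                           # personal best positions
    best_val = np.array([objective(p) for p in pos])
    g = int(np.argmin(best_val))
    g_pos, g_val = best_pos[g], best_val[g]         # global best so far
    for _ in range(n_iter):
        r1 = rng.random(n_particles)
        r2 = rng.random(n_particles)
        vel = inertia * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (g_pos - pos)
        pos = np.clip(pos + vel, lower, upper)      # keep particles inside the bounds
        val = np.array([objective(p) for p in pos])
        improved = val < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = val[improved]
        g = int(np.argmin(best_val))
        g_pos, g_val = best_pos[g], best_val[g]
    return float(g_pos), float(g_val)

# Toy stand-in for a bandwidth criterion; its minimizer 0.37 is arbitrary.
h_opt, value = pso_minimize(lambda h: (h - 0.37) ** 2 + 1.0, 0.01, 2.0)
print(h_opt, value)
print(normalize_scores([1.0, 2.0, 4.0]), fused_score([1.0, 2.0, 4.0]))
```

To use this for bandwidth selection, `objective` would be replaced by one of the criteria (5), (7) or (9), or by the fused score (11), evaluated on the dataset at hand.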
Fig. 4: The comparison on the Rayleigh distribution.

4. Conclusions

In this paper, we propose to improve the robustness of bandwidth selection by using multiple bandwidth evaluation criteria. Based on this idea, a multi-criterion fusion-based bandwidth selection (MCF-BS) method is developed. The comparative results show that the strategy performs well and that MCF-BS obtains the best estimation accuracy among the existing selection methods considered.

5. References

[1] M.P. Wand, M.C. Jones. Kernel Smoothing. Chapman and Hall, 1995.
[2] D.W. Scott. Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons, Inc, 1992.
[3] E. Parzen. On estimation of a probability density function and mode. Annals of Mathematical Statistics, 1962, 33 (3): 1065-1076.
[4] M.G. Genton. Classes of kernels for machine learning: a statistics perspective. Journal of Machine Learning Research, 2001, 2: 299-312.
[5] M.C. Jones, J.S. Marron, and S.J. Sheather. A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 1996, 91 (433): 401-407.
[6] J.S. Marron, and M.P. Wand. Exact mean integrated squared error. The Annals of Statistics, 1992, 20 (2): 712-736.
[7] C.R. Heathcote. The integrated squared error estimation of parameters. Biometrika, 1977, 64 (2): 255-264.
[8] C.C. Taylor. Bootstrap choice of the smoothing parameter in kernel density estimation. Biometrika, 1989, 76 (4): 705-712.
[9] A.W. Bowman. An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 1984, 71 (2): 353-360.
[10] D.W. Scott, and G.R. Terrell. Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 1987, 82 (400): 1131-1146.
[11] J. Kennedy, R. Eberhart. Particle swarm optimization. Proceedings of the 1995 International Conference on Neural Networks, Perth, Australia, 1995, pp. 1941-1948.
[12] H.L. Liang, X.M. Shen. Applying particle swarm optimization to determine the bandwidth parameter in probability density estimation.
Proceedings of the 2011 International Conference on Machine Learning and Cybernetics, 2011, pp. 1362-1367.