Journal of Information Technology and Computer Science Volume 1, Number 1, 2016, pp. 28-37
Journal Homepage: www.jitecs.ub.ac.id

Optimizing SVR using Local Best PSO for Software Effort Estimation

Dinda Novitasari 1, Imam Cholissodin 2, Wayan Firdaus Mahmudy 3
1,2,3 Department of Informatics / Computer Science, Brawijaya University, Indonesia
1 d.dindanovitasari@gmail.com, 2 imamcs@ub.ac.id, 3 wayanfm@ub.ac.id

Received 21 February 2016; received in revised form 18 March 2016; accepted 25 March 2016

Abstract. The software industry is known for its struggle to fulfill tremendous demand. Effort estimation is therefore needed to improve the accuracy of project planning, because estimation that relies on the personal analysis of experts tends to be less objective. SVR is a machine learning algorithm that can be used for this task, but applying it raises two problems: selecting features and finding optimal parameter values. This paper proposes local best PSO-SVR to solve both problems. Experimental results show that the proposed model outperforms PSO-SVR and T-SVR in accuracy.

Keywords: Optimization, SVR, Optimal Parameter, Feature Selection, Local Best PSO, Software Effort Estimation

1 Introduction

Software effort estimation requires techniques that approximate all requirements in an attempt to improve accuracy. Many projects prove to have problems at completion time, with costs swelling in roughly 43% to 59% of incidents (according to the Standish Group). The outcome is heavily influenced by the strategies and considerations used in the initial process [1]. This issue is actively studied and must be resolved, because estimation has so far relied mostly on expert assessment, whose results can be heavily biased: it is less objective, which degrades the final estimated value [2]. Machine learning algorithms have helped overcome many engineering problems [3].
The advantage of machine learning is its ability to learn adaptively from patterns in previous data and to provide models whose results are consistent and stable [4]. An example is SVM, well known as a robust machine learning method [5]. SVR is a development of SVM designed specifically for optimal prediction/forecasting performance. Its main practical problems are the difficulty of determining optimal parameter values and of selecting the optimal features [6],[7],[8]. Previously, Braga tried using GA, and Hu tried a PSO-based technique, to obtain optimal SVR results [9],
[10]. As a result, PSO has been shown to provide excellent performance and to be effective in finding the most optimal solutions, compared with GA and other methods [11]. PSO improvements have been made to deal with premature convergence (local optima); although they take a little longer, the extra time can be tolerated and is comparable to the quality of the optimization results obtained. The method is local best PSO, which utilizes the ring topology illustrated in Fig. 1 [12],[13]. Based on these reasons, a ring-topology-based local best PSO-SVR is proposed in this paper.

Fig. 1. Ring topology

2 Method

2.1 Support Vector Regression

Given training data {x_i, y_i}, i = 1, ..., l, where x_i ∈ R^d is the input vector and y_i ∈ R is the output (a scalar target value), an alternative way to obtain the bias b used in computing f(x) is [14]:

   b = y_s - Σ_{i=1}^{l} (α_i* - α_i) K(x_i, x_s)    (1)

where x_s is a support vector, i.e. a point for which α_s* - α_s is not zero. The function f(x) can then be written as:

   f(x) = Σ_{i=1}^{l} (α_i* - α_i) K(x_i, x) + b    (2)

The scalar constant λ is an augmented factor that absorbs the bias into the kernel, giving [15]:

   f(x) = Σ_{i=1}^{l} (α_i* - α_i) (K(x_i, x) + λ²)    (3)

2.1.1 Sequential Algorithm for SVR

Vijayakumar devised an iterative procedure that solves the optimization problem by trading off the weight values of each x_i, called α, so that the regression output moves closer to the actual values. The steps are as follows:

1. Set the initialization α_i* = 0, α_i = 0, and compute

   R_ij = K(x_i, x_j) + λ²    (4)
   for i, j = 1, ..., l.

2. For each point i = 1 to l, do the following loop:

   E_i = y_i - Σ_{j=1}^{l} (α_j* - α_j) R_ij    (5)

   δα_i* = min{max[γ(E_i - ε), -α_i*], C - α_i*}    (6)

   δα_i = min{max[γ(-E_i - ε), -α_i], C - α_i}    (7)

   α_i* = α_i* + δα_i*    (8)

   α_i = α_i + δα_i    (9)

3. Repeat step 2 until the stopping condition is met.

The learning rate γ is computed from the learning rate constant (clr):

   γ = clr / max(diagonal of the kernel matrix)    (10)

2.2 Particle Swarm Optimization

This algorithm defines each particle as a candidate solution to the problem in a j-dimensional space. It was later extended with an inertia weight to improve performance [16],[17]. Let x_ij, v_ij, and y_ij denote the position, velocity, and personal best position of particle i in dimension j, and let ŷ_i be the best position found in the neighborhood N_i. In the ring topology, each particle's neighborhood consists of itself and its two immediate neighbors, determined by Euclidean distance. The velocity and position updates are

   v_ij(t+1) = w v_ij(t) + c_1 r_1j(t)[y_ij(t) - x_ij(t)] + c_2 r_2j(t)[ŷ_ij(t) - x_ij(t)]    (11)

   x_i(t+1) = x_i(t) + v_i(t+1)    (12)

where v_ij(t) and x_ij(t) are the velocity and position of particle i in dimension j = 1, ..., n at time t, c_1 and c_2 are the cognitive and social components, and r_1j, r_2j ~ U(0,1). The personal best y_i and neighborhood best ŷ_i are obtained by

   y_i(t+1) = y_i(t) if f(x_i(t+1)) ≥ f(y_i(t)); otherwise y_i(t+1) = x_i(t+1)    (13)

   ŷ_i(t+1) ∈ {N_i | f(ŷ_i(t+1)) = min{f(x)}, x ∈ N_i}    (14)

The inertia weight w follows the equation

   w = w_max - ((w_max - w_min) / iter_max) × iter    (15)

2.2.1 Binary PSO

A discrete feature space is handled using binary PSO [18]. Each element of a particle can take on the value 0 or 1. The new velocity mapping is as follows:
   sig(v_ij(t+1)) = 1 / (1 + e^(-v_ij(t+1)))    (16)

where v_ij(t+1) is obtained from (11). Using (16), the position update becomes

   x_ij(t+1) = 1 if r_3j(t) < sig(v_ij(t+1)); 0 otherwise    (17)

where r_3j(t) ~ U(0,1).

2.3 Local Best PSO-SVR Model

2.3.1 Particle Representation

In this paper, nonlinear SVR is defined by the parameters C, ε, λ, σ, and clr. Each particle consists of six parts: C, ε, λ, σ, clr (continuous-valued) and the feature mask (discrete-valued). Table 1 shows the representation of particle i, with dimension n_f + 5, where n_f is the number of features.

TABLE I. PARTICLE i CONSISTS OF SIX PARTS: C, ε, λ, σ, clr AND THE FEATURE MASK

   Continuous-valued                      | Discrete-valued
   C      ε      λ      σ      clr       | Feature mask
   X_i,1  X_i,2  X_i,3  X_i,4  X_i,5     | X_i,6, X_i,7, ..., X_i,(n_f+5)

2.3.2 Objective Function

The objective function measures how optimal a generated solution is. There are two types of objective function: fitness and cost. With fitness, a greater value indicates a better solution; with cost, a smaller value does. In this paper a cost-type objective function is used, because the purpose of the algorithm is to minimize error. Prediction accuracy and the number of selected features are the criteria used to design the cost function; a particle with high prediction accuracy and a small number of features produces a low cost, with the weights set to W_A = 95% and W_F = 5% [7].

   MAPE = (1/n) Σ_{i=1}^{n} |A_i - F_i| / A_i    (18)

   error = W_A × MAPE + W_F × (Σ_{j=1}^{n_f} f_j) / n_f    (19)

where n is the number of data points, A_i is the actual value and F_i is the predicted value for data point i, and f_j is the value of feature mask j.
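As a concrete illustration, the sequential SVR training of Section 2.1.1 and the cost function of Eqs. (18)-(19) can be sketched as below. This is a minimal Python sketch, not the authors' C# implementation; the function names (rbf_kernel, train_svr_sequential, predict_svr, cost) and the fixed iteration budget are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_svr_sequential(X, y, C, eps, lam, sigma, clr, iters=300):
    """Sequential SVR training: iterate Eqs. (5)-(9) for a fixed number of sweeps."""
    l = len(y)
    R = rbf_kernel(X, X, sigma) + lam ** 2      # Eq. (4): R_ij = K(x_i, x_j) + lambda^2
    gamma = clr / np.max(np.diag(R))            # Eq. (10), using the augmented kernel diagonal
    a_star = np.zeros(l)                        # alpha_i^*
    a = np.zeros(l)                             # alpha_i
    for _ in range(iters):
        E = y - R @ (a_star - a)                # Eq. (5): error term
        d_star = np.minimum(np.maximum(gamma * (E - eps), -a_star), C - a_star)  # Eq. (6)
        d = np.minimum(np.maximum(gamma * (-E - eps), -a), C - a)                # Eq. (7)
        a_star = a_star + d_star                # Eq. (8)
        a = a + d                               # Eq. (9)
    return a_star, a

def predict_svr(X_train, a_star, a, X_new, lam, sigma):
    """Eq. (3): f(x) = sum_i (alpha_i^* - alpha_i) (K(x_i, x) + lambda^2)."""
    K = rbf_kernel(X_train, X_new, sigma) + lam ** 2
    return (a_star - a) @ K

def cost(actual, predicted, feature_mask, w_a=0.95, w_f=0.05):
    """Eqs. (18)-(19): weighted sum of MAPE and the selected-feature ratio."""
    mape = np.mean(np.abs((actual - predicted) / actual))
    return w_a * mape + w_f * np.sum(feature_mask) / len(feature_mask)
```

The clipping in Eqs. (6)-(7) keeps each α within [0, C] by construction, so no separate projection step is needed.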
2.3.3 Local Best PSO-SVR Algorithm

This paper proposes the local best PSO-SVR algorithm to optimize the SVR parameters and the input feature mask simultaneously. Fig. 2 illustrates the algorithm as a flowchart: input the PSO parameters and dataset, normalize the data, perform k-fold cross validation, initialize the particles, calculate the cost, update the individual and local best positions, update the inertia weight, then update the velocity and position of the particles, looping until the stopping criterion is satisfied and the optimized SVR parameters and input features are returned.

Fig. 2. Flowchart of local best PSO-SVR

The details of the algorithm are described as follows:

1. Normalize the data using

   x_n = (x - x_min) / (x_max - x_min)    (20)

   where x is the original data from the dataset, x_min and x_max are the minimum and maximum values of the original data, and x_n is the normalized value.
2. Divide the data into k folds to determine the training and testing data.
3. Initialize a population of particles.
4. Calculate the cost by averaging the error over the k SVR trainings.
5. Update the individual and local best position of each particle.
6. Update the inertia weight.
7. Update the velocity and position of each particle.
8. If the stopping criterion is satisfied, end the iteration; if not, repeat from step 2. In this paper, the stopping criterion is a given number of iterations.

3 Application of Local Best PSO-SVR in Software Effort Estimation

3.1 Experimental settings

This study simulated three algorithms, local best PSO-SVR, PSO-SVR, and T-SVR, programmed in C#. For the local best PSO-SVR simulation, we used the same parameters and dataset as [19]. For software effort estimation, the inputs of SVR come from the Desharnais dataset [20]. The Desharnais dataset consists of 81 software projects described by 11 variables: 9 independent and 2 dependent. For the experiment, we decided to use 77 projects, due to incompletely provided features, with 7 independent variables (TeamExp, ManagerExp, Transactions, Entities, PointsAdjust, Envergure, and PointsNonAdjust) and 1 dependent variable (Effort). The PSO parameters were set as in Table 2.

TABLE II. PSO PARAMETER SETTINGS

   Number of folds                        10
   Population of particles                15
   Number of iterations                   40
   Inertia weight (w_max, w_min)          (0.6, 0.2)
   Acceleration coefficients (c_1, c_2)   (1, 1.5)
   Parameter search space                 C (0.1-1500), ε (0.001-0.009), σ (0.1-4), λ (0.01-3), clr (0.01-1.75)

3.2 Experimental results

Fig. 3 illustrates the relation between the optimal cost and the number of particles over 5 simulations. It shows that the optimal cost decreases as the number of particles increases. From this chart we can conclude that more particles provide more candidate solutions, so the model has a greater chance of selecting the optimal solution. However, as Fig. 4 illustrates, computing time also increases, because the model must search for a solution among many particles, which compromises computing time.
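For reference, the search loop evaluated in these experiments, steps 3-8 of Section 2.3.3 under the Table 2 settings, can be sketched as below. This is a minimal runnable Python sketch, not the authors' C# implementation: the cost function here is a dummy stand-in for Eq. (19) (the real version averages MAPE over k-fold SVR training), and the ring neighborhood is taken over particle indices, a common simplification.

```python
import numpy as np

rng = np.random.default_rng(42)

# Table 2 settings; the 7-bit feature mask matches the Desharnais predictors
POP, ITERS = 15, 40
C1, C2 = 1.0, 1.5
W_MAX, W_MIN = 0.6, 0.2
LO = np.array([0.1, 0.001, 0.01, 0.1, 0.01])    # lower bounds: C, eps, lambda, sigma, clr
HI = np.array([1500.0, 0.009, 3.0, 4.0, 1.75])  # upper bounds
N_CONT, N_FEAT = 5, 7
DIM = N_CONT + N_FEAT

def cost(p):
    """Dummy stand-in for Eq. (19): 0.95 * accuracy term + 0.05 * feature ratio."""
    mask = p[N_CONT:]
    mape_proxy = np.mean(np.abs(p[:N_CONT] - LO) / (HI - LO))  # placeholder, not real MAPE
    return 0.95 * mape_proxy + 0.05 * mask.sum() / N_FEAT

# step 3: initialize particles (continuous part uniform in range, mask random bits)
x = np.empty((POP, DIM))
x[:, :N_CONT] = LO + rng.random((POP, N_CONT)) * (HI - LO)
x[:, N_CONT:] = (rng.random((POP, N_FEAT)) < 0.5).astype(float)
v = np.zeros((POP, DIM))
pbest = x.copy()
pbest_cost = np.array([cost(p) for p in x])

for it in range(ITERS):
    w = W_MAX - (W_MAX - W_MIN) * it / ITERS                   # Eq. (15): linear inertia
    lbest = np.empty_like(x)
    for i in range(POP):                                       # ring: particle i and its two neighbours
        nbh = [(i - 1) % POP, i, (i + 1) % POP]
        lbest[i] = pbest[nbh[int(np.argmin(pbest_cost[nbh]))]]
    r1 = rng.random((POP, DIM))
    r2 = rng.random((POP, DIM))
    v = w * v + C1 * r1 * (pbest - x) + C2 * r2 * (lbest - x)  # Eq. (11)
    x[:, :N_CONT] = np.clip(x[:, :N_CONT] + v[:, :N_CONT], LO, HI)   # Eq. (12) + range clamp
    sig = 1.0 / (1.0 + np.exp(-v[:, N_CONT:]))                 # Eq. (16)
    x[:, N_CONT:] = (rng.random((POP, N_FEAT)) < sig).astype(float)  # Eq. (17)
    c = np.array([cost(p) for p in x])                         # step 4: evaluate cost
    improved = c < pbest_cost                                  # step 5: keep personal bests
    pbest[improved] = x[improved]
    pbest_cost[improved] = c[improved]

best = pbest[int(np.argmin(pbest_cost))]  # best parameter vector + feature mask found
```

In the full model, cost(p) would decode p into (C, ε, λ, σ, clr) plus a feature mask, run the sequential SVR over the k folds, and return Eq. (19).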
In the experiment, we discovered that 20 particles could obtain the most optimal cost, but we could not use that as the optimal setting since it spends much more computing time. Thus, we decided to use 15 particles, on the consideration that it takes less computing time while still obtaining an optimal cost.
Fig. 3. Comparison of number of particles

Fig. 4. Comparison of computing time

Fig. 5 illustrates the relation between the optimal cost and the number of iterations over 5 simulations. It shows that the optimal cost decreases as the number of iterations increases. For example, in the 4th simulation the optimal cost remained steady until the 4th iteration and then moved down until the 8th iteration. From the 8th iteration to the 40th, the optimal cost did not change, which means the model had converged and found the optimal solution.
Fig. 5. Convergence during the process

Table 3 shows the comparison of the experimental results. The experiments show that the proposed model outperforms T-SVR and PSO-SVR in optimizing SVR: local best PSO-SVR obtained the lowest error among the three models. It is observed that PSO-SVR spent less computing time because of its fast convergence, while the local best PSO-SVR model converges more slowly because it searches for the optimal solution within each particle's neighborhood.

TABLE III. COMPARISON OF PREDICTION RESULTS

   Model               Time (ms)  Optimal (C, ε, σ, clr, λ)                  Selected features                                    Error
   Local best PSO-SVR  91,638     0.1000, 0.0063, 0.2536, 0.0100, 0.0100     2 (Entities and Envergure)                           0.5161
   PSO-SVR             62,610     1500, 0.09, 0.1, 0.01, 3                   2 (PointsAdjust and Envergure)                       0.5819
   T-SVR               677,867    1055.3338, 0.0686, 0.1557, 0.1514, 0.2242  4 (TeamExp, ManagerExp, Entities, and PointsAdjust)  0.6086

4 Conclusion

This paper examined the implementation of local best particle swarm optimization for optimal feature subset selection and SVR parameter optimization in the problem of software effort estimation. In our simulations, we used the Desharnais dataset and compared our results with PSO-SVR and T-SVR. The experimental results show that using the local best version can improve the performance of PSO. For further research, we suggest using different topologies, e.g. Von Neumann, pyramid, wheel, and four clusters, to give more perspective on the effect of social network structures on PSO when selecting the optimal number of features and optimizing the SVR parameter combination in the
software effort estimation problem. Hybridizing with other heuristic algorithms such as simulated annealing is another option for improving the performance of PSO [19],[21].

References

[1] The Standish Group, "Chaos Manifesto 2013: Think Big, Act Small," 2013.
[2] R. Agarwal, M. Kumar, Yogesh, S. Mallick, R. M. Bharadwaj, and D. Anantwar, "Estimating Software Projects," ACM SIGSOFT Software Engineering Notes, vol. 26, no. 4, pp. 60-67, 2001.
[3] D. Zhang and J. J. Tsai, "Machine Learning and Software Engineering," Software Quality Journal, vol. 11, no. 2, pp. 87-119, 2003.
[4] K. Srinivasan and D. Fisher, "Machine Learning Approaches to Estimating Software Development Effort," IEEE Transactions on Software Engineering, vol. 21, no. 2, pp. 126-137, 1995.
[5] V. N. Vapnik, "An Overview of Statistical Learning Theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988-999, 1999.
[6] H. Frohlich, O. Chapelle, and B. Scholkopf, "Feature Selection for Support Vector Machines by Means of Genetic Algorithm," in Proceedings of the 15th IEEE International Conference on Tools with Artificial Intelligence, 2003, pp. 142-148.
[7] Y. Guo, "An Integrated PSO for Parameter Determination and Feature Selection of SVR and Its Application in STLF," in Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, Baoding, 12-15 July 2009.
[8] W. Wang, Z. Xu, W. Lu, and X. Zhang, "Determination of the Spread Parameter in the Gaussian Kernel for Classification and Regression," Neurocomputing, vol. 55, pp. 643-663, 2003.
[9] P. Braga, A. Oliveira, and S. Meira, "A GA-based Feature Selection and Parameters Optimization for Support Vector Regression Applied to Software Effort Estimation," in Proceedings of the 2008 ACM Symposium on Applied Computing, 2008, pp. 1788-1792.
[10] G. Hu, L. Hu, H. Li, K. Li, and W.
Lu, "Grid Resources Prediction with Support Vector Regression and Particle Swarm Optimization," in 3rd International Joint Conference on Computational Sciences and Optimization (CSO 2010): Theoretical Development and Engineering Practice, vol. 1, pp. 417-422, 2010.
[11] M. Jiang, S. Jiang, L. Zhu, Y. Wang, W. Huang, and H. Zhang, "Study on Parameter Optimization for Support Vector Regression in Solving the Inverse ECG Problem," Computational and Mathematical Methods in Medicine, vol. 2013, pp. 1-9, 2013.
[12] A. P. Engelbrecht, Computational Intelligence: An Introduction, 2nd ed. West Sussex: John Wiley & Sons Ltd, 2007.
[13] R. Mendes, J. Kennedy, and J. Neves, "The Fully Informed Particle Swarm: Simpler, Maybe Better," IEEE Transactions on Evolutionary Computation, vol. 8, no. 3, pp. 204-210, 2004.
[14] A. J. Smola and B. Scholkopf, "A Tutorial on Support Vector Regression," Statistics and Computing, vol. 14, no. 3, pp. 199-222, 2004.
[15] S. Vijayakumar and S. Wu, "Sequential Support Vector Classifiers and Regression," in Proceedings of the International Conference on Soft Computing (SOCO '99), 1999, vol. 619, pp. 610-619.
[16] J. Kennedy and R. Eberhart, "Particle Swarm Optimization," in Proceedings of the IEEE International Conference on Neural Networks, 1995, vol. 4, pp. 1942-1948.
[17] Y. Shi and R. Eberhart, "A Modified Particle Swarm Optimizer," in 1998 IEEE International Conference on Evolutionary Computation Proceedings, IEEE World Congress on Computational Intelligence, 1998, pp. 69-73.
[18] J. Kennedy and R. C. Eberhart, "A Discrete Binary Version of the Particle Swarm Algorithm," in Proceedings of the World Multiconference on Systemics, Cybernetics and Informatics, 1997, pp. 4104-4109.
[19] D. Novitasari, I. Cholissodin, and W. F. Mahmudy, "Hybridizing PSO with SA for Optimizing SVR Applied to Software Effort Estimation," Telkomnika (Telecommunication Computing Electronics and Control), vol. 14, no. 1, pp. 245-253, 2015.
[20] J. Sayyad Shirabad and T. J. Menzies, "The PROMISE Repository of Software Engineering Databases," School of Information Technology and Engineering, University of Ottawa, Canada, 2005. [Online]. Available: http://promise.site.uottawa.ca/serepository. [Accessed: 05-Mar-2015].
[21] W. F. Mahmudy, "Improved Simulated Annealing for Optimization of Vehicle Routing Problem with Time Windows (VRPTW)," Kursor, vol. 7, no. 3, pp. 109-116, 2014.