EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL

Iteratioal Joural of Iformatio Techology ad Kowledge Maagemet July-December 2012, Volume 5, No. 2, pp. 371-375 EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL Kamaljit Kaur* ABSTRACT: I this paper, we preset the applicatio of the eural etwork for the idetificatio of Reusable Software modules i Orieted Software System. Metrics are used for the structural aalysis of the differet procedures. The values of Metrics will become the iput dataset for the eural etwork systems ad Fuzzy Systems. Traiig Algorithm based o Neural Network ad fuzzy clusterig are experimeted ad the results are recorded i terms of Accuracy, Mea Absolute Error (MAE) ad Root Mea Square Error (RMSE). The results show that Fuzzy Clusterig based techique have show better results tha Resiliet Backpropagatio o the basis of testig data i terms of predictio techique.hece the proposed model ca be used to improve the productivity ad quality of software developmet. Keywords: Software reusability, Neural Networks, MAE, RMSE, Accuracy. 1. INTRODUCTION A software bug is a error, flaw, mistake, failure, or fault i a program that prevets it from behavig as iteded(e.g., producig a icorrect result). Most bugs arise from mistakes ad errors made by people i either a program's source code or its desig, ad a few are caused by compilers producig icorrect code [1]. The cost of?dig ad?xig faults i software typically rises as the developmet project progresses ito a ew phase. Faults that are foud after the system has bee delivered to the customer are may times more expesive to track dow ad correct tha if foud durig a earlier phase[2]. Metrics is defied as The cotiuous applicatio of measuremet based techiques to the software developmet process ad its products to supply meaigful ad timely maagemet iformatio, together with the use of those techiques to improve that process ad its products [3]. Software metrics is all about measuremet ad these are applicable to all the phases of software developmet life cycle from iitiatio to maiteace. The mai aim of this work is to model the impact of faults i fuctio based software modules. The mai objectives are described as follows: To fid the structural code ad desig attributes of software systems Fid the best algorithms that ca be used to model impact of faults i object orieted i.e. the predict the level of impact of the faults i the software system. This paper is orgaized as follows: Sectio two describes the Methodology part of work doe, which shows * RIMT, Madi Gobidgarh, Pujab, Er.kamaljitkaur@yahoo.com the steps used i order to reach the objectives ad carry out the results. I the sectio three, results of the implemetatio are discussed. I the last sectio, o the basis of the discussio various Coclusios are draw ad the future scope for the preset work is discussed. 2. PROPOSED METHODOLOGY The methodology cosists of the followig steps: 2.1 Fid the Structural Code ad Desig Attributes The first step is to fid the structural code ad desig attributes of software systems i.e. software metrics. The realtime defect data sets are take from the NASA's MDP (Metric Data Program) data repository. The dataset is related to the safety critical software systems beig developed by NASA. 2.2 Collectio of Metric Values The suitable metrics like product module metrics out of these data sets are cosidered. The term product is used referrig to module level data. The term metrics data applies to ay fiite umeric values, which describe measured qualities ad characteristics of a product. The term product refers to aythig to which defect data ad metrics data ca be associated. I most cases products will be syoymous with code related items such a fuctios ad systems/sub-systems. 2.3 Aalyze ad Refie Metrics the Metric Values I the ext step the metrics are aalyzed, refied ad ormalized ad the used for modelig of fault tolerace i software systems. I the step, table of module levels metrics

372 INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND KNOWLEDGE MANAGEMENT PC5_product_module_metrics is joied with PC5_defect_product_relatios ad thereafter agai the joi operatio of the resultat table is performed with PC5_static_defect_data. A Etity-Relatioship diagram relates Modules to Defects ad Defects to Severity of Defects is show i figure 1. Figure 1: A Etity-Relatioship Diagram Relates Modules to Defects ad Defects to Severity of Defects I the figure 1 the MODULE_ID is the uique umeric idetifier of the module ad DEFECT_ID is the uique umeric idetifier of the associated defect. The SEVERITY field i the PC5_static_defect_data table shows the value that quatifies the impact of the defect o the overall eviromet i the rage of 1 to 5. Where, 1 meas most severe ad 5 beig least severe. For example, severity 1 may imply that the defect caused a loss of fuctioality without a workaroud where severity 5 may mea that the impact is superficial ad did ot cause ay major disruptios to the system. 2.4 Explore Techiques Based o Neural Network ad Fuzzy 2.4.1 Resiliet Back-propagatio Algorithm It is a supervised learig method, ad is a geeralizatio of the delta rule. It requires a dataset of the desired output for may iputs, makig up the traiig set. It is most useful for feed-forward etworks (etworks that have o feedback, or simply, that have o coectios that loop). The term is a abbreviatio for backward propagatio of errors. Backpropagatio requires that the activatio fuctio used by the artificial euros (or odes ) be differetiable. The algorithm acts o each weight separately. For each weight, if there was a sig chage of the partial derivative of the total error fuctio compared to the last iteratio, the update value for that weight is multiplied by a factor, where 0 < < 1. If the last iteratio produces the same sig, the update value is multiplied by a factor of +, where + > 1. The update values are calculated for each weight i the above maer, ad fially each weight is chaged by its ow update value, i the opposite directio of that weight s partial derivative. This is to miimize the total error fuctio. + is empirically set to 1.2 ad to 0.5. Phase 1: Propagatio Each propagatio ivolves the followig steps: 1. Forward propagatio of a traiig patter s iput through the eural etwork i order to geerate the propagatio s output activatios. 2. Backward propagatio of the propagatio s output activatios through the eural etwork usig the traiig patter s target i order to geerate the deltas of all output ad hidde euros. Phase 2: Weight update For each weight-syapse follow the followig steps: 1. Multiply its output delta ad iput activatio to get the gradiet of the weight. 2. Brig the weight i the opposite directio of the gradiet by subtractig a ratio of it from the weight. This ratio iflueces the speed ad quality of learig; it is called the learig rate. The sig of the gradiet of a weight idicates where the error is icreasig, this is why the weight must be updated i the opposite directio. Repeat phase 1 ad 2 util the performace of the etwork is satisfactory. 2.4.2 Fuzzy Clusterig Clusterig ca be a very effective techique to idetify atural groupigs i data from a large data set, thereby allowig cocise represetatio of relatioships embedded

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL 373 i the data. I our study, clusterig allows us to group software modules ito faulty ad o-faulty categories hece allowig for easier uderstadability. Mai Steps of fuzzy clusterig algorithm[6] : Step 1: Calculate the iput data to be clusters. X ij, i = 1, 2,..., ; j = 1, 2,..., m is the umber of data m is the type of data Step 2: Set the variables value: i- r j, j = 1, 2,..., m ii- quash factor iii- * accept ratio iv- reject ratio v- X jmi vi- X jmax Step 3: Set the ormal data value based o Xj-mi da Xj-max use with the followig model: Xij X orm j mi Xij, i 1,2,..., ; j 1,2,..., m X X j mi j mi Step 4: Set the potetial of each data poit by the formula: a = 1, revise to a = if m = 1, set i 1 xt xk 4 ra P ' e, i 1,2,..., ; k 1,2,..., ; i k i Step 5: Set the highest potetial value of data: M max Pi i 1,2,..., h i, so that D i M Step 6: Set cluster cetre ad update the potetial value that correspod to aother data: i- Ct = [] ii- V j = X hj, j = 1, 2,...m iii- C = 0 (umber of clusters) iv- Cd = 1 v- z = m vi- Do Cd 0 ad Z 0 Step 7: Put the real data: Ct Ct * ( X X ) X ij ij j max j mi j mi Step 8: Set the cluster sigma: j r ( X X ) * j j max j mi 2.5 Compariso of the Algorithms The comparisos are made o the basis of the more accuracy ad least value of MAE ad RMSE error values. Accuracy value of the predictio model is the major criteria used for compariso. The mea absolute error is chose as the stadard error. The techique havig lower value of mea absolute error is chose as the best fault predictio techique. Mea absolute error Mea absolute error, MAE is the average of the differece betwee predicted ad actual value i all test cases; it is the average predictio error [4]. The formula for calculatig MAE is give i equatio show below: a1 c1 a2 c2... a c Assumig that the actual output is a, expected output is c. Root mea-squared error RMSE is frequetly used measure of differeces betwee values predicted by a model or estimator ad the values actually observed from the thig beig modeled or estimated [7]. It is just the square root of the mea square error as show i equatio give below: ( a c ) ( a c )... ( a c ) 2 2 2 1 1 2 2 The mea-squared error is oe of the most commoly used measures of success for umeric predictio. This value is computed by takig the average of the squared differeces betwee each computed value ad its correspodig correct value. The root mea-squared error is simply the square root of the mea-squared-error. The root mea-squared error gives the error value the same dimesioality as the actual ad predicted values. The mea absolute error ad root mea squared error is calculated for each machie learig algorithm i.e. Neural Network. 3. EXPERIMENTAL RESULTS AND DISCUSSIONS I this sectio, we discuss the experimetal results obtaied usig Resiliet Backpropagatio traiig algorithm, Such algorithm is explored ot oly for accuracy poit of view but also for comparig it with fuzzy clusterig. Experimets are coducted o NASA s public domai defect dataset. 8

374 INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND KNOWLEDGE MANAGEMENT 3.1 Data Set As metioed above, experimets are coducted o NASA s public domai defect dataset. The real time defect data set used is take from the NASA's MDP (Metric Data Program) data repository, the details of that dataset cotais 293 Object Orieted modules with differet values of impact of faults labeled as 1, 2, 3, 4 ad 5. The severity level 5 are ot preset i the dataset. So, the level 1 represets the highest severity, level 2 represets of less severity as compared to level 1 ad level 3 represets the medium fault ad level 4 represet the mior fault that ca be overlooked to save time. Details of the Type of Modules i the Dataset are show i Table 1 i tabular form ad Figure 2 i graphical form. cases. The root mea-squared error i.e. RMSE is simply the square root of the mea-squared-error. The root measquared error gives the error value as the same dimesioality as the actual ad predicted values. Table 1 Details of the Type of Modules i the Dataset Severity of Impact Level Cout 1 48 2 207 3 28 4 10 Figure 3: The Traiig Phase Performace of the Resiliet Backpropagatio I the preset work the five Neural Network based algorithms experimeted i Matlab 7.7 ad after the traiig each traied etwork is tested with testig dataset of 15 values derived from the PC4 dataset. The overall testig performace of the differet algorithms is show i table 2. The results reveal that the Resiliet Backpropagatio algorithm have outperformed all other algorithm uder study with 0.3980, 0.5385 ad 80% as MAE, RMSE ad Accuracy values respectively. Table 2 Performace Results of Differet Resiliet Backpropagatio ad Fuzzy Clusterig Algorithms Figure 2: Graphical Represetatio of Details of the Type of Modules i the Dataset X axis belogs to Level of Severity of Faults Y axis belogs to Number of Modules The algorithms are evaluated o the basis of the followig criteria: The developed software computes the mea absolute error, root mea squared error, relative absolute error ad root relative squared error. However, the most commoly reported error is the mea absolute error ad root mea squared error. The root mea squared error is more sesitive to outliers i the data tha the mea absolute error. I order to miimize the effect of outliers, mea absolute error is chose as the stadard error. The predictio techique havig least value of mea absolute error is chose as the best predictio techique. Mea absolute error, MAE is the average of the differece betwee predicted ad actual value i all test Sr. No. Algorithm MAE RMSE Accuracy% 1 Resiliet 0.3408 0.5385 82.2526 Backpropagatio 2 Fuzzy clusterig 0.2982 0.4200 94.2262 REFERENCE [1] http://puretest.blogspot.i/2009/11/1.html [2] Barry Boehm ad Victor R. Basili. Software Defect Reductio Top 10 List. Computer, 34(1), 135-147, Jauary 2001. [3] Bibi S., Tsoumakas G., Stamelos I., Vlahavas I., Software Defect Predictio Usig Regressio via Classificatio, IEEE Iteratioal Coferece o Computer Systems ad Applicatios, Issue Date: March 8, 2006, pp. 330-336, ( h t t p : / / i e e e x p l o r e. i e e e. o r g / x p l / f r e e a b s _ a l l. jsp?arumber=1618375) [4] Challagulla V.U.B., Bastai, F.B., I-Lig Ye, Paul, (2005). Empirical Assessmet of Machie Learig Based Software Defect Predictio Techiques, 10th IEEE Iteratioal Workshop o Object- Orieted Real-Time

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL 375 Depedable Systems, WORDS 2005, 2-4 Feb 2005, pp. 263-270. [5] Mahaweerawat, A. (2004). Fault-Predictio i Object Orieted Software s Usig Neural Network Techiques, Advaced Virtual ad Itelliget Computig Ceter (AVIC), Departmet of Mathematics, Faculty of Sciece, Chulalogkor Uiversity, Bagkok. [6] Agus Priyoo, Muhammad Ridwa, Ahmad Jais Alias, Riza Atiq, O.K. Rahmat, Azmi Hassa ad Mohd. Alauddi Mohd. Ali, Geeratio of Fuzzy Rules with Subtractive Clusterig, Jural Tekologi, 43(D) Dis. 2005, pp. 143-153, Uiversiti Tekologi Malaysia. [7] Hudepohl J.P., S.J. Aud, T.M. Khoshgoftaar, E.B. Alle, ad J.E. Mayrad, (1996). Software Metrics ad Models o the Desktop, IEEE Software, 13(5), 56-60. [8] Satwider Sigh, K.S. Kahlo, Effectiveess of Ecapsulatio ad Object-orieted metrics to Refactor Code ad Idetify Error Proe Classes Usig Bad Smells, ACM SIGSOFT Software Egieerig Notes, 36(5), September 2011. [9] Pueet Mittal, Satwider Sigh, ad K.S. Kahlo, Idetificatio of Error Proe Classes for Fault Predictio Usig Object Orieted Metrics, ACC 2, Vol. 191 Spriger (2011), pp. 58-68.